NetBSD-Users archive


Re: Weird network performance problem



  [lots of details]

These things are somewhat tricky to debug.  There can be issues in the
TCP stacks, issues with interfaces, and issues within the network.  I
have a suspicion that there is something not 100% right about NetBSD's
TCP retransmit behavior under fairly rare loss conditions, and you may
be seeing that.  If you can reproduce this reliably we could perhaps
figure it out.

My advice is:

  First figure out what's going on with the ethernet-over-powerline
  taken out of the equation.

  It looks like you are using vlan support on Y.  Also try without it.

  Do some iperf3 testing with UDP.  This should more or less separate
  loss from TCP's behavior in response to loss.  I am unclear on how
  iperf3 deals with this, but it seems obvious that it can tell you what
  fraction of the UDP packets it sent ended up arriving.

  [not easy but worth it] install graphics/xplot-devel.  Read the info
  about tcp plots.  Capture the data with tcpdump at the NetBSD server
  end (with -w to a file).  More generally, capture data at the host
  that is slow in transmitting; this gets that host's view of the
  arriving acks.  Process the tcpdump output with tcpdump2xplot,
  probably having to debug and fix the perl script to account for drift
  in tcpdump format over time.  Or perhaps use a netbsd-5 tcpdump to
  decode.  Then, learn how to read the plots, and look at the data.
  This will let you see what packet loss there is, and how the TCP
  sender responds to it.
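
For the UDP test suggested above, here is a minimal sketch of pulling the
loss fraction out of an iperf3 report.  It assumes the report was saved as
JSON with something like "iperf3 -c SERVER -u -b 500M -J > udp.json"
(SERVER and the 500M rate are placeholders), and that the UDP summary
lives under end.sum with "packets" and "lost_packets" fields, which is
what I recall from iperf3 3.x; check this against the output of your
version.

```python
import json


def udp_loss_fraction(report: dict) -> float:
    """Return lost/sent for a UDP run, given a parsed iperf3 JSON report.

    Assumption: the summary is at report["end"]["sum"] and carries
    "packets" (datagrams sent) and "lost_packets" (datagrams not received).
    """
    summary = report["end"]["sum"]
    sent = summary["packets"]
    lost = summary["lost_packets"]
    if sent == 0:
        raise ValueError("report contains no UDP packets")
    return lost / sent


def load_report(path: str) -> dict:
    """Read an iperf3 JSON report from a file (hypothetical path)."""
    with open(path) as f:
        return json.load(f)


# Usage (hypothetical file name):
#   print(udp_loss_fraction(load_report("udp.json")))
```

Running this at a few send rates, in each direction, should show whether
loss appears only near line rate or also well below it.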

I can help you offlist with the xplot stuff, as I already understand
this (my grad school officemate's thesis project).  It's on my todo list
to update the parsing code to cope with more modern tcpdump, which I
hope will stop rototilling the formats.

One thing you said seemed odd:

  I test the network speed using iperf3 on all these boxes. The speeds
  upstairs, where all the machines are connected to the gigabit switch,
  are roughly consistent - I get some 930Mbps both ways (there is a bit
  of a speed ramp-up when the server is the NetBSD laptop, but after the
  fifth or so transfer it gets to the same rates). The speeds are also

Can you explain this more precisely, and maybe post a few summary lines?
This doesn't really make sense to me.  Any given TCP connection has to
ramp up the congestion window, but I wouldn't expect a second one 30s
later to benefit from the first -- but maybe there is some caching of
RTT or something else?  After the speeds improve, how long can you wait
before another test is back to the slower speed?  Going way out on a
limb, this smells like caching of some parameters that leads to better
handling of packet loss, and the real issue is that the loss shouldn't
be happening.
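
To put a rough number on the per-connection ramp-up, here is an idealized
sketch of slow start: the congestion window doubles once per round trip
until it covers the bandwidth-delay product.  The initial window of 10
segments, the 1460-byte MSS, and the RTT are all assumptions for
illustration; the model ignores loss, delayed acks, and whatever NetBSD's
stack actually does.

```python
import math


def bdp_segments(rate_bps: float, rtt_s: float, mss_bytes: int = 1460) -> int:
    """Bandwidth-delay product in full-sized segments, rounded up."""
    return math.ceil(rate_bps * rtt_s / 8 / mss_bytes)


def rtts_to_fill(target_segments: int, initial_cwnd: int = 10) -> int:
    """Round trips of loss-free slow start needed to reach the target window.

    Assumption: the window simply doubles every round trip.
    """
    cwnd = initial_cwnd
    rtts = 0
    while cwnd < target_segments:
        cwnd *= 2
        rtts += 1
    return rtts


# Example with an assumed 1 ms RTT at 930 Mbit/s:
window = bdp_segments(930e6, 0.001)   # 80 segments
print(window, rtts_to_fill(window))   # prints "80 3"
```

Under these assumptions the window is full after a few round trips, which
is milliseconds on a LAN.  That is why a ramp-up spanning several
transfers looks like something other than ordinary slow start.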

