tech-net archive


Re: Pathological TCP behavior running ls(1) over SSH



Andreas Gustafsson wrote:
I'm suffering from annoying 1-second pauses in my SSH connections,
often while waiting during the output from ls(1).  This happens when I'm
logged in from one NetBSD-current machine to another NetBSD-current
machine.
...
  # tcpdump -n -p -i re0 port 64186
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on re0, link-type EN10MB (Ethernet), capture size 96 bytes
...
  22:08:46.870658 IP 10.0.1.254.64186 > 91.152.94.125.22: . ack 417 win 33580 <nop,nop,sack 3 {2129:2177}{1985:2081}{1841:1937}>
  22:08:46.872641 IP 10.0.1.254.64186 > 91.152.94.125.22: . ack 417 win 33580 <nop,nop,sack 3 {2129:2225}{1985:2081}{1841:1937}>
  22:08:46.874649 IP 10.0.1.254.64186 > 91.152.94.125.22: . ack 417 win 33580 <nop,nop,sack 3 {2273:2321}{2129:2225}{1985:2081}>
  22:08:46.876534 IP 10.0.1.254.64186 > 91.152.94.125.22: . ack 417 win 33580 <nop,nop,sack 3 {2273:2369}{2129:2225}{1985:2081}>

  [the one-second pause is here]

  22:08:47.800341 IP 91.152.94.125.22 > 10.0.1.254.64186: . 417:1877(1460) ack 32 win 33580
  22:08:47.805289 IP 10.0.1.254.64186 > 91.152.94.125.22: . ack 1937 win 32060 <nop,nop,sack 3 {2273:2369}{2129:2225}{1985:2081}>
  22:08:47.805381 IP 91.152.94.125.22 > 10.0.1.254.64186: . 1937:3397(1460) ack 32 win 33580
  22:08:47.805397 IP 91.152.94.125.22 > 10.0.1.254.64186: P 3397:3889(492) ack 32 win 33580
  22:08:47.810705 IP 10.0.1.254.64186 > 91.152.94.125.22: . ack 3397 win 32120
  22:08:48.009779 IP 10.0.1.254.64186 > 91.152.94.125.22: . ack 3889 win 33580
  ^C
  104 packets captured
  104 packets received by filter
  0 packets dropped by kernel
  #

This looks similar to what Tomas Svensson reported in 2002 in
<http://mail-index.netbsd.org/tech-net/2002/03/15/0000.html>, but
these tinygrams are even smaller, perhaps because I have enabled
compression in my .ssh/config.

That is a different problem/bug from the one you are seeing.

Aside from the question of whether sshd should set TCP_NODELAY or not,
could someone explain why the server waits almost a whole second to
retransmit the segment starting at octet 417; why don't the
38 duplicate ACKs cause a fast retransmit?

I'm going to bet that this is somehow tied up with SACK.
All of the duplicate ACKs in your trace carry one or more
SACK blocks (byte ranges).

So it seems worth testing whether SACK is the cause here, by disabling it:
  sysctl -w net.inet.tcp.sack.enable=0
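
To check the observation about duplicate ACKs against a longer capture, something like the following rough sketch could tally them. It assumes the tcpdump text format shown above with each packet on a single line (wrapped lines joined first); the function name summarize_dup_acks and the SAMPLE list are made up for illustration, and the sample lines are copied from the trace. For reference, standard fast retransmit is normally triggered by three duplicate ACKs.

```python
import re

# A pure ACK in this tcpdump format has flags "." followed directly by "ack N";
# data segments have a "first:last(len)" range before "ack" and are skipped.
PURE_ACK_RE = re.compile(r": \. ack (\d+) win")
SACK_BLOCK_RE = re.compile(r"\{\d+:\d+\}")

def summarize_dup_acks(lines):
    """Map each ACK number to (pure ACKs seen, how many carried SACK blocks)."""
    summary = {}
    for line in lines:
        match = PURE_ACK_RE.search(line)
        if match is None:
            continue
        ack = int(match.group(1))
        has_sack = SACK_BLOCK_RE.search(line) is not None
        total, with_sack = summary.get(ack, (0, 0))
        summary[ack] = (total + 1, with_sack + int(has_sack))
    return summary

# A few lines from the capture above, one packet per line.
SAMPLE = [
    "22:08:46.870658 IP 10.0.1.254.64186 > 91.152.94.125.22: . ack 417 win 33580 <nop,nop,sack 3 {2129:2177}{1985:2081}{1841:1937}>",
    "22:08:46.872641 IP 10.0.1.254.64186 > 91.152.94.125.22: . ack 417 win 33580 <nop,nop,sack 3 {2129:2225}{1985:2081}{1841:1937}>",
    "22:08:46.874649 IP 10.0.1.254.64186 > 91.152.94.125.22: . ack 417 win 33580 <nop,nop,sack 3 {2273:2321}{2129:2225}{1985:2081}>",
    "22:08:46.876534 IP 10.0.1.254.64186 > 91.152.94.125.22: . ack 417 win 33580 <nop,nop,sack 3 {2273:2369}{2129:2225}{1985:2081}>",
    "22:08:47.800341 IP 91.152.94.125.22 > 10.0.1.254.64186: . 417:1877(1460) ack 32 win 33580",
    "22:08:47.810705 IP 10.0.1.254.64186 > 91.152.94.125.22: . ack 3397 win 32120",
]

summary = summarize_dup_acks(SAMPLE)
for ack, (total, with_sack) in sorted(summary.items()):
    print(f"ack {ack}: {total} pure ACKs, {with_sack} with SACK blocks")
```

Running it over captures taken with SACK enabled and then disabled would show whether the run of ACKs for octet 417 still builds up without triggering a retransmit.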

Darren


