tech-net archive


Re: 5.2: something wrong with TCP retransmits?



Mouse <mouse%Rodents-Montreal.ORG@localhost> writes:

(I sent a longer private note, but for the benefit of others:) install
pkgsrc/xplot-devel, and grok share/docs/xplot/README*.

> 15:40:28.696730 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34290468 win 
> 32432 <nop,nop,timestamp 2413 2412>
> 15:40:28.697043 IP 10.0.7.14.65521 > 216.46.14.122.22: . 
> 34315332:34316780(1448) ack 43441 win 4197 <nop,nop,timestamp 2413 2413>
> 15:40:28.697164 IP 10.0.7.14.65521 > 216.46.14.122.22: . 
> 34316780:34318228(1448) ack 43441 win 4197 <nop,nop,timestamp 2413 2413>
> 15:40:28.697287 IP 10.0.7.14.65521 > 216.46.14.122.22: . 
> 34318228:34319676(1448) ack 43441 win 4197 <nop,nop,timestamp 2413 2413>
> 15:40:28.916594 IP 216.46.14.122.22 > 10.0.7.14.65521: P 43441:43485(44) ack 
> 34291616 win 33580 <nop,nop,timestamp 2414 2412>
> 15:40:28.916641 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34293912 win 
> 32432 <nop,nop,timestamp 2414 2412>
> 15:40:28.917976 IP 10.0.7.14.65521 > 216.46.14.122.22: . 
> 34319676:34321124(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2414>
> 15:40:28.918104 IP 10.0.7.14.65521 > 216.46.14.122.22: . 
> 34321124:34322572(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2414>
> 15:40:28.918221 IP 10.0.7.14.65521 > 216.46.14.122.22: . 
> 34322572:34324020(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2414>
> 15:40:28.918344 IP 10.0.7.14.65521 > 216.46.14.122.22: . 
> 34324020:34325468(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2414>
> 15:40:29.180893 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 
> 33580 <nop,nop,timestamp 2415 2412>

This is the last regular ack before a sack, and the seqno it acks is
from before the start of your (posted) capture.

> 15:40:29.181292 IP 10.0.7.14.65521 > 216.46.14.122.22: . 
> 34325468:34326916(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2415>
> 15:40:31.190954 IP 216.46.14.122.22 > 10.0.7.14.65521: P 43441:43485(44) ack 
> 34295060 win 33580 <nop,nop,timestamp 2419 2412>
> 15:40:31.191204 IP 10.0.7.14.65521 > 216.46.14.122.22: P 
> 34326916:34327216(300) ack 43485 win 4197 <nop,nop,timestamp 2418 
> 2419,nop,nop,sack sack 1 {43441:43485} >
> 15:40:31.406736 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 
> 33580 <nop,nop,timestamp 2419 2412,nop,nop,sack sack 1 {34326916:34327216} >

So the previous segment is sacked, but the ack seqno is from long ago.

> 15:40:31.407115 IP 10.0.7.14.65521 > 216.46.14.122.22: . 
> 34295060:34296508(1448) ack 43485 win 4197 <nop,nop,timestamp 2419 2419>
> 15:40:33.899361 IP 10.0.7.14.65521 > 216.46.14.122.22: . 
> 34296508:34297656(1148) ack 43485 win 4197 <nop,nop,timestamp 2424 2419>
> 15:40:33.899435 IP 10.0.7.14.65521 > 216.46.14.122.22: . 
> 34297656:34298504(848) ack 43485 win 4197 <nop,nop,timestamp 2424 2419>
> 15:40:33.899517 IP 10.0.7.14.65521 > 216.46.14.122.22: . 
> 34298504:34299440(936) ack 43485 win 4197 <nop,nop,timestamp 2424 2419>

These are retransmits (this is the sort of thing that's far easier to
see in xplot).
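
Without xplot, a rough way to spot retransmits is to flag any data
segment that starts below the highest sequence number already sent.
The sketch below is only an illustration: the function name
find_retransmits is made up, it takes (start, end) pairs copied by hand
from 'tcpdump -S' output in send order, and it ignores sequence number
wraparound.

```python
def find_retransmits(segments):
    """Return the (start, end) segments that resend already-sent data.

    `segments` is a list of (start, end) absolute sequence ranges from
    one sender, in the order they were sent. Wraparound is not handled.
    """
    highest = None  # highest sequence number sent so far
    flagged = []
    for start, end in segments:
        if highest is not None and start < highest:
            flagged.append((start, end))
        if highest is None or end > highest:
            highest = end
    return flagged


# Ranges taken from the data segments sent by 10.0.7.14 in the capture.
sent = [
    (34325468, 34326916),
    (34326916, 34327216),
    (34295060, 34296508),
    (34296508, 34297656),
    (34297656, 34298504),
    (34298504, 34299440),
]
for start, end in find_retransmits(sent):
    print("retransmit", start, end)
```

On the ranges above this flags the four segments starting at 34295060,
which are the ones sent after the first sack arrived.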

> 15:40:34.006752 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 
> 33580 <nop,nop,timestamp 2424 2412,nop,nop,sack sack 2 
> {34296508:34297656}{34326916:34327216} >
> 15:40:34.226326 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 
> 33580 <nop,nop,timestamp 2424 2412,nop,nop,sack sack 2 
> {34296508:34298504}{34326916:34327216} >
> 15:40:34.226371 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 
> 33580 <nop,nop,timestamp 2424 2412,nop,nop,sack sack 2 
> {34296508:34299440}{34326916:34327216} >

The main ack seqno is not moving.  So of the 4 above, the first was
lost, and the next 3 made it.  The packet sacked at 15:40:31.406 is
still being sacked.
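
The same reading can be done mechanically: take the cumulative ack and
the sack blocks from one ack line and list the gaps between them.  This
is a minimal sketch, assuming the tcpdump line has been rejoined onto a
single line and was printed with absolute sequence numbers; the name
sack_holes is hypothetical, and wraparound is ignored.

```python
import re

# "\b" keeps this from matching the "ack" inside the word "sack".
ACK_RE = re.compile(r"\back (\d+)")
SACK_RE = re.compile(r"\{(\d+):(\d+)\}")


def sack_holes(line):
    """Return (cumulative_ack, holes) for one rejoined tcpdump ack line.

    `holes` lists the (start, end) ranges above the cumulative ack that
    are not covered by any sack block on that line.
    """
    m = ACK_RE.search(line)
    if m is None:
        raise ValueError("no ack field in line")
    ack = int(m.group(1))
    blocks = sorted((int(a), int(b)) for a, b in SACK_RE.findall(line))
    holes = []
    edge = ack  # everything below `edge` is acked or sacked
    for start, end in blocks:
        if start > edge:
            holes.append((edge, start))
        edge = max(edge, end)
    return ack, holes


line = ("15:40:34.226371 IP 216.46.14.122.22 > 10.0.7.14.65521: . "
        "ack 34295060 win 33580 <nop,nop,timestamp 2424 2412,nop,nop,"
        "sack sack 2 {34296508:34299440}{34326916:34327216} >")
ack, holes = sack_holes(line)
print(ack, holes)
```

For the last ack above, the first hole is 34295060:34296508, which is
exactly the 1448-byte retransmit that did not arrive.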

So overall, you are having serious loss problems.  TCP should cope, of
course.

Are you sure all the loss is between the tcpdump host and 216.46.14.122?

I would take a tcpdump on the sending machine.  If you can post the
output of 'tcpdump -S -tt <filter to the connection of interest>', I can
look at it in xplot.

> I see nothing wrong with the underlying connectivity; ping -n -c 2 -I
> 10.0.7.14 216.46.14.122 produces a quick response, which shows up as
> expected in the tcpdump output:
>
> 15:52:41.989631 IP 10.0.7.14 > 216.46.14.122: icmp 64: echo request seq 0
> 15:52:42.024922 IP 216.46.14.122 > 10.0.7.14: icmp 64: echo reply seq 0

Those are short packets.  I've seen misterminated ethernets that work
for short packets but not long ones.

> The obvious question is, since 10.0.7.14 has data to send, why isn't it
> filling in the SACK-indicated holes?  Or at least retransmitting
> _something_?  As an off-the-cuff guess, my first inclination would be
> to suspect the retransmission timers.  But that's such a basic thing
> that if that were what's wrong, it surely would have been noticed long
> since.

Also, do 'netstat -s' on the sender.  Perhaps obvious, but netstat -s,
pause, netstat -s, diff, is highly useful.
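
If the plain diff is too noisy, a few lines of script can reduce two
snapshots to just the counters that moved.  This is a sketch under
assumptions: it expects the usual layout of unindented section headers
("tcp:") followed by indented "<number> <description>" lines, the
function names are made up, and the sample text is invented for
illustration rather than taken from a real machine.

```python
import re

COUNTER_RE = re.compile(r"^\s+(\d+)\s+(.*\S)\s*$")


def parse_counters(text):
    """Map (section, description) to the leading counter on each line."""
    counters = {}
    section = ""
    for line in text.splitlines():
        if line and not line[0].isspace():
            # Unindented line: a protocol header such as "tcp:".
            section = line.strip().rstrip(":")
            continue
        m = COUNTER_RE.match(line)
        if m:
            # Replace embedded numbers (e.g. byte counts) with "N" so the
            # description is the same key in both snapshots.
            desc = re.sub(r"\d+", "N", m.group(2))
            counters[(section, desc)] = int(m.group(1))
    return counters


def changed_counters(before, after):
    """Return {key: increase} for counters that differ between snapshots."""
    old = parse_counters(before)
    new = parse_counters(after)
    return {k: new[k] - old.get(k, 0)
            for k in new if new[k] != old.get(k, 0)}


# Invented sample snapshots, only to show the expected input shape.
before = "tcp:\n\t100 packets sent\n\t\t5 data packets (7000 bytes) retransmitted\n"
after = "tcp:\n\t160 packets sent\n\t\t9 data packets (12792 bytes) retransmitted\n"
for key, delta in changed_counters(before, after).items():
    print(key, delta)
```

Only the first number on each line is compared, so a line whose leading
count is unchanged is dropped even if a byte count inside it moved.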


I have been looking at TCP xplots from netbsd-5 for a while.  There are
issues, but they are minor failures to be as aggressive as the spec
permits; I've never seen something like this.



