tech-net archive


Re: kern/55567: tcp-send slows down to slow single byte transfers (analysis and fix)

It is easy to get confused by the many sequence numbers used to manage a
bi-directional connection.

The setting is that we, as the client, send a data stream to the server. This is
managed by the SND* variables. In this case we receive no data from the server
(except for the initial SYN).

The send side keeps in SND.WL1 the sequence number of the *server* when
updating the send window. As the server sends no data, its sequence number
does not increase after accounting for the SYN of the handshake. That's
why SND.WL1 and SEG.SEQ are not moving.

We are, however, getting a stream of ACK-only packets (no data) from the server
acknowledging the data we sent. The ACK sequence number is stored in SND.WL2 on
window updates to make sure we only pick later ACKs for send window updates.

Usually you expect SND.WL1 to follow the server sequence number stream and SND.WL2
to follow the ACK sequence numbers in the range SND.UNA <= SEG.ACK <= SND.MAX.
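For reference, here is a minimal sketch of the window update test in the
classic RFC 793 / BSD form. It is not copied from the NetBSD source; the
struct and the function names (snd_state, seq_lt, window_update_ok) are
made up for illustration. It shows why the "newer" comparison only works
while the stored value is less than 2^31 behind the incoming one.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wraparound-aware "a is older than b".
 * Only true while b is less than 2^31 ahead of a. */
static bool
seq_lt(tcp_seq a, tcp_seq b)
{
	return (int32_t)(uint32_t)(a - b) < 0;
}

/* Hypothetical, reduced to the fields discussed in this mail. */
struct snd_state {
	tcp_seq  snd_wl1;	/* SEG.SEQ used for the last window update */
	tcp_seq  snd_wl2;	/* SEG.ACK used for the last window update */
	uint32_t snd_wnd;	/* current send window */
};

/* Should this segment be allowed to update the send window? */
static bool
window_update_ok(const struct snd_state *tp, tcp_seq seg_seq,
    tcp_seq seg_ack, uint32_t seg_wnd)
{
	return seq_lt(tp->snd_wl1, seg_seq) ||
	    (tp->snd_wl1 == seg_seq &&
	     (seq_lt(tp->snd_wl2, seg_ack) ||
	      (tp->snd_wl2 == seg_ack && seg_wnd > tp->snd_wnd)));
}
```

With SEG.SEQ stuck at SND.WL1, everything depends on the SND.WL2 comparison.
If SND.WL2 is 1000 and SEG.ACK is 1001 the update is accepted, but if SEG.ACK
is more than 2^31 ahead of SND.WL2 the same test treats it as "older" and
rejects it.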

Now, the bug is in the fast path, which neglected to adjust SND.WL2 so that it never
violates the invariant SND.UNA <= SEG.ACK <= SND.MAX. Thus
SND.NXT could increase (with wraparound) so much that the window update
test for "newer" ACKs fails and no more window updates are done. In that state
the send window closes down to zero and the stack falls back to zero window
probes, sending 1 byte every 5 seconds and slowing down even more over time.
All the bytes will be acknowledged, but the window will not be opened, giving us
a very slow send path where we might not even have the time and data to overcome
this error condition.
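The failure can be reproduced in a small stand-alone simulation. This is
only a sketch under stated assumptions, neither the real tcp_input.c nor the
patch from the PR: all names (snd_state, fast_path_ack,
slow_path_window_update, window_update_after_fast_acks) and the starting
values are invented for illustration, and the "fixed" flag simply models a
fast path that also pulls SND.WL2 up to the ACK it has just processed.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wraparound-aware "a is older than b" (valid for distances < 2^31). */
static bool
seq_lt(tcp_seq a, tcp_seq b)
{
	return (int32_t)(uint32_t)(a - b) < 0;
}

struct snd_state {		/* hypothetical, reduced state */
	tcp_seq  snd_una, snd_wl1, snd_wl2;
	uint32_t snd_wnd;
};

/* Fast path for a pure ACK: advances SND.UNA only.
 * With fixed == true it also keeps SND.WL2 in step. */
static void
fast_path_ack(struct snd_state *tp, tcp_seq seg_ack, bool fixed)
{
	tp->snd_una = seg_ack;
	if (fixed)
		tp->snd_wl2 = seg_ack;
}

/* Slow path window update test; returns true if the update was taken. */
static bool
slow_path_window_update(struct snd_state *tp, tcp_seq seg_seq,
    tcp_seq seg_ack, uint32_t seg_wnd)
{
	if (seq_lt(tp->snd_wl1, seg_seq) ||
	    (tp->snd_wl1 == seg_seq &&
	     (seq_lt(tp->snd_wl2, seg_ack) ||
	      (tp->snd_wl2 == seg_ack && seg_wnd > tp->snd_wnd)))) {
		tp->snd_wnd = seg_wnd;
		tp->snd_wl1 = seg_seq;
		tp->snd_wl2 = seg_ack;
		return true;
	}
	return false;
}

/* Feed n_acks fast path ACKs, each acknowledging step more bytes, then
 * one slow path segment carrying a new window. The server never sends
 * data, so SEG.SEQ stays at 1 throughout. */
static bool
window_update_after_fast_acks(uint32_t n_acks, uint32_t step, bool fixed)
{
	struct snd_state tp = {
		.snd_una = 1000, .snd_wl1 = 1, .snd_wl2 = 1000,
		.snd_wnd = 32768
	};
	tcp_seq ack = tp.snd_una;

	for (uint32_t i = 0; i < n_acks; i++) {
		ack += step;
		fast_path_ack(&tp, ack, fixed);
	}
	ack += step;
	return slow_path_window_update(&tp, 1, ack, 65535);
}
```

With 1 MiB acknowledged per ACK, the unfixed variant still accepts the
window update after 1000 fast path ACKs (about 1 GiB), but rejects it after
3000 (about 3 GiB, which is more than 2^31 bytes past the stale SND.WL2). The
fixed variant accepts it in both cases.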


The server is not sending data, thus SEG.SEQ/SND.WL1 does not change. SND.WL2
was not updated in the fast path for ACK-only packets, thus the valid sequence
number window could move so far away from SND.WL2 that the "greater" test fails
once we have received enough fast-path-eligible pure ACK packets from the server.

Hope I didn't add too much to the confusion.


On 09/02/20 17:39, Tom Ivar Helbekkmo wrote:

I'm probably missing something - but the data in kern/55567 seems to
show that snd_wl1 is getting left behind, not snd_wl2.  Is that just an
accidental mis-labeling of those numbers?  Your patch obviously works,
and makes sense.  I'm just confused by the seeming discrepancy.

