tech-net archive
Re: kern/55567: tcp-send slows down to slow single byte transfers (analysis and fix)
It is easy to get confused by the many sequence numbers involved in
managing a bi-directional connection.
The setting is that we, as the client, send a data stream to the server.
This is managed by the SND.* variables.
In this case we receive no data from the server (except for the initial
SYN).
The send side keeps in SND.WL1 the sequence number of the *server* as of
the last send window update. As the server sends no data, its sequence
number does not increase after accounting for the SYN of the handshake.
That's why SND.WL1 and SEG.SEQ are not moving.
We are, however, getting a stream of ACK-only packets (no data) from the
server that acknowledge the data we sent. The ACK sequence number is
stored in SND.WL2 on window updates to make sure we only pick later ACKs
for send window updates.
Usually you expect SND.WL1 to follow the server sequence number stream
and SND.WL2 to follow the ACK sequence numbers in the range
SND.UNA <= SEG.ACK <= SND.MAX.
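
For reference, a minimal sketch of the send window update test as
RFC 793 words it. The struct and function names are made up for
illustration, and the real code in the stack has additional conditions,
so treat this as an approximation and not the kernel source:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Modular (wraparound-aware) sequence number comparisons. */
#define SEQ_LT(a, b)  ((int32_t)((a) - (b)) < 0)
#define SEQ_LEQ(a, b) ((int32_t)((a) - (b)) <= 0)

/* Hypothetical, trimmed-down send state; not the real struct tcpcb. */
struct snd_state {
    tcp_seq  wl1;   /* SND.WL1: SEG.SEQ of the last window update */
    tcp_seq  wl2;   /* SND.WL2: SEG.ACK of the last window update */
    uint32_t wnd;   /* SND.WND: current send window */
};

/*
 * Accept a window update only from a segment that is newer than the
 * one used for the previous update. Returns true if the window was
 * taken from this segment.
 */
static bool window_update(struct snd_state *s, tcp_seq seg_seq,
                          tcp_seq seg_ack, uint32_t seg_wnd)
{
    if (SEQ_LT(s->wl1, seg_seq) ||
        (s->wl1 == seg_seq && SEQ_LEQ(s->wl2, seg_ack))) {
        s->wnd = seg_wnd;
        s->wl1 = seg_seq;
        s->wl2 = seg_ack;
        return true;
    }
    return false;
}
```

With a server that never sends data, SEG.SEQ always equals SND.WL1, so
only the second half of the condition (the SND.WL2 comparison) can ever
let an update through.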
Now the bug is in the fast path, which neglected to adjust SND.WL2 so
that it stays within the invariant SND.UNA <= SEG.ACK <= SND.MAX. Thus
SND.NXT could increase (with wraparound) so much that the window update
test for "newer" ACKs fails and no more window updates are done. In that
state the send window closes down to zero and the stack falls back to
zero window probes, sending 1 byte every 5 seconds and slowing down even
more over time.
All the bytes will be acknowledged, but the window will not be opened,
giving us a very slow send path where we might not even have the time
and data to overcome this error condition.
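
To make the wraparound concrete, here is a small sketch of just the
"newer ACK" comparison against a stale SND.WL2. The function name is
hypothetical, and the comparison macro is the usual 32-bit modular one,
which I am assuming matches what the stack uses:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Modular "less than or equal" on 32-bit sequence numbers. */
#define SEQ_LEQ(a, b) ((int32_t)((a) - (b)) <= 0)

/*
 * The SND.WL2 half of the window update test, for the case where
 * SEG.SEQ == SND.WL1 (the server sends no data). Hypothetical helper.
 */
static bool ack_passes_wl2_test(tcp_seq snd_wl2, tcp_seq seg_ack)
{
    return SEQ_LEQ(snd_wl2, seg_ack);
}
```

With snd_wl2 stuck at 1000, an ACK of 1000 + 0x7fffffff still passes,
but an ACK of 1000 + 0x80000001 looks "older" and is rejected. Crossing
the 32-bit boundary alone is harmless: snd_wl2 = 0xfffffff0 with an ACK
of 0x10 passes. The test only starts passing again after the ACKs have
advanced by roughly another 2^31 bytes, which is hopeless at one byte
per probe.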
tl;dr:
The server is not sending data, thus SEG.SEQ/SND.WL1 do not change.
SND.WL2 was not updated in the fast path for ACK-only packets, thus the
valid sequence number window could move so far away from SND.WL2 that
the "greater" test fails once we have seen enough fast-path-eligible
pure ACK packets from the server.
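
A rough sketch of the idea, assuming the fast path simply pulls SND.WL2
up to the acknowledged sequence number. This is an illustration with
made-up names (tcpcb_sketch, fastpath_pure_ack, pull_up_wl2), not the
patch attached to the PR:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Modular "less than or equal" on 32-bit sequence numbers. */
#define SEQ_LEQ(a, b) ((int32_t)((a) - (b)) <= 0)

/* Hypothetical, trimmed-down control block; not the real struct tcpcb. */
struct tcpcb_sketch {
    tcp_seq snd_una;  /* oldest unacknowledged sequence number */
    tcp_seq snd_wl2;  /* SEG.ACK of the last window update */
};

/*
 * Fast-path handling of a pure ACK. With pull_up_wl2 == false this
 * models the old behaviour (snd_wl2 is left behind); with true it
 * models keeping snd_wl2 in step with the ACK stream.
 */
static void fastpath_pure_ack(struct tcpcb_sketch *tp, tcp_seq seg_ack,
                              bool pull_up_wl2)
{
    tp->snd_una = seg_ack;
    if (pull_up_wl2)
        tp->snd_wl2 = seg_ack;
}

/* The SND.WL2 half of the slow-path window update test. */
static bool later_ack_accepted(const struct tcpcb_sketch *tp,
                               tcp_seq seg_ack)
{
    return SEQ_LEQ(tp->snd_wl2, seg_ack);
}
```

Feeding both variants the same long run of pure ACKs, the one that
leaves snd_wl2 alone eventually rejects every later window update, while
the one that pulls it up keeps accepting them.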
Hope I didn't add too much to the confusion.
Frank
On 09/02/20 17:39, Tom Ivar Helbekkmo wrote:
Frank,
I'm probably missing something - but the data in kern/55567 seems to
show that snd_wl1 is getting left behind, not snd_wl2. Is that just an
accidental mis-labeling of those numbers? Your patch obviously works,
and makes sense. I'm just confused by the seeming discrepancy.
-tih