tech-net archive


Re: kern/55567: tcp-send slows down to slow single byte transfers (analysis and fix)



Analysis:

The issue is that on a TCP connection that only sends data and does not receive any, the send window can close. In this state bytes are transferred to the receiver
one at a time via the zero window probes, but snd_wnd updates are skipped.
The reason for the skipped updates is that SND.WL2 (the ACK number of the last window update) has left
the valid range of SND.UNA <= SEG.ACK <= SND.MAXSEND by a large amount.

Data flows again when the receiving side sends some data (that's why you can get
a remote login session unstuck by typing).

Tracking a connection shows that SND.WL2 is effectively starved in this scenario and reaches a point where the window update code no longer updates the send window:

    /*
     * Update window information.
     * Don't look at window if no ACK: TAC's send garbage on first SYN.
     */
    if ((tiflags & TH_ACK) && (SEQ_LT(tp->snd_wl1, th->th_seq) ||
        (tp->snd_wl1 == th->th_seq && (SEQ_LT(tp->snd_wl2, th->th_ack) ||
        (tp->snd_wl2 == th->th_ack && tiwin > tp->snd_wnd))))) {

It is an ACK, yes, but th_seq has not changed (no data from the receiving side).
SEQ_LT(tp->snd_wl2, th->th_ack) returns 0 in the stuck state and the segment is not a true window update either, so send window updates stay blocked until the comparison returns 1 again. This can take a very long time at a data rate of 0.2 bytes/second (the rate will be scaled
back further later on).

So, why are updates to SND.WL2 not happening? The cause is an optimization for the common
cases of unidirectional transfers:
    /*
     * Fast path: check for the two common cases of a uni-directional
     * data transfer. If:
     *    o We are in the ESTABLISHED state, and
     *    o The packet has no control flags, and
     *    o The packet is in-sequence, and
     *    o The window didn't change, and
     *    o We are not retransmitting
     * It's a candidate.
     *
     * If the length (tlen) is zero and the ack moved forward, we're
     * the sender side of the transfer. Just free the data acked and
     * wake any higher level process that was blocked waiting for
     * space.
     *
     * If the length is non-zero and the ack didn't move, we're the
     * receiver side. If we're getting packets in-order (the reassembly
     * queue is empty), add the data to the socket buffer and note
     * that we need a delayed ack.
     */

The path that leads to SND.WL2 being starved is the pure ACK section, which adjusts the send buffer, updates SND.UNA, SND.FACK and SND.HIGH, frees the mbuf,
sends more data if available and returns.
SND.WL2 is never touched here, and so a longer sequence of such ACKs can leave SND.WL2 far enough
behind for the stuck zero window size to occur.

Proposed fix:
diff -u -r1.418 tcp_input.c
--- tcp_input.c    6 Jul 2020 18:49:12 -0000    1.418
+++ tcp_input.c    2 Sep 2020 07:59:46 -0000
@@ -1897,6 +1897,19 @@
                 tp->snd_fack = tp->snd_una;
                 if (SEQ_LT(tp->snd_high, tp->snd_una))
                     tp->snd_high = tp->snd_una;
+                /*
+                 * drag snd_wl2 along so only newer
+                 * ACKs can update the window size.
+                 * also avoids the state where snd_wl2
+                 * is eventually larger than th_ack and thus
+                 * blocking the window update mechanism and
+                 * the connection gets stuck for a loooong
+                 * time in the zero sized send window state.
+                 *
+                 * see PR/kern 55567
+                 */
+                tp->snd_wl2 = tp->snd_una;
+
                 m_freem(m);

                 /*

Frank

