Re: bad interaction between TCP delayed ack and RSTs

To: Ignatios Souvatzis <is%netbsd.org@localhost>
Subject: Re: bad interaction between TCP delayed ack and RSTs
From: jmmikkel%MIT.EDU@localhost (Joanne M Mikkelson)
Date: 18 Jun 2009 01:23:37 -0400

> You'll find the full official explanation from the NetBSD side in our
> security advisory SA2004-006[1].
> 
> As far as I can tell, the behaviour is roughly along the lines of
> draft-ietf-tcpm-tcpsecure-11.txt[2] (was -0.txt back then). I'll
> have to read more code to tell exactly, especially as [2] has evolved
> and is still not finalized.

Thank you for both of these references! I would definitely have been
less puzzled about the behavior and what to do if I had known about
the I-D.

If NetBSD is trying to follow these tcpsecure recommendations, then I
think my problem is clearly the result of a bug. From that draft:

   2) If the RST bit is set and the sequence number exactly matches the
      next expected sequence number (RCV.NXT), then TCP MUST reset the
      connection.

Version 00 said the same thing, although much less clearly.

NetBSD isn't testing against RCV.NXT, it's testing the last ACK sent.
Following the recommendation, I think the code should simply read:
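        if (tiflags & TH_RST) {
                /* RST must match RCV.NXT exactly; otherwise just send a (rate-limited) ACK. */
                if (th->th_seq != tp->rcv_nxt)
                        goto dropafterack_ratelim;
                /* ... remaining RST handling unchanged ... */
        }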

I've tested this version and it works for my test case. I certainly
like this change better, since it takes delayed ack out of the picture
entirely, which makes a lot more sense. And it implements the proposed
spec, if I'm understanding this code correctly.

Now I believe that the same test earlier (lines 2099-2105 of revision
1.291) should be changed as well. I haven't looked at the rest of the
RST-handling requirements in detail against the code either, so there
could be more, but these two would kill the problem I've been seeing.

> [1] I suspect we might need a generic defense against blackholed RSTs
> leading to accumulated waiting server processes; but then again, 
> maybe the servers in question should have their own timeout handling
> outside of the kernel.

Yes, my server has had this timeout from the beginning. In fact, the
close() after the timeout actually caused the first symptom I
noticed: "Why am I sending all these FINs? The client disconnected
several minutes ago."

Joanne