kern/49390: Unsigned wraparound in tcp sequence number calculations on 64-bit machines

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/49390: Unsigned wraparound in tcp sequence number calculations on 64-bit machines
From: uwe%NetBSD.org@localhost
Date: Thu, 13 Nov 2014 16:35:00 +0000 (UTC)

>Number:         49390
>Category:       kern
>Synopsis:       Unsigned wraparound in tcp sequence number calculations on 64-bit machines
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Nov 13 16:35:00 +0000 2014
>Originator:     Valery Ushakov
>Release:        current
>Organization:
>Environment:
>Description:
tcp_output() mixes "long" and "uint32_t" when doing sequence number
calculations and on 64-bit that's a dangerous mix.

It looks like I forgot to report this when this was still fresh in my
memory, so I'd better file it now before I forgot even more details.
Apologies in advance for a bit of handwaving.

I actually encountered this problem while working on slirp-based NAT
in VirtualBox.  Slirp uses old BSD stack code.  In NetBSD the relevant
code is still mostly the same.  The bug that I did run into in slirp
is fixed in NetBSD and I haven't tried to trigger potentially buggy
code paths with other instances of this problem.  Someone with a
better clue should review it, but I think it should be fixed for
general correctness.

The bug that I ran into in slirp was fixed in NetBSD by rev. 1.112 of
tcp_output.c:

  http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/netinet/tcp_output.c.diff?r1=1.111&r2=1.112

  revision 1.112
  date: 2004-05-08 18:41:47 +0400;  author: chs;  state: Exp;  lines: +4 -4;
  work around an LP64 problem where we report an excessively large window
  due to incorrect mixing of types.

That problem was triggered by zero window probes by remote's persist
timer, when rcv_adv (owned by tcp_output - advertised right end of the
window) was "overtaken" by rvc_nxt by one byte - the probe accepted
into previously closed window (rcv_nxt is owned by tcp_input).  So

  /* 64 bit */ long win = (long)(tp->rcv_adv - tp->rcv_nxt);

became 4294967295L instead of -1 [gory details available upon request].


We still have more instances of code that does the same

  long delta = uint32_seq1 - uint32_seq2;


E.g. earlier in tcp_output() 

                /*
                 * "adv" is the amount we can increase the window,
                 * taking into account that we are limited by
                 * TCP_MAXWIN << tp->rcv_scale.
                 */
                long adv = min(win, (long)TCP_MAXWIN << tp->rcv_scale) -
                        (tp->rcv_adv - tp->rcv_nxt);

with rcv_adv and rcv_nxt again.


We also have it in tcp_seq.h in

  #define SEQ_SUB(a,b)    ((long)((a)-(b)))


I was reminded to file this b/c I accidentally stumbled upon this blog
post:

  http://blogmal.42.org/tidbits/tcp-bug.story

that seems directly relevant to the unfixed "long adv" mentioned
above.

>How-To-Repeat:
Source inspection.
>Fix:
There are two aspects to this.

One is the technicality of doing the correct casts.

The more interesting problem is verifying whether the surrounding code
is prepared to handle negative deltas.  It's obviously orthogonal to
the actual casting bug described here as on 32-bit machines there's no
bug.  We may formally ignore this aspect for now since fixing the
casts will make 64-bit code do the same thing that 32-bit code has
been doing all this time.  But it might be a good pretext for someone
with a clue to actually look at that "while there".

Prev by Date: Re: lib/49387: erfl and erfcl missing
Next by Date: Re: lib/49387: erfl and erfcl missing
Previous by Thread: bin/49389: config(1) is incompatible with previous releases
Next by Thread: Re: kern/49390: Unsigned wraparound in tcp sequence number calculations on 64-bit machines
Indexes:

Home | Main Index | Thread Index | Old Index