Subject: Re: TCP checksum not good enough?
To: Andy Ruhl <acruhl@gmail.com>
From: Greg Troxel <gdt@ir.bbn.com>
List: netbsd-users
Date: 08/02/2006 16:46:15


"Andy Ruhl" <acruhl@gmail.com> writes:

> A very large database was being backed up over a TCP/IP network, and
> the restore of it to a test system would often be corrupt. This
> prompted some very heated conversations, let's say.
>
> The backup application involved had a checksumming feature on the
> payload of the data, and when this was turned on an error was found.
>
> Which means this data passed through whatever hardware checking was
> done (I don't know exactly what, if any, there is) AND TCP
> checksumming.

You didn't say what kind of hardware, but Ethernet frames carry a
32-bit CRC.

Your hardware is likely bad.  Keep in mind that errors can happen in
host RAM as well; the Ethernet CRC only protects data from NIC to NIC.
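A small illustration of why a CRC is stronger than a simple sum: a
32-bit CRC is guaranteed to detect any error burst of 32 bits or fewer,
which includes two adjacent 16-bit words being exchanged.  This sketch
uses Python's zlib.crc32, which implements the same IEEE 802.3 CRC-32
polynomial as the Ethernet frame check sequence; the 4-byte "frame" is
made up for the example and ignores real framing.

```python
import zlib


def crc32_of(data: bytes) -> int:
    # zlib.crc32 uses the IEEE 802.3 CRC-32 polynomial (as in the Ethernet FCS).
    return zlib.crc32(data) & 0xFFFFFFFF


if __name__ == "__main__":
    frame = bytes.fromhex("12345678")    # hypothetical 4-byte payload
    swapped = bytes.fromhex("56781234")  # the two 16-bit words exchanged
    # The error pattern spans only 32 bits, so CRC-32 must catch it.
    print(crc32_of(frame) != crc32_of(swapped))  # True
```

None of this helps once the data is sitting in host RAM, which is the
point above.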

> Is this type of thing common? When I ftp my multiple gigabyte, gzipped
> dumps of my NetBSD machine across the TCP/IP network to another
> machine where I store them, should I expect that randomly this stuff
> is going to be corrupt?

No, but it's wise to use a strong checksum on such things.
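For example, a minimal sketch that computes a SHA-256 digest of a dump
in 1 MB chunks, so multi-gigabyte files need not fit in memory.  The
function name file_sha256 and the chunk size are just illustrative
choices; run it on both machines and compare the digests.

```python
import hashlib
import sys


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at path, read chunk by chunk."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Usage: python sha256sum.py dump1.gz dump2.gz ...
    for path in sys.argv[1:]:
        print(file_sha256(path), path)
```

If the digest computed after the transfer matches the one computed
before it, the file arrived intact with very high confidence.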

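The basic issue is that the TCP checksum is only 16 bits.  If a packet
is randomly corrupted, there's roughly a 1/65536 chance that the
checksum will still verify.  Of course, errors can be non-random, and
some patterns are never caught: the checksum is a ones'-complement sum
of 16-bit words, so swapping two 16-bit words, for example, leaves it
unchanged.  So if there are lots of errors, some will get through.  But
the error rate should be near zero to start with.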
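A sketch of the 16-bit ones'-complement checksum (the RFC 1071
algorithm), showing that a single flipped bit is detected while two
swapped 16-bit words are not.  It checksums a bare byte string and
leaves out the TCP pseudo-header, so it illustrates the arithmetic
rather than a complete TCP implementation; the sample bytes are made up.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum; odd-length input is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    # End-around carry: fold overflow back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    original = bytes.fromhex("123456789abc")
    swapped = bytes.fromhex("567812349abc")  # first two words exchanged
    flipped = bytes.fromhex("123456789abd")  # one bit flipped
    print(hex(internet_checksum(original)), hex(internet_checksum(swapped)))  # 0xfc96 0xfc96
    print(hex(internet_checksum(flipped)))  # 0xfc95
```

Because addition is order-independent, any reordering of whole 16-bit
words slips through, which is one way real errors beat the 1/65536
estimate.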

On a NetBSD 3.0/i386 system that has been up for 59 days, here is
selected 'netstat -s' output:

ip:
        77227514 total packets received
        34 with data size < data length
udp:
        55820776 datagrams received
        3 with bad checksum
tcp:
        19340989 packets received
                308 packets (12266 bytes) of data after window
                1909 packets received after close
                851 discarded for bad checksums
                2 discarded for bad header offset fields

About 5.4E-8 (3 in roughly 56 million) of the UDP packets had bad
checksums.  There's no reason to expect that any bad ones made it
through.

4.4E-5 (44 packets in 1E6) of the TCP packets had bad checksums.  If
each corrupted packet has about a 1/65536 chance of passing, then 851
detected errors suggest that roughly 851/65536, or about 0.013, bad
packets were accepted.
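The arithmetic above, spelled out as a quick script.  The counters are
copied from the netstat output; the 1/65536 pass probability is the
random-corruption assumption from earlier, so the last figure is only
an estimate.

```python
# Counters copied from the 'netstat -s' output above.
udp_received, udp_bad = 55820776, 3
tcp_received, tcp_bad = 19340989, 851

udp_rate = udp_bad / udp_received
tcp_rate = tcp_bad / tcp_received
# Assumes each corrupted packet passes the 16-bit check with probability ~1/65536.
expected_tcp_undetected = tcp_bad / 65536

print(f"UDP bad-checksum rate: {udp_rate:.1e}")   # 5.4e-08
print(f"TCP bad-checksum rate: {tcp_rate:.1e}")   # 4.4e-05
print(f"Expected undetected TCP errors: {expected_tcp_undetected:.3f}")  # 0.013
```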

It doesn't surprise me that TCP is worse; this machine receives lots
of connections from spammers and random pokes at its web server in
addition to my own ssh traffic and backups.  So some of the bad
packets may well have been sent incorrectly.  UDP is mostly DNS.

I've seen corruption with a buggy kernel and Atheros cards.  In that
case, ssh or BitTorrent provided error detection and recovery with
SHA-1.

What do you see for "netstat -p tcp -s"?

--
    Greg Troxel <gdt@ir.bbn.com>
