Subject: kern/29971: Loopback checksum optimizations cause UDP problems
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Matthias Scheler <tron@zhadum.de>
List: netbsd-bugs
Date: 04/14/2005 13:51:00
>Number:         29971
>Category:       kern
>Synopsis:       Loopback checksum optimizations cause UDP problems
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Apr 14 13:51:00 +0000 2005
>Originator:     tron@zhadum.de
>Release:        NetBSD 3.0_BETA (sources from 2005-04-12)
>Organization:
Matthias Scheler                                  http://scheler.de/~matthias/
>Environment:
System: NetBSD colwyn.zhadum.de 3.0_BETA NetBSD 3.0_BETA (COLWYN) #0: Wed Apr 13 04:09:07 BST 2005 tron@colwyn.zhadum.de:/export/scratch/tron/build.02357a/sys/compile/COLWYN i386
Architecture: i386
Machine: i386
>Description:
After upgrading my server from NetBSD 2.0.2 to 3.0_BETA, its clients started
to experience DNS service problems. "netstat -s -p udp" suggested that
the problem is related to UDP packets with bad checksums. So I started
"tcpdump" on the client and captured, for example, this packet:

Frame 62 (82 bytes on wire, 82 bytes captured)
Ethernet II, Src: 00:07:e9:67:18:7b, Dst: 00:07:e9:0e:bb:33
Internet Protocol, Src Addr: 81.187.181.115 (81.187.181.115), Dst Addr: 81.187.181.114 (81.187.181.114)
User Datagram Protocol, Src Port: 1022 (1022), Dst Port: nfs (2049)
    Source port: 1022 (1022)
    Destination port: nfs (2049)
    Length: 48
    Checksum: 0x0e9e (incorrect, should be 0x3a47)
Remote Procedure Call, Type:Call XID:0x00002442
Network File System, NULL Call

My first guess was the hardware checksum support in the wm(4) driver used
on the server. But the problem still occurs after disabling it:

Frame 803 (150 bytes on wire, 150 bytes captured)
Ethernet II, Src: 00:07:e9:0e:bb:33, Dst: 00:07:e9:67:18:7b
Internet Protocol, Src Addr: 81.187.181.114 (81.187.181.114), Dst Addr: 81.187.181.115 (81.187.181.115)
User Datagram Protocol, Src Port: domain (53), Dst Port: 56918 (56918)
    Source port: domain (53)
    Destination port: 56918 (56918)
    Length: 116
    Checksum: 0x0ee2 (incorrect, should be 0xf6ac)
Domain Name System (response)
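
The bad-checksum counter that "netstat -s -p udp" reports can be pulled out
with a small filter, which makes it easier to see whether the count keeps
growing. This is a minimal sketch: the function name bad_udp_cksum is
hypothetical, and it assumes the counter appears on a line of the form
"<count> with bad checksum", as BSD netstat usually prints it. The sample
input is made up for illustration and is not output from the affected hosts.

```shell
# Print the UDP "with bad checksum" counter from "netstat -s -p udp"
# output read on stdin; prints 0 if no such line is present.
# Assumption: the line format is "<count> with bad checksum".
bad_udp_cksum() {
    awk '/with bad checksum/ { print $1; found = 1; exit }
         END { if (!found) print 0 }'
}

# Made-up sample input in the assumed format.
printf 'udp:\n\t120 datagrams received\n\t7 with bad checksum\n' | bad_udp_cksum
```

On a live client the filter would be fed with
"netstat -s -p udp | bad_udp_cksum", run before and after a batch of queries.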

Wolfgang S. Rupprecht suggested making the following changes ...

net.inet.ip.do_loopback_cksum=1
net.inet.tcp.do_loopback_cksum=1
net.inet.udp.do_loopback_cksum=1
net.inet6.tcp6.do_loopback_cksum=1
net.inet6.udp6.do_loopback_cksum=1

... with "sysctl", which fixed the problem. So it appears that the kernel
sometimes skips the UDP checksum calculation even though the packet is
sent out on the wire.
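
For reference, the five settings listed above could be applied in one go
roughly as follows. This is only a sketch: the function name
apply_loopback_cksum is hypothetical, and the command used to set each value
is passed in as arguments so that the list can be checked with "echo" before
anything is changed.

```shell
# Run the given command once per loopback checksum variable named in
# this report, appending "<name>=1" as its last argument.
apply_loopback_cksum() {
    for knob in \
        net.inet.ip.do_loopback_cksum \
        net.inet.tcp.do_loopback_cksum \
        net.inet.udp.do_loopback_cksum \
        net.inet6.tcp6.do_loopback_cksum \
        net.inet6.udp6.do_loopback_cksum
    do
        "$@" "$knob=1" || return 1
    done
}

# Dry run: print the assignments instead of applying them.
apply_loopback_cksum echo
```

As root on the affected server the call would be
"apply_loopback_cksum sysctl -w". The dry-run output is in the name=value
form, so it could also be appended to /etc/sysctl.conf to keep the workaround
across reboots.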

Some more interesting data points:
1.) The server has a single ethernet interface which has an IPv4 address,
    an IPv4 alias and an IPv6 alias assigned to it.

2.) The corruption only showed up in IPv4 packets (I've not tested IPv6)
    that use the primary IPv4 address (and not the IPv4 alias) as the
    source address.

>How-To-Repeat:
Run "host" in a loop like this:

while host www.netbsd.org
do
 sleep 1
done

It'll usually fail in less than a minute.
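
The loop above can be wrapped so that it stops after a fixed number of
attempts and reports which attempt failed first, which helps when comparing
runs with and without the sysctl workaround. A minimal sketch; the function
name repeat_until_failure and the INTERVAL variable are hypothetical.

```shell
# Run a command up to $1 times, pausing INTERVAL seconds (default 1)
# between attempts. Stops at the first failure and reports the attempt.
repeat_until_failure() {
    max=$1
    shift
    n=0
    while [ "$n" -lt "$max" ]; do
        n=$((n + 1))
        if ! "$@" >/dev/null 2>&1; then
            echo "failed on attempt $n"
            return 1
        fi
        sleep "${INTERVAL:-1}"
    done
    echo "no failure in $max attempts"
    return 0
}

# Example matching the loop in this report (not run here):
# repeat_until_failure 60 host www.netbsd.org
```

With the default interval, 60 attempts cover about the one minute in which
the failure usually shows up.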

>Fix:
None provided.