NetBSD-Bugs archive


kern/40018: Using hw TCP/IPv4 checksums on bge(4) causes connection failures



>Number:         40018
>Category:       kern
>Synopsis:       Using hw TCP/IPv4 checksums on bge(4) causes connection failures
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Nov 24 20:30:00 +0000 2008
>Originator:     Matthias Scheler
>Release:        NetBSD 5.0_BETA 2008-11-22 sources
>Organization:
Matthias Scheler                                  http://zhadum.org.uk/
>Environment:
System: NetBSD colwyn.zhadum.org.uk 5.0_BETA NetBSD 5.0_BETA (COLWYN.64) #0: Sat Nov 22 13:29:35 GMT 2008 tron%lyssa.zhadum.org.uk@localhost:/src/sys/compile/COLWYN.64 amd64
Architecture: x86_64
Machine: amd64
>Description:
Since yesterday I have been using an HP ProLiant ML110 G4 as my mail server,
with its on-board bge(4) network interface:

bge0 at pci3 dev 0 function 0: Broadcom BCM5721 Gigabit Ethernet
bge0: interrupting at ioapic0 pin 17
bge0: ASIC unknown BCM575x family (0x4201), Ethernet address 00:1c:c4:xx:xx:xx
bge0: setting short Tx thresholds
brgphy0 at bge0 phy 1: BCM5750 1000BASE-T media interface, rev. 0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto

The machine was set up to use all the hardware offload features provided
by the interface:

bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        capabilities=3f80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx>
        enabled=3f80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx>
        address: 00:1c:c4:xx:xx:xx
        media: Ethernet autoselect (1000baseT full-duplex,flowcontrol,rxpause,txpause)
        status: active
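
For readers who want to see which bit in the 0x3f80 mask corresponds to which feature, here is a small Python sketch that decodes it. The bit values are an assumption: they follow the order in which ifconfig prints the names and what I recall of the IFCAP_* definitions in <net/if.h>, so check them against the header on the target system. The names CAPABILITY_BITS and decode_capabilities are made up for this illustration.

```python
# Assumed bit assignments (inferred, not taken from this report);
# verify against <net/if.h> before relying on them.
CAPABILITY_BITS = [
    (0x0080, "TSO4"),
    (0x0100, "IP4CSUM_Rx"),
    (0x0200, "IP4CSUM_Tx"),
    (0x0400, "TCP4CSUM_Rx"),
    (0x0800, "TCP4CSUM_Tx"),
    (0x1000, "UDP4CSUM_Rx"),
    (0x2000, "UDP4CSUM_Tx"),
]


def decode_capabilities(mask):
    """Return the names of all capability bits set in mask, lowest bit first."""
    return [name for bit, name in CAPABILITY_BITS if mask & bit]


# The value reported for both "capabilities" and "enabled" above.
print(",".join(decode_capabilities(0x3F80)))
```

Under these assumptions the seven bits of 0x3f80 map one-to-one onto the seven names in the ifconfig output.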

One of my users started experiencing problems sending e-mail with
Outlook Express on a Windows XP Pro system behind an ADSL router.
The ADSL router uses PPPoE and is therefore restricted to an MTU of
1492 bytes.
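
As a side note on the numbers involved: PPPoE adds 8 bytes of encapsulation to each Ethernet frame, which is where the 1492-byte MTU comes from, and the "mss 1452" announced in the client's SYN is that MTU minus the minimal IPv4 and TCP headers. A minimal sketch of the arithmetic in Python (the constant and function names are made up for illustration):

```python
ETHERNET_MTU = 1500   # standard Ethernet payload size in bytes
PPPOE_OVERHEAD = 8    # 6-byte PPPoE header plus 2-byte PPP protocol ID
IPV4_HEADER = 20      # IPv4 header without options
TCP_HEADER = 20       # TCP header without options


def pppoe_mtu(link_mtu=ETHERNET_MTU):
    """MTU left for IP packets once PPPoE encapsulation is subtracted."""
    return link_mtu - PPPOE_OVERHEAD


def tcp_mss(mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


print(pppoe_mtu())            # 1492, the router's MTU
print(tcp_mss(pppoe_mtu()))   # 1452, the MSS in the client's SYN
print(tcp_mss(ETHERNET_MTU))  # 1460, the MSS in the server's SYN-ACK
```

So a full-size segment from this client is a 1492-byte IP packet carrying 1452 bytes of TCP payload.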

Here is the "tcpdump" output for a broken SMTP session:

19:18:18.886636 IP (tos 0x0, ttl 114, id 1959, offset 0, flags [DF], proto TCP (6), length 48) 192.0.2.1.15415 > 81.187.181.119.587: S, cksum 0xf20b (correct), 4076085614:4076085614(0) win 16384 <mss 1452,nop,nop,sackOK>
19:18:18.886672 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 48, bad cksum 0 (->3c92)!) 81.187.181.119.587 > 192.0.2.1.15415: S, cksum 0xe852 (correct), 217365675:217365675(0) ack 4076085615 win 32768 <mss 1460,sackOK,nop,nop>
19:18:19.116434 IP (tos 0x0, ttl 241, id 65259, offset 0, flags [none], proto TCP (6), length 40) 192.0.2.1.15415 > 81.187.181.119.587: ., cksum 0x8d17 (correct), 1:1(0) ack 1 win 2048
19:18:19.117507 IP (tos 0x0, ttl 64, id 52571, offset 0, flags [DF], proto TCP (6), length 93, bad cksum 0 (->6f09)!) 81.187.181.119.587 > 192.0.2.1.15415: P, cksum 0xfe85 (incorrect (-> 0x858e), 1:54(53) ack 1 win 33580
19:18:19.121950 IP (tos 0x0, ttl 114, id 1960, offset 0, flags [DF], proto TCP (6), length 40) 192.0.2.1.15415 > 81.187.181.119.587: ., cksum 0x5107 (correct), 1:1(0) ack 1 win 17424
19:18:19.353061 IP (tos 0x0, ttl 114, id 1961, offset 0, flags [DF], proto TCP (6), length 61) 192.0.2.1.15415 > 81.187.181.119.587: P, cksum 0xa573 (correct), 1:22(21) ack 54 win 17371
19:18:19.353125 IP (tos 0x0, ttl 64, id 52583, offset 0, flags [DF], proto TCP (6), length 215, bad cksum 0 (->6e83)!) 81.187.181.119.587 > 192.0.2.1.15415: P, cksum 0xfeff (incorrect (-> 0x39ff), 54:229(175) ack 22 win 33580
19:18:19.600246 IP (tos 0x0, ttl 114, id 1964, offset 0, flags [DF], proto TCP (6), length 69) 192.0.2.1.15415 > 81.187.181.119.587: P, cksum 0xe26f (correct), 22:51(29) ack 229 win 17196
19:18:19.600305 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 55, bad cksum 0 (->3c8b)!) 81.187.181.119.587 > 192.0.2.1.15415: P, cksum 0xfe5f (incorrect (-> 0xf1af), 229:244(15) ack 51 win 33580
19:18:19.832051 IP (tos 0x0, ttl 114, id 1965, offset 0, flags [DF], proto TCP (6), length 77) 192.0.2.1.15415 > 81.187.181.119.587: P, cksum 0xf4e8 (correct), 51:88(37) ack 244 win 17181
19:18:19.832105 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 58, bad cksum 0 (->3c88)!) 81.187.181.119.587 > 192.0.2.1.15415: P, cksum 0xfe62 (incorrect (-> 0x31f5), 244:262(18) ack 88 win 33580
19:18:20.065012 IP (tos 0x0, ttl 114, id 1968, offset 0, flags [DF], proto TCP (6), length 46) 192.0.2.1.15415 > 81.187.181.119.587: P, cksum 0xab15 (correct), 88:94(6) ack 262 win 17163
19:18:20.065199 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 66, bad cksum 0 (->3c80)!) 81.187.181.119.587 > 192.0.2.1.15415: P, cksum 0xfe6a (incorrect (-> 0xa8f5), 262:288(26) ack 94 win 65535
19:18:20.326382 IP (tos 0x0, ttl 114, id 1969, offset 0, flags [DF], proto TCP (6), length 1492) 192.0.2.1.15415 > 81.187.181.119.587: ., cksum 0x3045 (correct), 94:1546(1452) ack 288 win 17137
19:18:20.326387 IP (tos 0x0, ttl 114, id 1970, offset 0, flags [DF], proto TCP (6), length 151) 192.0.2.1.15415 > 81.187.181.119.587: P, cksum 0x86a6 (correct), 1546:1657(111) ack 288 win 17137
19:18:20.326405 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52, bad cksum 0 (->3c8e)!) 81.187.181.119.587 > 192.0.2.1.15415: ., cksum 0xfe5c (incorrect (-> 0x283b), 288:288(0) ack 94 win 65535 <nop,nop,sack 1 {1546:1657}>
19:18:20.562681 IP (tos 0x0, ttl 114, id 1973, offset 0, flags [DF], proto TCP (6), length 45) 192.0.2.1.15415 > 81.187.181.119.587: P, cksum 0x056b (correct), 1657:1662(5) ack 288 win 17137
19:18:20.562704 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52, bad cksum 0 (->3c8e)!) 81.187.181.119.587 > 192.0.2.1.15415: ., cksum 0xfe5c (incorrect (-> 0x2836), 288:288(0) ack 94 win 65535 <nop,nop,sack 1 {1546:1662}>
19:18:20.821478 IP (tos 0x0, ttl 114, id 1974, offset 0, flags [DF], proto TCP (6), length 1492) 192.0.2.1.15415 > 81.187.181.119.587: ., cksum 0x3045 (correct), 94:1546(1452) ack 288 win 17137
19:18:22.822304 IP (tos 0x0, ttl 114, id 1981, offset 0, flags [DF], proto TCP (6), length 1492) 192.0.2.1.15415 > 81.187.181.119.587: ., cksum 0x3045 (correct), 94:1546(1452) ack 288 win 17137
19:18:26.837216 IP (tos 0x0, ttl 114, id 1985, offset 0, flags [DF], proto TCP (6), length 1492) 192.0.2.1.15415 > 81.187.181.119.587: ., cksum 0x3045 (correct), 94:1546(1452) ack 288 win 17137
19:18:34.859651 IP (tos 0x0, ttl 114, id 1991, offset 0, flags [DF], proto TCP (6), length 1492) 192.0.2.1.15415 > 81.187.181.119.587: ., cksum 0x3045 (correct), 94:1546(1452) ack 288 win 17137
19:18:50.810659 IP (tos 0x0, ttl 114, id 1994, offset 0, flags [DF], proto TCP (6), length 1492) 192.0.2.1.15415 > 81.187.181.119.587: ., cksum 0x3045 (correct), 94:1546(1452) ack 288 win 17137
19:19:20.279884 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52, bad cksum 0 (->3c8e)!) 81.187.181.119.587 > 192.0.2.1.15415: R, cksum 0xfe5c (incorrect (-> 0x2832), 288:288(0) ack 94 win 65535 <nop,nop,sack 1 {1546:1662}>

[I can provide a ".pcap" file on request.]

The SMTP daemon terminated the TCP connection after 60 seconds because
it thought that the connection had gone idle.
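
The timestamps in the trace show how the session stalls. The client sends the full-size segment 94:1546 (IP length 1492) six times. tcpdump on the server sees it arrive with a correct checksum each time, yet the server only ever acknowledges up to 94 while SACKing the smaller segments that follow. The following Python sketch, using only timestamps copied from the trace above (the helper name seconds_between is made up), computes the gaps between those transmissions and the delay before the server's reset:

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"


def seconds_between(start, end):
    """Elapsed seconds between two tcpdump timestamps on the same day."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Original transmission and retransmissions of segment 94:1546.
SEGMENT_94_1546 = [
    "19:18:20.326382",
    "19:18:20.821478",
    "19:18:22.822304",
    "19:18:26.837216",
    "19:18:34.859651",
    "19:18:50.810659",
]

gaps = [seconds_between(a, b)
        for a, b in zip(SEGMENT_94_1546, SEGMENT_94_1546[1:])]
print([round(g, 2) for g in gaps])

# Time from the server's last data packet (262:288) to its RST.
rst_delay = seconds_between("19:18:20.065199", "19:19:20.279884")
print(round(rst_delay, 2))
```

The gaps come out at roughly 0.5, 2, 4, 8 and 16 seconds, which is the usual doubling retransmission backoff, and the reset follows about 60 seconds after the server last sent data, consistent with the SMTP daemon's idle timeout.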

Surprised by the TCP problem, I tried changing the interface settings.
Disabling hardware-assisted TCP/IPv4 checksums with
"ifconfig bge0 -tcp4csum" finally fixed the problem, while
hardware-assisted IPv4 and UDP checksums remained enabled.

This looks like either a problem with TCP checksum offload in the bge(4)
driver or a hardware bug; I am not sure which. If it is a hardware
problem, bge(4) shouldn't offer that feature on this chip.
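
For reference, the value that checksum offload is supposed to produce (and verify) is the standard Internet checksum of RFC 1071, computed for TCP over a pseudo-header (source address, destination address, protocol, TCP length) followed by the TCP segment. The Python sketch below is a plain software implementation that could be used to check captured segments independently of the NIC. It is not taken from the bge(4) driver, and the function names are made up for illustration.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP segment.

    To compute the value to transmit, pass the segment with its checksum
    field set to zero. To verify a received segment, pass it unchanged;
    the result is 0 when the checksum is valid.
    """
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,
        socket.IPPROTO_TCP,
        len(segment),
    )
    return internet_checksum(pseudo_header + segment)


# Worked example from RFC 1071 itself.
print(hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))  # 0x220d
```

Feeding the raw TCP segments from a ".pcap" file through tcp_checksum with the real source and destination addresses would show whether a given packet is actually valid on the wire, regardless of what the receiving hardware reports.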

>How-To-Repeat:

>Fix:
Not known.


