NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/40018: Using hw TCP/IPv4 checksums on bge(4) causes connection failures
The following reply was made to PR kern/40018; it has been noted by GNATS.
From: buhrow%lothlorien.nfbcal.org@localhost (Brian Buhrow)
To: gnats-bugs%gnats.netbsd.org@localhost
Cc: buhrow%lothlorien.nfbcal.org@localhost
Subject: Re: kern/40018: Using hw TCP/IPv4 checksums on bge(4) causes
connection failures
Date: Wed, 15 Sep 2010 23:27:07 -0700
[If we could get this patch pulled up to 5.x sources, that would be
wonderful!]
hello. Could you try the following patch and let me know if you get
better results? I'm testing it on a couple of machines here, and, so far,
things look promising.
Our driver initializes the Broadcom hardware to peform a tcp and udp
checksum on only the payload of the tcp or udp packet, rather than the
entire packet. The FreeBSD and Linux drivers instruct the hardware to compute
the checksum for the entire packet. I believe the bug is that some revisions
of the BCM hardware, under certain circumstances, revert to doing the
complete checksum calculation, as the FreeBSD and Linux drivers request, while
things are running. As
a result, when we pull the computed checksum from the hardware and pass it
up to the upper layers, we assume the checksum is the more minimal
one, and the upper layers perform the appropriate checks, which, when this
happens, cause the packet to be rejected because the resultant checksum is
decidedly incorrect.
This patch changes the driver to instruct the hardware to perform the
checksum over the entire packet, just as the FreeBSD and Linux drivers do,
and to notify the upper layers appropriately.
If this patch works, I'll amend the bug report to include the patch
and this explanation.
The first patch file below was made against 5.x sources, but applies
cleanly
to 4.x, 3.x and 2.x sources as well.
The second patch file is against -current sources as of September 15, 2010.
Here is a followup explanation as to what and why the patch does what
it does.
Hello. Yes, this asymetry is intentional. Both the Linux and FreeBSD
drivers behave this way. My understanding is that the stack presents a
well formed packet to the driver for transmition to the Net, including a
calculated checksum. My analysis of what I was seeing did not indicate
that we were generating packets with bad checksums, only that we were
discarding a very high percentage of incoming packets for failing the
checksum tests. And, in fact, the known bug is that the BCM hardware
doesn't do the pseudo calculations on outbound packets. There's anotefrom
2003 in our driver's history log which says that an assumption was made
that if the chip couldn't do the proper calculations on outbound
packets, then we shouldn't trust it to do the proper calculation on inbound
packets. A fine assumption, except that over the course of time, no one
else has used this mode of operation, and, apparently, not tested it
either.
So far, the machines running this patch look good. I'm seeing a few
checksum failures, but given that these are name and mail servers which see
traffic from all over the Net, I'm not surprised to see some checksum
issues. Before the patch, checksum counters were incrementing between 1
and 10 times a second. Now, checksum errors are down in the noise and
matching my machines with different drivers and/or their hardware flags
turned off.
-Brian
Here is the note I referenced above. The last paragraph is relevant
to this discussion. The chips apparently do do the pseudo header
calculation on packet reception, even if instructed not to. I can't tell
if the Linux driver ever had to resort to not getting pseudo data on packet
reception, but neither the Linux or FreeBSD drivers indicate that this was
a problem on any chip revisions.
revision 1.46
date: 2003/08/22 03:32:35; author: jonathan; state: Exp; lines: +93 -14
Check in hooks to fix checksum offload on bge devices. Empirical
observation is that some 570x devices can get themselves into a state
where they miscompute off-loaded TCP or UDP checksums on packets so
small that Ethernet padding is required. Further obsevation suggests
that the bge checksum-offload hardware is adding those padding bytes
into its TCP checksum computation. (Once a 5700 gets in this state,
even a warm boot won't fix it: it needs a hard powerdown.)
Work around the problem by padding such runts with zeros: even if the
checksum-offload adds in extra zeros, the resulting sum will be correct.
Also, dont trust the checksum-offload on received packets smaller than
the minimum ethernet frame, in case the Rx-side has a similar bug.
Finally, on packets where we do trust the outboard Rx-side TCP or UDP
checksum, the bge did not include the pseudo-header. Set the
M_CSUM_NO_PSEUDOHDR bit as well as M_CSUM_DATA, and rely on
udp_input() or tcp_input() adding in the sum via in_cksum_phdr().
----------------------------
On Sep 15, 8:55am, Chris Ross wrote:
} Subject: Re: Hardware checksums on bge(4) interfaces
}
} The patch, left below, looks like it removes the =
} BGE_MODECTL_RX_NO_PHDR_CSUM bit, but leaves the =
} BGE_MODECTL_TX_NO_PHDR_CSUM bit in place. By reading your description, =
} this may solve only half of the problem.
}
} I don't know, of course, but it looked worth asking about... Did you =
} mean to leave the TX side without calculating the checksum of the =
} pseudo-header?
}
} - Chris
}
}
>-- End of excerpt from Chris Ross
[...]
As of this writing, I've been runing this patch on a number of
production machines for 24 hours and haven't seen any problems with all
hardware flags enabled. I don't have any machines in production that can
do tso4 operations in hardware, so haven't tested that aspect of things. I
believe, however, that they'll work fine since, again, this patch makes us
drive the chips the same way the Linux and FreeBSD drivers do.
Here is a list of chips I've been runing without issue.
-Brian
bge0 at pci1 dev 2 function 0: Broadcom BCM5704C Dual Gigabit Ethernet
bge0: interrupting at ioapic1 pin 1
bge0: ASIC BCM5704 A3 (0x2003), Ethernet address xx:xx:xx:xx:xx
brgphy0 at bge0 phy 1: BCM5704 1000BASE-T media interface, rev. 0
bge1 at pci1 dev 2 function 1: Broadcom BCM5704C Dual Gigabit Ethernet
bge1: interrupting at ioapic1 pin 2
bge1: ASIC BCM5704 A3 (0x2003), Ethernet address xx:xx:xx:xx:xx
brgphy1 at bge1 phy 1: BCM5704 1000BASE-T media interface, rev. 0
bge0 at pci1 dev 2 function 0: Broadcom BCM5703X Gigabit Ethernet
bge0: interrupting at ioapic1 pin 1
bge0: ASIC BCM5703 A2 (0x1002), Ethernet address xx:xx:xx:xx:xx
brgphy0 at bge0 phy 1: BCM5703 1000BASE-T media interface, rev. 2
bge1 at pci1 dev 3 function 0: Broadcom BCM5703X Gigabit Ethernet
bge1: interrupting at ioapic1 pin 2
bge1: ASIC BCM5703 A2 (0x1002), Ethernet address xx:xx:xx:xx:xx
brgphy1 at bge1 phy 1: BCM5703 1000BASE-T media interface, rev. 2
bge0 at pci6 dev 4 function 0: Broadcom BCM5714/5715 Gigabit Ethernet
bge0: interrupting at ioapic0 pin 16
bge0: ASIC unknown BCM5714 (0x9003), Ethernet address xx:xx:xx:xx:xx
bge0: setting short Tx thresholds
brgphy0 at bge0 phy 1: BCM5714 1000BASE-T media interface, rev. 0
[And finally... The patches]
[Patch against 5.x sources which applies also to 4.x, 3.x and 2.x sources.]
Index: if_bge.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/if_bge.c,v
retrieving revision 1.152.4.2
diff -u -r1.152.4.2 if_bge.c
--- if_bge.c 2 Feb 2009 20:44:16 -0000 1.152.4.2
+++ if_bge.c 15 Sep 2010 07:12:43 -0000
@@ -1444,7 +1444,7 @@
*/
CSR_WRITE_4(sc, BGE_MODE_CTL, BGE_DMA_SWAP_OPTIONS|
BGE_MODECTL_MAC_ATTN_INTR|BGE_MODECTL_HOST_SEND_BDS|
- BGE_MODECTL_TX_NO_PHDR_CSUM|BGE_MODECTL_RX_NO_PHDR_CSUM);
+ BGE_MODECTL_TX_NO_PHDR_CSUM);
/* Get cache line size. */
cachesize = pci_conf_read(sc->sc_pc, sc->sc_pcitag, BGE_PCI_CACHESZ);
@@ -3276,7 +3276,7 @@
cur_rx->bge_tcp_udp_csum;
m->m_pkthdr.csum_flags |=
(M_CSUM_TCPv4|M_CSUM_UDPv4|
- M_CSUM_DATA|M_CSUM_NO_PSEUDOHDR);
+ M_CSUM_DATA);
}
/*
[Patch against if_bge.c -current as of September 15, 2010.]
--- if_bge.c.orig 2010-09-15 00:50:06.000000000 -0700
+++ if_bge.c 2010-09-15 00:53:07.000000000 -0700
@@ -1866,7 +1866,7 @@
*/
CSR_WRITE_4(sc, BGE_MODE_CTL, BGE_DMA_SWAP_OPTIONS |
BGE_MODECTL_MAC_ATTN_INTR | BGE_MODECTL_HOST_SEND_BDS |
- BGE_MODECTL_TX_NO_PHDR_CSUM | BGE_MODECTL_RX_NO_PHDR_CSUM);
+ BGE_MODECTL_TX_NO_PHDR_CSUM);
/*
* BCM5701 B5 have a bug causing data corruption when using
@@ -3444,7 +3444,7 @@
cur_rx->bge_tcp_udp_csum;
m->m_pkthdr.csum_flags |=
(M_CSUM_TCPv4|M_CSUM_UDPv4|
- M_CSUM_DATA|M_CSUM_NO_PSEUDOHDR);
+ M_CSUM_DATA);
}
/*
Home |
Main Index |
Thread Index |
Old Index