NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/40018: Using hw TCP/IPv4 checksums on bge(4) causes connection failures



The following reply was made to PR kern/40018; it has been noted by GNATS.

From: buhrow%lothlorien.nfbcal.org@localhost (Brian Buhrow)
To: gnats-bugs%gnats.netbsd.org@localhost
Cc: buhrow%lothlorien.nfbcal.org@localhost
Subject: Re: kern/40018: Using hw TCP/IPv4 checksums on bge(4) causes 
connection failures
Date: Wed, 15 Sep 2010 23:27:07 -0700

 [If we could get this patch pulled up to 5.x sources, that would be
 wonderful!]
 
        hello.  Could you try the following patch and let me know if you get
 better results?  I'm testing it on a couple of machines here, and, so far,
 things look promising.
        Our driver initializes the Broadcom hardware to peform a tcp and udp
 checksum on only the payload of the tcp or udp packet, rather than the
 entire packet.  The FreeBSD and Linux drivers instruct the hardware to compute
 the checksum for the entire packet.  I believe the bug is that some revisions
 of the BCM hardware, under certain circumstances, revert to doing the
 complete checksum calculation, as the FreeBSD and Linux drivers request, while
 things are running. As
 a result, when we pull the computed checksum from the hardware and pass it
 up to the upper layers, we assume the checksum is the more minimal
 one, and the upper layers perform the appropriate checks, which, when this
 happens, cause the packet to be rejected because the resultant checksum is
 decidedly incorrect.
        This patch changes the driver to instruct the hardware to perform the
 checksum over the entire packet, just as the FreeBSD and Linux drivers do,
 and to notify the upper layers appropriately.  
 
        If this patch works, I'll amend the bug report to include the patch
 and this explanation.
        The first patch file below was made against 5.x sources, but applies 
cleanly
 to 4.x, 3.x  and 2.x sources as well.  
 The second patch file is against -current sources as of September 15, 2010.
 
        Here is a followup explanation as to what and why the patch does what
 it does.
 
 
        Hello.  Yes, this asymetry is intentional.  Both the Linux and FreeBSD
 drivers behave this way.  My understanding is that the stack presents a
 well formed packet to the driver for transmition to the Net, including a
 calculated checksum.  My analysis of what I was seeing did not indicate
 that we were generating packets with bad checksums, only that we were
 discarding a very high percentage of incoming packets for failing the
 checksum tests.  And, in fact, the known bug is that the BCM hardware
 doesn't do the pseudo calculations on outbound packets.  There's anotefrom
 2003 in our driver's history log which says that an assumption was made
 that if the chip couldn't do the proper calculations on outbound
 packets, then we shouldn't trust it to do the proper calculation on inbound
 packets.  A fine assumption, except that over the course of time, no one
 else has used this mode of operation, and, apparently, not tested it
 either.
        So far, the machines running this patch look good.  I'm seeing a few
 checksum failures, but given that these are name and mail servers which see
 traffic from all over the Net, I'm not surprised to see some checksum
 issues.  Before the patch, checksum counters were incrementing between 1
 and 10 times a second.  Now, checksum errors are down in the noise and
 matching my machines with different drivers and/or their hardware flags
 turned off.
 
 -Brian
 
        Here is the note I referenced above.  The last paragraph is relevant
 to this discussion.  The chips apparently do do the pseudo header
 calculation on packet reception, even if instructed not to.  I can't tell
 if the Linux driver ever had to resort to not getting pseudo data on packet
 reception, but neither the Linux or FreeBSD drivers indicate that this was
 a problem on any chip revisions.
 
 
 revision 1.46
 date: 2003/08/22 03:32:35;  author: jonathan;  state: Exp;  lines: +93 -14
 Check in hooks to fix checksum offload on bge devices. Empirical
 observation is that some 570x devices can get themselves into a state
 where they miscompute off-loaded TCP or UDP checksums on packets so
 small that Ethernet padding is required.  Further obsevation suggests
 that the bge checksum-offload hardware is adding those padding bytes
 into its TCP checksum computation. (Once a 5700 gets in this state,
 even a warm boot won't fix it: it needs a hard powerdown.)
 
 Work around the problem by padding such runts with zeros: even if the
 checksum-offload adds in extra zeros, the resulting sum will be correct.
 
 Also, dont trust the checksum-offload on received packets smaller than
 the minimum ethernet frame, in case the Rx-side has a similar bug.
 
 Finally, on packets where we do trust the outboard Rx-side TCP or UDP
 checksum, the bge did not include the pseudo-header. Set the
 M_CSUM_NO_PSEUDOHDR bit as well as M_CSUM_DATA, and rely on
 udp_input() or tcp_input() adding in the sum via in_cksum_phdr().
 ----------------------------
 On Sep 15,  8:55am, Chris Ross wrote:
 } Subject: Re: Hardware checksums on bge(4) interfaces
 } 
 }   The patch, left below, looks like it removes the =
 } BGE_MODECTL_RX_NO_PHDR_CSUM bit, but leaves the =
 } BGE_MODECTL_TX_NO_PHDR_CSUM bit in place.  By reading your description, =
 } this may solve only half of the problem.
 } 
 }   I don't know, of course, but it looked worth asking about...  Did you =
 } mean to leave the TX side without calculating the checksum of the =
 } pseudo-header?
 } 
 }            - Chris
 } 
 } 
 >-- End of excerpt from Chris Ross
 
 [...]
        As of this writing, I've been runing this patch on a number of
 production machines for 24 hours and haven't seen any problems with all
 hardware flags enabled.  I don't have any machines in production that can
 do tso4 operations in hardware, so haven't tested that aspect of things.  I
 believe, however, that they'll work fine since, again, this patch makes us
 drive the chips the same way the Linux and FreeBSD drivers do.
        Here is a list of chips I've been runing without issue.
 
 -Brian
 
 bge0 at pci1 dev 2 function 0: Broadcom BCM5704C Dual Gigabit Ethernet
 bge0: interrupting at ioapic1 pin 1
 bge0: ASIC BCM5704 A3 (0x2003), Ethernet address xx:xx:xx:xx:xx
 brgphy0 at bge0 phy 1: BCM5704 1000BASE-T media interface, rev. 0
 bge1 at pci1 dev 2 function 1: Broadcom BCM5704C Dual Gigabit Ethernet
 bge1: interrupting at ioapic1 pin 2
 bge1: ASIC BCM5704 A3 (0x2003), Ethernet address xx:xx:xx:xx:xx
 brgphy1 at bge1 phy 1: BCM5704 1000BASE-T media interface, rev. 0
 
 bge0 at pci1 dev 2 function 0: Broadcom BCM5703X Gigabit Ethernet
 bge0: interrupting at ioapic1 pin 1
 bge0: ASIC BCM5703 A2 (0x1002), Ethernet address xx:xx:xx:xx:xx
 brgphy0 at bge0 phy 1: BCM5703 1000BASE-T media interface, rev. 2
 bge1 at pci1 dev 3 function 0: Broadcom BCM5703X Gigabit Ethernet
 bge1: interrupting at ioapic1 pin 2
 bge1: ASIC BCM5703 A2 (0x1002), Ethernet address xx:xx:xx:xx:xx
 brgphy1 at bge1 phy 1: BCM5703 1000BASE-T media interface, rev. 2
 
 bge0 at pci6 dev 4 function 0: Broadcom BCM5714/5715 Gigabit Ethernet
 bge0: interrupting at ioapic0 pin 16
 bge0: ASIC unknown BCM5714 (0x9003), Ethernet address xx:xx:xx:xx:xx
 bge0: setting short Tx thresholds
 brgphy0 at bge0 phy 1: BCM5714 1000BASE-T media interface, rev. 0
        
 
 [And finally... The patches]
 
 [Patch against 5.x sources which applies also to 4.x, 3.x and 2.x sources.]
 
 Index: if_bge.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/pci/if_bge.c,v
 retrieving revision 1.152.4.2
 diff -u -r1.152.4.2 if_bge.c
 --- if_bge.c   2 Feb 2009 20:44:16 -0000       1.152.4.2
 +++ if_bge.c   15 Sep 2010 07:12:43 -0000
 @@ -1444,7 +1444,7 @@
         */
        CSR_WRITE_4(sc, BGE_MODE_CTL, BGE_DMA_SWAP_OPTIONS|
                    BGE_MODECTL_MAC_ATTN_INTR|BGE_MODECTL_HOST_SEND_BDS|
 -                  BGE_MODECTL_TX_NO_PHDR_CSUM|BGE_MODECTL_RX_NO_PHDR_CSUM);
 +                  BGE_MODECTL_TX_NO_PHDR_CSUM);
  
        /* Get cache line size. */
        cachesize = pci_conf_read(sc->sc_pc, sc->sc_pcitag, BGE_PCI_CACHESZ);
 @@ -3276,7 +3276,7 @@
                            cur_rx->bge_tcp_udp_csum;
                        m->m_pkthdr.csum_flags |=
                            (M_CSUM_TCPv4|M_CSUM_UDPv4|
 -                           M_CSUM_DATA|M_CSUM_NO_PSEUDOHDR);
 +                           M_CSUM_DATA);
                }
  
                /*
 
 [Patch against if_bge.c -current as of September 15, 2010.]
 
 --- if_bge.c.orig      2010-09-15 00:50:06.000000000 -0700
 +++ if_bge.c   2010-09-15 00:53:07.000000000 -0700
 @@ -1866,7 +1866,7 @@
         */
        CSR_WRITE_4(sc, BGE_MODE_CTL, BGE_DMA_SWAP_OPTIONS |
            BGE_MODECTL_MAC_ATTN_INTR | BGE_MODECTL_HOST_SEND_BDS |
 -          BGE_MODECTL_TX_NO_PHDR_CSUM | BGE_MODECTL_RX_NO_PHDR_CSUM);
 +          BGE_MODECTL_TX_NO_PHDR_CSUM);
  
        /*
         * BCM5701 B5 have a bug causing data corruption when using
 @@ -3444,7 +3444,7 @@
                            cur_rx->bge_tcp_udp_csum;
                        m->m_pkthdr.csum_flags |=
                            (M_CSUM_TCPv4|M_CSUM_UDPv4|
 -                           M_CSUM_DATA|M_CSUM_NO_PSEUDOHDR);
 +                           M_CSUM_DATA);
                }
  
                /*
 


Home | Main Index | Thread Index | Old Index