port-sparc64/46260: gem0 driver fails to recover after RX overflow

To: port-sparc64-maintainer%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: port-sparc64/46260: gem0 driver fails to recover after RX overflow
From: he%NetBSD.org@localhost
Date: Mon, 26 Mar 2012 22:25:00 +0000 (UTC)

>Number:         46260
>Category:       port-sparc64
>Synopsis:       gem0 driver fails to recover after RX overflow
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-sparc64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 26 22:25:00 +0000 2012
>Originator:     Havard Eidnes
>Release:        NetBSD 6.0_BETA
>Organization:
        None
>Environment:
System: NetBSD betelgeuse.urc.uninett.no 6.0_BETA NetBSD 6.0_BETA (GENERIC) #1: 
Mon Mar 26 20:41:19 UTC 2012 
he%betelgeuse.urc.uninett.no@localhost:/usr/obj/sys/arch/sparc64/compile/GENERIC
 sparc64
Architecture: sparc64
Machine: sparc64
>Description:
        I have been upgrading a SunFire V120 from 4.0 via 5.1
        to 6.0_BETA.  The host sometimes carries significant traffic over
        gem0.  With the 4.0 code, it was rock solid.

        However, with both 5.1 and 6.0_BETA, the gem(4) Ethernet interface
        tends to lock up.  Adding some debugging printf()s reveals that
        the error which occurs right before the interface seizes up is
        an RX overflow.  The modified code is:

...
        if (status & GEM_INTR_RX_MAC) {
                int rxstat = bus_space_read_4(t, h, GEM_MAC_RX_STATUS);
                /*
                 * At least with GEM_SUN_GEM and some GEM_SUN_ERI
                 * revisions GEM_MAC_RX_OVERFLOW happen often due to a
                 * silicon bug so handle them silently. Moreover, it's
                 * likely that the receiver has hung so we reset it.
                 */
                if (rxstat & GEM_MAC_RX_OVERFLOW) {
                        ifp->if_ierrors++;
                        aprint_error_dev(sc->sc_dev,
                            "receive error: RX overflow\n");
                        gem_reset_rxdma(sc);
...

        And this printf() is triggered.

>How-To-Repeat:
        Push lots of traffic through gem0 with either 5.1 or 6.0_BETA.
        Watch it seize up.

>Fix:
        Running "ifconfig gem0 down; ifconfig gem0 up" resets the
        interface so that it works again for a while.
