Re: port-sparc64/46260: gem0 driver fails to recover after RX overflow

To: port-sparc64-maintainer%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,he%NetBSD.org@localhost
Subject: Re: port-sparc64/46260: gem0 driver fails to recover after RX overflow
From: Havard Eidnes <he%NetBSD.org@localhost>
Date: Thu, 12 Apr 2012 07:50:03 +0000 (UTC)

The following reply was made to PR port-sparc64/46260; it has been noted by 
GNATS.

From: Havard Eidnes <he%NetBSD.org@localhost>
To: gnats-bugs%NetBSD.org@localhost, jdc%coris.org.uk@localhost
Cc: port-sparc64-maintainer%netbsd.org@localhost
Subject: Re: port-sparc64/46260: gem0 driver fails to recover after RX
 overflow
Date: Thu, 12 Apr 2012 09:47:19 +0200 (CEST)

 >  > +                 aprint_error_dev(sc->sc_dev,
 >  > +                     "receive error: RX no buffer space\n");
 >
 >  I wonder if we should print out these extra diagnostics (we don't do it
 >  in other drivers).  I think that we would be better off by updating extra
 >  counters, and using them as per the suggestions in "Using event counters
 >  with network interfaces, is there a reason they're all ifdefed out of
 >  mainline use?" thread, starting at:
 >
 >    http://mail-index.NetBSD.org/tech-kern/2011/12/10/msg012122.html

 That's true, I added the printf()s only as a debugging aid to more
 easily see if there was anything else going on which might trigger
 the problem.  I agree that doing printf()s on input errors is not
 appropriate for production code.

 >  I think that the reason that you're seeing the extra resets is the difference
 >  between checking GEM_MAC_RX_OVERFLOW in gem_intr():
 >
 >  >           if (rxstat & GEM_MAC_RX_OVERFLOW) {
 >  >                   ifp->if_ierrors++;
 >  > +                 aprint_error_dev(sc->sc_dev,
 >  > +                     "receive error: RX overflow\n");
 >  >                   gem_reset_rxdma(sc);
 >
 >  but checking GEM_MAC_STATE_OVERFLOW in gem_rx_watchdog():
 >
 >  > + if ((state & GEM_MAC_STATE_OVERFLOW) == GEM_MAC_STATE_OVERFLOW &&

 Well, actually, no...  If GEM_MAC_RX_OVERFLOW is flagged, it appears
 that in my case a gem_reset_rxdma() is *not* sufficient to kick the
 receiver back to life.  Also, the GEM_MAC_STATE_OVERFLOW test in
 gem_rx_watchdog() never kicks in for me, which is why I added the
 extra code in the else clause, doing gem_init() there as well, so
 that the watchdog function does a full unconditional reset.  Every
 time this has happened here, it has been the code in the else clause
 in gem_rx_watchdog() which has kicked in.
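
 To make the control flow described above easier to follow, here is a
 minimal stand-alone sketch of it.  It is not the driver code: the
 sk_* names, the bit values and the counters are hypothetical
 stand-ins, the rest of the condition in the watchdog test is left
 out, and arming the watchdog from the interrupt path is an
 assumption on my part.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; the real
 * GEM_MAC_RX_OVERFLOW and GEM_MAC_STATE_OVERFLOW live in the
 * driver's register header. */
#define SK_RX_OVERFLOW    0x01u
#define SK_STATE_OVERFLOW 0x02u

/* Hypothetical bookkeeping standing in for the driver softc. */
struct sk_softc {
	unsigned ierrors;        /* models ifp->if_ierrors */
	unsigned rxdma_resets;   /* models calls to gem_reset_rxdma() */
	unsigned full_inits;     /* models calls to gem_init() */
	unsigned else_hits;      /* watchdog runs that took the else clause */
	bool     watchdog_armed; /* models a pending watchdog callout */
};

/* Interrupt path: count the error, try the light RX DMA reset and
 * (assumed here) arm the watchdog to check on the receiver later. */
static void
sk_intr(struct sk_softc *sc, uint32_t rxstat)
{
	if (rxstat & SK_RX_OVERFLOW) {
		sc->ierrors++;
		sc->rxdma_resets++;
		sc->watchdog_armed = true;
	}
}

/* Watchdog path: with the extra else clause, both branches end in a
 * full reinitialisation, so the reset is unconditional. */
static void
sk_rx_watchdog(struct sk_softc *sc, uint32_t state)
{
	if (!sc->watchdog_armed)
		return;
	sc->watchdog_armed = false;

	if ((state & SK_STATE_OVERFLOW) == SK_STATE_OVERFLOW) {
		sc->full_inits++;
	} else {
		sc->else_hits++;
		sc->full_inits++;
	}
}
```

 Calling sk_intr() with the overflow bit set and then sk_rx_watchdog()
 with a state value that lacks the state-overflow bits mirrors what I
 see here: the full reset comes from the else clause.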

 >  GEM_MAC_RX_OVERFLOW in gem_intr().  Maybe we can check GEM_MAC_STATE_OVERFLOW
 >  if GEM_MAC_RX_OVERFLOW is set and fire the callout only then.  Alternatively,
 >  we might only need to check GEM_MAC_STATE_OVERFLOW (and not bother with
 >  GEM_MAC_RX_OVERFLOW at all).

 The change I added is adapted from the OpenBSD driver, from this
 diff:

 http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/ic/gem.c.diff?r1=1.88;r2=1.89;f=h

 Do we have any documentation anywhere which describes the bit fields
 in the GEM_MAC_MAC_STATE register?  In my case I always read back
 0x10400.
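
 Even without the field layout, the value can at least be broken down
 into its set bits.  The helper below is a hypothetical sketch (the
 name set_bits is not from the driver); it only lists bit positions
 and says nothing about what the fields mean.

```c
#include <stdint.h>

/* Store the positions of the set bits of a 32-bit register value in
 * positions[], lowest bit first, and return how many were found. */
static int
set_bits(uint32_t value, int positions[32])
{
	int n = 0;

	for (int bit = 0; bit < 32; bit++) {
		if (value & (UINT32_C(1) << bit))
			positions[n++] = bit;
	}
	return n;
}
```

 For 0x10400 this yields two positions, bits 10 and 16, since
 0x10400 = 0x10000 + 0x400.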

 However, what worries me is the ease with which this problem can now
 be triggered.  It doesn't take particularly heavy network traffic to
 make it happen.  And, furthermore, this appears to be a regression
 compared to the release I was running earlier, 4.0.1.

 Regards,

 - Håvard
