Re: port-sparc64/46260: gem0 driver fails to recover after RX overflow

To: port-sparc64-maintainer%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,he%NetBSD.org@localhost
Subject: Re: port-sparc64/46260: gem0 driver fails to recover after RX overflow
From: Julian Coleman <jdc%coris.org.uk@localhost>
Date: Thu, 12 Apr 2012 13:00:06 +0000 (UTC)

The following reply was made to PR port-sparc64/46260; it has been noted by 
GNATS.

From: Julian Coleman <jdc%coris.org.uk@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: port-sparc64/46260: gem0 driver fails to recover after RX overflow
Date: Thu, 12 Apr 2012 13:57:48 +0100

 Hi,

 >  That's true, I added the printf()s only as a debugging aid to more
 >  easily see if there was anything else going on which might trigger
 >  the problem.  I agree that doing printf()s on input errors is not
 >  appropriate for production code.

 OK.  Sorry.

 >  Well, actually, no...  If GEM_MAC_RX_OVERFLOW is flagged, it appears
 >  that in my case a gem_reset_rxdma() is *not* sufficient to kick the
 >  receiver back to life.  Also, the GEM_MAC_STATE_OVERFLOW test in the
 >  gem_rx_watchdog() function in my case never kicks in, so that's why
 >  I added the extra code in the else clause, doing gem_init() there as
 >  well, doing a full unconditional reset in the watchdog function.
 >  Each and every time this has happened in my case, it's always been
 >  the code in the else clause in gem_rx_watchdog() which has kicked
 >  in.

 Right - our original code reset the whole chip when we saw GEM_MAC_RX_OVERFLOW
 and we didn't check for GEM_MAC_STATE_OVERFLOW.  So, I'd expect the behaviour
 to be different if we now check both.  And, from your previous message, it
 seems that we mainly end up resetting when GEM_MAC_STATE_OVERFLOW isn't set
 (with a spurious reset when the read pointer changed).
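
 To make the difference concrete, here is a rough sketch of the two
 policies as I read them.  This is not the actual gem(4) code: the
 function and enum names are hypothetical, the two flags are passed in
 as plain booleans rather than read from registers, and the choice of a
 lighter RX DMA reset when only the RX overflow is flagged is an
 assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical action codes; they only stand in for the real driver calls. */
enum sketch_action {
	SKETCH_NONE,        /* nothing to do */
	SKETCH_RESET_RXDMA, /* lighter reset, in the spirit of gem_reset_rxdma() */
	SKETCH_FULL_INIT    /* full reinitialisation, in the spirit of gem_init() */
};

/* Original behaviour as described: any RX overflow resets the whole chip. */
enum sketch_action
sketch_policy_original(bool rx_overflow)
{
	return rx_overflow ? SKETCH_FULL_INIT : SKETCH_NONE;
}

/*
 * Assumed newer behaviour: a full reset only when the MAC state also
 * reports an overflow, otherwise just the lighter RX DMA reset.
 */
enum sketch_action
sketch_policy_check_both(bool rx_overflow, bool state_overflow)
{
	if (!rx_overflow)
		return SKETCH_NONE;
	return state_overflow ? SKETCH_FULL_INIT : SKETCH_RESET_RXDMA;
}
```

 With an RX overflow flagged but no state overflow, which is the case
 you report, the first policy picks the full reset and the second picks
 only the lighter one.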

 >  Do we have any documentation anywhere which documents the bit fields
 >  in the GEM_MAC_MAC_STATE register?  In my case I always read back
 >  0x10400.

 It seems that the GEM document is available again.  See:

   ge.pdf  GEM (First Generation PCI Gigabit Ethernet) User's Manual

 from:

   http://sosc-dr.sun.com/processors/documentation.html

 However, it doesn't seem to contain any information on the bits in the MAC
 state machine register.
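
 For what it's worth, 0x10400 is simply bits 10 and 16 set.  A small
 helper like the one below (the name set_bits is made up, and it makes
 no claim about what those bits mean) can be used to decode other
 values read back from the register:

```c
#include <stdint.h>

/*
 * Store the positions of the set bits of v in out[], lowest first,
 * and return how many were found.  out[] must have room for 32 entries.
 */
int
set_bits(uint32_t v, int out[32])
{
	int n = 0;

	for (int bit = 0; bit < 32; bit++) {
		if (v & (UINT32_C(1) << bit))
			out[n++] = bit;
	}
	return n;
}
```

 Calling set_bits(0x10400, out) returns 2 and fills in 10 and 16.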

 >  However, what worries me is the ease with which this problem can now
 >  be triggered.  It doesn't take particularly heavy network traffic to
 >  make it happen.  And, furthermore, this appears to be a regression
 >  compared to the release I was running earlier, 4.0.1.

 Yes.  This is worrying.  See the last paragraph of 2.6.1 "RxFIFO overflow"
 and also 2.3.2 "Frame Reception".  An increase in overflows implies that the
 RX FIFO is not emptying fast enough, which implies that we are not reading
 and emptying packets from the ring buffer quickly enough when an interrupt
 occurs.  Are you able to check earlier kernels (e.g. 5.0) to get a rough
 indication of when the increase in resets started?  I'm now unsure whether
 this aspect is a gem(4) problem or something else.
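
 The reasoning about the FIFO can be illustrated with a toy model.  It
 is only a sketch under invented assumptions: a fixed number of bytes
 arrives per tick, a fixed number is drained per tick, and anything
 beyond the capacity counts as one overflow for that tick.  None of the
 names or numbers come from the GEM manual or from gem(4).

```c
#include <stdint.h>

/*
 * Toy FIFO model: count the ticks on which the fill level would have
 * exceeded the capacity.  All parameters are hypothetical byte counts,
 * assumed small enough that fill + arrive cannot wrap.
 */
uint32_t
sketch_count_overflows(uint32_t capacity, uint32_t arrive,
    uint32_t drain, uint32_t ticks)
{
	uint32_t fill = 0, overflows = 0;

	for (uint32_t t = 0; t < ticks; t++) {
		fill += arrive;                        /* data arrives */
		if (fill > capacity) {                 /* excess is dropped */
			overflows++;
			fill = capacity;
		}
		fill -= (drain < fill) ? drain : fill; /* host empties the FIFO */
	}
	return overflows;
}
```

 With capacity 100, arrival 10 and drain 10 the model never overflows.
 Halving the drain rate to 5 makes it overflow on every tick once the
 FIFO has filled, even though the arrival rate is unchanged.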

 Thanks,

 J

 -- 
   My other computer also runs NetBSD    /        Sailing at Newbiggin
         http://www.netbsd.org/        /   http://www.newbigginsailingclub.org/
