Re: port-sparc64/46260: gem0 driver fails to recover after RX overflow
The following reply was made to PR port-sparc64/46260; it has been noted by GNATS.
From: Julian Coleman <jdc%coris.org.uk@localhost>
Subject: Re: port-sparc64/46260: gem0 driver fails to recover after RX overflow
Date: Thu, 12 Apr 2012 13:57:48 +0100
> That's true, I added the printf()s only as a debugging aid to more
> easily see if there was anything else going on which might trigger
> the problem. I agree that doing printf()s on input errors is not
> appropriate for production code.
> Well, actually, no... If GEM_MAC_RX_OVERFLOW is flagged, it appears
> that in my case a gem_reset_rxdma() is *not* sufficient to kick the
> receiver back to life. Also, the GEM_MAC_STATE_OVERFLOW test in the
> gem_rx_watchdog() function never kicks in in my case, which is why
> I added the extra code in the else clause, calling gem_init() there
> as well to do a full unconditional reset in the watchdog function.
> Each and every time this has happened in my case, it's always been
> the code in the else clause in gem_rx_watchdog() which has kicked
Right - our original code reset the whole chip when we saw GEM_MAC_RX_OVERFLOW
and we didn't check for GEM_MAC_STATE_OVERFLOW. So, I'd expect the behaviour
to be different if we now check both. And, from your previous message, it
seems that we mainly end up resetting when GEM_MAC_STATE_OVERFLOW isn't set
(with a spurious reset when the read pointer changed).
> Do we have any documentation anywhere which documents the bit fields
> in the GEM_MAC_MAC_STATE register? In my case I always read back
It seems that the GEM document is available again. See:
  ge.pdf: GEM (First Generation PCI Gigabit Ethernet) User's Manual
However, there doesn't seem to be any information on the bits in the MAC
state machine register.
> However, what worries me is the ease with which this problem can now
> be triggered. It doesn't take particularly heavy network traffic to
> make it happen. And, furthermore, this appears to be a regression
> compared to the release I was running earlier, 4.0.1.
Yes. This is worrying. See the last paragraph of 2.6.1 "RxFIFO overflow"
and also 2.3.2 "Frame Reception". An increase in overflows implies that the
RX FIFO is not emptying fast enough, which implies that we are not reading
and emptying packets from the ring buffer quickly enough when an interrupt
occurs. Are you able to check earlier kernels (e.g. 5.0) to get a rough
indication of when the increase in resets started? I'm now unsure if
this aspect is a gem(4) problem, or something else.
My other computer also runs NetBSD / Sailing at Newbiggin
http://www.netbsd.org/ / http://www.newbigginsailingclub.org/