Port-macppc archive


lockups on 6.0.2 - progress?



I have been chasing lockups on NetBSD 6.0.1, recently tried 6.0.2, and found
that it locks up too.  My problem is that the failure is intermittent, so the
first task is to find a reliably failing test case.

I have a second machine set up that has hung three times, twice with 6.0.2 and
once with 6.0.1.  The interesting difference is this in the log:

May 29 13:00:00 charm syslogd[151]: restart
May 29 21:52:13 charm /netbsd: arp info overwritten for 71.39.101.62 by 20:76:00:10:7f:14
May 30 14:44:08 charm /netbsd: gem0: receive error: RX overflow sc->rxptr 75, complete 82
May 30 14:44:12 charm /netbsd: gem0: rx_watchdog: not in overflow state: 0x810400
May 30 14:44:12 charm /netbsd: gem0: rx_watchdog: wr pointer != saved
May 30 14:44:12 charm /netbsd: gem0: rx_watchdog: rd pointer != saved
May 30 14:44:12 charm /netbsd: gem0: resetting anyway
May 30 15:01:45 charm /netbsd: gem0: receive error: RX overflow sc->rxptr 20, complete 30
May 30 15:01:49 charm /netbsd: gem0: rx_watchdog: not in overflow state: 0x810400
May 30 15:01:49 charm /netbsd: gem0: rx_watchdog: wr pointer != saved
May 30 15:01:49 charm /netbsd: gem0: rx_watchdog: rd pointer != saved
May 30 15:01:49 charm /netbsd: gem0: resetting anyway
May 30 18:15:30 charm /netbsd: gem0: receive error: RX overflow sc->rxptr 58, complete 70
May 30 18:15:34 charm /netbsd: gem0: rx_watchdog: not in overflow state: 0x810400
May 30 18:15:34 charm /netbsd: gem0: rx_watchdog: wr pointer != saved
May 30 18:15:34 charm /netbsd: gem0: rx_watchdog: rd pointer != saved
May 30 18:15:34 charm /netbsd: gem0: resetting anyway
May 31 20:51:35 charm syslogd[151]: restart


I take this as a clue, and I am going to put in a PCI Ethernet card (an SMC)
and see if that behaves differently.
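
To make the before-and-after comparison easier, a rough tally of those gem0
events from syslog can help.  This is a minimal sketch, assuming syslog writes
to the default /var/log/messages and that the message text matches the excerpt
above:

  #!/bin/sh
  # Tally gem0 RX-overflow and forced-reset events, for comparing runs
  # before and after the card swap.  Assumes the default /var/log/messages
  # location and the message strings shown in the log excerpt above.
  LOG=${1:-/var/log/messages}
  printf "RX overflows:  "
  grep -c 'gem0: receive error: RX overflow' "$LOG"
  printf "forced resets: "
  grep -c 'gem0: resetting anyway' "$LOG"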

Note that this message - the "rx_watchdog" reset - is new in 6.0.2, so I'm
guessing that someone changed the gem driver; just a guess.
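
One way to check that guess is to diff the driver between the two releases.
A minimal sketch, assuming /usr/src is a CVS checkout of the NetBSD sources
and that the release tag names follow the usual pattern (both of which may
need adjusting):

  #!/bin/sh
  # Show any changes to the gem driver between the 6.0.1 and 6.0.2 releases.
  # Assumes /usr/src is a CVS checkout and that the tag names below are right.
  cd /usr/src/sys/dev/ic
  cvs diff -u -r netbsd-6-0-1-RELEASE -r netbsd-6-0-2-RELEASE gem.c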

I'll report back.

It takes a day to three days for the failure to occur.  I originally thought
it was a failure that happened under heavy disk load, but at least with the
last couple of failures it happens on an almost idle machine.  The only "load"
I have on it is a script that runs two wgets in a loop: one fetches a small
index file, the other fetches a 1 MB file, as fast as they can go.  That seems
to cause the problem within a couple of days.
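
Something like this reproduces that load; a minimal sketch, where the URLs
are placeholders for the real index file and 1 MB file:

  #!/bin/sh
  # Minimal sketch of the load generator: fetch a small file and a ~1 MB file
  # in a tight loop, as fast as wget can run.  The URLs are placeholders.
  SMALL=http://192.168.1.10/index.html
  BIG=http://192.168.1.10/onemeg.bin
  while true; do
          wget -q -O /dev/null "$SMALL"
          wget -q -O /dev/null "$BIG"
  done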

I have now swapped in the SMC ethernet card.  Let's see if it still fails.
If not, then I have a workaround, and we have a possible driver bug to
fix.

-dgl-


