Re: frequent 5.0_RC2 lockups cvs co / upd'ing to RF volume on SunFire v100

On Thu, Mar 19, 2009 at 04:34:45PM -0400, Rafal Boni wrote:
> On Thu, Mar 19, 2009 at 08:39:17PM +0100, Manuel Bouyer wrote:
> > > Some interesting tidbits that might be related:
> > >   * In at least 2 or 3 of the hangs, RF parity rebuild was likely
> > >     running.
> > Hum, this reminds me of what I saw on the old
> > any significant I/O with a SMP kernel while parity rebuild was running
> > would cause a hang. Using a UP kernel for parity rebuilds and switching
> > back to MP after a clean reboot would make the system work.
> > This was on i386 though, and with SCSI disks. I've never been able to
> > reproduce it on another system.
> > Anyway please make sure you have sys/dev/raidframe/rf_netbsdkintf.c 
> > it may help ...
> The box I used to build the last kernel is currently off, but it was a
> clean checkout of netbsd-5 as of ~ 200903182200Z, which should have had
> Greg's changes.  That was my first thought since the system had been
> running a netbsd-5 kernel from mid-January, hence I grabbed the latest
> sources and installed a new kernel... the problem still happened.

Ok, I've verified that that change is in my tree.

> I guess this is additional incentive to polish off those aceride(4)
> changes to keep this machine from spewing all the IDE errors in the
> first place... Will have to dig them out of my old -current tree
> and polish them off a bit ;)

And another test run showed that the machine does indeed go catatonic
as soon as / very shortly after the aceride(4) interface downgrades
to PIO 4 (i.e., the first downgrade to a non-DMA access mode).  I guess
I have even more ammo to work on those aceride(4) changes.  The good
news is this seems to be just about 100% reproducible on my v100 (good
for tracking it down, anyway; not so good for that machine's usage).

Just another couple of data points,

