tech-kern archive


Re: netbsd-5 NFS(?) lock up



On Mon, Mar 30, 2009 at 04:14:47PM +0200, Manuel Bouyer wrote:

> On Sun, Mar 29, 2009 at 09:49:58PM +0200, Manuel Bouyer wrote:
> > Hi,
> > trying to upgrade an x86 NFS server from netbsd-3 to netbsd-5 has been
> > a fiasco. The kernel locks up within seconds after going multiuser, even
> > with SMP disabled in the BIOS (the kernel indeed sees only one CPU).
> > LOCKDEBUG doesn't help, the kernel is just dead, all I can do is enter
> > ddb on console.
> > 
> > Here's what I've been able to collect so far (this is with hyperthreading
> > enabled in BIOS so kernel sees 2 CPUs). Hardware is an Intel x86 with a 3GHz
> > Xeon CPU (one of the first EM64T Xeons I think), 1G RAM. Disk drives are
> > 2 wd(4) behind a piixide and 6 sd(4) behind two esiop(4), raid-1 raidframe on
> > all disks. raid-1 parity reconstruct is running when the lockup occurs;
> > and I suspect some NFS activity too (maybe several 100s of requests/s). 
> > There is also samba running, but this one should be almost idle.
> 
> One detail that may be relevant is that esiop(4) may call biodone2() for
> a request other than the one being queued by the current thread, through:
> sdstrategy() -> sdstart() -> scsipi_execute_xs() -> scsipi_run_queue() ->
> scsipi_adapter_request() -> esiop_scsipi_request() -> esiop_checkdone() ->
> esiop_scsicmd_end() -> scsipi_done() -> scsipi_complete() -> sddone() ->
> biodone() -> biodone2()
> 
> As we're not in hardware interrupt context, biodone() will call biodone2()
> directly. esiop_checkdone() checks for commands already queued and
> completed but not yet handled by hardware interrupt (this is an optimisation)
> so we'll end up calling biodone() with a buffer that was not the one
> given to sdstrategy(), in the same thread context. Most drivers don't do
> this (maybe some advanced hardware raid drivers can do this too).
> 
> Maybe some upper level I/O subsystems are not prepared to deal with this.

I think it's unlikely. Note that biodone2() can block. Can it cause a
problem for scsipi or the driver?
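
To make the quoted scenario concrete, here is a minimal user-space sketch in C.
It is not NetBSD kernel code: every toy_* name is hypothetical, and the model
only assumes what the quoted text describes, namely that the strategy path
reaps commands the hardware has already finished, so a buffer other than the
one being submitted is completed in the submitting thread's context.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct buf; not the NetBSD definition. */
struct toy_buf {
	int id;
	int done;            /* set by toy_biodone() */
	int done_during_id;  /* id being submitted when we completed, or -1 */
};

#define TOY_QLEN 8

/* Commands finished by the "hardware" whose interrupt is not yet handled. */
static struct toy_buf *toy_hw_finished[TOY_QLEN];
static int toy_nfinished;

/* id of the request currently inside toy_strategy(), -1 if none. */
static int toy_current_id = -1;

/* Models biodone(): records in whose submission context we completed. */
static void
toy_biodone(struct toy_buf *bp)
{
	bp->done = 1;
	bp->done_during_id = toy_current_id;
}

/* Models esiop_checkdone(): reap everything the hardware has finished. */
static void
toy_checkdone(void)
{
	for (int i = 0; i < toy_nfinished; i++)
		toy_biodone(toy_hw_finished[i]);
	toy_nfinished = 0;
}

/* Models the hardware finishing a command before the interrupt is taken. */
static void
toy_hw_finish(struct toy_buf *bp)
{
	assert(toy_nfinished < TOY_QLEN);
	toy_hw_finished[toy_nfinished++] = bp;
}

/* Models sdstrategy(): submitting bp also reaps older finished commands. */
static void
toy_strategy(struct toy_buf *bp)
{
	toy_current_id = bp->id;
	toy_checkdone();
	/* bp itself is only queued here; it is not completed. */
	toy_current_id = -1;
}

/* Returns 1 if buffer 1 was completed inside the submission of buffer 2. */
static int
toy_demo(void)
{
	struct toy_buf a = { 1, 0, -1 };
	struct toy_buf b = { 2, 0, -1 };

	toy_strategy(&a);   /* a is queued; nothing to reap yet */
	toy_hw_finish(&a);  /* hardware finishes a, interrupt still pending */
	toy_strategy(&b);   /* submitting b completes a in this context */

	printf("buf %d done=%d, completed while submitting buf %d\n",
	    a.id, a.done, a.done_during_id);
	return a.done && a.done_during_id == b.id && !b.done;
}
```

Calling toy_demo() from a main() shows the point under discussion: the caller
that handed buffer 2 to the strategy routine ends up running the completion
of buffer 1. If the completion path could block, as asked above for
biodone2(), it would block the submitting thread in the middle of the driver's
request path.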

