Re: netbsd-5 NFS(?) lock up

To: tech-kern%NetBSD.org@localhost
Subject: Re: netbsd-5 NFS(?) lock up
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Date: Mon, 30 Mar 2009 16:14:47 +0200

On Sun, Mar 29, 2009 at 09:49:58PM +0200, Manuel Bouyer wrote:
> Hi,
> trying to upgrade an x86 NFS server from netbsd-3 to netbsd-5 has been
> a fiasco. The kernel locks up within seconds after going multiuser, even
> with SMP disabled in the BIOS (the kernel indeed sees only one CPU).
> LOCKDEBUG doesn't help, the kernel is just dead, all I can do is enter
> ddb on console.
> 
> Here's what I've been able to collect so far (this is with hyperthreading
> enabled in BIOS so kernel sees 2 CPUs). Hardware is an Intel x86 with a 3GHz
> Xeon CPU (one of the first EM64T xeons I think), 1G RAM. Disk drives are
> 2 wd(4) behind a piixide and 6 sd(4) behind two esiop(4), raid-1 raidframe on
> all disks. raid-1 parity reconstruct is running when the lockup occurs;
> and I suspect some NFS activity too (maybe several 100s of requests/s). 
> There is also samba running, but this one should be almost idle.

One detail that may be relevant is that esiop(4) may call biodone2() for
a request other than the one being queued by the current thread, through:
sdstrategy() -> sdstart() -> scsipi_execute_xs() -> scsipi_run_queue() ->
scsipi_adapter_request() -> esiop_scsipi_request() -> esiop_checkdone() ->
esiop_scsicmd_end() -> scsipi_done() -> scsipi_complete() -> sddone() ->
biodone() -> biodone2()

As we're not in hardware interrupt context, biodone() will call biodone2()
directly. esiop_checkdone() checks for commands that were already queued and
have completed but have not yet been handled by the hardware interrupt (this
is an optimisation), so we'll end up calling biodone() with a buffer that was
not the one given to sdstrategy(), in the same thread context. Most drivers
don't do this (maybe some advanced hardware RAID drivers do it too).

Maybe some upper level I/O subsystems are not prepared to deal with this.
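
To make the concern concrete, here is a toy user-space model of the pattern,
not kernel code. All toy_* names are hypothetical stand-ins (they are not the
NetBSD structures or functions), and the "upper layer holds a non-recursive
lock across the strategy call" part is purely an assumption for illustration;
I don't know yet which subsystem, if any, actually does this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buf; not the real NetBSD layout. */
struct toy_buf {
	int id;
	bool done;
	void (*iodone)(struct toy_buf *);	/* completion callback */
};

#define TOY_MAXPENDING 8

/* Commands finished by the "hardware" but not yet reaped by an interrupt. */
static struct toy_buf *toy_pending[TOY_MAXPENDING];
static int toy_npending;

/* Assumed non-recursive lock owned by some upper layer. */
static bool toy_upper_lock_held;
/* Number of completions that ran while that lock was held. */
static int toy_reentered;

/* Plays the role of biodone()/biodone2() outside interrupt context. */
static void
toy_biodone(struct toy_buf *bp)
{
	bp->done = true;
	if (bp->iodone != NULL)
		bp->iodone(bp);
}

/* Plays the role of esiop_checkdone(): reap already-completed commands. */
static void
toy_checkdone(void)
{
	while (toy_npending > 0)
		toy_biodone(toy_pending[--toy_npending]);
}

/* Plays the role of sdstrategy(): the new request is only queued here. */
static void
toy_strategy(struct toy_buf *bp)
{
	(void)bp;		/* bp itself is not completed in this call */
	toy_checkdone();	/* but older requests may be, synchronously */
}

/* Upper-layer completion handler that would want the upper-layer lock. */
static void
toy_upper_iodone(struct toy_buf *bp)
{
	(void)bp;
	if (toy_upper_lock_held)
		toy_reentered++;	/* a real non-recursive lock would hang here */
}

/* Upper layer submitting I/O with its lock held (the assumed scenario). */
static void
toy_upper_submit(struct toy_buf *bp)
{
	toy_upper_lock_held = true;
	toy_strategy(bp);
	toy_upper_lock_held = false;
}

/*
 * Request a was issued earlier and has finished on the hardware; request b
 * is submitted now. Returns the number of re-entrant completions, or -1 if
 * the wrong buffers were completed.
 */
int
toy_demo(void)
{
	static struct toy_buf a = { 1, false, toy_upper_iodone };
	static struct toy_buf b = { 2, false, toy_upper_iodone };

	a.done = b.done = false;
	toy_reentered = 0;
	toy_npending = 0;
	toy_pending[toy_npending++] = &a;
	toy_upper_submit(&b);
	return (a.done && !b.done) ? toy_reentered : -1;
}
```

In this model toy_demo() completes the older buffer, not the one handed to
toy_strategy(), and its callback runs while the submitter's lock is still
held. With a driver that only completes requests from its interrupt handler,
the callback would never run inside the submitter's call stack.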

-- 
Manuel Bouyer, LIP6, Universite Paris VI.           
Manuel.Bouyer%lip6.fr@localhost
     NetBSD: 26 years of experience will always make the difference