Re: netbsd-5 NFS(?) lock up
On Sun, Mar 29, 2009 at 09:49:58PM +0200, Manuel Bouyer wrote:
> trying to upgrade an x86 NFS server from netbsd-3 to netbsd-5 has been
> a fiasco. The kernel locks up within seconds after going multiuser, even
> with SMP disabled in the BIOS (the kernel indeed sees only one CPU).
> LOCKDEBUG doesn't help, the kernel is just dead, all I can do is enter
> ddb on console.
> Here's what I've been able to collect so far (this is with hyperthreading
> enabled in BIOS so the kernel sees 2 CPUs). Hardware is an Intel x86 with a
> 3GHz Xeon CPU (one of the first EM64T Xeons I think), 1G RAM. Disk drives are
> 2 wd(4) behind a piixide and 6 sd(4) behind two esiop(4), raid-1 raidframe on
> all disks. raid-1 parity reconstruction is running when the lockup occurs,
> and I suspect some NFS activity too (maybe several hundred requests/s).
> There is also samba running, but this one should be almost idle.
One detail that may be relevant is that esiop(4) may call biodone2() for
a request other than the one being queued by the current thread, through
this call chain:
sdstrategy() -> sdstart() -> scsipi_execute_xs() -> scsipi_run_queue() ->
scsipi_adapter_request() -> esiop_scsipi_request() -> esiop_checkdone() ->
esiop_scsicmd_end() -> scsipi_done() -> scsipi_complete() -> sddone() ->
biodone() -> biodone2()
As we're not in hardware interrupt context, biodone() will call biodone2()
directly. esiop_checkdone() checks for commands that were already queued and
have completed, but whose hardware interrupt has not been handled yet (this
is an optimisation), so we'll end up calling biodone() with a buffer that was
not the one given to sdstrategy(), in the same thread context. Most drivers
don't do this (though some advanced hardware RAID drivers may do it too).
Maybe some upper-level I/O subsystems are not prepared to deal with this.
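
To make the pattern concrete, here is a small user-space model of it. This is
only a sketch, not kernel code: every name in it (toy_buf, toy_strategy,
toy_checkdone, toy_iodone, toy_demo) is made up for illustration, and it
assumes nothing about the real struct buf or the esiop(4) internals beyond
what is described above. It only shows how a strategy call for one buffer can
run the completion callback of a different buffer in the caller's context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer; NOT the NetBSD struct buf. */
struct toy_buf {
	int id;
	int hw_done;                      /* device finished, interrupt not handled yet */
	void (*iodone)(struct toy_buf *); /* completion callback */
	struct toy_buf *next;
};

static struct toy_buf *pending;          /* requests handed to the toy adapter */
static const struct toy_buf *submitting; /* buffer currently inside toy_strategy() */
static int foreign_completions;          /* completions of a buffer other than
                                            the one being submitted */

/* Completion callback: counts completions that happen while some OTHER
 * buffer is in the middle of being submitted by the same caller. */
static void
toy_iodone(struct toy_buf *bp)
{
	if (submitting != NULL && submitting != bp)
		foreign_completions++;
}

/* Plays the role described for esiop_checkdone(): reap requests the
 * device has already completed, without waiting for the interrupt. */
static void
toy_checkdone(void)
{
	struct toy_buf **pp = &pending;

	while (*pp != NULL) {
		struct toy_buf *bp = *pp;

		if (bp->hw_done) {
			*pp = bp->next;   /* unlink before calling back */
			bp->next = NULL;
			bp->iodone(bp);
		} else {
			pp = &bp->next;
		}
	}
}

/* Plays the role of the strategy path: reap first, then queue bp. */
static void
toy_strategy(struct toy_buf *bp)
{
	submitting = bp;
	toy_checkdone();
	bp->next = pending;
	pending = bp;
	submitting = NULL;
}

/* Returns how many times a foreign buffer was completed during a submit. */
int
toy_demo(void)
{
	struct toy_buf a = { 1, 0, toy_iodone, NULL };
	struct toy_buf b = { 2, 0, toy_iodone, NULL };

	pending = NULL;
	submitting = NULL;
	foreign_completions = 0;

	toy_strategy(&a);
	a.hw_done = 1;       /* device completes a; no interrupt delivered yet */
	toy_strategy(&b);    /* a's callback runs here, in b's submitter context */

	assert(pending == &b);
	return foreign_completions;
}
```

In this model, if the caller of toy_strategy(&b) held a resource that a's
completion callback also needs, the callback would run on top of the caller
and could deadlock or recurse. Whether anything like that actually happens in
the netbsd-5 lockup is not established here.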
Manuel Bouyer, LIP6, Universite Paris VI.
NetBSD: 26 years of experience will always make the difference