Subject: Re: Strange lockmgr panic
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Martin Husemann <martin@duskware.de>
List: current-users
Date: 07/23/2004 19:40:23
On Fri, Jul 23, 2004 at 07:33:33PM +0200, Manuel Bouyer wrote:
> > lockmgr(18388a8, 0, 0, 3, a, 1860400) at netbsd:lockmgr+0x1ec
> > uvmfault_lookup(e0017970, 0, cc0db60, 6, fffffffffffffffc, 0) at netbsd:uvmfault_lookup+0x1b4
> > uvm_fault(18388a0, 0, 1, 1, 1809510, 1875080) at netbsd:uvm_fault+0x70
> > data_access_fault(e0017b70, 30, 108f004, 0, 50, 800809) at netbsd:data_access_fault+0x418
> > ?(11d7400, 0, 331be458, 6, fffffffffffffffb, 1) at 0x100871c
> 
> I wonder what could be at 0x100871c

0x100871c <Ldatafault_internal+196>:    call  0x11b997c <data_access_fault>

It's inside the assembler part of the trap handler.

> __wdccommand_done() can do bus_space_read_1() operations. I guess this
> can trigger a fault if the controller or drive fails, i don't know if
> this could generate a data_access_fault on i386.

Sorry, forgot to mention: this is on sparc64.

So your theory is that the failing hardware causes an unexpected data
access fault and this makes the kernel die? The puzzling thing is that
I can reproducibly boot netbsd.old in this situation and the kernel
never fails.

Martin