current-users: Re: Strange lockmgr panic

Subject: Re: Strange lockmgr panic
To: Martin Husemann <martin@duskware.de>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: current-users
Date: 07/23/2004 19:49:58

On Fri, Jul 23, 2004 at 07:40:23PM +0200, Martin Husemann wrote:
> > > uvm_fault(18388a0, 0, 1, 1, 1809510, 1875080) at netbsd:uvm_fault+0x70
> > > data_access_fault(e0017b70, 30, 108f004, 0, 50, 800809) at netbsd:data_access_fault+0x418
> > > ?(11d7400, 0, 331be458, 6, fffffffffffffffb, 1) at 0x100871c
> > 
> > I wonder what could be at 0x100871c
> 
> 0x100871c <Ldatafault_internal+196>:    call  0x11b997c <data_access_fault>
> 
> It's inside the assembler part of the trap handler.

OK

> 
> > __wdccommand_done() can do bus_space_read_1() operations. I guess this
> > can trigger a fault if the controller or drive fails; I don't know if
> > this could generate a data_access_fault on i386.
> 
> Sorry, forgot to mention: this is on sparc64.
> 
> So your theory is that the failing hardware causes an unexpected data
> access fault and this makes the kernel die? The puzzling thing is that
> I can reproducibly boot netbsd.old in this situation and the kernel
> never fails.

How old is the kernel? Maybe something changed in the wdc driver that causes
the registers to be read in a different way...

But it may also be something completely different.
As this seems reproducible at boot, maybe you can try to single-step in ddb
to see the exact place where it fails?

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--