Subject: Re: More amd64 instability
To: Greg Oster <oster@cs.usask.ca>
From: Nicolas Joly <njoly@pasteur.fr>
List: current-users
Date: 11/20/2007 23:48:22
On Tue, Nov 20, 2007 at 04:19:26PM -0600, Greg Oster wrote:
> Paul Goyette writes:
> > On Tue, 20 Nov 2007, Juan RP wrote:
> > 
> > > I see that sysmon_envsys* files in your kernel are not the newest ones,
> > > could you please try updating? the locking in sme_events_check() was changed
> > > recently.
> > 
> > Well, I did a 'cvs update' in /usr/src/sys then rebuilt my kernel.  It's 
> > worse than before!
> > 
> > Right after it successfully probes the azalia0, I get
> > 
> > uvm_fault(0xffffffff805e8c80, 0x0, 2) -> e
> > kernel: page fault trap, code=0
> > Stopped in pid 0.1 (system) at netbsd:softintr_schedule+0x60:  movq %r12, 0(%rax)
> 
> here's what I just got:

Me too :-(

With an up to date DIAGNOSTIC+LOCKDEBUG kernel:

uvm_fault(0xffffffff80c027a0, 0x0, 2) -> e
kernel: page fault trap, code=0
Stopped in pid 0.1 (system) at  netbsd:softintr_schedule+0x60:  movq    %r12,0(%rax)
db{0}> Kernel lock error: _kernel_lock: spinout

lock address : 0xffffffff80c00bf0 type     :               spin
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  1
current cpu  :                  1 last held:                  0
current lwp  : 0xffff800047dc72e0 last held: 0xffffffff80b56460
last locked  : 0xffffffff804df199 unlocked : 0xffffffff804c979f
initialized  : 0xffffffff803ee649
curcpu holds :                  0 wanted by: 0xffff800047dc72e0

panic: LOCKDEBUG

db{0}> mach cpu 0
using CPU 0
db{0}> bt
softintr_schedule() at netbsd:softintr_schedule+0x60
Xresume_lapic_ltimer() at netbsd:Xresume_lapic_ltimer+0x21
--- interrupt ---
bus_space_read_stream_1() at netbsd:bus_space_read_stream_1+0xe
config_process_deferred() at netbsd:config_process_deferred+0x59
configure() at netbsd:configure+0x62
main() at netbsd:main+0x175
db{0}> mach cpu 1
using CPU 1
db{0}> bt  
breakpoint() at netbsd:breakpoint+0x1
lockdebug_abort1() at netbsd:lockdebug_abort1+0x7f
lockdebug_abort() at netbsd:lockdebug_abort+0xa5
_kernel_lock() at netbsd:_kernel_lock+0x12d
trap() at netbsd:trap+0x6f9
uvm_fault(0xffffffff80c027a0, 0x0, 1) -> e
kernel: page fault trap, code=0
Faulted in DDB; continuing...

-- 
Nicolas Joly

Biological Software and Databanks.
Institut Pasteur, Paris.