Subject: Re: -current amd64 panic, _kernel_unlock: assertion failed: olocks == 1
To: Nicolas Joly <njoly@pasteur.fr>
From: Andrew Doran <ad@netbsd.org>
List: current-users
Date: 02/20/2007 16:38:05

Hi,

On Tue, Feb 20, 2007 at 03:39:30PM +0100, Nicolas Joly wrote:

> For the past few days, I've been experiencing hard kernel lockups. The
> problem arises when the Symantec (formerly Veritas) NetBackup server
> tries to back up my up-to-date -current NetBSD/amd64 workstation using
> 32-bit Linux binaries (which worked perfectly for the last 5 months).
> 
> When stuck, the machine does not respond to anything; I can't even
> access the kernel debugger using the special key sequence. This is with
> a GENERIC kernel + MULTIPROCESSOR + DIAGNOSTIC + LOCKDEBUG.
> 
> This morning, I ran some experiments with all the options to isolate
> the problem, and got the most useful results after removing the
> DIAGNOSTIC option.
> 
> Kernel lock error: _kernel_unlock: assertion failed: olocks == 1
> 
> lock address : 0xffffffff80ce4ea0 type     :               spin
> shared holds :                  0 exclusive:                  1
> shares wanted:                  0 exclusive:                183
> current cpu  :                  1 last held:                  1
> current lwp  : 0xffff80004c93a900 last held: 0xffff80004c93a900
> last locked  : 0xffffffff807d6801 unlocked : 0xffffffff807d682d
> curcpu holds :                  2 wanted by: 000000000000000000
> 
> panic: LOCKDEBUG
> Stopped in pid 360.1 (bpcd) at  netbsd:breakpoint+0x5:  leave
> db{1}> mach cpu 0
> using CPU 0
> db{1}> bt
> splclock() at netbsd:splclock
> _kernel_lock() at netbsd:_kernel_lock+0x1a3
> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x23
> Xintr_ioapic_level11() at netbsd:Xintr_ioapic_level11+0xdb
> --- interrupt ---
> Xspllower() at netbsd:Xspllower+0xe
> x86_softintlock() at netbsd:x86_softintlock+0x13
> DDB lost frame for netbsd:Xsoftclock+0x1a, trying 0xffff80004b4eff20
> Xsoftclock() at netbsd:Xsoftclock+0x1a
> --- interrupt ---
> 0x246:
> db{1}> mach cpu 1
> using CPU 1
> db{1}>  bt
> breakpoint() at netbsd:breakpoint+0x5
> cpu_Debugger() at netbsd:cpu_Debugger+0x9
> panic() at netbsd:panic+0x1bd
> lockdebug_lock_print() at netbsd:lockdebug_lock_print
> lockdebug_abort() at netbsd:lockdebug_abort+0x47
> _kernel_unlock() at netbsd:_kernel_unlock+0x126
> trap() at netbsd:trap+0x9f6
> --- trap (number -2134027712) ---
> 0xffff:

I think this one should be fixed now, sorry. I also fixed an issue with
LOCKDEBUG kernels, where lots of file system activity would eventually
provoke a panic.

Cheers,
Andrew