Subject: kernel panics: lock error (mutex)
To: None <current-users@netbsd.org>
From: Arto Selonen <arto@selonen.org>
List: current-users
Date: 12/13/2007 14:34:10

Hi!

Recently our NetBSD-current system (i386) has become fairly unstable: it
used to stay up for months, interrupted only by -current upgrades, and now
panics every few days. I'm still tracking down the upgrade after which the
panics appeared or became more frequent. I'm not sure the bug itself is
new; my feeling is rather that some recent change has started to trigger
an existing problem more often. Specifics pending...

Anyway, for roughly the past month the system has been crashing with
kernel panics, anywhere from one day to about a week after a reboot. Most
of the panics have been mutex-related, but I haven't written down the
details: I expected this to be a transient problem in -current that would
go away with a later upgrade, so I simply grabbed the latest sources,
upgraded and tried again. Here is the latest panic report, copied from the
screen (no serial console). It appeared while building yet another upgrade
from a couple of days ago (this one was on 3.99.39):

Mutex error: mutex_spin_retry: locking against myself

lock address: .....
current cpu :       0
current lwp : .....
owner field : ..... wait/spin  0/1

panic: lock error
Stopped in pid 272.1 (squid)
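
For anyone not familiar with the message: as far as I understand it,
"locking against myself" means that the context trying to take the mutex
already holds it. Below is a minimal user-space sketch of that kind of
check, written in Python rather than kernel C. CheckedMutex, LockError,
wait_for_event and buggy_caller are names made up for illustration, and
none of this is the actual mutex(9) or lockdebug code:

```python
import threading


class LockError(RuntimeError):
    """Raised here instead of panicking when the mutex is misused."""


class CheckedMutex:
    """Non-recursive mutex that remembers which thread holds it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None  # thread ident of the current holder, or None

    def enter(self):
        me = threading.get_ident()
        # Only the holder itself can see its own ident in _owner,
        # so this check needs no extra synchronization.
        if self._owner == me:
            raise LockError("locking against myself")
        self._lock.acquire()
        self._owner = me

    def exit(self):
        if self._owner != threading.get_ident():
            raise LockError("release by non-owner")
        self._owner = None
        self._lock.release()


def wait_for_event(mtx):
    """Hypothetical helper that takes the mutex on its own."""
    mtx.enter()
    try:
        return "woken"
    finally:
        mtx.exit()


def buggy_caller(mtx):
    """Holds the mutex, then calls a helper that takes it again."""
    mtx.enter()
    try:
        return wait_for_event(mtx)  # second enter() by the same thread
    finally:
        mtx.exit()
```

Calling wait_for_event() on its own works, while buggy_caller() raises
LockError("locking against myself") from the nested enter(). Whether the
panic above comes from a code path shaped like this is only a guess.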

And here is just the function name trace, in case it might be enough to 
give you ideas what could be causing this:

db> tr
breakpoint
lockdebug_abort
mutex_abort
mutex_owner
cv_timedwait_sig
pollcommon
sys_poll
syscall

db> reboot 0x104

So, I have a crash dump available. Should I file a PR and continue from
there with more data (kernel config, etc.), or is there something simple
that I might have missed, either in updating the kernel config or in
something that a normal -current upgrade (cvs update, build tools, build
kernel, build world, boot kernel, install world) would miss?

Any suggestions for disabling or enabling specific debugging or similar
kernel config options to track this down? (Most of them should already be
enabled, left over from problems in the past couple of years.)

I would like to get the system more stable before the longish holiday
season. ;-) (It's a firewall/gateway/web proxy.)

All comments welcome. :)


Artsi
-- 
#######======------  http://www.selonen.org/arto/  --------========########
Everstinkuja 5 B 35                               Don't mind doing it.
FI-02600 Espoo         arto@selonen.org         Don't mind not doing it.
Finland              tel +358 50 560 4826     Don't know anything about it.