Subject: Re: Panic in simple_lock_switchcheck
To: None <current-users@netbsd.org>
From: Sverre Froyen <sverre@viewmark.com>
List: current-users
Date: 05/25/2007 14:19:39
On Monday 21 May 2007, you wrote:
> Hi,
>
> It looks like a new locking issue has been introduced some time after 22
> March.   I use bogofilter to detect spam and with recent kernels I get
> panics when marking emails as spam or ham (running bogofilter -s and
> bogofilter -n). After reboot, the bogofilter database is always corrupted. 
> The database resides on an LFS file system.

I retrieved various versions of common and sys using cvs update with the -D
option. For each version, I built a GENERIC_LAPTOP + LOCKDEBUG kernel
(the computer is i386, single processor). I then copied a saved copy of the
known bad bogofilter database file to its default location, rebooted, and
ran "bogofilter -n < <mail message file>". The reboot was necessary to get
consistent results. Userland is from 22 March.

Here's what I find.

Kernels up to and including 2007-04-16 do not panic.

The 2007-04-17 kernel panics with a different message and causes major file
system corruption (I had to run fsck manually on the LFS partition). I discount
this test since there was a flurry of LFS-related commits that day.

The 2007-04-18 kernel panicked initially, then, after the manual fsck, did not
panic until this morning, when I hit a vnlock/tstile deadlock. Since the
deadlock, it again panics consistently.

Kernels from 2007-04-19 onward panic consistently.

The panic message (for all kernels except 2007-04-17) is:

switching with held simple_lock 0x... CPU 0 lfs_vnops.c: 1746
_prop_dictionary_keysym32_pool(...) at 0x...
Bad frame pointer: 0x...

Ideas, anyone? I can easily get more information, since the problem is
perfectly reproducible.

Thanks,

Sverre