Subject: Re: kern/29670: lockmgr: locking against myself
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org>
From: Ken Raeburn <raeburn@raeburn.org>
List: netbsd-bugs
Date: 06/01/2005 15:35:02
The following reply was made to PR kern/29670; it has been noted by GNATS.
From: Ken Raeburn <raeburn@raeburn.org>
To: gnats-bugs@netbsd.org
Cc: christos@netbsd.org, kern-bug-people@netbsd.org
Subject: Re: kern/29670: lockmgr: locking against myself
Date: Wed, 1 Jun 2005 11:33:36 -0400
[Second attempt to send this morning's update, I think my first got
eaten by my mail client.]
Today's update: The troublesome cron job has run again without
incident. I also rebooted the machine this morning, and while the
locked-vnode problems had often prevented the null mounts from getting
unmounted, hanging the shutdown sequence, there was no such problem
this time.
So my main difficulty now seems to be finding an email address which
actually gets this info recorded in the PR. :-)
Ken
On May 31, 2005, at 11:49, I wrote to gnats-bugs@gnats.netbsd.org:
> Looks like this didn't get into the PR.
>
> Update: The patch has survived another couple nights' cron jobs. The
> message I inserted keeps getting displayed, so it's hitting that new
> code path quite a bit. No processes are getting stuck in disk wait
> (presumably vnode lock wait), which I would previously see after the
> problem was triggered. So, it looks good.
>
> At least, unless this code path isn't *supposed* to be triggered that
> often...
>
> Begin forwarded message:
>
>> From: Ken Raeburn <raeburn@raeburn.org>
>> Date: May 29, 2005 06:43:27 EDT
>> To: christos@netbsd.org
>> Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org
>> Subject: Re: kern/29670
>>
>> On Apr 24, 2005, at 00:41, christos@netbsd.org wrote:
>>> Synopsis: "release of unlocked lock" panic with null fs
>>>
>>> State-Changed-From-To: open->feedback
>>> State-Changed-By: christos@netbsd.org
>>> State-Changed-When: Sun, 24 Apr 2005 00:41:34 -0400
>>> State-Changed-Why:
>>> Can you please try this patch?
>>> It will look in the hold queue if nothing in the free queue was
>>> successful.
>>
>> (BTW, the mail I received included no patch, but I pulled it out of
>> the problem report.)
>>
>> I finally got this installed Friday night, tweaked to log a message
>> when it switches over from the free list to the hold list. The daily
>> cron job (standard, plus daily.local includes
>> download-vulnerability-list and a propagation of my KDC database)
>> seems to trigger it:
>>
>> May 29 03:17:08 raeburn /netbsd: getcleanvnode: switching from free
>> list to hold list
>> May 29 03:17:39 raeburn last message repeated 5895 times
>>
>> ... but at least Friday and Saturday nights' test builds of Kerberos
>> (the cron job that frequently triggered the problem) didn't cause a
>> crash or hang, even if they do continue to trigger this code path:
>
> [the cron job starts with an "rm -rf" of the previous night's source
> and build trees in background while it downloads and unpacks a new
> source tarball, all in a null fs, so there's a flurry of file system
> activity right at the start]
>
>> May 29 05:21:04 raeburn /netbsd: getcleanvnode: switching from free
>> list to hold list
>> May 29 05:21:35 raeburn last message repeated 2872 times
>>
>> Though even if it does fix my problem (it'll be a few more days
>> before I can really be confident of that, but it looks good so far),
>> there's still another bug unless it's now impossible to fail to
>> allocate a vnode: a failure to allocate, at least when dealing with a
>> null file system, can leave a node locked.
>>
>> Ken