Subject: Re: kern/35728: repeated kernel panics: free: duplicated free (NFS-related)
To: None <gnats-bugs@NetBSD.org>
From: Antti Kantee <pooka@cs.hut.fi>
List: netbsd-bugs
Date: 02/20/2007 19:06:14
On Tue Feb 20 2007 at 15:40:03 +0000, Arto Selonen wrote:
>  With both suggested patches by Antti Kantee, the problem went away.
>  Unfortunately, I can't confirm that it was because of the patches,
>  as an unpatched kernel now stays up, too.
>  
>  So, either the offending NFS client went away, or the patch fixed the 
>  problematic state causing the panics. I've left an unpatched kernel 
>  running, in case I can reproduce the error state, and at least 
>  statistically show that the patched kernel fixes the problem whenever it 
>  occurs.

I think the most likely explanation is that some client kept retrying
until it managed to get past the point that was causing the crash.  You
mentioned that it took several weeks before the server started to
crash, so it might be a while before you can reproduce this.  If I am
deciphering the readdir code correctly, this could happen if directory
entries were removed on the server in the middle of an NFS readdir
operation.

>  That is, until you guys have analytically found the problem in the code
>  and fixed it there. :) Again, thanks for looking into this with such short 
>  notice!

Both were potential causes of the problem, so I've committed both fixes.

>  I'll let you know, if the situation changes, if this PR is still open at 
>  that time. If you think you've fixed it properly in the tree and close 
>  this PR, I'd be happy to just update my sources and continue tracking 
>  -current and file another PR (or request this opened or whatever) if 
>  needed.

I'll close the PR now.  Just drop me an email or open a new PR if the
problem resurfaces.

-- 
Antti Kantee <pooka@iki.fi>                     Of course he runs NetBSD
http://www.iki.fi/pooka/                          http://www.NetBSD.org/
    "la qualité la plus indispensable du cuisinier est l'exactitude"
    (the most indispensable quality of a cook is exactitude)