Subject: Re: nfsd: locking botch in op %d
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Frank van der Linden <fvdl@wasabisystems.com>
List: tech-kern
Date: 03/08/2001 14:43:07

On Thu, Mar 08, 2001 at 06:40:16AM -0500, der Mouse wrote:
> The NFS server on my house LAN's NFS subnet fell over with "nfsd:
> locking botch in op 3".  Investigating, I find this comes from
> nfs_syscalls.c, where there's a recommendation to audit the relevant
> entry in nfsrv3_procs[], which in this case is nfsrv_lookup.

[...]

Yes, this has been seen before. The earlier report involved a
netbsd-1-5 branch kernel as the server and a Linux client running
'du -a'. Curiously enough, it also crashed while doing a lookup for a
device node ("sd0a" in your case), so there may be a problem there.

Data collected from the other report showed that the locking problem
was not inside the NFS server code itself (nfs_namei() in nfs_subs.c).
The lock mismatch was already present when the lookup() call made
from there returned. So there must be a deeper problem somewhere,
possibly related to looking up a device node.

Unfortunately, tracking this down basically means either reading
through a lot of code, or replacing every vnode lock call with a
debug wrapper that records the current file and line, and also
maintains a linked list of locked vnodes for each process.
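
To make the second option concrete, here is a rough userland sketch of
the idea, not kernel code. All names in it (dbg_vnode, dbg_proc,
lock_rec, DBG_VN_LOCK, DBG_VN_UNLOCK, dbg_report) are made up for
illustration; they are assumptions, not the kernel's vnode or proc
structures or its locking calls. The wrappers record the file and line
of each lock call and keep a per-process linked list of held locks, so
a mismatch can be traced back to where the lock was taken.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vnode; only what the sketch needs. */
struct dbg_vnode {
	const char *name;
	int locked;
};

/* One record per lock currently held, with the call site that took it. */
struct lock_rec {
	struct dbg_vnode *vp;
	const char *file;
	int line;
	struct lock_rec *next;
};

/* Hypothetical stand-in for a process: just the list of held locks. */
struct dbg_proc {
	struct lock_rec *held;
};

/* Lock vp and push a record of the call site onto p's list. */
static int
dbg_lock(struct dbg_proc *p, struct dbg_vnode *vp, const char *file, int line)
{
	struct lock_rec *r;

	assert(p != NULL && vp != NULL);
	r = malloc(sizeof(*r));
	if (r == NULL)
		return -1;
	vp->locked = 1;
	r->vp = vp;
	r->file = file;
	r->line = line;
	r->next = p->held;
	p->held = r;
	return 0;
}

/* Unlock vp; complain if this process has no record of locking it. */
static int
dbg_unlock(struct dbg_proc *p, struct dbg_vnode *vp, const char *file, int line)
{
	struct lock_rec **rp, *r;

	assert(p != NULL && vp != NULL);
	for (rp = &p->held; *rp != NULL; rp = &(*rp)->next) {
		if ((*rp)->vp == vp) {
			r = *rp;
			*rp = r->next;
			free(r);
			vp->locked = 0;
			return 0;
		}
	}
	fprintf(stderr, "%s:%d: unlock of %s, not held by this process\n",
	    file, line, vp->name);
	return -1;
}

/* Print every lock still held and where it was taken; return the count. */
static int
dbg_report(const struct dbg_proc *p)
{
	const struct lock_rec *r;
	int n = 0;

	for (r = p->held; r != NULL; r = r->next) {
		fprintf(stderr, "still locked: %s (taken at %s:%d)\n",
		    r->vp->name, r->file, r->line);
		n++;
	}
	return n;
}

/* Every lock call site would be changed to go through these. */
#define DBG_VN_LOCK(p, vp)	dbg_lock((p), (vp), __FILE__, __LINE__)
#define DBG_VN_UNLOCK(p, vp)	dbg_unlock((p), (vp), __FILE__, __LINE__)
```

With something along these lines, calling dbg_report() at the point
where the "locking botch" check fires would list the locks still held
and the file and line that took each one.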

If you could look into this one as well, that'd be great. Are
you using softdeps on the server, btw?

- Frank