Subject: Re: nfsd: locking botch in op %d
To: Frank van der Linden <fvdl@wasabisystems.com>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 03/08/2001 12:06:33
>> The NFS server on my house LAN's NFS subnet fell over with "nfsd:
>> locking botch in op 3".

> Yes, this has been seen before.  The case that was reported before
> was a netbsd-1-5 branch kernel as a server, and a Linux client,
> running 'du -a'.  It also crashed when doing a lookup for a device
> node ("sd0a" in your case), curiously enough, so there may be a
> problem there.

I pulled in just the change that makes it call printlockedvnodes() when
this happens (and made it only print, not panic, despite what the
comment says).  I built that kernel overnight, and today I find...

Locked vnodes
tag 1 type VDIR, usecount 1, writecount 0, refcount 1,
	tag VT_UFS, ino 11726, on dev 7, 0 flags 0x0, effnlink 2, nlink 2
	mode 040755, owner 101, group 0, size 512 lock type vnlock: EXCL (count 1) by pid 246
nfsd: locking botch in op 3 (before 0, after 1)

This is very interesting on three counts:

(1) dev 7,0 is not NFS-exported; it's the root filesystem, which lives
    on the server's sd0a.  The lookup that fell over was for sd0a in
    the client's filesystem, a device node with the same major/minor
    numbers, which makes me wonder whether this has something to do
    with checkalias().

(2) inode 11726 on 7,0 has nothing whatever to do with NFS; it's
    /home/mouse/.prompt/, a directory that's nowhere near anything an
    NFS client could be touching (only /nfs is exported).

(3) pid 246 is the shell in one of my windows; again, nothing whatever
    to do with anything an NFS client could be going near.

How p->p_locks could be 1 for the server process when the only locked
vnode is locked by a completely unrelated process is a mystery to me.
Perhaps it's holding something other than a vnode locked?
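To make the puzzle concrete, here is a minimal user-space sketch of the
kind of check that produces the message.  It assumes, as the "(before 0,
after 1)" in the output suggests, that nfsd records the process's lock
count (p->p_locks) before an op and compares it afterwards, and it
assumes the counter is bumped for any lock the process takes, not only
vnode locks.  Under that assumption a count of 1 with no matching locked
vnode is consistent with the server holding some other kind of lock.
All struct and function names here are hypothetical stand-ins, not the
kernel's.

```c
#include <stdio.h>

/* Hypothetical stand-in for the per-process state: a count of
 * locks currently held, of any kind (not only vnode locks). */
struct proc_sketch {
    int p_locks;
};

/* Hypothetical lock/unlock hooks that maintain the counter. */
static void sketch_lock(struct proc_sketch *p)   { p->p_locks++; }
static void sketch_unlock(struct proc_sketch *p) { p->p_locks--; }

typedef void (*nfs_op_fn)(struct proc_sketch *);

/* Run one op and, like the patched kernel, print rather than panic
 * if the lock count differs afterwards.  Returns 1 on a botch. */
static int run_op_checked(struct proc_sketch *p, int op, nfs_op_fn fn)
{
    int before = p->p_locks;

    fn(p);
    if (p->p_locks != before) {
        printf("nfsd: locking botch in op %d (before %d, after %d)\n",
               op, before, p->p_locks);
        return 1;
    }
    return 0;
}

/* Example ops: one releases what it takes, one leaks a lock. */
static void balanced_op(struct proc_sketch *p) { sketch_lock(p); sketch_unlock(p); }
static void leaky_op(struct proc_sketch *p)    { sketch_lock(p); }
```

Running leaky_op as op 3 on a fresh process prints the same shape of
message as above.  Note that the check only sees a number; it has no
idea which object the leaked lock belongs to.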

> Unfortunately, tracking this down basically means either reading
> through a lot of code, or changing every vnode lock call into a debug
> statement, saving the current line and file, as well as maintaining a
> linked list of locked vnodes for each process.

If I'm right that what the server process has locked is not a vnode,
changing the vnode lock calls won't help much.
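For reference, the instrumentation Frank describes could be sketched
roughly like this, in user space.  To cover the case above it tracks any
lockable object, not just vnodes: each lock records the caller's
__FILE__ and __LINE__ on a per-process linked list, and unlock removes
the entry.  The names (struct held_lock, DBG_LOCK and so on) are
hypothetical, not existing kernel interfaces.

```c
#include <stdio.h>
#include <stdlib.h>

/* One held lock: which object, and where it was taken. */
struct held_lock {
    const void *obj;          /* any lockable object, vnode or not */
    const char *file;         /* source file of the lock call */
    int line;                 /* source line of the lock call */
    struct held_lock *next;
};

/* Hypothetical per-process record of currently held locks. */
struct dbg_proc {
    int pid;
    struct held_lock *held;
};

static int dbg_lock_record(struct dbg_proc *p, const void *obj,
                           const char *file, int line)
{
    struct held_lock *h = malloc(sizeof *h);

    if (h == NULL)
        return -1;
    h->obj = obj;
    h->file = file;
    h->line = line;
    h->next = p->held;        /* push onto the process's list */
    p->held = h;
    return 0;
}

/* Remove the entry for obj; -1 means an unlock with no matching lock. */
static int dbg_unlock_record(struct dbg_proc *p, const void *obj)
{
    struct held_lock **hp;

    for (hp = &p->held; *hp != NULL; hp = &(*hp)->next) {
        if ((*hp)->obj == obj) {
            struct held_lock *h = *hp;
            *hp = h->next;
            free(h);
            return 0;
        }
    }
    return -1;
}

/* Wrappers that capture the call site automatically. */
#define DBG_LOCK(p, obj)   dbg_lock_record((p), (obj), __FILE__, __LINE__)
#define DBG_UNLOCK(p, obj) dbg_unlock_record((p), (obj))

/* Print every lock still held and return how many there are. */
static int dbg_print_held(const struct dbg_proc *p)
{
    const struct held_lock *h;
    int n = 0;

    for (h = p->held; h != NULL; h = h->next) {
        printf("pid %d holds %p, locked at %s:%d\n",
               p->pid, (void *)h->obj, h->file, h->line);
        n++;
    }
    return n;
}
```

When the lock count check fires, calling dbg_print_held() for the
server process would name the file and line of the offending lock call,
whatever kind of object it turns out to be.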

> If you could look into this one as well, that'd be great.

I'll be looking into it further and will report anything I find.  As it
stands, it means I can't boot one of my diskless clients, which gives
me an incentive. :-)

> Are you using softdeps on the server, btw?

No.  I've never even tried to use softdeps, anywhere.

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B