Subject: Re: nfsd: locking botch in op %d
To: Frank van der Linden <firstname.lastname@example.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
Date: 03/08/2001 12:06:33
>> The NFS server on my house LAN's NFS subnet fell over with "nfsd:
>> locking botch in op 3".
> Yes, this has been seen before. The case that was reported before
> was a netbsd-1-5 branch kernel as a server, and a Linux client,
> running 'du -a'. It also crashed when doing a lookup for a device
> node ("sd0a" in your case), curiously enough, so there may be a
> problem there.
I pulled in just the change to make it call printlockedvnodes() when
this happened (and just print, not panic, despite the comment). I
built that kernel overnight, and today I find....
tag 1 type VDIR, usecount 1, writecount 0, refcount 1,
tag VT_UFS, ino 11726, on dev 7, 0 flags 0x0, effnlink 2, nlink 2
mode 040755, owner 101, group 0, size 512 lock type vnlock: EXCL (count 1) by pid 246
nfsd: locking botch in op 3 (before 0, after 1)
This is very interesting on three counts:
(1) dev 7,0 is not NFS-exported; that's the root filesystem (which is
on the server's sd0a; I note the lookup which fell over was for
sd0a in the client's filesystem, which has the same major/minor
numbers, and that makes me wonder if this may have something to do
with it).
(2) inode 11726 on 7,0 has nothing whatever to do with NFS; it's
/home/mouse/.prompt/, a directory that's nowhere near anything an
NFS client could be touching (only /nfs is exported).
(3) pid 246 is the shell in one of my windows; again, nothing whatever
to do with anything an NFS client could be going near.
How p->p_locks could be 1 for the server process when the only locked
vnode is locked by a completely unrelated process is a mystery to me.
Perhaps it's holding something other than a vnode locked?
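
For anyone following along, the "(before 0, after 1)" in that message reads like a lock count sampled on either side of the operation. A rough user-space sketch of such a check follows. It is not the kernel code: struct proc_sketch, run_op_checked() and the two sample ops are made-up names, and only the message format is taken from the output above.

```c
#include <stdio.h>

/* Hypothetical stand-in for the per-process lock counter (p->p_locks). */
struct proc_sketch {
    int p_locks;
};

/* An "operation" is any function that may take and release locks. */
typedef int (*op_fn)(struct proc_sketch *);

/* Well-behaved op: every lock it takes is released before returning. */
static int op_balanced(struct proc_sketch *p)
{
    p->p_locks++;   /* lock */
    p->p_locks--;   /* unlock */
    return 0;
}

/* Buggy op: takes a lock and returns without releasing it. */
static int op_leaky(struct proc_sketch *p)
{
    p->p_locks++;   /* lock, never unlocked */
    return 0;
}

/*
 * Run one op with the count sampled before and after.
 * Returns 1 (and prints a message) if the count changed, 0 otherwise.
 */
static int run_op_checked(struct proc_sketch *p, int opnum, op_fn fn)
{
    int before = p->p_locks;

    (void)fn(p);
    if (p->p_locks != before) {
        printf("nfsd: locking botch in op %d (before %d, after %d)\n",
               opnum, before, p->p_locks);
        return 1;
    }
    return 0;
}
```

Calling run_op_checked(&p, 3, op_leaky) on a zeroed struct proc_sketch prints the "before 0, after 1" form. Note that a check like this only says the count moved; it says nothing about which object is still held, which is the puzzle here.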
> Unfortunately, tracking this down basically means either reading
> through a lot of code, or changing every vnode lock call into a debug
> statement, saving the current line and file, as well as maintaining a
> linked list of locked vnodes for each process.
If the implication above that it's not a vnode that the server process
has locked is correct, changing vnode lock calls wouldn't help much.
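
The tracking scheme described in the quoted paragraph (record file and line at each lock call, keep a per-process list of what is held) might look roughly like the following user-space sketch. All names here (struct lock_record, struct proc_track, track_lock(), track_unlock(), TRACK_LOCK, dump_held()) are hypothetical and do not come from the kernel source. The records hold a generic pointer rather than a vnode, on the assumption that the same idea could be applied to other kinds of locks.

```c
#include <stdio.h>
#include <stdlib.h>

/* One record per lock currently held: what is locked and where it was taken. */
struct lock_record {
    void *obj;                  /* the locked object (a vnode or anything else) */
    const char *file;           /* source file of the lock call */
    int line;                   /* source line of the lock call */
    struct lock_record *next;
};

/* Hypothetical per-process state: head of the list of held locks. */
struct proc_track {
    struct lock_record *held;
};

/* Record that obj was locked at file:line. Returns 0, or -1 if out of memory. */
static int track_lock(struct proc_track *p, void *obj, const char *file, int line)
{
    struct lock_record *r = malloc(sizeof(*r));

    if (r == NULL)
        return -1;
    r->obj = obj;
    r->file = file;
    r->line = line;
    r->next = p->held;
    p->held = r;
    return 0;
}

/* Drop the record for obj. Returns 0, or -1 if obj was not recorded as held. */
static int track_unlock(struct proc_track *p, void *obj)
{
    struct lock_record **pp;

    for (pp = &p->held; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->obj == obj) {
            struct lock_record *r = *pp;

            *pp = r->next;
            free(r);
            return 0;
        }
    }
    return -1;
}

/* Wrapper that captures the call site, as a converted lock call would. */
#define TRACK_LOCK(p, obj) track_lock((p), (obj), __FILE__, __LINE__)

/* Print everything still held; returns the number of records. */
static int dump_held(const struct proc_track *p)
{
    const struct lock_record *r;
    int n = 0;

    for (r = p->held; r != NULL; r = r->next) {
        printf("held: %p locked at %s:%d\n", r->obj, r->file, r->line);
        n++;
    }
    return n;
}
```

With something like this in place, a count mismatch could be followed by dump_held() on the offending process to show where each outstanding lock was taken.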
> If you could look into this one as well, that'd be great.
I'll be looking into it further and will report anything I find. As it
stands, it means I can't boot one of my diskless clients, which gives
me an incentive. :-)
> Are you using softdeps on the server, btw?
No. I've never even tried to use softdeps, anywhere.