Subject: Re: nfs locking panic
To: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
List: current-users
Date: 11/17/1999 09:09:03
> I am trying to netboot an SE/30 from a Sun IPX. Frequently, the SE/30 only
> gets to "Building databases..." and then the server panics:
> 
> login: panic: nfsd: locking botch in op 3
> Stopped in nfsd at      Debugger+0x4:   jmpl            [%o7 + 0x8], %g0
> db> t
> nfssvc_nfsd(0x0, 0x2, 0xf0387c78, 0xf0190528, 0xf019c040, 0xf9a0edc0) at nfssvc_nfsd+0x6a4
> sys_nfssvc(0x0, 0xf9a0ef28, 0xf9a0ef20, 0xf00b1474, 0xeffffa50, 0xf9a0efb0) at sys_nfssvc+0x5b8
> syscall(0x9b, 0xf9a0efb0, 0x0, 0x1, 0x0, 0xf9a0efb0) at syscall+0x1fc
> _syscall(0x4, 0x21b10, 0x18, 0x10c60, 0x217b0, 0x10108) at _syscall+0x120
> db>
> 
> Userland is from an 8/99 binary snapshot, kernel is "NetBSD 1.4K (SPARKLE)
> #6: Fri Oct  1 23:29:06 CEST 1999".

This is a panic from a diagnostic check I added recently.

As the comment above the panic says:
			/*
			 * NFS server procs should neither release
			 * locks already held, nor leave things
			 * locked.  Catch this sooner, rather than
			 * later (when we try to relock something we
			 * already have locked).  Careful inspection
			 * of the failing routine usually turns up the
			 * lock leak.. once we know what it is..
			 */

and, a little later:

				/*
				 * If you see this panic, audit
				 * nfsrv3_procs[nd->nd_procnum] for vnode
				 * locking errors (usually, it's due to
				 * forgetting to vput() something).
				 */

Op #3 is NFSPROC_LOOKUP.
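
To illustrate the kind of bug that comment is talking about, here is a
toy userland sketch. It is not the actual NFS server code: struct
toy_vnode, toy_vget(), toy_vput() and both toy_lookup_*() functions are
made up for illustration, and the lock is reduced to a counter. The
buggy version returns early on an error path without releasing the
vnode it locked:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a vnode; it only tracks a lock count. */
struct toy_vnode {
	int v_lockcount;
};

/* Toy "get and lock" operation. */
static void
toy_vget(struct toy_vnode *vp)
{
	vp->v_lockcount++;
}

/* Toy "unlock and release"; releasing an unheld lock is a bug. */
static void
toy_vput(struct toy_vnode *vp)
{
	assert(vp->v_lockcount > 0);
	vp->v_lockcount--;
}

/* Buggy: the error path returns with the directory vnode still locked. */
static int
toy_lookup_buggy(struct toy_vnode *dvp, int name_ok)
{
	toy_vget(dvp);
	if (!name_ok)
		return ENOENT;		/* forgot toy_vput(dvp) */
	toy_vput(dvp);
	return 0;
}

/* Fixed: every exit path releases what it locked. */
static int
toy_lookup_fixed(struct toy_vnode *dvp, int name_ok)
{
	int error = 0;

	toy_vget(dvp);
	if (!name_ok)
		error = ENOENT;
	toy_vput(dvp);
	return error;
}
```

After a failed toy_lookup_buggy() the vnode's count stays at 1, while
toy_lookup_fixed() leaves it at 0. In the real code the leak is usually
just as easy to see once you know which op to read.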

I added this check to -current after we spent a fair amount of time
tearing our hair out looking for problems of this form (one NFS op or
another leaving vnodes locked).

Knowing the values of p->p_locks and lockcount at this point will be
useful (just to verify that something was left locked, as opposed to
something extra being unlocked). Also, in a DEBUG kernel, try calling
printlockedvnodes() at this point.
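
For anyone who wants the idea without reading the kernel source, here
is a rough userland sketch of a before/after lock-count check. It
assumes that lockcount is the value of p->p_locks saved before the op
runs; struct toy_proc, the toy_op_*() handlers and toy_dispatch() are
hypothetical, and where the real check panics this one just reports
the difference:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct proc; only the lock counter matters. */
struct toy_proc {
	int p_locks;
};

typedef void (*toy_op_t)(struct toy_proc *);

/* Three toy ops: balanced, leaking a lock, releasing one too many. */
static void toy_op_ok(struct toy_proc *p)      { p->p_locks++; p->p_locks--; }
static void toy_op_leak(struct toy_proc *p)    { p->p_locks++; }
static void toy_op_overrel(struct toy_proc *p) { p->p_locks--; }

/*
 * Run one op and compare the lock count before and after.
 * Returns 0 if balanced, > 0 if something was left locked,
 * < 0 if something extra was unlocked.
 */
static int
toy_dispatch(struct toy_proc *p, toy_op_t op, int procnum)
{
	int lockcount, delta;

	assert(op != NULL);
	lockcount = p->p_locks;		/* saved before the op runs */
	op(p);
	delta = p->p_locks - lockcount;
	if (delta != 0)
		fprintf(stderr, "locking botch in op %d (was %d, now %d)\n",
		    procnum, lockcount, p->p_locks);
	return delta;
}
```

The sign of the difference is what separates the two cases: positive
means the op left something locked, negative means it unlocked
something it did not lock.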

					- Bill