Subject: kern/7116: fault in nqsrv_send_eviction on shutdown
To: None <gnats-bugs@gnats.netbsd.org>
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
List: netbsd-bugs
Date: 03/09/1999 15:50:43
>Number:         7116
>Category:       kern
>Synopsis:       fault in nqsrv_send_eviction on shutdown
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Mar  9 08:05:01 1999
>Last-Modified:
>Originator:     Bill Sommerfeld
>Organization:
	none
>Release:        19990307
>Environment:
	
System: NetBSD orchard.arlington.ma.us 1.3K NetBSD 1.3K (ORCHARDII) #22: Sun Mar 7 02:56:43 EST 1999 sommerfeld@orchard.arlington.ma.us:/usr/src/sys/arch/i386/compile/ORCHARDII i386


>Description:

I have several clients mounting this server with the nqnfs extensions
(option `-q' in fstab).

Recently, on a reboot, the server faulted into DDB with a
null-pointer dereference at nqsrv_send_eviction+0x1e3
(the faulting instruction was "movswl (%edi),%edi"; %edi contained 0).

	gdb says this is line 469 of nfs/nfs_nqlease.c:
			sotype = so->so_type;

I'm willing to bet that there's a race involved in nfsd shutdown and
lease cleanup.

(The process that faulted was a shell.  I suspect it was trying to
write out a .history file on shutdown.)

>How-To-Repeat:
	have client mount server with -q option.
	access some files from both server and client.
	reboot server, get unlucky.

>Fix:
	Unknown.

	Not knowing anything about the nqnfs cache coherence
	protocol, I'm not sure whether the right thing here is to gracefully
	handle the null socket pointer (and not send the eviction), or whether
	the leases have to be cleaned up as part of a clean shutdown
	of nfsd before the socket can be freed.
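
	For illustration only, the first option (skip the eviction when
	the socket pointer is already null) might look roughly like the
	sketch below.  This is not the kernel code: the struct and
	function names (mock_socket, mock_slp, mock_send_eviction) and
	the ns_so field are hypothetical stand-ins, and only so_type is
	taken from the faulting line quoted above.  Whether skipping the
	eviction is safe for the coherence protocol is still the open
	question.

```c
#include <stddef.h>

/* Hypothetical stand-in for the socket; only the field read by the
 * faulting line (so->so_type) is modelled. */
struct mock_socket {
	short so_type;
};

/* Hypothetical stand-in for the per-client record holding the socket.
 * The pointer is assumed to become NULL once nfsd has torn it down. */
struct mock_slp {
	struct mock_socket *ns_so;
};

/*
 * Returns 1 if an eviction notice could be sent, 0 if it was skipped
 * because there is no socket left to send it on.
 */
int
mock_send_eviction(const struct mock_slp *slp)
{
	const struct mock_socket *so;
	int sotype;

	/* Guard: check the pointer before the dereference that faulted. */
	if (slp == NULL || (so = slp->ns_so) == NULL)
		return 0;

	sotype = so->so_type;	/* the read that faulted when so was NULL */
	(void)sotype;		/* a real implementation would build and send here */
	return 1;
}
```

	With a guard like this, a lease whose socket has already gone
	away is silently dropped instead of faulting; the alternative is
	to expire all leases before nfsd releases its sockets.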
>Audit-Trail:
>Unformatted: