Subject: Re: 3.99.7 system crashed while shutting down
To: None <tech-kern@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: current-users
Date: 08/20/2005 17:01:17
--IS0zKkzwUGydFO0o
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

I looked into this, the problem is a race between VOP_REVOKE() and VOP_IOCTL().
there's another thread in the dump doing:

#0  0xca62b948 in ?? ()
#1  0xc02a151b in bpendtsleep ()
#2  0xc02cd007 in biowait ()
#3  0xc022eaf3 in ffs_update ()
#4  0xc02dd39c in VOP_UPDATE ()
#5  0xc025b7c3 in ufs_reclaim ()
#6  0xc0243b39 in ffs_reclaim ()
#7  0xc02dd08c in VOP_RECLAIM ()
#8  0xc02d2557 in vclean ()
#9  0xc02d2972 in vgonel ()
#10 0xc02de417 in genfs_revoke ()
#11 0xc02dcd80 in VOP_REVOKE ()
#12 0xc028d338 in exit1 ()
#13 0xc029c7d2 in postsig ()
#14 0xc0339ba0 in syscall_plain ()


vclean() frees the vp->v_specinfo data (and clears the pointer) before
calling VOP_RECLAIM(), which can sleep.  VOP_IOCTL() doesn't take any lock
before trying to dereference the (NULL, in this case) pointer.

this problem was introduced in this revision of kern/vfs_subr.c:

----------------------------
revision 1.231
date: 2004/08/13 22:48:06;  author: mycroft;  state: Exp;  lines: +59 -54
There is an annoying deadlock that goes like this:
* Process A is closing one file descriptor belonging to a device.  In doing so,
  ffs_update() is called and starts writing a block synchronously.  (Note: This
  leaves the vnode locked.  It also has other instances -- stdin, et al -- of
  the same device open, so v_usecount is definitely non-zero.)
* Process B does a revoke() on the device.  The revoke() has to wait for the
  vnode to be unlocked because ffs_update() is still in progress.
* Process C tries to open() the device.  It wedges in checkalias() repeatedly
  calling vget() because it returns EBUSY immediately.

To fix, this:
* checkalias() now uses LK_SLEEPFAIL rather than LK_NOWAIT.  Therefore it will
  wait for the vnode to become unlocked, but it will recheck that it is on the
  hash list, in case it was in the process of being revoke()d or was revoke()d
  again before we were woken up.
* Since we're relying on the vnode lock to tell us that the vnode hasn't been
  removed from the hash list *anyway*, I have moved the code to remove it into
  the DOCLOSE section of vclean(), inside the vnode lock.

In the example at hand, process A was sh(1), process B was a child of init(8),
and process C was syslogd(8).
----------------------------


prior to this revision, we would only free the specinfo after the vnode's
ops pointer had been changed (to dead_vnodeop_p, which wouldn't try to access
the specinfo).

I've attached a patch that should fix this particular problem, but I haven't
thought much about what other similar problems might exist.  can anyone
think of a better way to fix this?  if not, I'll go ahead with this way.

-Chuck


On Mon, Jul 11, 2005 at 06:07:21PM +0200, Jukka Salmi wrote:
> Hi,
> 
> a i386 system running -current (3.99.7) crashed after I issued a
> 'shutdown -h now'. Unfortunately I didn't have enough time to write down
> kernel output, but IIRC it crashed in ksh. After a 'tr /l' and a reboot
> the message buffer contained the following:
> 
> uvm_fault(0xcaa48d20, 0, 0, 1) -> 0xe
> spec_ioctl(cb82bce4,cb82bd0c,0,c0933000,c062d7a0) at netbsd:spec_ioctl+0xe
> VOP_IOCTL(cb4e0a84,802c7415,cb82bea4,3,ffffffff) at netbsd:VOP_IOCTL+0x40
> cttyioctl(100,802c7415,cb82bea4,3,cb7fc1a4) at netbsd:cttyioctl+0x52
> spec_ioctl(cb82bd84,1e3ae000,cb82bdfc,c044cb32,c062d7a0) at netbsd:spec_ioctl+0x46
> VOP_IOCTL(cb4ef89c,802c7415,cb82bea4,3,cb7f6100) at netbsd:VOP_IOCTL+0x40
> vn_ioctl(cb7f5c0c,802c7415,cb82bea4,cb7fc1a4,0) at netbsd:vn_ioctl+0x9a
> sys_ioctl(cb840088,cb82bf64,cb82bf5c,0,c1338948) at netbsd:sys_ioctl+0x122
> syscall_plain() at netbsd:syscall_plain+0x7e
> --- syscall (number 54) ---
> 0xbdbc7647:
> syncing disks... 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> 
> Just in case this is useful to someone... However, I couldn't reproduce
> the crash so far.
> 
> 
> Regards, Jukka
> 
> -- 
> bashian roulette:
> $ ((RANDOM%6)) || rm -rf ~

--IS0zKkzwUGydFO0o
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="diff.spec_ioctl"

Index: src/sys/miscfs/specfs/spec_vnops.c
===================================================================
RCS file: /home/chs/netbsd/cvs/src/sys/miscfs/specfs/spec_vnops.c,v
retrieving revision 1.80
diff -u -p -r1.80 spec_vnops.c
--- src/sys/miscfs/specfs/spec_vnops.c	26 Feb 2005 22:59:00 -0000	1.80
+++ src/sys/miscfs/specfs/spec_vnops.c	20 Aug 2005 23:57:13 -0000
@@ -460,9 +460,20 @@ spec_ioctl(v)
 	} */ *ap = v;
 	const struct bdevsw *bdev;
 	const struct cdevsw *cdev;
-	dev_t dev = ap->a_vp->v_rdev;
+	struct vnode *vp;
+	dev_t dev;
 
-	switch (ap->a_vp->v_type) {
+	/*
+	 * Extract the dev_t from the vnode.  vgonel() may have already
+	 * freed the specinfo, so check the pointer before traversing it.
+	 */
+
+	vp = ap->a_vp;
+	simple_lock(&vp->v_interlock);
+	dev = (vp->v_flag & VXLOCK) == 0 && vp->v_specinfo ? vp->v_rdev : NODEV;
+	simple_unlock(&vp->v_interlock);
+
+	switch (vp->v_type) {
 
 	case VCHR:
 		cdev = cdevsw_lookup(dev);

--IS0zKkzwUGydFO0o--