NetBSD-Bugs archive


Re: kern/53096: netbsd-8 crash on heavy disk I/O



The following reply was made to PR kern/53096; it has been noted by GNATS.

From: Roy Bixler <rcbixler%nyx.net@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/53096: netbsd-8 crash on heavy disk I/O
Date: Mon, 19 Mar 2018 06:08:21 -0600

 On Sun, Mar 18, 2018 at 04:50:01PM +0000, J. Hannken-Illjes wrote:
 > The following reply was made to PR kern/53096; it has been noted by GNATS.
 > 
 > From: "J. Hannken-Illjes" <hannken%eis.cs.tu-bs.de@localhost>
 > To: gnats-bugs%NetBSD.org@localhost
 > Cc: 
 > Subject: Re: kern/53096: netbsd-8 crash on heavy disk I/O
 > Date: Sun, 18 Mar 2018 17:45:41 +0100
 > 
 >  The backtrace is a bit misleading; the real call chain is:
 >  
 >  sys_chdir() -> vrele() -> vrelel() -> vstate_assert_change() -> vnpanic()
 >  
 >  This matches the panic from dmesg:
 >  
 >  ...
 >  cpu 0: ucode 0x1a->0x29
 >  cpu 1: ucode 0x1a->0x29
 >  cpu 2: ucode 0x1a->0x29
 >  cpu 3: ucode 0x1a->0x29
 >  vnode 0xfffffe82137bde70 flags 0x30<MPSAFE,LOCKSWORK>
 >    tag VT_UFS(1) type VDIR(2) mount 0xfffffe823dbb2008 typedata 0x0
 >    usecount 1 writecount 0 holdcount 1
 >    size 200 writesize 200 numoutput 0
 >    data 0xfffffe8213cce900 lock 0xfffffe82137bdfa0
 >    state BLOCKED key(0xfffffe823dbb2008 8) b1 c8 3a 00 00 00 00 00
 >    lrulisthd 0xffffffff814c6400
 >    tag VT_UFS, ino 3852465, on dev 0, 0 flags 0x0, nlink 3
 >    mode 040755, owner 1001, group 0, size 512
 >  panic: BLOCKED to LOADED with usecount 2 at vrelel:783
 >  
 >  Here vrelel() is:
 >  
 >  767  VSTATE_CHANGE(vp, VS_LOADED, VS_BLOCKED);
 >  768  mutex_exit(vp->v_interlock);
 >  ...
 >  778  recycle = false;
 >  779  VOP_INACTIVE(vp, &recycle);
 >  780  if (!recycle)
 >  781          VOP_UNLOCK(vp);
 >  782  mutex_enter(vp->v_interlock);
 >  783  VSTATE_CHANGE(vp, VS_BLOCKED, VS_LOADED);
 >  
 >  and VSTATE_CHANGE() expands to vstate_assert_change(), which is:
 >  
 >  315  KASSERTMSG(mutex_owned(vp->v_interlock), "at %s:%d", func, line);
 >  
 >  328  if ((from == VS_BLOCKED || to == VS_BLOCKED) && vp->v_usecount != 1)
 >  329          vnpanic(vp, "%s to %s with usecount %d at %s:%d",
 >  
 >  So the usecount of a blocked vnode with its interlock held changed
 >  from 1: it is "2" on the call to vnpanic() and "1" when vnpanic()
 >  prints the vnode.
 >  
 >  As vcache_vget() and vcache_tryvget() either error out or wait if the current
 >  state is BLOCKED, it could be a vref() without a prior reference.
 >  
 >  Please try the attached patch to see if one of these assertions fires.
 >  
 >  diff -r 13173af16202 -r 0a76936d2ed0 sys/kern/vfs_vnode.c
 >  --- sys/kern/vfs_vnode.c
 >  +++ sys/kern/vfs_vnode.c
 >  @@ -670,11 +670,22 @@ static inline bool
 >   vtryrele(vnode_t *vp)
 >   {
 >   	u_int use, next;
 >  +	vnode_impl_t *vip = VNODE_TO_VIMPL(vp);
 >   
 >   	for (use = vp->v_usecount;; use = next) {
 >   		if (use == 1) {
 >   			return false;
 >   		}
 >  +
 >  +		membar_enter();
 >  +		if (vip->vi_state == VS_BLOCKED) {
 >  +			mutex_enter(vp->v_interlock);
 >  +			if (vip->vi_state == VS_BLOCKED) {
 >  +				vnpanic(vp, "vtryrele on BLOCKED vnode");
 >  +			}
 >  +			mutex_exit(vp->v_interlock);
 >  +		}
 >  +
 >   		KASSERT(use > 1);
 >   		next = atomic_cas_uint(&vp->v_usecount, use, use - 1);
 >   		if (__predict_true(next == use)) {
 >  @@ -865,6 +876,16 @@ vrele_async(vnode_t *vp)
 >   void
 >   vref(vnode_t *vp)
 >   {
 >  +	vnode_impl_t *vip = VNODE_TO_VIMPL(vp);
 >  +
 >  +	membar_enter();
 >  +	if (vip->vi_state == VS_BLOCKED) {
 >  +		mutex_enter(vp->v_interlock);
 >  +		if (vip->vi_state == VS_BLOCKED) {
 >  +			vnpanic(vp, "vref on BLOCKED vnode");
 >  +		}
 >  +		mutex_exit(vp->v_interlock);
 >  +	}
 >   
 >   	KASSERT(vp->v_usecount != 0);
 >   
 
 Should I apply the patch to current netbsd-8 or the version on which I
 could reproduce the crashes?  I ask because I've updated a couple of
 times since my report and I haven't seen the crashes since the
 updates.
 
 -- 
 Roy Bixler <rcbixler%nyx.net@localhost>
 "The fundamental principle of science, the definition almost, is this: the
 sole test of the validity of any idea is experiment."
 -- Richard P. Feynman
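
The failure sequence in the quoted analysis can be modeled in user space.
The following is a minimal sketch, not kernel code: the model_* names and
the struct are hypothetical, and only the usecount check quoted from
vstate_assert_change() (lines 328-329) is reproduced. It assumes the extra
reference comes from a stray vref() while the vnode is BLOCKED, which is
the hypothesis the patch is meant to test, not a confirmed cause.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the two vnode states involved. */
enum model_vstate { MODEL_VS_LOADED, MODEL_VS_BLOCKED };

struct model_vnode {
	unsigned usecount;
	enum model_vstate state;
};

/*
 * Models the quoted check: a transition to or from BLOCKED requires
 * usecount == 1. Returns false where the kernel would call vnpanic().
 */
static bool
model_state_change(struct model_vnode *vp, enum model_vstate from,
    enum model_vstate to)
{
	if ((from == MODEL_VS_BLOCKED || to == MODEL_VS_BLOCKED) &&
	    vp->usecount != 1)
		return false;
	vp->state = to;
	return true;
}

/* Models a vref() that bumps the count without looking at the state. */
static void
model_vref(struct model_vnode *vp)
{
	vp->usecount++;
}

/*
 * Replays the vrelel() sequence around lines 767-783. With stray_ref
 * set, an extra reference is taken while the vnode is BLOCKED.
 * Returns true if the final BLOCKED -> LOADED change is refused.
 */
static bool
model_replay(bool stray_ref)
{
	struct model_vnode vn = { .usecount = 1, .state = MODEL_VS_LOADED };

	/* Line 767: LOADED -> BLOCKED, allowed because usecount is 1. */
	if (!model_state_change(&vn, MODEL_VS_LOADED, MODEL_VS_BLOCKED))
		return false;

	/* The interlock is dropped here while VOP_INACTIVE() runs. */
	if (stray_ref)
		model_vref(&vn);

	/* Line 783: BLOCKED -> LOADED, refused if usecount is not 1. */
	return !model_state_change(&vn, MODEL_VS_BLOCKED, MODEL_VS_LOADED);
}
```

In this model, model_replay(true) reports a refused transition, which
corresponds to the "BLOCKED to LOADED with usecount 2" panic, while
model_replay(false) completes both transitions.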
 


