NetBSD-Bugs archive


Re: kern/53096: netbsd-8 crash on heavy disk I/O



On Sun, Mar 18, 2018 at 04:50:01PM +0000, J. Hannken-Illjes wrote:
> The following reply was made to PR kern/53096; it has been noted by GNATS.
> 
> From: "J. Hannken-Illjes" <hannken%eis.cs.tu-bs.de@localhost>
> To: gnats-bugs%NetBSD.org@localhost
> Cc: 
> Subject: Re: kern/53096: netbsd-8 crash on heavy disk I/O
> Date: Sun, 18 Mar 2018 17:45:41 +0100
> 
>  The backtrace is a bit misleading, it really is:
>  
>  sys_chdir() -> vrele() -> vrelel() -> vstate_assert_change() -> vnpanic()
>  
>  This matches the panic from dmesg:
>  
>  ...
>  cpu 0: ucode 0x1a->0x29
>  cpu 1: ucode 0x1a->0x29
>  cpu 2: ucode 0x1a->0x29
>  cpu 3: ucode 0x1a->0x29
>  vnode 0xfffffe82137bde70 flags 0x30<MPSAFE,LOCKSWORK>
>    tag VT_UFS(1) type VDIR(2) mount 0xfffffe823dbb2008 typedata 0x0
>    usecount 1 writecount 0 holdcount 1
>    size 200 writesize 200 numoutput 0
>    data 0xfffffe8213cce900 lock 0xfffffe82137bdfa0
>    state BLOCKED key(0xfffffe823dbb2008 8) b1 c8 3a 00 00 00 00 00
>    lrulisthd 0xffffffff814c6400
>    tag VT_UFS, ino 3852465, on dev 0, 0 flags 0x0, nlink 3
>    mode 040755, owner 1001, group 0, size 512
>  panic: BLOCKED to LOADED with usecount 2 at vrelel:783
>  
>  Here vrelel() is:
>  
>  767  VSTATE_CHANGE(vp, VS_LOADED, VS_BLOCKED);
>  768  mutex_exit(vp->v_interlock);
>  ...
>  778  recycle = false;
>  779  VOP_INACTIVE(vp, &recycle);
>  780  if (!recycle)
>  781          VOP_UNLOCK(vp);
>  782  mutex_enter(vp->v_interlock);
>  783  VSTATE_CHANGE(vp, VS_BLOCKED, VS_LOADED);
>  
>  and VSTATE_CHANGE() expands to vstate_assert_change(), which is:
>  
>  315  KASSERTMSG(mutex_owned(vp->v_interlock), "at %s:%d", func, line);
>  
>  328  if ((from == VS_BLOCKED || to == VS_BLOCKED) && vp->v_usecount != 1)
>  329          vnpanic(vp, "%s to %s with usecount %d at %s:%d",
>  
>  So the usecount of a blocked vnode with interlock held changed from 1,
>  it is "2" on the call to vnpanic() and "1" when vnpanic prints
>  the vnode.
>  
>  As vcache_vget() and vcache_tryvget() either error out or wait if the current
>  state is BLOCKED it could be a vref() without a prior reference.
>  
>  Please try the attached patch to see if one of these assertions fire.
>  
>  diff -r 13173af16202 -r 0a76936d2ed0 sys/kern/vfs_vnode.c
>  --- sys/kern/vfs_vnode.c
>  +++ sys/kern/vfs_vnode.c
>  @@ -670,11 +670,22 @@ static inline bool
>   vtryrele(vnode_t *vp)
>   {
>   	u_int use, next;
>  +	vnode_impl_t *vip = VNODE_TO_VIMPL(vp);
>   
>   	for (use = vp->v_usecount;; use = next) {
>   		if (use == 1) {
>   			return false;
>   		}
>  +
>  +		membar_enter();
>  +		if (vip->vi_state == VS_BLOCKED) {
>  +			mutex_enter(vp->v_interlock);
>  +			if (vip->vi_state == VS_BLOCKED) {
>  +				vnpanic(vp, "vtryrele on BLOCKED vnode");
>  +			}
>  +			mutex_exit(vp->v_interlock);
>  +		}
>  +
>   		KASSERT(use > 1);
>   		next = atomic_cas_uint(&vp->v_usecount, use, use - 1);
>   		if (__predict_true(next == use)) {
>  @@ -865,6 +876,16 @@ vrele_async(vnode_t *vp)
>   void
>   vref(vnode_t *vp)
>   {
>  +	vnode_impl_t *vip = VNODE_TO_VIMPL(vp);
>  +
>  +	membar_enter();
>  +	if (vip->vi_state == VS_BLOCKED) {
>  +		mutex_enter(vp->v_interlock);
>  +		if (vip->vi_state == VS_BLOCKED) {
>  +			vnpanic(vp, "vref on BLOCKED vnode");
>  +		}
>  +		mutex_exit(vp->v_interlock);
>  +	}
>   
>   	KASSERT(vp->v_usecount != 0);
>   
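
To check my reading of the analysis above, here is a small userland sketch of the rule that the quoted line 328 enforces. It is only a simplified model, not the kernel code: the names model_vnode, model_change_ok and model_change are hypothetical, there is no locking, and the extra check that the vnode is currently in the "from" state is my assumption.

```c
#include <stdbool.h>

/* Hypothetical userland model; these names are not from sys/kern/vfs_vnode.c. */
enum model_state { MODEL_LOADED, MODEL_BLOCKED };

struct model_vnode {
	unsigned int usecount;		/* stands in for v_usecount */
	enum model_state state;		/* stands in for vi_state */
};

/*
 * Mirrors the quoted condition at line 328: a transition into or out of
 * BLOCKED is acceptable only while exactly one reference exists.
 */
static bool
model_change_ok(const struct model_vnode *vp, enum model_state from,
    enum model_state to)
{
	if ((from == MODEL_BLOCKED || to == MODEL_BLOCKED) && vp->usecount != 1)
		return false;
	return true;
}

/*
 * Apply the transition if the rule allows it. Returns false in the case
 * where the kernel would call vnpanic(). The "state != from" test is an
 * assumption of this model.
 */
static bool
model_change(struct model_vnode *vp, enum model_state from,
    enum model_state to)
{
	if (vp->state != from || !model_change_ok(vp, from, to))
		return false;
	vp->state = to;
	return true;
}
```

With usecount 1, LOADED to BLOCKED succeeds in this model. If something then raises usecount to 2 while the vnode is BLOCKED, the change back to LOADED is refused, which corresponds to the "BLOCKED to LOADED with usecount 2" panic.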

Should I apply the patch to the current netbsd-8 sources or to the
version on which I could reproduce the crashes?  I ask because I have
updated a couple of times since my report and have not seen the
crashes since those updates.

-- 
Roy Bixler <rcbixler%nyx.net@localhost>
"The fundamental principle of science, the definition almost, is this: the
sole test of the validity of any idea is experiment."
-- Richard P. Feynman


