Subject: kern/7954: nullfs panic when accessing front and back layers simultaneously
To: gnats-bugs@gnats.netbsd.org
From: apb@iafrica.com
List: netbsd-bugs
Date: 07/10/1999 04:56:55
>Number:         7954
>Category:       kern
>Synopsis:       nullfs panic when accessing front and back layers simultaneously
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jul 10 04:35:01 1999
>Last-Modified:
>Originator:     Alan Barrett
>Organization:
not much
>Release:        NetBSD-current 1999-07-09
>Environment:
System: NetBSD apb.iafrica.com 1.4E NetBSD 1.4E (APB) #0: Fri Jul 9 17:13:53 SAST 1999 apb@apb.iafrica.com:/b/USR/src/sys/arch/i386/compile/APB i386
>Description:

Despite Bill Studenmund's recent report that "nullfs now works", it
still doesn't seem to handle simultaneous access to the front and back
layers.

My /usr filesystem is actually a nullfs mount of /b/USR, where /b
is an ordinary ffs mount from a disk.  I can quite easily trigger a
"lockmgr: locking against myself" panic by running "ls -lR /usr" and
"ls -lR /b/USR" at the same time.

Here's some information about the panic (copied by hand, so
there might be transcription errors):

  panic: lockmgr: locking against myself
  Stopped in ls at        Debugger+0x4:   leave
  db> t
  Debugger(f9258e10,0,f92d8c8c,f92d7c04,f012b922) at Debugger+0x4
  panic(f0237660,10002,f044b800,0,0) at panic+0x55
  lockmgr(f9258e10,10002,f921345c,f92133d0,f044b800) at lockmgr+0x2ee
  layer_lock(f92d7c64) at layer_lock+0x4c
  vclean(f92133d0,8,f92d8c8c) at vclean+0x55
  vgonel(f92133d0,f92d8c8c) at vgonel+0x3b
  getnewvnode(1,f044b800,f0442e00,f92d7d00,f9258d80) at getnewvnode+0x11d
  ffs_vget(f044b800,1403de,f92d7d98,f92d7ea0,3) at ffs_vget+0x68
  ufs_lookup(f92d7e00,f9258d80,f92d7eb4,f92d7e90,f015533f) at ufs_lookup+0xd3e
  lookup(f92d7e90,f92d7f88,f92d8c8c,f92d7f88,f9144288) at lookup+0x24c
  namei(f92d7e90,f92d7f88,f92d8c8c,f92d7f80,80b3840) at namei+0x313
  sys___lstat13(f92d8c8c,f92d7f88,f92d7f80,0,80a2984) at sys___lstat13+0x44
  syscall() at syscall+0x23a
  --- syscall (number 280) ---
  0x8061805:
  db> ps/w
   PID      COMMAND    EMUL  PRI UTIME STIME WAIT-MSG    WAIT-CHANNEL
  >15872         ls  netbsd   73   0.5   0.6
   15871         ls  netbsd   26   0.5   0.8 ttyout      0xf91103d8
   [... more processes not shown ...]
  db> ps
   PID        PPID     PGRP    UID S   FLAGS       COMMAND    WAIT
  >15872     15463    15872      0 2  0x4006            ls
   15871     15463    15871      0 3  0x4086            ls  ttyout
   [... more processes not shown ...]
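
To illustrate the failure mode named in the panic message, here is a
toy model in C.  It is not the NetBSD lockmgr code: every name in it
(toy_lock, toy_vnode, toy_demo and so on) is hypothetical, and the
idea that the upper and lower vnodes end up behind one shared lock is
my assumption about what the trace might mean, not a confirmed
diagnosis.  It only shows how a non-recursive exclusive lock that
records its owner can detect the same process asking for it twice.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy model; NOT the NetBSD lockmgr(9) interface. */
#define NO_OWNER (-1)

enum toy_result { TOY_OK, TOY_BUSY, TOY_SELF_DEADLOCK };

struct toy_lock {
    int owner;                  /* pid of exclusive holder, or NO_OWNER */
};

static void toy_lock_init(struct toy_lock *lk)
{
    lk->owner = NO_OWNER;
}

/* Try to take the lock exclusively on behalf of process `pid`. */
static enum toy_result toy_lock_acquire(struct toy_lock *lk, int pid)
{
    if (lk->owner == pid)
        return TOY_SELF_DEADLOCK;   /* a kernel would panic here */
    if (lk->owner != NO_OWNER)
        return TOY_BUSY;            /* a real lock would sleep here */
    lk->owner = pid;
    return TOY_OK;
}

static void toy_lock_release(struct toy_lock *lk, int pid)
{
    assert(lk->owner == pid);       /* only the holder may release */
    lk->owner = NO_OWNER;
}

/* Assumption: two vnode-like objects in different layers, one lock. */
struct toy_vnode {
    const char *name;
    struct toy_lock *lock;
};

/* One process locks the lower object, then reaches the upper one. */
static enum toy_result toy_demo(void)
{
    struct toy_lock shared;
    struct toy_vnode lower = { "lower (ffs-like)", &shared };
    struct toy_vnode upper = { "upper (null-like)", &shared };
    int pid = 15872;                /* arbitrary process id */
    enum toy_result r;

    toy_lock_init(&shared);
    r = toy_lock_acquire(lower.lock, pid);
    if (r != TOY_OK)
        return r;

    /* Later in the same call chain, the same process locks `upper`. */
    r = toy_lock_acquire(upper.lock, pid);
    if (r == TOY_SELF_DEADLOCK)
        printf("pid %d holds \"%s\" and asked for \"%s\": "
            "locking against myself\n", pid, lower.name, upper.name);

    toy_lock_release(lower.lock, pid);
    return r;
}
```

Calling toy_demo() from a main() returns TOY_SELF_DEADLOCK in this
model.  Whether the real panic arises from the same kind of sharing is
exactly what I cannot tell from the trace alone.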

>How-To-Repeat:

    : let /b be the mountpoint of an ordinary FFS filesystem.
    : let /usr be an empty directory on the root filesystem.
    : let /b/USR be a directory tree that contains everything \
	    that one would normally expect to live in /usr.

    mount -t null /b/USR /usr

    ls -lR /usr & ls -lR /b/USR

    : wait for it to panic

>Fix:
    Fix the locking protocol or implementation?
>Audit-Trail:
>Unformatted: