Subject: Re: kern/29670
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org>
From: Ken Raeburn <raeburn@raeburn.org>
List: netbsd-bugs
Date: 03/18/2005 21:19:01
The following reply was made to PR kern/29670; it has been noted by GNATS.

From: Ken Raeburn <raeburn@raeburn.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/29670
Date: Fri, 18 Mar 2005 16:17:56 -0500

 On Mar 15, 2005, at 16:56, wrstuden@netbsd.org wrote:
 > Please try sys/kern/vfs_lookup.c revision 1.57.
 
 After I remembered to install the new kernel *outside* the chroot 
 environment in which I built it :-), my machine hung this morning with 
 a different failure.  This is with 2.0 sources, vfs_lookup.c updated 
 to 1.57, and LOCKDEBUG defined.
 
 [top messages lost]
 VOP_FSYNC+0x4c
    vnode_if.c line 662
 ffs_sync+0x29a
 sys_sync+0x105
 vfs_shutdown+0x5f
 cpu_reboot+0x103
 panic+0x108
 _lockmgr+0xeb
    kern_lock.c line 566
 genfs_lock+0x25
    genfs_vnops.c line 325
 VOP_LOCK+0x28
    vnode_if.c line 1146
 vn_lock+0x8d
 vget+0x9c
 cache_lookup+0x2de
    vfs_cache.c line 283
 ufs_lookup+0xc1
    ufs_lookup.c line 169
 layer_lookup+0x57
    layer_vnops.c line 435
 VOP_LOOKUP+0x2e
 lookup+0x201
 namei+0x138
 sys___start13+0x58
 syscall_plain+0x7e
 
 panicstr points to: lockmgr: locking against myself
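 
 For context, that panic string comes from the recursion check in 
 _lockmgr: an exclusive lock request finds the lock already held by the 
 requesting process/LWP, and LK_CANRECURSE was not passed.  Roughly (a 
 from-memory sketch of the logic, not the real kern_lock.c code; the 
 field and flag names may not match 2.0 exactly):
 
 	/*
 	 * From-memory sketch of the _lockmgr recursion check; not the
 	 * real kern_lock.c code, and the names may differ in 2.0.
 	 */
 	#include <sys/types.h>
 
 	#define LK_CANRECURSE	0x0040	/* value as I recall it */
 
 	struct lock_sketch {		/* stand-in for struct lock */
 		pid_t	lk_lockholder;	/* pid holding it exclusively */
 		int	lk_locklwp;	/* lwp id holding it exclusively */
 		int	lk_recurselevel;
 	};
 
 	void	panic(const char *, ...);
 
 	static void
 	lock_exclusive_sketch(struct lock_sketch *lkp, pid_t pid, int lid,
 	    int flags)
 	{
 		/* Requester already owns the lock exclusively... */
 		if (lkp->lk_lockholder == pid && lkp->lk_locklwp == lid &&
 		    /* ...and recursion was neither allowed nor active. */
 		    (flags & LK_CANRECURSE) == 0 &&
 		    lkp->lk_recurselevel == 0)
 			panic("lockmgr: locking against myself");
 		/* otherwise grant the lock or bump lk_exclusivecount */
 	}
 
 Reading the trace above bottom-up, it looks like the vget in the 
 lookup through the null layer is what hit that check; everything from 
 panic up through VOP_FSYNC is just the sync-on-shutdown path afterward.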
 
 It hung at this point.  I was able to get into ddb, which showed 
 aiodoned as the current process and an entirely different stack trace.  
 I did manage to get a crash dump, but gdb likewise shows the backtrace 
 through ddb and the keyboard interrupt handler rather than the stack 
 above.  However, the full details of the stack trace I copied down are 
 also in the kernel message buffer, so I can pull out the arguments if 
 you want them.
 
 Looking at the lock that was passed to _lockmgr according to the 
 displayed traceback, I see:
 
 $18 = {lk_interlock = {lock_data = 1,
      lock_file = 0xc06c4814 "../../../../kern/kern_lock.c",
      unlock_file = 0xc06c4814 "../../../../kern/kern_lock.c",
      lock_line = 512, unlock_line = 858,
      list = {tqe_next = 0xc07e7ac8, tqe_prev = 0xc07625c0},
      lock_holder = 0}, lk_flags = 1024, lk_sharecount = 0,
    lk_exclusivecount = 1, lk_recurselevel = 0, lk_waitcount = 0,
    lk_wmesg = 0xc06c701b "vnlock", lk_un = {lk_un_sleep = {
        lk_sleep_lockholder = 16383, lk_sleep_locklwp = 1,
        lk_sleep_prio = 20, lk_sleep_timo = 0},
      lk_un_spin = {lk_spin_cpu = 16383, lk_spin_list = {
          tqe_next = 0x1, tqe_prev = 0x14}}},
    lk_lock_file = 0xc071e6c0 "../../../../miscfs/genfs/genfs_vnops.c",
    lk_unlock_file = 0xc071e6c0 "../../../../miscfs/genfs/genfs_vnops.c",
    lk_lock_line = 324, lk_unlock_line = 340}
 
 
 Before all of this, 11 instances of "vnode: table is full - 
 increase kern.maxvnodes or NVNODE" were also logged, according to the 
 current contents of *msgbufp.  They started just after the cron job 
 (with the rm -rf in a 'null' mounted fs) kicked off, but the machine 
 managed to stagger on for at least seven more minutes before locking 
 up entirely; racoon logged some messages 7 minutes and 6 seconds 
 later.  Recommendations for how I can find out where all those vnodes 
 are going would be appreciated; otherwise I'll keep poking at that 
 side of it.
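 
 As for the vnode counts themselves, one thing I may try is reading the 
 counters straight out of the live kernel or the crash dump with 
 libkvm, along these lines (untested sketch; I have not double-checked 
 the symbol names, their types, or whether the leading underscore is 
 right for a 2.0 ELF kernel):
 
 	#include <sys/types.h>
 	#include <fcntl.h>
 	#include <kvm.h>
 	#include <limits.h>
 	#include <nlist.h>
 	#include <stdio.h>
 
 	/*
 	 * Untested sketch: dump the vnode counters from a kernel image
 	 * plus crash dump (argv[1], argv[2]) or from the running system
 	 * (no arguments).  Adjust the symbol names to whatever
 	 * "nm /netbsd | grep vnodes" actually reports.
 	 */
 	int
 	main(int argc, char **argv)
 	{
 		char errbuf[_POSIX2_LINE_MAX];
 		struct nlist nl[] = {
 			{ "_numvnodes" },	/* vnodes currently allocated */
 			{ "_desiredvnodes" },	/* kern.maxvnodes */
 			{ NULL }
 		};
 		/* declared long here; int vs. long is the same size on i386 */
 		long numvnodes, desiredvnodes;
 		kvm_t *kd = kvm_openfiles(argc > 1 ? argv[1] : NULL,
 		    argc > 2 ? argv[2] : NULL, NULL, O_RDONLY, errbuf);
 
 		if (kd == NULL) {
 			fprintf(stderr, "kvm_openfiles: %s\n", errbuf);
 			return 1;
 		}
 		if (kvm_nlist(kd, nl) != 0) {
 			fprintf(stderr, "vnode symbols not found\n");
 			return 1;
 		}
 		kvm_read(kd, nl[0].n_value, &numvnodes, sizeof(numvnodes));
 		kvm_read(kd, nl[1].n_value, &desiredvnodes,
 		    sizeof(desiredvnodes));
 		printf("numvnodes=%ld desiredvnodes=%ld\n",
 		    numvnodes, desiredvnodes);
 		kvm_close(kd);
 		return 0;
 	}
 
 (Compile with -lkvm.)  That only gives the totals, of course, not 
 where the vnodes are being held.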
 
 Ken