Subject: Re: kern/29670
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org>
From: Ken Raeburn <raeburn@raeburn.org>
List: netbsd-bugs
Date: 03/18/2005 21:19:01
The following reply was made to PR kern/29670; it has been noted by GNATS.
From: Ken Raeburn <raeburn@raeburn.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/29670
Date: Fri, 18 Mar 2005 16:17:56 -0500
On Mar 15, 2005, at 16:56, wrstuden@netbsd.org wrote:
> Please try sys/kern/vfs_lookup.c revision 1.57.
After I remembered to install the new kernel *outside* the chroot
environment in which I built it :-), my machine hung this morning with
a different failure. This is with the 2.0 sources, with vfs_lookup.c
updated to 1.57 and LOCKDEBUG defined.
[top messages lost]
VOP_FSYNC+0x4c
vnode_if.c line 662
ffs_sync+0x29a
sys_sync+0x105
vfs_shutdown+0x5f
cpu_reboot+0x103
panic+0x108
_lockmgr+0xeb
kern_lock.c line 566
genfs_lock+0x25
genfs_vnops.c line 325
VOP_LOCK+0x28
vnode_if.c line 1146
vn_lock+0x8d
vget+0x9c
cache_lookup+0x2de
vfs_cache.c line 283
ufs_lookup+0xc1
ufs_lookup.c line 169
layer_lookup+0x57
layer_vnops.c line 435
VOP_LOOKUP+0x2e
lookup+0x201
namei+0x138
sys___stat13+0x58
syscall_plain+0x7e
panicstr points to: lockmgr: locking against myself
It hung at this point. I was able to get into ddb, which showed
aiodoned as the current process and an entirely different stack trace.
I did manage to get a crash dump, but gdb likewise shows only the
backtrace through ddb and the keyboard interrupt handler, not the
stack above. However, the full details of the stack trace I copied
down above are also in the kernel message buffer, so I can pull out
the arguments if you want them.
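For what it's worth, my reading of that panic string is that _lockmgr
was asked for an exclusive lock by the process that already holds it
exclusively, without LK_CANRECURSE in the request. A rough userland
sketch of that check follows; the names are made up for illustration
and the real logic lives in kern_lock.c:

    /*
     * Illustrative userland model of the recursion check, not the
     * actual kern_lock.c code (which is more involved).
     */
    #include <sys/types.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct fakelock {
            pid_t   holder;         /* pid holding the lock exclusively */
            int     exclusivecount;
            int     canrecurse;     /* stands in for LK_CANRECURSE */
    };

    static void
    fake_lock_exclusive(struct fakelock *lk, pid_t pid)
    {
            if (lk->exclusivecount > 0 && lk->holder == pid &&
                !lk->canrecurse) {
                    /* same process re-locking a vnode it already holds */
                    fprintf(stderr,
                        "panic: lockmgr: locking against myself\n");
                    abort();
            }
            lk->holder = pid;
            lk->exclusivecount++;
    }

    int
    main(void)
    {
            struct fakelock lk = { 0, 0, 0 };

            fake_lock_exclusive(&lk, 100);  /* first vn_lock: fine */
            fake_lock_exclusive(&lk, 100);  /* same pid again: "panic" */
            return 0;
    }

If that reading is right, the vget() reached through cache_lookup is
trying to lock a vnode that the same process already locked higher up
the (layered) lookup path, which seems consistent with the
lk_exclusivecount = 1 in the dump below.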
Looking at the lock that was passed to _lockmgr according to the
displayed traceback, I see:
$18 = {lk_interlock = {lock_data = 1,
    lock_file = 0xc06c4814 "../../../../kern/kern_lock.c",
    unlock_file = 0xc06c4814 "../../../../kern/kern_lock.c",
    lock_line = 512, unlock_line = 858,
    list = {tqe_next = 0xc07e7ac8, tqe_prev = 0xc07625c0},
    lock_holder = 0}, lk_flags = 1024, lk_sharecount = 0,
  lk_exclusivecount = 1, lk_recurselevel = 0, lk_waitcount = 0,
  lk_wmesg = 0xc06c701b "vnlock", lk_un = {lk_un_sleep = {
      lk_sleep_lockholder = 16383, lk_sleep_locklwp = 1,
      lk_sleep_prio = 20, lk_sleep_timo = 0},
    lk_un_spin = {lk_spin_cpu = 16383, lk_spin_list = {
        tqe_next = 0x1, tqe_prev = 0x14}}},
  lk_lock_file = 0xc071e6c0 "../../../../miscfs/genfs/genfs_vnops.c",
  lk_unlock_file = 0xc071e6c0 "../../../../miscfs/genfs/genfs_vnops.c",
  lk_lock_line = 324, lk_unlock_line = 340}
Before all of this, according to the current contents of *msgbufp, 11
instances of "vnode: table is full - increase kern.maxvnodes or
NVNODE" were logged. They started just after the cron job (the one
with the rm -rf in a 'null' mounted fs) started, but the machine
managed to stagger on for at least seven more minutes before locking
up entirely; racoon logged some messages 7 minutes and 6 seconds
later. Recommendations for how I can find out where all those vnodes
are going would be appreciated; otherwise I'll keep poking at that
side of it.
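Raising the limit itself is easy enough, though it obviously doesn't
answer where the vnodes are going. For reference, a small
sysctlbyname(3) sketch (the helper is hypothetical; running
"sysctl -w kern.maxvnodes=N" as root does the same thing):

    /*
     * Hypothetical helper: print kern.maxvnodes and, if a new value is
     * given on the command line, try to raise it (needs root).
     */
    #include <sys/param.h>
    #include <sys/sysctl.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char **argv)
    {
            int cur, new;
            size_t len = sizeof(cur);

            if (sysctlbyname("kern.maxvnodes", &cur, &len, NULL, 0) == -1) {
                    perror("kern.maxvnodes");
                    return 1;
            }
            printf("kern.maxvnodes = %d\n", cur);

            if (argc > 1) {
                    new = atoi(argv[1]);
                    if (sysctlbyname("kern.maxvnodes", NULL, NULL,
                        &new, sizeof(new)) == -1) {
                            perror("set kern.maxvnodes");
                            return 1;
                    }
                    printf("raised to %d\n", new);
            }
            return 0;
    }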
Ken