current-users: Re: 1.4.1+ (almost 1.4.2) lockups

Subject: Re: 1.4.1+ (almost 1.4.2) lockups
To: None <explorer@flame.org>
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
List: current-users
Date: 10/05/1999 14:08:47

> I'm running the netbsd-1-4 branch on two machines, and both lock up in the following
> manner at least once every two weeks:
> 
> (from ddb's ps /w)
> 
>  PID          COMMAND     EMUL  PRI UTIME STIME WAIT-MSG    WAIT-CHANNEL
>  9010           inetd   netbsd    8   0.0   0.0 inode       0xfdb7e034
>  9009           inetd   netbsd    8   0.0   0.0 inode       0xfdb7e034
>  9008           inetd   netbsd    8   0.0   0.0 inode       0xfdb7e034
> ...
> 
> Has anyone else seen this?

This looks like an inode-locking deadlock of some sort.

The next step in analyzing this is to figure out who holds the locks
that all these processes are waiting for.  Fortunately, kernel sleep
locks include an indication of which pid currently holds them.
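
The "who holds what" chase described above can be sketched in C. This
is only an illustration for working through notes taken by hand from
ddb; it is not kernel code. The `struct waiter` table, `lookup()` and
`chain_end()` are hypothetical names invented for this sketch, and it
assumes each pid appears at most once in the table.

```c
#include <assert.h>
#include <stddef.h>

/*
 * One row of hand-collected data (hypothetical layout): a process and
 * the pid holding the lock it sleeps on, or 0 if it is not blocked.
 */
struct waiter {
	int pid;
	int blocked_on;
};

/* Linear search for a pid in the table; NULL if it is not listed. */
static const struct waiter *
lookup(const struct waiter *tab, size_t n, int pid)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].pid == pid)
			return &tab[i];
	return NULL;
}

/*
 * Follow the holder chain starting at `start`.  Returns the pid at the
 * end of the chain (a process that is not blocked, or one missing from
 * the table), or -1 if the chain loops back on itself, i.e. a deadlock.
 */
int
chain_end(const struct waiter *tab, size_t n, int start)
{
	int pid = start;

	assert(tab != NULL || n == 0);
	/* A loop-free chain can pass through at most n table entries. */
	for (size_t steps = 0; steps <= n; steps++) {
		const struct waiter *w = lookup(tab, n, pid);

		if (w == NULL || w->blocked_on == 0)
			return pid;
		pid = w->blocked_on;
	}
	return -1;
}
```

Calling chain_end() on each sleeping pid either names the process at
the root of the pile-up or reports that the holders form a cycle.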

Getting tracebacks of the various processes in ddb will probably help
(t/t 0t9009, t/t 0t9008, ...; the 0t prefix makes ddb read the pid as
decimal).

Ideally, you also want to figure out (a) which inode they're waiting
for, and (b) which process is currently holding the lock.  In
-current, you'd look at vp->v_vnlock->lk_lockholder to get the pid
holding the lock.  If you're sufficiently motivated, you can hunt down
the appropriate structure offsets and pull this out using ddb, though
it's less painful with kgdb.

For 1.4.x, the vnode lock lives in the inode, so the holder's pid is
in ip->i_lock.lk_lockholder, where ip = vp->v_data.
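
The two pointer chains can be written out in C as a minimal sketch.
The structures below are cut-down stand-ins that model only the fields
named in this message; they are assumptions for illustration, not the
real kernel definitions, and the helper names holder_1_4() and
holder_current() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins: only the fields mentioned above are modeled. */
struct lock {
	int lk_lockholder;	/* pid of the process holding the lock */
};

struct inode {
	struct lock i_lock;	/* 1.4.x: lock embedded in the inode */
};

struct vnode {
	void *v_data;		/* fs-private data; assumed to be an inode */
	struct lock *v_vnlock;	/* -current: pointer to the vnode lock */
};

/* 1.4.x path: ip = vp->v_data, then ip->i_lock.lk_lockholder. */
int
holder_1_4(const struct vnode *vp)
{
	assert(vp != NULL && vp->v_data != NULL);
	const struct inode *ip = vp->v_data;

	return ip->i_lock.lk_lockholder;
}

/* -current path: vp->v_vnlock->lk_lockholder. */
int
holder_current(const struct vnode *vp)
{
	assert(vp != NULL && vp->v_vnlock != NULL);
	return vp->v_vnlock->lk_lockholder;
}
```

In kgdb the same expressions can be typed directly against a vnode
pointer; in ddb each arrow becomes a manual offset-and-dereference
step.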

						- Bill