Current-Users archive


Re: Anyone having trouble with nfs locking up lately?



On Sun, Nov 13, 2011 at 02:58:38PM -0800, Hisashi T Fujinaka wrote:
 > >> >I'm on amd64-current and perhaps it's foolish to use it as an nfs
 > >> >server for my sources. It seems to lock up after compiling, etc.
 > >> >
 > >> >Is there something I can do to help debug this?
 > >>
 > >> Lock up as in crash unresponsive, or tstile? Mine tstiles
 > >> (filesystem accesses get stuck tstile).
 > >> Do you have a DIAGNOSTIC/DEBUG/LOCKDEBUG kernel?
 > >
 > >Any chance of being able to dig out the deadlocking processes (and
 > >their locksets) with ddb or crash?
 > 
 > I'm willing to run experiments if you're able to give me any
 > suggestions.

Basically what you need to do is run crash(8) on the live kernel when
it's stuck (or ddb, but crash is less likely to wedge it even further)
and find the cycle of lwps and mutexes that's creating a deadlock.

A locked mutex (except for spin-mutexes) is a pointer to the lwp that
is holding it. So basically what you do is pick a lwp that's tstiling
and use "show lwp" to find what mutex it's waiting for (this is in the
wchan information). Dereference the pointer to the mutex to get its
contents, which is a pointer to the lwp the tstiler is waiting for.
Then repeat from that lwp. You'll eventually find either a cycle (a
pointer back to a lwp you've already seen) or a lwp that isn't
tstiling but is blocked on something else.
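
The chain walk itself is mechanical, so here is a rough sketch of it in
Python, in case it helps keep track while copying addresses around. It
does not talk to crash(8) or ddb at all; it assumes you have already
written down by hand, for each tstiling lwp, the mutex it is waiting
on, and for each of those mutexes, the lwp address stored in it. The
function name, the two dictionaries, and the result labels are all made
up for illustration.

```python
# Hypothetical bookkeeping for addresses copied by hand out of the debugger.
#   waiting_on: tstiling lwp -> mutex it is blocked on (from its wchan)
#   owner_of:   locked mutex -> lwp that the mutex contents point to

def walk_wait_chain(start_lwp, waiting_on, owner_of):
    """Follow lwp -> mutex -> owner until a cycle or a non-tstiling lwp."""
    seen = []
    lwp = start_lwp
    while lwp in waiting_on:
        if lwp in seen:
            # Back at a lwp already visited: the cycle is the tail of `seen`.
            return ("cycle", seen[seen.index(lwp):])
        seen.append(lwp)
        mutex = waiting_on[lwp]
        if mutex not in owner_of:
            # Owner not recorded yet; go back and dereference this mutex.
            return ("owner-unknown", seen)
        lwp = owner_of[mutex]
    # This lwp is not waiting on a mutex, so it is blocked on something else.
    return ("blocked-elsewhere", seen + [lwp])


if __name__ == "__main__":
    # Made-up example: A waits for B, B waits for C, C waits for A.
    waiting_on = {"lwpA": "mtx1", "lwpB": "mtx2", "lwpC": "mtx3"}
    owner_of = {"mtx1": "lwpB", "mtx2": "lwpC", "mtx3": "lwpA"}
    print(walk_wait_chain("lwpA", waiting_on, owner_of))
```

If the result is "blocked-elsewhere", the last lwp in the list is the
one to look at next, since everything else is queued up behind it.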

It is a pain.

-- 
David A. Holland
dholland%netbsd.org@localhost

