tech-kern archive


Re: Finding out where biowait is stuck



On Mon, Feb 23, 2009 at 09:38:13AM +0200, Stuart Brooks wrote:
> I have a problem where a process is getting stuck in a biowait and I was 
> wondering if there is any way to find out where in the code (or in the 
> kernel) this is happening. Once it is in this state it is unkillable so 
> it doesn't process signal 11 to get a core dump. I am 90% sure it is the 
> access to an external disk array which is causing it but I can't find a 
> way to verify this. I have control of the source code so can compile it 
> with symbols etc.

It's happening in the kernel.  Do you have access to the console?
If so, when it gets stuck, you can enter DDB and get a process
listing with both the wait message ("biowait"--labelled "WCHAN" in
the userland ps output) and the "wait channel".  The wait channel
is the kernel virtual address of the object being waited on.  In this
case it is a "struct buf", and you can enter "show buf <addr>" to see
the contents of that buf.  The process is in
src/sys/kern/vfs_bio.c:biowait(), but the question is why isn't it
getting woken up--or if it's getting woken up, why aren't B_DONE or
B_DELWRI set?
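To make that last question concrete, here is a rough userland sketch of
the test the wait loop is described as making: keep sleeping until
B_DONE or B_DELWRI shows up in the buf's flags.  This is not the kernel
code.  "struct buf_model", still_waiting() and report() are made-up
names, and the flag values are assumptions for illustration only; check
sys/buf.h in your tree for the real ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed flag values, for illustration only; see sys/buf.h for the real ones. */
#define B_DELWRI 0x00000080u
#define B_DONE   0x00000200u

/* Hypothetical stand-in for the kernel's struct buf: just the flags word. */
struct buf_model {
	unsigned int b_flags;
};

/* True if a biowait()-style loop would go back to sleep on this buf. */
static bool
still_waiting(const struct buf_model *bp)
{
	assert(bp != NULL);
	return (bp->b_flags & (B_DONE | B_DELWRI)) == 0;
}

/* Print a one-line verdict for a flags word copied from "show buf". */
static void
report(unsigned int flags)
{
	struct buf_model b = { .b_flags = flags };

	printf("b_flags=0x%08x: %s\n", flags,
	    still_waiting(&b) ? "would keep sleeping" : "would return");
}
```

Calling report() with the flags value read from "show buf" tells you
which side of the question you are on: if neither bit is set, nothing
has completed the I/O as far as the buf is concerned; if one is set,
the wakeup itself is what went missing.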

It would be useful to know what version you're running, and what disk
drivers you're using.  A dmesg (or /var/run/dmesg.boot) would show both,
although it won't tell us which device "owns" the buf that gets stuck.

Another option is to get a kernel core dump (if that's possible in this
state).

> PS. Would this be better sent to Users?

I think this is the right place.

-allen

-- 
Allen Briggs  |  http://www.ninthwonder.com/~briggs/  |  
briggs%ninthwonder.com@localhost

