Re: Finding out where biowait is stuck
On Mon, Feb 23, 2009 at 09:38:13AM +0200, Stuart Brooks wrote:
> I have a problem where a process is getting stuck in a biowait and I was
> wondering if there is any way to find out where in the code (or in the
> kernel) this is happening. Once it is in this state it is unkillable so
> it doesn't process signal 11 to get a core dump. I am 90% sure it is the
> access to an external disk array which is causing it but I can't find a
> way to verify this. I have control of the source code so can compile it
> with symbols etc.
It's happening in the kernel. Do you have access to the console?
If so, when it gets stuck, you can enter DDB and get a process
listing with both the wait message ("biowait"--labelled "WCHAN" in
the userland ps output) and the "wait channel". The wait channel
is the kernel virtual address of the object being waited on. In this
case it is a "struct buf", and you can enter "show buf <addr>" to see
the contents of that buf. The process is in src/sys/kern/vfs_bio.c:biowait(),
but the question is why isn't it getting woken up--or if it's
getting woken up, why aren't B_DONE or B_DELWRI set?
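
As a rough illustration of the condition in question (a simplified
model, not the kernel source): the waiter keeps sleeping for as long as
neither flag is set on the buf. The "struct buf_model" type, the
"biowait_would_sleep" helper, and the flag values below are placeholders
made up for this sketch; the real definitions are in <sys/buf.h>.

```c
#include <stdbool.h>

/* Placeholder flag bits for illustration only; see <sys/buf.h> for the real ones. */
#define B_DELWRI 0x0080u
#define B_DONE   0x0200u

/* Hypothetical stand-in for the kernel's struct buf; only the flags word is modelled. */
struct buf_model {
	unsigned int b_flags;
};

/* Returns true while a biowait()-style loop would keep sleeping on this buffer. */
static bool
biowait_would_sleep(const struct buf_model *bp)
{
	return (bp->b_flags & (B_DONE | B_DELWRI)) == 0;
}
```

When you look at the flags printed by "show buf <addr>", this is the
test to apply by eye: if neither bit is set, the I/O completion
has not been recorded for that buf.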
It would be useful to know what version you're running, and what disk
drivers you're using. A dmesg (or /var/run/dmesg.boot) would show both,
although it won't tell us which device "owns" the buf that gets stuck.
Another option is to get a kernel core dump (if that's possible in this
> PS. Would this be better sent to Users?
I think this is the right place.
Allen Briggs | http://www.ninthwonder.com/~briggs/ |