Re: Finding out where biowait is stuck

To: tech-kern%netbsd.org@localhost
Subject: Re: Finding out where biowait is stuck
From: Stuart Brooks <stuartb%cat.co.za@localhost>
Date: Mon, 23 Feb 2009 17:39:07 +0200

I have a problem where a process is getting stuck in a biowait and I was wondering if there is any way to find out where in the code (or in the kernel) this is happening. Once it is in this state it is unkillable so it doesn't process signal 11 to get a core dump. I am 90% sure it is the access to an external disk array which is causing it but I can't find a way to verify this. I have control of the source code so can compile it with symbols etc.
It's happening in the kernel.  Do you have access to the console?
If so, when it gets stuck, you can enter DDB and get a process
listing with both the wait message ("biowait"--labelled "WCHAN" in
the userland ps output) and the "wait channel".  The wait channel
is a kernel virtual address of the object being waited on.  In this
case, a "struct buf", and you can enter "show buf <addr>" to see
the contents of that buf.  The process is in src/sys/kern/vfs_bio.c:biowait(),
but the question is why isn't it getting woken up--or if it's
getting woken up, why aren't B_DONE or B_DELWRI set?
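
To make the question above concrete, here is a minimal userland model of the
handshake being described. It is a sketch only, not the kernel source: the flag
values, the reduced "struct buf", and the model_* function names are
assumptions made for illustration. The waiter can only leave "biowait" once
the completion side has set one of the flags it tests.

```c
/*
 * Simplified userland model of the biowait/biodone handshake.
 * NOT the NetBSD kernel code: flag values, struct layout and the
 * model_* names are hypothetical, chosen only for illustration.
 */
#include <assert.h>
#include <stdbool.h>

#define B_DONE   0x01   /* assumed value: I/O on this buf has completed */
#define B_DELWRI 0x02   /* assumed value: buf is a delayed write */

struct buf {
    int b_flags;        /* state flags tested by the waiter */
    int b_error;        /* error reported by the driver, 0 if none */
};

/*
 * Completion side: what the driver's completion path is expected to
 * cause. In the real kernel this would also wake any process sleeping
 * on the wait channel (the address of the buf).
 */
void model_biodone(struct buf *bp)
{
    bp->b_flags |= B_DONE;
}

/*
 * Waiting side: true if the wait would end, false if the process
 * would go back to sleep with the wait message "biowait".
 */
bool model_biowait_would_return(const struct buf *bp)
{
    return (bp->b_flags & (B_DONE | B_DELWRI)) != 0;
}
```

In this model, a buf whose completion never arrives keeps
model_biowait_would_return() false forever, which corresponds to the stuck
process described above. Inspecting the flags of the real buf with "show buf"
shows which side of the handshake failed.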

It would be useful to know what version you're running, and what disk
drivers you're using.  A dmesg (or /var/run/dmesg.boot) would show both,
although it won't tell us which device "owns" the buf that gets stuck.

Another option is to get a kernel core dump (if that's possible in this
state).

Thanks for the reply. Unfortunately the system is in another timezone so no easy way for me to get at the console. That would have been first prize. I was hoping for a way to do a one-shot query to get that kind of information.

I'm running NetBSD 3 with an Adaptec SCSI controller attached to an easyRaid device (dmesg.boot attached). I've been thinking that maybe when the easyRaid is under heavy load it might be giving problems. I'm not sure how I can prove that. Interesting thing is that the last 2 times this problem has occurred it's been within a minute or so of the daily script running (circa 03h15). I'm not sure this is a coincidence...


Stuart

Attachment: dmesg.boot.gz
Description: application/gzip

References:
- Finding out where biowait is stuck
  - From: Stuart Brooks
- Re: Finding out where biowait is stuck
  - From: Allen Briggs
