Subject: Re: deadlocks, many processes in sleepq_block
To: Anthony Mallet <anthony.mallet@laas.fr>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 09/20/2007 17:08:08
On Thu, Sep 20, 2007 at 06:01:27PM +0200, Anthony Mallet wrote:

> For a few weeks now (roughly since August 20th), my very-current kernel
> with option MULTIPROCESSOR has been freezing _very_ often.
> I can trigger this problem in a few seconds under X11 with 1000baseT
> network and nfs activity, but it takes some minutes under X11 with
> 100baseT network and nfs activity. Strangest of all, I triggered it
> once without option MULTIPROCESSOR (i.e. on a uniprocessor kernel).
> But it never happened in console mode.
> 
> The symptoms are the following:
> - no network, no filesystem access.
> - kernel is still alive, processes that do no IO at all continue to work
>   (e.g. I can switch my virtual desktops under X).
> - I can kill X (ctrl-alt-bkspace) and enter ddb.

Sounds like a classic deadlock :-)
 
> A ps in ddb always shows the same three active processes: 
> firefox-bin & gkrellm & nfsd.
> Every single process seems to be hung in sleepq_block() and the three
> above were doing some nfs-related operations before ending up in
> sleepq_block().
> 
> Is there something I should try/investigate in order to find the source
> of the problem? Is it an already known problem?
> Any help appreciated :)

Can you get backtraces from the hung LWPs - not processes? Do 'ps/l' which
will show you all the LWPs in the system, then one by one pass the LWP
addresses of the hung threads to 't/a'.
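
As a rough sketch, the session would look something like this. The
prompt may differ on your machine, and <lwp-address> is only a
placeholder for whatever address 'ps/l' prints for each hung LWP:

    db> ps/l
    db> t/a <lwp-address>

Repeat the second command for each LWP that is sitting in
sleepq_block().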

If you are running amd64 you'll need to compile the kernel with
-fno-omit-frame-pointer. LOCKDEBUG and DIAGNOSTIC will help to track down
the problem.
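
For illustration, the kernel config additions might look like the
following. This assumes the usual options/makeoptions syntax, and the
-O2 is my assumption about what your config already uses, so adjust
it to match:

    options         DIAGNOSTIC
    options         LOCKDEBUG
    # amd64 only, so that ddb can walk the stack:
    makeoptions     COPTS="-O2 -fno-omit-frame-pointer"

Then rebuild and boot the new kernel before trying to reproduce the
hang.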

Thanks,
Andrew