Subject: Re: deadlocks, many processes in sleepq_block
To: Greg Oster <oster@cs.usask.ca>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 09/21/2007 01:20:59
On Thu, Sep 20, 2007 at 04:01:23PM -0600, Greg Oster wrote:

> Anthony Mallet writes:
> > Andrew Doran writes:
> > | If you are running amd64 you'll need to compile the kernel with
> > | -fno-omit-frame-pointer. LOCKDEBUG and DIAGNOSTIC will help to track down
> > | the problem.
> > 
> > Actually DIAGNOSTIC triggered an assert:
> > mutex_owned(&l->l_swaplock) failed: file uvm/uvm_glue.c, line 482
> 
> > I see that the last commit in this file is dated from Aug 18, which is
> > more or less the date at which I started to have trouble...
> >  
> > Since my kernel panic'ed, I was not able to use my keyboard to type
> > anything in ddb (it was still attached to the X session). I was able to
> > see the panic message by switching to VT1 before the panic.
> > Of course I have no serial line on this machine to run kgdb.
> > 
> > Any idea ? :)

Shame, it would have been nice to see a backtrace from that one.
 
> Of the two places that call uvm_swapin(), the one that doesn't hold 
> l_swaplock is this one:
> 
> uvm_lwp_hold(struct lwp *l)
> {
> 
>         /* XXXSMP mutex_enter(&l->l_swaplock); */
>         if (l->l_holdcnt++ == 0 && (l->l_flag & LW_INMEM) == 0)
>                 uvm_swapin(l);
>         /* XXXSMP mutex_exit(&l->l_swaplock); */
> }

There are a number of places where uvm_lwp_hold() / rele() are called with
(l == curlwp), and the callers expect them not to block in that case. The
locks are fully enabled on the vmlocking branch. I've commented out the
assertion for now in uvm_glue.c rev 1.112.
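
For anyone following along, here is a rough userspace sketch of how the
unlocked path in the quoted uvm_lwp_hold() can reach a lock-ownership check
in the swap-in code. It is only a model under stated assumptions: every
model_* name and MODEL_INMEM are hypothetical stand-ins, the mutex is reduced
to a boolean, and none of this is the actual kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_INMEM 0x1      /* hypothetical stand-in for LW_INMEM */

/* Hypothetical, heavily reduced model of the fields discussed above. */
struct model_lwp {
	int  holdcnt;        /* stands in for l_holdcnt */
	int  flag;           /* stands in for l_flag */
	bool swaplock_held;  /* stands in for mutex_owned(&l->l_swaplock) */
	int  check_failures; /* times the ownership check would have fired */
};

/*
 * Model of the swap-in path. Where a DIAGNOSTIC kernel would assert,
 * this only counts the failure so the behaviour can be observed.
 */
static void
model_swapin(struct model_lwp *l)
{
	if (!l->swaplock_held)
		l->check_failures++;
	l->flag |= MODEL_INMEM;
}

/* Mirrors the quoted function: the lock calls are left commented out. */
static void
model_hold_unlocked(struct model_lwp *l)
{
	assert(l->holdcnt >= 0);
	/* XXXSMP lock would be taken here */
	if (l->holdcnt++ == 0 && (l->flag & MODEL_INMEM) == 0)
		model_swapin(l);
	/* XXXSMP lock would be dropped here */
}

/* Same logic with the (modelled) lock held around the swap-in. */
static void
model_hold_locked(struct model_lwp *l)
{
	assert(l->holdcnt >= 0);
	l->swaplock_held = true;
	if (l->holdcnt++ == 0 && (l->flag & MODEL_INMEM) == 0)
		model_swapin(l);
	l->swaplock_held = false;
}
```

Calling model_hold_unlocked() on a zeroed struct model_lwp bumps
check_failures once, while model_hold_locked() leaves it at zero. The model
deliberately ignores the blocking issue for (l == curlwp) callers, which is
the reason the real locks are not simply switched on.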

Thanks,
Andrew