tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/41923: assertion "cur != owner" failed



On Thu, Aug 27, 2009 at 10:00:45PM +0200, Manuel Bouyer wrote:

> On Thu, Aug 27, 2009 at 01:05:50AM +0200, Manuel Bouyer wrote:
> > > It seems that in turnstile_block() it's always protected by the
> > > lwp's lock.
> > > 
> > > To me it looks like we should never release cur->l_mutex in 
> > > turnstile_block.
> > > Any comments on this ?
> > 
> > The attached patch implements this. I've tested it on a 4-ways box
> > with several ls -lR and chmod -R running in parralel. It's now also 
> > installed
> > on the system that caused me to look at this in the first place.
> 
> I got a deadlock with this; it seems that a lwp can call turnstile_lookup()
> with its l_mutex held

Should only happen according to the lock hierarchy, which is that turnstile
locks are almost leafs with only runqueue locks further down the chain. 
Regular sleep queue locks can be taken before turnstile locks.

> while another one in turnstile_block() tries to get
> its l_mutex while holding the tc lock.


 
> So I changed the patch to the attached one instead. It tries to deal with
> the fact that owner can change, but I'm not sure it gets all the
> cases right. What if the owner is changed to a lwp that we already looked 
> at in the for() loop ?
> Also there's still a possible livelock when l == cur and we fail to get
> the owner's lock. We need to be lucky for the owner's lwp to grab the
> tc_lock between the lwp_unlock() and lwp_lock() here; but I didn't find
> a way to safely grap owner's mutex without l's mutex held.
> Any comment/idea welcome here.
> 
> -- 
> Manuel Bouyer <bouyer%antioche.eu.org@localhost>
>      NetBSD: 26 ans d'experience feront toujours la difference
> --

> Index: kern_turnstile.c
> ===================================================================
> RCS file: /cvsroot/src/sys/kern/kern_turnstile.c,v
> retrieving revision 1.23
> diff -u -r1.23 kern_turnstile.c
> --- kern_turnstile.c  12 Aug 2008 14:13:34 -0000      1.23
> +++ kern_turnstile.c  27 Aug 2009 19:52:54 -0000
> @@ -253,6 +253,7 @@
>       sq = &ts->ts_sleepq[q];
>       ts->ts_waiters[q]++;
>       sleepq_enter(sq, l, &tc->tc_mutex);
> +     /* now tc->tc_mutex is also cur->l_mutex and l->l_mutex */
>       LOCKDEBUG_BARRIER(&tc->tc_mutex, 1);
>       l->l_kpriority = true;
>       obase = l->l_kpribase;
> @@ -275,6 +276,7 @@
>        * compiling a kernel with LOCKDEBUG to pinpoint the problem.
>        */
>       prio = lwp_eprio(l);
> +
>       for (;;) {
>               bool dolock;
>  
> @@ -285,9 +287,24 @@
>               if (owner == NULL)
>                       break;
>  
> -             KASSERT(l != owner);
> -             KASSERT(cur != owner);
> +             /* The owner may have changed as we have dropped the tc lock */
> +             if (cur == owner) {
> +                     /*
> +                      * we own the lock: stop here, sleepq_block()
> +                      * should wake up immediatly
> +                      */
> +                     break;
> +             }

Can turnstile_wakeup() deal with this condition?  If so it looks safe to me.

> +             if (l == owner) {
> +                     /* owner has changed, restart from curlwp */
> +                     lwp_unlock(l);
> +                     l = cur;
> +                     lwp_lock(l);
> +                     prio = lwp_eprio(l);
> +                     continue;
> +             }
> +

This looks safe, haven't tought too much about what's doing functionally in
the greater scheme of things though..

>               if (l->l_mutex != owner->l_mutex)
>                       dolock = true;
>               else
> @@ -295,6 +312,10 @@
>               if (dolock && !lwp_trylock(owner)) {
>                       /*
>                        * restart from curlwp.
> +                      * Note that there may be a livelock here:
> +                      * the owner may try grabing cur's lock (which is
> +                      * the tc lock) while we're trying to grab
> +                      * the owner's lock.

Perhaps a small spinloop based on SPINLOCK_BACKOFF_HOOK and ((uintptr_t)l ^
cpu_counter()) could be good?


Home | Main Index | Thread Index | Old Index