tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/41923: assertion "cur != owner" failed



On Thu, Aug 27, 2009 at 01:05:50AM +0200, Manuel Bouyer wrote:
> > It seems that in turnstile_block() it's always protected by the
> > lwp's lock.
> > 
> > To me it looks like we should never release cur->l_mutex in turnstile_block.
> > Any comments on this ?
> 
> The attached patch implements this. I've tested it on a 4-ways box
> with several ls -lR and chmod -R running in parralel. It's now also installed
> on the system that caused me to look at this in the first place.

I got a deadlock with this; it seems that a lwp can call turnstile_lookup()
with its l_mutex held while another one in turnstile_block() tries to get
its l_mutex while holding the tc lock.

So I changed the patch to the attached one instead. It tries to deal with
the fact that owner can change, but I'm not sure it gets all the
cases right. What if the owner is changed to a lwp that we already looked 
at in the for() loop ?
Also there's still a possible livelock when l == cur and we fail to get
the owner's lock. We need to be lucky for the owner's lwp to grab the
tc_lock between the lwp_unlock() and lwp_lock() here; but I didn't find
a way to safely grap owner's mutex without l's mutex held.
Any comment/idea welcome here.

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--
Index: kern_turnstile.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_turnstile.c,v
retrieving revision 1.23
diff -u -r1.23 kern_turnstile.c
--- kern_turnstile.c    12 Aug 2008 14:13:34 -0000      1.23
+++ kern_turnstile.c    27 Aug 2009 19:52:54 -0000
@@ -253,6 +253,7 @@
        sq = &ts->ts_sleepq[q];
        ts->ts_waiters[q]++;
        sleepq_enter(sq, l, &tc->tc_mutex);
+       /* now tc->tc_mutex is also cur->l_mutex and l->l_mutex */
        LOCKDEBUG_BARRIER(&tc->tc_mutex, 1);
        l->l_kpriority = true;
        obase = l->l_kpribase;
@@ -275,6 +276,7 @@
         * compiling a kernel with LOCKDEBUG to pinpoint the problem.
         */
        prio = lwp_eprio(l);
+
        for (;;) {
                bool dolock;
 
@@ -285,9 +287,24 @@
                if (owner == NULL)
                        break;
 
-               KASSERT(l != owner);
-               KASSERT(cur != owner);
+               /* The owner may have changed as we have dropped the tc lock */
+               if (cur == owner) {
+                       /*
+                        * we own the lock: stop here, sleepq_block()
+                        * should wake up immediatly
+                        */
+                       break;
+               }
 
+               if (l == owner) {
+                       /* owner has changed, restart from curlwp */
+                       lwp_unlock(l);
+                       l = cur;
+                       lwp_lock(l);
+                       prio = lwp_eprio(l);
+                       continue;
+               }
+                       
                if (l->l_mutex != owner->l_mutex)
                        dolock = true;
                else
@@ -295,6 +312,10 @@
                if (dolock && !lwp_trylock(owner)) {
                        /*
                         * restart from curlwp.
+                        * Note that there may be a livelock here:
+                        * the owner may try grabing cur's lock (which is
+                        * the tc lock) while we're trying to grab
+                        * the owner's lock.
                         */
                        lwp_unlock(l);
                        l = cur;


Home | Main Index | Thread Index | Old Index