tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: kern/41923: assertion "cur != owner" failed



On Tue, Aug 25, 2009 at 01:41:53AM +0200, Manuel Bouyer wrote:
> Hi,
> I may have found the cause of this problem:
> assertion "cur != owner" failed: file "src/sys/kern/kern_turnstile.c", line 
> 289
> 
> In turnstile_block(): we couldn't get a rwlock, so we're about to
> sleep. When entering the for(;;) loop we hold the tschain_t mutex, which
> is also "cur" lwp's lock since sleepq_enter(). At this point we also have
> l == cur.
> We have a ownder from (*l->l_syncobj->sobj_owner)(l->l_wchan) whose lock
> is not the same as l->l_mutex. So we enter the dolock case. Later, we'll do
> a lwp_unlock(l) and as we are "cur", we release the tschain_t mutex.
> The owner in rw_vector_exit() can now grab the turnstile lock and
> change the owner, eventually to the one in turnstile_block().
> The thread in turnstile_block() continue the for();; loop,
> and hit the KASSERT(cur != owner). Bom.
> 
> If you're lucky the thread in turnstile_block() won't notice the owner change
> and process to sleepq_block(). I wonder if this could explain the
> "processes stuck on tstile" problems peoples have been noticing,
> by corruption of the l_pi_lenders queue, or something like that
> In turnstile_wakeup(), we grab a lock only if the lwp's lock is
> ci_schedstate.spc_lwplock - what protects the queue in other cases ?

This question remains open

> It seems that in turnstile_block() it's always protected by the
> lwp's lock.
> 
> To me it looks like we should never release cur->l_mutex in turnstile_block.
> Any comments on this ?

The attached patch implements this. I've tested it on a 4-ways box
with several ls -lR and chmod -R running in parralel. It's now also installed
on the system that caused me to look at this in the first place.
Peoples experiencing the tstile syndrom may also want to give it a try.

There's a possible deadlock if owner holds its l_mutex and wants to
grab the turnstile lock. I don't know if mutex_vector_exit() or
rw_vector_exit() can be called with curlwp()->l_mutex held.
In such a case the previous code would not deadlock by luck, the owner
would have had to grab the lock in the "l = cur;" windows.

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--
Index: kern_turnstile.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_turnstile.c,v
retrieving revision 1.23
diff -u -r1.23 kern_turnstile.c
--- kern_turnstile.c    12 Aug 2008 14:13:34 -0000      1.23
+++ kern_turnstile.c    26 Aug 2009 22:53:55 -0000
@@ -253,6 +253,7 @@
        sq = &ts->ts_sleepq[q];
        ts->ts_waiters[q]++;
        sleepq_enter(sq, l, &tc->tc_mutex);
+       /* now tc->tc_mutex is also l->l_mutex and cur->l_mutex */
        LOCKDEBUG_BARRIER(&tc->tc_mutex, 1);
        l->l_kpriority = true;
        obase = l->l_kpribase;
@@ -273,6 +274,9 @@
         * NOTE: if you get a panic in this code block, it is likely that
         * a lock has been destroyed or corrupted while still in use.  Try
         * compiling a kernel with LOCKDEBUG to pinpoint the problem.
+        * Also note that we have to be carefull to not release tc_mutex
+        * in this loop, or another thread could give the rwlock away in
+        * our back.
         */
        prio = lwp_eprio(l);
        for (;;) {
@@ -294,11 +298,14 @@
                        dolock = false;
                if (dolock && !lwp_trylock(owner)) {
                        /*
-                        * restart from curlwp.
+                        * restart from curlwp, but make sure we're not
+                        * already at curlwp. curlwp is always locked.
                         */
-                       lwp_unlock(l);
-                       l = cur;
-                       lwp_lock(l);
+                       if (l != cur) {
+                               if (cur->l_mutex != l->l_mutex)
+                                       lwp_unlock(l);
+                               l = cur;
+                       }
                        prio = lwp_eprio(l);
                        continue;
                }
@@ -318,14 +325,12 @@
                        ts->ts_eprio = prio;
                        lwp_lendpri(owner, prio);
                }
-               if (dolock)
+               if (dolock && cur->l_mutex != l->l_mutex)
                        lwp_unlock(l);
                l = owner;
        }
-       LOCKDEBUG_BARRIER(l->l_mutex, 1);
        if (cur->l_mutex != l->l_mutex) {
                lwp_unlock(l);
-               lwp_lock(cur);
        }
        LOCKDEBUG_BARRIER(cur->l_mutex, 1);
 


Home | Main Index | Thread Index | Old Index