Re: assertion "solocked(sb->sb_so)" failed [Re: diagnostic assertion "solocked2(so, so2)" failed]

To: tech-kern%NetBSD.org@localhost
Subject: Re: assertion "solocked(sb->sb_so)" failed [Re: diagnostic assertion "solocked2(so, so2)" failed]
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Date: Mon, 17 Aug 2009 13:44:15 +0200

On Mon, Aug 17, 2009 at 12:19:15AM +0200, Manuel Bouyer wrote:
> > > but I suspect other assertions using solocked2() are also broken; this 
> > > needs
> > > to be analysed. For now I'm just going to change solocked2() to always 
> > > return
> > > true, as suggected by ad.
> > 
> > After doing so I got this panic:
> > 
> > panic: kernel diagnostic assertion "solocked(sb->sb_so)" failed: file 
> > "/home/bouyer/src-5/src/sys/kern/uipc_socket2.c", line 726                  
> >               
> > [...]
> > this sbappend() is called a few lines after the KASSERT(solocked2(so, so2))
> > that was triggered before. m is not NULL. At this point this KASSERT
> > doens't look bogus, it doesn't look safe to get there without a lock held.

On second though it's also an issue with solocked(), but the code
should be safe. Here's a possible senario: when we do unp_setpeerlocks(),
we do unp_resetlock(so); at this point the thread doing a write on the socket
can grab the lock and go on. When we call solocked() on the remote end,
so_lock still points to uipc_lock because the first thread has not done the
unp_resetlock(so2) yet. This threads loads so_lock in a register and
is preempted. When it comes back, the first threads has completed the
unp_resetlock(so2) and released uipc_lock. the KASSERT fires but at this
time so_lock doens't point to the checked lock any more.

this means that because of this race, solocked() and solocked2() are
not reliable.

> 
> I think I found a problem with the way socket locks are changed
> in uipc_usrreq.c: unp_setpeerlocks() takes both sockets locked by the
> uipc_lock. But on return the sockets may be unlocked: a
> KASSERT(mutex_owned(lock) at the end of unp_setpeerlocks() fired as
> soon as sendmail started talking with milters:
> __kernassert(c05f97b1,c062a938,fe,c062a7a7,0,d1149000,c2ecbb80,0,c2ecbb00,0) 
> at netbsd:__kernassert+0x39
> unp_setpeerlocks(c2ee7158,c2cca000,0,80,14,0,d1a5bc2c,c2ee7158,c2ee7158,4) at 
> netbsd:unp_setpeerlocks+0x11a
> uipc_usrreq(c2ee7158,5,0,c2e65a00,0,0,0,d1862d00,d1862d00,d1862d00) at 
> netbsd:uipc_usrreq+0x294
> soaccept(c2ee7158,c2e65a00,d1a5bc8c,c0319ec7,d1a5bc9c,c2e65a00,c2ee73d8,c0340e8f,d1a5bccc,7)
>  at netbsd:soaccept+0x60
> do_sys_accept(d1870580,4,d1a5bccc,d1a5bd28,d1a5bccc,bfbfed54,8,0,0,0) at 
> netbsd:do_sys_accept+0x1e6
> sys_accept(d1870580,d1a5bd00,d1a5bd28,bb904000,d1a7fe1c,d1a7fe1c,2,4,0,0) at 
> netbsd:sys_accept+0x31                   
> syscall(d1a5bd48,ffff00b3,ab,bfbf001f,bbbc001f,804c23c,4,bfbfecb8,bb905040,0) 
> at netbsd:syscall+0xc8
> 
> 
> After calling unp_setpeerlocks() uipc_usrreq(PRU_ACCEPT) will call
> unp_setaddr() which, I think, expects the socket to be locked as
> it does a sounlock()/solock() in a (probably rarely used) code path.
> However, if unp_setpeerlocks() takes the new lock I'm can't find where
> it would be released as do_sys_accept() releases the lock on so and not
> so2.
> 
> Also, unp_setpeerlocks() is called from unp_connect2(PRU_CONNECT),
> and the caller, unp_connect(), expects the sockets to be locked on
> return as it does a sounlock(). If I got it right, this is true because
> we do solock(so); unp_resetlock(so); in unp_connect(), thus
> unp->unp_streamlock points to a locked lock.
> 
> I'm not sure how to sort this. Should uipc_usrreq(PRU_ACCEPT)
> grab the unp->unp_streamlock before calling unp_setpeerlocks() and release
> it after ?

The attached patch does this, and seems to work fine. It should make
solocked() and solocked2() reliable, and also fix the unlock/lock issue
in unp_setaddr() for the PRU_ACCEPT case.

Any comment on this ?

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--

Index: uipc_usrreq.c
===================================================================
RCS file: /cvsroot/src/sys/kern/uipc_usrreq.c,v
retrieving revision 1.119.4.2
diff -u -r1.119.4.2 uipc_usrreq.c
--- uipc_usrreq.c       18 Mar 2009 05:33:23 -0000      1.119.4.2
+++ uipc_usrreq.c       17 Aug 2009 11:32:37 -0000
@@ -251,6 +251,11 @@
        unp->unp_streamlock = NULL;
        mutex_obj_hold(lock);
        membar_exit();
+       /*
+        * possible race if lock is not held - see comment in
+        * uipc_usrreq(PRU_ACCEPT).
+        */
+       KASSERT(mutex_owned(lock));
        solockreset(so, lock);
        solockreset(so2, lock);
 }
@@ -327,6 +332,7 @@
        struct unpcb *unp;
        bool ext;
 
+       KASSERT(solocked(so));
        unp = sotounpcb(so);
        ext = false;
 
@@ -443,7 +449,17 @@
                 * If the connection is fully established, break the
                 * association with uipc_lock and give the connected
                 * pair a seperate lock to share.
+                * There is a race here: sotounpcb(so2)->unp_streamlock
+                * is not locked, so when changing so2->so_lock
+                * another thread can grab it while so->so_lock is still
+                * pointing to the (locked) uipc_lock.
+                * this should be harmless, exept that this makes
+                * solocked2() and solocked() unreliable.
+                * Another problem is that unp_setaddr() expects the
+                * the socket locked. Grabing sotounpcb(so2)->unp_streamlock
+                * fixes both issues.
                 */
+               mutex_enter(sotounpcb(so2)->unp_streamlock);
                unp_setpeerlocks(so2, so);
                /*
                 * Only now return peer's address, as we may need to
@@ -454,6 +470,7 @@
                 * error == 0 and sun_noname as the peer address.
                 */
                unp_setaddr(so, nam, true);
+               mutex_exit(so2->so_lock);
                break;
 
        case PRU_SHUTDOWN:

References:
- diagnostic assertion "solocked2(so, so2)" failed
  - From: Manuel Bouyer
- Re: diagnostic assertion "solocked2(so, so2)" failed
  - From: Manuel Bouyer
- assertion "solocked(sb->sb_so)" failed [Re: diagnostic assertion "solocked2(so, so2)" failed]
  - From: Manuel Bouyer
- Re: assertion "solocked(sb->sb_so)" failed [Re: diagnostic assertion "solocked2(so, so2)" failed]
  - From: Manuel Bouyer

Prev by Date: Re: Reclaiming vnodes
Next by Date: Re: Reclaiming vnodes
Previous by Thread: Re: assertion "solocked(sb->sb_so)" failed [Re: diagnostic assertion "solocked2(so, so2)" failed]
Next by Thread: ZFS, ARC, and UBC
Indexes:

Home | Main Index | Thread Index | Old Index