Re: ongoing major problems with NetBSD-5 and LOCKDEBUG on multi-core system (mfi(4) related?)

To: NetBSD Users's Discussion List <netbsd-users%netbsd.org@localhost>, NetBSD/i386 Discussion List <port-i386%NetBSD.org@localhost>
Subject: Re: ongoing major problems with NetBSD-5 and LOCKDEBUG on multi-core system (mfi(4) related?)
From: "Greg A. Woods" <woods%planix.ca@localhost>
Date: Tue, 17 Jan 2012 20:28:28 -0800

After digging in the mfi(4) code a bit more, and poking through the
current OpenBSD code (from which NetBSD's mfi(4) was long ago derived),
I thought I had discovered a possible problem (as well as a few other
bug fixes not yet imported to NetBSD).  The changes I made to mfi(4) are
appended below my signature.  I completely removed the kernel_lock and
reverted to using splbio() around only the same code that OpenBSD
protects (w.r.t. the code paths previously protected by the kernel_lock).
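
A note on why the choice of lock might matter here: splbio() only raises the interrupt priority level of the CPU it runs on, so on a multi-core machine it does not by itself stop another CPU from touching the same ccb free list. Per the comments in the patch, OpenBSD 1.97 uses a mutex (sc_ccb_mtx) in mfi_get_ccb()/mfi_put_ccb() instead. The following is only a userland sketch of that pattern, not driver code: struct softc, struct ccb, softc_init(), get_ccb() and put_ccb() are hypothetical simplified stand-ins, and a pthread mutex stands in for the kernel lock. In a NetBSD-5 kernel the analogue would presumably be a kmutex_t set up with mutex_init(&mtx, MUTEX_DEFAULT, IPL_BIO) and taken with mutex_enter()/mutex_exit(); that is an untested assumption.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, simplified stand-ins for struct mfi_ccb / mfi_softc. */
enum ccb_state { CCB_FREE, CCB_READY };

struct ccb {
	enum ccb_state		state;
	TAILQ_ENTRY(ccb)	link;
};

struct softc {
	/* userland analogue of OpenBSD's sc_ccb_mtx (assumption) */
	pthread_mutex_t		ccb_mtx;
	TAILQ_HEAD(, ccb)	ccb_freeq;
};

static void
softc_init(struct softc *sc)
{
	pthread_mutex_init(&sc->ccb_mtx, NULL);
	TAILQ_INIT(&sc->ccb_freeq);
}

/* Take the first ccb off the free list, or return NULL if none is free. */
static struct ccb *
get_ccb(struct softc *sc)
{
	struct ccb *ccb;

	pthread_mutex_lock(&sc->ccb_mtx);	/* excludes all other threads/CPUs */
	ccb = TAILQ_FIRST(&sc->ccb_freeq);
	if (ccb != NULL) {
		TAILQ_REMOVE(&sc->ccb_freeq, ccb, link);
		ccb->state = CCB_READY;
	}
	pthread_mutex_unlock(&sc->ccb_mtx);
	return ccb;
}

/* Mark a ccb free and put it back at the tail of the free list. */
static void
put_ccb(struct softc *sc, struct ccb *ccb)
{
	pthread_mutex_lock(&sc->ccb_mtx);
	ccb->state = CCB_FREE;
	TAILQ_INSERT_TAIL(&sc->ccb_freeq, ccb, link);
	pthread_mutex_unlock(&sc->ccb_mtx);
}
```

Unlike an spl-only scheme, the mutex serialises every caller regardless of which CPU it is running on; in the kernel, giving the mutex an IPL of IPL_BIO would also keep the local bio interrupt out while it is held.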

Unfortunately, although it seemed to get a bit further along in some of
my tests, the machine still crashed again soon enough.

(I did manage to get a full sysinst onto my CF card for my Soekris box,
and to get a bit further along in a "build.sh -j 4" than last time...)

Unfortunately, some stray output hung the telnet session through the
terminal server I'm using for serial ports, so I wasn't able to get
anything more from DDB (I think the culprit was the garbage being spewed
by savecore; see below):

Reader / writer lock error: lockdebug_unlocked: no shared locks held by LWP

lock address : 0x00000000c0d52cc0 type     :     sleep/adaptive
initialized  : 0x00000000c04e0a73
shared holds :                  0 exclusive:                  0
shares wanted:                  0 exclusive:                  0
current cpu  :                  7 last held:              65535
current lwp  : 0x00000000deac9340 last held: 000000000000000000
last locked  : 0x00000000c04e0111 unlocked : 0x00000000c04e0302
owner/count  : 000000000000000000 flags    : 0x0000000000000008

Turnstile chain at 0xc0d53200.
=> No active turnstile for this lock.

panic: LOCKDEBUGWARNING: SPL NOT LOWERED ON TRAP EXIT

WARNING: SPL NOT LOWERED ON SYSCALL EXIT
fatal breakpoint trapWARNING: SPL NOT LOWERED ON SYSCALL EXIT
 in supervisor mode
WARNING: SPL NOT LOWERED ON SYSCALL EXIT
trap type 1 code 0 eip c05cbffc cs 8 eflags 246 cr2 bbb30010 ilevel 0
Stopped in pid 4737.1 (systat) at       netbsd:breakpoint+0x4:  popl    %ebp
db{7}> 



GDB on the crash dump isn't much more useful -- I have not yet learned
how to show the stack backtrace for other CPUs:


# cp ~woods/tmp/netbsd-mfi.gdb /var/crash/netbsd.11.gdb
# ls -l /var/crash/netbsd.11*
-rw-------  1 root  wheel  161826836 Jan 17 17:43 /var/crash/netbsd.11.core
-rwxr-xr-x  1 root  wheel   59213588 Jan 17 20:09 /var/crash/netbsd.11.gdb
-rw-------  1 root  wheel         10 Jan 17 17:43 /var/crash/netbsd.11.gz
# gdb /var/crash/netbsd.11.gdb /var/crash/netbsd.11.core
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386--netbsdelf"...
"/var/crash/netbsd.11.core" is not a core dump: File format not recognized
(gdb) target kvm
#0  0xc04de819 in mi_switch (l=0xc0c3ad80) at /rest/work/woods/m-NetBSD-5/sys/kern/kern_synch.c:771
771                     prevlwp = cpu_switchto(l, newl, returning);
(gdb) where
#0  0xc04de819 in mi_switch (l=0xc0c3ad80) at /rest/work/woods/m-NetBSD-5/sys/kern/kern_synch.c:771
#1  0xc04daacb in sleepq_block (timo=0, catch=false) at /rest/work/woods/m-NetBSD-5/sys/kern/kern_sleepq.c:269
#2  0xc04b4f5c in cv_wait (cv=0xc0d4f524, mtx=0xc0d4f8b8) at /rest/work/woods/m-NetBSD-5/sys/kern/kern_condvar.c:201
#3  0xc04627f7 in uvm_scheduler () at /rest/work/woods/m-NetBSD-5/sys/uvm/uvm_glue.c:550
#4  0xc04a96aa in main () at /rest/work/woods/m-NetBSD-5/sys/kern/init_main.c:682
(gdb) 



Savecore is spewing garbage when it tries to tell me there's a version
mismatch between the booted kernel and the version in the core dump (and,
unfortunately, the text of the message gives no hint as to which is
which!).  I replaced all the garbage binary characters with '*'s:


savecore: warning: (null) version mismatch:
        NetBSD 5.1_STABLE (GENERIC) #2: Tue Jan 17 16:19:39 PST 2012
        woods@once:/rest/build/woods/once/netbsd-5-i386-i386-ppro-obj/rest/work/woods/m-NetBSD-5/sys/arch/i386/compile/GENERIC

and     *******************
***************************
**********************      ***************** ***
                     ***************

savecore: reboot after panic: panic: LOCKDEBUG
Jan 17 17:43:43 more savecore: reboot after panic: panic: LOCKDEBUG
savecore: system went down at Sun Jan 15 19:23:29 2012
savecore: writing compressed core to /var/crash/netbsd.11.core.gz
savecore: writing compressed kernel to /var/crash/netbsd.11.gz
savecore: (null): Bad address
Jan 17 17:43:49 more savecore: (null): Bad address


-- 
                                                Greg A. Woods
                                                Planix, Inc.

<woods%planix.com@localhost>       +1 250 762-7675        http://www.planix.com/
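
The mfi_ioctl_disk() hunk taken from OpenBSD 1.85 in the patch changes how a disk id within a volume is mapped to an entry in the array list. The old code assumed the arrays of all volumes were numbered consecutively (summing mpa_span_depth over every earlier volume, then adding bd_diskid / mpa_no_drv_per_span); the new code looks the array up through the volume's own span table, mlc_span[span].mls_index. A minimal sketch of the new mapping follows, using hypothetical cut-down structures (struct ld_cfg, struct span, locate_disk(), MAX_SPANS) that model only those three firmware fields; it is an illustration, not the real mfi(4) definitions.

```c
/*
 * Hypothetical, cut-down versions of the firmware config structures;
 * only the fields used by the OpenBSD 1.85 logic are modelled.
 */
#define MAX_SPANS 8

struct span {
	int	mls_index;			/* index into the array list */
};

struct ld_cfg {
	int		span_depth;		/* mpa_span_depth */
	int		no_drv_per_span;	/* mpa_no_drv_per_span */
	struct span	spans[MAX_SPANS];	/* mlc_span[] */
};

/*
 * Map a disk id within one logical disk to (array index, disk within
 * that array).  Like the patch, this does no bounds checking on span.
 */
static void
locate_disk(const struct ld_cfg *ld, int diskid, int *arr, int *disk)
{
	int span;

	/* use span 0 only when the RAID group is not spanned */
	if (ld->span_depth > 1)
		span = diskid / ld->no_drv_per_span;
	else
		span = 0;
	*arr = ld->spans[span].mls_index;

	/* offset disk into the pd list of that array */
	*disk = diskid % ld->no_drv_per_span;
}
```

For example, with two spans of four drives whose arrays sit at indices 3 and 7, disk id 5 falls in span 1 (5 / 4), so it is array 7, disk 1 (5 % 4); the old arithmetic would only give the right answer if the volume's arrays happened to be laid out in order.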


Index: sys/dev/ic/mfi.c
===================================================================
RCS file: /cvs/master/m-NetBSD/main/src/sys/dev/ic/mfi.c,v
retrieving revision 1.19.4.4
diff -u -r1.19.4.4 mfi.c
--- sys/dev/ic/mfi.c    28 Mar 2010 15:03:22 -0000      1.19.4.4
+++ sys/dev/ic/mfi.c    18 Jan 2012 00:19:07 -0000
@@ -155,13 +155,13 @@
        struct mfi_ccb          *ccb;
        int                     s;
 
-       s = splbio();
+       s = splbio();                   /* OpenBSD 1.97 uses mtx_enter(&sc->sc_ccb_mtx) here */
        ccb = TAILQ_FIRST(&sc->sc_ccb_freeq);
        if (ccb) {
                TAILQ_REMOVE(&sc->sc_ccb_freeq, ccb, ccb_link);
                ccb->ccb_state = MFI_CCB_READY;
        }
-       splx(s);
+       splx(s);                        /* OpenBSD 1.97 uses mtx_leave(&sc->sc_ccb_mtx) here */
 
        DNPRINTF(MFI_D_CCB, "%s: mfi_get_ccb: %p\n", DEVNAME(sc), ccb);
 
@@ -176,7 +176,7 @@
 
        DNPRINTF(MFI_D_CCB, "%s: mfi_put_ccb: %p\n", DEVNAME(sc), ccb);
 
-       s = splbio();
+       s = splbio();                   /* OpenBSD 1.97 uses mtx_enter(&sc->sc_ccb_mtx) here */
        ccb->ccb_state = MFI_CCB_FREE;
        ccb->ccb_xs = NULL;
        ccb->ccb_flags = 0;
@@ -188,7 +188,7 @@
        ccb->ccb_data = NULL;
        ccb->ccb_len = 0;
        TAILQ_INSERT_TAIL(&sc->sc_ccb_freeq, ccb, ccb_link);
-       splx(s);
+       splx(s);                        /* OpenBSD 1.97 uses mtx_leave(&sc->sc_ccb_mtx) here */
 }
 
 static int
@@ -640,6 +640,7 @@
                return 1;
 
        TAILQ_INIT(&sc->sc_ccb_freeq);
+       /* OpenBSD 1.97 adds mtx_init(&sc->sc_ccb_mtx, IPL_BIO); here */
 
        status = mfi_fw_state(sc);
        sc->sc_max_cmds = status & MFI_STATE_MAXCMD_MASK;
@@ -1237,6 +1238,16 @@
                if (mfi_poll(ccb))
                        goto done;
        } else {
+               /*
+                * OpenBSD revision 1.90
+                * date: 2009/03/29 01:02:35;  author: dlg;  state: Exp;  lines: +3 -0
+                * fix a small race in mfi_mgmt between the checking of a ccbs completion and
+                * the sleep waiting for the completion. it is possible to get the interrupt
+                * completing the command just before the tsleep, which will never get a
+                * wakeup because the interrupt with the wakeup has already happened.
+                */
+               int s = splbio();
+
                mfi_post(sc, ccb);
 
                DNPRINTF(MFI_D_MISC, "%s: mfi_mgmt_internal sleeping\n",
@@ -1244,6 +1255,8 @@
                while (ccb->ccb_state != MFI_CCB_DONE)
                        tsleep(ccb, PRIBIO, "mfi_mgmt", 0);
 
+               splx(s);
+
                if (ccb->ccb_flags & MFI_CCB_F_ERR)
                        goto done;
        }
@@ -1336,10 +1349,12 @@
 {
        struct mfi_softc *sc = device_private(dev);
        int error = 0;
+#if 0
        int s;
 
        KERNEL_LOCK(1, curlwp);
        s = splbio();
+#endif
 
        DNPRINTF(MFI_D_IOCTL, "%s: mfi_ioctl ", DEVNAME(sc));
 
@@ -1378,8 +1393,10 @@
                DNPRINTF(MFI_D_IOCTL, " invalid ioctl\n");
                error = EINVAL;
        }
+#if 0
        splx(s);
        KERNEL_UNLOCK_ONE(curlwp);
+#endif
 
        DNPRINTF(MFI_D_IOCTL, "%s: mfi_ioctl return %x\n", DEVNAME(sc), error);
        return error;
@@ -1428,6 +1445,13 @@
            sizeof(sc->sc_ld_list), &sc->sc_ld_list, NULL))
                goto done;
 
+       /* OpenBSD 1.86 */
+       if (bv->bv_volid >= sc->sc_ld_list.mll_no_ld) {
+               /* go do hotspares */
+               rv = mfi_bio_hs(sc, bv->bv_volid, MFI_MGMT_VD, bv);
+               goto done;
+       }
+
        i = bv->bv_volid;
        mbox[0] = sc->sc_ld_list.mll_list[i].mll_ld.mld_target;
        DNPRINTF(MFI_D_IOCTL, "%s: mfi_ioctl_vol target %#x\n",
@@ -1437,12 +1461,6 @@
            sizeof(sc->sc_ld_details), &sc->sc_ld_details, mbox))
                goto done;
 
-       if (bv->bv_volid >= sc->sc_ld_list.mll_no_ld) {
-               /* go do hotspares */
-               rv = mfi_bio_hs(sc, bv->bv_volid, MFI_MGMT_VD, bv);
-               goto done;
-       }
-
        strlcpy(bv->bv_dev, sc->sc_ld[i].ld_dev, sizeof(bv->bv_dev));
 
        switch(sc->sc_ld_list.mll_list[i].mll_state) {
@@ -1513,8 +1531,8 @@
        struct mfi_pd_details   *pd;
        struct scsipi_inquiry_data *inqbuf;
        char                    vend[8+16+4+1];
-       int                     i, rv = EINVAL;
-       int                     arr, vol, disk;
+       int                     rv = EINVAL;
+       int                     arr, vol, disk, span;
        uint32_t                size;
        uint8_t                 mbox[MFI_MBOX_SIZE];
 
@@ -1540,11 +1558,6 @@
 
        ar = cfg->mfc_array;
 
-       /* calculate offset to ld structure */
-       ld = (struct mfi_ld_cfg *)(
-           ((uint8_t *)cfg) + offsetof(struct mfi_conf, mfc_array) +
-           cfg->mfc_array_size * cfg->mfc_no_array);
-
        vol = bd->bd_volid;
 
        if (vol >= cfg->mfc_no_ld) {
@@ -1553,20 +1566,29 @@
                goto freeme;
        }
 
-       /* find corresponding array for ld */
-       for (i = 0, arr = 0; i < vol; i++)
-               arr += ld[i].mlc_parm.mpa_span_depth;
+       /* OpenBSD 1.85 */
+       /* calculate offset to ld structure */
+       ld = (struct mfi_ld_cfg *)(
+           ((uint8_t *)cfg) + offsetof(struct mfi_conf, mfc_array) +
+           cfg->mfc_array_size * cfg->mfc_no_array);
+
+       /* use span 0 only when raid group is not spanned */
+       if (ld[vol].mlc_parm.mpa_span_depth > 1)
+               span = bd->bd_diskid / ld[vol].mlc_parm.mpa_no_drv_per_span;
+       else
+               span = 0;
+       arr = ld[vol].mlc_span[span].mls_index;
 
        /* offset disk into pd list */
        disk = bd->bd_diskid % ld[vol].mlc_parm.mpa_no_drv_per_span;
 
-       /* offset array index into the next spans */
-       arr += bd->bd_diskid / ld[vol].mlc_parm.mpa_no_drv_per_span;
-
        bd->bd_target = ar[arr].pd[disk].mar_enc_slot;
+
+       /* get status */
        switch (ar[arr].pd[disk].mar_pd_state){
        case MFI_PD_UNCONFIG_GOOD:
-               bd->bd_status = BIOC_SDUNUSED;
+       case MFI_PD_FAILED:             /* OpenBSD 1.82, OpenBSD PR#5645 */
+               bd->bd_status = BIOC_SDFAILED; /* OpenBSD 1.82, OpenBSD PR#5645 */
                break;
 
        case MFI_PD_HOTSPARE: /* XXX dedicated hotspare part of array? */
@@ -1577,10 +1599,11 @@
                bd->bd_status = BIOC_SDOFFLINE;
                break;
 
+#if 0 /* OpenBSD 1.82, OpenBSD PR#5645 */
        case MFI_PD_FAILED:
                bd->bd_status = BIOC_SDFAILED;
                break;
-
+#endif
        case MFI_PD_REBUILD:
                bd->bd_status = BIOC_SDREBUILD;
                break;
@@ -1600,8 +1623,11 @@
        *((uint16_t *)&mbox) = ar[arr].pd[disk].mar_pd.mfp_id;
        memset(pd, 0, sizeof(*pd));
        if (mfi_mgmt_internal(sc, MR_DCMD_PD_GET_INFO, MFI_DATA_IN,
-           sizeof *pd, pd, mbox))
+           sizeof *pd, pd, mbox)) {
+               /* disk is missing but succeed command */
+               rv = 0; /* OpenBSD 1.82, OpenBSD PR#5645 */
                goto freeme;
+       }
 
        bd->bd_size = pd->mpd_size * 512; /* bytes per block */
 
@@ -1948,7 +1974,9 @@
 {
        struct mfi_softc        *sc = sme->sme_cookie;
        struct bioc_vol         bv;
+#if 0
        int s;
+#endif
        int error;
 
        if (edata->sensor >= sc->sc_ld_cnt)
@@ -1956,11 +1984,15 @@
 
        bzero(&bv, sizeof(bv));
        bv.bv_volid = edata->sensor;
+#if 0
        KERNEL_LOCK(1, curlwp);
        s = splbio();
+#endif
        error = mfi_ioctl_vol(sc, &bv);
+#if 0
        splx(s);
        KERNEL_UNLOCK_ONE(curlwp);
+#endif
        if (error)
                return;
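
The OpenBSD 1.90 fix quoted in the mfi_mgmt_internal() hunk above is the classic lost-wakeup race: if the completion runs between the test of ccb_state and the sleep, its wakeup is missed and the caller may sleep forever. The usual cure is to test the condition and go to sleep under the same lock the completion path takes, using a primitive that drops that lock atomically as it sleeps. Below is a userland sketch of that idea with POSIX threads; struct cmd, cmd_init(), complete_cmd() and post_and_wait() are hypothetical names, and a second thread stands in for the controller interrupt. In the kernel the corresponding primitives would be mutex(9) and condvar(9), where cv_wait() (which appears in the backtrace earlier in this message) releases the mutex while sleeping.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a ccb that a caller waits on. */
struct cmd {
	pthread_mutex_t	mtx;	/* lock shared with the completion path */
	pthread_cond_t	cv;	/* signalled when the command completes */
	bool		done;	/* analogue of ccb_state == MFI_CCB_DONE */
};

static void
cmd_init(struct cmd *c)
{
	pthread_mutex_init(&c->mtx, NULL);
	pthread_cond_init(&c->cv, NULL);
	c->done = false;
}

/* Stand-in for the interrupt handler: mark done and wake the waiter. */
static void *
complete_cmd(void *arg)
{
	struct cmd *c = arg;

	pthread_mutex_lock(&c->mtx);
	c->done = true;
	pthread_cond_signal(&c->cv);
	pthread_mutex_unlock(&c->mtx);
	return NULL;
}

/*
 * Post the command and sleep until it completes.  The lock is taken
 * before posting, so the completion cannot run between the test of
 * c->done and the sleep; pthread_cond_wait() releases the lock only
 * as part of putting the caller to sleep.
 */
static bool
post_and_wait(struct cmd *c)
{
	pthread_t intr;

	pthread_mutex_lock(&c->mtx);
	/* "post" the command: here, start a thread that completes it */
	if (pthread_create(&intr, NULL, complete_cmd, c) != 0) {
		pthread_mutex_unlock(&c->mtx);
		return false;
	}
	while (!c->done)
		pthread_cond_wait(&c->cv, &c->mtx);
	pthread_mutex_unlock(&c->mtx);
	pthread_join(intr, NULL);
	return true;
}
```

Because the lock is held from before the command is posted until pthread_cond_wait() has queued the caller to sleep, the completion cannot land in the window the OpenBSD commit message describes; the while loop also copes with spurious wakeups.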


Follow-Ups:
- Re: ongoing major problems with NetBSD-5 and LOCKDEBUG on multi-core system (mfi(4) related?)
  - From: Thor Lancelot Simon

References:
- ongoing major problems with NetBSD-5 and LOCKDEBUG on multi-core system
  - From: Greg A. Woods
- Re: ongoing major problems with NetBSD-5 and LOCKDEBUG on multi-core system
  - From: Thor Lancelot Simon
- Re: ongoing major problems with NetBSD-5 and LOCKDEBUG on multi-core system (mfi(4) related?)
  - From: Greg A. Woods
