Subject: kern/10948: locking problems with raidframe and LOCK_DEBUG
To: None <gnats-bugs@gnats.netbsd.org>
From: None <g.mcgarry@ieee.org>
List: netbsd-bugs
Date: 09/04/2000 14:53:20
>Number:         10948
>Category:       kern
>Synopsis:       simple_lock held over mi_switch with 1.5ALPHA2
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Sep 04 14:54:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Gregory McGarry
>Release:        1.5_ALPHA2 <NetBSD-current source date>
>Organization:
>Environment:
	

NetBSD/i386 1.5_ALPHA2

>Description:

RAIDframe has been working flawlessly for quite some time now.  After
re-enabling the LOCK_DEBUG kernel option, the following simple_lock
consistency check fails on every reboot:

switch with held simple_lock 0xc06be188 CPU0 /sys/dev/raidframe/rf_engine.c:
uvm_fault(0xc03172a0, 0x0, 0, 1) -> 1
kernel: page fault trap, code 0
Stopped in raid at kprintf+0x90b: repne scasb (%esi)
db> t
kprintf()
vprintf()
lock_printf()
simple_lock_switchcheck()
mi_switch()
bpendtsleep()
DAGExecutionThread()
db> call simple_lock_dump
all simple locks:
0xc06be188 CPU0 /sys/dev/raidframe/rf_engine.c: 724
	0x4003


>How-To-Repeat:

Compile a kernel with RAIDframe support and enable the LOCK_DEBUG
kernel option.  I am puzzled by this, since I would expect others to be
using this combination, particularly to support MULTIPROCESSOR.

>Fix:

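There are changes to sys/dev/raidframe/rf_engine.c on the trunk which
touch the locking macros, but AFAICS there is no functional change.  I
think there are two problems here.  The first is that DAGExecutionThread
is yielding with a held simple_lock.  The second is that the diagnostic
for that condition faults: simple_lock_switchcheck() prints lock_line,
an int, with a %s conversion, so kprintf() treats the line number as a
string pointer.  The following patch, which changes that conversion to
%d, should fix the uvm_fault problem: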

ck.c.orig    Tue Sep  5 07:51:05 2000
--- kern_lock.c Tue Sep  5 07:50:48 2000
***************
*** 999,1005 ****
             alp = TAILQ_NEXT(alp, list)) {
                if (alp->lock_holder == cpu_id) {
                        lock_printf("switching with held simple_lock %p "
!                           "CPU %lu %s:%s\n",
                            alp, alp->lock_holder, alp->lock_file,
                            alp->lock_line);
                        SLOCK_DEBUGGER();
--- 999,1005 ----
             alp = TAILQ_NEXT(alp, list)) {
                if (alp->lock_holder == cpu_id) {
                        lock_printf("switching with held simple_lock %p "
!                           "CPU %lu %s:%d\n",
                            alp, alp->lock_holder, alp->lock_file,
                            alp->lock_line);
                        SLOCK_DEBUGGER();
>Release-Note:
>Audit-Trail:
>Unformatted: