Subject: kern/10948: locking problems with raidframe and LOCK_DEBUG
To: None <gnats-bugs@gnats.netbsd.org>
From: None <g.mcgarry@ieee.org>
List: netbsd-bugs
Date: 09/04/2000 14:53:20
>Number: 10948
>Category: kern
>Synopsis: simple_lock held over mi_switch with 1.5ALPHA2
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Sep 04 14:54:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator: Gregory McGarry
>Release: 1.5_ALPHA2 <NetBSD-current source date>
>Organization:
>Environment:
NetBSD/i386 1.5_ALPHA2
>Description:
Raidframe has been working flawlessly for quite some time now. After
re-enabling the LOCK_DEBUG kernel option, the following simple_lock
consistency check fails on every reboot:
switch with held simple_lock 0xc06be188 CPU0 /sys/dev/raidframe/rf_engine.c:
uvm_fault(0xc03172a0, 0x0, 0, 1) -> 1
kernel: page fault trap, code 0
Stopped in raid at kprintf+0x90b: repne scasb (%esi)
db> t
kprintf()
vprintf()
lock_printf()
simple_lock_switchcheck()
mi_switch()
bpendtsleep()
DAGExecutionThread()
db> call simple_lock_dump
all simple locks:
0xc06be188 CPU0 /sys/dev/raidframe/rf_engine.c: 724
0x4003
>How-To-Repeat:
Compile raidframe support and enable the LOCK_DEBUG kernel option. I
am confused by this, since I am sure that others would be using this
combination, particularly to support MULTIPROCESSOR.
>Fix:
There are changes to sys/dev/raidframe/rf_engine.c on the trunk which
touch the locking macros, but AFAICS there is no functional change. I guess
there are two problems here. The first is that DAGExecutionThread is
yielding with a held simple_lock. The second is the uvm_fault in kprintf,
apparently because alp->lock_line is printed with %s rather than %d. The
following patch should fix the uvm_fault problem:
ck.c.orig Tue Sep 5 07:51:05 2000
--- kern_lock.c Tue Sep 5 07:50:48 2000
***************
*** 999,1005 ****
alp = TAILQ_NEXT(alp, list)) {
if (alp->lock_holder == cpu_id) {
lock_printf("switching with held simple_lock %p "
! "CPU %lu %s:%s\n",
alp, alp->lock_holder, alp->lock_file,
alp->lock_line);
SLOCK_DEBUGGER();
--- 999,1005 ----
alp = TAILQ_NEXT(alp, list)) {
if (alp->lock_holder == cpu_id) {
lock_printf("switching with held simple_lock %p "
! "CPU %lu %s:%d\n",
alp, alp->lock_holder, alp->lock_file,
alp->lock_line);
SLOCK_DEBUGGER();
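
For illustration only, here is a minimal userland sketch of the format
string the patch ends up with. The struct and the function name are
hypothetical stand-ins, not the kernel's definitions, and it assumes
lock_line is an int, as the %d in the patch implies.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the debug fields of a simple_lock;
 * the real kernel structure is not reproduced here.
 */
struct lock_record {
	unsigned long lock_holder;	/* CPU id, printed with %lu */
	const char *lock_file;		/* source file name, printed with %s */
	int lock_line;			/* source line number, printed with %d */
};

/*
 * Format the record the way the patched lock_printf() call does.
 * Passing lock_line to %s instead would make the formatter treat a
 * small integer as a string pointer and dereference it.
 */
static int
format_lock(char *buf, size_t len, const struct lock_record *r)
{
	return snprintf(buf, len, "CPU %lu %s:%d",
	    r->lock_holder, r->lock_file, r->lock_line);
}
```

With the values from the simple_lock_dump output above (holder 0, line
724), this gives "CPU 0 /sys/dev/raidframe/rf_engine.c:724".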
>Release-Note:
>Audit-Trail:
>Unformatted: