NetBSD-Bugs archive
kern/39993: lockup on i386 SMP (raidframe related ?)
>Number: 39993
>Category: kern
>Synopsis: lockup on i386 SMP (raidframe related ?)
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Nov 21 11:35:00 +0000 2008
>Originator: Manuel Bouyer
>Release: NetBSD 5.0_BETA
>Organization:
>Environment:
System: NetBSD antioche.lip6.fr 5.0_BETA NetBSD 5.0_BETA (ANTIOCHE5) #2: Thu Nov 20 23:55:28 CET 2008
bouyer@roll:/dsk/l1/misc/bouyer/tmp/i386/obj/dsk/l1/misc/bouyer/netbsd-5/src/sys/arch/i386/compile/ANTIOCHE5
i386
Architecture: i386
Machine: i386
>Description:
This system is a dual-CPU PIII system with several SCSI disks on
multiple esiop controllers. Some of them are part of RAID-1
raidframe volumes (2 disks per volume). An SMP kernel will lock up
within minutes after boot under I/O load. The system is unresponsive
to network and console (no ping, and no characters echoed on the serial
console), but I could enter ddb using the cnmagic sequence. Here's
what I found from ddb:
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c03aabec cs 8 eflags 202 cr2 bb504000 ilevel 8
Stopped in pid 0.4 (system) at netbsd:breakpoint+0x4: popl %ebp
db{0}> tr
breakpoint(0,3f8,0,6,ca7953c0,cbd01900,cbb5beb0,c1540010,c1541000,7fa) at
netbsd:breakpoint+0x4
comintr(cbd017f4,cbb5bec0,6,10,c04f0030,cbb50010,c04f0010,c04f4800,1,cbb5bf4c)
at netbsd:comintr+0x566
Xintr_ioapic_edge4() at netbsd:Xintr_ioapic_edge4+0xa9
--- interrupt ---
fatal page fault in supervisor mode
trap type 6 code 0 eip c03ad04f cs 8 eflags 10206 cr2 3e ilevel 8
kernel: supervisor trap page fault, code=0
Faulted in DDB; continuing...
db{0}> mach cpu 1
using CPU 1
db{0}> tr
__cpu_simple_lock(c1702c40,c15a99f8,0,0,c15560d8,c15560d0,0,c1556000,c01c52c0,
cc5a2d20) at netbsd:__cpu_simple_lock+0x1c
rf_RaidIOThread(c1556000,0,c01002a7,0,c01002a7,0,0,0,0,0) at
netbsd:rf_RaidIOThread+0x7f
rf_RaidIOThread+0x7f is:
0xc01c533f is in rf_RaidIOThread (/dsk/l1/misc/bouyer/netbsd-5/src/sys/dev/raidframe/rf_engine.c:863).
858 /* See what I/Os, if any, have arrived */
859 while ((req = TAILQ_FIRST(&(raidPtr->iodone))) != NULL) {
860 TAILQ_REMOVE(&(raidPtr->iodone), req, iodone_entries);
861 simple_unlock(&(raidPtr->iodone_lock));
862 rf_DiskIOComplete(req->queue, req, req->error);
863 (req->CompleteFunc) (req->argument, req->error);
864 simple_lock(&(raidPtr->iodone_lock));
865 }
0xc01c5324 <rf_RaidIOThread+100>: call 0xc010cd90 <__cpu_simple_unlock>
0xc01c5329 <rf_RaidIOThread+105>: mov 0x70(%ebx),%eax
0xc01c532c <rf_RaidIOThread+108>: mov 0x58(%ebx),%edx
0xc01c532f <rf_RaidIOThread+111>: mov %ebx,0x4(%esp)
0xc01c5333 <rf_RaidIOThread+115>: mov %eax,0x8(%esp)
0xc01c5337 <rf_RaidIOThread+119>: mov %edx,(%esp)
0xc01c533a <rf_RaidIOThread+122>: call 0xc01c1780 <rf_DiskIOComplete>
0xc01c533f <rf_RaidIOThread+127>: mov 0x2c(%ebx),%edx
0xc01c5342 <rf_RaidIOThread+130>: mov 0x70(%ebx),%eax
0xc01c5345 <rf_RaidIOThread+133>: mov 0x28(%ebx),%ecx
0xc01c5348 <rf_RaidIOThread+136>: mov %edx,(%esp)
0xc01c534b <rf_RaidIOThread+139>: mov %eax,0x4(%esp)
0xc01c534f <rf_RaidIOThread+143>: call *%ecx
0xc01c5351 <rf_RaidIOThread+145>: mov %esi,(%esp)
0xc01c5354 <rf_RaidIOThread+148>: call 0xc010cd70 <__cpu_simple_lock>
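The loop quoted above drops iodone_lock before calling rf_DiskIOComplete() and the request's CompleteFunc, then re-takes it at line 864 (the __cpu_simple_lock call at rf_RaidIOThread+148) before checking the queue again. To illustrate that pattern only, here is a minimal user-space C11 sketch. It uses atomic_flag in place of __cpu_simple_lock, and the names spin_lock, spin_unlock, struct req, enqueue_done, drain_iodone and note_lock_free are hypothetical stand-ins, not RAIDframe code. It neither reproduces nor explains the hang.

```c
/*
 * User-space sketch (hypothetical names, not RAIDframe code) of the
 * rf_RaidIOThread completion loop: pop a request under a spin lock,
 * drop the lock while running the completion callback, re-take it
 * before looking at the queue again.
 */
#include <stdatomic.h>
#include <stddef.h>
#include <sys/queue.h>

struct req {
	int error;
	void (*complete)(void *arg, int error);
	void *arg;
	TAILQ_ENTRY(req) entries;
};

TAILQ_HEAD(reqq, req);

static atomic_flag iodone_lock = ATOMIC_FLAG_INIT;
static struct reqq iodone = TAILQ_HEAD_INITIALIZER(iodone);

static void spin_lock(atomic_flag *l)
{
	/* Busy-wait until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;
}

static void spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Producer side: append a finished request under the lock. */
static void enqueue_done(struct req *r)
{
	spin_lock(&iodone_lock);
	TAILQ_INSERT_TAIL(&iodone, r, entries);
	spin_unlock(&iodone_lock);
}

/* Caller holds iodone_lock; returns with it held again. */
static int drain_iodone(void)
{
	struct req *r;
	int n = 0;

	while ((r = TAILQ_FIRST(&iodone)) != NULL) {
		TAILQ_REMOVE(&iodone, r, entries);
		spin_unlock(&iodone_lock);	/* don't call out with the lock held */
		r->complete(r->arg, r->error);
		spin_lock(&iodone_lock);	/* re-take before touching the queue */
		n++;
	}
	return n;
}

/* Example callback: counts the calls made while iodone_lock was free. */
static void note_lock_free(void *arg, int error)
{
	int *ok = arg;

	(void)error;
	if (!atomic_flag_test_and_set_explicit(&iodone_lock,
	    memory_order_acquire)) {
		atomic_flag_clear_explicit(&iodone_lock, memory_order_release);
		(*ok)++;
	}
}
```

Producers call enqueue_done(); the consumer takes iodone_lock, calls drain_iodone() and releases the lock. Because the lock is dropped around the callback, a callback that itself takes iodone_lock does not spin on a lock its own caller holds.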
I also have a core dump. The stack traces were identical in all hangs I got.
Disabling SMP at boot (boot -1) works around the problem.
LOCKDEBUG+DEBUG+DIAGNOSTIC doesn't give any additional info.
This hardware was running without issues under 3.1 with SMP.
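Since the hang produces no diagnostics, even with LOCKDEBUG+DEBUG+DIAGNOSTIC, one generic debugging technique is to bound a raw spin and report when the bound is exceeded instead of hanging silently. The following is a user-space C11 sketch of that idea. spin_lock_bounded() and SPIN_LIMIT are hypothetical, the limit is arbitrary, and this is not a NetBSD kernel interface.

```c
/*
 * Hypothetical debugging aid (not a NetBSD interface): a spin lock
 * that gives up after SPIN_LIMIT attempts and reports the stuck lock.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define SPIN_LIMIT 10000000UL	/* arbitrary bound for the sketch */

/* Returns true if the lock was acquired, false on a suspected spinout. */
static bool spin_lock_bounded(atomic_flag *l, const char *name)
{
	unsigned long tries;

	for (tries = 0; tries < SPIN_LIMIT; tries++) {
		if (!atomic_flag_test_and_set_explicit(l, memory_order_acquire))
			return true;
	}
	fprintf(stderr, "spinout on %s after %lu tries\n", name, tries);
	return false;
}

static void spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}
```

A false return tells the caller which lock it was waiting for. In a kernel, that would be the point to print a backtrace or drop into the debugger.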
>How-To-Repeat:
Boot an SMP system with raidframe volumes and generate I/O load (?)
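The report does not say which workload was used. As a hypothetical way to "generate I/O", here is a minimal load generator: NTHREADS worker threads each write and fsync BLKSIZE-byte blocks to their own file in a given directory, then remove it. run_io_load(), NTHREADS and BLKSIZE are illustrative names and values. Pointing the directory at a filesystem on a raid volume is an assumption, and whether this load triggers the hang is untested.

```c
/*
 * Hypothetical I/O load generator: each worker writes and fsyncs
 * blocks to its own file so that many disk I/Os complete concurrently.
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NTHREADS 4
#define BLKSIZE 65536

struct job {
	char path[256];
	int nblocks;
	int ok;		/* 1 if every write, fsync and close succeeded */
};

static void *worker(void *arg)
{
	struct job *j = arg;
	char buf[BLKSIZE];
	int fd, i;

	memset(buf, 0x5a, sizeof(buf));
	fd = open(j->path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return NULL;
	for (i = 0; i < j->nblocks; i++) {
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf) ||
		    fsync(fd) == -1) {
			close(fd);
			return NULL;
		}
	}
	j->ok = (close(fd) == 0);
	return NULL;
}

/* Returns the number of workers that finished without error. */
static int run_io_load(const char *dir, int nblocks)
{
	pthread_t tid[NTHREADS];
	struct job jobs[NTHREADS];
	int created[NTHREADS];
	int i, done = 0;

	for (i = 0; i < NTHREADS; i++) {
		snprintf(jobs[i].path, sizeof(jobs[i].path), "%s/ioload.%d",
		    dir, i);
		jobs[i].nblocks = nblocks;
		jobs[i].ok = 0;
		created[i] = (pthread_create(&tid[i], NULL, worker,
		    &jobs[i]) == 0);
	}
	for (i = 0; i < NTHREADS; i++) {
		if (!created[i])
			continue;
		pthread_join(tid[i], NULL);
		done += jobs[i].ok;
		unlink(jobs[i].path);	/* remove the scratch file */
	}
	return done;
}
```

For example, calling run_io_load("/mnt/raid0", 1000) in a loop, where /mnt/raid0 is a hypothetical mount point of a raid volume, keeps several streams of synchronous writes in flight at once.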
>Fix:
unknown