NetBSD-Bugs archive
Re: kern/39993: lockup on i386 SMP (raidframe related ?)
On Fri, Nov 21, 2008 at 11:35:00AM +0000, bouyer%antioche.eu.org@localhost wrote:
> >Number: 39993
> >Category: kern
> >Synopsis: lockup on i386 SMP (raidframe related ?)
> >Confidential: no
> >Severity: critical
> >Priority: high
> >Responsible: kern-bug-people
> >State: open
> >Class: sw-bug
> >Submitter-Id: net
> >Arrival-Date: Fri Nov 21 11:35:00 +0000 2008
> >Originator: Manuel Bouyer
> >Release: NetBSD 5.0_BETA
> >Organization:
> >Environment:
> System: NetBSD antioche.lip6.fr 5.0_BETA NetBSD 5.0_BETA (ANTIOCHE5) #2: Thu
> Nov 20 23:55:28 CET 2008
> bouyer@roll:/dsk/l1/misc/bouyer/tmp/i386/obj/dsk/l1/misc/bouyer/netbsd-5/src/sys/arch/i386/compile/ANTIOCHE5
> i386
> Architecture: i386
> Machine: i386
> >Description:
> This system is a dual-CPU PIII system, with several SCSI disks on
> multiple esiop controllers. Some of them are part of RAID-1
> RAIDframe volumes (2 disks per volume). An SMP kernel will lock up
> within minutes after boot, under I/O load. The system is unresponsive
> to network or console (no ping, and no characters echoed on the serial
> console), but I could enter ddb using the cnmagic sequence. Here's
> what I found from ddb:
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 eip c03aabec cs 8 eflags 202 cr2 bb504000 ilevel 8
> Stopped in pid 0.4 (system) at netbsd:breakpoint+0x4: popl %ebp
> db{0}> tr
> breakpoint(0,3f8,0,6,ca7953c0,cbd01900,cbb5beb0,c1540010,c1541000,7fa) at netbsd:breakpoint+0x4
> comintr(cbd017f4,cbb5bec0,6,10,c04f0030,cbb50010,c04f0010,c04f4800,1,cbb5bf4c) at netbsd:comintr+0x566
> Xintr_ioapic_edge4() at netbsd:Xintr_ioapic_edge4+0xa9
> --- interrupt ---
> fatal page fault in supervisor mode
> trap type 6 code 0 eip c03ad04f cs 8 eflags 10206 cr2 3e ilevel 8
> kernel: supervisor trap page fault, code=0
> Faulted in DDB; continuing...
> db{0}> mach cpu 1
> using CPU 1
> db{0}> tr
> __cpu_simple_lock(c1702c40,c15a99f8,0,0,c15560d8,c15560d0,0,c1556000,c01c52c0,cc5a2d20) at netbsd:__cpu_simple_lock+0x1c
> rf_RaidIOThread(c1556000,0,c01002a7,0,c01002a7,0,0,0,0,0) at netbsd:rf_RaidIOThread+0x7f
>
> rf_RaidIOThread+0x7f is:
> 0xc01c533f is in rf_RaidIOThread (/dsk/l1/misc/bouyer/netbsd-5/src/sys/dev/raidframe/rf_engine.c:863).
> 858         /* See what I/Os, if any, have arrived */
> 859         while ((req = TAILQ_FIRST(&(raidPtr->iodone))) != NULL) {
> 860                 TAILQ_REMOVE(&(raidPtr->iodone), req, iodone_entries);
> 861                 simple_unlock(&(raidPtr->iodone_lock));
> 862                 rf_DiskIOComplete(req->queue, req, req->error);
> 863                 (req->CompleteFunc) (req->argument, req->error);
> 864                 simple_lock(&(raidPtr->iodone_lock));
> 865         }
>
> 0xc01c5324 <rf_RaidIOThread+100>: call 0xc010cd90 <__cpu_simple_unlock>
> 0xc01c5329 <rf_RaidIOThread+105>: mov 0x70(%ebx),%eax
> 0xc01c532c <rf_RaidIOThread+108>: mov 0x58(%ebx),%edx
> 0xc01c532f <rf_RaidIOThread+111>: mov %ebx,0x4(%esp)
> 0xc01c5333 <rf_RaidIOThread+115>: mov %eax,0x8(%esp)
> 0xc01c5337 <rf_RaidIOThread+119>: mov %edx,(%esp)
> 0xc01c533a <rf_RaidIOThread+122>: call 0xc01c1780 <rf_DiskIOComplete>
> 0xc01c533f <rf_RaidIOThread+127>: mov 0x2c(%ebx),%edx
> 0xc01c5342 <rf_RaidIOThread+130>: mov 0x70(%ebx),%eax
> 0xc01c5345 <rf_RaidIOThread+133>: mov 0x28(%ebx),%ecx
> 0xc01c5348 <rf_RaidIOThread+136>: mov %edx,(%esp)
> 0xc01c534b <rf_RaidIOThread+139>: mov %eax,0x4(%esp)
> 0xc01c534f <rf_RaidIOThread+143>: call *%ecx
> 0xc01c5351 <rf_RaidIOThread+145>: mov %esi,(%esp)
> 0xc01c5354 <rf_RaidIOThread+148>: call 0xc010cd70 <__cpu_simple_lock>
Here's what gdb says about it:
#0 0xc03b11e7 in cpu_reboot ()
#1 0xc0305348 in panic ()
#2 0xc02fdcdb in lockdebug_abort1 ()
#3 0xc02d3a24 in mutex_vector_enter ()
#4 0xc02e87b6 in suspendsched ()
#5 0xc03467a3 in vfs_shutdown ()
#6 0xc03b1227 in cpu_reboot ()
#7 0xc01a3fa8 in db_reboot_cmd ()
#8 0xc01a3ab8 in db_command ()
#9 0xc01a3e02 in db_command_loop ()
#10 0xc01a6d10 in db_trap ()
#11 0xc03ac4bb in kdb_trap ()
#12 0xc03b3283 in trap ()
#13 0xc010cc36 in calltrap ()
#14 0xc03aabec in breakpoint ()
#15 0xc01ec4a6 in comintr ()
#16 0xc0103949 in Xintr_ioapic_edge4 ()
#17 0xc02ce55f in _kernel_lock ()
#18 0xc039bd56 in intr_biglock_wrapper ()
#19 0xc010718d in Xintr_ioapic_level2 ()
#20 0xc03aac85 in x86_stihlt ()
Previous frame inner to this frame (corrupt stack?)
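
For readers without the source at hand, the loop quoted from rf_engine.c above follows a common pattern: take the queue lock, pop one completed request, drop the lock while the completion callback runs, then retake the lock before looking at the queue again. Below is a minimal userspace sketch of that pattern only. It assumes a pthread mutex in place of the kernel simple lock, and every name in it (struct req, done_queue_drain, drain_demo, ...) is hypothetical rather than taken from RAIDframe. It does not reproduce the lockup; it only shows where the lock is held and where it is not.

```c
/* Userspace sketch; a pthread mutex stands in for the kernel simple lock. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a completed I/O request. */
struct req {
    int error;
    void (*complete)(void *arg, int error);
    void *arg;
    TAILQ_ENTRY(req) entries;
};

TAILQ_HEAD(req_head, req);

struct done_queue {
    pthread_mutex_t lock;
    struct req_head head;
};

static void done_queue_init(struct done_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    TAILQ_INIT(&q->head);
}

/* Producer side: append a finished request under the lock. */
static void done_queue_put(struct done_queue *q, struct req *r)
{
    pthread_mutex_lock(&q->lock);
    TAILQ_INSERT_TAIL(&q->head, r, entries);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: drain the queue; returns how many requests completed. */
static int done_queue_drain(struct done_queue *q)
{
    struct req *r;
    int n = 0;

    pthread_mutex_lock(&q->lock);
    while ((r = TAILQ_FIRST(&q->head)) != NULL) {
        TAILQ_REMOVE(&q->head, r, entries);
        pthread_mutex_unlock(&q->lock);  /* callback runs without the lock */
        r->complete(r->arg, r->error);
        n++;
        pthread_mutex_lock(&q->lock);    /* retake before the next TAILQ_FIRST */
    }
    pthread_mutex_unlock(&q->lock);
    return n;
}

/* Trivial callback for the demo: counts invocations. */
static void count_cb(void *arg, int error)
{
    (void)error;
    ++*(int *)arg;
}

/* Queue three requests, drain them, return the number of callbacks run. */
int drain_demo(void)
{
    struct done_queue q;
    struct req reqs[3];
    int calls = 0;
    int n;

    done_queue_init(&q);
    for (size_t i = 0; i < 3; i++) {
        reqs[i].error = 0;
        reqs[i].complete = count_cb;
        reqs[i].arg = &calls;
        done_queue_put(&q, &reqs[i]);
    }
    n = done_queue_drain(&q);
    assert(n == calls);
    pthread_mutex_destroy(&q.lock);
    return calls;
}
```

Calling drain_demo() from a small main() and building with -pthread is enough to try it. The point of the pattern is that the callback never runs with the queue lock held, so any spin in the retake step means some other context is holding that lock.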
Any idea how to get more information?
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 ans d'experience feront toujours la difference