kern/41591: nested-panic loop, no reboot

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/41591: nested-panic loop, no reboot
From: raeburn%raeburn.org@localhost
Date: Sat, 13 Jun 2009 20:10:00 +0000 (UTC)

>Number:         41591
>Category:       kern
>Synopsis:       nested-panic loop, no reboot
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jun 13 20:10:00 +0000 2009
>Originator:     Ken Raeburn
>Release:        NetBSD 5.0
>Organization:
>Environment:
System: NetBSD raeburn.org 5.0 NetBSD 5.0 (GENERIC) #0: Sun Apr 26 18:50:08 UTC 2009 builds%b6.netbsd.org@localhost:/home/builds/ab/netbsd-5-0-RELEASE/i386/200904260229Z-obj/home/builds/ab/netbsd-5-0-RELEASE/src/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:

While handling a panic (probably a copy_mbuf one like the one I just
reported), the kernel takes a fault while trying to write out a crash
dump, panics again, and tries to write out a crash dump again. This
repeats until I hit a reset button (which this system doesn't appear
to have) or power-cycle it (so no kernel messages remain in memory).

When I got to the console, it was filled with repetitions of:

trap type 6 code 2 eip c05400b4 cs 8 eflags 10246 cr2 cd12e600 ilevel 8
panic: trap
Faulted in mid-traceback: aborting
dumping to dev 0,1 offset 8
dump fatal page fault in supervisor mode
trap type 6 code 2 eip c05400b4 cs 8 eflags 10246 cr2 cd12e600 ilevel 8
[...]

where c05400b4 is:

0xc054008f <dodumpsys+719>:     call   0xc052d7b0 <pmap_extract>
0xc0540094 <dodumpsys+724>:     test   %al,%al
0xc0540096 <dodumpsys+726>:     je     0xc05400b6 <dodumpsys+758>
0xc0540098 <dodumpsys+728>:     mov    0xfffffff0(%ebp),%ecx
0xc054009b <dodumpsys+731>:     mov    0xc0b15060,%eax
0xc05400a0 <dodumpsys+736>:     mov    %ecx,%edx
0xc05400a2 <dodumpsys+738>:     shr    $0xf,%edx
0xc05400a5 <dodumpsys+741>:     shr    $0xc,%ecx
0xc05400a8 <dodumpsys+744>:     add    %eax,%edx
0xc05400aa <dodumpsys+746>:     and    $0x7,%ecx
0xc05400ad <dodumpsys+749>:     mov    $0x1,%eax
0xc05400b2 <dodumpsys+754>:     shl    %cl,%eax
0xc05400b4 <dodumpsys+756>:     or     %al,(%edx)         **********
0xc05400b6 <dodumpsys+758>:     add    $0x1000,%ebx
0xc05400bc <dodumpsys+764>:     jne    0xc0540080 <dodumpsys+704>
0xc05400be <dodumpsys+766>:     jmp    0xc053feb3 <dodumpsys+243>

so I'm guessing it's in the setbit call in the loop in
sparse_dump_mark, the only place in dumpsys.c where I see a call to
pmap_extract; either sparse_dump_physmap is a bad pointer or
p/PAGE_SIZE is out of range.
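
For reference, the arithmetic in the disassembly (shr $0xf for the byte
offset, shr $0xc plus and $0x7 for the bit, then or %al,(%edx)) matches
a setbit-style update of a one-bit-per-page bitmap. The following is
only a standalone sketch of that calculation, not the kernel source:
SETBIT is a local copy of the traditional macro shape, and
bitmap_byte, bitmap_bit and mark_page_checked are hypothetical names.
The bounds check is an assumption about what a defensive version could
look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4 KiB pages, as on i386 */
#define NBBY       8	/* bits per byte */

/* Local macro with the same shape as the traditional BSD setbit(). */
#define SETBIT(a, i) ((a)[(i) / NBBY] |= (uint8_t)(1u << ((i) % NBBY)))

/* Byte offset into the bitmap for physical address pa (pa >> 15). */
static size_t
bitmap_byte(uint32_t pa)
{
	return (size_t)(pa >> PAGE_SHIFT) / NBBY;
}

/* Bit within that byte ((pa >> 12) & 7). */
static unsigned
bitmap_bit(uint32_t pa)
{
	return (unsigned)(pa >> PAGE_SHIFT) % NBBY;
}

/*
 * Hypothetical bounds-checked marker: returns 0 on success, or -1 if
 * the map pointer is NULL or the page number is outside the bitmap.
 */
static int
mark_page_checked(uint8_t *map, size_t map_bytes, uint32_t pa)
{
	if (map == NULL || bitmap_byte(pa) >= map_bytes)
		return -1;
	SETBIT(map, pa >> PAGE_SHIFT);
	/* Postcondition: the bit for this page is now set. */
	assert(map[bitmap_byte(pa)] & (1u << bitmap_bit(pa)));
	return 0;
}
```

With a check like this, either of the two failure modes guessed at
above would return an error instead of faulting on the store.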

This obviously makes it worse for my router than 4.0.1, which just
rebooted on panic. :-(

>How-To-Repeat:
        ?
>Fix:
        Once a dump has been started, dumping should be disabled for
        any further panic calls.
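
A minimal sketch of the kind of guard suggested here, written as
standalone C rather than as a patch against the kernel. All names
(dump_attempted, dump_once, fake_dump, dump_calls) are hypothetical and
do not refer to existing NetBSD symbols; fake_dump only stands in for
the real dump routine. The flag is set before the dump starts, so a
fault inside the dump that leads to another panic would find it
already set and skip the second attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: set the first time a dump is attempted. */
static volatile bool dump_attempted = false;

/* Stand-in for the real dump routine; it only counts invocations. */
static int dump_calls = 0;

static void
fake_dump(void)
{
	dump_calls++;
}

/*
 * Run dump() at most once. Returns true if this call started the
 * dump, false if an earlier call had already started one.
 */
static bool
dump_once(void (*dump)(void))
{
	assert(dump != NULL);
	if (dump_attempted)
		return false;
	/* Set the flag first, so a fault inside dump() still sees it. */
	dump_attempted = true;
	dump();
	return true;
}
```

In this sketch a second panic would call dump_once() again, get false
back, and could proceed straight to the reboot path.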
