Subject: Re: Kernel panic woes
To: Chris G. Demetriou <cgd@netbsd.org>
From: Dave Tyson <Dave.Tyson@liverpool.ac.uk>
List: port-i386
Date: 10/09/1998 22:26:09
On 9 Oct 1998, Chris G. Demetriou wrote:

>christos@zoulas.com (Christos Zoulas) writes:
>> You are dying because of a panic, which is good. Unfortunately you are
>> rebooting immediately after a panic, so that you are not seeing the panic
>> message. In addition, your stack is trashed, so you don't see where panic
>> is being called from.
>
>Uh, his stack trace said:
>
>(gdb) where
>#0  0x6 in ?? ()
>#1  0xf01e5d3b in cpu_reboot ()
>#2  0xf013a949 in panic ()
>#3  0xf01ec5d6 in trap ()
>
>Looks more likely to me that he's got the problem of not getting
>argument information, because the kernel image he's using doesn't have
>debugging symbols.
>
>
>My suggestion:
>
>recompile the kernel from scratch after putting:
>
>makeoptions    DEBUG="-g"      # compile full symbol table
>
>in the config file and re-'config'-ing.
>
>That'll generate two kernels, "netbsd" and "netbsd.gdb".  reboot with
>"netbsd" as your kernel, i.e. make it /netbsd and reboot, etc.
>But the next time it crashes, use "netbsd.gdb" as the symbol file.
>
>
>Also, you can easily check to see what the kernel message buffer
>looked like, via a command like:
>
>dmesg -N netbsd.2 -M netbsd.2.core
>
>etc., which might show you something interesting (including the panic
>message).
>

Thanks,

I knew I could get the kernel message buffer from the dump, but couldn't
remember the magic incantation ;-(

It's quite revealing:

...bits deleted...
de0 at pci0 dev 14 function 0
de0: interrupting at irq 10
de0: 21143 [10-100Mb/s] pass 2.1
de0: address 00:00:d1:1b:28:04
...bits deleted...
mb_map full
mb_map full
mb_map full
de0: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
de0: abnormal interrupt: transmit underflow (raising TX threshold to 8|512)
mb_map full
mb_map full
de0: abnormal interrupt: transmit underflow (raising TX threshold to 1024)
de0: abnormal interrupt: transmit underflow (switching to store-and-forward mode)
mb_map full
mb_map full
mb_map full
mb_map full
fatal page fault in supervisor mode
trap type 6 code 0 eip f017ee48 cs 8 eflags 10282 cr2 deadbef7 cpl c0000000
panic: trap
syncing disks... 284 284 211 116 25 done

dumping to dev 1, offset 179861

We usually run with NMBCLUSTERS=4096, but that looks a bit small now.
I have been thinking of replacing the Adaptec Ethernet (eek!) card with
an Intel EtherExpress one in the hope that it would behave a little better.
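
For a rough idea of what raising the limit costs, here is a minimal
sketch of the arithmetic. It assumes MCLBYTES is 2048 bytes (I believe
that is the i386 value, but check param.h), and the larger cluster
counts are just illustrative guesses, not tested settings.

```python
# Rough sizing of the mbuf cluster map for a few NMBCLUSTERS values.
# ASSUMPTION: MCLBYTES is 2048 bytes; verify against your own
# sys/arch/i386/include/param.h before relying on these numbers.
MCLBYTES = 2048


def mb_map_bytes(nmbclusters, mclbytes=MCLBYTES):
    """Return the bytes of kernel VM that the cluster map would cover."""
    return nmbclusters * mclbytes


# 4096 is the current setting; the other two values are hypothetical.
for n in (4096, 8192, 16384):
    print("NMBCLUSTERS=%d -> %d MB" % (n, mb_map_bytes(n) // (1024 * 1024)))
```

Under that assumption the current setting covers 8 MB of clusters, and
each doubling doubles the kernel VM set aside for them.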

The supervisor-mode page fault is a bit unwelcome, with the fault address
in cr2, 'deadbef7' (which looks like 0xdeadbeef + 8), being a bit of a
giveaway.
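
To spell out the arithmetic, here is a small sketch that splits the
trap line from the dump into its fields and compares cr2 with
0xdeadbeef. Treating all the values as hex and reading the difference
as an offset from a poisoned pointer are assumptions on my part, not
something the dump proves.

```python
# Decode the "trap type ..." line from the message buffer.
TRAP_LINE = ("trap type 6 code 0 eip f017ee48 cs 8 eflags 10282 "
             "cr2 deadbef7 cpl c0000000")

# ASSUMPTION: 0xdeadbeef is the fill pattern we are trying to recognise.
POISON = 0xDEADBEEF


def parse_trap(line):
    """Return a dict mapping each field name to its value parsed as hex."""
    words = line.split()[1:]  # drop the leading "trap"
    return {name: int(value, 16)
            for name, value in zip(words[0::2], words[1::2])}


regs = parse_trap(TRAP_LINE)
offset = regs["cr2"] - POISON
print("eip = 0x%08x" % regs["eip"])
print("cr2 = 0x%08x = 0x%08x + %d" % (regs["cr2"], POISON, offset))
```

If that reading is right, the kernel followed a pointer containing
0xdeadbeef and touched something 8 bytes past it; looking up the eip
value in a kernel built with debugging symbols should show where.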


--
=====================================================================
Dave Tyson			Phone: 0151-794-3731
Computing Services Dept         Fax:   0151-794-3759
The University of Liverpool     Email: dtyson@liv.ac.uk	
Chadwick Building		Web:   http://www.liv.ac.uk/~dtyson 
Peach Street			
Liverpool  L69 7ZF		Why not use a real OS like NetBSD ?
United Kingdom                 
=====================================================================