Subject: Re: panics
To: Bartholomew Niswonger <bniswong@midway.uchicago.edu>
From: Michael L. Hitch <osymh@gemini.oscs.montana.edu>
List: amiga
Date: 10/23/1994 16:54:42
On Oct 23,  2:52pm, Bartholomew Niswonger wrote:
> in the first 3 cases I got --->  panic: kernel jumped to zero
>          			 stopped at 0x84010: unlk a6
> (in the first one it said -> unlk a6ng)

  This can be nasty to track down:  the jump to zero seems to leave
the kernel stack in a state that makes it difficult to trace back to
where the jump occurred.

  This is usually caused either by an undefined reference in the kernel
(which would give you undefined symbol errors at link time) or by an
indirect call through an address stored in a table or other data structure
that turned out to be zero.  If it came from a subroutine call (jsr or bsr),
it may be possible to locate the return address on the stack and work out
where the destination address came from.

  The address 0x84010 is going to be Debugger+0x?, and that will always
be what is shown when the debugger is configured:  it is where ddb was
entered after the panic, not where the fault happened.

> the fourth was 
> 
> vm_fault(e8000, 322e000,3,0) -> 1
> type 8, code[mmu,ssw]: 485
> trap type 8, code = 485, v = 322e200
> pid = 117, pc = 0009B99C, ps = 2300, sfc = 0001, dfc = 0001
> 
> panic: MMU fault
> stopped at 0x84010: unlk a6

> I did a show all procs in db> to see what pid 117 was.. here is what I
> hope is all the pertinent info
> 
> pid         addr   uid ppid
> 119 577a00 322a000 10 118 119 004086 3 tcsh 	pause 	322a1c8
> 118 576c00 3224000  0  73  73 004084 3 telnetd 	select 	adc5c
> 117 576e00 3208000  0 104 117 05106  2 telnet
> 
> the mmu error seems to have something to do with addr 322e000, which I
> would think is not in telnet's addr space by the above.. 

  What gives you that idea?  The process table doesn't contain any
information about the address space of the processes.  You need to dig
through the vm map tables for a process to know anything about its address
space.  I think there is an option on the ps command in ddb to display the
address of the process map information, and other commands to display the
map information.  I don't think the map and object commands work correctly,
though.
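  The check you are trying to make comes down to asking whether the
faulting address falls inside one of the process's map entries.  A sketch
of that test, using a made-up MapEntry type standing in for whatever the
map dump actually shows (the type and field names are illustrative, not
the kernel's structures):

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class MapEntry:
    """One (hypothetical) entry of a process's vm map."""
    start: int  # first virtual address covered (inclusive)
    end: int    # end of the range (exclusive)
    prot: str   # protection, e.g. "r-x" or "rw-"

def find_entry(entries: Sequence[MapEntry], va: int) -> Optional[MapEntry]:
    """Return the map entry containing va, or None if va is unmapped."""
    for entry in entries:
        if entry.start <= va < entry.end:
            return entry
    return None
```

Only after walking the map entries like this can you say whether an
address such as 322e000 belongs to a given process.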

  You can examine the instruction at the trap to see what it is and
to try to determine what it was attempting to do.  In this case,
ex/i 0009b99c would show you the instruction at the PC of the MMU
exception.  The 68040 complicates this considerably, though.
If the MMU fault was due to a writeback fault, the instruction that
actually caused the fault will already have completed, and it is probably
several instructions back from the current PC.

> now for the questions:
> 
> the biggest one is every time it crashes, I say continue, and it says
> 
> syncing disks x x y y y y y y y y y y y y y y y y y y giving up
> ...
> 
> where x was 5 for the first 3 crashes and 4 for the last one 
>       y was 1 for crash number 1,3,4 and 2 for crash 2
> 
> what is this all about.. what does it mean giving up???
> that really scared me the first time since it was right in the middle
> of writing to the disk, with amigados I would have had major block
> error trouble.. but fortunately there seem to be no lasting problems..

  I think the numbers are the number of dirty disk buffers still in memory.
The crash routine is waiting for all of those buffers to be written to
disk, but it won't wait forever, so after a certain amount of time it gives
up.  I've seen this type of behaviour when I am using the mfs file system
for /tmp, so I suspect the buffers that haven't been written out belong to
the mfs files.  Flushing those requires the process running mfs, and there
aren't any running processes at that point, so those buffers never get
flushed.  Since they would only have been written to memory anyway, it's
no problem.
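  The behaviour described above is a bounded retry loop.  Here is a
simplified model of it, not the kernel's actual code:  the function name,
the pass limit, and the per-pass flush rate are all assumptions for
illustration.  Buffers that nobody can write (like the mfs ones) keep the
count above zero until the loop runs out of passes:

```python
def sync_disks(flushable, stuck, per_pass=4, max_passes=20):
    """Simplified model of the sync-before-reboot loop.

    flushable -- dirty buffers a disk driver can still write out
    stuck     -- dirty buffers nobody will ever write (e.g. mfs buffers
                 whose mfs process is no longer running)
    Returns the line that would be printed.
    """
    words = ["syncing disks..."]
    busy = flushable + stuck
    for _ in range(max_passes):
        busy = flushable + stuck
        if busy == 0:
            break
        words.append(str(busy))  # report how many buffers are still busy
        # Assume the disks manage to write up to per_pass buffers per pass.
        flushable = max(0, flushable - per_pass)
    words.append("giving up" if busy else "done")
    return " ".join(words)
```

With one stuck buffer, sync_disks(4, 1) prints a 5 followed by a long run
of 1s and then "giving up", much like the output in your report.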

> anyway.. I am not sure this shouldn't have been sent to current-users
> instead, but ah well.. 

  If it's a problem with the common kernel code, then current-users would
be the place.  The trick is determining whether the problem is specific to
the Amiga or common to all architectures.

> any other info needed.. just ask

  If you use the -S option on the loadbsd command, loadbsd and netbsd will
provide the kernel symbol table to ddb.  Then the addresses displayed
will include symbolic information.
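  Turning an absolute address into symbol+offset amounts to finding the
nearest symbol at or below it.  A sketch of that lookup over a sorted
symbol table; the table contents in the example are made up and are not
taken from any real kernel:

```python
import bisect

def symbolize(addr, symbols):
    """Render addr as "name+0xoff" using the nearest preceding symbol.

    symbols is a list of (address, name) pairs sorted by address.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return hex(addr)  # below the first symbol: no symbolic form
    start, name = symbols[i]
    offset = addr - start
    return name if offset == 0 else f"{name}+0x{offset:x}"
```

With a hypothetical table placing Debugger at 0x84000, symbolize(0x84010,
table) gives "Debugger+0x10", which stays meaningful across rebuilds in a
way the raw address does not.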

  A stack trace back (using the ddb trace command) is usually quite useful
in determining how the kernel got to where it did.
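  On the m68k, a trace like this relies on the frame-pointer chain that
link a6 builds:  the longword at a6 is the caller's saved a6, and the
longword at a6+4 is the return address pushed by the jsr or bsr.  A sketch
of that walk, assuming a hypothetical read_long(addr) accessor that returns
a 32-bit value or None:

```python
def walk_frames(read_long, fp, max_frames=16):
    """Follow the m68k a6 frame-pointer chain.

    read_long(addr) -> 32-bit value at addr, or None if unreadable.
    Returns (frame_pointer, return_address) pairs, innermost first.
    """
    frames = []
    for _ in range(max_frames):
        if fp == 0 or fp % 2:  # null or odd frame pointer: chain is broken
            break
        saved_fp = read_long(fp)   # caller's a6, stored by link a6
        ret = read_long(fp + 4)    # return address pushed by jsr/bsr
        if saved_fp is None or ret is None:
            break
        frames.append((fp, ret))
        # The stack grows down, so each caller's frame must be higher.
        if saved_fp <= fp:
            break
        fp = saved_fp
    return frames
```

If the chain has been damaged, the walk stops early, and you are back to
hunting for return addresses in the raw stack as described earlier.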

  Since you appear to be running your own kernel, the absolute addresses
aren't going to mean much to anyone else.  Symbolic locations and offsets
are much more useful, and can be used by those of us who are familiar
enough with the kernel internals to try to debug dumps like these.

Michael

-- 
Michael L. Hitch			INTERNET:  osymh@montana.edu
Computer Consultant
Office of Systems and Computing Services
Montana State University	Bozeman, MT	USA