Subject: Re: SCSI
To: Elmar Kolkman <kolkmae@la1.apd.dec.com>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: port-hp300
Date: 01/27/1997 12:00:42
On Mon, 27 Jan 1997 08:45:43 +0100 (CET) 
 "Elmar Kolkman" <kolkmae@la1.apd.dec.com> wrote:

 > I've tried a bit more, and I'm sure it ISN'T the SCSI code. I will attach
 > the full boot log from SCSI at the end of this file, but I've also tried by
 > removing ALL SCSI hardware, including the controller, from my system. It
 > still hangs with the 'old' 1.2 kernel (I didn't have a 1.2b prerelease
 > kernel) when netbooting from my linux-machine.

Thanks, your stack trace is _very_ helpful...  From looking at the
info you've provided, I see where the problem is, and I'm fairly sure
I know what's causing it...  More below.

 > > Well, I know the DCM driver works, since I'm using it to ppp to my ISP,
 > > so I can type this mail :-)
 > 
 > But then again, your machine at least starts, which I can't say about mine.
 > ;-)

Yes, but it's worth noting that I'm not using the DCM as the console
(I'm using a Catseye framebuffer).

 > OK, but to make the debugging a bit easier, I will copy the whole booting
 > process, so you (all) see the rest of the process too. Maybe it is some
 > setting, because I don't have any documentation on this machine...
 > (I removed some '^H ^H stuff...)

Cool... (BTW, that self-test is _really_ cool looking, with all the
serial MUX entries :-)

 > dca0 at scode 9 ipl 5 flags 0x1: no fifo
 > dcm0 at scode 10 ipl 3 flags 0
 > dcm1 at scode 11 ipl 3 flags 0xetrap: bad kernel read access at 0x6e

...ok, I'm assuming that the console is on dcm1?  Can you tell
me _exactly_ which board the console is on?  (I'm assuming it's on
the port marked "console", since that's the only one the remote bit
affects :-)

Ok, here's a quick tutorial on using this kind of information...

Note the address in the "trap" message:

	trap: bad kernel read access at 0x6e

0x6e is in the first "page" (i.e. it's less than 0x1000).  This page
is deliberately left unmapped: the pte for this page doesn't have the
PG_V bit set.  As a result, dereferencing a NULL pointer causes the
trap you're seeing (i.e. it's designed to catch bugs :-).

So, what that tells you is that you attempted to deref NULL (here,
most likely by reading a structure member at offset 0x6e through a NULL
pointer).  This is the kernel equivalent of getting a SIGSEGV (and,
like an uncaught SIGSEGV in a user program, it's fatal).
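
To make that concrete, here's a rough sketch in plain C (not kernel
code).  The 4 KB page size matches the 0x1000 figure above; the
structure layout and all of the names are made up purely for
illustration, and I'm not claiming this is the layout of any real
kernel structure.  It shows why a NULL structure pointer faults at a
small non-zero address rather than at 0:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KB pages, so the unmapped first page is 0x0000-0x0fff. */
#define PAGE_SIZE_SKETCH 0x1000u

/*
 * Hypothetical structure, invented for this example only: the padding
 * is chosen so that 'field' lands at offset 0x6e.
 */
struct example_sketch {
	char           pad[0x6e];
	unsigned short field;
};

/*
 * Returns 1 if a fault address falls inside the unmapped first page,
 * which usually means "NULL pointer plus a small member offset".
 */
int
looks_like_null_deref(uintptr_t fault_addr)
{
	return fault_addr < PAGE_SIZE_SKETCH;
}
```

With these definitions, looks_like_null_deref(0x6e) is true, and
offsetof(struct example_sketch, field) is 0x6e, which is the address
the CPU would touch when reading 'field' through a NULL pointer.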

 > trap type 8, code = 0x402074d, v = 0x6e
 > kernel program counter = 0xa6c0a
 > kernel: MMU fault trap
 > panic: MMU fault
 > Stopped at      _Debugger+0x6:  unlk    a6
 > db> trace
 > _Debugger(200ac,a0ccf,144cdc,2304,144d0c) + 6
 > _panic(a0ccf,1,1,eb4c4,3) + 34
 > _trap(8,402074d,6e) + 21a
 > _addrerr(?)
 > _dcmxint(eb4c4,1,12,0,0) + 10c

Ok...this is the part of the stack trace that tells you where the
problem occurred.  Basically, you were in the function dcmxint()
when an address error occurred; addrerr(), the frame just above it in
the trace, is where the CPU jumps when an invalid address is used.

Ok, so, if you look at the dcmxint() function (sys/arch/hp300/dev/dcm.c,
line 898), it's pretty clear what's happening...

You're getting an interrupt, and you're dereferencing "tp", which
is NULL... it's NULL because the port hasn't yet been opened, which
means the tty structure hasn't been allocated yet.
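
For illustration only, one common way to guard against this kind of
bug is to check the pointer before touching it.  The sketch below is
NOT the code in dcm.c and not necessarily the fix I'll end up with;
every type, field, and constant in it is hypothetical:

```c
#include <stddef.h>

#define TS_BUSY_SKETCH 0x4	/* hypothetical "output in progress" flag */
#define NPORTS_SKETCH  4	/* hypothetical number of ports per board */

/* Hypothetical stand-in for a tty structure. */
struct tty_sketch {
	int t_state;
};

/* Hypothetical per-board state: tty[] entries stay NULL until open. */
struct softc_sketch {
	struct tty_sketch *tty[NPORTS_SKETCH];
	int                dropped[NPORTS_SKETCH];
};

/*
 * Transmit-interrupt handler sketch.  Returns 1 if the interrupt was
 * processed, 0 if it was ignored because the port isn't open yet.
 */
int
xint_sketch(struct softc_sketch *sc, int port)
{
	struct tty_sketch *tp = sc->tty[port];

	if (tp == NULL) {
		/* Port never opened: count the stray interrupt and bail. */
		sc->dropped[port]++;
		return 0;
	}

	/* Safe to dereference now: clear the busy flag. */
	tp->t_state &= ~TS_BUSY_SKETCH;
	return 1;
}
```

The point is just that an interrupt can arrive before the first open,
so the handler can't assume the tty has been allocated.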

"Oops!"  :-)

So, I have a question for you... "Do you have XON/XOFF flow control
enabled on the terminal you're using?"

If you do, please try disabling it, and tell me if that helps.  In the
meantime, I'll look for the nicest way to fix that bug...

I may need to send you a kernel or two to netboot, for testing, as well.

 > _dcmpint(eb4c4,1,1) + 2c
 > _dcmintr(eb4c4,219df,2004,a20000,144e84) + de
 > _isrdispatch(6c) + 7a
 > _intrhand(?)
 > _dcmselftest(eb52e,c8ca8,eb52e,28) + a

...hmm, and given that dcmselftest() is in the trace... I decided to
look and see what it does, and I've found a couple of slight bugs in
it... *sigh*

 > _dcmattach(c8c7c,e8d88,9263e,de45c,de46c) + 82
 > _find_device(de45c) + 15e
 > _configure(c,ff801000,fffffffc,13a000,ffeffffc) + 92
 > _cpu_startup(c992c,c,ff801000,fffffffc,13a000) + 2f2
 > _main() + 4a
 > _main() + 4a
 > db> 
 > ----- End of minicom.cap ----
 > 
 > 					Elmar
 > -- 
 > Alp =	1) One of a number of ski mountains in Europe
 > 	2) A shouted request for assistance made by a European skier in
 > 	   America. An appropriate reply is "What's Zermatter ?".
 > 				Henry Beard & Roy McKie 
 > 
 > This mail was brought to you by:
 > 
 > 		Elmar Kolkman.
 > 
 > He can be reached as 'kolkmae@apd.dec.com' or 'elmar@usn.nl'

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                               Home: 408.866.1912
NAS: M/S 258-6                                          Work: 415.604.0935
Moffett Field, CA 94035                                Pager: 415.428.6939