Subject: Re: Boot Floppy Image Available
To: Curt Sampson <cjs@portal.ca>
From: Andrew Gallatin <gallatin@cs.duke.edu>
List: port-alpha
Date: 12/08/1997 17:32:32
Curt Sampson writes:
 > On Mon, 8 Dec 1997, Andrew Gallatin wrote:
 > 
 > > Well, I've finally had time to play with it, and the problem appears
 > > to be isolated to the kernel on that floppy.  Neither the generic
 > > kernel from the last 1.3-ALPHA snapshot (NetBSD 1.3_ALPHA (GENERIC)
 > > #2: Sat Nov 15 17:04:50 PST 1997) nor a kernel built from this
 > > morning's source tarballs exhibits the problem.
 > 
 > Did you build a GENERIC or an INSTALL kernel from this morning's
 > source tarballs?

It was a GENERIC kernel that I slimmed down to match my hardware.

It seems that building & booting the INSTALL kernel exhibits the same
problem -- the interrupt dispatch code jumps into never-never land 
immediately after bringing up de1.


...
	de0 at pci1 dev 0 function 0
	de0: interrupting at kn20aa irq 16
	de0: DEC 21040 [10Mb/s] pass 2.3
	de0: address 08:00:2b:e7:e6:d6
...
	de1 at pci0 dev 9 function 0
	de1: interrupting at kn20aa irq 12
	de1: DEC DE500-XA 21140 [10-100Mb/s] pass 1.1
	de1: address 00:00:f8:00:99:ba

Looking at kn20aa_pci_intr[] in a coredump, I see:

For de0:

(gdb) p kn20aa_pci_intr[16]->intr_q->tqh_first
$35 = (struct alpha_shared_intrhand *) 0xfffffe004a597e80
(gdb) p *kn20aa_pci_intr[16]->intr_q->tqh_first
$36 = {ih_q = {tqe_next = 0x0, tqe_prev = 0xfffffe004a59be00}, 
  ih_fn = 0xfffffc0000425728 <tulip_intr_normal>, ih_arg = 0xfffffe004a59f000, 
  ih_level = 2}

For de1:

(gdb) p kn20aa_pci_intr[12]->intr_q->tqh_first
$37 = (struct alpha_shared_intrhand *) 0xfffffe004a597580
(gdb) p *kn20aa_pci_intr[12]->intr_q->tqh_first
$38 = {ih_q = {tqe_next = 0xfffffc000077f318, tqe_prev = 0xfffffc000077f470}, 
  ih_fn = 0xfffffc000077f628 <cia_configuration+792>, 
  ih_arg = 0xfffffe0000000001, ih_level = 2}



For de1, ih_fn and most of ih_q are obviously bogus: ih_fn points at
cia_configuration+792 instead of at a tulip interrupt handler.
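A rough user-space sketch of a sanity check that would flag an entry like de1's before it gets dispatched. It relies on the usual TAILQ invariant (each element's tqe_prev holds the address of the pointer that points at it) plus a range check on ih_fn. The struct below only mirrors the fields gdb printed and is a hypothetical stand-in, not the real struct alpha_shared_intrhand. The text_lo/text_hi bounds are assumptions you would have to supply from the kernel's own text range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space mirror of the fields gdb printed above;
 * NOT the real struct alpha_shared_intrhand from the alpha headers.
 */
struct shared_ih {
	struct {
		struct shared_ih *tqe_next;	/* next handler, or NULL */
		struct shared_ih **tqe_prev;	/* address of the pointer to us */
	} ih_q;
	int (*ih_fn)(void *);
	void *ih_arg;
	int ih_level;
};

struct shared_ih_head {
	struct shared_ih *tqh_first;
	struct shared_ih **tqh_last;
};

/* Same effect as TAILQ_INIT. */
void ih_init(struct shared_ih_head *head)
{
	assert(head != NULL);
	head->tqh_first = NULL;
	head->tqh_last = &head->tqh_first;
}

/* Same effect as TAILQ_INSERT_TAIL. */
void ih_insert_tail(struct shared_ih_head *head, struct shared_ih *ih)
{
	ih->ih_q.tqe_next = NULL;
	ih->ih_q.tqe_prev = head->tqh_last;
	*head->tqh_last = ih;
	head->tqh_last = &ih->ih_q.tqe_next;
}

/*
 * Walk the list and return the 1-based position of the first entry
 * whose back link is wrong or whose handler lies outside
 * [text_lo, text_hi); return 0 if every entry looks sane.  Each entry
 * is checked before its tqe_next is followed, but a wild tqe_next on
 * an otherwise sane entry would still be dereferenced.
 */
int ih_check(struct shared_ih_head *head, uintptr_t text_lo,
    uintptr_t text_hi)
{
	struct shared_ih **expect_prev = &head->tqh_first;
	struct shared_ih *ih;
	int pos = 1;

	for (ih = head->tqh_first; ih != NULL;
	    ih = ih->ih_q.tqe_next, pos++) {
		uintptr_t fn = (uintptr_t)ih->ih_fn;

		/* TAILQ invariant: tqe_prev is the slot we came through. */
		if (ih->ih_q.tqe_prev != expect_prev)
			return pos;
		/* The handler should live in kernel text. */
		if (fn < text_lo || fn >= text_hi)
			return pos;
		expect_prev = &ih->ih_q.tqe_next;
	}
	return 0;
}
```

In the dumps above, de0's entry would only need the ih_fn range check, while de1's entry would be caught on the first element, before the bogus ih_fn is ever called.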

 > We've sometimes seen problems with kernels that don't have DIAGNOSTIC
 > or other options turned on. If the INSTALL kernel still fails, the
 > best thing to do is to start adding back the things in GENERIC but
 > not INSTALL, one by one, until we figure out which option it is
 > that causes things to break.
 > 

I've tried removing DIAGNOSTIC, adding the MEMORY_DISK options, and
changing the load address back to 0xfffffc0000300000 in my 'GENERIC'
kernel, and it still works.  I'll try some other options later.  Is it
possible that it's just a size problem?  The install kernel is
quite large:

text    data    bss     dec     hex     filename
1330392 95184   357984  1783560 1b3708  /netbsd  (my normal, slim kernel)
1624768 2217456 881200  4723424 4812e0  /netbsd.install (the failing kernel)
1330848 2192400 358048  3881296 3b3950  /netbsd.testing (the still-working kernel)
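To put numbers on the size question, here is the arithmetic behind the size(1) columns, plus where each image would end if loaded at 0xfffffc0000300000. It assumes text, data and bss are laid out back to back from the load address; that layout is an assumption, not something checked against the loader. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One row of size(1) output, in bytes. */
struct ksize {
	uint64_t text, data, bss;
};

/* The "dec" column: total size of the image. */
uint64_t ksize_total(struct ksize k)
{
	return k.text + k.data + k.bss;
}

/*
 * First address past the image, ASSUMING text, data and bss are laid
 * out back to back starting at the load address.
 */
uint64_t ksize_end(uint64_t load, struct ksize k)
{
	uint64_t total = ksize_total(k);

	assert(load <= UINT64_MAX - total);	/* address must not wrap */
	return load + total;
}

/* Nonzero if addr falls inside [load, end of image). */
int ksize_contains(uint64_t load, struct ksize k, uint64_t addr)
{
	return addr >= load && addr < ksize_end(load, k);
}
```

Under that assumption, the install kernel would end at 0xfffffc00007812e0 and the still-working testing kernel at 0xfffffc00006b3950. The bogus de1 pointers (0xfffffc000077f318 and friends) therefore fall inside the install kernel's image but past the end of the testing kernel's. By itself that doesn't prove anything.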


And I've found another problem -- if I boot NetBSD twice on this
machine with the tga driver built in, the second boot always fails
with a panic in the tga code:

tga0 at pci0 dev 11 function 0panic: tga_bt485_init: already have private struct
halted.

If I comment out the test at line 165 in tga_bt485.c, this panic goes
away.  I don't much care about this, since I'm running with a serial
console (hence the hackish "fix"), but I thought I should bring it up.
Powering the box off between boots also avoids it.  Perhaps there's a
missing bzero someplace?
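A user-space sketch of the failure mode being guessed at: a "refuse to initialize twice" guard trips when the structure it checks still holds stale contents, and clearing the structure first makes it pass. Every name here is a hypothetical stand-in, not the real tga/bt485 code. Whether the real driver is actually missing a bzero is exactly the open question. memset is used as the standard C equivalent of bzero.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a device config carrying a RAMDAC private
 * pointer; not the real tga structures. */
struct fake_devconfig {
	void *dc_ramdac_private;
};

/*
 * The kind of guard that panicked: refuse to initialize twice.
 * Returns -1 instead of panicking, -2 if allocation fails, 0 on success.
 */
int fake_ramdac_init(struct fake_devconfig *dc)
{
	if (dc->dc_ramdac_private != NULL)
		return -1;	/* "already have private struct" */
	dc->dc_ramdac_private = calloc(1, 64);	/* placeholder size */
	return dc->dc_ramdac_private != NULL ? 0 : -2;
}

/* The suspected missing step: clear the struct before first use so
 * stale contents can't look like an earlier initialization. */
void fake_devconfig_setup(struct fake_devconfig *dc)
{
	assert(dc != NULL);
	memset(dc, 0, sizeof(*dc));
}
```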

Drew