Subject: Re: E250 support and failed boot
To: Mark Blackman <mark.blackman@dircon.net>
From: Eduardo Horvath <eeh@turbolinux.com>
List: port-sparc64
Date: 08/28/2000 10:00:23
On Fri, 25 Aug 2000, Mark Blackman wrote:

> > On Fri, 25 Aug 2000, Mark Blackman wrote:
> > 
> > > partial success. Got most of the way through the kernel
> > > device attach. Got dumped into the debugger by a panic.
> > > not familiar with netbsd kernel debugger. suggestions appreciated?
> > > 
> > > [much verboseness elided]
> > > clock0 at ebus0 addr 0-1fff
> > > clock0: device_register: bpname network()
> > > : mk48t59: hostid ffffffff80cfd209       
> > > flashprom at ebus0 addr 0-fffff addr 0-fffff not configured
> > > SUNW,envctrltwo at ebus0 addr 600000-600003 ipl 40 ipl 37 not configured
> > > hme1 at pci0 dev 1 function 1                                           
> > > hme1: device_register: bpname network((null))
> > > instance_match: pci device, want dev 0x1 fn 0x1 have dev 0x1 fn 0x1
> > >         -- found ethernet controller hme1                          
> > > : address 08:00:20:cf:d2:09              
> > > nsphy0 at hme1 phy 1: DP83840 10/100 media interface, rev. 1
> > > nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> > > hme1: using vector 0 for interrupt                          
> > > data memory error type 32 sfsr=0x0 sfva=0xfe037010 afsr=0x88000000 afva=0x1fe01000a00 tf=0xf1809100
> > > panic: trap: memory error
> > > kdb breakpoint at 0xf126d168
> > > Stopped in swapper at   cpu_Debugger+0x4:       nop
> > > db>                                                
> > 
> > Hm.  Looks like an async fault trying to access pa=0x1fe01000a00.
> > 
> > Try:
> > 
> > 	trace
> > 
> > 	machine tf
> > 
> > grab the tpc from the trapframe and do:
> > 
> > 	x/i <tpc value>
> 
> not obvious to me which of those values is the tpc so i tried
> two or three candidates. screen replay follows...
> 
> db> trace
> data_access_error(32, fe037010, 0, 1fe01000a00, f1410000, f1809100) at data_access_error+0x338
> data_error(4008002, 1d, f1802000, 1fe010009ec, 4, 1fe01000a00) at data_error+0x4
>                                                                                 
> confaddr_ok(f270ff00, a00, f1809470, f270bd30, f142c000, 4) at confaddr_ok+0xa8
> pci_conf_read(f1425d80, a00, 0, 200, 50a0000, 0) at pci_conf_read+0x98         
> pci_probe_bus(f26f3a00, 8, ffff, 2, f26f3d00, f26f3c80) at pci_probe_bus+0x10c
> pciattach(f12c26f0, f26f3a00, f18097d0, f1225420, f270bf90, f142c000) at pciatta

Hm.  Looks like the error occurred in confaddr_ok(), which should be
using probeget() to access the address.  The pointer to the trapframe
that caused the problem is the tf value printed with the memory error
(it is also the last argument to data_access_error in your trace; the
second argument, fe037010, is the sfva), so you can dump it with
`mach tf f1809100'.

Anyway, probeget should be able to safely access bad memory
locations.  Looks like a bug in probeget().  Let me fix that real quick.

O.K.  I updated: 

	ftp.netbsd.org:/pub/NetBSD/arch/sparc64/other/netbsd.bootdebug

with another fix.  Give it a try.

Eduardo Horvath