Subject: Re: ddb and kernel not auto-selecting root device
To: None <heas@shrubbery.net, port-sparc64@netbsd.org>
From: None <eeh@netbsd.org>
List: port-sparc64
Date: 10/09/2001 18:35:34
| i have a sparcengine which fails to auto-select the root device at
| boot time.
|
| NetBSD 1.5Y (sky) #0: Tue Oct  9 02:25:25 UTC 2001
|     root@sky:/home/src/sys/arch/sparc64/compile/sky
| total memory = 128 MB
| avail memory = 109 MB
| using 832 buffers containing 6656 KB of memory
| bootpath: /pci@1f,0/pci@1,0/scsi@1,0/disk@0,0
| mainbus0 (root): SUNW,UltraSPARC-IIi-Engine
| cpu0 at mainbus0: SUNW,UltraSPARC-IIi @ 440.127 MHz, version 0 FPU


| siop0 at pci2 dev 1 function 0: Symbios Logic 53c875 (ultra-wide scsi)
| siop0: using on-board RAM
| siop0: interrupting at ivec 20
| scsibus0 at siop0: 16 targets, 8 luns per target


| the last time i encountered this, it was a missing entry for SUNW/fas for
| bus_compatible().

This is a problem with siop and other PCI SCSI controllers.  The reason 
this is a problem for siop and not for fas is that PCI devices use the 
`generic' names for device nodes, such as `scsi', `ide' and `disk', 
while SBus devices use specific names such as `SUNW,fas', `esp', and `sd'.
This means that `disk' can be either an IDE disk or a SCSI disk.  The
current code is, uh, inflexible and maps `disk' to `wd', which means it
will only attach to an IDE controller.
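To make the ambiguity concrete, here is a minimal C sketch contrasting a flat name lookup, which always turns `disk' into `wd', with one that also looks at the parent node's name.  The table entries and function names are hypothetical and illustrative, not the actual sparc64 autoconf code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical flat table from PROM node names to driver names,
 * in the spirit of the current bootpath code (entries illustrative).
 */
struct name_map {
	const char *prom_name;	/* name of the PROM device node */
	const char *driver;	/* kernel driver it is mapped to */
};

static const struct name_map flat_map[] = {
	{ "esp",  "esp" },
	{ "sd",   "sd"  },
	{ "disk", "wd"  },	/* ambiguous: IDE or SCSI? */
};

/* Flat lookup: ignores the parent, so `disk' always becomes `wd'. */
static const char *
flat_lookup(const char *name)
{
	for (size_t i = 0; i < sizeof(flat_map) / sizeof(flat_map[0]); i++)
		if (strcmp(flat_map[i].prom_name, name) == 0)
			return flat_map[i].driver;
	return NULL;
}

/* Parent-aware lookup: a generic `disk' is resolved by its parent's name. */
static const char *
parent_aware_lookup(const char *parent, const char *name)
{
	if (strcmp(name, "disk") == 0) {
		if (strcmp(parent, "scsi") == 0)
			return "sd";
		if (strcmp(parent, "ide") == 0)
			return "wd";
		return NULL;	/* unknown parent: don't guess */
	}
	return flat_lookup(name);
}
```

With the flat table, flat_lookup("disk") yields "wd" even when the disk sits under a `scsi' node, which is exactly the siop case above; parent_aware_lookup("scsi", "disk") yields "sd" instead.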

The proper fix to this problem is to completely rewrite the bootpath code
so instead of matching a device name to a kernel driver and then trying
to use properties to match it to a specific instance, it uses the firmware
to find the parent device node and matches this node number with the node
number associated with a specific device driver instance.  Then only the
last level of the mapping, for `scsi', `wd', or `network' devices, needs
special handling.
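A rough sketch of the node-matching approach, assuming each device instance already records the PROM node it attached to (which is precisely what devprops would have to provide).  The structures and functions here are hypothetical; node numbers are plain integers standing in for real PROM handles.

```c
#include <stddef.h>

/* Hypothetical: a PROM node identified by an integer handle. */
struct prom_node {
	int node;		/* PROM node number */
	int parent;		/* parent node number, 0 for the root */
	const char *name;	/* e.g. "scsi", "disk" */
};

/* Hypothetical: a kernel device instance that remembers its PROM node. */
struct dev_instance {
	const char *xname;	/* e.g. "siop0" */
	int node;		/* PROM node this instance attached to */
};

/* Return the parent of `node', or 0 if the node is not in the tree. */
static int
prom_parent(const struct prom_node *tree, size_t ntree, int node)
{
	for (size_t i = 0; i < ntree; i++)
		if (tree[i].node == node)
			return tree[i].parent;
	return 0;
}

/*
 * Find the device instance whose PROM node is the parent of the
 * boot device node, i.e. the controller the boot disk hangs off.
 */
static const struct dev_instance *
find_boot_controller(const struct prom_node *tree, size_t ntree,
    const struct dev_instance *devs, size_t ndevs, int boot_node)
{
	int parent = prom_parent(tree, ntree, boot_node);

	if (parent == 0)
		return NULL;
	for (size_t i = 0; i < ndevs; i++)
		if (devs[i].node == parent)
			return &devs[i];
	return NULL;
}
```

For the bootpath above, the `disk@0,0' node's parent is the `scsi@1,0' node, so the instance whose recorded node matches it (siop0 in this example) is the boot controller; only the target and LUN from `disk@0,0' are left to resolve.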

However, since this requires matching a device instance to a PROM node,
it will probably have to wait for devprops, since there is no generic
way to determine what PROM node a device instance corresponds to from
inside device_register().

| i'm a ddb novice and tend to muddle when forced to use it, so i'm not sure
| if it is pilot error, change in the way the pci based boxes determine bus
| compatibility/root device eligibility, or a ddb error.  but, setting a
| break point at bus_compatible only breaks during the cpu configuration.
|
| kdb breakpoint at 11d70a4
| 1 tt=1ff tstate=0 tpc=0x0 tnpc=0x0
| Stopped in pid 0 (swapper) at   bus_compatible+0x8:     call            bus_class
| db> c
| : SUNW,UltraSPARC-IIi @ 440.127 MHz, version 0 FPU
| cpu0: physical 4K instruction (32 b/l), 4K data (32 b/l), 2048K external (64 b/l) 
| psycho0 at mainbus0 addr 0xfffc0000kdb breakpoint at 11d7120
|
| while in bus_compatible, trying to examine bpname (bus_compatible's first
| argument) returns a 'symbol not found' error; which i do not understand.

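Symbols only work for global and static variables.  Local variables are
usually kept in registers.  The first six parameters are passed in
%i0..%i5 (once the function's `save' has executed), so look at %i0
rather than trying to use the name `bpname'.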

| can anyone offer any pointers to help me track down the problem with
| determining the root device?  i saw messages in the archive that this
| was broken, but none that it had been fixed.

Eduardo