Hi,
I was able to root-cause the issue. It was introduced in r1.11 of
uba_mainbus.c when scanning for Qbus/Unibus memories was added.
When NetBSD is booted from an MSCP controller, the boot loader sets up
the Qbus map to provide the controller with a small command/response
ring in memory to be used for I/O. Once the kernel is loaded and uba(4)
is attaching, the Qbus map is cleared while scanning for memories. It
appears that the sudden loss of access to the command/response ring causes
the firmware of these CMD controllers to drop dead.
As a result, these controllers don't respond within a reasonable time
(100ms according to the spec) when their IP register is written to
re-initialize the controller. Even though uda(4)'s udamatch() waits up
to 10s for a sign of life from the controller, that's usually not
enough for it to wake up, so the controller is assumed to be absent.
This, of course, causes the kernel to fail to boot because the boot
device can't be found.
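To illustrate the bounded wait described above, here is a minimal sketch of a probe that writes the IP register and then polls the SA register for the step-1 bit, giving up after a fixed number of polls. It runs against a simulated controller; `struct sim_ctlr`, the helper functions, the `MP_STEP1` value and the tick-based timeout are assumptions for illustration, not the actual uda(4) code. A wedged controller (like the CMD boards after losing their ring) simply never raises the bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed SA register bit signalling step 1 of MSCP initialization. */
#define MP_STEP1 0x0800u

/* Hypothetical simulated controller: it raises STEP1 after
 * `wake_ticks` polls, or never if it is wedged (wake_ticks < 0). */
struct sim_ctlr {
    uint16_t ip;        /* initialization/polling register */
    uint16_t sa;        /* status/address register */
    int      wake_ticks;
    int      elapsed;
};

static void sim_write_ip(struct sim_ctlr *c, uint16_t v)
{
    c->ip = v;          /* a write to IP requests a controller reset */
    c->sa = 0;
    c->elapsed = 0;
}

static uint16_t sim_read_sa(struct sim_ctlr *c)
{
    /* A healthy controller raises STEP1 once its reset has finished. */
    if (c->wake_ticks >= 0 && c->elapsed >= c->wake_ticks)
        c->sa |= MP_STEP1;
    c->elapsed++;
    return c->sa;
}

/* Returns true if the controller shows a sign of life within
 * `timeout_ticks` polls; each tick stands in for a short delay. */
static bool ctlr_wait_step1(struct sim_ctlr *c, int timeout_ticks)
{
    sim_write_ip(c, 0);
    for (int i = 0; i < timeout_ticks; i++) {
        if (sim_read_sa(c) & MP_STEP1)
            return true;
    }
    return false;       /* treated as "no controller present" */
}
```

With this shape, no timeout is long enough for a controller whose firmware has died; the probe can only report it as absent.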
This needs to be addressed in both the kernel and the boot loader.
We can work around the issue in the kernel by restoring the Qbus map
registers to their original values after using them to detect Qbus
memories. The code to do that is already there but commented out; it
just needs to be uncommented.
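The kernel workaround amounts to a save/probe/restore pattern around the memory scan. The sketch below shows that pattern on a simulated map register file; `QBUS_NMAP`, `qbus_map`, `probe_qbus_memory()` and `probe_preserving_map()` are hypothetical names, and the register count is an assumption rather than something taken from uba_mainbus.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed number of Qbus map registers; how they are mapped into
 * kernel virtual memory is outside the scope of this sketch. */
#define QBUS_NMAP 8192

/* Hypothetical simulated map register file. */
static uint32_t qbus_map[QBUS_NMAP];

/* Stand-in for the memory probe: it clobbers every map register. */
static void probe_qbus_memory(void)
{
    for (size_t i = 0; i < QBUS_NMAP; i++)
        qbus_map[i] = 0;
}

/* Save the map, run the probe, then put the boot loader's mappings
 * back so a controller still using its command/response ring keeps
 * access to it. */
static void probe_preserving_map(void)
{
    static uint32_t saved[QBUS_NMAP];

    memcpy(saved, qbus_map, sizeof(saved));
    probe_qbus_memory();
    memcpy(qbus_map, saved, sizeof(saved));
}
```

Note that this only narrows the window: the map is still cleared while the probe runs, which is why the boot loader fix is the more robust one.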
To fix this properly, the standalone ra.c driver in the boot loader
should provide a close() entry point which clears the IP register to
issue a controller reset. That way, when the kernel is loaded, the
firmware of the MSCP controller is back in its reset state and can no
longer be confused by the initialization of the Qbus map in uba(4).
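A rough sketch of such a close() entry point follows. It assumes the entry point receives an open-file handle whose private data points at the driver's state, in the spirit of libsa's device switch; `struct sim_open_file`, `struct ra_softc_sim` and `raclose_sketch()` are hypothetical stand-ins, and the IP register is modelled as a plain variable instead of a memory-mapped I/O location.

```c
#include <stdint.h>

/* Hypothetical stand-ins: in the real boot loader the IP register is
 * a memory-mapped device register and the handle is libsa's
 * struct open_file. */
struct sim_open_file {
    void *f_devdata;
};

struct ra_softc_sim {
    volatile uint16_t *ip;   /* controller's IP register */
};

/* close() entry point: writing the IP register resets the MSCP
 * controller, so it stops using the command/response ring that the
 * boot loader placed behind the Qbus map. */
static int raclose_sketch(struct sim_open_file *f)
{
    struct ra_softc_sim *sc = f->f_devdata;

    *sc->ip = 0;
    return 0;
}
```

Resetting the controller at hand-off means the kernel no longer depends on the boot loader's Qbus map contents at all.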
In case anyone wants to review them, I've attached two patches, one with
a kernel workaround and another with a proper fix for the boot loader. I
intend to commit them some time next week and request pullups into -10.
Hans