Subject: Re: Kernel locks at root mount on April 23+ sup
To: None <greywolf@defender.vas.viewlogic.com>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: port-sparc
Date: 04/26/1996 12:05:28
On Fri, 26 Apr 96 09:20:27 PDT 
 greywolf@defender.VAS.viewlogic.com wrote:

 > Rob Healey sez:
 > 
 > 	   After my sup on April 23rd I can no longer make a kernel that will
 > 	   get past the "root on sd0a" message during boot. It's a HARD lock
 > 	   too, no L1-A will get you out of it, we're talking power switch time.
 > 
 > Oh, THANK YOU!  I get the SAME bloody result!  (I thought I had gone slightly
 > crazy...)

My SS2 is doing OK.  However, I have an SS1 that loses during the probe 
of the SCSI bus ... I'm actually booting a kernel with debugging traces 
turned on in the esp driver now... Good timing!  :-)

ok boot disk netbsd.test -s
Booting from: sd(0,0,0)netbsd.test -s 
>> NetBSD BOOT [$Revision: 1.1.1.2 $]
Booting netbsd.test @ 0x4000
974816+110640+66008+[42180+46423]=0x132c0b
[ preserving 88612 bytes of netbsd symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.

NetBSD 1.1B (LAB_SUN4C) #77: Fri Apr 26 11:39:30 PDT 1996
 thorpej@lestat:/tmp_mnt/antie/work/netbsd/src/sys/arch/sparc/compile/LAB_SUN4C
real mem = 12529664
avail mem = 10301440
using 152 buffers containing 622592 bytes of memory
bootpath: /sbus0/esp0/sd@0,0
mainbus0 (root)
cpu0 at mainbus0: Sun 4/60 (MB86900/1A or L64801 @ 20 MHz, WTL3170/2 FPU)
cpu0: 65536 byte write-through, 16 bytes/line, sw flush cache enabled
memreg0 at mainbus0 ioaddr 0xf4000000
clock0 at mainbus0 ioaddr 0xf2000000: mk48t02 (eeprom)
timer0 at mainbus0 ioaddr 0xf3000000
auxreg0 at mainbus0 ioaddr 0xf7400000
zs0 at mainbus0 ioaddr 0xf1000000 pri 12, softpri 6
zs0a: console i/o
zs1 at mainbus0 ioaddr 0xf0000000 pri 12, softpri 6
fdc0 at mainbus0 ioaddr 0xf7200000 pri 11, softpri 4: chip 82072
audio0 at mainbus0 ioaddr 0xf7201000 pri 13, softpri 4
sbus0 at mainbus0 ioaddr 0xf8000000: clock = 25 MHz
dma0 at sbus0 slot 0 offset 0x400000: rev 1
esp0 at sbus0 slot 0 offset 0x800000 pri 3: ESP100[ESP_INIT(1)]  25Mhz, target 7
scsibus0 at esp0
[ dead hang ... my words, not a printf ]

Note the trace calls at the top of esp_scsi_cmd():

int     
esp_scsi_cmd(xs)
        struct scsi_xfer *xs;
{
        struct scsi_link *sc_link = xs->sc_link;
        struct esp_softc *sc = sc_link->adapter_softc;
        struct ecb      *ecb;
        int s, flags;
        
        ESP_TRACE(("[esp_scsi_cmd] "));
        ESP_CMDS(("[0x%x, %d]->%d ", (int)xs->cmd->opcode, xs->cmdlen,
            sc_link->target)); 

So, I SHOULD be seeing:

scsibus0 at esp0
[esp_scsi_cmd]

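...AT LEAST.  Since not even that trace shows up, the hang appears to 
happen before esp_scsi_cmd() is ever called, i.e. in the MI 
(machine-independent) SCSI code.  Time for a bug-hunt, I guess.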

Try turning on the ESP_TRACE() macros (set esp_debug to ESP_SHOWTRAC).  
See where you lose.
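
For anyone who hasn't looked at how this style of tracing works, here is 
a minimal stand-alone sketch of bitmask-gated trace macros.  It is NOT 
the esp driver's code: the names (dbg_flags, DBG_TRACE, DBG_CMDS, 
dbg_printf, fake_scsi_cmd) and the flag values are made up for 
illustration, and the output goes to a buffer rather than the console so 
the effect is easy to check.  Look at the driver source for the real 
definitions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits, for illustration only (not the driver's values). */
#define DBG_SHOWCMDS 0x04
#define DBG_SHOWTRAC 0x10

static int  dbg_flags = 0;   /* stand-in for a variable like esp_debug */
static char dbg_log[256];    /* captured trace output (zero-initialized) */

/* Append formatted text to dbg_log, truncating if the buffer fills up. */
static void
dbg_printf(const char *fmt, ...)
{
        size_t used = strlen(dbg_log);
        va_list ap;

        assert(used < sizeof(dbg_log));
        va_start(ap, fmt);
        vsnprintf(dbg_log + used, sizeof(dbg_log) - used, fmt, ap);
        va_end(ap);
}

/*
 * The double parentheses at the call site pass a whole printf-style
 * argument list through a single macro parameter.
 */
#define DBG_TRACE(args) \
        do { if (dbg_flags & DBG_SHOWTRAC) dbg_printf args; } while (0)
#define DBG_CMDS(args) \
        do { if (dbg_flags & DBG_SHOWCMDS) dbg_printf args; } while (0)

/* Hypothetical entry point that traces on entry, like the code above. */
static void
fake_scsi_cmd(int opcode, int cmdlen, int target)
{
        DBG_TRACE(("[fake_scsi_cmd] "));
        DBG_CMDS(("[0x%x, %d]->%d ", opcode, cmdlen, target));
}
```

With dbg_flags left at 0, calling fake_scsi_cmd() records nothing.  With 
the trace bit set, the entry marker is recorded as soon as the function 
is entered, which is why a missing marker suggests the function was 
never reached at all.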

 > Anyone tried setting up a cg6 with RCONSOLE defined?  It has precisely
 > the opposite result from what was probably desired :-)  The scrolling mode
 > turns from slow to slower, as in scroll for each pixel-line.  Painful.

Actually, I found that my SS2 with cgsix improved when the cgsix driver 
was fixed to attach an rconsole ... here's my cgsix:

cgsix0 at sbus0 slot 3 offset 0x0: SUNW,501-1672, 1152 x 900, rev 6 (console)
cgsix0: attached to /dev/fb

...are you sure something evil isn't happening, like your cache not 
getting enabled?  My SS10 has some weird bug, so it will only boot if I 
don't enable the cache.  With rconsole attached to the cgsix, the SS10 is 
painfully slow, but then again, I can reproduce the same slowness on my 
SS2 by not enabling the cache there.

--------------------------------------------------------------------------
Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                               Home: 408.866.1912
NAS: M/S 258-6                                          Work: 415.604.0935
Moffett Field, CA 94035                                Pager: 415.428.6939