Subject: Re: scsibus lockups
To: None <gdt@BBN.COM>
From: David Brownlee <D.K.Brownlee@city.ac.uk>
List: port-sparc
Date: 06/09/1995 08:25:46
Have you tried running with just one SCSI device (NFS mounting
the rest if needed)? I'm guessing, but I think the lockups might
be down to having multiple SCSI devices on the bus. (That's a
problem on the sun4/300 machines.)
David
D.K.Brownlee@city.ac.uk (MIME) +44 171 477 8186 {post,host}master (abs)
Network Analyst, UCS, City University, Northampton Square, London EC1V 0HB.
--- Bite and I'll bite back until one of us lies dead and bleeding ---
On Thu, 8 Jun 1995 gdt@BBN.COM wrote:
> I'm having a problem with scsibus lockups. I have a Hyundai Sparc-2
> clone (HWS-210 is printed on poweron) with 1 internal disk and 1
> external disk. I'm running -current, supped around May 30th for the
> kernel, and pk's snapshot for most binaries. These sources run fine
> on several other machines here (real Sun IPCs), so I suspect I have
> some flaky hardware. However, SunOS 4.1.2 runs without any SCSI
> problems on my machine.
>
> The symptoms are that the red LED on the external drive (sd0, scsi
> target 0) comes on and stays on solid, and the kernel repeatedly
> prints messages like the following:
>
> sd0(esp0:0:0) timeout
> sd1(esp0:3:0) timeout
>
> I added a call to esp_scsi_reset in esp_timeout:
>
> void
> esp_timeout(arg)
>     void *arg;
> {
>     int s = splbio();
>     struct ecb *ecb = (struct ecb *)arg;
>     struct esp_softc *sc;
>
>     sc = ecb->xs->sc_link->adapter_softc;
>     sc_print_addr(ecb->xs->sc_link);
>     ecb->xs->error = XS_TIMEOUT;
>     printf("timed out\n");
>
>     esp_done(ecb);
>     esp_reset(sc);
>     esp_scsi_reset(sc);
>     splx(s);
> }
>
> Now, the bus is reset, and the red light goes back out.
> In about 10 seconds, this repeats.
>
> I am assuming that the scsi bus is being reset, but that the driver is
> still waiting for the answer from the previous command, which will
> never come because the device has been reset.
>
> Does anyone know if there is an (easy?) way to cause such in-progress
> commands to be restored to the not-yet-issued state, so that this
> problem might become non-fatal?
> I haven't looked at all the SCSI code carefully yet, but such a change
> might well make the system more robust for anyone with a slightly
> flaky scsi bus.
>
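For what it's worth, the "restore to the not-yet-issued state" idea
above could look roughly like the following. This is only a user-space
sketch of the bookkeeping, not the esp driver's code: struct cmd,
struct bus, bus_issue(), bus_requeue_after_reset() and MAX_RETRIES are
all made-up names for illustration, and a real driver would also have
to deal with interrupt levels, DMA state and per-target negotiation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RETRIES 3	/* hypothetical limit, not taken from any driver */

enum cmd_state { CMD_PENDING, CMD_ACTIVE, CMD_FAILED };

/* Hypothetical stand-in for a driver's per-command control block. */
struct cmd {
    int id;
    int retries;
    enum cmd_state state;
    struct cmd *next;
};

/* Hypothetical per-bus bookkeeping. */
struct bus {
    struct cmd *pending;	/* not yet issued; head is issued next */
    struct cmd *active;		/* issued, still waiting for completion */
};

/* Push a command onto the front of a singly linked list. */
static void
push(struct cmd **list, struct cmd *c)
{
    c->next = *list;
    *list = c;
}

/* Move the next pending command to the active list and return it. */
static struct cmd *
bus_issue(struct bus *b)
{
    struct cmd *c;

    assert(b != NULL);
    c = b->pending;
    if (c == NULL)
        return NULL;
    b->pending = c->next;
    c->state = CMD_ACTIVE;
    push(&b->active, c);
    return c;
}

/*
 * Call after the bus has been reset.  Every command that was in flight
 * is put back at the head of the pending queue so it gets reissued,
 * unless it has already been retried MAX_RETRIES times, in which case
 * it is marked failed.  Returns the number of commands requeued.
 */
static int
bus_requeue_after_reset(struct bus *b)
{
    int requeued = 0;

    assert(b != NULL);
    while (b->active != NULL) {
        struct cmd *c = b->active;

        b->active = c->next;
        if (++c->retries > MAX_RETRIES) {
            c->state = CMD_FAILED;
            c->next = NULL;
        } else {
            c->state = CMD_PENDING;
            push(&b->pending, c);
            requeued++;
        }
    }
    return requeued;
}
```

Because the active list is built newest-first and drained onto the
front of the pending queue, the requeued commands come back in the
order they were originally issued. The retry counter is there so that
a device which never answers eventually produces an error instead of
an endless reset loop.
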
> On another note, I get a few 'pmap botch' errors soon after booting;
> they seem to happen while xdm is starting up.
>
> Greg Troxel
>
>
> Here is what netbsd prints on boot:
>
>
> Jun 8 14:09:36 aardvark /netbsd: NetBSD 1.0A (GDT) #6: Thu Jun 8 13:44:22 EDT 1995
> Jun 8 14:09:37 aardvark /netbsd: gdt@aardvark.bbn.com:/nfs/aardvark/u1/NETBSD/src-current/sys/arch/sparc/compile/GDT
> Jun 8 14:09:37 aardvark /netbsd: real mem = 33161216
> Jun 8 14:09:37 aardvark /netbsd: avail mem = 29904896
> Jun 8 14:09:37 aardvark /netbsd: using 404 buffers containing 1654784 bytes of memory
> Jun 8 14:09:37 aardvark /netbsd: mainbus0 (root)
> Jun 8 14:09:37 aardvark /netbsd: cpu0 at mainbus0: SUNW,Sun 4/75 (W8601/8701 or MB86903 @ 40 MHz, on-chip FPU)
> Jun 8 14:09:37 aardvark /netbsd: cpu0: cache chip bug; trap page uncached
> Jun 8 14:09:37 aardvark /netbsd: cpu0: 65536 byte write-through, 32 bytes/line, hw flush cache enabled
> Jun 8 14:09:37 aardvark /netbsd: memreg0 at mainbus0 ioaddr 0xf4000000
> Jun 8 14:09:37 aardvark /netbsd: clock0 at mainbus0 ioaddr 0xf2000000: mk48t02 (eeprom)
> Jun 8 14:09:37 aardvark /netbsd: timer0 at mainbus0 ioaddr 0xf3000000
> Jun 8 14:09:38 aardvark /netbsd: auxreg0 at mainbus0 ioaddr 0xf7400003
> Jun 8 14:09:38 aardvark /netbsd: zs0 at mainbus0 ioaddr 0xf1000000 pri 12, softpri 6
> Jun 8 14:09:38 aardvark /netbsd: zs1 at mainbus0 ioaddr 0xf0000000 pri 12, softpri 6
> Jun 8 14:09:38 aardvark /netbsd: audio0 at mainbus0 ioaddr 0xf7201000 pri 13, softpri 4
> Jun 8 14:09:38 aardvark /netbsd: sbus0 at mainbus0 ioaddr 0xf8000000: clock = 20 MHz
> Jun 8 14:09:38 aardvark /netbsd: dma0 at sbus0 slot 0 offset 0x400000: rev 1+
> Jun 8 14:09:38 aardvark /netbsd: esp0 at sbus0 slot 0 offset 0x800000 pri 3: ESP100 20Mhz, target 7
> Jun 8 14:09:38 aardvark /netbsd: scsibus0 at esp0
> Jun 8 14:09:38 aardvark /netbsd: esp0 targ 1 lun 0: <SEAGATE, ST42100, 7544> SCSI2 0/direct fixed
> Jun 8 14:09:38 aardvark /netbsd: sd0 at scsibus0: 1812MB, 2574 cyl, 15 head, 96 sec, 512 bytes/sec
> Jun 8 14:09:38 aardvark /netbsd: esp0 targ 3 lun 0: <SEAGATE, ST1480, 7336> SCSI2 0/direct fixed
> Jun 8 14:09:38 aardvark /netbsd: sd1 at scsibus0: 411MB, 1476 cyl, 9 head, 63 sec, 512 bytes/sec
> Jun 8 14:09:39 aardvark /netbsd: le0 at sbus0 slot 0 offset 0xc00000 pri 5: hardware address 00:00:3b:86:01:ee
> Jun 8 14:09:39 aardvark /netbsd: cgsix0 at sbus0 slot 3 offset 0x0: SUNW,501-1672, 1152 x 900, rev 8
> Jun 8 14:09:39 aardvark /netbsd: fdc0 at mainbus0 ioaddr 0xf7200000 pri 11, softpri 4: chip 82072
> Jun 8 14:09:39 aardvark /netbsd: fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
> Jun 8 14:09:37 aardvark savecore: no core dump
> Jun 8 14:09:48 aardvark init: kernel security level changed from 0 to 1
> Jun 8 14:10:10 aardvark /netbsd: vm_mmap: pmap botch!
> Jun 8 14:10:11 aardvark last message repeated 2 times