Subject: scsibus lockups
To: None <port-sparc@NetBSD.ORG>
From: None <gdt@BBN.COM>
List: port-sparc
Date: 06/08/1995 15:03:02
I'm having a problem with scsibus lockups.  I have a Hyundai Sparc-2
clone (HWS-210 is printed on poweron) with 1 internal disk and 1
external disk.  I'm running -current, supped around May 30th for the
kernel, and pk's snapshot for most binaries.  These sources run fine
on several other machines here (real Sun IPCs), so I suspect I have
some flaky hardware.  However, SunOS 4.1.2 runs without any SCSI
problems on my machine.

The symptoms are that the red LED on the external drive (sd0, SCSI
target 0) comes on and stays on solid, and the kernel repeatedly
prints messages like the following:

sd0(esp0:0:0) timeout
sd1(esp0:3:0) timeout

I added a call to esp_scsi_reset in esp_timeout:

  void
  esp_timeout(arg)
	  void *arg;
  {
	  int s = splbio();	/* block disk interrupts while we clean up */
	  struct ecb *ecb = (struct ecb *)arg;
	  struct esp_softc *sc;

	  sc = ecb->xs->sc_link->adapter_softc;
	  sc_print_addr(ecb->xs->sc_link);
	  ecb->xs->error = XS_TIMEOUT;
	  printf("timed out\n");

	  esp_done(ecb);	/* complete the command with XS_TIMEOUT */
	  esp_reset(sc);
	  esp_scsi_reset(sc);	/* the call I added */
	  splx(s);
  }

With this change, the bus is reset and the red light goes back out,
but about 10 seconds later the whole sequence repeats.

I am assuming that the SCSI bus is being reset, but that the driver is
still waiting for the answer from the previous command, which will
never come because the device has been reset.

Does anyone know if there is an (easy?) way to cause such in-progress
commands to be restored to the not-yet-issued state, so that this
problem might become non-fatal?
I haven't looked at all the SCSI code carefully yet, but such a change
might well make the system more robust for anyone with a slightly
flaky SCSI bus.
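
To make the idea concrete, here is a rough, standalone sketch of the
kind of requeueing I mean.  It is NOT taken from the esp driver: the
types and names (struct cmd, struct cmd_queue, requeue_after_reset,
max_retries) are all made up for illustration, and a real change would
have to use the driver's own ecb lists and locking.  After a bus reset,
every command that had been issued goes back to the front of the
not-yet-issued queue, unless it has already been retried too often.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command states; not the esp driver's own. */
enum cmd_state { CMD_READY, CMD_ISSUED, CMD_FAILED };

struct cmd {
	int target;		/* SCSI target the command is for */
	enum cmd_state state;
	int retries;		/* times requeued after a bus reset */
	struct cmd *next;
};

/* Simple singly linked FIFO of commands. */
struct cmd_queue {
	struct cmd *head;
	struct cmd *tail;
};

static void
queue_init(struct cmd_queue *q)
{
	q->head = q->tail = NULL;
}

static void
queue_append(struct cmd_queue *q, struct cmd *c)
{
	c->next = NULL;
	if (q->tail != NULL)
		q->tail->next = c;
	else
		q->head = c;
	q->tail = c;
}

static struct cmd *
queue_pop(struct cmd_queue *q)
{
	struct cmd *c = q->head;

	if (c != NULL) {
		q->head = c->next;
		if (q->head == NULL)
			q->tail = NULL;
		c->next = NULL;
	}
	return c;
}

/*
 * Called after a bus reset.  Issued commands that have retries left
 * are put back at the front of the ready queue, in their original
 * order; the rest are moved to the failed queue.  Returns the number
 * of commands requeued.
 */
static int
requeue_after_reset(struct cmd_queue *issued, struct cmd_queue *ready,
    struct cmd_queue *failed, int max_retries)
{
	struct cmd_queue retry;
	struct cmd *c;
	int n = 0;

	assert(issued != NULL && ready != NULL && failed != NULL);
	queue_init(&retry);

	while ((c = queue_pop(issued)) != NULL) {
		if (c->retries >= max_retries) {
			c->state = CMD_FAILED;
			queue_append(failed, c);
		} else {
			c->retries++;
			c->state = CMD_READY;
			queue_append(&retry, c);
			n++;
		}
	}

	/* Retried commands go ahead of anything not yet issued. */
	while ((c = queue_pop(ready)) != NULL)
		queue_append(&retry, c);
	*ready = retry;
	return n;
}
```

The retry limit is there so that a device that really is dead
eventually gets an error instead of resetting the bus forever.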

On another note, I get a few 'pmap botch' errors soon after booting;
they seem to happen while xdm is starting up.

	Greg Troxel


Here is what NetBSD prints on boot:


 Jun  8 14:09:36 aardvark /netbsd: NetBSD 1.0A (GDT) #6: Thu Jun  8 13:44:22 EDT 1995
 Jun  8 14:09:37 aardvark /netbsd:     gdt@aardvark.bbn.com:/nfs/aardvark/u1/NETBSD/src-current/sys/arch/sparc/compile/GDT
 Jun  8 14:09:37 aardvark /netbsd: real mem = 33161216
 Jun  8 14:09:37 aardvark /netbsd: avail mem = 29904896
 Jun  8 14:09:37 aardvark /netbsd: using 404 buffers containing 1654784 bytes of memory
 Jun  8 14:09:37 aardvark /netbsd: mainbus0 (root)
 Jun  8 14:09:37 aardvark /netbsd: cpu0 at mainbus0: SUNW,Sun 4/75 (W8601/8701 or MB86903 @ 40 MHz, on-chip FPU)
 Jun  8 14:09:37 aardvark /netbsd: cpu0: cache chip bug; trap page uncached
 Jun  8 14:09:37 aardvark /netbsd: cpu0: 65536 byte write-through, 32 bytes/line, hw flush cache enabled
 Jun  8 14:09:37 aardvark /netbsd: memreg0 at mainbus0 ioaddr 0xf4000000
 Jun  8 14:09:37 aardvark /netbsd: clock0 at mainbus0 ioaddr 0xf2000000: mk48t02 (eeprom)
 Jun  8 14:09:37 aardvark /netbsd: timer0 at mainbus0 ioaddr 0xf3000000
 Jun  8 14:09:38 aardvark /netbsd: auxreg0 at mainbus0 ioaddr 0xf7400003
 Jun  8 14:09:38 aardvark /netbsd: zs0 at mainbus0 ioaddr 0xf1000000 pri 12, softpri 6
 Jun  8 14:09:38 aardvark /netbsd: zs1 at mainbus0 ioaddr 0xf0000000 pri 12, softpri 6
 Jun  8 14:09:38 aardvark /netbsd: audio0 at mainbus0 ioaddr 0xf7201000 pri 13, softpri 4
 Jun  8 14:09:38 aardvark /netbsd: sbus0 at mainbus0 ioaddr 0xf8000000: clock = 20 MHz
 Jun  8 14:09:38 aardvark /netbsd: dma0 at sbus0 slot 0 offset 0x400000: rev 1+
 Jun  8 14:09:38 aardvark /netbsd: esp0 at sbus0 slot 0 offset 0x800000 pri 3: ESP100 20Mhz, target 7
 Jun  8 14:09:38 aardvark /netbsd: scsibus0 at esp0
 Jun  8 14:09:38 aardvark /netbsd: esp0 targ 1 lun 0: <SEAGATE, ST42100, 7544> SCSI2 0/direct fixed
 Jun  8 14:09:38 aardvark /netbsd: sd0 at scsibus0: 1812MB, 2574 cyl, 15 head, 96 sec, 512 bytes/sec
 Jun  8 14:09:38 aardvark /netbsd: esp0 targ 3 lun 0: <SEAGATE, ST1480, 7336> SCSI2 0/direct fixed
 Jun  8 14:09:38 aardvark /netbsd: sd1 at scsibus0: 411MB, 1476 cyl, 9 head, 63 sec, 512 bytes/sec
 Jun  8 14:09:39 aardvark /netbsd: le0 at sbus0 slot 0 offset 0xc00000 pri 5: hardware address 00:00:3b:86:01:ee
 Jun  8 14:09:39 aardvark /netbsd: cgsix0 at sbus0 slot 3 offset 0x0: SUNW,501-1672, 1152 x 900, rev 8
 Jun  8 14:09:39 aardvark /netbsd: fdc0 at mainbus0 ioaddr 0xf7200000 pri 11, softpri 4: chip 82072
 Jun  8 14:09:39 aardvark /netbsd: fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
 Jun  8 14:09:37 aardvark savecore: no core dump
 Jun  8 14:09:48 aardvark init: kernel security level changed from 0 to 1
 Jun  8 14:10:10 aardvark /netbsd: vm_mmap: pmap botch!
 Jun  8 14:10:11 aardvark last message repeated 2 times