port-sparc: Re: scsibus lockups

Subject: Re: scsibus lockups
To: None <gdt@BBN.COM>
From: David Brownlee <D.K.Brownlee@city.ac.uk>
List: port-sparc
Date: 06/09/1995 08:25:46
	Have you tried running with just one scsi device (NFS mounting
	the rest if needed)? I'm guessing, but I think the problem might
	be down to having multiple scsi devices on the bus. (That's a
	problem on the sun4/300 machines.)
		David

D.K.Brownlee@city.ac.uk (MIME)  +44 171 477 8186  {post,host}master   (abs)
Network Analyst, UCS, City University, Northampton Square, London EC1V 0HB.
  --- Bite and I'll bite back until one of us lies dead and bleeding ---

On Thu, 8 Jun 1995 gdt@BBN.COM wrote:

> I'm having a problem with scsibus lockups.  I have a Hyundai Sparc-2
> clone (HWS-210 is printed on poweron) with 1 internal disk and 1
> external disk.  I'm running -current, supped around May 30th for the
> kernel, and pk's snapshot for most binaries.  These sources run fine
> on several other machines here (real sun IPC), so I suspect I have
> some flaky hardware.  However, SunOS 4.1.2 runs without any scsi
> problems on my machine.
> 
> The symptoms are that the red LED on the external drive (sd0, scsi
> target 0) comes on and stays on solid, and the kernel repeatedly
> prints messages like the following:
> 
> sd0(esp0:0:0) timeout
> sd1(esp0:3:0) timeout
> 
> I added a call to esp_scsi_reset in esp_timeout:
> 
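>   void
>   esp_timeout(arg)
> 	  void *arg;
>   {
> 	  int s = splbio();		/* block disk interrupts */
> 	  struct ecb *ecb = (struct ecb *)arg;
> 	  struct esp_softc *sc;
> 
> 	  sc = ecb->xs->sc_link->adapter_softc;
> 	  sc_print_addr(ecb->xs->sc_link);
> 	  ecb->xs->error = XS_TIMEOUT;	/* report the timeout upward */
> 	  printf("timed out\n");
> 
> 	  esp_done(ecb);		/* hand the command back; ecb is not used after this */
> 	  esp_reset(sc);		/* reset the ESP chip */
> 	  esp_scsi_reset(sc);		/* added: reset the SCSI bus */
> 	  splx(s);
>   }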
> 
> Now the bus is reset and the red light goes back out.
> About 10 seconds later, the whole cycle repeats.
> 
> I am assuming that the scsi bus is being reset, but that the driver is
> still waiting for the answer from the previous command, which will
> never come because the device has been reset.
> 
> Does anyone know if there is an (easy?) way to cause such in-progress
> commands to be restored to the not-yet-issued state, so that this
> problem might become non-fatal?
> I haven't looked at all the SCSI code carefully yet, but such a change
> might well make the system more robust for anyone with a slightly
> flaky scsi bus.
> 
> On another note, I get a few 'pmap botch' errors soon after booting;
> they seem to happen while xdm is starting up.
> 
> 	Greg Troxel
> 
> 
> Here is what NetBSD prints on boot:
> 
> 
>  Jun  8 14:09:36 aardvark /netbsd: NetBSD 1.0A (GDT) #6: Thu Jun  8 13:44:22 EDT 1995
>  Jun  8 14:09:37 aardvark /netbsd:     gdt@aardvark.bbn.com:/nfs/aardvark/u1/NETBSD/src-current/sys/arch/sparc/compile/GDT
>  Jun  8 14:09:37 aardvark /netbsd: real mem = 33161216
>  Jun  8 14:09:37 aardvark /netbsd: avail mem = 29904896
>  Jun  8 14:09:37 aardvark /netbsd: using 404 buffers containing 1654784 bytes of memory
>  Jun  8 14:09:37 aardvark /netbsd: mainbus0 (root)
>  Jun  8 14:09:37 aardvark /netbsd: cpu0 at mainbus0: SUNW,Sun 4/75 (W8601/8701 or MB86903 @ 40 MHz, on-chip FPU)
>  Jun  8 14:09:37 aardvark /netbsd: cpu0: cache chip bug; trap page uncached
>  Jun  8 14:09:37 aardvark /netbsd: cpu0: 65536 byte write-through, 32 bytes/line, hw flush cache enabled
>  Jun  8 14:09:37 aardvark /netbsd: memreg0 at mainbus0 ioaddr 0xf4000000
>  Jun  8 14:09:37 aardvark /netbsd: clock0 at mainbus0 ioaddr 0xf2000000: mk48t02 (eeprom)
>  Jun  8 14:09:37 aardvark /netbsd: timer0 at mainbus0 ioaddr 0xf3000000
>  Jun  8 14:09:38 aardvark /netbsd: auxreg0 at mainbus0 ioaddr 0xf7400003
>  Jun  8 14:09:38 aardvark /netbsd: zs0 at mainbus0 ioaddr 0xf1000000 pri 12, softpri 6
>  Jun  8 14:09:38 aardvark /netbsd: zs1 at mainbus0 ioaddr 0xf0000000 pri 12, softpri 6
>  Jun  8 14:09:38 aardvark /netbsd: audio0 at mainbus0 ioaddr 0xf7201000 pri 13, softpri 4
>  Jun  8 14:09:38 aardvark /netbsd: sbus0 at mainbus0 ioaddr 0xf8000000: clock = 20 MHz
>  Jun  8 14:09:38 aardvark /netbsd: dma0 at sbus0 slot 0 offset 0x400000: rev 1+
>  Jun  8 14:09:38 aardvark /netbsd: esp0 at sbus0 slot 0 offset 0x800000 pri 3: ESP100 20Mhz, target 7
>  Jun  8 14:09:38 aardvark /netbsd: scsibus0 at esp0
>  Jun  8 14:09:38 aardvark /netbsd: esp0 targ 1 lun 0: <SEAGATE, ST42100, 7544> SCSI2 0/direct fixed
>  Jun  8 14:09:38 aardvark /netbsd: sd0 at scsibus0: 1812MB, 2574 cyl, 15 head, 96 sec, 512 bytes/sec
>  Jun  8 14:09:38 aardvark /netbsd: esp0 targ 3 lun 0: <SEAGATE, ST1480, 7336> SCSI2 0/direct fixed
>  Jun  8 14:09:38 aardvark /netbsd: sd1 at scsibus0: 411MB, 1476 cyl, 9 head, 63 sec, 512 bytes/sec
>  Jun  8 14:09:39 aardvark /netbsd: le0 at sbus0 slot 0 offset 0xc00000 pri 5: hardware address 00:00:3b:86:01:ee
>  Jun  8 14:09:39 aardvark /netbsd: cgsix0 at sbus0 slot 3 offset 0x0: SUNW,501-1672, 1152 x 900, rev 8
>  Jun  8 14:09:39 aardvark /netbsd: fdc0 at mainbus0 ioaddr 0xf7200000 pri 11, softpri 4: chip 82072
>  Jun  8 14:09:39 aardvark /netbsd: fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
>  Jun  8 14:09:37 aardvark savecore: no core dump
>  Jun  8 14:09:48 aardvark init: kernel security level changed from 0 to 1
>  Jun  8 14:10:10 aardvark /netbsd: vm_mmap: pmap botch!
>  Jun  8 14:10:11 aardvark last message repeated 2 times