Subject: Re: SCSI reset hang on sparc64 1.5X and sunos binary
To: None <abs@formula1.com, port-sparc64@netbsd.org>
From: None <eeh@netbsd.org>
List: port-sparc64
Date: 09/22/2001 22:32:16
| 	I've just had a couple of hangs on my Ultra1 (the first time
| 	it's misbehaved under 1.5X).
|
| 	Both times were while running the SunOS 4.x netscape binary, and
| 	both resulted in a complete hang short of L1+A
|
| 	The last dmesg entries were:
|
| esp0: error:
| csr=b2930a13<INT,ERR,DRAINING=0,IEN,ENDMA,DSBL_SCSI_DRN,BURST=0,TCI
| esp0: DMA error; resetting
| esp0: !TC on DATA XFER [intr 10, stat 87, step 4] prevphase 101, resid 1f0
| esp0: waiting for SCSI Bus Reset to happen
|
| 	and trace reports (all via hamfisted c&p):
|
| zsc_intr_hard()
| zshard()
| intr_list_handler()
| sparc_intr_retry(5a35ec8f, 0, 5a35ec8c, 2182620, 0, ffffffff) at sparc_intr_retry+0x48
| soreceive(216de80, 1854040, eb85ac0, 2182620, 0, 18540b8) at soreceive+0x7b4
| soo_read(893d70, e893da0, eb85ac0, 2170d00, 1, 104f420) at soo_read+0x20
| dofileread(e8a16b0, b, 8903d70, fffffffd, fffffffd, e893da0) at dofileread+0x8c
| sys_read(e8a16b0,  eb85c80, eb85dc0, fffffffd, 0, ffffb090) at sys_read+0x58
| netbsd32_read(e8a16b0, eb85dd0, eb85dc0, 1152760, 800, 1) at netbsd32_read+0x24
| syscall(eb85ed0, 3, 0, 40c515d4, 0, 775) at syscall+0x304
| syscall_setup(b, ffffb68b, fffffffd, 23, f75800, 1) at syscall_setup+0x12c
|
| 	I'm running two wide SUN2.1G disks, and disk activity was light
| 	(I've run both into the ground for sustained periods without incident
| 	otherwise).
|
| 	Does anyone have any thoughts on what might be up, or any additional
| 	information I could get which might help?

Well, it appears you're getting some sort of DMA error.  This is usually 
caused by either DMA to a page that's not mapped by the IOMMU or an error
from the memory controller.
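
For reference, the flags in the csr line of your dmesg can be checked by
hand.  The following is only a rough sketch: the mask names are made up
for illustration, and the bit positions are assumptions inferred from the
decode printed in your log (INT, ERR, DRAINING=0, IEN, ENDMA), not copied
from the driver headers, so verify them against the lsi64854 register
definitions before relying on them.

```c
#include <stdint.h>

/*
 * Hypothetical mask names.  Bit positions are ASSUMED from the decode
 * string in the dmesg output above; check the real register header.
 */
#define CSR_INT_PEND  0x00000001u  /* "INT": interrupt pending (assumed bit 0) */
#define CSR_ERR_PEND  0x00000002u  /* "ERR": error pending (assumed bit 1) */
#define CSR_DRAINING  0x0000000cu  /* "DRAINING": 2-bit field (assumed bits 2-3) */
#define CSR_INT_EN    0x00000010u  /* "IEN": interrupts enabled (assumed bit 4) */
#define CSR_EN_DMA    0x00000200u  /* "ENDMA": DMA enabled (assumed bit 9) */

/* Returns 1 if the error-pending bit is set in the given csr value. */
static int
csr_err_pending(uint32_t csr)
{
	return (csr & CSR_ERR_PEND) != 0;
}

/* Returns the DRAINING field, shifted down to bit 0. */
static unsigned int
csr_draining(uint32_t csr)
{
	return (csr & CSR_DRAINING) >> 2;
}
```

With the value from your log, csr_err_pending(0xb2930a13) is nonzero and
csr_draining(0xb2930a13) is 0, which agrees with the "ERR" and
"DRAINING=0" entries in the printed decode.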

I'd suggest running diagnostics on your memory subsystem.  If that checks
out, insert a breakpoint in lsi64854_scsi_intr() where it prints out the
DMA error message, then enter the PROM and dump the IOMMU fault status
and fault address registers.  Alternatively, you can add async fault
interrupt handlers to the sysio driver similar to the ones in psycho.

Eduardo