Subject: Re: ESP SCSI controller errors?
To: Nathan Dorfman <nathan@rtfm.net>
From: john heasley <heas@shrubbery.net>
List: port-sparc
Date: 01/23/2002 21:36:25
Wed, Jan 23, 2002 at 11:46:25PM -0500, Nathan Dorfman:
> On Wed, Jan 23, 2002 at 11:57:18AM -0800, Andrey Petrov wrote:
> > If build.sh could be reduced to a small test which reproduces the problem
> 
> It's not anything specific to build.sh. Running a full build is just
> something to generate a lot of I/O. I've also chanced on these messages
> by doing a find, and in one case just cp'ing one file (~5MB) from /usr
> to /.
> 
> One thing that is now apparent is that it'll only happen once, when
> the system is freshly booted. It is 100% reproducible then: AFAIK it
> is guaranteed to occur after I boot, triggered by some I/O-bound
> task. But after that, I have not been able to reproduce it a second
> time, no matter how hard I try -- until the next time I boot.
> 
> Weird.
> 
> Sorry, but I know next to nothing about the workings of these drivers,
> and thus am at a complete loss as to what to look for or try next on my
> own. Anything else I can do to gather more info on whatever could be
> behind the complaints? Consider me at your disposal.
> 
> Privet,
> 

same here (with either the sbus hme/fas (sparc or ultra) or built-in ultra
FAS366 packages).  as far as i can tell, the controller starts off doing
tagged-queuing, this error occurs, tagged-queuing is disabled and it never
recurs.

i haven't been able to figure out what is happening in the driver, but
it seems possible to reproduce it by putting two drives on the u2's
scsibus0, rebooting, then running cp /netbsd /sd1, as in the two cases below.

esp0 at sbus0 slot 14 offset 0x8800000 vector 20 ipl 3: dma rev fas
esp0: FAS366/HME, 40MHz, SCSI ID 7
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 0 lun 0: <IBM, DCAS-34330W, S65A> SCSI2 0/direct fixed
sd0: 4134 MB, 8205 cyl, 6 head, 171 sec, 512 bytes/sect x 8467200 sectors
esp0: 16 bit mode
esp0: ti->flags & T_WIDE = 128, ti->width = 1
esp0: 16 bit mode
esp0: ti->flags & T_WIDE = 0, ti->width = 1
sd0: sync (100.0ns offset 15), 16-bit (20.000MB/s) transfers, tagged queueing
sd1 at scsibus0 target 1 lun 0: <IBM, DCAS-34330W, S65A> SCSI2 0/direct fixed
sd1: 4134 MB, 8205 cyl, 6 head, 171 sec, 512 bytes/sect x 8467200 sectors
esp0: 16 bit mode
esp0: ti->flags & T_WIDE = 128, ti->width = 1
esp0: 16 bit mode
esp0: ti->flags & T_WIDE = 0, ti->width = 1
sd1: sync (100.0ns offset 15), 16-bit (20.000MB/s) transfers, tagged queueing

esp0: error: csr=b2930a13<INT,ERR,DRAINING=0,IEN,ENDMA,DSBL_SCSI_DRN,BURST=0,TCI
esp0: DMA error; resetting
esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 101, resid 1f0
esp0: waiting for SCSI Bus Reset to happen

esp0: error: csr=b2930a13<INT,ERR,DRAINING=0,IEN,ENDMA,DSBL_SCSI_DRN,BURST=0,TCI
esp0: DMA error; resetting
esp0: !TC on DATA XFER [intr 10, stat 83, step 4] prevphase 101, resid 1f0
esp0: waiting for SCSI Bus Reset to happen
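
for comparing reports across machines, it may help to pull the numeric
fields out of the "!TC on DATA XFER" line. the following is only a rough
sketch in python, not anything from the driver: the names parse_tc_line
and TC_LINE are made up for illustration, and reading intr, stat and
resid as hexadecimal is an assumption (resid "1f0" can only be hex; the
radix of step and prevphase is not obvious from the log, so they are
kept as printed).

```python
import re

# Assumed layout of the driver's "!TC on DATA XFER" complaint, copied from
# the log lines quoted in this message. Hypothetical helper, not driver code.
TC_LINE = re.compile(
    r"^(?P<dev>\w+): !TC on DATA XFER "
    r"\[intr (?P<intr>[0-9a-f]+), stat (?P<stat>[0-9a-f]+), step (?P<step>\w+)\] "
    r"prevphase (?P<prevphase>\w+), resid (?P<resid>[0-9a-f]+)$"
)


def parse_tc_line(line):
    """Return the fields of one '!TC on DATA XFER' line, or None if no match."""
    m = TC_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "dev": m.group("dev"),
        # Assumption: these three values are printed in hexadecimal.
        "intr": int(m.group("intr"), 16),
        "stat": int(m.group("stat"), 16),
        "resid": int(m.group("resid"), 16),
        # Radix unknown from the log alone, so keep the raw text.
        "step": m.group("step"),
        "prevphase": m.group("prevphase"),
    }


def parse_log(log_text):
    """Collect every '!TC on DATA XFER' record found in a block of dmesg text."""
    records = []
    for line in log_text.splitlines():
        record = parse_tc_line(line)
        if record is not None:
            records.append(record)
    return records
```

feeding it both cases above shows whether the residual count and the
interrupt/status values are identical from boot to boot, which is the
sort of thing worth stating when reporting this.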