Subject: Re: ccd/SCSI error
To: Justin T. Gibbs <gibbs@caspian.plutotech.com>
From: Erik Rungi <blackbox@openface.ca>
List: port-i386
Date: 06/29/1999 06:24:31
Ok I have some follow-up information...

After running 12 hours straight with only that one error, I got

Jun 28 18:11:38 what /netbsd: sd2(ahc0:2:0):  DEFERRED ERROR, key = 0x4
Jun 28 18:11:38 what /netbsd: sd2(ahc0:2:0):  Check Condition on CDB: 0x08 03 08 90 06 00
Jun 28 18:11:38 what /netbsd:     SENSE KEY:  Hardware Error
Jun 28 18:11:38 what /netbsd:    INFO FIELD:  4046568
Jun 28 18:11:38 what /netbsd:      ASC/ASCQ:  Mechanical Positioning Error
Jun 28 18:11:38 what /netbsd:      FRU CODE:  0x1
Jun 28 18:11:38 what /netbsd: 
Jun 28 18:11:43 what /netbsd: sd2(ahc0:2:0):  DEFERRED ERROR, key = 0x4
Jun 28 18:11:43 what /netbsd: sd2(ahc0:2:0):  Check Condition on CDB: 0x08 04 19 e0 04 00
Jun 28 18:11:43 what /netbsd:     SENSE KEY:  Hardware Error
Jun 28 18:11:43 what /netbsd:    INFO FIELD:  4046569
Jun 28 18:11:43 what /netbsd:      ASC/ASCQ:  Peripheral Device Write Fault
Jun 28 18:11:43 what /netbsd:      FRU CODE:  0x10
Jun 28 18:11:43 what /netbsd:          SKSV:  Actual Retry Count: 24
Jun 28 18:11:43 what /netbsd: 

Except not just once: the "Mechanical Positioning Error" happened 258 times
in 10 minutes, while the "Peripheral Device Write Fault" happened only 75
times.  Hmm.  The errors seemed more or less unprovoked to me.
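
For anyone who wants to get counts like these out of their own logs, here is
a minimal sketch. It assumes the sense dump has the same layout as the lines
pasted above (an "ASC/ASCQ:" label followed by the description); the names
tally_asc and sample are made up for illustration.

```python
import re
from collections import Counter

# Matches the ASC/ASCQ line of a sense dump as it appears in the log above.
# Assumption: the description runs to the end of the line.
ASC_LINE = re.compile(r"ASC/ASCQ:\s+(.+?)\s*$")

def tally_asc(lines):
    """Count how often each ASC/ASCQ description appears in syslog lines."""
    counts = Counter()
    for line in lines:
        m = ASC_LINE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Three lines copied from the log above; only two carry an ASC/ASCQ field.
sample = [
    "Jun 28 18:11:38 what /netbsd:      ASC/ASCQ:  Mechanical Positioning Error",
    "Jun 28 18:11:38 what /netbsd:      FRU CODE:  0x1",
    "Jun 28 18:11:43 what /netbsd:      ASC/ASCQ:  Peripheral Device Write Fault",
]
print(tally_asc(sample))
```

The function takes any iterable of lines, so an open log file can be passed
in directly instead of the sample list.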

Erik made angry by silly errors from drive.  Erik remove drive from system.

I suspect that either my sd2 was about to bite it, or that I have some serious
heat dissipation problems with my setup with sd2, or a bad connector cable, or
something.  Regardless, that'll be enough of that for a while.  I need a bit
of sleep this week.  Later this week I think I'll try again with a new drive
that has exactly the same geometry as my other two.

sd2 was a new drive, the other two (sd0 and sd1) have been happily running as
parts of a NetBSD 1.3 ccd for over a year now.

Must sleep...

Erik

On Mon, 28 Jun 1999, Justin T. Gibbs wrote:

> > > Can anyone enlighten me as to what this means:
> > > 
> > > sd2(ahc0:2:0):  DEFERRED ERROR, key = 0x4
> 		    ^^^^^^^^^^^^^^
> 
> >The CCD failed to write the data, because the underlying disk failed to
> >write the data; if the error is recovered from, EIO isn't (supposed to be)
> >returned.
> 
> Unfortunately, CCD found out about the error too late.  This was a deferred
> error, so the original write was reported as successfully completed to CCD
> before it actually hit the media.  It is hard to say what the correct
> way to deal with deferred errors is (the original client of the write could
> be gone now).
> 
> --
> Justin
> 
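
To illustrate Justin's point that a deferred error shows up after the fact:
the CDBs printed in my log can be decoded, and opcode 0x08 is READ(6), not a
write. In other words the drive reported the failed write against a later,
unrelated command. A rough sketch follows, assuming the kernel prints the six
CDB bytes in order with only the first one prefixed by "0x"; decode_cdb6 and
OPCODES_6 are names I made up for this example.

```python
# Opcodes of the 6-byte read and write commands for SCSI disks.
OPCODES_6 = {0x08: "READ(6)", 0x0A: "WRITE(6)"}

def decode_cdb6(cdb):
    """Decode a 6-byte READ(6)/WRITE(6) CDB into (name, LBA, block count)."""
    if len(cdb) != 6:
        raise ValueError("expected a 6-byte CDB")
    name = OPCODES_6.get(cdb[0], "opcode 0x%02x" % cdb[0])
    # The LBA is 21 bits: the low 5 bits of byte 1, then bytes 2 and 3.
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    # A transfer length of 0 means 256 blocks for these commands.
    blocks = cdb[4] or 256
    return name, lba, blocks

# The two CDBs from the log, written as plain hex strings.
first = decode_cdb6(bytes.fromhex("080308900600"))
second = decode_cdb6(bytes.fromhex("080419e00400"))
print(first)
print(second)
```

Both decode as reads, and the block addresses they carry are not the ones in
the INFO FIELD lines, which fits the idea that the sense data describes an
earlier write rather than the command it was attached to.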

Openface Internet Inc.                                          Erik Rungi
Montreal, Canada                                        rungus@openface.ca
(514) 281-8585                                          Technical Director
Web Services, Software Development                            OpenFace INC