Subject: hp300 SCSI & async I/O errors
To: None <port-hp300@axel.home, tech-kern@axel.home>
From: Steve Peurifoy <sp128@ibm.net>
List: port-hp300
Date: 08/11/1997 00:15:26

Hi all,

A while back David Jones (dej@ox.org) submitted a PR regarding a situation
in which a disk was not recognized at boot because the hp300 SCSI
driver set a selection timeout that was too short for the disk.

I bring this up because I just spent a few days tracking down a problem
that turned out to have the same cause, and I felt I should point out that
the effects can, in some cases, be a bit more severe.

In my case, a recent-vintage Seagate Hawk (ST32151N) connected to a 380
was experiencing random, silent FFS corruption.  It seems that the Hawk
takes longer than 2 ms to get through the selection phase under some
(but not all) conditions, and the result is, of course, an I/O error
on the transfer.  Unless DEBUG is defined, the driver doesn't log
anything in this situation (although mine now does :-).  I set my
selection timeout to ~6 ms (hd->scsi_tcm = 100 in issue_select()) and
things work much better.  Yes, it's a hack.

The larger issue (and the reason I'm bothering tech-kern with this) is the
handling of I/O errors for asynchronous and delayed writes.  Am I missing
something, or is nothing done with them?  Would it be a bad idea for
biodone() to log something (or panic) if B_ERROR is set on an async write?
Should it be handled elsewhere?  Can someone lend me a clue?
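
To make the question concrete, here is a rough user-space sketch of the
kind of check I have in mind.  It is not the actual biodone() code: the
struct buf below is a cut-down stand-in, the flag values are made up for
the sketch (the real ones come from <sys/buf.h>), and async_write_failed()
is a hypothetical helper name.  It relies only on the usual convention
that a write is a buffer without B_READ set.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in flag values for this sketch only; real ones are in <sys/buf.h>. */
#define B_READ   0x01   /* transfer is a read (writes have this bit clear) */
#define B_ASYNC  0x02   /* nobody is sleeping in biowait() for this buffer */
#define B_ERROR  0x04   /* the driver reported an I/O error */

/* Cut-down stand-in for the kernel's struct buf. */
struct buf {
	long b_flags;   /* B_* flags */
	int  b_error;   /* errno value set by the driver */
	long b_blkno;   /* block number of the transfer */
};

/*
 * Hypothetical helper: return 1 (and log) if bp describes an asynchronous
 * write that completed with an error, 0 otherwise.  A check like this
 * could sit at I/O completion time, where the error would otherwise be
 * dropped because no caller is waiting on the buffer.
 */
static int
async_write_failed(const struct buf *bp)
{
	assert(bp != NULL);

	/* Must be async, must have failed, and must not be a read. */
	if ((bp->b_flags & (B_READ | B_ASYNC | B_ERROR)) ==
	    (B_ASYNC | B_ERROR)) {
		fprintf(stderr, "async write error %d, blkno %ld\n",
		    bp->b_error, bp->b_blkno);
		return 1;
	}
	return 0;
}
```

Whether the right response is a log message, a panic, or something
smarter (retrying, or marking the buffer so a later fsync can report it)
is exactly what I am unsure about; the sketch only shows where the
error could be noticed.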

Thanks,

Steve