Subject: Re: Feb. 29 kernel
To: None <pk@cs.few.eur.nl>
From: Jarle Fredrik Greipsland <jarle@runit.sintef.no>
List: port-sparc
Date: 03/03/1996 19:46:14
Paul Kranenburg writes:
>> esp0 targ 4 lun 0: <MICROP, 1588-15MBSUN0669, SN0C> SCSI1 0/direct fixed
>> The other interesting new event is I now have a disappearing
>> file problem. On sd1, whenever I add or remove files, all the other
>> files in that directory disappear. If I unmount the drive, fsck, and
This is similar to the behavior you'll see with the aic6360 driver
without the patch in port-i386/875 applied. Actually, I sent what I
think is a better fix than the one in the original PR to gnats-admin,
and also warned that this bug may have been carried along into any
derived drivers. Seeing that the esp driver refers to the aic6360
driver, this may very well be the case.
> I have reason to believe that this drive contains a SCSI protocol bug.
Probably not. What takes place on the scsi bus, at least in my case
(an aic6360 + Quantum PD1800S), is:
o A scsi write command is started, it selects ok, identifies ok, and
the command block is correctly sent.
o The drive requests data, and the data is sent in one or more data
transfer phases, possibly with disconnects and reselections
intermingled. However, for the last data transfer phase, the drive
receives all the data it wants, but it cannot send COMMAND COMPLETE
to the initiator because the data hasn't hit the media yet.
Instead of hanging on to the bus, it sends a DISCONNECT message to
the initiator without sending a SAVE DATA POINTER message first. This
is _perfectly_valid_ behavior for a scsi device: it has gotten
all the data it needs, and thus doesn't need the pointers anymore.
Later, it reselects the initiator, which does the proper implicit
RESTORE POINTERS. The restored pointers won't be used for anything,
and the target sends GOOD status and COMMAND COMPLETE.
o Now the bug manifests itself. NOTE 32 in my draft of the scsi spec
says: "Since the data pointer value may be modified by the target
before the I/O-process ends, it should not be used to test for
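actual transfer length because it is not reliable." Contrary to
this recommendation, our drivers use the pointers (actually a
length count, but in this case it amounts to the same thing) to
check for complete transfers. After the implicit RESTORE POINTERS,
the count is back at its last saved value, so the driver reports a
nonzero residual for a write that actually completed. And it's
partly my fault.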
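The bus sequence above can be replayed with a small model of the
initiator's pointer bookkeeping. This is a hypothetical sketch, not
driver code: struct xfer_model, replay_old_resid() and the EV_* events
are made-up names, and the two counters only loosely mirror the
aic6360 driver's acb->data_length (saved pointer) and sc->sc_dleft
(active pointer). It assumes, as in the unpatched code, that the
residual is taken from the active count at COMMAND COMPLETE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of initiator-side SCSI pointer bookkeeping.
 * saved_left  ~ acb->data_length (the saved data pointer)
 * active_left ~ sc->sc_dleft     (the active data pointer)
 * These are illustrative names, not the real driver structures. */
enum bus_event {
    EV_DATA_OUT,      /* target accepts `count` bytes in a DATA OUT phase */
    EV_SAVE_POINTERS, /* SAVE DATA POINTER message */
    EV_DISCONNECT,    /* DISCONNECT message */
    EV_RESELECT,      /* reselection: implicit RESTORE POINTERS */
    EV_COMPLETE       /* GOOD status followed by COMMAND COMPLETE */
};

struct bus_step {
    enum bus_event ev;
    size_t count;     /* byte count, used only by EV_DATA_OUT */
};

struct xfer_model {
    size_t saved_left;
    size_t active_left;
};

/* Replay the steps for a transfer of datalen bytes and return the
 * residual that the old accounting reports: whatever the active count
 * holds at COMMAND COMPLETE. */
static size_t replay_old_resid(size_t datalen,
                               const struct bus_step *steps, size_t n)
{
    struct xfer_model x = { datalen, datalen };

    for (size_t i = 0; i < n; i++) {
        switch (steps[i].ev) {
        case EV_DATA_OUT:
            assert(steps[i].count <= x.active_left);
            x.active_left -= steps[i].count;
            break;
        case EV_SAVE_POINTERS:
            x.saved_left = x.active_left;
            break;
        case EV_DISCONNECT:
            /* Nothing is recorded here; without a prior SAVE DATA
             * POINTER the progress in active_left is about to be lost. */
            break;
        case EV_RESELECT:
            x.active_left = x.saved_left; /* implicit RESTORE POINTERS */
            break;
        case EV_COMPLETE:
            return x.active_left;
        }
    }
    return x.active_left;
}
```

Replaying a 4096-byte write as DATA OUT (4096), DISCONNECT, RESELECT,
COMPLETE returns 4096, a full residual for a transfer that actually
finished; putting a SAVE DATA POINTER before the DISCONNECT makes it
return 0.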
Old versions of the upper level scsi drivers, at least for disks, used
to ignore the value in xs->resid if the transfer completed
successfully. They don't do that anymore. The fix attached below
tries to preserve the transfer length accounting, but adds a boolean
to each transfer (null_left) that records whether the remaining
transfer count was 0 when the target disconnected. If it was, and the
status code is GOOD, just do xs->resid = 0. This fixes my problem, at
least. Something similar may work for the esp driver as well.
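The fix described above can be sketched as a small model. This is a
hypothetical illustration, not the patch itself: struct cmd_state and
the cmd_* helpers are invented names, with saved_left, active_left and
null_left standing in loosely for acb->data_length, sc->sc_dleft and
the patch's acb->null_left. The flag is set from the active count at
each DISCONNECT and, on GOOD status, forces the residual to 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-command state, loosely following the patch:
 * saved_left ~ acb->data_length, active_left ~ sc->sc_dleft,
 * null_left  ~ acb->null_left. Not the real struct aic_acb. */
struct cmd_state {
    size_t saved_left;
    size_t active_left;
    bool null_left;
};

static void cmd_start(struct cmd_state *c, size_t datalen)
{
    c->saved_left = c->active_left = datalen;
    c->null_left = false;            /* cleared for every new command */
}

static void cmd_data_out(struct cmd_state *c, size_t n)
{
    assert(n <= c->active_left);
    c->active_left -= n;
}

static void cmd_save_pointers(struct cmd_state *c)
{
    c->saved_left = c->active_left;
}

static void cmd_disconnect(struct cmd_state *c)
{
    /* Remember whether all data had been transferred at DISCONNECT. */
    c->null_left = (c->active_left == 0);
}

static void cmd_reselect(struct cmd_state *c)
{
    c->active_left = c->saved_left;  /* implicit RESTORE POINTERS */
}

/* Residual reported for a command that ended with GOOD status. */
static size_t cmd_resid_good(const struct cmd_state *c)
{
    return c->null_left ? 0 : c->active_left;
}
```

With this model, a 4096-byte write that disconnects without SAVE DATA
POINTER completes with a residual of 0 even though the restored count
is 4096, while a write that saved pointers after only 1024 bytes still
reports 3072.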
-jarle
--
"As far as I'm concerned, if something is so complicated that you can't
explain it in 10 seconds, then it's probably not worth knowing anyway"
-- Calvin
----------------------------------------------------------------
*** aic6360.c.orig Fri Oct 20 12:52:39 1995
--- aic6360.c Sat Feb 24 23:37:05 1996
***************
*** 460,463 ****
--- 460,464 ----
#define ACB_CHKSENSE 2
#define ACB_ABORTED 3
+ u_char null_left;
};
***************
*** 992,995 ****
--- 993,997 ----
acb->data_length = xs->datalen;
acb->target_stat = 0;
+ acb->null_left = 0;
s = splbio();
***************
*** 1265,1268 ****
--- 1267,1271 ----
acb->data_addr = (char *)&xs->sense;
acb->data_length = sizeof(struct scsi_sense_data);
+ acb->null_left = 0;
acb->flags = ACB_CHKSENSE;
ti->senses++;
***************
*** 1275,1279 ****
return;
} else {
! xs->resid = acb->data_length;
}
}
--- 1278,1288 ----
return;
} else {
! /* This is a successfully completed command, but
! * warn if we haven't seen data_length == 0
! */
! xs->resid = acb->null_left ? 0 : acb->data_length;
! if (xs->resid)
! printf("%s: warning: xfer didn't complete\n",
! sc->sc_dev.dv_xname);
}
}
***************
*** 1466,1470 ****
acb->data_length = 0;
}
! acb->xs->resid = acb->data_length = sc->sc_dleft;
sc->sc_state = AIC_CMDCOMPLETE;
break;
--- 1475,1479 ----
acb->data_length = 0;
}
! acb->data_length = sc->sc_dleft;
sc->sc_state = AIC_CMDCOMPLETE;
break;
***************
*** 1512,1515 ****
--- 1521,1525 ----
ti->dconns++;
sc->sc_state = AIC_DISCONNECT;
+ acb->null_left = sc->sc_dleft == 0;
break;