Subject: Re: Feb. 29 kernel
To: None <pk@cs.few.eur.nl>
From: Jarle Fredrik Greipsland <jarle@runit.sintef.no>
List: port-sparc
Date: 03/03/1996 19:46:14
Paul Kranenburg writes:
>> esp0 targ 4 lun 0: <MICROP, 1588-15MBSUN0669, SN0C> SCSI1 0/direct fixed
>> The other interesting new event is I now have a disappearing
>> file problem. On sd1, whenever I add or remove files, all the other
>> files in that directory disappear. If I unmount the drive, fsck, and
This is similar to what you'll see with the aic6360 driver when the
patch in port-i386/875 is not applied.  Actually, I sent gnats-admin
what is IMO a better fix than the one in the original PR, and also
warned that this bug may have been carried over into any derived
drivers.  Since the esp driver refers to the aic6360 driver, this may
very well be the case here.

> I have reason to believe that this drive contains a SCSI protocol bug.
Probably not.  What takes place on the scsi bus, at least in my case
(an aic6360 + Quantum PD1800S), is:

 o A scsi write command is started, it selects ok, identifies ok, and
   the command block is correctly sent.
   
 o The drive requests data, and the data is sent in one or more data
   transfer phases, possibly with disconnects and reselections
   intermingled.  However, for the last data transfer phase, the drive
   receives all the data it wants, but it cannot send COMMAND COMPLETE
   to the initiator because the data hasn't hit the media yet.
   Instead of hanging on to the bus, it sends a DISCONNECT message to
   the initiator without sending a SAVE DATA POINTER message first.
   This is _perfectly_valid_behavior_ for a scsi device: it has gotten
   all the data it needs, and thus doesn't need the pointers anymore.
   Later, it reconnects, the initiator does the proper implicit
   RESTORE POINTERS, but they won't be used for anything, and the
   target device sends GOOD status and COMMAND COMPLETE.
   
 o Now the bug manifests itself.  NOTE 32 in my draft of the scsi spec
   says: "Since the data pointer value may be modified by the target
   before the I/O-process ends, it should not be used to test for
   actual transfer length because it is not reliable."  Contrary to
   this recommendation, our drivers use the pointers (actually a
   length count, but in this case it amounts to the same thing) to
   check for complete transfers.  And it's partly my fault.
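
The sequence above can be sketched as a small stand-alone model.  This
is only an illustration, not driver code: struct xfer_model, the
xfer_* helpers and demo_write() are hypothetical names, and the
"pointers" are reduced to byte counts (saved vs. active bytes left).

```c
#include <stddef.h>

/* Hypothetical model of the initiator's pointer state for one command. */
struct xfer_model {
	size_t saved_left;	/* saved data pointer, as bytes left */
	size_t active_left;	/* active data pointer, as bytes left */
};

static void xfer_start(struct xfer_model *x, size_t len)
{
	x->saved_left = x->active_left = len;
}

/* A data phase moves n bytes and advances only the active pointer. */
static void xfer_data_phase(struct xfer_model *x, size_t n)
{
	if (n > x->active_left)
		n = x->active_left;
	x->active_left -= n;
}

/* SAVE DATA POINTER: copy the active pointer to the saved pointer. */
static void xfer_save_pointers(struct xfer_model *x)
{
	x->saved_left = x->active_left;
}

/* Reselection: implicit RESTORE POINTERS. */
static void xfer_reselect(struct xfer_model *x)
{
	x->active_left = x->saved_left;
}

/* What a driver sees if it trusts the count at COMMAND COMPLETE. */
static size_t xfer_naive_resid(const struct xfer_model *x)
{
	return x->active_left;
}

/* 8192-byte write in two data phases with a disconnect after each. */
static size_t demo_write(int save_before_last_disconnect)
{
	struct xfer_model x;

	xfer_start(&x, 8192);
	xfer_data_phase(&x, 4096);
	xfer_save_pointers(&x);		/* mid-transfer disconnect */
	xfer_reselect(&x);
	xfer_data_phase(&x, 4096);	/* target now has all the data */
	if (save_before_last_disconnect)
		xfer_save_pointers(&x);
	xfer_reselect(&x);		/* stale saved count comes back */
	return xfer_naive_resid(&x);
}
```

In this model demo_write(0) leaves a count of 4096 even though every
byte was transferred, while demo_write(1) leaves 0.  The first case is
the one the drive produces.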

Old versions of the upper level scsi drivers, at least for disks, used
to ignore the value in xs->resid if the transfer completed
successfully.  They don't do that anymore.  The fix attached below
tries to preserve the transfer length accounting, but keeps around a
boolean value for each transfer that says whether the transfer count
has ever been 0.  If it has once been 0, and the status code is GOOD,
the driver just sets xs->resid = 0.  This fixes my problem, at least.
Something similar may work for the esp driver as well.
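
Stripped of the driver details, the idea looks roughly like the sketch
below.  It is a simplified model, not the patch itself: struct
cmd_model and the cmd_* functions are hypothetical, the field names
only echo acb->data_length, sc->sc_dleft and acb->null_left, and the
handling of a non-GOOD status is an assumption.  As in the patch, the
flag is recorded at disconnect time.

```c
#include <stddef.h>

#define STATUS_GOOD	0x00	/* SCSI GOOD status byte */

/* Hypothetical per-command state, loosely modelled on the acb. */
struct cmd_model {
	size_t data_length;	/* saved bytes left */
	size_t dleft;		/* active bytes left */
	int null_left;		/* active count was 0 at disconnect */
};

static void cmd_init(struct cmd_model *c, size_t len)
{
	c->data_length = c->dleft = len;
	c->null_left = 0;
}

static void cmd_transfer(struct cmd_model *c, size_t n)
{
	if (n > c->dleft)
		n = c->dleft;
	c->dleft -= n;
}

static void cmd_save_pointers(struct cmd_model *c)
{
	c->data_length = c->dleft;
}

/* Remember whether everything had been moved when the target left. */
static void cmd_disconnect(struct cmd_model *c)
{
	c->null_left = (c->dleft == 0);
}

/* Implicit RESTORE POINTERS on reselection. */
static void cmd_reselect(struct cmd_model *c)
{
	c->dleft = c->data_length;
}

/* Residual to report when the command completes. */
static size_t cmd_done(struct cmd_model *c, int status)
{
	c->data_length = c->dleft;
	if (status == STATUS_GOOD && c->null_left)
		return 0;	/* count is stale; transfer was complete */
	return c->data_length;	/* assumption: otherwise trust the count */
}

/* Full transfer, then DISCONNECT without saving the pointers. */
static size_t demo_full(void)
{
	struct cmd_model c;

	cmd_init(&c, 8192);
	cmd_transfer(&c, 8192);
	cmd_disconnect(&c);
	cmd_reselect(&c);
	return cmd_done(&c, STATUS_GOOD);
}

/* Genuinely short transfer: only half the data was ever moved. */
static size_t demo_short(void)
{
	struct cmd_model c;

	cmd_init(&c, 8192);
	cmd_transfer(&c, 4096);
	cmd_save_pointers(&c);
	cmd_disconnect(&c);
	cmd_reselect(&c);
	return cmd_done(&c, STATUS_GOOD);
}
```

Here demo_full() reports a residual of 0 despite the restored count,
while demo_short() still reports 4096, so real short transfers are not
hidden.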

					-jarle
-- 
"As far as I'm concerned,  if something is so complicated that you can't
 explain it in 10 seconds, then it's probably not worth knowing anyway"
                                        -- Calvin
----------------------------------------------------------------
*** aic6360.c.orig	Fri Oct 20 12:52:39 1995
--- aic6360.c	Sat Feb 24 23:37:05 1996
***************
*** 460,463 ****
--- 460,464 ----
  #define ACB_CHKSENSE	2
  #define	ACB_ABORTED	3
+ 	u_char null_left;
  };
  
***************
*** 992,995 ****
--- 993,997 ----
  	acb->data_length = xs->datalen;
  	acb->target_stat = 0;
+ 	acb->null_left = 0;
  
  	s = splbio();
***************
*** 1265,1268 ****
--- 1267,1271 ----
  			acb->data_addr = (char *)&xs->sense;
  			acb->data_length = sizeof(struct scsi_sense_data);
+ 			acb->null_left = 0;
  			acb->flags = ACB_CHKSENSE;
  			ti->senses++;
***************
*** 1275,1279 ****
  			return;
  		} else {
! 			xs->resid = acb->data_length;
  		}
  	}
--- 1278,1288 ----
  			return;
  		} else {
! 			/* This is a successfully completed command, but
! 			 * warn if we haven't seen data_length == 0
! 			 */
! 			xs->resid = acb->null_left ? 0 : acb->data_length;
! 			if (xs->resid)
! 				printf("%s: warning: xfer didn't complete\n",
! 					sc->sc_dev.dv_xname);
  		}
  	}
***************
*** 1466,1470 ****
  				acb->data_length = 0;
  			}
! 			acb->xs->resid = acb->data_length = sc->sc_dleft;
  			sc->sc_state = AIC_CMDCOMPLETE;
  			break;
--- 1475,1479 ----
  				acb->data_length = 0;
  			}
! 			acb->data_length = sc->sc_dleft;
  			sc->sc_state = AIC_CMDCOMPLETE;
  			break;
***************
*** 1512,1515 ****
--- 1521,1525 ----
  			ti->dconns++;
  			sc->sc_state = AIC_DISCONNECT;
+ 			acb->null_left = sc->sc_dleft == 0;
  			break;