Subject: Re: DS2100/3100 SCSI problem (still?)
To: Michael L. Hitch <osymh@lightning.oscs.montana.edu>
From: Matt Beal <beal@umiacs.UMD.EDU>
List: port-pmax
Date: 07/21/1996 14:49:12

osymh@lightning.oscs.montana.edu (Michael L. Hitch) writes:

> I've just experienced what appears to be an old problem with the
> DS2100/3100 SII SCSI driver.  The problem seems to only show up on
> certain disks - I can't replicate it with the RZ23 and RZ24 I'm
> currently using, but found it on a Fujitsu "M2261S-512".  The problem
> appears to be related to reading data:  corrupt data appears while
> reading from a file.

Yes! I thought I was just going crazy. :)

> This was first noticed when another person here
> was trying to untar the snapshots using the Fujitsu on a DS2100.  I put
> the drive on a DS5000/25 and was able to untar all the snapshot files
> with no problem.  I then put the drive on a DS2100 (and DS3100 as well)
> and experienced the same problem reading the snapshot files.

I've been having this problem on a 3100 with several RZ55s. It's been
too infrequent and random to pinpoint exactly, though. At first I thought
it might be NFS bugs, since the corruption appeared in a copy of the
source tree I tarred and extracted from an NFS-mounted filesystem, but
now that I think about it, I've had it happen after the copy was made as
well.

In most cases, it looked like a block of one file had ended up embedded
in another: a .c file would have about 512 bytes of another .c file, or
of a man page, stuck in the middle of it.  In at least one case, however,
the damaged portion was total garbage; the file may have been
cross-linked with some compiled part of the tree.

> It looks like the SII SCSI driver is corrupting the read data randomly
> once in a while.  I haven't been able to figure out what might be
> happening or how to debug it yet.  I no longer have the disk that was
> showing this problem.  If I can locate a disk that exhibits the problem
> and can keep it long enough, I'll try to find the problem.

Now that I know I wasn't just going batty, I'll take the time to poke
at it and see if I can get it to corrupt data in a repeatable sort of
way.
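
A rough sketch of what such a repeatable test might look like, in Python.
Nothing here comes from the thread itself: the 512-byte block size is only
a guess based on the size of the damaged regions described above, and the
block layout and the names (make_block, write_pattern, verify_pattern) are
hypothetical. Each block carries a file tag and its own index, so a
readback can tell a block that landed in the wrong place apart from plain
garbage.

```python
import os
import struct

BLOCK_SIZE = 512                  # assumption: matches the ~512-byte damage seen
MAGIC = b"BLK0"                   # hypothetical marker for a pattern block
HEADER = struct.Struct(">4sII")   # magic, file tag, block index (12 bytes)


def make_block(tag, index):
    """Build one self-describing block: header plus deterministic filler."""
    header = HEADER.pack(MAGIC, tag, index)
    filler_len = BLOCK_SIZE - HEADER.size
    filler = bytes((tag + index + i) & 0xFF for i in range(filler_len))
    return header + filler


def write_pattern(path, tag, nblocks):
    """Write nblocks pattern blocks to path and push them out to the disk."""
    with open(path, "wb") as f:
        for index in range(nblocks):
            f.write(make_block(tag, index))
        f.flush()
        os.fsync(f.fileno())


def classify(block, tag, index):
    """Return 'ok', 'misplaced' (valid block from elsewhere), or 'garbage'."""
    if block == make_block(tag, index):
        return "ok"
    if len(block) == BLOCK_SIZE:
        magic, found_tag, found_index = HEADER.unpack_from(block)
        if magic == MAGIC and block == make_block(found_tag, found_index):
            return "misplaced"
    return "garbage"


def verify_pattern(path, tag, nblocks):
    """Read the file back and list (index, kind) for every bad block."""
    bad = []
    with open(path, "rb") as f:
        for index in range(nblocks):
            kind = classify(f.read(BLOCK_SIZE), tag, index)
            if kind != "ok":
                bad.append((index, kind))
    return bad
```

One caveat: a file read straight back after writing will most likely be
served from the buffer cache rather than the disk, so the test files
would need to be larger than memory, or the filesystem unmounted and
remounted between the write and verify passes. Writing several files
with different tags should also make it possible to see which file a
misplaced block came from.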

> [I did see the references to disabling the clustered disk I/O in the
> port-pmax archives, but I didn't get a chance to see if that changed the
> problem.]

If I can find out how to repeat it, I'll give this a try.

matt