Subject: A flaw in the handling of removable SCSI devices?
To: None <current-users@NetBSD.ORG>
From: John F. Woods <jfw@jfwhome.funhouse.com>
List: current-users
Date: 09/21/1996 10:05:44
I have a Syquest EZ-135 drive, which once in a great while appears to
return UNIT ATTENTION for no obvious reason.

messages.5:Sep 18 15:40:01 jfwhome /netbsd: sd4(aha0:4:0): unit attention, data = 00 00 00 00 00 00 00 00 00 00 00 00 00 00

If it is mounted at the time, this renders the drive unusable until I reboot.

scsi_base.c appears to assume that if a drive is a removable drive,
then UNIT ATTENTION is only issued for media changes.  The draft
SCSI-2 standard is available on-line (hurray), and lists several other
conditions that result in UNIT ATTENTION (none of them plausible,
though, except perhaps a target reset condition or the helpfully
non-specific "Any other event occurs that requires the attention of
the initiator.").  Then the fun begins: you cannot do a new open on
the device file until you have closed all the current opens.  You
cannot do any further I/O until SDEV_MEDIA_LOADED becomes true, which
it won't until the device is closed and then opened.  And (here's the
booby prize) if the drive was mounted, you cannot unmount it because
the write of the superblock fails (not even with umount -f), therefore
you cannot close the device, therefore it's time to reboot.
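
As a rough illustration of the sequence just described, here is a toy
model in C.  It is not the real sd driver or scsi_base.c code: the
struct, the function names (toy_sd, toy_open, toy_io, toy_umount) and
the choice of EIO as the error are all assumptions made for the sketch.
Only the SDEV_MEDIA_LOADED idea comes from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of the device state described above; NOT the real driver. */
struct toy_sd {
    int  open_count;    /* current opens, including the one held by a mount */
    bool media_loaded;  /* stands in for SDEV_MEDIA_LOADED */
};

/* UNIT ATTENTION on a removable drive is treated as a media change. */
void toy_unit_attention(struct toy_sd *sd)
{
    sd->media_loaded = false;
}

/* A new open is refused while stale opens remain; a first open reloads. */
int toy_open(struct toy_sd *sd)
{
    if (!sd->media_loaded && sd->open_count > 0)
        return EIO;             /* errno value is an assumption */
    sd->media_loaded = true;
    sd->open_count++;
    return 0;
}

/* All I/O fails until the media is considered loaded again. */
int toy_io(const struct toy_sd *sd)
{
    return sd->media_loaded ? 0 : EIO;
}

/* Unmount must write the superblock before it may close the device. */
int toy_umount(struct toy_sd *sd)
{
    assert(sd->open_count > 0);
    int error = toy_io(sd);     /* the superblock write */
    if (error != 0)
        return error;           /* still mounted, device still open */
    sd->open_count--;           /* the close that would allow a reopen */
    return 0;
}
```

After toy_unit_attention(), every path out of the stuck state needs one
of the other operations to succeed first, which is the cycle that ends
in a reboot.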

I see two problems: (1) UNIT ATTENTION does not mean *only* that the
media has changed (though unfortunately there seems to be no way to
determine what it *does* mean in any given instance).  (2) If the
process of unmounting a filesystem gets an I/O error, leaving the
filesystem mounted is not necessarily a good strategy -- it's unlikely
that the I/O error will go away, though here is a case where
preventing the unmount is the only thing preventing the I/O error from
going away! At the very least, unmount should accept an additional
flag, MNT_FORCE_DAMNIT, to tell unmount to do its best to write out
data but to unmount the device and unbusy it unconditionally even if
the I/O fails.
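
The semantics proposed for such a flag might look something like the
following sketch.  Everything in it is hypothetical: toy_mount,
toy_unmount and TOY_FORCE_UNCONDITIONAL are stand-ins invented for
illustration (the last one for the proposed MNT_FORCE_DAMNIT), and none
of it is existing NetBSD kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the proposed MNT_FORCE_DAMNIT. */
#define TOY_FORCE_UNCONDITIONAL 0x1

/* Toy stand-in for a mounted filesystem; not a NetBSD structure. */
struct toy_mount {
    bool mounted;                     /* filesystem is attached */
    bool device_open;                 /* mount keeps the device busy */
    int (*sync)(struct toy_mount *);  /* write out data; 0 or an errno */
};

/* A sync routine that always fails, like a drive refusing all I/O. */
int toy_sync_fails(struct toy_mount *mp) { (void)mp; return EIO; }

/* A sync routine that succeeds. */
int toy_sync_ok(struct toy_mount *mp) { (void)mp; return 0; }

int toy_unmount(struct toy_mount *mp, int flags)
{
    assert(mp != NULL && mp->sync != NULL);

    int error = mp->sync(mp);         /* best effort to write data out */
    if (error != 0 && !(flags & TOY_FORCE_UNCONDITIONAL))
        return error;                 /* stay mounted, device stays busy */

    /* Detach and release the device whether or not the write worked. */
    mp->mounted = false;
    mp->device_open = false;
    return error;                     /* still report the failed write */
}
```

Returning the error even on a forced unmount is a design choice of the
sketch: the caller learns that data may have been lost, but the device
is no longer held busy and can be closed and reopened.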

I'm not sure what the right thing to do about (1) is.  Since SCSI
devices can't inform the host that the media has JUST changed, it's
not like you can try a TEST UNIT READY right afterward to check that
the "media change" is real.  (2) seems straightforward, if not
necessarily simple.
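
One possible angle on (1): SCSI-2 extended sense data carries an
additional sense code and qualifier in bytes 12 and 13, and the
standard assigns 28h/00h to "not ready to ready transition (medium may
have changed)" and 29h/00h to "power on, reset, or bus device reset
occurred".  Whether a given drive fills these in is another matter; a
drive that returns 00h/00h provides no extra information.  The sketch
below shows how a driver could tell the cases apart when the bytes are
present.  The names (ua_cause, classify_unit_attention) are
hypothetical and this is not code from scsi_base.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ua_cause {
    UA_NOT_UNIT_ATTENTION,      /* not extended sense, or other sense key */
    UA_MEDIA_MAY_HAVE_CHANGED,  /* ASC/ASCQ 28h/00h */
    UA_RESET,                   /* ASC/ASCQ 29h/00h */
    UA_UNSPECIFIED,             /* ASC/ASCQ missing or 00h/00h */
    UA_OTHER                    /* some other additional sense code */
};

/* Classify a buffer of SCSI-2 extended sense data (error code 70h/71h). */
enum ua_cause classify_unit_attention(const uint8_t *sense, size_t len)
{
    assert(sense != NULL);

    if (len < 3)
        return UA_NOT_UNIT_ATTENTION;

    uint8_t code = sense[0] & 0x7f;          /* strip the Valid bit */
    if (code != 0x70 && code != 0x71)
        return UA_NOT_UNIT_ATTENTION;
    if ((sense[2] & 0x0f) != 0x06)           /* sense key 6h = UNIT ATTENTION */
        return UA_NOT_UNIT_ATTENTION;

    /* Byte 7 is the additional sense length; bytes 12-13 need >= 6. */
    if (len < 14 || sense[7] < 6)
        return UA_UNSPECIFIED;

    uint8_t asc = sense[12], ascq = sense[13];
    if (asc == 0x28 && ascq == 0x00)
        return UA_MEDIA_MAY_HAVE_CHANGED;
    if (asc == 0x29 && ascq == 0x00)
        return UA_RESET;
    if (asc == 0x00 && ascq == 0x00)
        return UA_UNSPECIFIED;
    return UA_OTHER;
}
```

With something along these lines, only UA_MEDIA_MAY_HAVE_CHANGED would
need to invalidate the loaded-media state; what to do with
UA_UNSPECIFIED remains the open question.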

Does anyone else have an EZ-135?  Do you ever see this happen?