Subject: Re: (2nd send) opinion sought about EOM handling for tapes
To: Matthew Jacob <mjacob@feral.com>
From: Stefan Grefen <grefen@hprc.tandem.com>
List: tech-kern
Date: 07/13/1998 12:21:47
In message <Pine.LNX.3.93.980711003457.11082A-100000@feral-gw>  Matthew Jacob wrote:
> 
> 

[...]

> "Current EOM handling for all tape drives under NetBSD is broken."

I wouldn't restrict that to NetBSD. Unix and tapes have always been a
difficult story if you want to do more than backups ...
(there are particular tape - machine - OS - application combinations that
do work, but if you change any one of the four your mileage will vary).
I do think a good tape system (like a DMAPI fs extension) is a very good
idea and worth the effort, but we should keep 'compatible' device nodes
so that normal UNIX software works without modification.
A good tape system also needs basic volume handling and
label processing.



> 
> Why? Well, when you hit logical EOM when writing and get a check
> condition with EOM set, the driver returns EIO. This is broken
> in at least these two respects:
> 
> 	1. You "lose" the last write (at least to the writer). When
> 	you later actually read the tape, the last write you made
> 	is there- usually all of it (because this is almost always
> 	a "logical" EOM- usually several megabytes if not tens of
> 	megabytes before hard EOT), but conceivably just a partial
> 	write.

That's the behaviour you can expect on any UNIX.

> 	
> 	2. The driver doesn't really note that it's at EOM (well, it
> 	does for fixed block sizes, but not for variable). This allows

This must be fixed.

> 
> As a minor side note, the driver forces 2 Filemarks at EOT
> for all but QIC tapes. That's sorta OK, but it really should
> be 2 Filemarks on all tapes that cannot know hard EOT (the
> reflective marker on 9 track is 'logical' EOM, not hard EOT-
> unlike, eg., QIC which has a hole in the tape marking actual
> physical end of media). The reason this 2 FM is wrong is that
> if there actually is a drive that actually writes two filemarks,
> user applications counting files will get screwed because they
> weren't expecting an extra empty file. 

That's wrong: 2 FM marks EOD (End of Data) and has nothing to do with EOT
handling.

> 
> Be that as it may be- the EIO behaviour has got to go. The correct
> behaviour is to internally note that EOM has been seen. For the write
> that completed with a residual, return the count-residual. Until the
> tape is closed, all further writes return residual==count. A rewind

The problem is with applications that expect the write to return an error on
failure (e.g. -1) and don't check for residual==count. They may lose
data on the last record before EOT.
If we don't use separate devices for the enhanced features, we should return
an error for the record after the EAW, or for the record causing the EAW
(if residual != 0).
Then an ioctl can return the residual and enable writing after EOT.
(I would prefer traditional and extended device nodes.)
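
To make the rule above concrete, here is a rough sketch of how I read it.
It is a user-space model, not driver code: struct tape_state, tape_write()
and tape_eom_ioctl() are made-up names, the drive's completion status
(residual and EOM flag) is simply passed in by the caller, and failure is
reported as -EIO rather than as -1 plus errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-unit state; not the real st(4) softc. */
struct tape_state {
    int    at_eom;            /* drive has reported EOM (the EAW) */
    int    past_eom_enabled;  /* set by the ioctl-like call below */
    size_t last_resid;        /* residual of the record that hit the EAW */
};

/*
 * Decide what a write of 'count' bytes returns to the caller, given what
 * the (simulated) drive reported: 'drive_resid' bytes not written and
 * whether the EOM flag was set in the completion status.
 */
long tape_write(struct tape_state *ts, size_t count,
                size_t drive_resid, int drive_eom)
{
    assert(ts != NULL && drive_resid <= count);

    /* Record after the EAW: refuse unless writing past EOM was enabled. */
    if (ts->at_eom && !ts->past_eom_enabled)
        return -EIO;

    if (drive_eom) {
        ts->at_eom = 1;
        ts->last_resid = drive_resid;
        /* Record causing the EAW was cut short: report an error. */
        if (drive_resid != 0 && !ts->past_eom_enabled)
            return -EIO;
    }
    return (long)(count - drive_resid);
}

/* ioctl-like call: report the saved residual and allow writes past EOM. */
size_t tape_eom_ioctl(struct tape_state *ts)
{
    assert(ts != NULL);
    ts->past_eom_enabled = 1;
    return ts->last_resid;
}
```

With this model an old application sees a plain error at the EAW, while an
application that knows about the ioctl can fetch the residual and carry on
(for example to write its trailer records).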

> or a backspace operation causes a filemark to be written, and then

NO NO NO. Only a close causes automagic filemarks. If the application
wants them it can write them. 

> the (corrected) operation to be performed. A forward space operation
> is EIO'd. An ERASE operation is accepted (let the drive worry
> about it). A close causes a terminal FM to be written (2 on
> drives that cannot detect hard EOT). State is still "At EOM".
> 
> Most Unix drivers stop at this point, allowing you basically
> only rewind, but then you're left with the problem of wanting to allow
> trailer records to be written. That's where the mail I sent out comes in,
> listing a possibly too complex protocol for handling trailer records.

If you allow writing past logical EOT for 'normal' application data, you
can't write trailers anymore, as the application will have used up all the
physical tape. So the above should apply to extended-feature-aware
applications only.

> > 3. Incremental bug fixes and development are just fine, but since you are
> > making an _interface_ change (right?) I think you need to put the effort
> > into finalizing the interface rather than stepping it.
> > 
> > 	A. If you are going to deal with EOM, you also might want to deal
> > 	with EOV and with multi-volume tape files. This is something Unix
> > 	hasn't ever really done...the uniquely unix way of telling dump
> > 	how big the tape is was never needed on other (non-M$) systems.
> > 	If it's hard to do this in a compatible way, then you can
> > 	always slice a bit off the minor number to specify the new model.

I second that (but I think at least CRAY has, and Convex had, a tape
subsystem able to handle multi-volume and labeled tapes).

I would add an est.c alongside st.c for the new tape system and, depending
on the device node, use either the code in st.c or est.c to handle a tape.
That avoids most of the compatibility problems.
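
Roughly what I have in mind, as a sketch only. The bit value, the unit
mask and the names (TAPE_EXT_BIT, TAPE_UNIT_MASK, tape_model_for,
tape_unit) are invented for illustration; the real st minor number layout
is different and would have to be checked before picking a bit.

```c
#include <assert.h>

/* Hypothetical minor-number layout, for illustration only. */
#define TAPE_EXT_BIT   0x80u  /* set: extended model (est.c) */
#define TAPE_UNIT_MASK 0x0fu  /* low bits: unit number */

enum tape_model { TAPE_MODEL_ST, TAPE_MODEL_EST };

/* Pick the traditional or the extended code path from the device node. */
enum tape_model tape_model_for(unsigned minor_no)
{
    return (minor_no & TAPE_EXT_BIT) ? TAPE_MODEL_EST : TAPE_MODEL_ST;
}

/* Both kinds of node refer to the same physical unit. */
unsigned tape_unit(unsigned minor_no)
{
    /* The selector bit must not overlap the unit bits. */
    assert((TAPE_EXT_BIT & TAPE_UNIT_MASK) == 0);
    return minor_no & TAPE_UNIT_MASK;
}
```

Existing programs keep opening the traditional node and never see the new
behaviour; only software that opens the extended node gets it.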

> > 
> > 	B. On other systems, you can write as much stuff as you want and
> > 	it just _works_. Tape volumes are changed at a layer below the
> > 	application. The applications usually had knowledge of this so they
> > 	weren't forced to read from tape 0; I don't know if writing
> > 	multi-volume tape files was completely transparent or not, but it
> > 	was certainly _possible_. We don't do that at all on Unix...the
> > 	multi-volume dump(8) tapes are just application-layer-concatenated
> > 	multiple single-file volumes, and not a multiple-volume file. Quite
> > 	crude, really. But it's a cool feature and to do EOM "right" this
> > 	is what you need.
> 

[...]
> *been* the irate customer- the tape manager code for NetWorker
> (nsrmmd) practically sinks beneath the waves with loads of different
> #ifdef sun4/solaris/sco/hpux/aix/__alpha/_win32 thingies...).

That's why I think we should keep the 'standard' (suboptimal) UNIX
tape basics, so that simple programs using the tape don't have to
add a #ifdef NetBSD ....

> 
> But you have a valid point about 'stepping' here. I noticed these
> problems with the NetBSD tape driver two years ago, and have
> wrung my hands over what to do about this since then. I've added
> a feature or two since then (compression, hardware block report/locate),
> but mostly have ignored it until work on NAStore made me address
> the problems directly. If one changes a behaviour or API significantly,
> you're right that you ought to get something worthwhile out of it.
> Again, one of the reasons I've held off from doing this is that I
> didn't want to hack on *this* driver and then toss all that out if
> we were to consider the CAM layer (which has a completely different
> sequential access driver- which I haven't even begun to analyze
> for behavioural consistency).

If things like multi-volume and labeled tapes are to be added sometime,
I think a redesign of the tape system is needed, regardless of CAM or
the current SCSI layer ...
> 
> I'm not sure I'm changing things all *that* much- but I really
> appreciate the feedback. I maybe should at least fix the EOM
> behaviour that is now currently broken (the immediate EIO)- do
> you think that would be too much to do for now?

I would at least make it a compile-time option (off by default);
I would expect (without analyzing) it to break several applications ...
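
Something along these lines, as a sketch. ST_EOM_SHORT_WRITE and
st_eom_write_status() are names I made up, not existing kernel options or
functions, and the result is returned as -EIO rather than through errno to
keep the example small.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical option; off unless the build defines it to 1. */
#ifndef ST_EOM_SHORT_WRITE
#define ST_EOM_SHORT_WRITE 0
#endif

/* What a write that hit EOM reports back to the caller. */
long st_eom_write_status(size_t count, size_t resid)
{
    assert(resid <= count);
#if ST_EOM_SHORT_WRITE
    /* New behaviour: report how much actually went to tape. */
    return (long)(count - resid);
#else
    /* Current behaviour: immediate EIO. */
    (void)count;
    (void)resid;
    return -EIO;
#endif
}
```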

Stefan

> 
> -matt
> 

--
Stefan Grefen                                Tandem Computers Europe Inc.
grefen@hprc.tandem.com                       High Performance Research Center
 --- Hacking's just another word for nothing left to kludge. ---