Subject: Re: (2nd send) opinion sought about EOM handling for tapes
To: Ross Harvey <ross@teraflop.com>
From: Matthew Jacob <mjacob@feral.com>
List: tech-kern
Date: 07/11/1998 01:29:22
On Fri, 10 Jul 1998, Ross Harvey wrote:
> 
> 
> 1. I think you might be on the right track here, and I'm glad you are
> putting effort into it, but I've reread your message several times and I
> can't quite tell:
> 
> 	A. What is the problem you are trying to solve?
> 
> 	B. What is the point of the change?
> 
> 	C. Did they do this just so they could write those stupid little
> 	ANSI 80-byte bracket records?  I haven't seen an ANSI format tape in
> 	almost a decade. Does it even work on fixed-block tapes?


[ as for #C: NAStore uses ANSI format headers. ANSI labels do
work for 'fixed block' tapes if you do RMW cycles. I'm actually thinking
of an Oil/GeoChemical company that has something on the order of one
million (1*10**6) 1/2" reels and 3480 cartridges that contain
seismic data that continues to be read and parsed every day- hell, they
even need READ REVERSE because you often get most of the rest of
a bad record if you read it forwards and backwards a couple of times.
They don't want ANSI records, but they sure do want sensible
EOM handling ]

I guess I haven't been clear enough and probably
left out important pieces. Let me try to explain
where I went with my thinking on this...


"Current EOM handling for all tape drives under NetBSD is broken."

Why? Well, when you hit logical EOM when writing and get a check
condition with EOM set, the driver returns EIO. This is broken
in at least these two respects:

	1. You "lose" the last write (at least to the writer). When
	you later actually read the tape, the last write you made
	is there- usually all of it (because this is almost always
	a "logical" EOM- usually several megabytes if not tens of
	megabytes before hard EOT), but conceivably just a partial
	write.
	
	2. The driver doesn't really note that it's at EOM (well, it
	does for fixed block sizes, but not for variable). This allows
	unaware applications to keep opening the tape and keep
	writing until they hit hard EOT or, in the case of old
	9 track tape drives (in SCSI, the M4 Data systems and
	the HP decks), actually wind the tape off the feed reel.

As a minor side note, the driver forces 2 Filemarks at EOT
for all but QIC tapes. That's sorta OK, but it really should
be 2 Filemarks on all tapes that cannot know hard EOT (the
reflective marker on 9 track is 'logical' EOM, not hard EOT-
unlike, e.g., QIC, which has a hole in the tape marking the actual
physical end of media). The reason forcing 2 FMs is wrong is that
if a drive actually does write both filemarks, user applications
counting files will get screwed because they weren't expecting
an extra empty file.

Be that as it may- the EIO behaviour has got to go. The correct
behaviour is to internally note that EOM has been seen. For the write
that completed with a residual, return count minus residual. Until the
tape is closed, all further writes return residual==count (that is,
zero bytes written). A rewind or a backspace operation causes a
filemark to be written, and then the (corrected) operation to be
performed. A forward space operation is EIO'd. An ERASE operation is
accepted (let the drive worry about it). A close causes a terminal FM
to be written (2 on drives that cannot detect hard EOT). State is
still "At EOM".
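
The rules in the paragraph above can be sketched as a small standalone
C model. This is only an illustration, not code from the NetBSD tape
driver: struct tape_state, enum tape_op, enum eom_action and the three
functions are hypothetical names, and the hit_eom/resid arguments stand
in for whatever the drive reports in its sense data. The model only
records decisions and issues no SCSI commands. Since the description
does not say when the "At EOM" state is cleared, the sketch never
clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical positioning requests; not the real mtio op names. */
enum tape_op { OP_REWIND, OP_BACKSPACE, OP_FWD_SPACE, OP_ERASE };

/* What the model decides to do with a positioning request. */
enum eom_action {
	ACT_NORMAL,           /* not at EOM: ordinary handling (not modelled) */
	ACT_FILEMARK_THEN_OP, /* write a filemark, then do the operation */
	ACT_REJECT_EIO,       /* fail the request with EIO */
	ACT_PASS_TO_DRIVE     /* accept it and let the drive worry about it */
};

/* Hypothetical per-device state for this sketch. */
struct tape_state {
	bool at_eom;              /* a write has reported EOM */
	bool can_detect_hard_eot; /* assumed true for QIC, false for 9 track */
};

/*
 * Model one write of `count` bytes and return the byte count handed
 * back to the caller. `hit_eom` and `resid` are what the drive would
 * report for this write; they are ignored once EOM has been noted,
 * because then the write is answered without going to the drive.
 */
size_t
tape_write(struct tape_state *ts, size_t count, bool hit_eom, size_t resid)
{
	assert(resid <= count);
	if (ts->at_eom)
		return 0;               /* residual == count */
	if (hit_eom)
		ts->at_eom = true;      /* note EOM instead of returning EIO */
	return count - resid;
}

/* Decide how a positioning request is treated. */
enum eom_action
tape_position_action(const struct tape_state *ts, enum tape_op op)
{
	if (!ts->at_eom)
		return ACT_NORMAL;
	switch (op) {
	case OP_REWIND:
	case OP_BACKSPACE:
		return ACT_FILEMARK_THEN_OP;
	case OP_FWD_SPACE:
		return ACT_REJECT_EIO;
	case OP_ERASE:
		return ACT_PASS_TO_DRIVE;
	}
	return ACT_NORMAL;
}

/* Number of terminal filemarks a close writes; at_eom is left as is. */
int
tape_close_filemarks(const struct tape_state *ts)
{
	return ts->can_detect_hard_eot ? 1 : 2;
}
```

With this model, a 512-byte write that hits EOM with a residual of 112
returns 400, every later write returns 0, a forward space is rejected,
and a close on a drive that cannot detect hard EOT asks for 2
filemarks.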

Most Unix drivers stop at this point, basically allowing you
only to rewind, but then you're left with the problem of wanting to allow
trailer records to be written. That's where the mail I sent out comes in,
listing a possibly too complex protocol for handling trailer records.

The risk of breakage is only for devices that will allow you to physically
run the tape off the feed reel.

[ I won't discuss reading issues here ]
> 
> 2. I don't think you should break 9-track handling.
> 
> 	A. It should be easier to deal with, not harder, since as you
> 	probably know, 9-tracks do detect EOM, and in addition they allow
> 	writing a certain distance _past_ it, so you actually get to
> 	correctly finish the tape record that hit the EOM, and then write
> 	small records past that if you must. It's just a warning on 9-tracks.
> 	(It seems safer to back up and rewrite the record on the next volume,
> 	though. See 3A/3B.)
> 
> 	B. If it takes a little complexity to _not_ remove existing
> 	functionality, then fine, that's acceptable complexity.

That's *why* I want to add the complexity. See above. Almost all tapes
allow you to write past logical EOM (aka "Early Warning"). One of
the exceptions to this that I know of was early M4 Data Systems 1/2"
reel drives if they were in buffered mode. That's why they were marked
in the SunOS st_conf.c as "unbuffered".

> 
> 3. Incremental bug fixes and development are just fine, but since you are
> making an _interface_ change (right?) I think you need to put the effort
> into finalizing the interface rather than stepping it.
> 
> 	A. If you are going to deal with EOM, you also might want to deal
> 	with EOV and with multi-volume tape files. This is something Unix
> 	hasn't ever really done...the uniquely unix way of telling dump
> 	how big the tape is was never needed on other (non-M$) systems.
> 	If it's hard to do this in a compatible way, then you can
> 	always slice a bit off the minor number to specify the new model.
> 
> 	B. On other systems, you can write as much stuff as you want and
> 	it just _works_. Tape volumes are changed at a layer below the
> 	application. The applications usually had knowledge of this so they
> 	weren't forced to read from tape 0; I don't know if writing
> 	multi-volume tape files was completely transparent or not, but it
> 	was certainly _possible_. We don't do that at all on Unix...the
> 	multi-volume dump(8) tapes are just application-layer-concatenated
> 	multiple single-file volumes, and not a multiple-volume file. Quite
> 	crude, really. But it's a cool feature and to do EOM "right" this
> 	is what you need.

I don't think that this is the right thing to do- in the driver.
This method of handling multivolume tapes is for an access manager-
In Unix this is either a filesystem or a user level application (dump,
NetWorker, what have you)- not really a driver. What you're referring
to is a large virtual tape manager. There's a point in doing this,
particularly as it may allow you to do tape striping if you allow
system control records to intersperse with user data records, but
if there's one thing I've learned after having been yelled at by
irate customers, it's that whole subsystems like this are usually
considered secondary to having a tape driver that works reasonably
well and matches reasonable behaviours on other systems (Hell, I've
*been* the irate customer- the tape manager code for NetWorker
(nsrmmd) practically sinks beneath the waves with loads of different
#ifdef sun4/solaris/sco/hpux/aix/__alpha/_win32 thingies...).

But you have a valid point about 'stepping' here. I noticed these
problems with the NetBSD tape driver two years ago, and have
wrung my hands over what to do about this since then. I've added
a feature or two since then (compression, hardware block report/locate),
but mostly have ignored it until work on NAStore made me address
the problems directly. If one changes a behaviour or API significantly,
you're right that you ought to get something worthwhile out of it.
Again, one of the reasons I've held off from doing this is that I
didn't want to hack on *this* driver and then toss all that out if
we were to consider the CAM layer (which has a completely different
sequential access driver- which I haven't even begun to analyze
for behavioural consistency).

I'm not sure I'm changing things all *that* much- but I really
appreciate the feedback. Maybe I should at least fix the EOM
behaviour that is currently broken (the immediate EIO)- do
you think that would be too much to do for now?

-matt