Current-Users archive


Re: st.c update has broken dump multi-tape support



Hi Brett !

A quick analysis leads me to believe that the culprit is this commit:

   revision 1.234
   date: 2018-03-24 09:08:19 +0100;  author: mlelstv;  state: Exp;
   lines: +176 -134;  commitid: xU4Kh6YFLfDywGvA;
   branches:  1.234.2;
   Use separate lock to protect internal state and release locks when
   calling biodone.

Here the logic for ST_EARLY_WARN got lost. So EOM now always delivers EIO
instead of a 0 write count, even when EOM is reported by the drive and early
warning is enabled.

The early warning logic is described in st.4 as follows:

EOM HANDLING
     Attempts to write past EOM and how EOM is reported are handled slightly
     differently based upon whether EARLY WARNING recognition is enabled in
     the driver.

     If EARLY WARNING recognition is not enabled, then detection of EOM (as
     reported in SCSI Sense Data with an EOM indicator) causes the write
     operation to be flagged with I/O error (EIO).  This has the effect for
     the user application of not knowing actually how many bytes were read
     (since the return of the read(2) system call is set to −1).

     If EARLY WARNING recognition is enabled, then detection of EOM (as
     reported in SCSI Sense Data with an EOM indicator) has no immediate
     effect except that the driver notes that EOM has been detected.  If the
     write completing didn't transfer all data that was requested, then the
     residual count (counting bytes not written) is returned to the user
     application.  In any event, the next attempt to write (if that is the
     next action the user application takes) is immediately completed with
     no data transferred, and a residual returned to the user application
     indicating that no data was transferred.  This is the traditional UNIX
     EOF indication.  The state that EOM had been seen is then cleared.

     In either mode of operation, the driver does not prohibit the user
     application from writing more data, if it chooses to do so.  This will
     continue up until the physical end of media, which is usually signalled
     internally to the driver as a CHECK CONDITION with the Sense Key set to
     VOLUME OVERFLOW.  When this or any otherwise unhandled error occurs, an
     error return of EIO will be transmitted to the user application.  This
     does indeed mean that if EARLY WARNING is enabled and the device
     continues to set EOM indicators prior to hitting physical end of media,
     that an indeterminate number of 'short write returns' as described in
     the previous paragraph will occur.  However, the expected user
     application behaviour (in common with other systems) is to close the
     tape and rewind and request another tape upon the receipt of the first
     EOM indicator, possibly after writing one trailer record.
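
To make the two modes described in the man page text concrete, here is a
small user-space model in C. It is only a sketch of the documented
behaviour, not code from st.c: the struct, the function name and its
parameters are made up for illustration, and the early_warn field merely
stands in for ST_EARLY_WARN being set.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model state; these are not the fields used by st.c. */
struct tape_model {
    bool early_warn;  /* stands in for ST_EARLY_WARN being enabled */
    bool eom_seen;    /* EOM was noted while completing an earlier write */
};

/*
 * Model one write request.
 *   requested   - bytes the application asked to write
 *   transferred - bytes the drive actually wrote
 *   drive_eom   - the drive set the EOM indicator for this write
 * Returns what write(2) would return; *error receives the errno value.
 */
ssize_t
model_tape_write(struct tape_model *t, size_t requested, size_t transferred,
    bool drive_eom, int *error)
{
    *error = 0;

    /* Early warning: the write after EOM completes with no data moved. */
    if (t->early_warn && t->eom_seen) {
        t->eom_seen = false;  /* the EOM state is cleared afterwards */
        return 0;
    }

    if (drive_eom) {
        if (!t->early_warn) {
            /* No early warning: the write is flagged with EIO. */
            *error = EIO;
            return -1;
        }
        /* Early warning: only remember that EOM was seen. */
        t->eom_seen = true;
    }

    /* Report what was written; a short count reflects the residual. */
    return (ssize_t)(transferred < requested ? transferred : requested);
}
```

With early_warn set, a write that hits EOM still returns its byte count,
and the following write returns 0. With early_warn clear, the same write
returns -1 with EIO.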

dump aborts on EIO, but it switches tapes if it gets a zero write count.
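
Seen from the application side, the distinction could be sketched in C
roughly like this. This is not dump's actual code; the enum and the
function name are hypothetical and only illustrate how the return value
of write(2) on the tape device separates "change tapes" from "give up".

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical result codes; not taken from dump(8). */
enum tape_write_status {
    TAPE_WRITE_OK,     /* the whole record was written */
    TAPE_WRITE_SHORT,  /* partial record: a residual was returned */
    TAPE_WRITE_EOM,    /* zero write count: ask for the next tape */
    TAPE_WRITE_ERROR   /* -1, e.g. EIO: treat as a write error */
};

/*
 * Classify the return value of write(2) for a record of `requested`
 * bytes (assumed to be non-zero).
 */
enum tape_write_status
classify_tape_write(ssize_t nwritten, size_t requested)
{
    if (nwritten < 0)
        return TAPE_WRITE_ERROR;
    if (nwritten == 0)
        return TAPE_WRITE_EOM;
    if ((size_t)nwritten < requested)
        return TAPE_WRITE_SHORT;
    return TAPE_WRITE_OK;
}
```

With the current driver a full tape ends up in the TAPE_WRITE_ERROR case
instead of TAPE_WRITE_EOM, which would match the "write error" Brett sees.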

Thus the 1.234 commit should be fixed with respect to EOM signalling.

Frank


On 06/09/21 02:47, Brett Lymn wrote:
Folks,

I don't perform a tape backup or update this machine very often, so it
has taken a while for me to spot this.

I back up to tape, which takes a few tapes to complete.  In the past this
has worked fine: when one tape is full, dump recognises this and prompts
for a new tape.

I attempted a backup a couple of days ago and now dump says "write
error" and then asks if it should restart the dump.  Answering yes
restarts the dump from the beginning; answering no causes dump to exit.

As I said, this machine does not get updated often so I suspect this
problem has been there for a while.  The kernel was built with v1.240 of
st.c, and this version causes dump to misbehave.  I reverted st.c back to
v1.231 (this was the version of st.c that was used in the kernel that
made the last successful backup).  After adding a couple of FALLTHROUGH
comments to get v1.231 to compile, I booted this kernel and found that
dump behaved correctly again.

Given the above, it looks like a change to st.c between v1.231 and v1.240
has broken multi-tape dumps.  Fortunately most of the commits in that
bracket are cosmetic; one that does stand out is v1.238, which
modifies the tape position handling.  I will try a kernel that
incorporates v1.237 of st.c and see what happens.  Unfortunately,
testing is a very slow process as it takes about 3 hours to fill a tape,
though I may be able to reduce that by using an LTO-1 tape instead, which
should halve the time taken to fill a tape.



