NetBSD-Bugs archive


Re: kern/38643: [dM] st tape drive loses data

On Tue, Sep 02, 2008 at 05:07:44AM +0000, David Holland wrote:
 > no such luck; the value left in uio_offset is ignored.

However, it looks like st.c flatly ignores the passed-in block number
(b_blkno) and always reads whatever's under the tape head.

So I think what's happening is that physio is firing off sixteen 64k
reads, since PHYSIO_CONCURRENCY is 16 and MAXPHYS is 64k; each one
reads one 10k block, positioning the tape such that the next data read
will be in the place observed. But physio notices the short count on
the first read and drops the other 15, so only 10k comes back from the
read system call. vn_read then updates the fd's seek position to 10240
and that position is passed down on the next read, but st ignores it.
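
To make the sequence above concrete, here is a toy userland model of it
in C. It is only a sketch of the behaviour as described in this message,
not the kernel code: toy_tape, toy_st_read and toy_physio_read are
made-up names, the "driver" returns byte counts rather than data, and
the model assumes all sixteen requests get issued before the short
count is noticed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXPHYS     (64 * 1024) /* per-request chunk size (64k) */
#define TOY_CONCURRENCY 16          /* stands in for PHYSIO_CONCURRENCY */

/* Hypothetical tape: fixed-size blocks and a head position. */
struct toy_tape {
	size_t block_size;  /* bytes per tape block, e.g. 10240 */
	size_t head_block;  /* index of the block under the head */
};

/*
 * Models the st behaviour described above: the requested offset is
 * ignored, and each request consumes exactly one block under the head.
 * Returns the number of bytes "read".
 */
static size_t
toy_st_read(struct toy_tape *t, size_t offset_ignored, size_t want)
{
	(void)offset_ignored;
	size_t got = t->block_size < want ? t->block_size : want;

	t->head_block++;
	return got;
}

/*
 * Models the physio behaviour described above: split the request into
 * chunks of at most TOY_MAXPHYS, issue up to TOY_CONCURRENCY of them,
 * then count bytes only up to and including the first short chunk.
 */
static size_t
toy_physio_read(struct toy_tape *t, size_t offset, size_t resid)
{
	size_t asked[TOY_CONCURRENCY], done[TOY_CONCURRENCY];
	size_t total = 0;
	int n = 0;

	assert(t != NULL && t->block_size > 0);
	while (n < TOY_CONCURRENCY && resid > 0) {
		size_t chunk = resid < TOY_MAXPHYS ? resid : TOY_MAXPHYS;

		asked[n] = chunk;
		done[n] = toy_st_read(t, offset, chunk);
		offset += chunk;
		resid -= chunk;
		n++;
	}
	for (int i = 0; i < n; i++) {
		total += done[i];
		if (done[i] < asked[i])
			break;  /* short count: later chunks are dropped */
	}
	return total;
}
```

With a tape of 10240-byte blocks and a request of 16 * 64k + 1 bytes,
the model returns 10240 bytes per call while the head advances sixteen
blocks, so the second call returns what was block 16 rather than
block 1.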

This also explains why things broke going from 3.0 -> 4.0, because in
3.0 physio didn't support having multiple requests in flight at once.

The 1 byte extra after those sixteen reads should either be skipped
entirely or be a 17th read generating an EIO that gets dropped,
depending on timing. (Note that physio only checks for an error
*before* it calls physio_wait, which might be construed as a

So there are at least two things wrong: (1) physio assumes b_blkno is
honored, and st doesn't; and (2) physio assumes st will read 64k when
asked, but it in fact apparently only reads one 10k block at a time.
(What does it do if the tape is written in 16k blocks? Or worse, say,
80k blocks?)

I have no idea what the proper way to resolve these discrepancies is.
It appears the immediate problem can be hacked around by having st
allocate a buf and pass it to physio, because that will cause physio
to use only that buf instead of up to PHYSIO_CONCURRENCY of its own;
but that's hardly a fix and doesn't even cover all the possible
failure cases.
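
As a rough illustration of why limiting physio to a single buf hides
the immediate problem, the arithmetic can be sketched like this. The
function names are made up, and the model assumes what is described
above: each in-flight request consumes one tape block, the first one
comes back short, and the rest are thrown away. It says nothing about
the other failure cases.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunk requests issued together for one read call. */
static size_t
toy_requests_in_flight(size_t request_bytes, size_t maxphys,
    size_t concurrency)
{
	assert(maxphys > 0 && concurrency > 0);
	size_t chunks = (request_bytes + maxphys - 1) / maxphys; /* ceiling */

	return chunks < concurrency ? chunks : concurrency;
}

/*
 * Tape blocks consumed but never returned to the caller, assuming the
 * first chunk is short and every issued chunk eats exactly one block.
 */
static size_t
toy_blocks_lost(size_t request_bytes, size_t maxphys, size_t concurrency)
{
	size_t n = toy_requests_in_flight(request_bytes, maxphys,
	    concurrency);

	return n > 0 ? n - 1 : 0;
}
```

Under those assumptions a read of 16 * 64k + 1 bytes loses 15 blocks
with sixteen requests in flight and none with one, which matches the
hack described above.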

David A. Holland
