Re: kern/38643: [dM] st tape drive loses data

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,mouse%Rodents.Montreal.QC.CA@localhost
Subject: Re: kern/38643: [dM] st tape drive loses data
From: David Holland <dholland-bugs%netbsd.org@localhost>
Date: Tue, 2 Sep 2008 06:55:03 +0000 (UTC)

The following reply was made to PR kern/38643; it has been noted by GNATS.

From: David Holland <dholland-bugs%netbsd.org@localhost>
To: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Cc: David Holland <dholland-bugs%NetBSD.org@localhost>,
        Havard Eidnes <he%NetBSD.org@localhost>,
        gnats-bugs%NetBSD.org@localhost, mouse%Rodents.Montreal.QC.CA@localhost,
        kern-bug-people%NetBSD.org@localhost, gnats-admin%NetBSD.org@localhost,
        netbsd-bugs%NetBSD.org@localhost, yamt%NetBSD.org@localhost
Subject: Re: kern/38643: [dM] st tape drive loses data
Date: Tue, 2 Sep 2008 06:54:54 +0000

 On Tue, Sep 02, 2008 at 05:07:44AM +0000, David Holland wrote:
  > no such luck; the value left in uio_offset is ignored.

 However, it looks like st.c flatly ignores the passed-in block number
 (b_blkno) and always reads whatever's under the tape head.

 So I think what's happening is this: physio fires off sixteen 64k
 reads, since PHYSIO_CONCURRENCY is 16 and MAXPHYS is 64k. Each one
 reads one 10k block, which leaves the tape positioned so that the
 next data read lands at the place observed. But physio notices the
 short count on the first read and drops the other 15, so only 10k
 comes back from the read system call. vn_read then updates the fd's
 seek position to 10240, and that position is passed down on the next
 read, but st ignores it.
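
 The sequence described above can be sketched as a small user-space
 model. This is not kernel code: struct tape, tape_read_one and
 simulate_read are hypothetical names made up for illustration, and
 the model covers only the short-count case (records smaller than
 MAXPHYS). The trailing partial chunk is left out because its outcome
 depends on timing.

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from the message above. */
#define MAXPHYS            (64 * 1024)  /* largest single transfer */
#define PHYSIO_CONCURRENCY 16           /* requests kept in flight */

/* Hypothetical model of a tape: fixed-size records, one head position. */
struct tape {
    size_t record_size;  /* bytes per tape record, e.g. 10k */
    size_t head;         /* byte offset of the tape head */
};

/*
 * Model of the st behavior described above: the requested block number
 * is ignored and exactly one record under the head is returned.
 */
static size_t
tape_read_one(struct tape *t, size_t blkno_ignored, size_t want)
{
    size_t got = want < t->record_size ? want : t->record_size;

    (void)blkno_ignored;        /* st does not look at b_blkno */
    t->head += t->record_size;  /* the head always moves one full record */
    return got;
}

/*
 * Model of one read(2) call going through physio: up to
 * PHYSIO_CONCURRENCY full chunks are started before the first result
 * is examined. Returns the byte count the caller sees.
 */
static size_t
simulate_read(struct tape *t, size_t offset, size_t len)
{
    size_t first = 0;
    int issued = 0;

    /* Only the short-count case is modeled. */
    assert(t->record_size > 0 && t->record_size < MAXPHYS);

    while (len >= MAXPHYS && issued < PHYSIO_CONCURRENCY) {
        size_t got = tape_read_one(t, offset / 512, MAXPHYS);

        if (issued == 0)
            first = got;  /* short count: the other results are dropped */
        offset += MAXPHYS;
        len -= MAXPHYS;
        issued++;
    }
    return first;
}
```

 With 10k records and a request of sixteen 64k chunks plus one byte,
 simulate_read returns 10240 to the caller while the modeled tape head
 has advanced by 16 records (163840 bytes), so the next read starts
 150k further along than the file offset suggests.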

 This also explains why things broke going from 3.0 -> 4.0, because in
 3.0 physio didn't support having multiple requests in flight at once.

 The 1 byte extra after those sixteen reads should either be skipped
 entirely or be a 17th read generating an EIO that gets dropped,
 depending on timing. (Note that physio only checks for an error
 *before* it calls physio_wait, which might be construed as a
 shortcoming.)
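
 As a sketch of where the "1 byte extra" comes from: assuming the
 request was 16 * 64k + 1 bytes (an inference from the wording above,
 not something stated in this message), splitting it at MAXPHYS gives
 sixteen full chunks and a one-byte tail. The struct split and
 split_request names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAXPHYS (64 * 1024)  /* value taken from the message above */

/* Hypothetical helper type: how a request of len bytes splits up. */
struct split {
    size_t full_chunks;  /* number of MAXPHYS-sized chunks */
    size_t tail_bytes;   /* size of the final partial chunk, 0 if none */
};

static struct split
split_request(size_t len)
{
    struct split s;

    s.full_chunks = len / MAXPHYS;
    s.tail_bytes = len % MAXPHYS;
    /* Sanity check: the pieces add back up to the request. */
    assert(s.full_chunks * MAXPHYS + s.tail_bytes == len);
    return s;
}
```

 Under that assumption, the tail is the 17th request: it can only be
 issued once one of the sixteen concurrent slots frees up, which is
 why its fate depends on timing.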

 So there are at least two things wrong: (1) physio assumes b_blkno is
 honored, and st doesn't; and (2) physio assumes st will read 64k when
 asked, but it in fact apparently only reads one 10k block at a time.
 (What does it do if the tape is written in 16k blocks? Or worse, say,
 80k blocks?)
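
 The arithmetic behind those two questions, as a minimal sketch: it
 only counts how many whole tape records fit in one MAXPHYS-sized
 request. What st actually does in each case is not established here,
 and records_per_request is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define MAXPHYS (64 * 1024)  /* value taken from the message above */

/*
 * Number of whole records of record_size bytes that fit in a single
 * MAXPHYS-sized request. A result of 0 means one record is larger
 * than the request itself.
 */
static size_t
records_per_request(size_t record_size)
{
    assert(record_size > 0);
    return MAXPHYS / record_size;
}
```

 For 10k records the result is 6, for 16k records it is 4 (an exact
 fit), and for 80k records it is 0: a single record cannot be
 returned whole by one 64k request.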

 I have no idea what the proper way to resolve these discrepancies is.
 It appears the immediate problem can be hacked around by having st
 allocate a buf and pass it to physio, because that will cause physio
 to use only that buf instead of up to PHYSIO_CONCURRENCY of its own;
 but that's hardly a fix and doesn't even cover all the possible
 failure cases.

 -- 
 David A. Holland
 dholland%netbsd.org@localhost
