Subject: Re: NTP loses sync if st driver pushed hard?
To: Thor Lancelot Simon <tls@rek.tjls.com>
From: Matthew Jacob <mjacob@feral.com>
List: tech-kern
Date: 09/17/2001 00:34:23

On Mon, 17 Sep 2001, Thor Lancelot Simon wrote:

> On Sun, Sep 16, 2001 at 05:22:14PM -0700, Matthew Jacob wrote:
> > 
> > Do you, perchance, for some really strange reason, use the *block* interface
> > to the tape drive? I have to admit that my experiences with things wedging up
> > come from some inadvertent testing with the block version (repeat inf: blush).
> > 
> > I rather suspect that any soft error which causes a check condition will
> > reduce streaming. We may need to rethink whether or not all errors should be
> > done by the completion thread.
> 
> No, I don't use the block device.  And my kernel isn't *reporting* a soft
> error on the tape drive ever.

Okay.

>  But, I note that:
> 
> 1) The clock slippage *may* be limited to the end of the backup runs; I
>    can't shake the feeling that I've seen it otherwise, but... it may
>    only happen when I do the close(); however, it does happen with either
>    the rewinding or non-rewinding device.

The filemark writes can take a long time, since all data in the write buffer
has to flush first. But this shouldn't cause any lossage.

I don't recall what HBA this is... let me go back and look at old mail... you
said "various machines": did they all have the same HBA type (Advansys)?


> 
> 2) Indeed, I *cannot stream* my DLT8000, though it's being fed via a
>    large circular buffer (either the one in Amanda, "buffer" from pkgsrc,
>    or a circular-buffer writer of my own) from large sequential files that
>    are striped across two IDE disks that can read from them through the
>    filesystem at >50MB/sec and is the *only* device on an Advansys LVD
>    SCSI adapter.  The data rate rises gradually to about 12MB/sec over the
>    course of about 30 seconds, then the tape mechanism stops, the data rate
>    falls off, then picks up again.  Quantum have no idea what's going on
>    and have advised me to wait for them to fix the drive's variable-speed
>    write feature with new firmware.  However, they say that if I can hand
>    the drive 12MB/sec of data, it ought to stream indefinitely, much less
>    50MB/sec.
> 
> Is there a soft error for "internal buffer full" or some such that might be
> causing this kind of lossage?

Not at this level, no. 

There is definitely something weird going on. I've been trying to 'fix' (at a
slow rate) some of the tape driver brokenness- but not really throwing any
high end h/w at it. I have an Archive QIC150 attached thru isp on a PC164, and
NetBSD-current can't keep it streaming using my test programs (which just
generate known data patterns and write them out into records and files, rewind
and check them). The trouble here is I'm trying to fix some other issues, and
I ended up totally mangling the driver so I have to spend some time recovering
where I was before trying a bit more freshly.
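
A rough sketch of that kind of pattern test, in Python, might look like the
following. This is not the actual test program: the function names
(make_record, write_records, verify_records), the record layout, the 32KB
record size, and the use of an ordinary file path where a tape device would
go are all assumptions made for illustration.

```python
import os
import struct

RECORD_SIZE = 32 * 1024  # assumed record size; a real test would vary this
_BIN = getattr(os, "O_BINARY", 0)  # only matters on platforms that have it


def make_record(file_no, rec_no, size=RECORD_SIZE):
    """Build a record whose contents are fully determined by its position."""
    header = struct.pack(">II", file_no, rec_no)
    # Repeat the 8-byte header to fill the record, then trim to size.
    reps = size // len(header) + 1
    return (header * reps)[:size]


def write_records(path, file_no, count, size=RECORD_SIZE):
    """Write `count` records, one write() call per record."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | _BIN, 0o600)
    try:
        for rec_no in range(count):
            written = os.write(fd, make_record(file_no, rec_no, size))
            if written != size:
                raise IOError(f"short write: {written} of {size} bytes")
    finally:
        os.close(fd)


def verify_records(path, file_no, count, size=RECORD_SIZE):
    """Re-read from the start and return the indices of mismatching records."""
    bad = []
    fd = os.open(path, os.O_RDONLY | _BIN)
    try:
        for rec_no in range(count):
            if os.read(fd, size) != make_record(file_no, rec_no, size):
                bad.append(rec_no)
    finally:
        os.close(fd)
    return bad
```

Because every record encodes its own file and record number, a mismatch
shows which record was lost, duplicated, or corrupted, not just that
something went wrong.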

I also need to spend some money and get some more high end tape h/w. I wore my
DLT4000 out fixing the FreeBSD tape driver recently :-( and need to replace
it. I have a Mammoth2 LVD tape on loan from Antares, but it seems to be
broken. To get an FC or an LVD tape drive you have to fork over more than $2K,
it seems.

You might try picking up my tape_patter_tester and using that to see if you
can factor source variants away. I can email you a copy if you don't wanna use
bitkeeper to pick up the package.
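
One way to take the data source out of the picture, sketched here under
assumptions and not taken from the tester package, is to write the same
in-memory record over and over and time each write() call, so that any
stall shows up as an outlier. The names timed_writes and find_stalls and
the "10 times the median" threshold are hypothetical choices.

```python
import os
import time


def timed_writes(fd, record, count, clock=time.monotonic):
    """Write `record` to `fd` `count` times; return per-write durations (s)."""
    durations = []
    for _ in range(count):
        start = clock()
        written = os.write(fd, record)
        durations.append(clock() - start)
        if written != len(record):
            raise IOError(f"short write: {written} of {len(record)} bytes")
    return durations


def find_stalls(durations, factor=10.0):
    """Return indices of writes that took more than `factor` x the median."""
    if not durations:
        return []
    ordered = sorted(durations)
    median = ordered[len(ordered) // 2]  # upper median for even lengths
    return [i for i, d in enumerate(durations) if d > factor * median]
```

The caller would open the raw tape device itself (the path depends on the
system) and pass the descriptor in. Since the record comes from memory, the
disks, the filesystem, and any user-level buffering program are not
involved in whatever stalls are found.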

Looking again at #2 above, this makes me wonder if this isn't some scheduler
problem, oddly enough. It sounds to me like the backup process runs and
consumes a very large quantum of runtime because it's being so successful at
pushing data, so much so that when it has to wait a bit (because the internal
tape buffer is full and needs to flush a bit), it gets penalized and
doesn't get started quickly enough when the command that waited on a tape
buffer flush finishes. But this may make no sense at all.

Now that I'm remembering I'm on NetBSD instead of FreeBSD (FreeBSD-current now
has a DEVFS standard and lost all the 'raw' names for devices over the side
months ago, sigh...), I'm not seeing the system get totally bogged down while
writing to tape. I was seeing that before (when I was mistakenly writing to
the cooked device), probably because of some UVM/SPECFS artifact.

-matt