Subject: more on 2940UW and 4856NP tape drive
To: None <netbsd-users@NetBSD.ORG>
From: Laine Stump <laine@MorningStar.Com>
List: netbsd-users
Date: 03/23/1997 13:22:27
This is on a Pentium, but it could be considered a generic SCSI
question...

On our machine w/2940UW controller and Archive 4856NP tape drive, we are
doing a backup of 2 filesystems on different machines (NFS-mounting the
filesystems). Each of these filesystems has a total capacity of 6.5GB,
with 5.5GB used on one and 4.3GB used on the other.

During the backup, tar sometimes gets an error, causing it to change
tapes way too soon (e.g. a backup of 5.5GB takes 5 x 4GB (w/o compression)
DAT tapes).
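
For scale, here is a rough sketch of how many tapes each backup should
need if every tape were filled to its uncompressed capacity. The function
name tapes_needed is made up for illustration, and sizes are given in MB
(treating 1GB as 1000MB, which is an assumption) so that plain integer
shell arithmetic works. With hardware compression the count could be
lower still.

```shell
#!/bin/sh
# Minimum number of tapes if each tape is filled completely.
# Arguments: data size in MB, tape capacity in MB (hypothetical helper).
tapes_needed() {
    data_mb=$1
    tape_mb=$2
    # Integer ceiling division: round up to a whole tape.
    echo $(( (data_mb + tape_mb - 1) / tape_mb ))
}

tapes_needed 5500 4000   # 5.5GB filesystem on 4GB tapes: prints 2
tapes_needed 4300 4000   # 4.3GB filesystem on 4GB tapes: prints 2
```

So 5 tapes for the 5.5GB filesystem is well over twice what the raw
capacity would suggest.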

Last night I compiled a kernel with the 1.2.1 kernel sources, in order
to get the newer 2940 driver. In the subsequent backup, the 5.5GB
filesystem went onto a single tape without trouble, but about an hour
into the other filesystem, there was a SCSI error, followed by a bus
reset, followed by tar deciding it was time to change tapes (after the
change, the backup lasted another 1.5 hours, completing successfully).

Here are my questions:

1) Does the error message below seem like a hardware problem, a media
   problem, or a software problem?

2) If software, is there an even newer version of the driver than that in 1.2.1?

3) Is there some way to get tar (or the driver) to retry the operation
   so that it doesn't switch tapes so easily? (Could this even work?)

4) Any other ideas?

Here is the tar command used:

   tar --create --file /dev/rst0 -b40 \
    --one-file-system --totals --new-volume-script tapenext .
	
(tapenext is a script that determines from a file which tape is in use,
and switches to the next using chio).
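
As a side note on the -b40 option: tar's blocking factor is counted in
512-byte blocks, so the size of each record written to the tape can be
worked out as in this small sketch. The function name record_bytes is
made up for illustration.

```shell
#!/bin/sh
# Bytes per tar record for a given blocking factor
# (tar blocks are 512 bytes each).
record_bytes() {
    echo $(( $1 * 512 ))
}

record_bytes 40   # -b40 as used above: prints 20480
```

That is, each write to /dev/rst0 should be 20480 bytes (20KB).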

Here is the error from /var/log/messages (tar decided to change tapes at
4:02:57):

   Mar 23 04:01:17 pink /netbsd: st0(ahc0:4:0): timed out in dataout phase, SCSISIGI == 0x0
   Mar 23 04:01:17 pink /netbsd: st0(ahc0:4:0): BUS DEVICE RESET message queued.
   Mar 23 04:01:17 pink /netbsd: Bus Device Reset Message Sent
   Mar 23 04:01:17 pink /netbsd: st0(ahc0:4:0): Bus Device Reset delivered. 1 SCBs aborted

(any idea why there is such a long pause here?)

   Mar 23 04:02:54 pink /netbsd: ahc0:A:4: refuses WIDE negotiation.  Using 8bit transfers
   Mar 23 04:02:54 pink /netbsd: st0(ahc0:4:0): unit attention, data = 00 00 00 00 00 00 00 00 00 00
   Mar 23 04:02:54 pink /netbsd: ahc0: target 4 synchronous at 5.0MHz, offset = 0xf
   Mar 23 04:02:54 pink /netbsd: st0(ahc0:4:0): Target Busy
   Mar 23 04:02:57 pink last message repeated 5 times
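
To put a number on the pause, here is a minimal sketch that computes the
time between two of the syslog timestamps above. The function names
to_seconds and elapsed are made up for illustration, and it assumes both
timestamps fall on the same day.

```shell
#!/bin/sh
# Convert an HH:MM:SS timestamp to seconds since midnight.
# awk does the arithmetic so that leading zeros ("04") are not
# misread as octal by the shell.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds elapsed between two same-day timestamps (start, end).
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed 04:01:17 04:02:54   # reset delivered -> next message: prints 97
elapsed 04:01:17 04:02:57   # timeout -> tar changes tapes: prints 100
```

So the gap after the Bus Device Reset is 97 seconds, and tar gave up on
the tape 100 seconds after the original timeout.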