netbsd-users: problems with Exabyte 8200 on NetBSD/sparc 1.3.1

Subject: problems with Exabyte 8200 on NetBSD/sparc 1.3.1
To: NetBSD/sparc Discussion List <port-sparc@netbsd.org>
From: Greg A. Woods <woods@most.weird.com>
List: netbsd-users
Date: 09/18/1998 19:38:38
[[ FYI, I'm not subscribed to netbsd-users, so please CC port-sparc or
myself directly... ]]

I recently acquired an Exabyte 8200 tape drive to enhance my Sparc-2
based system that's running NetBSD/sparc 1.3.1 (and not a moment too
soon either -- I've got about 10GB of disk on the machine, and will soon
add another 4GB, but until now I had only a 525MB tape drive).

I decided it would be a really good idea to do my first full system
backup before I dive in and upgrade to 1.3.2.

Now I've had lots of fights with Exabyte drives before (mostly under
SunOS-4), but today's experience has me more than frustrated.

I decided to back up the "big" filesystem first:

	Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
	/dev/sd1a       51447    16998    31876    35%    /
	/dev/sd1e     1028982   864409   113123    88%    /usr
	/dev/sd1d      895142   777124    73260    91%    /var
	/dev/sd2d     1975660   941915   934962    50%    /cvs
>>>>	/dev/sd0d     4008441  1807819  2000199    47%    /big1
	/dev/sd3a       50447        3    47921     0%    /altroot
	/dev/sd3d      100951    65578    30325    68%    /altroot/var
	/dev/sd3e      353415        1   335743     0%    /altroot/usr
	/dev/sd3h     1409805     7308  1332006     1%    /build

I figured this should fit onto one tape, and after a wee bit of fiddling
I worked out what seem to be appropriate parameters for dump, and dump
agreed that the filesystem should easily fit on one tape:

15:24 [2037] # dump -0S -B 2200000 -b 1 -f /dev/nrst0 /big1   
  DUMP: Date of this level 0 dump: Fri Sep 18 15:25:07 1998
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/rsd0d (/big1) to /dev/nrst0
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 1953283 tape blocks on 0.89 tape(s).
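
As a sanity check, the 0.89 figure is consistent with simply dividing
the estimate by the -B value.  This is only a sketch, and it assumes
(from my reading of dump(8)) that -B gives the volume size in kilobytes
and that with -b 1 each "tape block" is 1 KB.  The function name is
made up for illustration and is not anything dump itself uses:

```python
def tapes_needed(estimated_blocks, volume_blocks):
    """Fraction of one volume the dump is expected to fill."""
    return estimated_blocks / volume_blocks

estimate = 1953283      # "estimated 1953283 tape blocks" from dump
volume_size = 2200000   # the -B argument (assumed to be in KB)
print(f"{tapes_needed(estimate, volume_size):.2f} tape(s)")
```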

After a wee bit of coaxing I managed to get the drive to accept a Sony
P6-120MP 8mm tape, and started dump:

15:28 [2039] # dump -0un -B 2200000 -b 1 -f /dev/nrst0 /big1
  DUMP: Date of this level 0 dump: Fri Sep 18 15:29:02 1998
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/rsd0d (/big1) to /dev/nrst0
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 1953283 tape blocks on 0.89 tape(s).
  DUMP: Volume 1 started at: Fri Sep 18 15:30:01 1998
  DUMP: dumping (Pass III) [directories]
  DUMP: 1.57% done, finished in 5:14
  DUMP: dumping (Pass IV) [regular files]
  DUMP: 3.40% done, finished in 4:44

and on it chugged until, quite to my surprise, the driver reported:

st0(esp0:4:0):  Check Condition on opcode 0x10
    SENSE KEY:  No Additional Sense
                EOM Detected
   INFO FIELD:  16777215
     ASC/ASCQ:  No Additional Sense Information

(five iterations of the above message were displayed)

and dump asked for a new volume:

  DUMP: End of tape detected
  DUMP: Closing /dev/nrst0
  DUMP: Volume 1 completed at: Fri Sep 18 18:32:35 1998
  DUMP: Volume 1 took 3:02:34
  DUMP: Volume 1 transfer rate: 122 KB/s
  DUMP: Change Volumes: Mount volume #2
  DUMP: Is the new volume mounted and ready to go?: ("yes" or "no") 

and so I thought, well, the tape may not be the regulation length, so
I'll put another in and see how it goes.
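
Going by dump's own numbers, the first volume took only about two
thirds of the estimate before the drive reported EOM.  A rough sketch
of that arithmetic follows.  It assumes the 122 KB/s that dump printed
is an average over the whole volume, and that the estimate's tape
blocks are 1 KB each (because of -b 1).  The rate is rounded, so the
result is approximate:

```python
def hms_to_seconds(hms):
    """Convert an "H:MM:SS" string, as printed by dump, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

elapsed = hms_to_seconds("3:02:34")   # "Volume 1 took 3:02:34"
rate_kb_per_s = 122                   # "Volume 1 transfer rate: 122 KB/s"
estimate_kb = 1953283                 # dump's estimate, assuming 1 KB blocks

written_kb = elapsed * rate_kb_per_s
print(f"wrote roughly {written_kb} KB in {elapsed} s")
print(f"that is about {100 * written_kb / estimate_kb:.0f}% of the estimate")
```

That comes to roughly 1.3 million KB written, about 68% of what dump
expected to put on the tape.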

After I got the first tape rewound and ejected (with "mt rewoffl"), I
could not get the drive to accept a new tape; it just spat them back out
immediately without even ``tasting'' them (Exabytes sometimes spit a
tape out after whirring and clicking for a while -- presumably they
don't think it's a good tape for some reason).  So I power cycled the
drive and, voila, it gave me a green light after accepting the tape.
I suppose I shouldn't have told the drive to go offline, especially
since there's no easy way to send it a ``wakeup'' command on NetBSD.

Anyway, I went ahead and immediately typed "yes" to dump, and kaboom:

st0(esp0:4:0):  Check Condition on opcode 0x0
    SENSE KEY:  Hardware Error
   INFO FIELD:  16777215
     ASC/ASCQ:  No Additional Sense Information

  DUMP: master/slave protocol botched.
  DUMP: The ENTIRE dump is aborted.

This is probably where I made my second mistake.  I suppose I should
have retensioned the tape and verified that everything was OK with it
first (e.g. with "mt status"), but I didn't do that.

Anyway, if anyone can tell me why the tape ended too soon, and perhaps
confirm why I had to power cycle the drive, I'd much appreciate it.

I've seen errors following tape changes on NetBSD/i386 too, and if I
remember right they can be kept from upsetting dump by either doing an
"mt status" or trying to read a block from the device, i.e. by getting
the driver to report the error and reset the drive to a usable state
before dump touches it.
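
Something along these lines is what I have in mind for poking the
drive after a tape change, before answering "yes" to dump.  It is a
sketch only: it assumes that "mt -f /dev/nrst0 status" exits non-zero
when the drive reports an error, which I have not verified on this
hardware, and the helper names, retry count and delay are invented for
illustration:

```python
import subprocess
import time

def mt_status(device="/dev/nrst0"):
    """Run "mt -f DEVICE status"; True if mt ran and exited with status 0."""
    try:
        result = subprocess.run(["mt", "-f", device, "status"],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except OSError:
        # mt could not be started at all; treat that as "not ready".
        return False
    return result.returncode == 0

def wait_until_ready(probe=mt_status, attempts=5, delay=10.0):
    """Call probe() up to `attempts` times, pausing `delay` seconds after
    each failure.  Returns the number of the try that succeeded, or 0 if
    none did."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        time.sleep(delay)
    return 0
```

Calling wait_until_ready() with the defaults would soak up the first
error the driver reports and keep retrying; only once it returns
non-zero would I answer dump's "Is the new volume mounted" prompt.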

In another three hours I'll report how the second attempt made out...

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>      <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>