Subject: Bug? Help w/ wd*/wt*
To: None <port-i386@NetBSD.ORG>
From: Brian C. Grayson <bgrayson@ece.utexas.edu>
List: port-i386
Date: 09/20/1996 00:04:14
I've got a _big_ problem. I finally got an old Archive
tape drive working on my i386 system at home, running NetBSD-i386
1.1. The disk drive started acting a little flaky, but fsck
could fix things up real quick during reboot. I figured the
flakiness was just because the Archive hardware is old ((c)
1985) and might not be behaving properly. I was wrong. :(
To make a long story short, I made a non-rewinding device
(minor number 04, methinks) and called it /dev/nwt0. I created
a bunch of dumps on /dev/wt0, and could read them fine. I
started a dump on /dev/nwt0, the tape drive didn't start up,
and the system panicked. When I brought it back up, partition
rwd0e had been hosed in a big way. When I looked at /dev, I
noticed that the major device number for rwd0* is the same as
the major number for the wt* device -- 3. And thus, device 3,4
also corresponds to /dev/rwd0e. Hence, I probably shoved 30MB
of a dump onto my hard drive instead of the tape. Does this
sound plausible?
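One way to check the theory is to compare what the two nodes really are. The sketch below is in Python, chosen purely for illustration; none of it comes from the NetBSD sources, and describe_node, describe_path and same_device are made-up helper names. It relies on one general Unix fact: block and character devices are numbered in separate tables, so two nodes only alias each other if the node type matches as well as the major and minor numbers.

```python
import os
import stat

def describe_node(mode, rdev):
    """Return (kind, major, minor) for a node's st_mode and st_rdev."""
    if stat.S_ISCHR(mode):
        kind = "char"
    elif stat.S_ISBLK(mode):
        kind = "block"
    else:
        kind = "other"  # regular file, directory, etc.: not a device node
    return kind, os.major(rdev), os.minor(rdev)

def describe_path(path):
    """Same as describe_node, but reads the values from a path on disk."""
    st = os.lstat(path)  # lstat: describe the node itself, not a symlink target
    return describe_node(st.st_mode, st.st_rdev)

def same_device(a, b):
    """True only if both are device nodes with equal type, major and minor."""
    return a[0] != "other" and a == b
```

Assuming both nodes exist, same_device(describe_path("/dev/nwt0"), describe_path("/dev/rwd0e")) would say whether the new node really pointed at the disk partition. If one turns out to be a block node and the other a character node, sharing the numbers 3,4 would not by itself make them the same device.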
Disclaimer: I am relatively new at sysadmin'ing, so I could
have done something really stupid. If so, please enlighten me.
Yes, I probably should have checked MAKEDEV for major number
conflicts before I got too ambitious, but I didn't think of it.
Believe me, I'll never blindly trust MAKEDEV again. If I'm
way off-base on some of my assumptions or comments,
let me know (hopefully in a kind tone?).
Several questions:
1. Did I do something incorrect when setting up the
non-rewinding version? I'm guessing that when I used
/dev/wt0, the kernel did the right thing most of the
time, but for some reason it didn't for nwt0.
2. Are there supposed to be major-number conflicts
like this? The default wt0 setup conflicts with rwd0a,
and the nwt0 setup conflicts with rwd0e (mknod didn't
catch this, nor did anything else). Does anyone have a
big table with what numbers are reserved for what, to
prevent these personal catastrophes from happening in the
future? I don't know enough about device drivers to
be _sure_ that conflicts are a bad thing, but if
conflicts were okay, we could just make everything be
major 0, minor 0!
2b. A quick check on ftp.netbsd.org showed the problem
still around in MAKEDEV for -current i386 -- should this
be fixed (with the accompanying changes to wt.c or
wtreg.h or whatever) before 1.2 ships? Believe me, I
don't want to delay 1.2 any. But unless I did something
stupid, this is a serious bug for people who happen to
use the wt code. (more important to _me_ than, say,
bounce-buffer support! :) )
3. Is anyone else out there using an Archive 5945C tape
drive, with an Archive SC400S controller? It was a
hand-me-down, with no docs, and I would like to know a
bit more about its jumpers. For that matter, is anyone
else using the wt interface at all? From what I've
seen in my net-searches for info about the tape drive,
it was popular on Suns at some point.
4. Is there any hope of recovering the filesystem, at
least the portion past the 30MB worth of stuff that got
overwritten by the dump? Or is the filesystem going to
be totally trashed, which is what I expect? I've never
used fsdb, and I figured after answering "yes" to
fsck's prompts for the first dozen inodes that fsck might
not be able to handle the severe damage. It looked
like it was prompting to clear the inode, which
sounded like a bad thing. Then again, the inode
probably contained "Hello World" where the access
times were supposed to be. Advice? Hope? Sympathy? :)
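For question 2, lacking a big table of reserved numbers, a rough substitute is to build one from the nodes that already exist and flag any (type, major, minor) triple that is used by more than one name. This is a minimal sketch in Python, again only for illustration: find_aliases, device_key and scan_dir are hypothetical names, and it assumes the nodes of interest sit directly under one directory such as /dev. A reported duplicate is not automatically a bug, since some nodes may be intentional aliases, but a tape node sharing a triple with a disk partition would be worth a second look.

```python
import os
import stat
from collections import defaultdict

def device_key(mode, rdev):
    """Return (kind, major, minor) for a device node, or None for anything else."""
    if stat.S_ISCHR(mode):
        kind = "char"
    elif stat.S_ISBLK(mode):
        kind = "block"
    else:
        return None
    return kind, os.major(rdev), os.minor(rdev)

def find_aliases(entries):
    """entries: iterable of (name, st_mode, st_rdev).

    Return {key: sorted names} for every key shared by two or more names.
    """
    by_key = defaultdict(list)
    for name, mode, rdev in entries:
        key = device_key(mode, rdev)
        if key is not None:
            by_key[key].append(name)
    return {k: sorted(v) for k, v in by_key.items() if len(v) > 1}

def scan_dir(path="/dev"):
    """Yield (name, st_mode, st_rdev) for each entry directly under path."""
    for name in os.listdir(path):
        st = os.lstat(os.path.join(path, name))  # do not follow symlinks
        yield name, st.st_mode, st.st_rdev
```

Running find_aliases(scan_dir("/dev")) before and after a mknod would show whether the new node landed on numbers that some other node of the same type already uses.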
Until I hear any advice, I think I'll just let the system
rest in peace for a while. I've got my trusty Apple ][+
with the 14.4 modem, so at least I can still do some research
from home. Luckily, most of my 300MB /usr partition was
software that can be (painstakingly) downloaded again, but
/home was on there too.
I can't believe I'm not in a worse mood -- I guess the
shock just hasn't hit me yet!
Brian
(And yes, I'm one of those students alluded to in recent
discussions who can't afford to upgrade to, say, a SCSI
controller and SCSI tape drive at the moment; if I could,
bounce-buffer support might matter to me!)
--
Brian Grayson (bgrayson@ece.utexas.edu)
Graduate Student, Electrical and Computer Engineering
The University of Texas at Austin
Office: ENS 406 (512) 471-8011
Finger bgrayson@orac.ece.utexas.edu for PGP key.