Subject: Bug? Help w/ wd*/wt*
To: None <port-i386@NetBSD.ORG>
From: Brian C. Grayson <bgrayson@ece.utexas.edu>
List: port-i386
Date: 09/20/1996 00:04:14
  I've got a _big_ problem.  I finally got an old Archive
tape drive working on my i386 system at home, running
NetBSD-i386 1.1.  The disk drive started acting a little flaky,
but fsck could fix things up real quick during reboot.  I
figured the flakiness was just because the Archive hardware is
old ((c) 1985) and might not be behaving properly.  I was
wrong.  :(

  To make a long story short, I made a non-rewinding device
(minor number 04, methinks) and called it /dev/nwt0.  I created
a bunch of dumps on /dev/wt0, and could read them fine.  I
started a dump on /dev/nwt0, the tape drive didn't start up,
and the system panicked.  When I brought it back up, partition
rwd0e had been hosed in a big way.  When I looked at /dev, I
noticed that the major device number for rwd0* is the same as
the major number for the wt* device -- 3.  And thus, device 3,4
also corresponds to /dev/rwd0e.  Hence, I probably shoved 30MB
of a dump onto my hard drive instead of the tape.  Does this
sound plausible?
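
  For anyone who wants to check a /dev directory for this kind
of overlap, the following is a minimal Python sketch, not a
tested sysadmin tool.  It groups device nodes by type, major
number, and minor number, and reports every group that has more
than one name.  Character and block nodes are kept apart,
because the two kinds have separate major-number tables.  The
function names (device_key, find_conflicts, scan_dev) are
hypothetical, invented for this example.  Several names for one
device can be perfectly legitimate, so a hit is only something
to look at, not proof of a bug.

```python
import os
import stat
from collections import defaultdict


def device_key(mode, rdev):
    """Return (kind, major, minor) for a device node, or None otherwise."""
    if stat.S_ISCHR(mode):
        kind = "c"   # character (raw) device
    elif stat.S_ISBLK(mode):
        kind = "b"   # block device
    else:
        return None  # regular file, directory, symlink, ...
    return (kind, os.major(rdev), os.minor(rdev))


def find_conflicts(nodes):
    """Map each shared (kind, major, minor) to the sorted names using it.

    `nodes` is an iterable of (name, st_mode, st_rdev) tuples.
    """
    groups = defaultdict(list)
    for name, mode, rdev in nodes:
        key = device_key(mode, rdev)
        if key is not None:
            groups[key].append(name)
    return {key: sorted(names)
            for key, names in groups.items() if len(names) > 1}


def scan_dev(directory="/dev"):
    """Collect (name, st_mode, st_rdev) for each entry, not following symlinks."""
    nodes = []
    for entry in os.scandir(directory):
        st = entry.stat(follow_symlinks=False)
        nodes.append((entry.name, st.st_mode, st.st_rdev))
    return nodes
```

  Calling find_conflicts(scan_dev()) on a live system lists
every set of names that resolve to the same device.  In the
situation described above, a character node nwt0 created as 3,4
would show up in the same group as rwd0e.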

  Disclaimer:  I am relatively new at sysadmin'ing, so I could
have done something really stupid.  If so, please enlighten me.
Yes, I probably should have checked MAKEDEV for major number
conflicts before I got too ambitious, but I didn't think of it.
Believe me, I'll never blindly trust MAKEDEV again.  If I'm
way off-base on some of my assumptions or comments,
let me know (hopefully in a kind tone?).

  Several questions:
  1.  Did I do something incorrect when setting up the
      non-rewinding version?  I'm guessing that when I used
      /dev/wt0, the kernel did the right thing most of the
      time, but for some reason it didn't for nwt0.

  2.  Are there supposed to be major-number conflicts
      like this?  The default wt0 setup conflicts with rwd0a,
      and the nwt0 setup conflicts with rwd0e (mknod didn't
      catch this, nor did anything else).  Does anyone have a
      big table with what numbers are reserved for what, to
      prevent these personal catastrophes from happening in the
      future?  I don't know enough about device drivers to
      be _sure_ that conflicts are a bad thing, but if
      conflicts were okay, we could just make everything be
      major 0, minor 0!

  2b.  A quick check on ftp.netbsd.org showed the problem is
      still around in MAKEDEV for -current i386 -- should this
      be fixed (with the accompanying changes to wt.c or
      wtreg.h or whatever) before 1.2 ships?  Believe me, I
      don't want to delay 1.2 at all.  But unless I did
      something stupid, this is a serious bug for people who
      happen to use the wt code.  (It's more important to _me_
      than, say, bounce-buffer support!  :)  )

  3.  Is anyone else out there using an Archive 5945C tape
      drive, with an Archive SC400S controller?  It was a
      hand-me-down, with no docs, and I would like to know a
      bit more about its jumpers.  For that matter, is anyone
      else using the wt interface at all?  From what I've
      seen in my net-searches for info about the tape drive,
      it was popular on Suns at some point.

  4.  Is there any hope of recovering the filesystem, at
      least the portion past the 30MB worth of stuff that got
      overwritten by the dump?  Or is the filesystem going to
      be totally trashed, which is what I expect?  I've never
      used fsdb.  After answering "yes" to fsck's prompts for
      the first dozen inodes, I figured fsck might not be able
      to handle damage this severe.  It looked like it was
      prompting to clear each inode, which sounded like a bad
      thing.  Then again, the inode probably contained
      "Hello World" where the access times were supposed to
      be.  Advice?  Hope?  Sympathy?  :)

  Until I hear some advice, I think I'll just let the system
rest in peace for a while.  I've got my trusty Apple ][+
with the 14.4, so at least I can still do some research
from home.  Luckily, most of my 300MB /usr partition was
software that can be (painstakingly) downloaded again, but
/home was on there too.

  I can't believe I'm not in a worse mood -- I guess the
shock just hasn't hit me yet!

  Brian

  (And yes, I'm one of those students alluded to in recent
discussions who can't afford to upgrade to, say, a SCSI
controller and SCSI tape drive at the moment; if I could,
bounce-buffer support might matter to me!)
-- 
Brian Grayson (bgrayson@ece.utexas.edu)
Graduate Student, Electrical and Computer Engineering
The University of Texas at Austin
Office:  ENS 406       (512) 471-8011
Finger bgrayson@orac.ece.utexas.edu for PGP key.