Subject: Re: recovering from a bad crash, ffs recovery
To: Charles Shannon Hendrix <shannon@widomaker.com>
From: Steven M. Bellovin <smb@research.att.com>
List: netbsd-users
Date: 01/05/2004 21:16:25
In message <20040105004435.GA1775@widomaker.com>, Charles Shannon Hendrix writes:
>Sun, 04 Jan 2004 @ 19:09 -0500, Charles Shannon Hendrix said:
>
>> Right now, I'm not sure what happened, except the kernel on first
>> booting was confused about the drive layout.  However, that shouldn't
>> have caused any overwriting of the filesystem.  
>
>Whoops... one thing I hadn't thought about is /tmp being wiped.
>
>When my system booted with the wrong drive order, it is possible that
>slice d on the wiped drive was mounted as /tmp, because sd0d is the
>slice for /tmp normally.

Unfortunately, I don't have any suggestions for how to recover your 
data.  But maybe we can fix the system so that this doesn't happen to 
someone else.

The first and easiest thing to do is to fix /etc/rc.d/cleartmp so that 
it doesn't clear the directory unless a sentinel file -- something like 
/tmp/.ThisReallyIsTmp -- exists.  It would also create such a file 
after clearing out everything else -- but *only* if it did the deletes, 
i.e., if the file had existed previously.
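A minimal sketch of that sentinel rule follows. It is written in Python only to show the decision logic. The real /etc/rc.d/cleartmp is a shell script, and this is not a patch to it. The function name clear_tmp is hypothetical, and the sentinel name is just the one suggested above.

```python
import os
import shutil

# Sentinel name from the suggestion above; any agreed-upon name would do.
SENTINEL = ".ThisReallyIsTmp"


def clear_tmp(tmpdir: str) -> bool:
    """Empty tmpdir only if it carries the sentinel file (hypothetical helper).

    Returns True if the directory was cleared, False if it was left alone.
    """
    sentinel_path = os.path.join(tmpdir, SENTINEL)
    if not os.path.isfile(sentinel_path):
        # No sentinel: this may be some other file system mounted on /tmp
        # by mistake, so refuse to delete anything.
        return False

    for name in os.listdir(tmpdir):
        path = os.path.join(tmpdir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)   # real subdirectory: remove recursively
        else:
            os.remove(path)       # plain file or symlink (sentinel included)

    # Recreate the sentinel, which we only reach because it existed before.
    with open(sentinel_path, "w"):
        pass
    return True
```

The important property is the asymmetry: a directory without the sentinel is never touched, and the sentinel is only ever recreated on a directory that already had one, so a mis-mounted partition never acquires it by accident.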

Another solution is local -- get rid of kernel config lines like 

	sd*     at scsibus? target ? lun ?

since they're invitations to disaster if some drive isn't there.  Once 
you know what your configuration really is, use explicit lines:

	sd1	at scsibus0 target 4 lun 0

or some such.  If you must use the generic lines, make sure that /tmp 
is on the highest-numbered disk (but beware adding a new drive!).
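To see why the wildcard line is risky, here is a deliberately simplified model of wildcard-only numbering on a single SCSI bus. It assumes units are handed out sd0, sd1, ... in target order and ignores multiple buses, LUNs, and wired-down units. The function names are hypothetical.

```python
def assign_units(present_targets):
    """Simplified model: with only 'sd* at scsibus? target ? lun ?',
    drives get sd0, sd1, ... in probe (target) order."""
    return {f"sd{unit}": target
            for unit, target in enumerate(sorted(present_targets))}


def renamed(before, after):
    """Unit names that now refer to a different target, or vanished (None)."""
    return {name: (before[name], after.get(name))
            for name in before if after.get(name) != before[name]}


before = assign_units([0, 1, 3])  # all three drives answer the probe
after = assign_units([0, 3])      # the drive at target 1 is missing
print(renamed(before, after))
```

With the drive at target 1 gone, the name sd1 silently moves to the drive at target 3, so anything configured for sd1 now gets a different disk. The highest-numbered name, sd2, simply disappears instead, which is why a file system placed there fails to mount rather than landing on the wrong drive. Adding a drive at a low target shifts the names the other way.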

But these are just patches to stop the bleeding.  The real solution is to 
name drives symbolically.  Drives are labeled -- see disklabel(8).  
Suppose that there was an rc.d script that read the disklabels on all 
drives, and created directories /dev/dsk-<label>, one per drive.  
(Well, two -- we'd also need /dev/rdsk-<label>.)  Inside each directory 
would be device files for a-h, but only those that appeared to exist 
according to disklabel.  (It would be nice if there were a file system 
name for each partition in the disk label, but there doesn't seem to 
be.  Maybe take it from the superblock for file systems that have such 
a thing?)

You see where I'm heading.  You'd mount /tmp on /dev/dsk-pack2/tmp or 
some such.

Note that this isn't a new idea.  I used it on Amdahl's "Au" port of 
7th Edition Unix about 25 years ago.  I think that they invented it, 
though there's a slight chance it was me -- I'd been burned by such 
problems long before.  But we can go back 15 years before that -- IBM's 
operating systems for its 360 line had labels on disks and tapes, and 
you couldn't write to a medium if you had the label wrong unless you 
explicitly overrode it.  For all I know, it went back even further, but 
my experience with operating systems for second generation machines was 
too limited.


		--Steve Bellovin, http://www.research.att.com/~smb