Subject: Re: recovering from a bad crash, ffs recovery
To: Charles Shannon Hendrix <shannon@widomaker.com>
From: Steven M. Bellovin <smb@research.att.com>
List: netbsd-users
Date: 01/05/2004 21:16:25
In message <20040105004435.GA1775@widomaker.com>, Charles Shannon Hendrix writes:
>Sun, 04 Jan 2004 @ 19:09 -0500, Charles Shannon Hendrix said:
>
>> Right now, I'm not sure what happened, except the kernel on first
>> booting was confused about the drive layout. However, that shouldn't
>> have caused any overwriting of the filesystem.
>
>Whoops... one thing I hadn't thought about is /tmp being wiped.
>
>When my system booted with the wrong drive order, it is possible that
>slice d on the wiped drive was mounted as /tmp, because sd0d is the
>slice for /tmp normally.
Unfortunately, I don't have any suggestions for how to recover your
data. But maybe we can fix the system so that this doesn't happen to
someone else.
The first and easiest thing to do is to fix /etc/rc.d/cleartmp so that
it doesn't clear the directory unless a sentinel file -- something like
/tmp/.ThisReallyIsTmp -- exists. It would also create such a file
after clearing out everything else -- but *only* if it did the deletes,
i.e., if the file had existed previously.
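A minimal sketch of the sentinel idea follows. It assumes a plain sh function rather than the real /etc/rc.d/cleartmp, whose actual contents are not reproduced here. The function name clear_tmp_if_sentinel is hypothetical. The function refuses to touch a directory that lacks the sentinel, and it re-creates the sentinel only after it has actually done the deletes:

```sh
#!/bin/sh
# Hypothetical sketch of a sentinel-guarded /tmp cleaner; this is NOT
# the real /etc/rc.d/cleartmp, just an illustration of the idea.

SENTINEL_NAME=".ThisReallyIsTmp"

clear_tmp_if_sentinel() {
    dir=$1
    # Refuse an empty or non-directory argument outright.
    [ -n "$dir" ] && [ -d "$dir" ] || return 1
    # No sentinel: this may be some other disk's partition, so leave it.
    if [ ! -f "$dir/$SENTINEL_NAME" ]; then
        echo "cleartmp: $dir/$SENTINEL_NAME missing; not clearing $dir" >&2
        return 1
    fi
    # Remove everything in $dir (dot files included) but not $dir itself.
    # With -f, glob patterns that match nothing are silently ignored.
    rm -rf -- "$dir"/* "$dir"/.[!.]* "$dir"/..?*
    # We did the deletes, so this really was /tmp: re-create the sentinel.
    : > "$dir/$SENTINEL_NAME"
    return 0
}
```

An rc.d script would call it as `clear_tmp_if_sentinel /tmp`. The sentinel has to be created by hand once on the real /tmp (e.g. `touch /tmp/.ThisReallyIsTmp`); after that the script maintains it. A partition mounted on /tmp by mistake has no sentinel and is left alone.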
Another solution is local -- get rid of kernel config lines like
sd* at scsibus? target ? lun ?
since they're invitations to disaster if some drive isn't there. Once
you know what your configuration really is, use explicit lines:
sd1 at scsibus0 target 4 lun 0
or some such. If you must use the generic lines, make sure that /tmp
is on the highest-numbered disk (but beware adding a new drive!).
But these are just patches to stop the bleeding. The real solution is to
name drives symbolically. Drives are labeled -- see disklabel(8).
Suppose that there was an rc.d script that read the disklabels on all
drives, and created directories /dev/dsk-<label>, one per drive.
(Well, two -- we'd also need /dev/rdsk-<label>.) Inside each directory
would be device files for a-h, but only those that appeared to exist
according to disklabel. (It would be nice if there were a file system
name for each partition in the disk label, but there doesn't seem to
be. Maybe take it from the superblock for file systems that have such
a thing?)
You see where I'm heading. You'd mount /dev/dsk-pack2/tmp (or some
such) on /tmp.
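Here is a rough sketch of such a script. It rests on several assumptions: it is POSIX sh, it finds disks through NetBSD's hw.disknames sysctl, it parses the `label:` field and the partition table printed by disklabel(8), and it makes symlinks to the existing /dev/sdNx nodes instead of creating new device files. All function names are hypothetical. Partitions are still named by letter, since the disk label has no per-partition name:

```sh
#!/bin/sh
# Hypothetical sketch: build /dev/dsk-<label> and /dev/rdsk-<label>
# from disklabels. Uses symlinks to the existing sdNx nodes rather
# than new device files (an assumption, for simplicity).

# Print the "label:" field from disklabel(8) output on stdin,
# turning spaces and slashes into underscores so it is a safe name.
label_of() {
    sed -n 's/^label:[[:space:]]*//p' | tr ' /' '__'
}

# Print the letter of each partition with a nonzero size, i.e. one
# that appears to exist according to the label.
partitions_of() {
    awk '/^ *[a-p]:/ && $2 + 0 > 0 { sub(":", "", $1); print $1 }'
}

# $1 = disk name (e.g. sd1), $2 = device directory; label text on stdin.
# Creates $2/dsk-<label>/<p> -> ../<disk><p> and the raw equivalents.
link_disk() {
    disk=$1 devdir=$2
    text=$(cat)
    label=$(printf '%s\n' "$text" | label_of)
    [ -n "$label" ] || return 1          # unlabeled disk: skip it
    mkdir -p "$devdir/dsk-$label" "$devdir/rdsk-$label"
    for p in $(printf '%s\n' "$text" | partitions_of); do
        ln -sf "../$disk$p" "$devdir/dsk-$label/$p"
        ln -sf "../r$disk$p" "$devdir/rdsk-$label/$p"
    done
}

# Walk every disk the kernel reports and link it under /dev.
link_all_disks() {
    for dk in $(sysctl -n hw.disknames); do
        disklabel "$dk" 2>/dev/null | link_disk "$dk" /dev
    done
}
```

Such a script would have to run before the local file systems are mounted, so that an fstab entry like `/dev/dsk-pack2/e /tmp ffs rw 1 2` resolves to whichever drive carries the pack2 label. Two drives with the same label would collide here; a real version would need to detect that and refuse.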
Note that this isn't a new idea. I used it on Amdahl's "Au" port of
7th Edition Unix about 25 years ago. I think that they invented it,
though there's a slight chance it was me -- I'd been burned by such
problems even earlier. But we can go back 15 years before that -- IBM's
operating systems for its 360 line had labels on disks and tapes, and
you couldn't write to a medium if you had the label wrong unless you
explicitly overrode it. For all I know, it went back even further, but
my experience with operating systems for second-generation machines was
too limited.
--Steve Bellovin, http://www.research.att.com/~smb