Subject: Re: fsck
To: Jon Ribbens <jon@oaktree.co.uk>
From: Jim Reid <jim@mpn.cp.philips.com>
List: netbsd-help
Date: 04/07/1997 19:42:27
>>>>> "Jon" == Jon Ribbens <jon@oaktree.co.uk> writes:

    Jon> I knew I shouldn't've lent my O'Reilly Essential System
    Jon> Administration to someone else.

Personally, I prefer O'Reilly's set of the BSD manuals. If you're a
novice to UNIX sysadmin, "UNIX System Administration Handbook" by
Nemeth, Snyder, Seebass and Hein is the best choice, though I don't
like their jokey style of presentation.

    >> This is an area where experience and judgement help. For
    >> example, if there are lots of errors, it usually means the disk
    >> is bust or on its way out: probably a head crash.

    Jon> How many is 'lots'? We got around ten pages full of
    Jon> diagnostics when running fsck after the reset.

That's lots. Unless, of course, they're all reports about unlinked
files which belonged in the one huge and now missing directory....
My previous comment about experience and judgement should indicate
what constitutes "lots".

    >> BTW, I have to differ with your comments about
    >> "incomprehensible errors" from fsck. The reports and questions
    >> it generates are clear enough to me, though admittedly it does
    >> help if you know how a UNIX filesystem and the ffs in
    >> particular is organised on disk.

    Jon> You are joking, I presume? "DUP/BAD", "BAD" "EXCESSIVE DUP
    Jon> BLKS" aren't all that comprehensible to me. I know vaguely
    Jon> how the ffs filesystem works (or at least, I did. I've mostly
    Jon> forgot).

"DUP" is short for duplicated, same as in the system call. "BAD" is
short for bad - the opposite of good. "BLKS" is short for blocks. (blk
is a common abbreviation for block in UNIX kernel/user code.) Seems
pretty clear to me so far. "DUP/BAD" means fsck doesn't like the inode
because of a mangled block (pointer) and it can't tell if it's a
duplicated one - ie a block claimed by two or more files - or a bad
one - ie full of crap. "EXCESSIVE DUP BLKS" means that too many blocks
have been found to be duplicates: hence fsck's confusion over bad and
duplicated block pointers. Even so, it is still clear enough for
me. If you *still* want chapter and verse, consult the guide in
/usr/share/doc/smm.
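
To make the DUP and BAD distinction concrete, here is a toy sketch of
the kind of block-pointer check described above. It is not fsck's real
code. The dictionary standing in for the inode table, the function
name check_blocks and the block numbers are all made up for
illustration, and "BAD" is modelled simply as a pointer that falls
outside the filesystem's data area.

```python
# Toy model of a block-pointer check. Not the real fsck source.
# Assumption: a filesystem is just {inode number: [block numbers]}.

def check_blocks(inodes, first_data_block, total_blocks):
    """Classify every block pointer as fine, BAD or DUP."""
    owner = {}  # block number -> first inode seen claiming it
    bad = []    # (inode, block) pairs pointing outside the data area
    dup = []    # (inode, block) pairs claiming an already-claimed block
    for ino in sorted(inodes):
        for blk in inodes[ino]:
            if not first_data_block <= blk < total_blocks:
                bad.append((ino, blk))   # pointer is garbage: BAD
            elif blk in owner:
                dup.append((ino, blk))   # someone else has it: DUP
            else:
                owner[blk] = ino         # first claim, nothing wrong
    return bad, dup


# Hypothetical damage: inodes 2 and 3 both claim block 101, and
# inode 4 points far beyond the end of the filesystem.
fs = {2: [100, 101], 3: [101, 102], 4: [999999]}
bad, dup = check_blocks(fs, first_data_block=64, total_blocks=4096)
print("BAD:", bad)
print("DUP:", dup)
```

In this model the second inode to claim a block is the one reported as
a duplicate. Nothing here can say which claimant holds the correct
data, which is why the affected files may have to come back from
backup.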

    Jon>  fsck could be a lot more helpful than it is.

I suppose so, if you like chatty programs. OTOH, if you think like a
moron - I'm not suggesting you are - fsck is *very* helpful. It could
hardly be more helpful. [Be grateful you weren't around in the old
days when folk had to use dcheck, ncheck and clri to fix the
filesystem.] Just consider fsck's questions to be: "I've found
something wrong. Do you want me to fix it?" Answer "yes" and make a
note of any files it deletes so you can recover them from backup. If
there are "too many" errors - you choose how many - give up. Replace
or reformat the disk, then do newfs and restore. In any case, repeat
the fsck until it says the filesystems are clean.

    Jon> Uh-uh. The first time we ran fsck, it complained about the
    Jon> files and asked us whether we wanted to remove them. The
    Jon> second time we ran it, it complained they were removed, and
    Jon> put them back (in lost+found, since the first fsck had
    Jon> destroyed the directory information). We let it do everything
    Jon> it wanted to.

This doesn't make sense. If the files were removed and their inodes
cleared on the first fsck, there would be nothing for the second to
recover as the relevant filesystem metadata would have been zeroed on
disk. Perhaps some files were removed first time round because of the
dup/bad blocks and some of those files were directories? If that was
the case, the first run of fsck will have patched up the mangled
directories as best it could, perhaps simply by deleting them. The
second fsck would find all the files orphaned as a result of the
corrupted directories getting cleared.
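
A small sketch of that scenario, under heavy simplification: the
"filesystem" is a set of allocated inode numbers plus a dictionary of
directory contents, and find_orphans is a made-up helper rather than
anything in fsck. It walks the tree from the root and reports the
allocated inodes that no directory still names, which are the ones
that would end up in lost+found.

```python
# Toy model of orphan detection. Not the real fsck source.
# Assumption: directories is {dir inode: {name: inode}} and the root
# directory is inode 2.

def find_orphans(allocated, directories, root=2):
    """Return allocated inodes that no reachable directory names."""
    reachable = {root}
    todo = [root]
    while todo:
        d = todo.pop()
        # Plain files have no entry in directories, so they add nothing.
        for ino in directories.get(d, {}).values():
            if ino not in reachable:
                reachable.add(ino)
                todo.append(ino)
    return sorted(allocated - reachable)


# Before the damage: / contains "home" (inode 5), which holds three files.
before = {2: {"home": 5}, 5: {"a": 10, "b": 11, "c": 12}}
print(find_orphans({2, 5, 10, 11, 12}, before))

# After a first run has cleared the mangled directory, inode 5 is gone
# but its three files are still allocated and nothing points at them.
after = {2: {}}
print(find_orphans({2, 10, 11, 12}, after))
```

The second call reports inodes 10, 11 and 12. Their data is intact but
the names went with the cleared directory, so a later run can only
reconnect them by inode number.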

Looks like you've been very unlucky. This is why sync, shutdown and
dump are the system administrator's best friends..... and fsck for
that matter... :-)