Subject: Re: Playing with dkwedge
To: Bill Studenmund <wrstuden@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-kern
Date: 08/24/2005 21:49:35
On Wed, Aug 24, 2005 at 12:37:51PM -0700, Bill Studenmund wrote:
> 
> Yes and no. That's what softdeps uses snapshots for, or one of the things 
> it uses them for. However snapshots are more for being able to make 
> self-consistent backups and for simple "undelete" (deleted something by 
> mistake? Chances are it's in the snapshot, so just bring it back).

I meant that snapshots are made to take a fixed, consistent image of a live
filesystem, so it's possible to run fsck on that image to detect problems.
Of course there are other uses too :)

> 
> The problem I see with what you're proposing, using fsck as a disk 
> reliability verification tool, is that that's not what it was designed 
> for. While I do not doubt that it really really helped you, I do not think 
> we should make this a recommended practice.
> 
> If you (or I) really care about the data, we should be using a RAID 5 or 
> better. And we should have a program that verifies parity. Not just reads 
> the whole disk, but verifies each stripe's parity. Run it say once a week 
> on the whole array, and things are good.

Yes, that would be the best choice. Unless you're using hardware RAID,
in which case you can't do this check yourself (and if the hardware
controller does it, you have to trust it).
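
To make the per-stripe check concrete, here is a rough sketch of the idea in
Python. It is only a toy model, not RAIDframe code: a stripe is an in-memory
list of equal-sized byte strings (the data units plus the parity unit), and
the names xor_blocks, stripe_ok and verify_array are made up for this example.
A real checker would read the units from the component disks instead.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all stripe units must have the same size")
    acc = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            acc[i] ^= byte
    return bytes(acc)


def stripe_ok(units):
    """Check one stripe (hypothetical helper).

    `units` holds every unit of the stripe, data and parity alike.
    With RAID 5 style XOR parity, XORing all of them must give zeros,
    so the position of the parity unit does not matter here.
    """
    return not any(xor_blocks(units))


def verify_array(stripes):
    """Return the indices of the stripes whose parity does not match."""
    return [n for n, units in enumerate(stripes) if not stripe_ok(units)]


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    good = data + [xor_blocks(data)]          # parity computed from the data
    bad = [b"\x01\x03"] + good[1:]            # one flipped bit in a data unit
    print(verify_array([good, bad, good]))
```

Note that a mismatch only says the stripe is inconsistent. With single
parity and no error reported by a disk, the check cannot tell which unit
holds the wrong data.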

> 
> The problem with fsck is that you really just got lucky. fsck wouldn't 
> notice if the cache messed up reading file data. It also won't really 
> notice (AFAICT) if it gets passed incorrect-but-sensible-looking data at 
> certain points.

Yes, I was lucky. But I think fsck does enough checks that random
corruption caused by hardware problems will be detected quickly.

Also, we can't afford to use ECC memory and RAID everywhere. A periodic fsck
helps detect hardware problems (not talking about software bugs :), and
it would be a shame to lose this.


> 
> So if we want to do something, let's use the right tool for it.
> 
> > BTW, we should probably add the -x and -X options to fsck, similar to
> > dump(8).
> 
> What options are those? I do not see them in our dump(8).

From a 3.0_BETA system:
     -x snap-backup
             Use a snapshot with snap-backup as backup for this dump.  See
             fss(4) for more details.  Snapshot support is experimental.  Be
             sure you have a backup before you use it.

     -X      Similar to -x but uses a file system internal snapshot on the
             file system to be dumped.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference