Re: Snapshots in tmpfs

To: tech-kern%netbsd.org@localhost
Subject: Re: Snapshots in tmpfs
From: David Young <dyoung%pobox.com@localhost>
Date: Wed, 29 Feb 2012 18:45:41 -0600

On Thu, Feb 23, 2012 at 08:04:01PM -0500, Thor Lancelot Simon wrote:
> On Fri, Feb 24, 2012 at 12:45:32AM +0000, David Holland wrote:
> > On Thu, Feb 23, 2012 at 11:20:18PM +0000, David Holland wrote:
> >  > > > Is CHFS really suitable for CompactFlash?  Is LFS even usable?
> >  > > 
> >  > > No 
> >  > 
> >  > I thought the whole point of chfs was to be able to operate on raw
> >  > flash devices that don't have their own flash translation layer.
> > 
> > Oh, my mistake, since there was concern about filesystem type I
> > thought you were talking about raw flash, but apparently CompactFlash
> > is not raw flash, same as USB sticks aren't.
> > 
> > In that case, just use wapbl.
> 
> That doubles the write rate for the common "create new version of
> file and rename into place" pattern...
> 
> Translation layer or not, doubling the write rate to any type of
> flash is not a great idea.

One way to hold writes to flash down to a very low rate is to keep files
that change in a tmpfs, and everything else in a read-only FFS.

Sometimes the files that change need to persist across reboots and power
failures.  One way to make them persist is to periodically write a
checkpoint of the tmpfs containing those files to flash.  After a reset
or power failure, use the last checkpoint to restore the tmpfs.

One way to store the checkpoints is to reserve a partition on flash for
receiving them.  You don't put a filesystem on the checkpoint partition,
but you treat it like a (circular) tape with big blocks.  Ideally, the
block size is a multiple of the biggest block size that the flash uses.
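To make the "tape with big blocks" idea concrete, here is a minimal sketch of the block arithmetic. The function name blocks_needed and the example sizes are my own assumptions for illustration, not part of any existing tool: it rounds a checkpoint size in bytes up to a whole number of big blocks, so you can tell how much of the partition a checkpoint will occupy.

```shell
# Hypothetical helper: print how many big blocks are needed to hold a
# checkpoint of the given size.  Rounds up, because a partly filled
# block still occupies a whole block on the "tape".
# usage: blocks_needed size_in_bytes big_block_size
blocks_needed() {
        size=$1
        block=$2
        echo $(( (size + block - 1) / block ))
}
```

For example, with an assumed 128 KiB big block, "blocks_needed 300000 131072" reports that a 300000-byte checkpoint occupies three blocks.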

To create a checkpoint of your tmpfs, first you create a (possibly
read-only) snapshot of it: in this way you can write a self-consistent
checkpoint, containing the tmpfs contents at a moment in time, without
suspending tmpfs activity.  Write the checkpoint to the first half of
the checkpoint partition with something like this:

{
        checkpoint_header       # writes checkpoint magic, a checkpoint
                                # generation number, checkpoint date & time
        cd "$tmpfs_mountpoint" || exit 1
        pax -w . | gzip
        checkpoint_trailer      # SHA1 sum of everything written above
} | dd obs="$big_block_size" seek="$checkpoint_offset" of="$checkpoint_partition"

Finally, destroy the snapshot.

Write checkpoints to alternate halves of the checkpoint partition: the
2nd checkpoint to the 2nd half of the checkpoint partition, the 3rd to
the 1st half, 4th to the 2nd half, and so on.
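The alternation can be derived from the generation number. The following is only a sketch under my own assumptions: generations start at 1, the partition size is known in big blocks, and checkpoint_offset_for is a hypothetical name. Its output is meant to be used as the seek= value for dd, which counts in units of obs.

```shell
# Hypothetical helper: print the dd seek= offset, in big blocks, for a
# given checkpoint generation.  Odd generations (1st, 3rd, ...) go to
# the first half of the partition, even generations to the second half.
# usage: checkpoint_offset_for generation partition_size_in_big_blocks
checkpoint_offset_for() {
        gen=$1
        partition_blocks=$2
        half=$((partition_blocks / 2))
        if [ $((gen % 2)) -eq 1 ]; then
                echo 0
        else
                echo "$half"
        fi
}
```

With a 64-block partition, generation 3 is written at block 0 and generation 4 at block 32, so a write that is interrupted never damages the most recent complete checkpoint.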

The latest complete checkpoint is the one with the greatest generation
number of all checkpoints with a correct sum.
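The selection rule can be sketched as a small filter. I am assuming that some earlier step (not shown, and dependent on whatever header and trailer format you choose) has already examined each half and printed one line per candidate: the offset, the generation number, and 1 if the sum checked out or 0 if it did not. The name latest_checkpoint and that line format are hypothetical.

```shell
# Hypothetical helper: read "offset generation sum_ok" lines on stdin
# and print the offset of the valid checkpoint with the greatest
# generation number.  Returns non-zero if no candidate is valid.
latest_checkpoint() {
        best_gen=-1
        best_off=
        while read -r off gen ok; do
                # Skip checkpoints whose sum did not verify.
                [ "$ok" = 1 ] || continue
                if [ "$gen" -gt "$best_gen" ]; then
                        best_gen=$gen
                        best_off=$off
                fi
        done
        [ -n "$best_off" ] && echo "$best_off"
}
```

For instance, if generation 8 in the second half was cut short by a power failure, "printf '0 7 1\n32 8 0\n' | latest_checkpoint" falls back to the checkpoint at offset 0.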

(It's possible to be fancy, reserving space both for complete
checkpoints and for "partials"---think partial backups.)

This checkpoint scheme has the interesting property that once the kernel
part (the tmpfs snapshots) is done, you can write the rest as a Bourne
shell script, and there are countless alternative scripts that you
could write.  Also, you can write the checkpoints at the full bandwidth
of whichever device receives them, which can be very fast indeed!

Dave

-- 
David Young
dyoung%pobox.com@localhost    Urbana, IL    (217) 721-9981
