tech-kern archive

Re: FFS write coalescing

On Tue, Dec 04, 2012 at 09:59:46AM +0300, Alan Barrett wrote:
 > >the genfs code also never writes clean pages to disk, even though for
 > >RAID5 storage it would likely be more efficient to write clean pages
 > >that are in the same stripe as dirty pages if that would avoid issuing
 > >partial-stripe writes.  (which is basically another way of saying
 > >what david said.)
 > Perhaps there should be a way for block devices to report at least three
 > block sizes:
 > a) smallest possible block size (512 for almost all disks)
 > b) smallest efficient block size and alignment (4k for modern disks,
 > stripe size for raid)
 > c) largest possible size (a device and bus-dependent variant of MAXPHYS)
 > Then the file system could use (b) to know when it's a good idea to
 > combine dirty and clean pages into the same write.
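
The use of (b) proposed above can be sketched roughly as follows. This is only an illustration: `struct blkdev_sizes`, its field names, and `expand_to_efficient()` are hypothetical and do not correspond to any existing kernel interface. The sketch widens a dirty byte range to (b)-aligned boundaries, which would pull in neighbouring clean pages, and gives up if the result would exceed (c).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor of the three sizes; not an existing interface. */
struct blkdev_sizes {
	uint32_t bs_min;	/* (a) smallest possible block size, in bytes */
	uint32_t bs_efficient;	/* (b) smallest efficient size and alignment */
	uint32_t bs_max;	/* (c) largest possible transfer size */
};

/* Round off down to a multiple of align. */
static uint64_t
round_down(uint64_t off, uint32_t align)
{
	return off - off % align;
}

/* Round off up to a multiple of align. */
static uint64_t
round_up(uint64_t off, uint32_t align)
{
	return round_down(off + align - 1, align);
}

/*
 * Widen the dirty byte range [*start, *end) to (b)-aligned boundaries.
 * Returns true and stores the widened range if it fits within (c);
 * otherwise returns false and leaves the range unchanged.
 */
static bool
expand_to_efficient(const struct blkdev_sizes *bs, uint64_t *start,
    uint64_t *end)
{
	assert(bs->bs_efficient != 0 && *start < *end);

	uint64_t s = round_down(*start, bs->bs_efficient);
	uint64_t e = round_up(*end, bs->bs_efficient);

	if (e - s > bs->bs_max)
		return false;
	*start = s;
	*end = e;
	return true;
}
```

With an assumed 64k stripe and a 128k transfer limit, a dirty range of bytes 4096 to 12288 would be widened to the whole first stripe, while a range straddling three stripes would be left alone because the widened write would not fit in one transfer.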

As I was saying in the other thread, what filesystems really want to
know is the atomic write size. E.g. in ffs this affects the way
directories are laid out and is necessary (AFAIK including with wapbl)
for ~safe operation. This is not (a), and as far as I know it is also
not (b); see below.

I don't see (a) as useful. It is conceivable that a journaled FS might
want to know about it to allow packing journal records as tightly as
possible, but doing so is rather dubious from a recovery POV: the
point of flushing a journal is to get it physically onto disk safely,
and if you later let the disk rewrite part of what you thought was
safely on disk, it might cease to be safely on disk and break your
recovery scheme.
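
To make the concern concrete, here is a minimal sketch of the conservative alternative: pad every journal commit out to the atomic write size, so that a later commit never shares a block with one already flushed. The names `struct journal_tail` and `journal_commit()` are hypothetical, this is not how wapbl is implemented, and it assumes the atomic write size is known, which is exactly the open question here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical journal state: just the next free byte offset. */
struct journal_tail {
	uint64_t jt_off;
};

/*
 * Place a record of len bytes at the current tail, then advance the
 * tail to the next multiple of atomic_size. Later records therefore
 * start in a fresh block and never cause a committed block to be
 * rewritten. Returns the offset at which the record was placed.
 */
static uint64_t
journal_commit(struct journal_tail *jt, uint64_t len, uint32_t atomic_size)
{
	assert(atomic_size != 0 && jt->jt_off % atomic_size == 0);

	uint64_t start = jt->jt_off;
	uint64_t end = start + len;
	uint64_t rem = end % atomic_size;

	if (rem != 0)
		end += atomic_size - rem;	/* pad out the last block */
	jt->jt_off = end;
	return start;
}
```

Packing records at granularity (a) instead would save the padding, at the cost of the next commit rewriting the block that holds the tail of the previous one.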

What guarantees do we actually get in practice for RAID5? Do you have
to commit journals in units of a whole stripe or stripe group to avoid
having them rewritten unsafely later? Or is the parity logging code
sufficient to make that safe? This matters for wapbl...

David A. Holland
