Subject: Re: Porting Hammerfs (fwd)
To: Bill Stouder-Studenmund <wrstuden@netbsd.org>
From: Matthew Dillon <dillon@apollo.backplane.com>
List: tech-kern
Date: 12/11/2007 11:56:15
:I read that paragraph differently. I don't disagree that it could well be
:a gotcha about porting, but I took it as meaning they didn't have to roll
:a custom cache for their metadata stuff. They have btrees, so they have
:more complicated in-core structures than say ffs does.
:
:I could be wrong though...
:
:Take care,
:
:Bill

    Yes, that's it exactly.  B-Trees can be reasonably well cached but
    you can still wind up with a lot of code overhead if you have to
    dive into the OS and issue a lookup (bread() in the case of DFly/FBsd)
    every time you want to access a B-Tree node.  e.g. having to scan a
    6-deep B-Tree might require 6 bread()s JUST to find the B-Tree element,
    then another bread() to access the data.  FFS requires maybe 1/3 the
    bread() calls to access the same data.
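
    A minimal sketch of that count, assuming one bread() per B-Tree
    level visited plus one for the data and nothing cached by the
    filesystem itself.  The function name is hypothetical and is not
    taken from HAMMER:

```c
/*
 * Illustrative only, not HAMMER code.  Counts the buffer cache lookups
 * needed when the filesystem caches nothing itself: one bread() per
 * B-Tree level visited, plus one bread() for the data.
 */
int
uncached_lookup_breads(int btree_depth)
{
    return btree_depth + 1;
}
```

    For the 6-deep tree above this gives 7 calls, so "maybe 1/3" of
    that for FFS works out to two or three bread() calls.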

    HAMMER solves this problem by maintaining in-memory tracking structures
    for things like B-Tree nodes and those structures cache a pointer to the
    actual on-disk data, by pointing directly into the related buffer
    cache buffer.   The pointers are maintained even after HAMMER is 
    through with an operation.  HAMMER then relies on the OS to tell it when
    a buffer cache buffer should be flushed/recycled.  If the in-memory
    tracking structures are in-use (have a non-zero ref count), HAMMER 
    sets B_LOCKED which tells the OS not to throw away the buffer.  If
    the in-memory tracking structures are not in-use (ref count == 0),
    HAMMER disassociates the buffer cache buffer from the structure(s)
    and allows the OS to proceed.  This removes nearly all the bread()
    calls from the critical path.
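
    The scheme described above might be modelled roughly as follows.
    This is a userland sketch under stated assumptions, not the HAMMER
    source: struct node_track, track_access(), track_release_request()
    and the bread_calls counter are made-up names, struct buf is a
    stand-in for the real buffer cache buffer, and the B_LOCKED value
    is arbitrary.

```c
#include <stddef.h>

#define B_LOCKED 0x01       /* arbitrary value for this sketch */

struct buf {                /* stand-in for a buffer cache buffer */
    int  b_flags;
    char b_data[64];
};

struct node_track {         /* hypothetical in-memory tracking structure */
    int         refs;       /* non-zero while the node is in use */
    struct buf *bp;         /* associated buffer, NULL if none */
    void       *ondisk;     /* cached pointer into bp->b_data */
};

int bread_calls;            /* counts simulated bread() lookups */

/*
 * Return the node's on-disk data.  Only the first access after the
 * buffer was disassociated pays for a (simulated) bread(); later
 * accesses reuse the cached pointer.
 */
void *
track_access(struct node_track *nt, struct buf *backing)
{
    if (nt->ondisk == NULL) {
        bread_calls++;
        nt->bp = backing;
        nt->ondisk = backing->b_data;
    }
    return nt->ondisk;
}

/*
 * Called when the OS wants to flush/recycle the buffer.  Returns 1 if
 * the OS may proceed, 0 if the buffer has to stay.  Clearing B_LOCKED
 * again later is left out of this sketch.
 */
int
track_release_request(struct node_track *nt)
{
    if (nt->bp == NULL)
        return 1;
    if (nt->refs != 0) {
        nt->bp->b_flags |= B_LOCKED;    /* in use: keep the buffer */
        return 0;
    }
    nt->ondisk = NULL;                  /* idle: drop the association */
    nt->bp = NULL;
    return 1;
}
```

    In this model repeated accesses to the same node cost a single
    lookup until the OS actually reclaims the buffer, which is the
    effect the paragraph above describes.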

    HAMMER also relies heavily on caching work in-memory and in the
    OS's buffer or VM page cache, in order to be able to flush the work
    out in larger chunks.  Ultimately I hope to have one B-Tree element
    represent potentially huge swaths of data instead of one 16K chunk.
    This ultimately is what will make HAMMER seek-efficient.  A B-Tree
    element in HAMMER is around 64 bytes versus the 4 (or 8) bytes FFS needs
    to represent a pointer to a disk block.  The more data that B-Tree
    element can represent, the better.

    That all said, I don't think it would be hard to port the buffer cache
    aspects of HAMMER.  I have most of the buffer cache ops isolated in
    a single support file specifically to make porting easier.

    In any case, I really appreciate the interest.  I hope to have things
    in better shape by mid-January.

						-Matt