Subject: Re: mp->mnt_vnodelist change
To: None <tech-kern@NetBSD.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 10/19/2006 21:28:38
>> I would LOVE to handle lots of metadata updates in parallel.
> The programmatic context is wildly different, but we do have *a* way
> of answering that sort of question in rcorder(8).

I thought the same thing: in each case, we have a partial order for
input and a total order for output.  In the disk I/O case, though,
there are additional issues, such as wanting the output to be in
something like elevator-sort order as far as that's consistent with (a)
the input and (b) not doing excessive computation.

As for how to do this?  I'd probably keep an elevator-sort list of
blocks to write which have no antecedents, that is, blocks which could
be the next block written.  As each block is written, its dependents
lose that antecedent; each block that thus loses all its antecedents
goes into the elevator-sort list.

This means that blocks waiting to be written need to know which other
blocks their write must happen after, but you have to have that
knowledge around in order to do it at all (well, unless you do
something really egregious like inserting a write barrier after every
block with dependents, which totally wrecks the elevator-sort
optimization).

It means a data structure which can maintain elevator-sort order in
decent time, but that's not that hard.  You could do it with two heaps,
one for blocks yet to come on this pass and one for the next pass,
inserting into one or the other depending on whether the block is past
the hand.  I'm sure there are plenty of other alternatives too.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B