tech-kern archive


Re: Proposal: B_ARRIER (addresses wapbl performance?)



On Tue, Dec 09, 2008 at 10:32:29PM +0100, Manuel Bouyer wrote:
> I don't get what you mean with "flushing the journal content to disk".
> More exactly, I don't understand why it doesn't have to be done each time
> we're going to write blocks. Without it, you can end up with blocks being
> written before the corresponding journal entry, isn't it ?

Basically, you can write journal entries without yet writing the
journalled blocks to their real location. Writing the journal entry is
good enough to fulfill the requirements of fsync (when the data was
already written or is itself journalled), so the actual destination
write can be deferred. If a second write happens to the same location,
the writes can be aggregated as long as the topological order is kept.
That is, two journalled writes can be aggregated at the target
location (and the first write dropped) if both journal entries and
all entries between them have already been written and the first entry
hasn't been purged from the tail.
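
As a rough sketch of that aggregation condition (this is not the wapbl
code; struct jentry, struct journal and can_aggregate() are names made
up for illustration, and the journal is modelled as a plain array
indexed by sequence number):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory model of a journal; not the wapbl structures. */
struct jentry {
	unsigned long target;	/* final on-disk block this entry updates */
	bool in_journal;	/* journal record has reached stable storage */
};

struct journal {
	struct jentry *e;	/* entries, indexed by sequence number */
	size_t tail;		/* oldest entry not yet purged */
	size_t head;		/* one past the newest entry */
};

/*
 * May the target write for entry `first` be dropped in favour of the
 * later entry `second`?  Both must hit the same location, the first
 * must still be in the journal (not purged from the tail), and every
 * entry from first to second inclusive must already be written to the
 * journal.
 */
bool
can_aggregate(const struct journal *j, size_t first, size_t second)
{
	assert(j != NULL);

	if (first >= second || second >= j->head)
		return false;
	if (first < j->tail)		/* first entry already purged */
		return false;
	if (j->e[first].target != j->e[second].target)
		return false;
	for (size_t i = first; i <= second; i++) {
		if (!j->e[i].in_journal)
			return false;
	}
	return true;
}
```

The loop is what keeps the topological order: an unwritten journal entry
anywhere between the two blocks the aggregation.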

> Either with the write cache disabled and doing the barrier in software
> (i.e. waiting for command completion before sending the next commands, up
> to the next write barrier), or using an ordered queue tag for journal
> writes (the issue with the latter being, as already pointed out, that
> it enforces the barrier for all commands, including those unrelated
> to this filesystem).

Well, one possible approach here (multiple filesystems on one disk
wanting to use barriers) is to have a barrier callback and a generation
counter for each item on the disk queue. The callback is called before
incrementing the generation counter and can be used to flush all pending
work that can be written in any order. This would mean that the disk is
free to reorder the writes between two barriers and the filesystems
have a chance to do semi-intelligent decision making.
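
A minimal sketch of that idea, assuming a fixed-size queue and made-up
names throughout (struct dq, dq_register(), dq_enqueue(), dq_barrier(),
dq_may_reorder() and the sample flush_pending() callback are all
hypothetical, not an existing kernel interface):

```c
#include <stddef.h>

#define DQ_MAXREQ	16
#define DQ_MAXCB	4

struct dq;
typedef void (*barrier_cb)(struct dq *, void *);

struct req {
	unsigned long blkno;
	unsigned gen;		/* generation the request was queued in */
};

struct dq {
	struct req q[DQ_MAXREQ];
	size_t n;
	unsigned gen;		/* current generation */
	barrier_cb cb[DQ_MAXCB];
	void *arg[DQ_MAXCB];
	size_t ncb;
};

/* Each filesystem on the disk registers one barrier callback. */
int
dq_register(struct dq *d, barrier_cb cb, void *arg)
{
	if (d->ncb == DQ_MAXCB)
		return -1;
	d->cb[d->ncb] = cb;
	d->arg[d->ncb] = arg;
	d->ncb++;
	return 0;
}

/* Queue a write, tagged with the current generation. */
int
dq_enqueue(struct dq *d, unsigned long blkno)
{
	if (d->n == DQ_MAXREQ)
		return -1;
	d->q[d->n].blkno = blkno;
	d->q[d->n].gen = d->gen;
	d->n++;
	return 0;
}

/*
 * Barrier: run the callbacks first, so that anything they queue still
 * lands in the old generation, then start a new generation.
 */
void
dq_barrier(struct dq *d)
{
	for (size_t i = 0; i < d->ncb; i++)
		d->cb[i](d, d->arg[i]);
	d->gen++;
}

/* The disk may reorder two requests only within one generation. */
int
dq_may_reorder(const struct req *a, const struct req *b)
{
	return a->gen == b->gen;
}

/* Sample callback: push out one pending order-independent block. */
struct pending {
	unsigned long blkno;
	int dirty;
};

void
flush_pending(struct dq *d, void *arg)
{
	struct pending *p = arg;

	if (p->dirty && dq_enqueue(d, p->blkno) == 0)
		p->dirty = 0;
}
```

A filesystem would call dq_register() once and dq_barrier() whenever it
needs ordering; requests queued by other filesystems before the barrier
stay freely sortable among themselves.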

Joerg

