tech-kern archive


Re: Exposing FUA as alternative to DIOCCACHESYNC for WAPBL

> Some quick thoughts, though:
> (1) ultimately it's necessary to patch each driver to crosscheck the
> flag, because otherwise eventually there'll be silent problems.

Maybe. I think I like keeping this the caller's responsibility for
now; it avoids overly broad tree changes. Ultimately it might indeed
be necessary, if we find out that it can't reasonably be handled by
the caller, for example raidframe kicking in a spare disk without FUA
into a set with FUA.
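
To make the raidframe case concrete, here is a minimal userspace
sketch of a caller-side check. It is not kernel code: `struct member`,
`set_has_fua()`, `pick_commit_mode()` and the `COMMIT_*` names are all
hypothetical and exist only for this illustration. The assumption is
that the caller may use FUA only while every member of the set
supports it, and otherwise falls back to a full cache flush in the
style of DIOCCACHESYNC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-disk capability record, for illustration only. */
struct member {
	bool has_fua;
};

/* How the caller would make a journal commit durable. */
enum commit_mode {
	COMMIT_FUA,		/* write with FUA, no cache flush needed */
	COMMIT_CACHESYNC	/* plain write followed by a full cache flush */
};

/* True only if the set is non-empty and every member supports FUA. */
bool
set_has_fua(const struct member *m, size_t n)
{
	assert(m != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!m[i].has_fua)
			return false;
	}
	return n > 0;
}

/* Re-evaluate on every membership change, e.g. when a spare kicks in. */
enum commit_mode
pick_commit_mode(const struct member *m, size_t n)
{
	return set_has_fua(m, n) ? COMMIT_FUA : COMMIT_CACHESYNC;
}
```

The point of the sketch is that the decision has to be recomputed
whenever the set changes; a value cached at mount time would silently
go stale once a spare without FUA joins the set.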

> (2) it would be better not to expose hardware-specific flags in the
> buffercache, so it would be better to come up with a name that
> reflects the semantics, and a semantic guarantee that's at least
> notionally not hardware-specific.

I want to avoid unnecessary private NetBSD nomenclature. If the
storage industry calls it FUA, it's probably good to just call it FUA.

For DPO it's maybe not so clear cut. We could perhaps reuse B_NOCACHE
for the same functionality, but I'm not sure if it matches what swap
is using this flag for. DPO is ideal for journal writes, however;
that's why I want to add the support for it now.

> (3) as I recall (can you remind those of us not currently embedded in
> this stuff what the semantics of FUA actually are?) FUA is *not* a
> write barrier (as in, all writes before happen before all writes
> after) and since write barriers are a natural expression of the
> requirements for many fses, it would be well to make sure the
> implementation of this doesn't conflict with that.

FUA doesn't enforce any barriers. It merely changes the semantics of
the write request: the hardware will return a success response only
after the data is written to non-volatile media.

Any barriers required by filesystem semantics need to be handled by the
fs code, same as now with DIOCCACHESYNC.
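
As an illustration of the two points above, here is a small userspace
simulation. It is not driver or WAPBL code; `struct simdev`,
`SIM_FUA`, `sim_write()`, `sim_cache_sync()` and
`sim_journal_commit()` are hypothetical names invented for the sketch.
It models a disk with a volatile write cache: a FUA write is on media
when it completes, but does nothing for earlier writes still sitting
in the cache, so any ordering has to come from the fs code.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLK	8
#define SIM_FUA	0x1	/* hypothetical flag, not a real B_* value */

/* Simulated disk with a volatile write cache. */
struct simdev {
	bool cached[NBLK];	/* block is only in the volatile cache */
	bool stable[NBLK];	/* block is on non-volatile media */
};

/*
 * Complete one write. With SIM_FUA the block is on media on return;
 * without it only the cache holds it. Other blocks are not touched
 * either way, i.e. this is not a barrier.
 */
void
sim_write(struct simdev *d, int blk, int flags)
{
	assert(blk >= 0 && blk < NBLK);
	if (flags & SIM_FUA) {
		d->stable[blk] = true;
		d->cached[blk] = false;
	} else {
		d->cached[blk] = true;
		d->stable[blk] = false;
	}
}

/* Full cache flush, the DIOCCACHESYNC-style alternative. */
void
sim_cache_sync(struct simdev *d)
{
	for (int i = 0; i < NBLK; i++) {
		if (d->cached[i]) {
			d->stable[i] = true;
			d->cached[i] = false;
		}
	}
}

/*
 * Ordering done by the fs: journal blocks first, commit record last.
 * sim_write() is synchronous, so the journal blocks are stable before
 * the commit write is issued; real code would wait for I/O completion.
 */
void
sim_journal_commit(struct simdev *d, int first, int count, int commit_blk)
{
	for (int i = 0; i < count; i++)
		sim_write(d, first + i, SIM_FUA);
	sim_write(d, commit_blk, SIM_FUA);
}
```

Issuing a plain write to block 0 followed by a FUA write to block 1
leaves block 1 stable and block 0 still only cached, until
`sim_cache_sync()` runs.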

I've talked about adding some kind of generic barrier support in the
previous thread. After thinking about it, and reading more, I'm not
convinced it's necessary. Incidentally, Linux has moved away from the
generic barriers and pushed the logic into their fs code, which can
DTRT with e.g. journal transactions, too.

> (3a) Also, past discussion of this stuff has centered around trying to
> identify a single coherent interface for fs code to use, with the
> expansion into whatever hardware semantics are available happening in
> the bufferio layer. This would prevent needing conditional logic on
> device features in every fs. However, AFAICR these discussions have
> never reached any clear conclusion. Do you have any opinion on that?

I think I'd like to have at least two different places in the kernel
needing a particular interface before generalizing this at the
bufferio level. Or at minimum, I'd like to have it working correctly
in one place, and then it can be generalized before using it in a
second place. It would be awesome to use FUA e.g. for fsync(2), but
let's not get too far ahead of ourselves.

We don't commit to too much right now besides a B_* flag. I'd rather
keep this raw and lean for now, and concentrate on fixing the device
drivers to work with the flags correctly. Only then maybe come up with
an interface to make it easier for general use.

I want to avoid broadening the scope too much, especially since I want
to introduce SATA NCQ support within the next few months, which might
need some tweaks to the semantics again.

> We don't want to block improvements to wapbl while we figure out the
> one true device interface, but on the other hand I'd rather not
> acquire a new set of long-term hacks. Stuff like the "logic" wapbl
> uses to intercept the synchronous writes issued by the FFS code is
> very expensive to get rid of later.

Yes, that funny bwrite() not being a real bwrite() until issued for
the second time from WAPBL :) Quite ugly. It's a shame the B_LOCKED
hack is not really extensible to also cover data in the journal, as it
holds all transaction data in memory.

