tech-kern archive


Re: Exposing FUA as alternative to DIOCCACHESYNC for WAPBL



> Now, it might be the case that the on-media integrity is not the
> primary goal. Then flush is only a write barrier, not integrity
> measure. In that case yes, ORDERED does keep the semantics (e.g.
> earlier journal writes are written before later journal writes).
So either I'm completely wrong or there's some fundamental confusion here.

Probably it's due to different interpretations of ``on-media integrity''.

In my world -- apart from fsync() or fdatasync(), which no doubt require 
something like FUA or a cache flush (but see below) -- the one and only point 
of not writing to disc asynchronously is to ensure that at all points in time 
(where the system may crash) the on-disc data is in a state that can be 
made consistent again by fsck (or, more recently, a log replay). And this, 
with all approaches to the problem known to me, requires guaranteeing a 
write order.
[Of course there's a silent assumption that the ``consistent state'' 
restored by fsck is somewhat close in time to the time of the crash, 
otherwise you could just newfs.]
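
To make the write-order point concrete, here is a toy model (a sketch only, 
not how WAPBL or any real journal is laid out; the block names and the 
recoverable() rule are my own simplification). A transaction is two journal 
data blocks plus a commit record, and a crash may happen after any number of 
writes have reached the media:

```python
from itertools import permutations

# Hypothetical journal transaction: two data blocks, then a commit record.
WRITES = ("data1", "data2", "commit")

def recoverable(on_disc):
    """A crash state can be replayed if the commit record is absent
    (the transaction is simply ignored) or if every data block the
    commit record covers has reached the media."""
    if "commit" not in on_disc:
        return True
    return {"data1", "data2"} <= set(on_disc)

def crash_states(order):
    """What is on disc if the system crashes after k of the writes."""
    return [order[:k] for k in range(len(order) + 1)]

def order_is_safe(order):
    """Safe means every possible crash point leaves a recoverable state."""
    return all(recoverable(state) for state in crash_states(order))

safe_orders = [o for o in permutations(WRITES) if order_is_safe(o)]
unsafe_orders = [o for o in permutations(WRITES) if not order_is_safe(o)]

for order in safe_orders:
    print("safe:  ", " -> ".join(order))
for order in unsafe_orders:
    print("unsafe:", " -> ".join(order))
```

In this model the only safe orders are the ones that write the commit record 
last. The data blocks may go out in either order, so what is needed is an 
ordering guarantee between the data and the commit record, however the 
hardware provides it.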

> It does make stuff much easier to code, too - simply mark I/O as ORDERED 
> and fire, no need to explicitly wait for completion, and can drop e.g. 
> journal locks faster.
Which doesn't surprise me because, in my understanding, it's the solution 
closest to the problem to be solved.

> I do think that it's important to concentrate on case where WCE is on,
> since that is realistically what majority of systems run with.
I still doubt that makes any difference in the design.

> Just for record, I can see these practical problems with ORDERED:
> 1. only available on SCSI, so still needs fallback barrier logic for
> less awesome hw
Yes, sure. But it would still be nice to have some OS caring about sensible 
hardware. If I need support for commodity PeeCee HW, I know where to find 
Linux or FreeBSD (where I would assume that FreeBSD's SCSI support may well 
be more advanced than NetBSD's).

> 3. bufq processing needs special care for MPSAFE SCSI drivers, to
> prevent processing any further commands while I/O with ORDERED tag is
> being submitted to the controller.
I don't get that.

If you have two processes concurrently writing to disc directly, nobody 
guarantees an ordering of the writes issued by them. If the two processes
write through the FS, it's the FS's job to serialize that anyway. I'm 
probably missing something.

> I still see my FUA effor[t] as more direct replacement of the cache flushes
Yes, sure.


Of course, there's still the problem of too many programs out there issuing
fsync()s. As far as I remember, SQLite issues four syncs for a transactional
update. Firefox keeps a SQLite database for cookies, open tabs, history and
whatnot. Each is updated several times a minute. In the end, a completely 
idling browser causes half a megabyte of NFS traffic per minute and on the 
order of ten journal flushes per minute. Multiply that by 150 clients.
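
Back of the envelope, taking the figures above at face value (they are rough 
recollections, not measurements, and the variable names are just for 
illustration):

```python
# Rough per-client figures for one idle browser, as quoted above.
CLIENTS = 150
NFS_MB_PER_MIN = 0.5      # NFS traffic per client per minute
FLUSHES_PER_MIN = 10      # journal flushes per client per minute

total_mb_per_min = CLIENTS * NFS_MB_PER_MIN
total_flushes_per_min = CLIENTS * FLUSHES_PER_MIN
flushes_per_sec = total_flushes_per_min / 60

print(f"{total_mb_per_min:.0f} MB of NFS traffic per minute")
print(f"{total_flushes_per_min} journal flushes per minute")
print(f"{flushes_per_sec:.0f} journal flushes per second")
```

That comes to about 75 MB of NFS traffic and 1500 journal flushes per minute, 
i.e. roughly 25 flushes per second, with nobody actually doing anything.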

