Subject: Re: wd, disk write cache, sync cache, and softdep.
To: Bill Studenmund <>
From: Daniel Carosone <>
List: tech-kern
Date: 12/17/2004 07:39:35

On Thu, Dec 16, 2004 at 10:27:37AM -0800, Bill Studenmund wrote:
> > Today, another simpler alternative occurred to me.  What we want is
> > for the upper layers to have access to the multiple outstanding
> > commands that are implied in the disk's write cache, and confirmation
> > that they are on stable storage, even though we don't get individual
> > tagged completion events for them.
> Are you sure? I don't think those are quite the semantics we want.

I certainly don't want the disk's lies to propagate up into the
filesystem.  This idea is a way to manage the disk's lies: we keep the
efficiency and speed the write cache gives us over a limited interface
that doesn't support tagged commands, while restoring the true
semantics of what a completed write means.

I think those are exactly the semantics we want (and presently assume)
from a low-level block device.

Whether we also want more advanced semantics through other layers is a
separate question.

> > Conditions for issuing a synccache in this fashion might include:
> >  - all currently pending write requests in the completion queue
> Your first condition didn't parse. As I understand you, the completion
> queue will only contain pending write requests, so I don't see how that
> can (or should) trigger a synccache. :-)

Sorry, let me elaborate:

The 'completion queue' perhaps should be called something like the
'pseudo-complete requests queue'.  These are write requests that have
been issued to the disk, and for which the disk has issued its
completion interrupt.  Formerly, they would have been biodone() from
that interrupt.  However, we know that they may well not actually be
complete, but only in cache, so we defer completing them to upper
layers until we next complete a SYNC CACHE command.  At that point, we
know that all previous pseudo-complete commands are now
really-complete, and can tell the upper layers so.
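
A minimal sketch of the pseudo-complete queue, in C.  The names (struct
pbuf, struct pc_queue, pcq_pseudo_complete(), pcq_sync_done(),
fake_biodone()) are hypothetical, not the wd(4) driver's.  It assumes,
as on a non-queued IDE channel, that no other command is outstanding
while the SYNC CACHE runs, so every buffer on the queue was
acknowledged before the flush was issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buf: only what the sketch needs. */
struct pbuf {
    int          blkno;   /* block number written */
    bool         done;    /* set once the write is reported complete */
    struct pbuf *next;    /* link in the pseudo-complete queue */
};

/* Hypothetical per-drive FIFO of writes the disk has acknowledged but
 * which may still be sitting in its write cache. */
struct pc_queue {
    struct pbuf  *head;
    struct pbuf **tailp;
    int           count;
};

static void pcq_init(struct pc_queue *q)
{
    q->head = NULL;
    q->tailp = &q->head;
    q->count = 0;
}

/* Stand-in for biodone(): tell the upper layers the write is done. */
static void fake_biodone(struct pbuf *bp)
{
    bp->done = true;
}

/* Called from the write's completion interrupt: instead of calling
 * biodone() now, park the buffer as pseudo-complete. */
static void pcq_pseudo_complete(struct pc_queue *q, struct pbuf *bp)
{
    bp->next = NULL;
    *q->tailp = bp;
    q->tailp = &bp->next;
    q->count++;
}

/* Called when a SYNC CACHE completes: every write queued before the
 * flush was issued is now on stable storage, so complete them all. */
static int pcq_sync_done(struct pc_queue *q)
{
    int n = 0;
    struct pbuf *bp = q->head;

    while (bp != NULL) {
        struct pbuf *next = bp->next;   /* read before completing */
        fake_biodone(bp);
        bp = next;
        n++;
    }
    assert(n == q->count);
    pcq_init(q);
    return n;
}
```

The FIFO order only preserves the order in which the disk acknowledged
the writes; since one flush confirms the whole batch, any order would
be equally correct.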

The completion queue contains write requests that might not yet be on
platters, but only in the disk write cache; they are unconfirmed
writes.  This is directly analogous to keeping track of pending tagged
commands to a SCSI disk, except we can only do them as a collection
because of the limitations of the IDE interface (pre-NCQ).

The first condition can be reworded, with verbs, as: We have no more
incoming write requests we could issue to the disk right now, and we
have a collection of pseudo-complete requests we need to confirm, so
we issue a sync cache command.  We would check this each time a
request is pseudo-completed (or probably in wdstart, which gets called

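The first condition reduces to a simple predicate that the start
routine could evaluate before choosing its next command.  This is a
hypothetical sketch: struct drive_state and should_sync_when_idle() are
illustrative names, and, like the text, it only considers writes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the per-drive state a start routine would
 * consult before picking the next command. */
struct drive_state {
    int  pending_writes;    /* queued requests not yet issued to the disk */
    int  pseudo_complete;   /* acknowledged by the disk, not yet confirmed */
    bool flush_in_progress; /* a SYNC CACHE is already outstanding */
};

/* First condition: nothing left to feed the disk right now, but there
 * are unconfirmed writes, so use the idle moment for a SYNC CACHE. */
static bool should_sync_when_idle(const struct drive_state *ds)
{
    assert(ds->pending_writes >= 0 && ds->pseudo_complete >= 0);
    return !ds->flush_in_progress &&
           ds->pending_writes == 0 &&
           ds->pseudo_complete > 0;
}
```

Checking flush_in_progress keeps the predicate from issuing a second
flush while one is already outstanding.
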
The second condition (some threshold number or size of requests since
the last sync) is also desirable, to stop a very long stream of writes
forcing the earliest ones to wait a long time in the completion queue.
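
The second condition is a cap on how much unconfirmed work may pile up
behind the earliest pseudo-complete write.  The limits below (64
requests, 1 MiB) are arbitrary placeholders, not values from this
thread, and struct unconfirmed and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tunables; the thread does not propose values. */
#define SYNC_MAX_REQS  64
#define SYNC_MAX_BYTES ((size_t)1024 * 1024)

/* Writes pseudo-completed since the last SYNC CACHE. */
struct unconfirmed {
    int    reqs;   /* number of unconfirmed writes */
    size_t bytes;  /* their total size */
};

/* Account for one more write acknowledged by the disk. */
static void note_pseudo_complete(struct unconfirmed *u, size_t len)
{
    assert(len > 0);
    u->reqs++;
    u->bytes += len;
}

/* Second condition: force a SYNC CACHE once either limit is reached,
 * even if more writes are still waiting to be issued. */
static bool over_threshold(const struct unconfirmed *u)
{
    return u->reqs >= SYNC_MAX_REQS || u->bytes >= SYNC_MAX_BYTES;
}

/* A completed SYNC CACHE confirms everything counted so far. */
static void note_sync_done(struct unconfirmed *u)
{
    u->reqs = 0;
    u->bytes = 0;
}
```

A driver would issue SYNC CACHE when either this check or the
idle-time predicate fires; the cap bounds how many writes can be
queued ahead of the earliest unconfirmed one.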