Subject: Re: wd, disk write cache, sync cache, and softdep.
To: Bill Studenmund <wrstuden@netbsd.org>
From: Daniel Carosone <dan@geek.com.au>
List: tech-kern
Date: 12/17/2004 11:50:58

On Thu, Dec 16, 2004 at 04:10:39PM -0800, Bill Studenmund wrote:
> > Exactly; this is somewhat akin to my earlier idea of a barrier-type
> > operation that percolates through the softdep trees until it hits the
> > disk and triggers a 'sync point'.  It would be nice to have,
> > certainly, but non-trivial at best to implement.
>
> I disagree. I agree it needs thought, but I think it can be done.

Great! :)

> I agree it's wrong to assume we don't have write caches when we do, but I
> think you tackle the problem backwards. Rather than hide the write caches,
> I think we need to start changing the upper code to deal with them.

This was my thinking originally, too.

And by all means, when the upper layers are ready to deal properly
with maybe-writes, we can expose this fully again.

But what is the upper layer?  Is it the FAT32 filesystem I share with
a Windows install?  Is it raidframe-and-a-bit trying to do by-region
dirty flags?

> I'll describe more below, but ordered is NOT what we want.

Agreed. That's kinda the whole point - disabling write cache incurs
the cost penalty of making writes ordered.

However, if something was already making assumptions about ordered
writes using some form of sync flag, my point was that we can preserve
that assumption too by using that as one of the triggers for a sync
cache. That's already been rendered moot, because nothing is.

> I agree ignoring the write caches is an issue, I just think we will do
> much better to deal with them rather than ignore them.

Again, I'm dealing with them a layer at a time.  The first layer was
simply to disable them. The second layer proposes to use them "safely"
at the block device interface.  As I fully acknowledge, and Charles so
gracefully reminds us, layers above this are beyond me - my previous
ideas about barriers involved wading fearfully into the softdep code
with arms waving wildly.  I'd love for someone more qualified to do
that, or even rewrite a new filesystem from scratch that addresses
many other shortcomings, and introduces no new bugs.

There will still be a place for a reliable block device that supports
multiple outstanding commands and traditional confirmed completions,
surely, just as there may be a place for one that doesn't care:
swapspace seems like one example where I don't care if the data is
still there after a crash or power failure, and if I can swap to disk
cache faster I'm happy. If this deferred-until-sure completion
behaviour for writes is something that is selectable per-partition,
and per-write once the fs knows how, all the better.
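
As a rough illustration of the deferred-until-sure idea, here is a toy
model in C. It is only a sketch under stated assumptions, not driver
code: every name in it (sim_disk, sim_write, sim_flush, the defer flag)
is hypothetical and none of it is the wd(4) interface. It assumes a
write's completion is either reported at once (the "don't care" case,
e.g. swap) or held back until a sync-cache-style flush confirms the
data has left the volatile cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 16

/* All names here are hypothetical; none of this is the wd(4) driver's API. */
struct sim_disk {
    int    pending[MAX_PENDING]; /* tags of writes held in the volatile cache */
    size_t npending;             /* how many such writes are outstanding */
    size_t ncompleted;           /* completions reported to the upper layer */
    bool   defer;                /* per-partition policy: wait for a flush */
};

/*
 * Model a sync-cache command: everything cached is now assumed to be on
 * stable storage, so the held completions can be reported.
 * Returns how many completions were released.
 */
static size_t
sim_flush(struct sim_disk *d)
{
    size_t released = d->npending;

    d->ncompleted += released;
    d->npending = 0;
    return released;
}

/* Issue a write. Returns true if completion was reported immediately. */
static bool
sim_write(struct sim_disk *d, int tag)
{
    if (!d->defer) {
        /* "Don't care" mode (e.g. swap): trust the cache, complete now. */
        d->ncompleted++;
        return true;
    }
    if (d->npending == MAX_PENDING)
        sim_flush(d);            /* table full: force a sync point */
    assert(d->npending < MAX_PENDING);
    d->pending[d->npending++] = tag;
    return false;                /* completion deferred until sim_flush() */
}
```

With defer set, two sim_write() calls report nothing until sim_flush()
releases both completions together; with defer clear, each write
completes at once. Many writes can be outstanding against one flush,
which is where the hoped-for performance would come from.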

Another point: people keep saying this will have a performance
penalty. Not for me; it's a way for me to improve performance given
where I'm starting from. Hopefully, it gets me most of the way back to
the full speed of write cache, in the common and problem cases. When I
have it going, others can judge for themselves from their own starting
points.

--
Dan.