Subject: Re: Extension of fsync_range() to permit forcing disk cache flushing
To: Alan Barrett <apb@cequrux.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 12/17/2004 17:22:06

On Fri, Dec 17, 2004 at 12:00:26PM +0200, Alan Barrett wrote:
> On Thu, 16 Dec 2004, J Chapman Flack wrote:
> > Bill Studenmund writes:
> > > After discussing this with some developers, the best solution seems
> > > to be to add a flag to fsync_range() to force this behavior. Then
> > > pass a flag
> >
> > What would be the performance hit in making this the /default/
> > behavior of fsync and fsync_range?
>
> I always thought that fsync() and friends were guaranteed to commit the
> data to stable storage.  The fsync(2) man page sometimes says "written
> to permanent storage" and sometimes says "moved to a permanent storage
> device".

As best I can tell, fsync(2) was added before write-back caches existed.

> The volatile write-back cache of a disk does not count as stable storage
> in my book, so I would expect that fsync() and friends were guaranteed
> to somehow get the data through the disk's cache to the platters.  I
> think it's a bug if that guarantee is not being met.  I suppose I
> can see a way of legalistically twisting the words in the man page to
> mean "moving the data to a /permanent storage device/ is good enough,
> regardless of whether or not the device has really written the data
> to /permanent storage/, and a disk with a write-cache is a permanent
> storage device, regardless of whether or not the data has been written
> to the platters", but I think that that violates the POLA.

The thing is that exactly what guarantee people want depends on the
person. If we leave fsync() as it is, we permit individual admins to
choose what happens. If we change fsync(), we greatly reduce the scope of
what can be done easily.

> It would be fine to have a set of flags telling fsync() things like
> "just get the data to the disk controller, don't worry about getting
> it to the disk", or "get the data from the controller to the disk,
> but don't worry about getting it through any on-disk cache to stable
> storage", or "make sure that the data gets to stable storage, but an
> NVRAM or battery backed RAM cache is good enough", or "not even NVRAM
> is good enough, make sure that the data really gets to its ultimate
> destination".  But I'd like the default behaviour to match what I have
> always believed that fsync was guaranteed to do: get the data to stable
> storage (NVRAM or battery-backed RAM is good enough).

Last sentence first: then turn off your write caches. You will then get
that behavior. :-)

Most applications don't care about the data - they trust the OS won't lose
it. Some (many) applications care more, and want to take an extra step for
safety. fsync() is their friend. The administrator can configure caches as
he or she sees fit.

My application wants a lot of control. Sometimes it's ok that written data
sits in the OS cache. Sometimes it's ok that it's in the disk cache.
Sometimes it REALLY REALLY needs to be out of the disk cache, regardless
of what the admin thinks.
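
To make those three cases concrete, here is a rough sketch of how an
application might ask for each level per write. The enum and the helper
write_with_durability() are hypothetical names invented for this
illustration. The DUR_MEDIA branch assumes the flag ends up spelled
FDISKSYNC and is OR'd into the "how" argument of fsync_range(), which is
how NetBSD's fsync_range(2) documents it; where that flag is not
available, the sketch falls back to plain fsync(), which gives no
guarantee about the drive's own cache.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical durability levels, named for this sketch only. */
enum durability {
	DUR_OS_CACHE,	/* data may stay in the OS buffer cache */
	DUR_DISK_CACHE,	/* data handed to the drive; may sit in its cache */
	DUR_MEDIA	/* data forced through the drive's write cache */
};

/*
 * Hypothetical helper: write len bytes to fd, then sync as far as the
 * caller asked for.  Returns 0 on success, -1 on error.
 */
static int
write_with_durability(int fd, const void *buf, size_t len, enum durability d)
{
	const char *p = buf;

	/* Loop to cope with short writes; any failed write is an error. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}

	switch (d) {
	case DUR_OS_CACHE:
		return 0;		/* no sync requested */
	case DUR_DISK_CACHE:
		return fsync(fd);	/* the admin's cache policy applies */
	case DUR_MEDIA:
#if defined(__NetBSD__) && defined(FDISKSYNC)
		/* Assumed flag; a length of 0 means the whole file. */
		return fsync_range(fd, FFILESYNC | FDISKSYNC, 0, 0);
#else
		/* No such flag here: best effort only. */
		return fsync(fd);
#endif
	}
	return -1;
}
```

The point of doing it per call is that the choice stays with the
application for the writes it cares about, while plain fsync() keeps
whatever behavior the admin configured.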

Take care,

Bill
