Subject: Re: wd, disk write cache, sync cache, and softdep.
To: Eric Haszlakiewicz <erh@jodi.nimenees.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 12/20/2004 09:52:51

On Fri, Dec 17, 2004 at 11:55:18PM -0600, Eric Haszlakiewicz wrote:
> On Fri, Dec 17, 2004 at 05:36:13PM -0800, Bill Studenmund wrote:
> > On Fri, Dec 17, 2004 at 01:56:46PM -0600, Eric Haszlakiewicz wrote:
> > > 	How can it know A1 is done if there's no way to tell the drive to
> > > flush its cache?  A possible timeline I see is:
> >
> > You assert FUA for write A1.
> >
> > > send A1, drive says OK,
> > > send A2, drive says OK,
> > > send A3+FUA, drive writes A3 (due to FUA), drive says OK,
> > > drive writes A2, drive writes A1.
> >
> > You assert FUA for EACH write.
>
> 	um... ok.  so what data do you NOT assert FUA on?

File data. Or, more specifically, anything you don't have ordering
constraints on.
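
To make the distinction concrete, here is a toy model of a drive with a
volatile write cache. It is plain Python, not a real driver interface;
the ToyDrive class and the block names are made up for illustration. It
shows why FUA on only the last write leaves the earlier ones exposed,
while FUA on each ordered write does not.

```python
class ToyDrive:
    """Toy model of a disk with a volatile write-back cache (hypothetical)."""

    def __init__(self):
        self.cache = {}    # acknowledged, but not yet on the platter
        self.platter = {}  # would survive a power loss

    def write(self, block, data, fua=False):
        if fua:
            # FUA: the data must be on the platter before the drive says OK.
            self.platter[block] = data
            self.cache.pop(block, None)
        else:
            # No FUA: the drive may say OK as soon as the data is cached.
            self.cache[block] = data
        return "OK"

    def power_loss(self):
        # Whatever was only in the volatile cache is lost.
        self.cache.clear()


# Timeline from the quoted mail: only A3 carries FUA.
drive = ToyDrive()
drive.write("A1", b"first")
drive.write("A2", b"second")
drive.write("A3", b"third", fua=True)
drive.power_loss()
fua_on_last_only = sorted(drive.platter)

# FUA on each write that has an ordering constraint.
drive = ToyDrive()
for name, data in (("A1", b"first"), ("A2", b"second"), ("A3", b"third")):
    drive.write(name, data, fua=True)
drive.power_loss()
fua_on_each = sorted(drive.platter)
```

In the first run only A3 survives the simulated power loss; in the second
all three do. Writes with no ordering constraint can skip FUA and take the
cached path.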

> In my rather vague view of what a filesystem might do during some
> operation, I was thinking that it will go and write some data, which
> isn't entirely critical to get to the platter right away.  Then, some
> time later it will write some other piece of data that needs to be
> written out right away, but also needs the first piece to be already
> permanently stored.  Like say an inode for piece one, and the directory
> newly referring to the inode for piece two.

That's not so easy. The main problem is how we keep track of the write
that didn't matter much at the time it was issued. At least, that's the
question that made my head spin. :-)

Two ways to handle that are:

1) Issue that initial write as FUA but mark it as asynchronous in the
   kernel. That way outstanding transaction tracking will let us know
   whether it's been written or not.

2) Just before the second write, issue a SYNCHRONIZE CACHE command
   (remember, the SCSI command can be given a starting offset and a run
   length) to cover the first write, and have the second write not start
   until the sync is done.
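
For the SYNCHRONIZE CACHE approach, the range-limited flush might be
encoded roughly like this. It is a sketch that assumes the 10-byte
SYNCHRONIZE CACHE layout as I understand it from SBC (opcode 0x35, 4-byte
big-endian LBA, 2-byte block count), so check the spec before relying on
it. sync_cache_cdb is a made-up helper name, and it only builds the
command block; it does not send it anywhere.

```python
import struct

SYNCHRONIZE_CACHE_10 = 0x35  # SCSI opcode for SYNCHRONIZE CACHE (10)


def sync_cache_cdb(lba, nblocks):
    """Build a 10-byte SYNCHRONIZE CACHE CDB for nblocks starting at lba."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("lba does not fit in the 4-byte LBA field")
    if not 0 <= nblocks <= 0xFFFF:
        raise ValueError("nblocks does not fit in the 2-byte count field")
    # Fields, in order: opcode, flags (IMMED left clear so status comes
    # back only after the flush), LBA, group number, block count, control.
    return struct.pack(">BBIBHB", SYNCHRONIZE_CACHE_10, 0, lba, 0, nblocks, 0)


# Cover a first write that landed on 8 blocks starting at LBA 4096.
cdb = sync_cache_cdb(lba=4096, nblocks=8)
```

The second write would then be held back until this command completes. As
far as I recall, a block count of zero means "from the starting LBA to the
end of the medium".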

Take care,

Bill
