netbsd-users: Re: soft updates Re: Summer of code ideas

Subject: Re: soft updates Re: Summer of code ideas
To: Dieter <netbsd@sopwith.solgatos.com>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: netbsd-users
Date: 04/10/2007 15:33:04

On Sat, Apr 07, 2007 at 11:16:59AM +0100, Dieter wrote:
> > >> >
> > >> > What mess?  My understanding is that with soft updates, the only
> > >> > thing that can possibly happen is that disk space can be lost.
> > >> > The background fsck is for reclaiming this lost space.
> > >>
> > >> That's both the idea and the promise. For better or worse, that has
> > >> not been the experience of a number of users.
> > >
> > > Did these users have their disk's write caches set to write-through
> > > mode rather than write-back mode?
> >
> > For one yes
>
> So no one believes Usenix papers by respected authors, but we have one
> report from an unknown user with unknown hardware that softdep allegedly
> caused lossage and therefore no one trusts softdeps?

Do you honestly think that we based our opinion of our softdeps
implementation on one user's experience? Our opinions of our softdeps
implementation have been shaped by our experiences over time. That's many
users over many years.

You're coming into this discussion years late. You're telling us things we
already knew yet not bothering to ask about the additional things that
went into shaping our thinking.

> > > NetBSD doesn't do this by default, you have to add code to /etc/rc.local.
> >
> > If we get a journalling implementation that's something it can take care
> > of, whether it's PATA (flush the cache) or SCSI, where it can force access
> > to the disc.
>
> Flushing the entire cache is unnecessary and will kill performance.
> We only need to force the order for the metadata, for everything else
> we want to allow the disk to order the writes for the best performance.
> The way to do that is queuing.  IIRC *BSD has queuing for SCSI, but
> I haven't been able to find support for SATA's NCQ.  Where is the NCQ
> support?

That is not correct. We do not need queuing in the drive. We in fact don't
want queuing in the drive for this, as queuing, especially SCSI queuing,
applies to all tasks in the queue. We want writes to a specific metadata
update stream to be in order but we don't care about completion order
relative to other update streams. SCSI queuing, though, would tie them all
together.

We can do the queuing in the kernel by dispatching the writes in the right
order. When one finishes, dispatch the next.
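
To make the "dispatch the next when one finishes" idea concrete, here is a
minimal user-space sketch in Python, not kernel code. The OrderedStreams
class, its method names, and the issue callback are all hypothetical; they
only illustrate keeping each update stream in order while leaving the
streams independent of one another.

```python
from collections import deque


class OrderedStreams:
    """Keep writes within one stream in order; streams do not wait on each other."""

    def __init__(self, issue):
        self.issue = issue        # hypothetical callback that hands a write to the driver
        self.pending = {}         # stream id -> deque of writes not yet issued
        self.in_flight = set()    # stream ids that have one write outstanding

    def submit(self, stream, write):
        """Queue a write on its stream and issue it if the stream is idle."""
        self.pending.setdefault(stream, deque()).append(write)
        self._kick(stream)

    def complete(self, stream):
        """Called when the outstanding write for this stream has finished."""
        self.in_flight.discard(stream)
        self._kick(stream)

    def _kick(self, stream):
        queue = self.pending.get(stream)
        if stream in self.in_flight or not queue:
            return                # still waiting on a write, or nothing left to send
        self.in_flight.add(stream)
        self.issue(stream, queue.popleft())


issued = []
sched = OrderedStreams(lambda stream, write: issued.append((stream, write)))
sched.submit("A", "A1")   # issued at once
sched.submit("A", "A2")   # held back until A1 completes
sched.submit("B", "B1")   # issued at once; stream B does not wait on stream A
sched.complete("A")       # A1 is done, so A2 goes out now
```

Here A2 is held until A1 completes, while B1 goes out without waiting on
stream A at all, which is the decoupling that a single ordered drive queue
would not give us.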

The bigger problem with trying to fix this with queuing is that we still
haven't fixed the issue. Queuing says, "Complete this before starting
that." As long as "completing" an operation doesn't necessarily imply
writing it to disk (which is the whole point of the cache), we have a
problem. What we need is a form of FUA (Force Unit Access) support; we
need to know that a given operation has been committed to disk (or to a
BBU).
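
The gap between "completed" and "on disk" can be shown with a toy model.
This is only a sketch under stated assumptions: CachedDisk and its methods
are made up for illustration and do not correspond to any real driver or
drive interface. It assumes a volatile write-back cache whose contents are
lost on power failure, and a FUA-style flag that makes completion imply
stable storage.

```python
class CachedDisk:
    """Toy drive with a volatile write-back cache (hypothetical, not a real API)."""

    def __init__(self):
        self.cache = {}   # block -> data; lost on power failure
        self.media = {}   # block -> data; survives power failure

    def write(self, block, data, fua=False):
        """Return at the point where the drive would report completion."""
        if fua:
            # FUA-style write: completion implies the data is on stable storage.
            self.cache.pop(block, None)
            self.media[block] = data
        else:
            # Ordinary write: completion only means the cache accepted the data.
            self.cache[block] = data

    def power_loss(self):
        """Drop everything that never left the volatile cache."""
        self.cache.clear()


disk = CachedDisk()
disk.write(1, "inode")              # reported complete, but only in the cache
disk.write(2, "dirent", fua=True)   # reported complete and on stable storage
disk.power_loss()
survivors = sorted(disk.media)      # blocks that are still there after the crash
```

Both writes were dispatched in order and both "completed", yet only block 2
survives, so the second update now refers to a first one that never reached
the disk.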

Take care,

Bill
