netbsd-users: Re: soft updates Re: Summer of code ideas

Subject: Re: soft updates Re: Summer of code ideas
To: None <netbsd-users@netbsd.org>
From: Dieter <netbsd@sopwith.solgatos.com>
List: netbsd-users
Date: 04/10/2007 20:56:27
> > > >> > What mess?  My understanding is that with soft updates, the only
> > > >> > thing that can possibly happen is that disk space can be lost.
> > > >> > The background fsck is for reclaiming this lost space.
> > > >>
> > > >> That's both the idea and the promise. For better or worse, that has
> > > >> not been the experience of a number of users.
> > > >
> > > > Did these users have their disk's write caches set to write-through
> > > > mode rather than write-back mode?
> > >
> > > For one yes
> >
> > So no one believes Usenix papers by respected authors, but we have one
> > report from an unknown user with unknown hardware that softdep allegedly
> > caused lossage and therefore no one trusts softdeps?
> 
> Do you honestly think that we based our opinion of our softdeps
> implementation on one user's experience?

The message I responded to said one.

> > > > NetBSD doesn't do this by default, you have to add code to /etc/rc.local.
> > >
> > > If we get a journalling implementation that's something it can take care
> > > of, whether it's PATA (flush the cache) or SCSI, where it can force access
> > > to the disc.
> >
> > Flushing the entire cache is unnecessary and will kill performance.
> > We only need to force the order for the metadata, for everything else
> > we want to allow the disk to order the writes for the best performance.
> > The way to do that is queuing.  IIRC *BSD has queuing for SCSI, but
> > I haven't been able to find support for SATA's NCQ.  Where is the NCQ
> > support?
> 
> That is not correct. We do not need queuing in the drive.

We need queuing in the drive if we want decent write performance.

> We can do the queuing in the kernel by dispatching the writes in the right
> order. When one finishes, dispatch the next.

That's what we have now, and write performance sucks big time.

> The bigger problem with trying to fix this with queuing is that we still
> haven't fixed the issue. Queuing says, "Complete this before starting
> that."

That's not what I got out of the NCQ info I've read.  NCQ allows
you to send multiple writes to the disk in parallel, and get separate
responses to each.  The disk is allowed to reorder the commands for
performance.  If you care about write order (e.g. fs metadata),
you wait for a response before sending the next related write.
But writes that don't depend on waiting for the first block to hit
the platter can be sent in parallel.  So the few writes for the
directory entry and inodes and such get done in the correct order,
and the 20 GB of data gets written out of order but fast.
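
To make the dispatch rule above concrete, here is a toy model in Python.
It is only a sketch, not NetBSD kernel code and not a real NCQ driver:
the names Write and dispatch are made up for the example, the "disk" is
simulated, and it waits for a whole batch to complete before sending the
next one, which is simpler than real per-command completion.  Reordering
inside the drive is modelled by sorting each batch on block number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Write:
    """One queued write (hypothetical structure for this sketch)."""
    name: str
    block: int
    after: Optional[str] = None  # name of a write that must complete first


def dispatch(writes, queue_depth=32):
    """Return the batches of write names sent to the simulated disk.

    A write is eligible only once the write it depends on has completed.
    Within a batch the disk may reorder freely; that is modelled here by
    sorting the batch on block number.
    """
    done = set()
    pending = list(writes)
    batches = []
    while pending:
        # Everything whose dependency is satisfied, up to the queue depth.
        ready = [w for w in pending
                 if w.after is None or w.after in done][:queue_depth]
        if not ready:
            raise ValueError("dependency cycle or missing write")
        batch = sorted(ready, key=lambda w: w.block)
        batches.append([w.name for w in batch])
        done.update(w.name for w in batch)
        sent = {w.name for w in ready}
        pending = [w for w in pending if w.name not in sent]
    return batches


writes = [
    Write("inode", 10),
    Write("dirent", 5, after="inode"),  # metadata: must follow the inode
    Write("data3", 300),
    Write("data1", 100),
    Write("data2", 200),
]
print(dispatch(writes))                 # queued: two round trips
print(dispatch(writes, queue_depth=1))  # no queuing: one write at a time
```

With a deep queue the only wait is between the inode and the directory
entry; with queue_depth=1 every write waits for the previous one, which
is the no-queuing case.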

> As long as "completing" an operation doesn't necessarily imply
> writing it to disk (which is the whole point of the cache),

Again, you MUST put the disk's write cache in write-through mode rather
than write-back mode.  If you leave it in write-back mode you are
asking for trouble.  On the PATA & SATA disks I've looked at, they
power up into write-back mode and setting write-through mode doesn't
survive a power cycle.  (As if a disk drive has a shortage of non-volatile
storage!)  The SCSI disks I've looked at did power up in write-through
mode, although I don't know what the factory default was.  (Your disks
may vary.)  Given that many disks default to write-back mode, and that
NetBSD does not fix it, I suspect that this is the source of many cases
of corruption.
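
A toy model of why write-back mode defeats ordering, again in Python.
This is a sketch under simplifying assumptions: ToyDisk and its methods
are invented for the example, the cache is just a dictionary, and
"destaging" is done by hand to stand in for the drive choosing its own
order.

```python
class ToyDisk:
    """Simulated disk with a volatile cache and a persistent platter."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.cache = {}    # volatile: lost on power failure
        self.platter = {}  # persistent

    def write(self, block, data):
        """Return once the disk reports the write as complete."""
        if self.write_back:
            self.cache[block] = data    # acknowledged, not yet durable
        else:
            self.platter[block] = data  # acknowledged only when durable

    def destage(self, blocks):
        """The drive moves cached blocks to the platter in its own order."""
        for b in blocks:
            if b in self.cache:
                self.platter[b] = self.cache.pop(b)

    def power_fail(self):
        """Drop the cache and return what actually survived."""
        self.cache.clear()
        return dict(self.platter)


# The kernel writes the inode, waits for "complete", then the dirent.
wb = ToyDisk(write_back=True)
wb.write(10, "inode")
wb.write(5, "dirent")
wb.destage([5])         # the drive happens to destage the dirent first
print(wb.power_fail())  # dirent survived, inode did not

wt = ToyDisk(write_back=False)
wt.write(10, "inode")
print(wt.power_fail())  # inode is durable before the dirent is even sent
```

In the write-back case the kernel did everything in the right order and
still ends up with a directory entry pointing at an inode that never
reached the platter.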

Unfortunately write performance with write-through mode but without
queuing sucks.  We NEED support for NCQ to get decent write performance.

>  we need to know that a given operation has been committed to disk

Yes, that is what putting the cache in write-through mode gets you.

But to get decent write performance we need queuing, so that the
kernel can enforce write order when it needs to, and allow the
disk to sort writes for performance when write order doesn't matter.