Subject: Re: soft updates Re: Summer of code ideas
To: Dieter <netbsd@sopwith.solgatos.com>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: netbsd-users
Date: 04/07/2007 15:27:53

On Sat, Apr 07, 2007 at 11:16:59AM +0100, Dieter wrote:
> 
> So no one believes Usenix papers by respected authors, but we have one
> report from an unknown user with unknown hardware that softdep allegedly
> caused lossage and therefore no one trusts softdeps?

When researchers repeatedly publish papers containing unreproducible
results, it's sensible to take their work with a grain of salt.

I think few people doubt the _theoretical_ work on filesystem efficiency
that's present in the series of papers that came out of CSRG as it
closed up shop (I'd personally include both Seltzer's and McKusick's
publications in filesystems over the last 15 years or so in this
category), but most of the content of those papers consists of empirical
claims, often with little or no theoretical basis.

The early papers in this body of work contained a number of false
conclusions that turned out to stem from mistakes in the implementations
of the algorithms the papers described. See the infamous
Seltzer/Ousterhout debate on LFS for some examples, including LFS
sorting blocks into backwards order when writing to disk. For the later
papers, there are two issues. First, for the Seltzer papers, it has
proven impossible over the course of a decade to actually get the code
used to generate the papers' results. Second, for the McKusick work, the
softdep code has had such an ugly, lingering history of bugs that it's
been difficult to generalize from the synthetic benchmarks detailed in
the publications to users' actual workloads (particularly in the area of
the exposure of data to loss).

I don't think anyone denies that both log-structured filesystems and
directed dependency graphs in conventional filesystems are good ideas.
The problem is that the quality of the existing implementations in BSD
kernels is so low that there is a growing realization that synthetic
benchmark results from the literature in the area are not very relevant
to what users actually have available to deploy. Couple this with the
exasperating problem that one can't get the code used to generate the
published results: in the cases where code was offered at time of
publication, it frequently didn't even compile, and I and others, though
we received polite and friendly responses to our requests, never managed
to get what had been used to generate the numbers. Given all that, I
think the research in the area starts to look almost irrelevant to
decisions about where development priorities should be for production
operating systems that people have to use today.

Thor