Subject: Re: wd, disk write cache, sync cache, and softdep.
To: Steven M. Bellovin <smb@research.att.com>
From: Charles M. Hannum <abuse@spamalicious.com>
List: tech-kern
Date: 12/16/2004 22:31:15
On Thursday 16 December 2004 22:03, Steven M. Bellovin wrote:
> Examples are fine; the trick is to figure out the right answer(s) for
> the important cases, notably FFS. (You're quite correct that in
> general, *two* synchronize requests are required for each critical
> block -- one to make sure that everything ahead of it is flushed, and
> one to ensure that the critical block itself is written immediately.)
Really, this boils down to globally serializing I/O again. To be blunt, any
such idea is a non-starter. The performance is so phenomenally bad in normal
cases that it simply cannot be shipped. (Been there, done that.) You will
*not* be providing users with a more "robust" system, because they will
simply switch to something else that performs much better.
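For concreteness, here is roughly what that "two synchronize requests per
critical block" scheme looks like when spelled out. This is only a sketch:
disk_write_block() and disk_flush_cache() are hypothetical stand-ins for a
driver's write path and its cache-flush command, not the actual wd(4) or
softdep interfaces.

#include <stdint.h>
#include <stddef.h>

int disk_write_block(uint64_t blkno, const void *buf, size_t len); /* assumed */
int disk_flush_cache(void);                                        /* assumed */

int
write_critical_block(uint64_t blkno, const void *buf, size_t len)
{
        int error;

        /*
         * First flush: a barrier, so that everything queued ahead of the
         * critical block is on the media before the critical block is
         * issued.
         */
        if ((error = disk_flush_cache()) != 0)
                return (error);

        /* Write the critical (e.g. metadata) block itself. */
        if ((error = disk_write_block(blkno, buf, len)) != 0)
                return (error);

        /*
         * Second flush: force the critical block out now, instead of
         * leaving it sitting in the drive's write cache.
         */
        return (disk_flush_cache());
}

Every dependency-ordered metadata write pays two full cache flushes, and
nothing else can be outstanding across them; that is the global
serialization I'm talking about.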
Keep in mind that we're not just talking about the performance of many
transactions at once. There is self-limiting behavior here (as there is in
the loss of I/O sorting inherent in the use of tagged queueing) that kicks in
at a certain point. A critical problem is what it does to performance in the
presence of a *small* set of transactions, as is typical for, say, a desktop
system.
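As a back-of-the-envelope illustration of that (the flush cost and the
burst size below are assumptions picked for easy arithmetic, not
measurements):

#include <stdio.h>

int
main(void)
{
        double flush_ms = 10.0;         /* assumed cost of one cache flush */
        int    updates  = 100;          /* small burst of metadata updates */
        int    flushes  = 2 * updates;  /* two flushes per critical block */

        printf("flush latency alone: %.1f s\n", flushes * flush_ms / 1000.0);
        return (0);
}

Two seconds of stalling for a burst that writes almost no data is exactly
the kind of latency a desktop user notices.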
> I've often argued that it's pointless to do the wrong thing quickly,
> but people should at least know the tradeoffs.
You're assuming in that statement that systems are currently doing "the wrong
thing." To make such an assertion, though, you would have to define what
"the wrong thing" is.
ATA disks, for example, guarantee (as strongly as they make any other
guarantee, including whether you'll be able to read back the data at all)
that all blocks cached for writing will eventually be written out, even if
they have to be spared to do so. Yes, there are things that violate this
guarantee -- mostly having to do with catastrophic failures that would make
the drive unreadable anyway. The one major exception is power loss, but most
"critical" systems have backup power.
The fundamental question here is what risk you're willing to trade off for
what performance. In general, the actual risk is quite low, and users would
rather have excellent performance. Large systems have backup power and
backups. They're not concerned with this level of nit-picking.
And for those who are concerned with this level of detail... we have much
bigger problems in our file system code that need to be solved first.