Subject: Re: IO throttle VOP
To: David Laight <David.Laight@btinternet.com>
From: Frank van der Linden <fvdl@wasabisystems.com>
List: tech-kern
Date: 12/16/2001 23:43:02
On Sun, Dec 16, 2001 at 09:46:27PM -0000, David Laight wrote:
> Don't think that helps?

It does, actually. I did a special-cased sample implementation
as a proof of concept, and it works for the softdep case.

> What would you do with the layered filesystems?

Not sure what you mean. Those would usually just pass the VOP
down to the lower layer.
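
To illustrate the pass-down, here is a minimal sketch. It assumes a
heavily simplified vnode with a hypothetical vop_throttle operation;
the struct layout and all names are made up for illustration and are
not the actual NetBSD interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified vnode. Real vnodes and VOP
 * dispatch look quite different. */
struct vnode;

struct vnode_ops {
    /* Hypothetical throttle operation; returns 0 on success. */
    int (*vop_throttle)(struct vnode *vp);
};

struct vnode {
    const struct vnode_ops *ops;
    struct vnode *lower;    /* NULL for the bottom layer */
    int throttle_calls;     /* counts calls handled by this vnode */
};

/* Bottom layer: this is where the real accounting would happen. */
static int
bottom_throttle(struct vnode *vp)
{
    vp->throttle_calls++;
    return 0;
}

/* Layered filesystem: no policy of its own, so it hands the
 * operation to the vnode of the layer below. */
static int
layer_throttle(struct vnode *vp)
{
    assert(vp->lower != NULL);
    return vp->lower->ops->vop_throttle(vp->lower);
}

static const struct vnode_ops bottom_ops = { bottom_throttle };
static const struct vnode_ops layer_ops = { layer_throttle };
```

Calling the operation on the upper vnode ends up in bottom_throttle()
for the lower vnode; the layer itself does no accounting.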

> - some method of 'callback' from the vm system into certain drivers
>   (eg softdeps) to request that memory be freed if possible.
> - noting the 'resource allocation rate' of processes and reducing the
>   priority of those where it is high.
> - making kernel code that is very likely to need memory request it
>   before locking too many structures - maybe under some 'busy'
>   conditions you get a call to 'unwind' part of the request until
>   the resource is available.
> (if you can ever guess which one it is!)

These all seem sane suggestions, but the softdep case doesn't really
allow for these. Let me explain.

The concept of softdeps is that, in order to speed things up, you
don't synchronously write metadata, but instead you keep a list
of dependencies in memory, which can be used to write it in a
delayed fashion. This should give you a consistent on-disk state.

Dependencies are allocated in the process' create/remove/mkdir/etc
path. They are deallocated after they have eventually been pushed
to disk. Pushing this metadata out to disk is done, just as for
'plain' data, by the syncer process.
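
The lifecycle described above can be sketched in userland C. This is
only a model under stated assumptions: struct dep, dep_add() and
dep_sync() are hypothetical names, a single list stands in for the
many dependency types of the real code, and the disk write is left
as a comment.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical dependency record. */
struct dep {
    struct dep *next;
    int inode;              /* metadata object this change touches */
};

struct dep_list {
    struct dep *head, *tail;
    size_t count;           /* dependencies currently held in memory */
};

/* Create/remove/mkdir path: record the change and return at once
 * instead of writing the metadata synchronously. */
static int
dep_add(struct dep_list *dl, int inode)
{
    struct dep *d = malloc(sizeof(*d));

    if (d == NULL)
        return -1;
    d->inode = inode;
    d->next = NULL;
    if (dl->tail != NULL)
        dl->tail->next = d;
    else
        dl->head = d;
    dl->tail = d;
    dl->count++;
    return 0;
}

/* Syncer path: push up to max dependencies out in order, then free
 * them. Returns how many were retired. */
static size_t
dep_sync(struct dep_list *dl, size_t max)
{
    size_t done = 0;

    while (dl->head != NULL && done < max) {
        struct dep *d = dl->head;

        dl->head = d->next;
        if (dl->head == NULL)
            dl->tail = NULL;
        /* A real implementation would issue the disk write here. */
        free(d);
        assert(dl->count > 0);
        dl->count--;
        done++;
    }
    return done;
}
```

Note that count only goes down when the syncer runs, which is what
the rest of this mail is about.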

So far, so good. Now, under heavy metadata usage (like simultaneous
rm -rf's on a few large source trees, such as pkgsrc), dependencies
may be allocated at such a pace that the syncer can't keep up,
and memory starts filling up with dependency structures. This
must be avoided. Currently, some heuristic limits are used for
memory usage, above which the syncer is pushed into action more
actively. But there is no guarantee that the syncer will outrun
the process that produces the metadata changes. Yes, you can
lower the priority of that process, but that wouldn't work on SMP
systems, where the process could run on another CPU and still
overtake the syncer.
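
A toy model shows why a heuristic limit alone gives no bound. The
function below is a sketch with made-up parameters (rates per tick,
a soft limit above which the syncer works faster); none of the
numbers come from measurements or from the actual softdep code.

```c
#include <assert.h>

/* Returns the number of dependencies still in memory after `ticks`
 * steps. Each tick the producer adds `produce` dependencies; the
 * syncer retires `sync_normal` of them, or `sync_boosted` once the
 * backlog exceeds `soft_limit`. All values are illustrative. */
static long
simulate_backlog(int ticks, long produce, long sync_normal,
    long sync_boosted, long soft_limit)
{
    long backlog = 0;

    for (int t = 0; t < ticks; t++) {
        long rate;

        backlog += produce;
        rate = backlog > soft_limit ? sync_boosted : sync_normal;
        /* The syncer cannot retire more than is outstanding. */
        backlog -= rate < backlog ? rate : backlog;
    }
    assert(backlog >= 0);
    return backlog;
}
```

With a producer that is slower than the boosted syncer, e.g.
simulate_backlog(1000, 6, 4, 8, 50), the backlog hovers around the
soft limit. With a faster producer, e.g.
simulate_backlog(100, 10, 4, 8, 50), it keeps growing for as long as
the producer runs, boosted syncer or not.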

Callbacks to free resources are also troublesome, since pushing out
softdeps may mean having to take some locks, possibly vnode locks.
Also, resource usage may temporarily increase when pushing them
out. So, you must do it from a controlled environment, in which
you know you can't get into deadlock trouble. The syncer process
is such an environment. Doing it anywhere else (from the pagedaemon,
or from any other process as part of a callback) will likely lead
to disaster.

The only way to enforce an upper limit for softdeps is to
make (user) processes wait until resources are available *before*
they engage in metadata activity. The same likely applies
to other resources.
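
As a rough userland sketch of that idea, the code below uses a
pthread mutex and condition variable in place of kernel sleep/wakeup.
struct throttle and the throttle_*() functions are hypothetical names,
not the proof-of-concept implementation mentioned earlier, and the
assumption that a caller knows up front how many dependencies it will
need is a simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct throttle {
    pthread_mutex_t lock;
    pthread_cond_t  room;   /* signalled when dependencies are retired */
    unsigned long   in_use; /* dependencies reserved or outstanding */
    unsigned long   limit;  /* hard upper bound */
};

static void
throttle_init(struct throttle *t, unsigned long limit)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->room, NULL);
    t->in_use = 0;
    t->limit = limit;
}

/* Call BEFORE starting the metadata operation, while no other locks
 * are held: sleep until n more dependencies fit under the limit. */
static void
throttle_reserve(struct throttle *t, unsigned long n)
{
    assert(n <= t->limit);  /* otherwise the wait could never end */
    pthread_mutex_lock(&t->lock);
    while (t->in_use + n > t->limit)
        pthread_cond_wait(&t->room, &t->lock);
    t->in_use += n;
    pthread_mutex_unlock(&t->lock);
}

/* Non-blocking variant: reports whether the reservation succeeded. */
static bool
throttle_try_reserve(struct throttle *t, unsigned long n)
{
    bool ok;

    pthread_mutex_lock(&t->lock);
    ok = t->in_use + n <= t->limit;
    if (ok)
        t->in_use += n;
    pthread_mutex_unlock(&t->lock);
    return ok;
}

/* Called by the syncer after it has pushed n dependencies to disk. */
static void
throttle_release(struct throttle *t, unsigned long n)
{
    pthread_mutex_lock(&t->lock);
    assert(t->in_use >= n);
    t->in_use -= n;
    pthread_cond_broadcast(&t->room);
    pthread_mutex_unlock(&t->lock);
}
```

The point of the design is where the wait happens: the producer
sleeps in throttle_reserve() before it holds anything the syncer
might need, and only the syncer ever calls throttle_release().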

- Frank

Frank van der Linden                           fvdl@wasabisystems.com
======================================================================
Quality NetBSD CDs, Support & Service.   http://www.wasabisystems.com/