Subject: IO throttle VOP
To: None <tech-kern@netbsd.org>
From: Frank van der Linden <fvdl@wasabisystems.com>
List: tech-kern
Date: 12/16/2001 20:47:31

The subject of throttling I/O has come up a couple of times. One case
was one process essentially hogging system resources with UBC, by
doing, say, a large local copy operation.

Another problem is the theoretically unbounded growth of softdep data
structures. This problem was alleviated recently when Chuq made
softdeps use pools, so that they don't eat away at kmem_map. However,
it's still possible to eat all available physical memory by starting
enough parallel rm -rf processes on a system with not that much memory
(I could make softdeps take up 28M of RAM that way).

For the latter, it's possible to come up with heuristic schemes
that survive most loads. We apply one such heuristic, and FreeBSD
has additional measures in the softdep code. However, it's hard
to get 'right', since actually waiting for resources inside the
softdep code is dangerous in places, as the process will be holding
vnode locks, but you may not be quite sure which, and sleeping can
cause deadlocks.

I'd like to propose a more generic solution which also, in the future,
can address the first problem. That solution is to add a new VOP:

	int VOP_THROTTLE(struct vnode *vp, int op, struct proc *p)

Where 'op' is REMOVE, WRITE, READ, MKDIR, etc.
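
To make the 'op' argument a bit more concrete, here is a rough sketch
of how the operation codes might be spelled and how a file system could
pick the ones it cares about. The THROTTLE_* names, the bitmask and the
helper are all made up for illustration; none of them exist in the tree.
The mask shown is an assumed softdep-style policy that only looks at
directory-changing operations.

```c
/* Hypothetical operation codes for the 'op' argument. */
enum throttle_op {
	THROTTLE_READ,
	THROTTLE_WRITE,
	THROTTLE_REMOVE,
	THROTTLE_MKDIR,
	THROTTLE_NOPS		/* number of operations, keep last */
};

#define THROTTLE_BIT(op)	(1u << (op))

/*
 * Assumed policy for a file system that only worries about softdep
 * growth: throttle the operations that queue dependency structures.
 */
const unsigned int softdep_throttle_mask =
    THROTTLE_BIT(THROTTLE_REMOVE) | THROTTLE_BIT(THROTTLE_MKDIR);

/* Nonzero if 'op' is a valid code that is selected by 'mask'. */
int
throttle_op_selected(unsigned int mask, int op)
{
	if (op < 0 || op >= THROTTLE_NOPS)
		return 0;
	return (mask & THROTTLE_BIT(op)) != 0;
}
```

A file system that wants to address the UBC case would presumably
select READ and WRITE as well.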

This VOP would be called from the top level of the kernel filesystem
code, i.e. the system call entries, since this is at a level where all
vnode locks that the process holds are known. sys_unlink might call
VOP_THROTTLE(vp, REMOVE, p), for example.
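
The call site described above could look roughly like the following
user-space mock-up. The struct definitions, the THROTTLE_REMOVE value,
do_remove() and unlink_sketch() are stand-ins invented for this sketch;
the real sys_unlink does a lot more (namei(), error paths, and so on).

```c
/* Stand-ins for kernel types; only what the sketch needs. */
struct proc { int p_pid; };
struct vnode { int v_throttle_calls; int v_removed; };

#define THROTTLE_REMOVE	1	/* hypothetical op code */

/* Stub for the proposed VOP: records the call and never sleeps. */
int
VOP_THROTTLE(struct vnode *vp, int op, struct proc *p)
{
	(void)op;
	(void)p;
	vp->v_throttle_calls++;
	return 0;
}

/* Stub for the actual removal work. */
int
do_remove(struct vnode *vp)
{
	vp->v_removed = 1;
	return 0;
}

/*
 * Shape of a sys_unlink-like entry point: throttle first, while the
 * set of vnode locks held by the process is still known, then do the
 * operation itself.
 */
int
unlink_sketch(struct vnode *vp, struct proc *p)
{
	int error;

	error = VOP_THROTTLE(vp, THROTTLE_REMOVE, p);
	if (error)
		return error;
	return do_remove(vp);
}
```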

The function would, if needed, sleep until resources are available
(or until whatever conditions it imposes have been met).
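
As an illustration of the "sleep until resources are available" part,
here is a single-threaded model of the wait loop. In the kernel the
wait would be a real sleep (tsleep() or similar) ended by a wakeup()
from the flushing code; here throttle_wait() just pretends that the
flusher retired a batch of items. The struct, its fields and the
limit/batch numbers are assumptions, not an existing interface.

```c
#include <errno.h>

/* Hypothetical accounting for one file system's deferred work. */
struct throttle_state {
	int ts_pending;		/* outstanding dependency structures */
	int ts_limit;		/* assumed high-water mark */
	int ts_sleeps;		/* how often a caller had to wait */
};

/*
 * Stand-in for sleeping: each "sleep" pretends the flusher retired
 * up to 'batch' items before waking us up.
 */
void
throttle_wait(struct throttle_state *ts, int batch)
{
	ts->ts_sleeps++;
	ts->ts_pending = ts->ts_pending > batch ? ts->ts_pending - batch : 0;
}

/*
 * Return once the pending work is at or below the limit. The
 * condition is re-checked after every wakeup, since other processes
 * may have queued more work in the meantime.
 */
int
throttle_sketch(struct throttle_state *ts, int batch)
{
	if (batch <= 0 || ts->ts_limit < 0)
		return EINVAL;	/* the model could never make progress */

	while (ts->ts_pending > ts->ts_limit)
		throttle_wait(ts, batch);
	return 0;
}
```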

If it sleeps, it would drop the lock on 'vp' (which must be held
on entry), and re-acquire it when woken up. Some system call
code may have to drop and re-acquire other vnode locks as well
(some hold 2 after namei() returns). That's not perfect, but
I'm not sure how to do it otherwise; I'm open to suggestions
on how to fix that differently. Maybe pass in an array of vnodes
that need to be unlocked/relocked iff the call will sleep. In which
case the call would look like:

	int VOP_THROTTLE(struct vnode *vp, struct vnode **vpp,
			 int todrop, struct proc *p);
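
A rough user-space model of the drop/relock behaviour for the array
variant, reading 'vpp' as an array of 'todrop' vnode pointers. The
stub lock functions, the v_locked/v_relocks fields and the
'must_sleep' flag (standing in for the file system's resource check)
are invented for the sketch; the release and re-acquire order shown is
one possible choice, not something the proposal fixes.

```c
/* Minimal stand-in for a vnode; v_locked models the vnode lock. */
struct vnode {
	int v_locked;
	int v_relocks;		/* times the lock was re-acquired */
};

void
vn_unlock_stub(struct vnode *vp)
{
	vp->v_locked = 0;
}

void
vn_lock_stub(struct vnode *vp)
{
	vp->v_locked = 1;
	vp->v_relocks++;
}

/*
 * Drop the locks on vpp[0..todrop-1] only if we have to sleep, then
 * take them again in the order the caller listed them.
 */
int
throttle_array_sketch(struct vnode **vpp, int todrop, int must_sleep)
{
	int i;

	if (!must_sleep)
		return 0;		/* fast path: locks untouched */

	for (i = todrop - 1; i >= 0; i--)
		vn_unlock_stub(vpp[i]);	/* release in reverse order */

	/* ... sleep here until resources are available ... */

	for (i = 0; i < todrop; i++)
		vn_lock_stub(vpp[i]);	/* same order the caller used */
	return 0;
}
```

Since the vnodes are unlocked for a while, a caller would presumably
have to re-check that they are still what it expects after the call
returns.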

Comments/suggestions?

- Frank

-- 
Frank van der Linden                           fvdl@wasabisystems.com
======================================================================
Quality NetBSD CDs, Support & Service.   http://www.wasabisystems.com/