Subject: softdep pacing
To: None <tech-kern@netbsd.org>
From: Paul Kranenburg <pk@cs.few.eur.nl>
List: tech-kern
Date: 02/20/2001 10:43:53
Ingredients:

	a Sparc SS20 running NetBSD-current
	a `softdep'-mounted filesystem
	start the removal of a large tree on it (I did `rm -rf pkgsrc', where
	`pkgsrc' was in fact a checkout of the NetBSD package sources)

After crunching away on this for more than half an hour, the machine
panicked with `out of space in kmem_map'. The subsequent fsck recovered thousands
of orphaned files and directories (in fact, `lost+found' couldn't hold
them all; it took three passes of `fsck+fsdb' to fabricate enough dirs
so that all could be reconnected).

The stack trace was like this: unlink() -> ufs_dirremove() ->
softdep_change_linkcnt() -> inodedep_lookup() -> malloc()

I repeated the experiment while watching `num_inodedep', and sure
enough, it rose far beyond `max_softdeps'. When the machine crashed
again, it stood at about 5*max_softdeps.

Looking at the code in inodedep_lookup(), I note that the attempt to
speed up the syncer seems to have no effect on the softdep worklists
at all, since it does not actually request a flush at that point
(see the variable `req_clear_inodedeps').

When the number of `rushjobs' finally maxes out, work on the inode queue
is started in request_cleanup(). Yet at this point the process allows
the drain only a fixed amount of time (currently 2 ticks), which
evidently is not adequate in all circumstances.
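
To make the pacing problem concrete, here is a toy model in C (not
kernel code). The function simulate_fixed_delay() and all of its
parameters (MAX_SOFTDEPS = 1000, ADDED_PER_OP, FREED_PER_TICK) are
hypothetical; only the fixed 2-tick wait comes from the text above.
The model also ignores any background syncer activity below the mark.
What it illustrates: if the fixed wait drains fewer dependencies than
each operation creates, the count creeps past the high-water mark
without bound.

```c
#include <assert.h>

/* Hypothetical model parameters; none are taken from the NetBSD sources. */
#define MAX_SOFTDEPS   1000L /* high-water mark (assumed value) */
#define ADDED_PER_OP      5L /* dependencies created per unlink() (assumed) */
#define FREED_PER_TICK    2L /* dependencies the syncer retires per tick (assumed) */
#define TICKDELAY         2L /* fixed sleep in request_cleanup(), in ticks */

/*
 * Simulate `ops' unlink() calls under fixed-delay pacing: once over the
 * high-water mark, each call "sleeps" TICKDELAY ticks, during which the
 * syncer retires TICKDELAY * FREED_PER_TICK dependencies, and then
 * proceeds whether or not the count has come back down.
 * Returns the number of outstanding dependencies afterwards.
 */
static long
simulate_fixed_delay(long ops)
{
	long num = 0;

	for (long i = 0; i < ops; i++) {
		num += ADDED_PER_OP;
		if (num > MAX_SOFTDEPS) {
			long drained = TICKDELAY * FREED_PER_TICK;
			num = (num > drained) ? num - drained : 0;
		}
		assert(num >= 0); /* sanity check: never goes negative */
	}
	return num;
}
```

With these made-up numbers, simulate_fixed_delay(200) returns 1000
(exactly at the mark) and simulate_fixed_delay(5000) returns 5800: past
the mark, each call adds 5 but the fixed wait drains only 2*2 = 4. The
particular figures mean nothing; the unbounded growth is the point.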

As a possible improvement, I'd suggest changing softdep_process_worklist()
to continue clearing up various queues until resource usage falls below
the high water marks again. In addition, the fixed interval to sleep
in request_cleanup() should be replaced by something more flexible, e.g.
adapting `tickdelay' dynamically to resource usage, or doing what
everybody else does: sleeping on a channel and being woken up by
softdep_process_worklist() once the syncer has freed enough resources.

-pk