Subject: Re: Defragmenting a NetBSD drive
To: Robert Elz <kre@munnari.OZ.AU>
From: Todd Whitesel <toddpw@best.com>
List: netbsd-ports
Date: 09/16/1999 03:47:21
> That makes it trivial for creat() (namei when acting for creat()) to
> keep track of the last occupied directory slot, and then trim the directory
> if there are spare blocks at the end.   On the other hand, unlink()
> (namei() acting for unlink()) would have to go on and scan the rest of the
> directory, where to perform its function it doesn't have to do that.
> So, directory trimming is done by creat(), rather than by unlink().

Hmm, this would mean that 'rm *' in a directory is O(N^2) in the number of
directory entries, because the later unlink()s have to scan through lots of
empty directory entries before they get to the next undeleted file.
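
If it really works that way, the quadratic growth should be easy to measure.
Here's a rough, throwaway sketch of how I'd check it (sizes and paths are
arbitrary, nothing NetBSD-specific about it):

	# populate a directory with $n files, then time removing them all;
	# if later unlinks have to rescan the freed slots at the front, the
	# total time should grow roughly as n^2 rather than n.
	for n in 500 1000 2000 4000
	do
		dir=/tmp/rmtest.$n
		mkdir "$dir" || exit 1
		i=0
		while [ "$i" -lt "$n" ]
		do
			: > "$dir/f$i"
			i=$((i + 1))
		done
		echo "N=$n"
		time rm "$dir"/f*
		rmdir "$dir"
	done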

Could the ffs code be made to cope with directories that get truncated from
the _front_, even if that means they're technically sparse files? Then unlink()
could trim blocks at the front of a directory as they empty out, so lookups
skip to the meat of the directory much faster.
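
Regular files can already be sparse in exactly that way; the question is only
whether a directory could be allowed to look like one. A quick userland
illustration of what "technically sparse" means (ordinary file, standard tools):

	f=/tmp/sparse.demo
	dd if=/dev/zero of="$f" bs=8192 seek=1024 count=1 2>/dev/null
	ls -l "$f"	# apparent size is about 8 MB
	du -k "$f"	# only the one written block (plus an indirect block) is allocated
	rm "$f"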

> It could, but it isn't quite as easy as it seems - the kernel part
> isn't too difficult, but the effects on applications which are scanning
> the directory at the time need careful thought - and that tends to
> end up suggesting that it is best to just leave the directories alone.

Seems to me that an atomic operation could be provided that does a bounded,
O(1) amount of garbage collection (say, finding the first empty slot in a
directory and moving the next occupied entry down into it) and returns a
boolean telling you whether to call it again or move on. Then a background
defragmenter could be built like so:
	find / -type d |
	while read dir
	do
		while defrag "$dir"	# hypothetical per-directory compaction op
		do
			:		# keep calling until it reports nothing left to move
		done
	done
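
Until something like that exists, the only userland approximation I can think
of is to rebuild the directory wholesale, which is nowhere near as nice: it
isn't atomic, it changes the directory's inode, and it must not race with
anything else using the directory. Roughly:

	# crude stand-in for the proposed operation; misses dotfiles, and is
	# only here to show what the kernel-assisted version would buy us.
	compact_dir() {
		old=$1
		new=$1.compact.$$
		mkdir "$new" &&
		mv "$old"/* "$new"/ &&
		rmdir "$old" &&
		mv "$new" "$old"
	}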

> If some human decides to shuffle things around, that's OK, they can
> then understand why some files might not have been properly found when
> something else was scanning the directory - but having the kernel just
> arbitrarily doing that because it feels like it might be useful probably
> isn't such a wonderful idea.

While I agree that the kernel deciding to do it on its own is infinitely worse,
I don't really agree that it's OK just because a human decided to do it. As a
programmer I would much rather have some guarantee that a scan of a directory
won't show me the same file twice (or miss one entirely). I seem to recall
having problems with 'ps' not giving real snapshots of the process table either.

Observed "glitches" can be a real problem when you have huge automated
systems trawling through the system doing things -- you get heisenbugs
which are really frustrating. Also, naive user utilities are much easier
to write when you can be certain that you're getting a coherent snapshot
of something at one point in time. In some cases this is really just an
illusion of security, but it still bugs me. I'd love to find clean ways
to improve the situation.

Todd Whitesel
toddpw @ best.com