Subject: Re: ufs filesystem tool: any interest?
To: Michael L. VanLoon -- HeadCandy.com <michaelv@HeadCandy.com>
From: John F. Woods <jfw@jfwhome.funhouse.com>
List: current-users
Date: 08/27/1995 09:23:58
> >> Have you (or someone else) actually measured the "fragmentation" of
> >> real filesystems?
> It's good in theory, but it doesn't work that way in practice with a
> very busy file server.  The servers at Iowa State could get
> significantly fragmented.  I believe they used a commercial
> defragmenter on those machines (DEC Ultrix mostly) occasionally, and
> the benefit was definitely a win.
> I'm sure a full-feed USENET news server can also get hellishly
> fragmented.  Maybe a heavily used database server?

I would think a busy FTP or news server would be the kind of system where
one would rather not have to run a defragmenter every once in a while, due
to uptime demands; also note that a news server isn't really amenable to
"once in a while" defragmentation, since the file turnover is so large.

I guess there are defragmenters that run in "real time", or at least
semi-concurrently with system activity.  I don't know of any for UNIX (in fact,
I don't actually know of any defragmenters at all for UNIX :-), but the
Macintosh Disk Express II program defragments live systems.  However, it waits
for pauses in system activity, so it's able to rely on a quiescent if not
unmounted filesystem.  It also doesn't have to worry (as much) about mmap'ed
files, though I believe the PowerMac uses demand paging; I wonder how they
handle that.  Perhaps the MacOS goes to the effort of recalculating disk
addresses from first principles for each (re)page-in rather than stashing
the block number (I assume DE II doesn't do something REALLY evil like directly
frobbing task map tables).
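
To make the stash-versus-recalculate distinction concrete, here is a toy
sketch in Python.  It is purely illustrative and says nothing about how the
MacOS or DE II really work; BlockMap, StashingPager, and RecalculatingPager
are hypothetical names invented for this example.

```python
class BlockMap:
    """Hypothetical file -> physical block table that a defragmenter may rewrite."""

    def __init__(self, layout):
        self.layout = {f: list(blocks) for f, blocks in layout.items()}

    def physical(self, file_id, logical_block):
        return self.layout[file_id][logical_block]

    def relocate(self, file_id, new_blocks):
        # The "defragmenter" moves a file's data to new physical blocks.
        if len(new_blocks) != len(self.layout[file_id]):
            raise ValueError("relocation must preserve file length")
        self.layout[file_id] = list(new_blocks)


class StashingPager:
    """Remembers the physical block the first time a page is brought in."""

    def __init__(self, block_map):
        self.block_map = block_map
        self.stash = {}

    def page_in(self, file_id, logical_block):
        key = (file_id, logical_block)
        if key not in self.stash:
            self.stash[key] = self.block_map.physical(file_id, logical_block)
        return self.stash[key]


class RecalculatingPager:
    """Looks the physical block up again on every (re)page-in."""

    def __init__(self, block_map):
        self.block_map = block_map

    def page_in(self, file_id, logical_block):
        return self.block_map.physical(file_id, logical_block)
```

If the block map is relocated between two page-ins, the stashing pager keeps
returning the old block number while the recalculating one follows the move.
That is the hazard a live defragmenter would have to deal with one way or
another.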

I guess one way to do it would be to have a mirrored filesystem (not disk
drive) where a background process continually copies the old filesystem to
a new filesystem, compacting as it goes; then when it catches up, the old and 
new filesystems are exchanged, and the background process sleeps until the
current filesystem becomes fragmented enough to worry about.  This obviously
requires heavy assistance from the kernel, both for the switch operation
itself and for the more subtle page-mapping issue.  You also have the subtle
issue of inode identification; either the inode number has to be the same
after the copy, or the inode number has to be a "file serial number" unrelated
to disk layout (not a bad idea).  You might also need some kind of "process
pause" facility so the background process can INSIST on being able to catch
up once in a while.
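
A toy model of the copy-compact-exchange cycle, in Python, might look like
the sketch below.  Everything in it is an assumption made for illustration:
the "filesystem" is just a dict from file serial number to a list of block
numbers, the fragmentation measure and the 0.25 threshold are arbitrary, and
it ignores the hard parts entirely (writes arriving during the copy, and the
page-mapping problem).

```python
def fragmentation(fs):
    """Fraction of block-to-block transitions that are not physically adjacent."""
    breaks = total = 0
    for blocks in fs.values():
        for prev, cur in zip(blocks, blocks[1:]):
            total += 1
            if cur != prev + 1:
                breaks += 1
    return breaks / total if total else 0.0


def compacting_copy(old_fs):
    """Copy every file to a fresh filesystem, laying its blocks out contiguously.

    The dict keys stand in for "file serial numbers" unrelated to disk
    layout, so they are carried over unchanged.
    """
    new_fs = {}
    next_free = 0
    for serial, blocks in sorted(old_fs.items()):
        new_fs[serial] = list(range(next_free, next_free + len(blocks)))
        next_free += len(blocks)
    return new_fs


class MirroredFS:
    """Hypothetical wrapper holding whichever copy is currently live."""

    def __init__(self, files, threshold=0.25):
        self.current = files
        self.threshold = threshold  # arbitrary "fragmented enough to worry" level

    def maybe_defragment(self):
        # The background process sleeps unless fragmentation is bad enough.
        if fragmentation(self.current) < self.threshold:
            return False
        # Copy, then exchange old and new.  A real system would need kernel
        # help to make this switch atomic.
        self.current = compacting_copy(self.current)
        return True
```

Note that the serial numbers survive the exchange even though every block
number changes, which is the property the inode-identification point above
asks for.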

Granted, this requires having as much idle disk as you'd like to keep
relatively unfragmented, but if you're worried enough about performance that
you'd run an evil hack like this, you're probably willing to blow the extra
money on disks, anyway :-).