Subject: Re: VPS mailing list, BSD interest?
To: Kevin P. Neal <kpneal@pobox.com>
From: Terry Lambert <terry@lambert.org>
List: current-users
Date: 10/01/1996 11:49:57
> Note that JFS figures into this somehow, and I'm not very clear on this
> (Terry?).

JFS scales linearly for additional space.

> I don't know how FFS or ext2fs will fit into it, or if they will.

Expanding an FFS or ext2fs does not work well because the geometry
dictates the allocation policy.  The result in FFS of incremental
instead of initial expansion would be logarithmically increasing
fragmentation of the FS as each expansion boundary is hit.

We (Artisoft) considered this problem in terms of providing an in
place conversion tool for upgrading a disk from FAT/VFAT/VFAT32 to
FFS.  The conclusion was that if we allocated one cylinder group
at a time, we could in-place convert, but the price of doing so
would be that each cylinder group would be fragmented all to heck.
It is the existence of multiple cylinder groups that ensures FFS's
architectural immunity to fragmentation.

To counter this, it would be required to either build a "defragmenter"
for FFS (something which is normally unnecessary, since allocation
policy over a static domain prevents fragmentation), or to build the
defragmentation into the conversion tool itself.

For an analogy, an FS which requires defragmentation is like building
a lock manager that uses deadlock detection.  An FS which avoids
fragmentation in the first place because of its architecture is like
building a lock manager that uses deadlock avoidance.

If a condition is avoidable (fragmentation is), it's a hell of a lot
less work to avoid it than it is to recover from it.

You can run a gedanken experiment of an FFS with a single cylinder
group, and see that the fragmentation *must* be severe for incremental
increase (as an in place conversion would cause, or as an LVM would
cause).  The difference is whether the defragmentation could be in the
conversion tool (easy) or in a standalone program (harder).  For LVM,
there is no choice: it must be in the standalone program to be runnable
after an extension, and to "compact" data for recovery of an extent
(taking away a PP from an FFS).  If you run it to compact, after the
PP removal, you will need to run it again to defragment.
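
The gedanken experiment can be approximated with a toy simulation.
This is a sketch under loud assumptions, not FFS: the allocator, the
"home group" rule, and the names simulate() and count_extents() are
all made up for illustration.  Files grow round-robin one block at a
time; each file prefers a home group, and when no free block exists
anywhere the FS is extended by one group.

```python
def count_extents(blocks):
    """Count contiguous runs in a collection of block numbers."""
    runs = 0
    prev = None
    for b in sorted(blocks):
        if prev is None or b != prev + 1:
            runs += 1
        prev = b
    return runs


def simulate(num_files, blocks_per_file, group_size, groups_at_start):
    """Toy model (hypothetical, NOT the real FFS allocator).

    Each file is assigned a home group from the groups that exist at
    creation time.  A file takes the lowest free block in its home
    group; if the home group is full it takes the lowest free block
    anywhere; if nothing is free, the FS grows by one group.
    Returns the number of extents (contiguous runs) per file.
    """
    free = set(range(group_size * groups_at_start))
    num_groups = groups_at_start
    home = [i % groups_at_start for i in range(num_files)]
    files = [[] for _ in range(num_files)]

    for _ in range(blocks_per_file):
        for f in range(num_files):
            lo = home[f] * group_size
            in_home = [b for b in range(lo, lo + group_size) if b in free]
            if in_home:
                block = in_home[0]
            else:
                if not free:
                    # Incremental expansion: append one more group.
                    start = num_groups * group_size
                    free.update(range(start, start + group_size))
                    num_groups += 1
                block = min(free)
            free.remove(block)
            files[f].append(block)

    return [count_extents(blks) for blks in files]


# All groups present from the start: one home group per file.
initial = simulate(num_files=4, blocks_per_file=8,
                   group_size=8, groups_at_start=4)
# One group at the start, extended one group at a time.
incremental = simulate(num_files=4, blocks_per_file=8,
                       group_size=8, groups_at_start=1)
print("initial sizing:    ", initial)
print("incremental growth:", incremental)
```

In this model the files stay contiguous when the geometry is known up
front, and get interleaved block by block when everything starts in a
single group.  The model is deliberately crude, but it shows why the
allocation policy cannot do its job over a domain that keeps changing.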


The ext2fs will have the same issues, for reason of contiguity.  It
is questionable whether it is possible to in-place convert to ext2fs
at all, let alone doing it without fragmentation (breaking up the
extents into non-contiguous blocks).  At the very least, you will
lose most of the value of extent-based storage as a result.

Finally, ext2fs has a similar problem with defragmentation, since
relocation of extents in an almost full FS has the same problems as
an in place conversion -- it is very difficult, and is generally
expected to take external backup media (a backup/restore will do
just as effective a job on ext2fs as on FFS).


> LFS? (Terry?).

I would suggest asking Margo.  It is my expectation from the shared
code in FFS and LFS (the UFS code for directory management, etc.)
that LFS would have similar problems, at least for directories, as
FFS.  There is an implicit restriction here that directory fragments
must imply file fragments.  The question is what is the storage
increment for a log file, and will the access geometry for an existing
FS make this "not really matter".  My gut feeling is that an LFS
would have similar problems as FFS.


> Is Margo Seltzer around? (Would she be able to contribute any ideas?)
> Her web pages looked cool (I love web pages with white papers online).

She's around; I don't know if she wastes her time listening to us, though.
8-).

> If anybody thinks this is a good idea, but doesn't have time, at least let
> me know that somebody else thinks this is neat stuff. 

LVM is neat stuff the same way CCD is neat stuff.  Both are classes of
operational data storage objects which can be implemented as either
logical-to-physical or logical-to-logical device mappings in a devfs.

Other operational data storage objects you could implement as well:

o	Media perfection; unlike bad144, it can apply to the control
	areas of the disk (disklabel, etc.) as well as to the storage
	areas
o	DOS partitioning
o	BSD disklabel partitioning (can use same ioctl interface as
	DOS or any other partition management mechanism -- ONE "fdisk"
	program for all partitioning)
o	DOS extended partitioning
o	Sun disklabel partitioning
o	SVR4 VTOC partitioning
o	volume spanning (LVM is a subclass of volume spanning)
o	striping
o	mirroring

etc.
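
As a rough illustration of what a logical-to-physical mapping means
for two of the items above (volume spanning and striping), here is a
minimal sketch.  It is not taken from any devfs, LVM or CCD
implementation; SpannedVolume, resolve(), extend() and stripe_map()
are hypothetical names, and the device names are placeholders.

```python
from bisect import bisect_right


class SpannedVolume:
    """Hypothetical spanned volume: logical blocks laid end to end
    over a list of (device, start_block, length) extents."""

    def __init__(self, extents):
        self.extents = []
        self.starts = []   # first logical block covered by each extent
        self.size = 0
        for device, start, length in extents:
            self.extend(device, start, length)

    def extend(self, device, start, length):
        """Grow the volume by appending one more physical extent."""
        self.extents.append((device, start, length))
        self.starts.append(self.size)
        self.size += length

    def resolve(self, lblock):
        """Map a logical block to (device, physical_block)."""
        if not 0 <= lblock < self.size:
            raise IndexError("logical block out of range")
        i = bisect_right(self.starts, lblock) - 1
        device, start, _ = self.extents[i]
        return device, start + (lblock - self.starts[i])


def stripe_map(lblock, num_devices, stripe_blocks):
    """Map a logical block to (device_index, block_on_device) for a
    simple round-robin stripe of stripe_blocks blocks per device."""
    stripe = lblock // stripe_blocks
    device = stripe % num_devices
    offset = (stripe // num_devices) * stripe_blocks + lblock % stripe_blocks
    return device, offset


vol = SpannedVolume([("sd0", 100, 50), ("sd1", 0, 30)])
print(vol.resolve(49))   # last block on the first extent
print(vol.resolve(50))   # first block on the second extent
vol.extend("sd2", 10, 20)
print(vol.resolve(80))   # first block of the newly added extent
print(stripe_map(9, num_devices=2, stripe_blocks=4))
```

The point of the sketch is only that extending a spanned volume is a
table append at this layer; nothing here tells the FS above how to
adapt its allocation policy to the new space.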

Yes, this stuff is neat.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.