Subject: Re: ffs, was Disklabel(5)/(8) ??
To: Hauke Fath <saw@sun0.urz.uni-heidelberg.de>
From: Bill Studenmund <wrstuden@loki.stanford.edu>
List: port-mac68k
Date: 10/08/1996 13:08:31
> This is closely related to the concept of parametrizing the ffs, which
> means: Feed hard disk parameters to high level ffs code so that it can
> optimize disk storage layout (cylinder boundaries are an important part
> here; see SMM:05 "A Fast File System for UNIX"). 
> 
> Unfortunately, with current mass storage technology, the ffs spends
> considerable time on optimization based on meaningless parameters. On
> the other hand, the given scheme is too rigid to describe the real world
> disk; but if you allow 'not coming out even', you have to have
> additional consistency checks. Time <-> space tradeoff...

I agree totally with your comment of "meaningless parameters!" These
optimizations were great when they were implemented, but things have
changed. But, digressing, couldn't the last cylinder group just not be full?
Or could we come up with a dead_blocks file, which ffs and fsck_ffs know
not to touch, and "put" the nonexistent areas in there? I'm not that
familiar w/ ffs, so the question's a bit simplistic. :-)

> > Uhm, why limit yourself to identical -endian machines?
> > Why can't I take a DOS Zip drive and throw it in my Mac's Zip,
> > and read it under NetBSD? msdosfs works on both big & small-endian
> > machines. Also, why not do the same thing for AmigaDos drives. AmigaOS
> > is in the kernel too...
> > 
> > What about a SunOS drive? HP-UX?
> 
> First: Don't mix up 'disklabel' with 'file system' here. For disklabels
> there is no performance issue (nor is there for msdosfs or adosfs -- you
> are usually happy if you can read/write stuff at all).

The msdosfs example was mainly to show that people have experience with
code for an opposite-endian machine. On a same-endian machine, the
relevant routines are just copies, and on opposite-endian machines,
they byte swap. It's a bit slower, but you can't get around it.
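
A minimal sketch of reading little-endian on-disk fields, for illustration
only. It assembles the value from individual bytes, which is a slightly
different technique from copy-or-swap: it gives the same answer on either
kind of host. The names get_le16 and get_le32 are made up here and are not
claimed to be what msdosfs actually uses.

```c
#include <stdint.h>

/* Read a 16-bit little-endian value from a raw disk buffer. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit little-endian value from a raw disk buffer. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

On a little-endian host a decent compiler can turn these into plain loads,
so the same-endian case still costs about as much as a copy.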

> But go to current-user and try to suggest ffs metadata be written in
> network byte order - and watch the little-endian fraction spout flames.
> :-> 

Get the marshmallows! Let's make s'mores!

> I.e.: With ffs there _is_ a performance tradeoff.

True. Which is why my thought for ffs support would be that there's an
ffs in the kernel which is IDENTICAL to the present implementation; it's
either big or little endian. But we also would have an opposite-endian
ffs. It would byte-swap everything going in or out. N.B. this method would
not work w/ VAXen as they (or was it the PDP) have a weird endian-ness;
neither big nor little.

My thought for this is that each read or write of a disk block, inode,
etc. is wrapped in a call to an endian-swapper. Something like:

in_mem->field = SHORTFIX(&from_disk->field);

For simplicity, there'd be one source, and normal ffs is made with
#define SHORTFIX(x) (*(x))
(i.e. stuff which gcc will optimize away), and opposite_fs is made with
#define SHORTFIX(x) swap_short(x)

It'd be gross to actually do, a bit harder to read, and a speed hit
if we end up copying data we wouldn't copy before, but maintainable.
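
Fleshed out a little, the one-source idea might look like the sketch below.
SHORTFIX and swap_short are the names used above; the OPPOSITE_ENDIAN_FFS
build flag, struct disk_rec and load_field are hypothetical and exist only
for this example. None of it is taken from the real ffs sources.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value read through a pointer. */
static uint16_t swap_short(const uint16_t *p)
{
    return (uint16_t)((*p << 8) | (*p >> 8));
}

/* Hypothetical build flag: defined only when building the opposite-endian ffs. */
#ifdef OPPOSITE_ENDIAN_FFS
#define SHORTFIX(x) swap_short(x)
#else
#define SHORTFIX(x) (*(x))      /* native build: a plain load */
#endif

/* Hypothetical on-disk record with a single 16-bit field. */
struct disk_rec {
    uint16_t field;
};

/* Fetch the field in host byte order, whichever flavour was built. */
static uint16_t load_field(const struct disk_rec *from_disk)
{
    return SHORTFIX(&from_disk->field);
}
```

Built without the flag, load_field is just a load; built with
-DOPPOSITE_ENDIAN_FFS, the same source swaps every 16-bit field it reads.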

We'd also have mount_bigendffs and mount_smallendffs, which would
mount either a big or little endian ffs (they are just symlinks
to either mount_ffs or mount_oppositeffs as appropriate for the
platform).
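
As a rough sketch of how the link targets could be chosen: check the host's
byte order and compare it with the byte order of the file system. The
helpers host_is_big_endian and link_target are hypothetical; only the names
mount_ffs and mount_oppositeffs come from the text above.

```c
#include <stdint.h>

/* Returns 1 on a big-endian host, 0 on a little-endian one. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;

    /* The first byte in memory is 0x01 only on a big-endian host. */
    return *(const uint8_t *)&probe == 0x01;
}

/*
 * Hypothetical helper: name of the program a mount_bigendffs or
 * mount_smallendffs link should point at. fs_is_big_endian must be 0 or 1.
 */
static const char *link_target(int fs_is_big_endian)
{
    return (fs_is_big_endian == host_is_big_endian())
        ? "mount_ffs" : "mount_oppositeffs";
}
```

In practice the choice would more likely be made once, at build or install
time, rather than by code like this at run time.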

Now I'll admit I'm talking bigger than I'm willing to code soon, but it
should be doable.

> > Why not teach ALL ports about ALL disklabel types?
> 
> Kernel bloat?

:-) Make them options! If you're only going to see MacOS disks and native
disks, you don't have to compile the other stuff. And it lives in the source
anyway. The only difference would be that these routines would be in
some common area (or some distinct part of the sys/arch/port tree). The
only bloat would be changes needed to support the idea of having variable
disklabels.

> > We obviously run into a problem when it's hard to tell one
> > label from another (they lack magic numbers like MacOS uses). 
> 
> Do they? An OS has to know its disks _somehow_, hasn't it?

Hmm. I think I was wrong above. But the idea is that I don't know if all
systems keep the magic flags in the first bytes of the disk. We could,
in principle, run into a (pathological) case where a partition map on
disk passes two different ID tests. True, if there's a CRC on the label,
I bet only one will pass, but if the ID is just a magic flag, we might
have a problem.
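
To make the ambiguity concrete, here is a minimal sketch of a label prober
that refuses to guess when more than one ID test passes. Everything in it
is hypothetical: the probe signature, the table, and the two toy formats
with placeholder magic bytes. None of it describes a real disklabel layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe: nonzero if the sector looks like this label type. */
typedef int (*label_probe_fn)(const uint8_t *sector, size_t len);

struct label_type {
    const char *name;
    label_probe_fn probe;
};

/* Toy format A: two placeholder magic bytes. */
static int probe_toy_a(const uint8_t *s, size_t len)
{
    return len >= 2 && s[0] == 0xAA && s[1] == 0x01;
}

/* Toy format B: a weaker one-byte test that can overlap with format A. */
static int probe_toy_b(const uint8_t *s, size_t len)
{
    return len >= 1 && s[0] == 0xAA;
}

static const struct label_type label_types[] = {
    { "toy_a", probe_toy_a },
    { "toy_b", probe_toy_b },
};

/*
 * Returns the index of the single matching label type,
 * -1 if nothing matches, or -2 if more than one test passes.
 */
static int identify_label(const uint8_t *sector, size_t len)
{
    int found = -1;
    size_t i;

    for (i = 0; i < sizeof(label_types) / sizeof(label_types[0]); i++) {
        if (label_types[i].probe(sector, len)) {
            if (found >= 0)
                return -2;      /* ambiguous: a second test also passed */
            found = (int)i;
        }
    }
    return found;
}
```

A label format with a checksum would let its probe reject most false
positives, which is the CRC case mentioned above.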

> > Design question for you (maybe it could be a first step to disklabel
> > abstraction): how would you tell the kernel that it should write a
> > MacOS disklabel as opposed to a *BSD disklabel as opposed to a Sun
> > disklabel (I think boot blocks are different) as opposed to an
> > Amiga or an Atari disklabel?
> 
> I don't think you would actually want to _write_ a foreign disklabel
> format. A matter of safety, I suppose, as long as you don't know exactly
> what is in those "RESERVED BY APPLE/SUN/DEC/MICRO$OFT" fields. That's
> one thing, after all, about a native *BSD disklabel: We all know what it
> should look like. 

Point taken. None of these ideas are going to be implemented all at once.
A lot of care will be needed to make sure we get things right. But I
thought some ports, like the sun, sparc, or alpha, can put enough
correct values in a disklabel that that machine's boot code will
boot from the disk.

Take care,

Bill