Subject: Re: as long as we're hitting FFS...
To: Ted Lemon <mellon@isc.org>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 03/25/1999 15:00:33

On Thu, 25 Mar 1999, Ted Lemon wrote:

> > A type:value chain strikes me as just way too space inefficient and time
> > inefficient a thing to stick in the inode, especially for essentials of
> > how the fs accesses the node. We don't need that kind of flexibility, and
> > the overhead is too expensive.
> 
> You have 96 bytes.  You are already going to consume four with a magic
> number.  Why not consume another one with a length?  Indeed, if you're
> worried about space, enforce an implementation restriction that the
> magic number be 24 bits and the length 8 bits, and then there's no net
> loss.  Don't restrict the length to 8 bits in the API - just return an
> error if there's no space to stuff the bits.  If an application or
> layered filesystem needs to store something and there's no space, it
> returns an error on the operation that caused the store to be
> attempted, or works around the lack, depending on the implementation.

Two reasons:

One, the big one, is that we're now building something close to the
resource manager into the fs. With the current proposal, either there's
data in the opaque area, or there isn't. ffs (ufs) stays simple. Doing
anything with length will make it complicated, and need lots of thought.

Second reason: One of the big points of the opaque data standard so far is
that magic numbers are reserved both forward and byte-swapped, so that
when a data user (overlay fs) sees data with its byte-swapped magic
number, it knows to just swap it & move on. I'm not sure how well a
24-bit/8-bit split would work here. ;-)
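
To make the byte-swap point concrete, here is a rough sketch (not code from
any tree). OVL_MAGIC, opaque_classify() and pack24_8() are names I made up
for illustration, and the magic value is arbitrary, not a reserved number.

```c
#include <stdint.h>

/* Hypothetical magic for one overlay fs; an arbitrary example value. */
#define OVL_MAGIC 0x4f564c31u

enum opaque_order { OPAQUE_NOT_MINE, OPAQUE_NATIVE, OPAQUE_SWAPPED };

/* Reverse the four bytes of a 32-bit word. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* With a full 32-bit magic, one extra compare detects foreign byte order. */
static enum opaque_order opaque_classify(uint32_t magic)
{
    if (magic == OVL_MAGIC)
        return OPAQUE_NATIVE;
    if (magic == swap32(OVL_MAGIC))
        return OPAQUE_SWAPPED;
    return OPAQUE_NOT_MINE;
}

/* Assumed 24/8 layout: magic in the top 24 bits, length in the low byte. */
static uint32_t pack24_8(uint32_t magic24, uint8_t len)
{
    return (magic24 << 8) | len;
}
```

With the split, a swapped word has the length in its top byte and the magic
bytes reversed in the low 24 bits, so the reader can no longer compare the
whole word against two reserved constants; it has to mask first and know
which end the length landed on.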

> As far as I can see, this gives you what you want, and shuts us up, at
> least WRT the notion of sharing the field.   We still are likely to
> use if anybody else comes up with an application for these bits that
> consumes a lot of space (and I can certainly imagine such an
> application) but at least it's better than nothing.

It's not easy to do though, and easier to get wrong. :-)

One side thought I've come up with is that we could teach ufs (both ffs
and lfs) about "resource forks". ufs would treat it as NTFS does - it's
just another stream of data. Anything which wants to treat it as an
arbitrary storer of data would sit atop one of the ufs layers.

A sane way to do this would be to use one of the spare fields at the end
of the "small" dinode to point to a separate inode which holds the other
fork.
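
Roughly what I have in mind, as a sketch only. The struct below is NOT the
real dinode layout; di_fork_ino and dinode_fork() are hypothetical names,
and using 0 for "no fork" is an assumption.

```c
#include <stdint.h>

/* Illustrative only: a cut-down stand-in for the on-disk inode. */
struct sketch_dinode {
    uint16_t di_mode;      /* file type and permissions */
    int16_t  di_nlink;     /* link count */
    uint64_t di_size;      /* size of the data fork in bytes */
    uint32_t di_fork_ino;  /* hypothetical spare field: inode number of
                              the other fork, 0 meaning "none" */
};

/* Returns 1 and stores the fork's inode number if there is one, else 0. */
static int dinode_fork(const struct sketch_dinode *dp, uint32_t *inop)
{
    if (dp->di_fork_ino == 0)
        return 0;
    *inop = dp->di_fork_ino;
    return 1;
}
```

ufs itself would only follow the pointer and read or write the second
stream; whatever meaning the contents have would live in a layer on top.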

I'd rather not have opaque data, or at least one instance of it, stomp on
this other fork, though, as I'd like to leave it available for general
uses.

> BTW, is there some particular reason not to make the inode a _lot_
> bigger?  Why not jump straight to 512 bytes, or even 1024 or 2048k?
> I'm not being entirely facetious here - having a per-inode dumping
> ground with a small but reasonable amount of space would be a big win.
> If you made the inode 512 bytes, that would give you 256+96=352 bytes
> in which to store arbitrary attributes, which would probably be enough
> in most cases.  This may seem like a lot of space to consume, but the
> inode has been 128 bytes for quite a while, and in that time the
> average system disk has more than quadrupled in size, so it's not as
> unreasonable as it sounds.

Dump would suffer if we went beyond 256 bytes. Then an inode would span
multiple 512-byte blocks (there is already a dump header in with the
inode). Other than that, it would be fine, I think.... Though I'd still
prefer a fixed-length opaque data interface. A bigger inode would, though,
give us space to move ffs to 64-bit addressed blocks. (As Jason points
out, you can avoid this problem by growing the block size, or really the
frag size. But at some point we'll not want 256k frags. :-) )
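
Back-of-the-envelope arithmetic for the frag-size trade-off, as a sketch. It
assumes block addresses count frags and simply multiplies them out, ignoring
sign bits and any sector-based limits elsewhere in the kernel; max_fs_bytes()
is a made-up helper.

```c
#include <stdint.h>

/*
 * Largest file system, in bytes, reachable with addr_bits-bit frag
 * addresses and frag_size-byte frags. addr_bits must be below 64 and
 * the product must fit in 64 bits.
 */
static uint64_t max_fs_bytes(unsigned addr_bits, uint64_t frag_size)
{
    return ((uint64_t)1 << addr_bits) * frag_size;
}
```

Under those assumptions, 32-bit addresses with 1k frags reach 2^42 bytes
(4 TB), and it takes 256k frags to reach 2^50 bytes (1 PB), which is where
the frag size stops being a pleasant knob to turn.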

> One way to do this would be to add two fields to the inode: a link
> field and a magic number field.   The link field would point to
> another inode, and the magic number would identify what's in that
> inode.   Then you could stuff a resource fork in quite easily, and
> indeed you could stuff in more than one resource fork if you wanted.
> Of course, then any application wanting to archive or copy the file
> would have to know about resource forks, but this seems like a good
> general scheme if such a capability is wanted.

As above, I like that idea. But I'd avoid the magic value - what's in
there (be it random data or mega storage for an overlay fs) will probably
be single valued.

Note also that if you're REALLY using overlay fs's in a production
environment, you have to shield users from the underlying layers, and
you'll really have to think about things before you set them up. So we
don't have to make an everything-and-the-kitchen-sink solution. :-)

Take care,

Bill