tech-kern: Re: type-length-value chains for FFS (was large Inodes)

Subject: Re: type-length-value chains for FFS (was large Inodes)
To: John F. Woods <jfw@jfwhome.funhouse.com>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 03/23/1999 20:45:23

On Tue, 23 Mar 1999, John F. Woods wrote:

> > How is this opaque data shared between competing resource users?
> > What if you have two or more meta-layers? This seems to be not only
> > a solution to a problem that doesn't exist, but a foolish solution
> > at that.
> 
> Well, I assume the problem *does* exist; Jason, Bill, et al may or may not
> have lives, but even if they don't I assume they could have found more
> enjoyable ways of wasting time than fiddling with the FFS layout.
> 
> (I would be interested to see the problem that motivated this solution,
> though; 96 bytes of data that an application has to go out of its way to
> access, and which gets lost (presumably) if you use generic programs like
> "cp" or "tar" to copy the affected files instead of dump and restore?)

State for a layered filesystem.

Right now we support layered filesystems, but only ones which don't store
state over the long term. We want layered fs's which need to store a small
amount of state to be able to do so in the inode. I think that a layered fs
which needs to store a large amount of state should use files in the
underlying fs (which would entail teaching fsck that such files aren't
dangling, and nothing more).

Specifically, we have a layered filesystem we're working on which will let
the system migrate users' data to tape and free the disk space, without the
users really noticing until they access the data. They can ls -l a directory
and see all their files, with the access times and sizes they remember.

The file only gets restored when the user goes to access it. The user
might notice the many-minute delay at that point. :-)

> > I'd much rather see a well thought out *scaleable* solution.
> 
> One problem with scaleable solutions is that they tend to start very big.
> Maybe they were trying to just bang in a quick solution to a small problem,
> with the opportunity to add a big solution to some as-yet unseen problem
> available later...

Yep.

Also, since we don't have a problem in mind which needs that solution, we
aren't too keen on adding it.

> > As a trivial example, one could design a system of opaque meta-chains,
> > the first entry of which was stored in the inode,
> 
> I suggested a similar thing to Bill in private email.

And my concern was the same. That's a LOT of complexity to stick into ffs.

It could be done, but it's not something I'm interested in doing any time
soon.

Actually, if you taught fsck that a file is referenced via opaque data,
you could add this functionality via another stacked fs. :-) The only
change to the design we're proposing now is to have a way to get metadata
of a specific type.
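
To make "get metadata of a specific type" concrete, here is a rough sketch of
what a lookup over a type-length-value chain in the 96-byte area might look
like. Everything in it is an assumption made for illustration: the header
layout (struct meta_hdr), the name meta_find, the use of type 0 as an
end-of-chain marker, and host byte order on disk. It is not the proposed
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define META_AREA_SIZE 96	/* size of the opaque area discussed in this thread */

/* Hypothetical record header: 2-byte type, 2-byte payload length. */
struct meta_hdr {
	uint16_t type;
	uint16_t len;
};

/*
 * Walk the chain and return a pointer to the payload of the first
 * record whose type matches `want`, storing its length in *lenp.
 * Returns NULL if there is no such record or the chain is malformed.
 * Type 0 is assumed to mark the end of the chain.
 */
static const uint8_t *
meta_find(const uint8_t *area, size_t size, uint16_t want, uint16_t *lenp)
{
	size_t off = 0;
	struct meta_hdr h;

	assert(area != NULL && lenp != NULL);
	while (off + sizeof(h) <= size) {
		memcpy(&h, area + off, sizeof(h));	/* avoid unaligned access */
		if (h.type == 0)
			break;				/* end of chain */
		if (h.len > size - off - sizeof(h))
			return NULL;			/* record overruns the area */
		if (h.type == want) {
			*lenp = h.len;
			return area + off + sizeof(h);
		}
		off += sizeof(h) + h.len;		/* skip to the next record */
	}
	return NULL;
}
```

With only 96 bytes to search, a linear walk like this is cheap, and each layer
can skip records whose type it does not recognize.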

> I have been amused to see the number of people likening this (or rather,
> what this could have been) to MacOS resource forks; I'd have thought
> that a UNIX-oriented crowd like this would have regarded the Mac as the
> Anti-Computer. ;-)  But as Bill pointed out, a MacOS-like resource manager
> is not a simple thing (how many resources do you want to have available in
> a given file?  Few enough that linear search is reasonable?  So many that
> a whiz-bang extensible hash table in the kernel is reasonable?), and so
> one would like to see a distinct set of applications in mind that would
> motivate the effort.  I'll note that the NTFS (in Windows NT, for those
> lucky few who don't know) has the ability to have multiple data streams
> associated with a file (like an arbitrary number of MacOS forks, but none
> with any OS-imposed layout like the resource fork).  However, I'm not
> aware of *any* applications that use multiple data streams (and not
> surprising -- since any real-world NT application pretty much has to
> be willing to run on a FAT filesystem as well).  (That's not to say
> there are _none_, just that I'm not aware of them.)

The resource manager is really cool, but it's not what we were wanting to
do. Also, I'm not sure what it'd be needed for under UNIX. :-)

> Anyway, if I understood Bill's original message, there's a 96-byte area
> for "their" application, plus *another* area of uncommitted u_int32_t's
> (Bill's message said there were 28 of them, but that's too large; maybe
> he meant 8?), some of which are presumably going to be gobbled up for
> extending the timestamps to 64 bits, and a pointer to a (first?) block of
> ACL info, but that still (I think) leaves a few for some hardy developer
> (foolhardy?) to grab a pointer-to-disk-block and run with it.  (Just
> try to announce it earlier than a week before 1.5 freezes. :-) )

Yes, I mis-remembered. There is a flag u_int32_t, and 28 BYTES of data, or
7 u_int32_t's. Added to the 2 spare u_int32_t's we have now, I think we
have enough space. We'd need 3 for Y2038, and one for acl's.
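
Spelled out as arithmetic, the budget looks roughly like this. The constant
and function names are invented for the example; only the numbers come from
the paragraph above.

```c
#include <assert.h>
#include <stdint.h>

enum {
	NEW_SPARE_BYTES      = 28,	/* the 28 bytes of data next to the flag word */
	EXISTING_SPARE_WORDS = 2,	/* spare u_int32_t's already in the inode */
	WORDS_FOR_Y2038      = 3,	/* widening the timestamps */
	WORDS_FOR_ACL        = 1	/* pointer to ACL info */
};

/* Number of 32-bit words still uncommitted after Y2038 and ACLs. */
static unsigned
spare_words_left(void)
{
	unsigned have = (unsigned)(NEW_SPARE_BYTES / sizeof(uint32_t)) +
	    EXISTING_SPARE_WORDS;			/* 7 + 2 */
	unsigned need = WORDS_FOR_Y2038 + WORDS_FOR_ACL;	/* 3 + 1 */

	assert(have >= need);
	return have - need;
}
```

That is 9 words available against 4 needed, so 5 would be left over.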

Growing the fs to support more than 2^32 blocks (files > 2 Tera bytes or
so) would require a fair bit of hackery. If we kept NDADDR and NIADDR the
same, we'd need 15 extra int32_t's. Oh, 16, as we also would want to grow
the number of blocks in the file. :-) 17 if we have a block address for
acl's (no need to grow if acl's are kept in a separate inode).
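
A small sketch of where the 15, 16 and 17 come from, assuming each affected
field goes from 32 to 64 bits and so costs one extra int32_t. NDADDR and
NIADDR are given the values used in ufs/ufs/dinode.h; the function name is
made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NDADDR	12	/* direct block addresses in the inode */
#define NIADDR	3	/* indirect block addresses in the inode */

/*
 * Extra 32-bit words needed to widen the block addresses and the
 * block count to 64 bits, optionally including an in-inode ACL
 * block address.
 */
static unsigned
extra_words_for_64bit(int acl_block_in_inode)
{
	/* each widened field grows by (8 - 4) / 4 = 1 word */
	unsigned per_field =
	    (unsigned)((sizeof(int64_t) - sizeof(int32_t)) / sizeof(int32_t));
	unsigned fields = NDADDR + NIADDR + 1;	/* 15 addresses + block count */

	assert(per_field == 1);
	if (acl_block_in_inode)
		fields++;			/* ACL block address grows too */
	return fields * per_field;
}
```

Without the ACL address this gives 16 extra words, and 17 with it.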

Take care,

Bill