Subject: Re: Y2038, was as long as we're hitting FFS...
To: Ted Lemon <mellon@isc.org>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 03/25/1999 18:15:49
On Thu, 25 Mar 1999, Ted Lemon wrote:

> > But on the flip side we also seem to have moved towards a policy where if
> > you're going to raise the bar on what has to be done, you have to be
> > willing to deliver on what you've asked for. :-)
> 
> I don't have lots of hours to spare, but I can easily spare one to
> write the code for this.

Let me answer things out of order. At the end of the message, you said:

> Bill, I've proposed a way to do TRT with no additional space used in
> the opaque data field.   Other than epithets like "kitchen sink,"
> what's your problem with this?   Why are we still arguing about this?

Because the comment above is the first indication that someone other than
myself will/would be the one to grow this functionality. :-)

Also, what do you do if the aggregate of stored info is more than, say, the
128 bytes by which we're proposing to grow the inode? Answering this question
is a big part of the reason why I think the general solution is hard. :-)

Also, userland would need to have a library to grovel this. I'd assume
we'd then want to just turn the full inode growth over to this storage. 
For one thing, the ACL pointer would go in here, and dump would need to
know how to find it. Plus we have userland code which needs to look at the
on-disk fs image.

> > Though our problem guided the design, it is a general solution. It's just
> > not a solution to the general problem you're proposing. :-) Nor does it
> > claim to be. It claims to be a solution to the problem of overlay fs's
> > needing a small amount of per-inode storage.
> 
> No, it is _not_ a solution to that problem, because it only works if
> you have one such filesystem.   The minute that you have two, it stops
> working.   So it's a solution to your problem, and your problem alone.
> As soon as somebody else tries to use it, all bets are off.

Strictly speaking, only when they use two at once.

> > Because to satisfy the fully general case means sticking a database in the
> > filesystem. That strikes me as HARD. Actually, sticking a database into
> > the inode!
> 
> No, the fully general case means having a buffer with tags and
> lengths, and some code to add and remove such tags, and to fetch them.
> This is _trivial_.  Take a look at nfs_bootdhcp.c for an example
> implementation (admittedly with 8-bit tags, not 24-bit tags).

If you're going to do this, I'd vote for 16-bit magic and 16-bit length.
Also, this design would need to keep everything in a fixed order so that
the parser can figure out what's going on.
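
To make the tag/length idea concrete, here is a minimal sketch of such a
buffer, assuming 16-bit tags, 16-bit lengths, records kept sorted by tag,
and a 128-byte area. All the names (tlv_set, tlv_find, tlv_remove,
TLV_AREA_SIZE), the use of tag 0 as an end marker, and the big-endian
encoding are made up for illustration; none of this is existing FFS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_AREA_SIZE 128 /* assumed: the proposed inode growth */
#define TLV_HDR_SIZE  4   /* 16-bit tag + 16-bit length */
#define TLV_TAG_END   0   /* assumed: tag 0 marks the end of used space */

/* 16-bit big-endian accessors, so the layout does not depend on the host. */
static uint16_t get16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }

/* Bytes in use: walk records until the end tag or the end of the area. */
static size_t tlv_used(const uint8_t *area)
{
    size_t off = 0;
    while (off + TLV_HDR_SIZE <= TLV_AREA_SIZE && get16(area + off) != TLV_TAG_END)
        off += TLV_HDR_SIZE + get16(area + off + 2);
    assert(off <= TLV_AREA_SIZE); /* a larger value means a corrupt length */
    return off;
}

/* Offset of the record with this tag, or -1 if it is not present. */
static int tlv_find(const uint8_t *area, uint16_t tag)
{
    size_t used = tlv_used(area), off = 0;
    while (off < used) {
        uint16_t t = get16(area + off);
        if (t == tag)
            return (int)off;
        if (t > tag) /* sorted order: we have gone past where it would be */
            break;
        off += TLV_HDR_SIZE + get16(area + off + 2);
    }
    return -1;
}

/* Remove a record, closing the gap and zeroing the freed tail. */
static int tlv_remove(uint8_t *area, uint16_t tag)
{
    int off = tlv_find(area, tag);
    if (off < 0)
        return -1;
    size_t used = tlv_used(area);
    size_t rec = TLV_HDR_SIZE + get16(area + off + 2);
    memmove(area + off, area + off + rec, used - off - rec);
    memset(area + used - rec, 0, rec);
    return 0;
}

/* Insert or replace a record in tag order; -1 if it would not fit. */
static int tlv_set(uint8_t *area, uint16_t tag, const void *data, uint16_t len)
{
    if (tag == TLV_TAG_END)
        return -1;
    size_t used = tlv_used(area);
    int old = tlv_find(area, tag);
    size_t oldrec = old < 0 ? 0 : TLV_HDR_SIZE + get16(area + old + 2);
    if (used - oldrec + TLV_HDR_SIZE + len > TLV_AREA_SIZE)
        return -1; /* aggregate would exceed the area: caller must cope */
    if (old >= 0)
        tlv_remove(area, tag);
    used -= oldrec;
    size_t off = 0;
    while (off < used && get16(area + off) < tag)
        off += TLV_HDR_SIZE + get16(area + off + 2);
    memmove(area + off + TLV_HDR_SIZE + len, area + off, used - off);
    put16(area + off, tag);
    put16(area + off + 2, len);
    memcpy(area + off + TLV_HDR_SIZE, data, len);
    return 0;
}
```

Note that this sketch only refuses the write when the aggregate would not
fit; it does not answer what a layer should do when that happens, which is
the hard part I mentioned above.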

> I'm sorry that it strikes you as wrong.   I don't see any way to fix
> that.   Since the resources can be used within the kernel, kernel code
> has to parse them.

Yes, but not necessarily fs code. My idea, which predates your offer to do
this, was that if you needed to stack two or more at once, you stick in a
layer to help out here.

> > Not true. An API which supports variable length storage can just pick up
> > these ops as implicitly referring to 96-byte blobs.
> 
> Right, and people implementing to the API will have to assume that
> their code will not run at NASA/Ames or any other site that deploys
> your application code (the tertiary storage stuff).  This seems like a
> pretty big problem to me.

No, not necessarily. All our layer cares about is if its calls to set &
store data work on the fs it's mounted over. If you (or someone else)
come(s) along and add a broader API, which can support all sorts of stuff,
including the implicitly-96-byte ops, then our layer will be happy. :-)
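
As a rough illustration of what I mean, here is a sketch in which the
fixed-size ops are just the variable-length ops with an implied length of
96. The store itself is a toy in-memory stand-in, and every name here
(var_store, var_set, var_get, opaque_set, opaque_get) is hypothetical, not
an existing or proposed kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OPAQUE_SIZE 96  /* the fixed blob size discussed in this thread */
#define STORE_MAX   128 /* assumed capacity of the toy backing store */

/* Hypothetical variable-length per-inode store (in-memory stand-in). */
struct var_store {
    unsigned char buf[STORE_MAX];
    size_t len;
};

/* Store len bytes; fails if they do not fit. */
static int var_set(struct var_store *s, const void *data, size_t len)
{
    assert(s != NULL && data != NULL);
    if (len > STORE_MAX)
        return -1;
    memcpy(s->buf, data, len);
    s->len = len;
    return 0;
}

/* Fetch exactly len bytes; fails if the stored size differs. */
static int var_get(const struct var_store *s, void *out, size_t len)
{
    assert(s != NULL && out != NULL);
    if (len != s->len)
        return -1;
    memcpy(out, s->buf, len);
    return 0;
}

/* The fixed-size ops: the same calls with the length implied. */
static int opaque_set(struct var_store *s, const unsigned char *blob)
{
    return var_set(s, blob, OPAQUE_SIZE);
}

static int opaque_get(const struct var_store *s, unsigned char *blob)
{
    return var_get(s, blob, OPAQUE_SIZE);
}
```

A layer written against opaque_set/opaque_get never sees whether the
underlying fs stores a bare 96-byte field or something more general.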

Take care,

Bill