Subject: Re: Y2038, was as long as we're hitting FFS...
To: Ted Lemon <mellon@isc.org>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 03/25/1999 16:38:50

On Thu, 25 Mar 1999, Ted Lemon wrote:

> Bill, please stop for a moment and tell yourself "this argument isn't
> about what *I* want.   It's about what NetBSD users in general want."
> This statement is true about anything one might consider putting into
> NetBSD.   I know it's been used on me more than once!   :')

But on the flip side we also seem to have moved towards a policy where if
you're going to raise the bar on what has to be done, you have to be
willing to deliver on what you've asked for. :-)

> > And that it was not designed for what a lot of people are wanting it to do
> > - be a kitchen sink repository.
> 
> Right.   It was designed to solve exactly one problem: yours.   That's
> the problem with the design, in a nutshell.   Your problem is a
> reasonable and valid problem - don't get me wrong.   But you've added
> a general mechanism that isn't actually general, and now you're
> getting pushback.

Though our problem guided the design, it is a general solution. It's just
not a solution to the general problem you're proposing. :-) Nor does it
claim to be. It claims to be a solution to the problem of overlay fs's
needing a small amount of per-inode storage.

> I don't see how this would be a problem.  Because your application is
> as specialized as it is, it should be easy to arrange for your data to
> come first, and just do a sanity check to make sure it did.  The
> performance impact of supporting more than one data hunk in the opaque
> data buffer should be negligible for your application if you do it
> this way.   You get what you want, and you don't preclude other
> applications of the API.
> 
> > I think it's fine to extend the interface a bit. Right now we have test,
> > get, set, and clear operations on the metadata. It seems easy to me to
> > extend them to take a magic number value, with (0) being the catch-all. So
> > then you can deal with different types of data, and even add an overlay fs
> > to store multiple types at once.
> 
> Why not just do it right, so that we don't have to have people mount
> an overlay filesystem to get the correct (that is, general) behaviour
> later?

Because to satisfy the fully general case means sticking a database in the
filesystem. That strikes me as HARD. Actually, sticking a database into
the inode!

That's why.

Also, it strikes me as wrong. The database management should sit above the
fs, not in it. Even Apple did it that way - there's a resource fork which
the resource manager turns into all the fun little resources MacOS
programmers have grown to love. ;-)

And there's the fact that I don't see a problem which needs this
kitchen-sink solution. :-)

> AFAIK, the F_GETLK/F_SETLK call is in POSIX, and that's functionally
> the same as what you're talking about.

Cool!

> Once features have gone in, they can't come out.   Your proposed
> change will not be compatible with the general solution, so if we
> are ever going to implement the general solution, we have to do it
> now.   Having a flags field won't help.

Yes, it will. Because there's nothing saying that the presence of other
flags can't preclude the use of the opaque data we're proposing. :-) 
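
To make the flags argument concrete, here is a minimal sketch. Everything in
it is an assumption made for illustration: the struct layout, the names
(inode_md, MD_F_VALID, MD_F_EXTENDED, md_test/md_set/md_get) and the flag
values are hypothetical, and only the 96-byte size comes from this thread.
The idea shown is that a later, more general format could claim the space by
setting its own flag, at which point the fixed-blob ops simply refuse to
touch it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MD_LEN        96      /* size of the opaque data discussed here */
#define MD_F_VALID    0x01u   /* hypothetical: opaque blob is present */
#define MD_F_EXTENDED 0x02u   /* hypothetical: space owned by a later format */

/* Hypothetical per-inode area: a flags word plus the opaque bytes. */
struct inode_md {
	uint32_t      flags;
	unsigned char opaque[MD_LEN];
};

/* Returns 1 if a fixed-format blob is present and usable, else 0. */
int
md_test(const struct inode_md *md)
{
	assert(md != NULL);
	return (md->flags & MD_F_VALID) && !(md->flags & MD_F_EXTENDED);
}

/* Stores a blob unless another format has claimed the space. */
int
md_set(struct inode_md *md, const unsigned char *buf)
{
	assert(md != NULL && buf != NULL);
	if (md->flags & MD_F_EXTENDED)
		return -1;
	memcpy(md->opaque, buf, MD_LEN);
	md->flags |= MD_F_VALID;
	return 0;
}

/* Copies the blob out; fails if it is absent or the space is claimed. */
int
md_get(const struct inode_md *md, unsigned char *buf)
{
	assert(buf != NULL);
	if (!md_test(md))
		return -1;
	memcpy(buf, md->opaque, MD_LEN);
	return 0;
}
```

With this layout an old consumer never misreads data written in a newer
format; it just sees "no blob here".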

> > About (1), most of the proposals have solved a different problem than the
> > one we have in mind.
> 
> Right, and that's what's wrong with your proposal.   Making a major
> change to the kernel that purports to be generic but in fact is not is
> a mistake.   If you don't want it to be generic, you should simply
> reserve the space you need for your application and duke it out that
> way.   If you can't justify doing it that way, you also can't justify
> doing what you're currently doing.

Since our solution is an overlay fs, we can't reach into the ffs inode.
ffs (or lfs) has to have a way to give us this info. To do otherwise is an
even larger hack.

> The problem is not the API.   The problem is the underlying data
> structure.   If we do the underlying data structure wrong, a future
> change to the API will not cure the problem.

Not true. An API which supports variable length storage can just pick up
these ops as implicitly referring to 96-byte blobs.
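
As a rough sketch of what that could look like: the variable-length store
below (vl_store, vl_set, vl_get, vl_clear), its limits, and the blob_*
wrapper names are all hypothetical, invented for illustration and not taken
from any existing NetBSD interface. Using magic value 0 for the fixed blob
is also an assumption, borrowed from the "catch-all" suggestion quoted
earlier. The point is only that the test, get, set, and clear ops can be
thin wrappers that pin the length to 96 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOB_LEN   96    /* fixed size used by the existing ops */
#define BLOB_MAGIC 0u    /* assumption: catch-all magic names the blob */
#define MAX_ITEMS  4     /* hypothetical capacity of the general store */
#define MAX_LEN    256   /* hypothetical per-item size limit */

/* Hypothetical variable-length per-inode store, keyed by magic number. */
struct vl_item {
	int           used;
	uint32_t      magic;
	size_t        len;
	unsigned char data[MAX_LEN];
};
struct vl_store {
	struct vl_item items[MAX_ITEMS];
};

static struct vl_item *
vl_find(struct vl_store *s, uint32_t magic)
{
	for (size_t i = 0; i < MAX_ITEMS; i++)
		if (s->items[i].used && s->items[i].magic == magic)
			return &s->items[i];
	return NULL;
}

int
vl_set(struct vl_store *s, uint32_t magic, const void *buf, size_t len)
{
	struct vl_item *it = vl_find(s, magic);

	if (len > MAX_LEN)
		return -1;
	for (size_t i = 0; it == NULL && i < MAX_ITEMS; i++)
		if (!s->items[i].used)
			it = &s->items[i];   /* first free slot */
	if (it == NULL)
		return -1;                   /* store is full */
	it->used = 1;
	it->magic = magic;
	it->len = len;
	memcpy(it->data, buf, len);
	return 0;
}

int
vl_get(struct vl_store *s, uint32_t magic, void *buf, size_t buflen,
    size_t *lenp)
{
	struct vl_item *it = vl_find(s, magic);

	if (it == NULL || it->len > buflen)
		return -1;
	memcpy(buf, it->data, it->len);
	*lenp = it->len;
	return 0;
}

int
vl_clear(struct vl_store *s, uint32_t magic)
{
	struct vl_item *it = vl_find(s, magic);

	if (it == NULL)
		return -1;
	it->used = 0;
	return 0;
}

/* The fixed-size ops, expressed in terms of the general store. */
int
blob_test(struct vl_store *s)
{
	return vl_find(s, BLOB_MAGIC) != NULL;
}

int
blob_set(struct vl_store *s, const unsigned char *buf)
{
	assert(buf != NULL);
	return vl_set(s, BLOB_MAGIC, buf, BLOB_LEN);
}

int
blob_get(struct vl_store *s, unsigned char *buf)
{
	size_t len;

	if (vl_get(s, BLOB_MAGIC, buf, BLOB_LEN, &len) != 0)
		return -1;
	return len == BLOB_LEN ? 0 : -1;
}

int
blob_clear(struct vl_store *s)
{
	return vl_clear(s, BLOB_MAGIC);
}
```

Callers of the blob_* ops never see the length parameter, so code written
against the fixed-size interface would keep working unchanged.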

Take care,

Bill