tech-kern: Re: Large inodes for ffs

Subject: Re: Large inodes for ffs
To: Chris G. Demetriou <cgd@netbsd.org>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 03/24/1999 16:54:52
On 24 Mar 1999, Chris G. Demetriou wrote:

> Bill Studenmund <wrstuden@nas.nasa.gov> writes:
> > Why are you so angry with vnextops? All it does is permit programs to ask
> > the fs to do things to a file's vnode. And to protect against syscall
> > profusion.
> 
> I'm not "angry" with it.
> 
> (1) the name blows.  what does "vn" mean in a user-land context?
> userland doesn't know about vnodes, user-land knows about files.

We are not emotionally attached to the name. :-)

> (2) i believe that the user-land interface could easily be exported a
> different, better way that doesn't involve a new syscall.
> e.g. fcntl().  fcntl() is "file control".  it's ioctl-like already...

After your other replies, I agree. That said, what I did like about the
ioctl-ish interface was that the argument size was encoded in the command,
and that we had a large command space. We could wire that into fcntl, and
all would be well:

If bit 31 (the MSB of a 32-bit int) is set, the command is a
"vnextops"-ish request; if it is clear, it's a command that fcntl should
take care of itself.

By a "vnextops"-ish command I mean a request that should be handed off
to the filesystem on which the file resides. We can certainly rename it,
but it differs from an ioctl in that it should only ever reach the fs.
Encoding the size (and the VOID, IN, OUT direction bits) in the command is
nice because then the sys_fcntl code can do the copyin/copyout for the
drivers.

> (3) per my previous comments (re: ownership/use of the opaque data),
> i'm (a) not convinced that this functionality should be exported to
> user-land _at all_, and (b) if it should, that it should be exported
> by anything other than layered FSes which are kind enough to allow
> access.

How, then, do we restore a file with opaque data? We don't want a special
restore program for each type (each user of the metadata), do we?

Also, a big component of dmfs, our target application, is that the kernel
does some work, then calls up a userland daemon to come and do something.
That daemon has to be able to get at the file. Either vnextops, or fcntl
broadened as above, looks like the cleanest way to export such an
interface.

> In other words, completely bogus name, and quite probably very bad
> design.

I agree about the name and disagree about what the design is trying to
do. :-)

> > Also, where in the kernel should we enforce access to the opaque data? To
> > ffs, it's just a 96-byte blob in the inode which might or might not be
> > there.
> 
> To be honest, i don't know.  but you should, and your design
> document/proposal should state it, and why.

To be blunt, until we have ACLs in FFS, I don't see how we can enforce
access limits on the opaque data. I think it's perfectly fine for an
overlay fs to place restrictions on opaque data access, but that's up to
the overlay fs to do.

Actually, let me amend that. See below.

> Given that the opaque data obviously does have critical meaning to at
> least some parts of the kernel and/or system (the in-kernel overlay
> file system, or the user-land programs which depend on its meaning),
> it seems fairly obvious that it _has_ to have some type of access
> control.
> 
> Yet you've not even indicated that you've as much as thought about the
> issue!

We certainly have thought about it. Given NetBSD's current security model,
there's no easy solution. The only credentials we have are a uid (real and
effective) and group IDs. The current version doesn't do any checking, but
beyond checking for root, what more can we do?

I mean, if being root isn't enough, what else can we do now?

The only thing that would make sense to me would be something like AFS
tokens. Admittedly I don't know much about them, but the idea is that
there is some sort of per-process credential that is more than a uid/gid.
But we don't have that now, and I really don't want to implement that just
to make our project work. :-)

> > My thought is that storing state for an overlay fs is orthogonal to having
> > acl's. :-)
> 
> That's all well and good, but really, acls were only one example of
> the need to make sure the data is protected and access controlled in a
> reasonable manner.
> 
> the 'opaque data' might be as critical to an overlay file system as it
> would have been in an acl-providing overlay in my example.  Further,
> until the underlying file system provides ACLs, i think your design
> for this overlay-usable opaque data is bogus unless it's possible to
> implement ACLs using the opaque data.  I.e. if i can't implement ACLs
> using it, or other similar functionality (really, all the
> functionality i can see maps to the same sets of data integrity and
> access requirements that ACLs would require), then your design is
> bogus.

We are not claiming to have developed a way to store arbitrary amounts of
data in an inode, and I think you need that ability to do ACLs. Given that
belief, I don't see how you could do ACLs with what we want to do.

If you can get an ACL to fit in the metadata size we have proposed, then
you certainly could get ACLs to work.

As a total aside, one of the things I envision as a requirement for
making these overlay fs's work in a production system is prohibiting
access to the base storage fs. For example, mount the base filesystems on
directories contained in one parent directory, mount the overlays, then
chmod a-rwx the parent directory. I've not mentioned that here because it
is really beyond the design of changing the size of an inode. :-)

Take care,

Bill