tech-kern: Re: Support for ACLs

Subject: Re: Support for ACLs
To: Robert Elz <kre@munnari.OZ.AU>
From: Lord Isildur <mrfusion@umbar.vaxpower.org>
List: tech-kern
Date: 03/10/2001 15:00:23
what? 

On Fri, 9 Mar 2001, Robert Elz wrote:
> I am going to ignore all the noise from the past day on this issue,
> and concentrate only on the technical issues.

um, that 'noise' was the sound of opinions that differ (evidently) from 
your own. We're here to discuss potential important changes to the 
kernel. I would very uch expect a _lot_ of different opinions, and i wouldnt
consider _any_ of them to be 'noise'. the people who take the time to post
here and read the list do not do so for lack of anything else to do. They
read and contribute their ideas here because they're rather dedicated to 
the NetBSD project and the system itself. 

> The first is to notice that there are two separate problems to solve for
> ACLs - first there's some data that has to be associated with a file
> (either just with the files that need them, or with all files).
> 
> Then there's the actual ACL implementation itself, assuming that it has
> some place to put its data (get it most often).

Both of these issues are certainly important. However, i think it is only
in the minor details that there will be such concern. Overall, the
information to store is obvious: lists of authorizable entity/access rights
pairs. Such a list would have to accompany any inode which has an ACL. 
where to store it is also important, but i think some other issues
also should be first decided upon. 
namely, where on earth to do this? some folks have wanted it in the generic
filesystem code (something i oppose). Some folks want it in the ffs code
(or some other filesystem) which i would , provided it were done right,
and ive already mentioned my opinion on whati think are some constraints on
what is 'right', be better. Others have also suggested using portalfs and
mentioned an existing implementation that works that way. I also consider
this to be a very interesting route, since it would allow ACLs on _anything_
and not have to modify either the generic code nor the ffs code. Theres a
performance hit there, but we shoudl do some testing and probably a good
bit of hacking (since there exists an implementation already) and maybe
it wouldnt be so bad to use after all.

> The messages about file systems used by kernels with and without ACLs,
> and ones like Todd's above concern the first issue.   That's more
> interesting to me anyway...

and to a lot of people. Personally i dont want ACLs and havent a use for 
them. However if theyre going in, i'd like to ensure thatit is done right,
for my idea of 'right' :-)

> I must admit I had never really considered storing ACLs in a file like
> the quota file, the format doesn't really make that all that appealing,
> though it does have some obvious advantages.  The problem would seem to
> be that ACLs are not of any fixed length (usually) and can become
> quite large.   And the indexing method into a big file would seen to have
> to be by inode number (there isn't really anything else suitable), which
> would make a simple implementation cause it to be quite difficult to
> share ACLs between many files (which is usually what you want to do,
> so you can update it once and everything related gets updated).
> 
> So, the file would probably need to be a file of pointers - which solves
> both the variable length problem (indexing directly into an array of variable
> length data structs is a challenging operation), and the sharing problem,
> as multiple pointers could point to the same data.   Now, to keep it in
> one file, you also now need storage allocation mechanisms, garbage
> collection of some form, access control, .... basically you have invented
> a new filesystem in a file.   It can certainly be done, but it seems
> like a lot of work.

Yes, this is a good reason to not necessarily do it that way. However,
it is not as horrible as you think. A single file living in the filesystem
itself is a pretty clean place to store ACL info without changing the
methods of allocation of blocks and such, and this is a Very Good Thing
imho. It means less code to modify, less kludging to do to make a non-ACL
ffs work with an ACL'ed ffs, and it allows some other nice things for
the purposes of auditing and rapid modification, etc, since the single
file is all in one place, in a (hopefully) pretty well-thought-out format,
and since conventional database indexing types of optimization algorithms 
can be used to make it reasonably fast. 
Lets not toss this idea out. i'm actually favoring it as the more elegant
approach right now. 

> Direct inode attachment is the easy way.   The problem with this is that
> it solves exactly one problem, and doesn't help at all with any other
> problems - start doing direct inode attachment for all the data that it
> might be nice to be able to associate with a file, and you soon have
> jumbo inodes, and rather a messy construction.

this is a really gross thing, totally breaks the filesystem code and 
also breaks any compatibility between non-acl and acl implementations. 

> Jason mentioned work that has been done with portals to implement ACLs.

yes
> 
> As a method for working on the first problem (and there are lots of
> issues there that need solutions) that sounds great.  It would certainly
> be the quick way to get ACLs to exist, and be able to test real running
> code with them.

my above comments agree
> 
> As a long term solution, I don't think it really works all that well.
> The problem is that the layered filesystem needs somewhere to store its
> data (the portal needs somewhere to get the ACL data).  In what follows
> I will use ACL as the example - but the same applies to other kinds of
> associated data, like the Dreamcast "Visual Memory" that Marcus mentioned.
> 
> For prototyping that's not a problem, there are lots of things that can
> be done that will work well enough.   But for a real use, the solution
> needs to make sure that a file and its ACL (as well as its inode data) all
> get backed up together, and can all be restored together (it is no good
> having the file here on this tape/CD if its ACL is over there somewhere
> else).  You could put the data all in a big file, but that's then just the
> same problem as the "quota file" issue above - the only difference is
> exactly where in the kernel the ACL checks are done (and that's part of
> the problem I prefer to leave to someone else...) or you could set up a
> whole parallel tree of files, that contain the ACLs, and have the ACL
> layer keep track or where you are in the real tree and where you are n the
> ACL tree.

this parallel tree of files sounds like a royal kludge. it's ugly, breaks
consistency of the use of the filesystem, and also is a disgusting waste of
inodes! 

> 
> That's almost where I started when I decided that "associated files"
> was probably the most general, and reasonable, implementation method
> (though it certainly only works on filesystems whose definitions we
> can change - it might be possible to layer it into the CD filesystems,
> the same way Joliett and Rockridge are done, but attempting to stick
> it into an NTFS, or a MSDOSFS, or a ... would be close to impossible).

if this were the case then it woudl be so much easier to have an acl file
tossed into that cdfs or msdosfs or whatnot, instead of a whole parallel 
tree of files. yuck. 

> That is, the "parallel tree of files" is pretty easy to implement,
> and uses the existing filesystem code to do just about everything,
> which is great, but it means lots of additional overhead to keep
> track of all the kinds of associated data that might exist, there
> isn't just one possible thing (ACLs) but lots - that means a lot of
> trees of data to keep.
> 
> Also, if you're keeping things in sync in the trees by name (which is
> the obvious, and perhaps almost only reasonable way to do it), then
> you get two immediate problems - first, applying an ACL to a
> directory gets hard - you have to invent a new reserved name in the
> directory to contain its ACL, which means there's a reserved name for
> a file that can't be created in the ordinary tree, which is bad.
> And second, it gets to be quite hard to apply ACLs to the ACL files.
> That also isn't great.

this is why you dont want this crap in a parallel tree of files. 

> 
> All of that is what led me to think that linking these additional
> files to the files that need them is the better way to associate
> additional data with files.   The overhead to find the data is then
> quite small, there's no problem at all with file naming, it is easy to
> provide associated data to an associated data file, for example, an
> ACL file can easily have an ACL saying who is allowed to change it,
> what's more, the ACL that the ACL file has can easily be itself...

this breaks the consistency of the linking mechanism. Let's not do this. 

> 
> The implementation of ACLs could still be done in a layered filesystem,
> that implementation could even use different methods for finding
> (and altering) its data, depending on whether or not the associated
> data implementation exists on the filesystem in question or not
> (I suspect that without it, the implementation will be more difficult,
> and more limited).
> 
> That's why in my previous message I said that I had been considering an
> implementation of a thing that might be useful for ACLs, it wouldn't
> be limited to that, and certainly isn't primarily concerned with ACLs.
> 
> To give a few more details (much of this is still just air work at
> the minute, I'm sure there are holes).
> 
> The pointer to on-disc associated data would be 64 bits.  That 
> contains 2 two byte fields, and one 4 byte field.
> 
> Most of the time, the 4 byte field is an inode number.   The two
> byte fields give the type of data stored at that inode (as it
> applies to the file containing the reference)  (Why there are two
> of those I might get to later if I don't run out of time).
> 
> The inode number causes the link count of the referenced inode to
> increase.
> 
> If a file contains just one of these associated files, then the
> pointer would go in the spare 64 bits we have left in FFS inodes.
> If the file contains more than one, then it would grow an
> indirect associated file pointer page (one disk frag/block)
> which would contain the 8 byte pointers (part of the reason that
> there are 2 2 byte fields is to distinguish these cases).  I suspect
> that 1024 associated files would be enough for almost anyone (an 8K
> block, with 8 byte pointers) - even 512 if the blocksise is just 4K
> (of course, if fewer are needed - 99% of the files that have associated
> data) then fragments can be used (so just 512 bytes overhead in those
> comparatively rare files that have more than 1 kind of associated data).
> But we could have associated data files (a whole file, whose purpose is just
> to contain pointers to other associated data files as its data) if
> we want to allow unlmited numbers of associated data files.   They're
> use in the obvious way (and would usually be the sole associated data
> for the file to which they're attached).

associated files? yuck! 
what on earth is that supposed to mean? it's not links, so please dont
go using the linking mechanism for this. this is what a _directory_ is, 
and we already _have_ directories, thank you very much, and they are much
more flexible than this. This stuff is reeking of the Windows mentality
now. 

> 
> Then there'd be a sys call for taking a file and attaching it to
> another file as associated data of type X.   That would be root
> only.  Setuid programs would exist for different kinds of X, would
> validate the file being attached, make sure it is of the correct
> type to be an X (an ACL would probably be a new inode type completely
> which only privileged programs could write in, to make sure its
> data format stayed correct for the kernel to use, or it could just
> be a file owned by root and 600 in the "traditional" permissions)
> and then perform the sys call.

file TYPES?? this is even MORE stinking of the windows mentality! sorry, 
we dont do that here! this is UNIX! 
yes, drectories are special files. Thats a very special case. We don't 
need more of this. 

> 
> Then there'd be another sys call, "give me the associated data
> of type X associated with this file handle (descriptor)", which would
> do an open on the associated file (after checking its permissions
> etc, all the usual stuff) and return a file descriptor.   And there'd
> be a "link the associated file of type X of file Y to name Z" so
> if all names for the associated file have been removed, it can be
> recovered into the filesystem to be operated upon.

yuck. 

> 
> Now, questions of compatability with "non-ACL" kernels were raised.
> That's a very valid question.   First, a non-ACL kernel (a kernel
> that has no idea about associated data) isn't going to care in the
> slightest, the filesystem will still look like one of its old
> filesystems, nothing different at all.   It would have some inodes
> marked allocated that it will never use, and some blocks missing from
> the freelist that it will never be able to find, but that's harmless.

um, no, that is pretty important. 

> 
> Running an old FSCK on such a filesystem is a different question
> though - and one simple answer is "don't do that".  It would be
> simple enough to make a new fsck that could run on any old system
> and would understand the new formats.

I still maintain as a very important condition _zero_ modification of
existing code and systems. 
> 
> But if it were done, the effects would be that any associated data
> files that were linked into the filesystem would have their link
> counts "corrected" to count only the filesystem links.  Running
> a new fsck later would "uncorrect" that harmlessly.

this is kludgy.

> 
> Any associated data files that had no filesystem links left would
> be recovered into lost+found.   Of itself that's harmless, the
> new fsck would then notice that the link counts of those files were
> all wrong, and correct them, just as in the previous case (except
> that the files would have been linked back into the filesystem
> as well, in lost+found).
> 
> The harder problem is the blocks that are used as the indirect blocks
> for the pointers - an old fsck would simply see them as missing
> unallocated blocks, and would reclaim them into the free block bit maps
> and then potentially the filesystem would reallocate them).
> 
> That would mean that a new fsck of the filesys would find lots of
> duplicate blocks between indirect pointer blocks, and data blocks
> in other files (or between indirect pointer blocks, and free blocks).
> That would not be nice - it would at least have to zap all the
> indirect pointer blocks (even if they're still listed as free there's
> no way to know that they haven't been used and freed again since last
> containing pointer data - though a checksum might perhaps avoid some
> of that).  It might also have to zap all the files which used those
> blocks, because knowing the real cause of a duplicate block is never
> easy.
> 
> One solution to this might be to have all the pointer blocks (in
> reality, on most filesystems, there wouldn't be many...) linked into
> a magic file, which exists just so the blocks don't look to be free
> to an old fsck.   Whether that's better than "don't do that" I'm
> not sure...   It raises other problems of its own (not the least being
> that one file can only have one (group of) frgments, at its end, and only
> when it is a small file - so the extra data code would probably need its
> own fragmentation handling code (always allocate blocks, and then hand
> out fragments as pointers as needed).   Messy.

yes, extremely. 

> 
> That is it for associated data in general I think.

this whole mechanism is unneccesary if you consider the ACL information to
merely be meta-data, not store it in the filesystem itself, and merely
use a flag on an inode to indicate the _presence_ of ACL information
somewhere else for it. The inode itself is the key for the search for
the ACL information and then it's a conventional database thats stored
soemwhere thatis _not_ the filesystem. You use one single bit in the
inode to flag the presence of ACLs, then you can have filesystems where only
_some_ file have ACLs and not all, where filesystems that dont understand
ACLs wont give a hoot if the bit is set and will create new files which 
simply dont have ACLs, where filesystem code that does dig the ACLs will
not mind that sme files might not have ACLs, where the mechanism for
storing ACLs can grow and change independently of the filesystem, where
authorization mechanisms can also be changed independently (hell, someone
might then choose to store the ACL information off on a server somewhere,
too :-)) , and the filesystem code itself need not be modified to care. 
(well, that is, to merely use such a filesystem. To restrict access based on
ACLs woudl require mods of course but like i mentioned thouse would not 
interefere with unmodified fs code using the same filesystems) 

plus fsck woudlnt need to change. 
It might still be necessary to change something like dump and restore,
depending on whether theypreserve the unused bits in the inode information
or not. They should be preserved anyhow. 

> Keeping backwards compatability this way (I mean, so you can take
> new stuff and use it with old systems) is always hard, and there
> always gets to be a point where it simply isn't worth doing any more.

I dont think we're anywhere near this point. 

> 
> In this case, assuming any of this was to actually happen, the problem
> could be lessened by creating the new fsck in advance, and shipping it
> with systems on which none of the data will ever actually exist, so
> it is there and available, on systems that don't need it - but so that
> if later, a new filesystem is brought in contact with that system,
> then its fsck wouldn't mangle the world.
> 
> For example, such a new fsck could be shipped with 1.6, 1.5.2, 1.4.4
> (assuming that those releases ever get made, I assume there will never
> be a 1.3.4 now...) and the associated data filesystem only shipped
> with 1.7.

He's talking about 1.6 and 1.7 and 1.5 is barely out!

One of the things that i think has been really eroding NetBSD is this fast
release cycle. We feel compelled somehow to have a new version number so
quickly. A lot of the smaller ports dont have the manpower to keep up. I've
seen hardware support and bug fixes be forced to take the back seat to 
compliance with the new version, and leave older version horribly broken
in really esoteric ways which should not be the case. We should relax
the time between versions. Also, this very sentence , mentioning 1.6 and 1.7 
like this, belies an attitude which i find really scary, of ignoring
what is going on right now and just obsessing over ever-increasing version
numbers. Ive said it before and here it is again: increasing version
numbers and longer lists of features do _not_ necessarily equate to 
progress! 

> On the other hand "not the BSD way" and noise like that I will simply
> ignore (it isn't as if that comment is even correct).

Seeing how this is Net**BSD**  I think it is _extremely_ relevant. 
If you want linux, you know where to find it. not here!! 

happy hacking,
Isildur