Subject: UDF model of EAs and subfiles, accesstimes and permissions rules.
To: Gordon Waidhofer <gww@traakan.com>
From: Reinoud Zandijk <reinoud@netbsd.org>
List: tech-kern
Date: 06/30/2005 16:41:29

Hiya folks, hiya Gordon,

On Wed, Jun 29, 2005 at 04:53:04PM -0700, Gordon Waidhofer wrote:
> I think it possible to find receptive persons at Sun to
> that idea. Still, I'm not sure there would be an actual
> formal change. I'm told there is a fair bit of internal
> debate, still, on what the full semantics are. For example,
> does writing to a subfile update the primary mtime/ctime?
> Should it? Is there one set of access control for all
> streams, or are the per-stream access controls? Nobody knows.
> Expect Sun and Microsoft to both advance answers to
> these kinds of questions. And, yes, I believe there is
> ample room for BSD/Linux communities to participate in the
> formation. I believe a credible effort would find warm welcome.

The UDF standard ("Universal Disk Format", based on the June 1997 ECMA-167
3rd revision) does specify such matters as the updating of access times and
the access permissions on subfiles. I'll come to that later.

Apart from its on-disc structures, I think it would be wise to use, or at
least study, its semantics and definitions as a starting reference, for it
is at least a coherent specification that has been `out there in the wild'
for quite some time.

Interestingly, it is pretty much like the proposed Linux/Solaris mix we came
to. I'll try to give a summary of UDF attribute/subfile support and handling
to give an indication.

In UDF there are 4 separate spaces associated with a file entry (inode):
   1) main file data stream i.e. the `normal' file specified by extents on 
      disc.

   2) space internal to the file entry. Due to sector-size blocking, space
      is left at the end of the (e)fe, the (extended) file entry. This
      space can be used for in-node attribute storage and/or in-node file
      storage for small files/directories.

   3) attribute file entry associated with the file. In this file, as in
      the internal space for attributes in space 2, the attributes are
      concatenated into one stream with free space markers etc. to allow
      for growing attributes and for creation/deletion.

   4) `streamdir': a directory file entry specifying a subfiles directory. 
      Selected files solely used for file system implementation can be made 
      `invisible' to userland.

All four spaces can be present at the same time. For performance reasons
all small (system) attributes like extra timestamps, alternate file access
permission (not allowed for UDF), MacFinder stuff, Mac volumeinfo, device
minor/major info, DVD CGMS info, OS/400 dirinfo and OS/2 extended attributes
preferably live in space 2, though they can be moved to space 3. Space 4 is
meant for all named associated (sub)file-based storage, but will cost at
least an inode for each attribute, whereas space 3 will cost just one.
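
To make the inode cost of spaces 3 and 4 concrete, here is a toy model in
Python. It is only a sketch of the layout described above, not of the
on-disc format: the names FileEntry and inodes_used are made up for
illustration, and it assumes the stream directory itself also occupies one
file entry.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FileEntry:
    """Toy model of a UDF (extended) file entry and its four spaces."""
    data: bytes = b""                                    # space 1: main data
    internal_attrs: Dict[str, bytes] = field(default_factory=dict)  # space 2
    attr_file: Optional[Dict[str, bytes]] = None         # space 3: attribute file
    streamdir: Optional[Dict[str, "FileEntry"]] = None   # space 4: subfiles


def inodes_used(fe: FileEntry) -> int:
    """Count the file entries (inodes) consumed by fe and what hangs off it."""
    count = 1                  # the entry itself; space 2 lives inside it
    if fe.attr_file is not None:
        count += 1             # space 3: one entry, however many attributes
    if fe.streamdir is not None:
        count += 1             # assumption: the stream directory is an entry too
        count += sum(inodes_used(sub) for sub in fe.streamdir.values())
    return count
```

In this model, three attributes kept in space 3 use 2 file entries in
total, while the same three kept as subfiles in space 4 use 5.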

Spaces 2 and 3 allow for system, `implementation use' and `application
use' extended attributes. Each of these classes except the system one has
named attributes, BUT their name length is fixed to up to, say, 22 chars.
The system attributes are numerical and store fully specified (small)
attributes, as summed up above, that can be given fixed names. The main
problem with space 3 is that it can get a bit fragmented and thus might
need (auto)compacting at times, though for the rare attributes that grow
in size extra logical space can be reserved.
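
The fragmentation issue can be illustrated with a small Python sketch. It
is a toy model under loud assumptions: the slot list, the first-fit reuse
policy and the names AttrStream, NAME_LEN and reserve are all invented
here, and the real on-disc attribute layout is different.

```python
NAME_LEN = 22  # assumed width of the fixed name field ("say 22 chars")


class AttrStream:
    """Toy attribute stream: slots laid end to end; name None marks free space."""

    def __init__(self):
        self.slots = []  # each slot is [name or None, capacity, value]

    def get(self, name):
        for slot_name, _cap, value in self.slots:
            if slot_name == name:
                return value
        raise KeyError(name)

    def delete(self, name):
        for slot in self.slots:
            if slot[0] == name:
                slot[0], slot[2] = None, b""  # leave a free-space marker behind
                return
        raise KeyError(name)

    def set(self, name, value, reserve=0):
        if len(name.encode("utf-8")) > NAME_LEN:
            raise ValueError("attribute name too long")
        for slot in self.slots:
            if slot[0] == name and len(value) <= slot[1]:
                slot[2] = value  # still fits: rewrite in place
                return
        if any(slot[0] == name for slot in self.slots):
            self.delete(name)  # outgrown its slot: free it and relocate
        for slot in self.slots:
            if slot[0] is None and slot[1] >= len(value):
                slot[0], slot[2] = name, value  # first fit in a free slot
                return
        self.slots.append([name, len(value) + reserve, value])

    def wasted(self):
        """Bytes currently sitting in free-space markers."""
        return sum(cap for name, cap, _value in self.slots if name is None)

    def compact(self):
        """Drop all free-space markers, as an (auto)compacting pass would."""
        self.slots = [slot for slot in self.slots if slot[0] is not None]
```

An attribute that outgrows its slot leaves a hole behind until compact()
runs, whereas one created with enough reserve can grow in place.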

Coming back to subfile permissions, UDF states:

"The UID, GID, and permissions fields of the main File Entry shall apply to 
all Named Streams associated with the main stream. At the time of creation 
of a Named Stream the values of the UID, GID and permissions fields of the 
main File Entry should be used as the default values for the corresponding 
fields of the Named Stream. Implementations are not required to maintain or 
check these fields in a Named Stream."
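
Read literally, that rule might be sketched as follows in Python. This is
only an illustration: Entry, create_stream and may_access are hypothetical
names, the check uses Unix-style mode bits rather than UDF's own
permission field layout, and superuser handling is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    uid: int
    gid: int
    mode: int                      # Unix-style rwxrwxrwx bits (simplification)
    streams: dict = field(default_factory=dict)


def create_stream(main: Entry, name: str) -> Entry:
    """New named stream: its fields default to those of the main entry."""
    sub = Entry(main.uid, main.gid, main.mode)
    main.streams[name] = sub
    return sub


def may_access(main: Entry, name: str, uid: int, gid: int, want: int) -> bool:
    """want is a mask of 4 (read), 2 (write), 1 (execute).

    Only the main entry is consulted; the stream's own uid/gid/mode
    fields are never checked.
    """
    if name not in main.streams:
        raise KeyError(name)
    if uid == main.uid:
        bits = (main.mode >> 6) & 7
    elif gid == main.gid:
        bits = (main.mode >> 3) & 7
    else:
        bits = main.mode & 7
    return (bits & want) == want
```

Even if the fields stored in a stream drift away from the main entry, the
outcome of the check does not change in this model.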

On modification and access times, UDF states:

"The modification time field of the main Extended File Entry should be 
updated whenever any associated named stream is modified. The Access Time 
field of the main Extended File Entry should be updated whenever any 
associated named stream is accessed. The SETUID and SETGID bits of the ICB 
Tag flags field in the main Extended File Entry should be cleared whenever 
any associated named stream is modified."
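
As a minimal Python sketch of those three rules (hypothetical names
MainEntry, stream_accessed and stream_modified; the SETUID/SETGID bits are
modelled as Unix mode bits here, whereas UDF keeps them in the ICB tag
flags):

```python
import stat
from dataclasses import dataclass


@dataclass
class MainEntry:
    mode: int
    atime: float = 0.0
    mtime: float = 0.0


def stream_accessed(main: MainEntry, now: float) -> None:
    """Any access to a named stream touches the main entry's access time."""
    main.atime = now


def stream_modified(main: MainEntry, now: float) -> None:
    """Any change to a named stream touches the main entry's modification
    time and clears SETUID/SETGID on the main entry."""
    main.mtime = now
    main.mode &= ~(stat.S_ISUID | stat.S_ISGID)
```

Reading a stream leaves the mode alone; writing one turns a mode of 06755
on the main entry into 0755.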

That latter part about the SETUID/SETGID bits is not clear to me, but it's
prolly an extra safety net? (The ICB tag is the generic UDF `inode' header.)

Furthermore, UDF preferably stores NT ACLs and UNIX ACLs (most likely
referring to POSIX ACLs) in space 4 under fixed names for version 2.60+.
This is not optimal IMHO, since it will thus cost at least 2 extra disc
blocks for each annotated file (!) They don't have to be exported as
inodes, but still. Are NT/POSIX ACLs that lengthy at times that it's worth
such spillage? The big bummer is that the (lengthy?) ACLs cannot be shared
between files, since hardlinking in subfiles is NOT allowed.
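
To put a number on that spillage, a back-of-the-envelope calculation in
Python. The 2048-byte block size is an assumption (the usual sector size
on optical media), and acl_overhead_bytes is just an illustrative name.

```python
def acl_overhead_bytes(annotated_files: int,
                       block_size: int = 2048,
                       blocks_per_file: int = 2) -> int:
    """Lower bound on the extra space spent on ACLs stored in space 4."""
    return annotated_files * blocks_per_file * block_size


# 10,000 annotated files at 2 blocks of 2048 bytes each
print(acl_overhead_bytes(10_000))  # 40960000 bytes, roughly 39 MiB
```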

If ACLs are short they will prolly also be allowed in one of the other
spaces, but I haven't explicitly seen a reference to it yet. I'll ask
around.

> My vote would be to prefer subfileopen() and openat(....O_SUBFILES).
> Subfileopen() is the important one. Keep X_ATTR and attropen()
> as deprecated compatibility. But that's just one vote.

ditto!

> And there may be some rumblings out there. The NFSv4 Linux/BSD folks
> would be good contacts (U Mich). They also did a lot of work on
> NFSv4 on the BSDs. I disuaded, or at least delayed, the U Mich
> folks coupling NFSv4 OPENATTR to Linux/BSD extended attributes.
> The inadequate capacity between those and what Solaris/NetApp/NT
> provide for NFSv4 OPENATTR was pretty convincing. Similarly
> the Samba team might of documents on their requirements and
> opinion of the Solaris model.

The only advantage of seeing both `file attributes' and `subfiles' as a
more generic `attribute' is that one doesn't have to care what
implementation/model is behind them. This approach however DOES have a
fundamental problem: it requires an explicit and strict content definition
for some named `attributes' and no definition for the rest!

Could it be that for NFSv4 it was decided that it was too early to be
strict, and that everything was thus allowed to be transferred, putting the
restrictions on content solely on the peers? ("left open for further
standardisation")

> > Do you think we could make a standardised form acceptable? I could try to
> > work out a full specification if only for *BSD and Linux...
> 
> A solid stake-in-the-ground would go a long way. Any proposal
> necessarily needs at bit of history including clarification on
> the difference between attributes and subfiles, the activities
> of Solaris, NetApp, and NT, and justification for interface
> names. Some statement of requirements would be good, including
> UDF/NFSv4/HFS+ issues. I would include a plea for convergence.
> 
> After that, a pilot implementation would be good. Demonstrate
> the adequecy of the interfaces for UDF, HFS+, and NFSv4.
> I wouldn't count on that being adopted overnight. But it
> would at least give folks something concrete to examine
> and discuss.

Indeed. Maybe NetBSD can fulfill the role of that reference implementation?
AFAIK we as NetBSD are not bound by `legacy' yet, so we would get the best
of both worlds.

> > BTW, i see the specs on the Sun website are from Solaris 9 (aka
> > SunOS 5.9)
> > from 2001(!) ... do they still evolve Solaris ? Are there newer specs?
> 
> As far as I know, Solaris 10 made no changes in this area.
> NFSv4 is being deployed by Sun now with the usual caveats.
> Expect changes.

Good... I ran into trouble lately trying to compile my UDFclient on Solaris
for the first time... it didn't have bswap(), OK, but most of all I ran
into the missing d_fileno and d_type in struct dirent; prolly a System V
oddity.

Thanks for bearing with me at such length ;)
Reinoud
