Subject: Re: How to resolve the filename(s) for a vnode?
To: None <tech-kern@NetBSD.org>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 10/18/2005 14:02:42

On Mon, Oct 17, 2005 at 10:32:28PM +0200, Gerhard Sittig wrote:
> [ sorry for the lengthy message, but after quite some time of fiddling
>   around and testing several approaches the situation is not easy any
>   longer, I'm stuck and need a little push into the right direction ]
>
> I'm currently porting Dazuko to NetBSD (this project allows for on
> access AV scans or rule set based access control or file access
> monitoring and logging or postprocessing created files or adding a
> trashcan with unerase capability or whatever one could come up with when
> given a transparent hook into filesystem activity and the possibility to
> deny access -- see http://www.dazuko.org for details).  The idea is to
> intercept file accesses and allowing or denying them after having the
> access checked by userland daemons.

Sounds a lot like dmfs, the Data Migration File System, which I worked on
when I was at NASA/Ames. Our main thing was to move the file to tertiary
storage, and then restrict access until the file was fully restored.

> On the other supported platforms (Linux 2.2 to 2.6, optionally with
> RSBAC, FreeBSD 4 and 5, OpenBSD is in progress IIUC) the open(2) and
> friends syscalls get hooked.  This works fine in general but is not too
> nice (it's even a no-no in Linux 2.6 although LSM has its own ugly
> problems) and fails for file accesses which bypass the syscalls (e.g. in
> kernel NFS servers, maybe compat emulations).  The best approach is
> RSBAC which has hooks for all relevant operations with already available
> parameters.
>
> So I thought that a VFS layer would be the way to go -- it's nice and

It is.

> clean and can be applied transparently.  I took a spare ia32 machine and
> put NetBSD 2.0.2 on it.  Writing a device driver was easy with the lkm
> howto I found via your website (the fibo example from bmeurer).  Writing
> a VFS layer went quite smoothly because nullfs is a good starting point.
> I managed to teach mount_dazuko(8) to cope with auto mounters and
> transparent stacking via simple fstab(5) options.  Automatically
> unmounting automatically stacked volumes is not complete yet but can be
> implemented by passing mount options around.  So far everything looks
> good and works in principle.
>
>
> But some issues still prevent my implementation from being really useful
> (or even being used in production). :(
>=20
> Resolving the filename (at least one of them) for a vnode is not
> supported by the OS directly.  FreeBSD has vn_fullpath(), MacOS X has
> vn_getpath(), Linux has __d_path() although it's not public, NetBSD
> seems to lack such a routine.  From the CVS logs of later versions (3
> and 4, I currently use 2.0.2) I could not see that such a thing was
> introduced -- and would be glad to be wrong here.  So I had to roll a
> resolving routine myself.  That's where even more problems are bubbling
> up. :)

The file name will not work right. At all. Don't use it.

I'm sorry that this will mean a change to the project, but something will
have to be done.

What you want are inode numbers and generation numbers. That's what dmfs
used. And we were the follow-on to NAStore 2, a tertiary storage system
put into production at NAS (the meaning of the acronym changes
periodically, but it's the high-performance computing center at
NASA/Ames). The system used two obsolete Convex computers, and had cubic
meters of disks hooked up. The disks were 9 GB, and were RAIDed in a
RAID 5 with something like 30 disks in a stripe (the Convexes had vector
coprocessors, so the XOR on a ~30 element RAID system was no problem).
The upshot was it had 2 TB file systems, so that's about 225 disks per
file system. And I think each computer had 2 file systems.

The point is it was a VERY LARGE and very successful system. It ran for
almost a decade, and was finally turned off as the Convex systems were not
y2k-compliant.

Inode numbers worked great. File names would NOT have worked at all.

Unfortunately inode numbers aren't exposed to userland. However, they come
conveniently bundled (with generation numbers) in file handles. And thanks
to my dmfs work, file handles ARE readily exposable to userland. :-)

In your VFS layer, when you have a node you want to ask about, all you
have to do is call VFS_VPTOFH() and you'll have the file system-specific
part of the file handle. Add the 16 bits of mount-point specific info
NetBSD adds, and you have the file handle.

If userland needs to adjust the file at all (needed for tertiary storage,
not necessarily for what you're doing), fhopen() can let root open the
file without knowing the path name (yes, that's why only root can do it).

> There is no direct link from a vnode to its parent directory.  I do
> understand that this limitation comes from the fact that a vnode (i.e.
> a file) may be looked up with different pathnames from different
> directories because hard links exist.  Multiple filenames may refer to
> the same file on disk.  That's why one cannot say that one of them is
> "the" parent directory.  Correct me if I'm wrong (maybe v_nclist is what
> I'm missing here, but see below, it's not really a public interface and
> has another limitation).

You're right about why there is no "parent directory." As der Mouse noted,
you can also have an unlinked-and-open file, which will have zero parents.
:-)

And then there are all the ugly things that can happen with parent
directories. You can rename a parent directory. If you preserve access
permissions by path name, you now have to scan your whole database and
change entry names. And you have to catch removals and links. I can have
~/foo and ~/bar/baz. I can then remove ~/foo and link ~/bar/baz to ~/foo.
Any old permissions for ~/foo have to go away, and ~/foo should now have
the permissions of ~/bar/baz.
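
To make the remove-and-link case concrete, here is a small userland sketch
in C. It is only an illustration: struct file_key, file_key_get(),
file_key_equal() and relink_demo() are names made up for this example, the
key is the (st_dev, st_ino) pair from stat(2) standing in for a real file
handle (so it carries no generation number and cannot detect inode reuse),
and the two files sit side by side in a scratch directory rather than under
~/ and ~/bar.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical identity key: device plus inode number from stat(2). */
struct file_key {
	dev_t dev;
	ino_t ino;
};

/* Fill *key for path.  Returns 0 on success, -1 if stat(2) fails. */
int
file_key_get(const char *path, struct file_key *key)
{
	struct stat sb;

	assert(path != NULL && key != NULL);
	if (stat(path, &sb) == -1)
		return -1;
	key->dev = sb.st_dev;
	key->ino = sb.st_ino;
	return 0;
}

/* Nonzero when both keys name the same on-disk file. */
int
file_key_equal(const struct file_key *a, const struct file_key *b)
{
	return a->dev == b->dev && a->ino == b->ino;
}

/* Create an empty file.  Returns 0 on success, -1 on failure. */
static int
touch(const char *path)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

	if (fd == -1)
		return -1;
	return close(fd);
}

/*
 * Replay the scenario: create foo and baz, remove foo, link baz to foo.
 * Returns 1 if the name "foo" now refers to baz's file and no longer to
 * the original foo, 0 if not, and -1 if any setup step failed.
 */
int
relink_demo(void)
{
	char dir[] = "/tmp/relinkXXXXXX";
	char foo[64], baz[64];
	struct file_key old_foo, new_foo, baz_key;
	int result = -1;

	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(foo, sizeof(foo), "%s/foo", dir);
	snprintf(baz, sizeof(baz), "%s/baz", dir);

	if (touch(foo) == 0 && touch(baz) == 0 &&
	    file_key_get(foo, &old_foo) == 0 &&
	    unlink(foo) == 0 && link(baz, foo) == 0 &&
	    file_key_get(foo, &new_foo) == 0 &&
	    file_key_get(baz, &baz_key) == 0)
		result = file_key_equal(&new_foo, &baz_key) &&
		    !file_key_equal(&new_foo, &old_foo);

	/* Best-effort cleanup of the scratch directory. */
	unlink(foo);
	unlink(baz);
	rmdir(dir);
	return result;
}
```

A database keyed on the path would still hold the old entry for "foo" after
this sequence, while one keyed on the file's identity needs no update at all.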

File handles are MUCH more convenient as you avoid all the name-space
tracking issues.

What you might do as an intermediate step is turn the file handles (well,
the fsid part of it only; mount point info can change from boot to boot)
into hex strings (sprintf(buffer, "%02x%02x%02x%02x%02x%02x%02x%02x", the
bytes in the fsid)) and pass those to the userland tools for now.
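
As a rough illustration of the hex-string idea, here is a small userland C
sketch. The function name id_to_hex() and its buffer contract are made up
for this example; it just treats the identifier as a run of opaque bytes
and is not taken from dmfs or from the NetBSD tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hex-encode an opaque identifier (for example, the bytes of part of a
 * file handle) as lowercase hex, two characters per byte.
 * out must hold at least 2 * len + 1 bytes.
 * Returns the length of the resulting string, or -1 if out is too small.
 */
int
id_to_hex(const unsigned char *id, size_t len, char *out, size_t outsz)
{
	size_t i;

	assert(id != NULL && out != NULL);
	if (outsz < 2 * len + 1)
		return -1;
	for (i = 0; i < len; i++)
		snprintf(out + 2 * i, 3, "%02x", (unsigned int)id[i]);
	out[2 * len] = '\0';
	return (int)(2 * len);
}
```

Looping over the bytes rather than spelling out eight %02x conversions
means the same helper keeps working if the identifier turns out to be a
different length.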

One other thing dmfs did was it would do nightly scans of the file system
when it was deciding what to move off to tape. At that time, it noted the
path names for files. They were intended as diagnostic aids rather than
canonical IDs, and worked well in that regard.

Take care,

Bill
