tech-kern: How to resolve the filename(s) for a vnode?

Subject: How to resolve the filename(s) for a vnode?
To: None <tech-kern@NetBSD.org>
From: Gerhard Sittig <Gerhard.Sittig@gmx.net>
List: tech-kern
Date: 10/17/2005 22:32:28
[ sorry for the lengthy message, but after quite some time of fiddling
  around and testing several approaches the situation is not easy any
  longer, I'm stuck and need a little push into the right direction ]

I'm currently porting Dazuko to NetBSD (this project allows for
on-access AV scans, rule set based access control, file access
monitoring and logging, postprocessing of created files, adding a
trashcan with unerase capability, or whatever else one could come up
with when given a transparent hook into filesystem activity and the
possibility to deny access -- see http://www.dazuko.org for details).
The idea is to intercept file accesses and to allow or deny them after
userland daemons have checked the access.

On the other supported platforms (Linux 2.2 to 2.6, optionally with
RSBAC, FreeBSD 4 and 5, OpenBSD is in progress IIUC) the open(2) and
friends syscalls get hooked.  This works fine in general but is not too
nice (it's even a no-no in Linux 2.6 although LSM has its own ugly
problems) and fails for file accesses which bypass the syscalls (e.g. in
kernel NFS servers, maybe compat emulations).  The best approach is
RSBAC which has hooks for all relevant operations with already available
parameters.

So I thought that a VFS layer would be the way to go -- it's nice and
clean and can be applied transparently.  I took a spare ia32 machine and
put NetBSD 2.0.2 on it.  Writing a device driver was easy with the lkm
howto I found via your website (the fibo example from bmeurer).  Writing
a VFS layer went quite smoothly because nullfs is a good starting point.
I managed to teach mount_dazuko(8) to cope with auto mounters and
transparent stacking via simple fstab(5) options.  Automatically
unmounting automatically stacked volumes is not complete yet but can be
implemented by passing mount options around.  So far everything looks
good and works in principle.


But some issues still prevent my implementation from being really useful
(or even being used in production). :(

Resolving the filename (at least one of them) for a vnode is not
supported by the OS directly.  FreeBSD has vn_fullpath(), MacOS X has
vn_getpath(), and Linux has __d_path() (although it's not public), but
NetBSD seems to lack such a routine.  From the CVS logs of later
versions (3 and 4, I currently use 2.0.2) I could not see that such a
thing was introduced -- and would be glad to be wrong here.  So I had to
roll a resolving routine myself.  That's where even more problems are
bubbling up. :)

There is no direct link from a vnode to its parent directory.  I do
understand that this limitation comes from the fact that a vnode (i.e.
a file) may be looked up with different pathnames from different
directories because hard links exist.  Multiple filenames may refer to
the same file on disk.  That's why one cannot say that one of them is
"the" parent directory.  Correct me if I'm wrong (maybe v_nclist is what
I'm missing here, but see below, it's not really a public interface and
has another limitation).
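
A userland illustration of the hard link point above (a minimal sketch,
not kernel code; same_file() and hardlink_demo() are names made up for
this example, and /tmp is assumed to be writable and to permit hard
links).  Two different pathnames resolve to the same device/inode pair,
so neither directory entry is "the" name of the file:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: 1 if both paths name the same file,
 * 0 if they do not, -1 if either stat(2) call fails. */
static int
same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a != NULL && b != NULL);
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Hypothetical self-check: create a scratch file, add a second name
 * for it with link(2), and verify that both names lead to one file
 * whose link count is 2.  Returns 1 on success, 0 or -1 otherwise. */
static int
hardlink_demo(void)
{
	char a[] = "/tmp/hlXXXXXX";
	char b[sizeof(a) + 4];
	struct stat st;
	int fd, rv;

	if ((fd = mkstemp(a)) < 0)
		return -1;
	close(fd);
	snprintf(b, sizeof(b), "%s.ln", a);
	if (link(a, b) != 0) {
		unlink(a);
		return -1;
	}
	rv = same_file(a, b);
	if (rv == 1 && (stat(a, &st) != 0 || st.st_nlink != 2))
		rv = 0;
	unlink(a);
	unlink(b);
	return rv;
}
```

From the file's point of view both names are equivalent; only the link
count reveals that more than one directory entry exists.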

Calling cache_revlookup() misses for regular files; it only seems to hit
for directories.  I did notice the NAMECACHE_ENTER_REVERSE option but
it's off by default.  That's when I tried using the v_nclist field of
the vnode, which holds the parent cache.  But since the namecache_slock
lock is private to sys/kern/vfs_cache.c there may be (surely is) a race
in the lookup code path.  And I'm not yet confident that the cache will
never invalidate still valid but unused data, or that it will be big
enough to hold all relevant data over the filesystem's complete lifetime
(the timespan it's mounted).

Another problem is that the namecache(9) manpage suggests that only
short names (length below some 31 characters) are held in the cache.
Although longer names may be added by means of cache_enter(), these
entries get corrupted later when they are above some 40 characters and
"interesting" things may happen when they exceed NCHNAMLEN.  I don't
feel like hooking the lookup vop, too, and managing another cache with
long names in parallel to what the OS does (and introducing the problems
of incorrect caching, inconsistency, etc).  Adding "fake" cache entries
(like names 0 characters long, just to have the vnode and parent
directory pair in the cache and be able to tell it's "a special entry")
probably breaks existing cache code like cache_lookup() and
cache_revlookup().  But the inability to look up names with components
longer than 31 characters is an ugly limitation in a logger and even a
show stopper for security related software.  Doing
what the getcwd() syscall does (fallback to readdir() on the parent
directory) only works when the parent directory is known -- which is not
the case when looking up the last pathname component (getcwd() has the
advantage of starting on a directory, namely '.').
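
The readdir() fallback mentioned above can be sketched in userland as
follows (an illustration only, assuming a POSIX.1-2008 system;
name_in_dir() and name_in_dir_demo() are hypothetical names, and kernel
code would go through the readdir vop instead).  It scans a *known*
parent directory for an entry with the wanted device/inode pair, which
works for names of any length but is of no help while the parent
directory itself is unknown:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: find one name of the file (dev, ino) inside
 * dirpath.  Returns 0 and fills buf on success, -1 if no entry
 * matches, the name does not fit, or the directory is unreadable. */
static int
name_in_dir(const char *dirpath, dev_t dev, ino_t ino,
    char *buf, size_t buflen)
{
	DIR *d;
	struct dirent *de;
	struct stat st;
	int rv = -1;

	assert(dirpath != NULL && buf != NULL);
	if ((d = opendir(dirpath)) == NULL)
		return -1;
	while ((de = readdir(d)) != NULL) {
		if (strcmp(de->d_name, ".") == 0 ||
		    strcmp(de->d_name, "..") == 0)
			continue;
		/* stat each entry instead of trusting d_ino */
		if (fstatat(dirfd(d), de->d_name, &st,
		    AT_SYMLINK_NOFOLLOW) != 0)
			continue;
		if (st.st_dev == dev && st.st_ino == ino &&
		    strlen(de->d_name) < buflen) {
			strcpy(buf, de->d_name);
			rv = 0;
			break;
		}
	}
	closedir(d);
	return rv;
}

/* Hypothetical self-check: create a file whose name is longer than
 * 31 characters in a scratch directory and find that name again. */
static int
name_in_dir_demo(void)
{
	static const char longname[] =
	    "a-file-name-well-beyond-thirty-one-characters.txt";
	char dir[] = "/tmp/revXXXXXX";
	char path[128], found[128];
	struct stat st;
	int fd, rv = -1;

	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(path, sizeof(path), "%s/%s", dir, longname);
	if ((fd = open(path, O_CREAT | O_WRONLY, 0600)) >= 0) {
		close(fd);
		if (stat(path, &st) == 0 &&
		    name_in_dir(dir, st.st_dev, st.st_ino,
		    found, sizeof(found)) == 0 &&
		    strcmp(found, longname) == 0)
			rv = 0;
		unlink(path);
	}
	rmdir(dir);
	return rv;
}
```

The scan returns the first matching entry only; a file with several
hard links in the same directory would need the loop to continue and
collect all matches.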

A minor problem is that the creation of new files (like the shell
command "echo TEXT > FILE" would do) will call the create, then open and
close vops.  But after a successful create vop on the lower layer the
cache entry is missing when the open vop runs.  I could get around this
by inserting the cache entry myself, although the code is ugly and
assumes too much about the cache's internals.


Problems which exist, too, but surely can be solved are:

When a vnode has multiple parent directories, the filename resolving
routine should return multiple filenames (this could be solved by using
a "tree" for the pathname components which usually degenerates into a
simple list for non-aliased filenames; I think of an mbuf-like chain or
something similar).
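
One possible shape for such a component "tree" (purely a sketch under
assumptions; struct pcomp and pcomp_path() are invented for this
illustration and are not an existing kernel interface).  Each node
holds one pathname component, a link towards the root, and a link to
the next alias of the same file.  Without aliases this is a plain
singly linked list:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node: one pathname component of a resolved name. */
struct pcomp {
	const char *name;	/* the component, e.g. "etc" */
	struct pcomp *up;	/* parent component, NULL below the root */
	struct pcomp *alias;	/* next alternative name of the same file */
};

/* Render the path that ends in component c into buf, filling the
 * buffer from the end backwards while walking towards the root.
 * Returns 0 on success, -1 if buf is too small. */
static int
pcomp_path(const struct pcomp *c, char *buf, size_t buflen)
{
	const struct pcomp *p;
	size_t need = 0, pos, n;

	assert(c != NULL && buf != NULL);
	for (p = c; p != NULL; p = p->up)
		need += 1 + strlen(p->name);	/* '/' plus component */
	if (need + 1 > buflen)
		return -1;
	buf[need] = '\0';
	pos = need;
	for (p = c; p != NULL; p = p->up) {
		n = strlen(p->name);
		pos -= n;
		memcpy(buf + pos, p->name, n);
		buf[--pos] = '/';
	}
	return 0;
}
```

A caller would walk the alias chain of the last component and call
pcomp_path() once per alias to obtain every filename of the vnode.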

Since I'm not at all familiar with the VFS subsystem, the implementation
surely gets the vnode locking wrong.  The code will need a review once
its logic has stabilized.


If it helps the discussion I can post the resolving routine here (it's a
private project of mine to tinker around with this topic when there is
spare time left).

But I guess the problems are well listed above.  The most beautiful
solution would be to have the OS resolve the filename in a vn_fullpath()
routine.  But a pointer on how to work around the above issues, or on
where to learn how to fix them, would be great, too. :)  I did dig in the
/usr/src/sys/ tree and search Google and netbsd.org, but I'm stuck ATM.


virtually yours
Gerhard Sittig
-- 
     If you don't understand or are scared by any of the above
             ask your parents or an adult to help you.