filesystem namespace regions, or making mountd less bozotic

To: tech-kern%netbsd.org@localhost
Subject: filesystem namespace regions, or making mountd less bozotic
From: David Holland <dholland-tech%netbsd.org@localhost>
Date: Wed, 5 Dec 2012 21:29:06 +0000

I am tired of PR 3019 and its many duplicates, so I'd like to see a
scheme that allows managing arbitrary subtrees of the filesystem
namespace in a reasonably useful manner.

The immediate application is NFS exports and mountd; however, I expect
the resulting mechanism will also be useful for handling chroots and
possibly also inotify-type mechanisms.

This is a bunch of not fully formed thoughts, so it probably has
holes. Please pick away.

The basic idea is that there are a number of cases where we want to be
able to name a subtree of the filesystem namespace and apply some
property to it -- export permissions, for example. The basic problem
is that subtree membership is a non-local property: given an
arbitrary directory vnode, you can check whether it's part of a region
either by searching upwards, which isn't atomic, or by caching the
info in the vnode, which can become invalid after a rename(2)
elsewhere in the system.
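
To make the first option concrete, here is a minimal user-space sketch
of the upward search. The names (struct dirnode, dn_parent, dn_marked,
dirnode_in_region) are made up for illustration and are not kernel
interfaces. In a real kernel each step to the parent would need its
own locking, and a rename(2) could reparent an ancestor between steps,
which is why the walk as a whole is not atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory vnode; not a NetBSD structure. */
struct dirnode {
	struct dirnode *dn_parent;	/* NULL at the root */
	bool dn_marked;			/* true if a region starts here */
};

/*
 * Walk toward the root looking for a marked directory, starting with
 * dn itself. Nothing here stops the tree from changing between
 * iterations, so the answer may be out of date by the time it returns.
 */
static bool
dirnode_in_region(const struct dirnode *dn)
{
	assert(dn != NULL);
	for (; dn != NULL; dn = dn->dn_parent) {
		if (dn->dn_marked)
			return true;
	}
	return false;
}
```

The cost is also proportional to the depth of the directory, which is
the other reason to want the answer cached in the vnode.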

To cope with this I propose something like the following:

   1. Build a high-level layer that composes high-level regions out
      of low-level regions. Any particular portion of the namespace
      belongs to at most one low-level region, but high-level regions
      can be arranged arbitrarily, and low-level regions don't need
      to span filesystems. There are a few issues here but it's
      mostly straightforward, so the rest of this mail will talk
      only about low-level regions.

   2. Declare a 'struct region' or some such suitable name to hold
      the info for each low-level region. This includes pointers to
      high-level region structures or client data or whatever, plus
      the control information described below.

   3. In each directory vnode, add a pointer, which I'll call
      v_region, that is either null or points to the current region.
      Also add a flag, in a suitable flags word, which I'll call
      V_REGIONLEADER, that marks the top directory of a region.

   4. Make struct region refcounted; each vnode pointing to a region
      holds a reference. When no references are left, the structure
      is freed.

   5. Each struct region can be marked invalid. If a region is
      invalid, it means it has been superseded; references to it
      should, when encountered, be replaced with references to the
      newer version if any.

   6. When a new region is established, allocate the region structure,
      set the vnode at the top of the subtree to point to it, and mark
      that vnode V_REGIONLEADER. If the region leader was already in a
      region, then the new region is a subregion of the old; the high-
      level region code is supposed to allow for this. (It is also
      necessary to invalidate the old region.)

   7. During lookup, region info propagates downward; that is, if D is
      in region A, and D contains D1, then when D1 is looked up, D1
      can also be tagged as being in region A, unless D1 is itself a
      region leader. getcwd should also propagate region info
      downward. (Region info can also be propagated upward when
      looking up "..", but this is probably not of much value.) If D1
      points to a stale version of its current region, or to no
      region, and D does not, this will provide D1 with the most
      up-to-date version. Region info can also be propagated at mkdir
      time.

   8. At rename time, regions are invalidated if necessary. If
      renaming A/B to C/D, B's region is invalidated if C belongs to a
      different region from A. Because regions are pointers, this
      invalidation also invalidates the region info for the entire
      subtree of B.

   9. If a region leader is removed, the region is invalidated and
      also logically ceases to exist. The high-level region code is
      supposed to be able to cope with this.

   10. If a region leader is renamed, the region likewise logically
      ceases to exist. The region is invalidated and the region leader
      vnode ceases to be a region leader. Alternatively, for some
      applications it might make more sense to update the high-level
      region data to reflect the new name.

   11. Depending on the application, if a high-level region contains
      or is represented by a nonexistent low-level region, and mkdir
      causes the region to exist again (by name), it might be
      desirable to create a new low-level region and tag the new
      directory with it. This might be problematic, however.

   12. When it becomes necessary to test region membership on a
      directory, if the directory points to a valid region that region
      can be used. If the directory points to an invalid region, the
      region info can be explicitly updated by calling getcwd. If the
      region pointer is null, the directory is part of no region.
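
As a rough illustration of items 2 through 7 and 12, here is a
user-space sketch. Only v_region and V_REGIONLEADER are names from the
list above; everything else (the r_* fields, region_establish,
region_propagate, region_check, and the toy struct vnode) is
hypothetical, and there is no locking at all. It also assumes that
region info is propagated only from a parent whose region is valid,
and it leaves the creation of a replacement for a superseded region to
the high-level code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define V_REGIONLEADER	0x1	/* item 3: top directory of a region */

struct region {
	unsigned r_refcnt;	/* item 4: one per vnode pointing here */
	bool r_valid;		/* item 5: false once superseded */
	void *r_data;		/* item 2: high-level region/client data */
};

struct vnode {			/* toy model, not the kernel's vnode */
	struct region *v_region;
	int v_flag;
};

/* Drop one reference; free the region when none are left. */
static void
region_rele(struct region *r)
{
	if (r != NULL) {
		assert(r->r_refcnt > 0);
		if (--r->r_refcnt == 0)
			free(r);
	}
}

/* Point vp at r (which may be NULL), adjusting reference counts. */
static void
vnode_setregion(struct vnode *vp, struct region *r)
{
	if (r != NULL)
		r->r_refcnt++;
	region_rele(vp->v_region);
	vp->v_region = r;
}

/* Item 6: make vp the leader of a newly allocated region. */
static struct region *
region_establish(struct vnode *vp, void *data)
{
	struct region *r = calloc(1, sizeof(*r));

	if (r == NULL)
		return NULL;
	r->r_valid = true;
	r->r_data = data;
	if (vp->v_region != NULL)
		vp->v_region->r_valid = false;	/* old region superseded */
	vnode_setregion(vp, r);
	vp->v_flag |= V_REGIONLEADER;
	return r;
}

/* Item 7: child was just looked up in dir; pass region info down. */
static void
region_propagate(const struct vnode *dir, struct vnode *child)
{
	if (child->v_flag & V_REGIONLEADER)
		return;		/* leaders keep their own region */
	if (dir->v_region == NULL || !dir->v_region->r_valid)
		return;		/* nothing up to date to offer */
	if (child->v_region != dir->v_region)
		vnode_setregion(child, dir->v_region);
}

enum region_state { REGION_NONE, REGION_VALID, REGION_STALE };

/* Item 12: on REGION_STALE the caller refreshes (e.g. getcwd), retries. */
static enum region_state
region_check(const struct vnode *vp)
{
	if (vp->v_region == NULL)
		return REGION_NONE;
	return vp->v_region->r_valid ? REGION_VALID : REGION_STALE;
}
```

Note that a null v_region is doing double duty here, as in the list
above: it means both "not tagged yet" and "part of no region". A real
implementation would probably have to tell those two cases apart.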

I think this covers all the bases, and it shouldn't be horribly
expensive to implement.

Since all that happens during rename is pointer comparison and
region invalidation, if the region's valid flag is updated with atomic
ops there should be no problem interacting with rename locking.
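
A sketch of what the rename-time check in item 8 might look like,
using C11 <stdatomic.h> as a stand-in for whatever atomic primitives
the kernel would actually use. The function names, the cut-down
structures, and the choice of release/acquire ordering are all
assumptions for illustration, not a worked-out design.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct region {
	atomic_bool r_valid;	/* cleared once, never set again */
};

struct vnode {			/* toy model, not the kernel's vnode */
	struct region *v_region;
};

/* Mark a region superseded; takes no lock of its own. */
static void
region_invalidate(struct region *r)
{
	if (r != NULL)
		atomic_store_explicit(&r->r_valid, false,
		    memory_order_release);
}

/* True if r exists and has not been superseded. */
static bool
region_isvalid(struct region *r)
{
	return r != NULL &&
	    atomic_load_explicit(&r->r_valid, memory_order_acquire);
}

/*
 * Item 8: vp is being renamed out of directory fdvp and into
 * directory tdvp. Only a pointer comparison and possibly one atomic
 * store happen here.
 */
static void
region_rename_hook(struct vnode *fdvp, struct vnode *tdvp,
    struct vnode *vp)
{
	if (fdvp->v_region != tdvp->v_region)
		region_invalidate(vp->v_region);
}
```

Because the flag only ever goes from valid to invalid, a reader that
races with the store sees either the old or the new value, and in the
worst case acts on region info that was correct a moment earlier.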

However, there may be locking problems in how the low-level and
high-level region code need to interact. Also, propagating region
information during lookup and getcwd may create additional
complications.

-- 
David A. Holland
dholland%netbsd.org@localhost

Follow-Ups:
- Re: filesystem namespace regions, or making mountd less bozotic
  - From: Robert Elz
- Re: filesystem namespace regions, or making mountd less bozotic
  - From: David Laight
