tech-kern archive

Re: exact semantics of union mounts (and TRYEMULROOT)

    Date:        Mon, 10 Jul 2017 17:40:54 +0000
    From:        David Holland <>
    Message-ID:  <>

  | Union mounts are complicated in this regard because when the directory
  | involved is a union mount point, some layer of the union mount needs
  | to be chosen to invoke the filesystem-level operation;

I don't think so, directory ops all happen at the upper level (or nowhere).

  | Directory operations can be divided into five categories:
  |  - lookup (ordinary directory traversal, operations like stat, open
  |       without O_CREAT, etc.)
  |  - nonexclusive create (open without O_CREAT)

You mean without O_EXCL; it has to have O_CREAT or it isn't a create
at all, just an open of an existing file, which is just a lookup (it
doesn't matter whether it is a read, write, or read-write open).

  |  - exclusive create (mkdir, symlink, open with O_CREAT|O_EXCL, etc.)
  |  - remove (rmdir, unlink)
  |  - rename

Forget rename(); the relevant operation is link(). rename is just
link() followed by unlink(), with atomic semantics.

  | For lookup,

Agreed, no question.   And to answer mouse, if there's a whiteout found,
the search terminates, and the file was not found.
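
To make the lookup rule concrete, here is a minimal Python sketch. It is
a toy model, not kernel code: each layer is a plain dict, and WHITEOUT
and union_lookup are names made up for this illustration.

```python
# Toy model (assumption): a layer is a dict mapping names to entries,
# and WHITEOUT is a hypothetical marker standing in for a whiteout entry.
WHITEOUT = object()

def union_lookup(layers, name):
    """Search layers top-down; return (depth, entry) or None if not found."""
    for depth, layer in enumerate(layers):
        entry = layer.get(name)
        if entry is WHITEOUT:
            # A whiteout terminates the search: the name does not exist.
            return None
        if entry is not None:
            return depth, entry
    return None

upper = {"a": "upper-a", "b": WHITEOUT}
lower = {"a": "lower-a", "b": "lower-b", "c": "lower-c"}
layers = [upper, lower]
```

In this model "a" resolves in the upper layer, "c" falls through to the
lower layer, and "b" is not found even though the lower layer has it.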

  | For nonexclusive create, we should do the same, and if we run out of
  | layers start at the top again

No. If the file does not exist, it is created in the top layer; there
is no "start again".

  | For an exclusive create, however, we need to ascertain that the name
  | doesn't exist before we try creating anything.

As you do for nonexclusive create - the only difference is what happens
when the name does exist.   For one it is an error, for the other the
open just uses the existing file.
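
The difference is easy to see from userland on an ordinary filesystem
(no union mount involved). A small Python sketch using os.open; the
helper name open_create is made up for this example:

```python
import errno
import os
import tempfile

def open_create(path, exclusive):
    """Create-open path; return 0 on success or EEXIST if O_EXCL refuses."""
    flags = os.O_WRONLY | os.O_CREAT
    if exclusive:
        flags |= os.O_EXCL
    try:
        fd = os.open(path, flags, 0o644)
    except FileExistsError:
        return errno.EEXIST
    os.close(fd)
    return 0

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "f")
    results = [
        open_create(path, exclusive=True),   # name absent: file is created
        open_create(path, exclusive=False),  # name present: existing file is used
        open_create(path, exclusive=True),   # name present: fails with EEXIST
    ]
```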

  | Various security
  | properties depend on exclusive create actually being exclusive, and I
  | don't think having union mounts weaken this is healthy.

Of course.

  | So I think we need to test all layers before creating anything.

Of course.
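
As a sketch of what "test all layers, then create in the top layer"
could look like, here is a toy Python model. Layers are dicts, and
WHITEOUT and union_create are hypothetical names. Treating a whiteout
as "name absent" and overwriting it on create is an assumption of this
sketch, not something stated above.

```python
import errno

# Toy model (assumption): layers are dicts, WHITEOUT marks a whiteout entry.
WHITEOUT = object()

def union_create(layers, name, exclusive):
    """Return (error, entry); new objects are only ever made in layers[0]."""
    for layer in layers:
        entry = layer.get(name)
        if entry is WHITEOUT:
            break  # assumption: a whiteout means the name does not exist
        if entry is not None:
            if exclusive:
                return errno.EEXIST, None  # exclusive create must fail
            return 0, entry                # nonexclusive: use existing file
    # Name exists in no layer: create it in the top layer, nowhere else.
    layers[0][name] = "new:" + name
    return 0, layers[0][name]

upper, lower = {}, {"x": "lower-x"}
layers = [upper, lower]
```

The only difference between the two kinds of create is the branch taken
when the name is found; the search and the choice of layer are the same.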

  | (It also means we need to lock all layers,

The top layer needs to be locked, and remain that way, I expect (though
we just have a normal race if it is unlocked, then locked again later,
only effect would be, I think, that the top directory would need to be
checked again in case the file appeared in the meantime.)   That is, anyone
creating the same file name will put it in the upper layer, and it is
just a question of who gets there first, which is something we do not need
to answer, just make sure there is only one winner.   If someone at the
same time is creating a file in the alternative name for the under layer
then "so what", that's not a problem.
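
A rough userland analogy for "make sure there is only one winner",
assuming a single lock on the top directory is enough. This is a Python
sketch with threads standing in for competing creators; the names are
made up and it says nothing about the actual vnode locking.

```python
import threading

upper = {}                      # toy stand-in for the top-layer directory
top_lock = threading.Lock()     # hypothetical lock on the top directory
winners = []

def try_exclusive_create(who):
    # Check and create under the same lock so only one creator can win.
    with top_lock:
        if "f" not in upper:
            upper["f"] = who
            winners.append(who)

threads = [threading.Thread(target=try_exclusive_create, args=(i,))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Which thread wins is not determined, and does not need to be; the point
is only that exactly one of them does.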

  | Once we've ascertained that the name doesn't exist, we use the topmost
  | read-write layer;

Huh?  Where does that come from?   You use the upper layer.  If for any
reason the file cannot be created there, the operation fails.   No second

  | For remove, I think the correct thing to do is to descend until we
  | find the topmost layer where the target name exists, if any,

We look to see if the file exists, yes; if not there is nothing to do (error).

  | and then operate at that layer.

No, all changes happen in the top layer: the file is "removed" by
creating a whiteout, which will then cause any lookup to fail.
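
A toy Python model of remove-by-whiteout, under the same assumptions as
any such sketch: layers are dicts, and WHITEOUT, union_lookup and
union_remove are hypothetical names. For simplicity it always writes a
whiteout, even when the name exists only in the top layer.

```python
import errno

# Toy model (assumption): layers are dicts, WHITEOUT marks a whiteout entry.
WHITEOUT = object()

def union_lookup(layers, name):
    """Top-down search; a whiteout ends it with 'not found'."""
    for layer in layers:
        entry = layer.get(name)
        if entry is WHITEOUT:
            return None
        if entry is not None:
            return entry
    return None

def union_remove(layers, name):
    """Remove name by writing a whiteout in the top layer only."""
    if union_lookup(layers, name) is None:
        return errno.ENOENT          # nothing to do: error
    layers[0][name] = WHITEOUT       # lower layers are never modified
    return 0

upper, lower = {}, {"x": "lower-x"}
layers = [upper, lower]
first = union_remove(layers, "x")
second = union_remove(layers, "x")
```

After the first call the lower layer still holds "x", but no lookup
through the union can reach it.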

  | And for rename, [...]

Just consider it as link+unlink (and keep the locking to make it atomic,
which tends to be the complex part...)

The unlink part was just covered.   For link() the first filename is
just a lookup, and no different than any other.   The second filename is
then processed as for an exclusive create.   Simple (except there needs
to be the EXDEV check added.)
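
A sketch of link() in those terms, again as a toy Python model with
dict layers and made-up names. The EXDEV rule used here (refuse when the
source was found below the top layer, on the assumption that each layer
is a separate filesystem) is only one possible reading of the check.

```python
import errno

# Toy model (assumption): layers are dicts, WHITEOUT marks a whiteout entry.
WHITEOUT = object()

def union_lookup(layers, name):
    """Top-down search; return (depth, entry) or None."""
    for depth, layer in enumerate(layers):
        entry = layer.get(name)
        if entry is WHITEOUT:
            return None
        if entry is not None:
            return depth, entry
    return None

def union_link(layers, old, new):
    found = union_lookup(layers, old)          # first name: plain lookup
    if found is None:
        return errno.ENOENT
    depth, entry = found
    if union_lookup(layers, new) is not None:  # second name: exclusive create
        return errno.EEXIST
    if depth != 0:
        # Assumption: layers are distinct filesystems, so no hard link
        # from the top layer to an object in a lower layer.
        return errno.EXDEV
    layers[0][new] = entry                     # new name, same object
    return 0

upper, lower = {"a": "inode-1"}, {"b": "inode-2"}
layers = [upper, lower]
```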

  | Plan 9 has a mount flag (mount -c) that it uses to pick the layer
  | where new objects get created, rather than going by readonly vs.
  | read-write; we don't have that but could implement it.

We don't have the option, but we do have the picked layer - the top one.

  | Does this seem reasonable?

Far too complicated.   Keep it simple.

