Subject: Re: Misc layerfs questions
To: de SAINT LEGER Rodolphe <>
From: Bill Studenmund <>
List: tech-kern
Date: 02/25/2002 15:21:12
On Mon, 25 Feb 2002, de SAINT LEGER Rodolphe wrote:

> Hi,
> I'm currently working on a new union filesystem, using layerfs,
> mainly to make it NFS-exportable, and also to try to create the
> shadow directories only on write operations, so that a find wouldn't
> create shadow directories (or at least it could be an option like
> the above/below option).


> As I'm not experienced with the VFS functions (especially locking),
> I'm looking at the code of the following filesystems for information:
> nullfs, overlayfs, genfs (layerfs), and of course union.

Please note, genfs and layerfs are different things. genfs is a set of
standard routines for all file systems, and layerfs is a set specifically
for layered ones. They are in the same directory, but they are different
things. :-)

> I've got the following questions:
> First, I think there is a possible bug in the layerfs code, in the
> layerfs_lookup routine (this is the first routine I'm working on;
> the vfsops are already done for the moment). At the end of the
> function, we have:
> error = layer_node_create(...

PDIRUNLOCK takes care of this, or it's supposed to. The idea is that
rather than bend over backwards, we just record what happened: did we
unlock the parent, yes or no? The caller reacts accordingly.

Note that this assumes that all the vnodes in the stack share the same
lock. This is true for overlayfs, nullfs, and umapfs, but won't be true
for a fan-in file system like the unionfs you're describing.

> if this works, no problem, but in an error case, nothing more is done,
> we just return the error code

Right. The fact we unlocked the parent has been noted, and the cleanup
routines should notice. vfs_lookup() does.

> I think it can cause (when layer_node_create fails) a vnode in the
> lower layer to stay locked (or vref'd) forever (due to possible lock
> flags in the cnp structure) ...

Note: we do get the reference counting right AFAICT. PDIRUNLOCK records
that the locking ended up "off."

> I must say that I'm not familiar with lookup lock flags and I don't
> think I can write a piece of code that handles unlocking myself.
> Also I'm perhaps making a mistake; sorry if that is the case.

No problem. This flag is there to save you the trouble. :-)

> Second, I've noticed that UNION is defined in other places in the
> kernel. I think this is for the mount -o union stuff, but I'm not
> sure; any clues?

No, UNION is for the union file system. It is used in the readdir routines
to make sure that we look in the right places. MNT_UNION is for mount -o
union. It's kinda confusing.

> Third, I want to reuse as much of layerfs as possible, including
> layer_node_create(). layer_node_create() calls the layer_node_alloc()
> defined in the layer_mount structure; all the hashlist work is done
> in layer_node_alloc(). The main reason is that if layerfs is
> improved, the new unionfs would be improved too.
> The problem is that union handles two (lower) layers and
> layer_node_alloc only takes one, which would force me to write both
> layer_node_create() and layer_node_alloc() myself.
> Does someone have a solution for passing union-specific parameters to
> layer_node_alloc() through layer_node_create()? If not, is it planned
> to implement such a feature?

For now I'd suggest writing your own layer_node_create() and your own
layer_node_alloc(). The current routines weren't designed for fan-in
layered fs's. Later, we can see about changing the create and allocate
routines to support fan-in.

> The last problem I encounter is with shadow directories (".."
> lookups). I found a trick (I have not implemented it yet) which
> permits creating shadow directories only when necessary (not on cwd
> operations but on create, delete, rename, ... ops). I haven't seen
> yet how it is done in union (the real one), but the main idea is
> that for one layerfs vnode we have two hashes (one for the upper
> union layer and one for the lower layer).

I think you should make the shadow directories as you go. Think about
what will happen if you don't. Say you have the file
/unionfs/some/directory/tree/file, and you want to write to file. If you
don't create the shadow dirs as you go, you could well have to create the
shadow for /unionfs/some, /unionfs/some/directory, and
/unionfs/some/directory/tree. And you will be doing this at a time when
you only have the vnode for /unionfs/some/directory/tree/file.

So you have to path walk on the underlying file system to the r/o node
under /unionfs (under unionfs's root). Then you have to forward walk,
looking to see if shadow directories exist, creating them if they don't.

In principle, each time you do this, you have to go to /unionfs's root.
If you are lucky, one of the intermediate nodes might still be around, so
you don't have to go all the way up. But since some node intensive things
can clean out the vnode cache, you can't be sure. And you can't stop even
if you find some nodes unless they have shadow dirs.

And then there's the locking dance you'll have to be doing. The rule of
vnode locking is that you have to lock vnodes in root-to-leaf order. So
you'll have to lock your vnode for the fs root (/unionfs above). Then
you'll be able to get /unionfs/some. Then you'll get the lock for
/unionfs/some/directory. Only then can you lock
/unionfs/some/directory/tree. Oh, and when I say "lock" above, that
means locking the upper, lower, and union nodes.

That's a REAL buttload of work. It would be much easier to just make the
shadow dirs as you go. You can easily impose the right locking (make sure
that the shadow parent exists and stays locked while creating the new
shadow), since all you need to do is keep the shadow parent locked,
create if you need to, and unlock it if the requester wanted the parent
unlocked.
> Is it safe to reference twice the same layerfs vnode with two lower
> vnodes hash values ?

Depends on how you write your code. :-) Hash tables are just a
convenience for finding things. :-) If your code works with it, you
should be fine.

> Many thanks for help

No problem. I wish you the best of luck with this project. Many of us have
wanted to redo unionfs. :-)

Take care,