tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Vnode API cleanup pass 2a



On Tue, Jan 07, 2014 at 11:30:40AM +0100, J. Hannken-Illjes wrote:
 > > (also, while this is minor I think I'd rather have vop_mkdir_args_v2
 > > and/or vop_mkdir_desc_v2 rather than vop_mkdir2_args and
 > > vop_mkdir2_desc. The version doesn't need to be part of the operation
 > > name.)
 > 
 > Looks very good -- changed vnode_if.sh to create xxx_v2_args. Diff attached.

Would you mind committing the versioning logic right away? It has
clear value and it would be nice for it to appear in a separate
changeset.

 > >>> Also, since what you're doing is basically unlocking before return in
 > >>> every implementation, and then relocking after return at every call
 > >>> site, it's possible to do one op at a time; it seems like that would
 > >>> be a good idea in case unexpected problems crop up.
 > >> 
 > >> This is not what I'm doing.
 > > 
 > > It still seems like a good idea to do one op at a time, or at least do
 > > one op first.
 > 
 > This is missing any arguments for the second time

...because it's vfs and everything in vfs is delicate? Small
incremental changes are safer; if something weird goes wrong, it's
easier to deal with the smaller the change was. (If nothing else, it's
easier to rever only the part that broke; it's also easier to bisect
later.)

 > and you ignored my
 > arguments after "This is not what I'm doing.".

That's because they aren't really relevant to working incrementally.

 > >>> Unlocking and relocking should not itself be a problem since AFAIK
 > >>> there isn't any reason for those locks to be held in the first place;
 > >>> however, I think I'd better go review all the call sites just in case.
 > >>> 
 > >>> Plus I think it would be advisable to make this change first, so that
 > >>> in case we do find a site where it matters we can lock the returned
 > >>> object before anyone else can see it:
 > >>> 
 > >>> #
 > >>> -#% mkdir      dvp     L U U
 > >>> +#% mkdir      dvp     L L L
 > >> 
 > >> Where should it matter? VOP_MKDIR is only used from:

In the case of VOP_SYMLINK it definitely matters, as the new symlink
isn't filled until after VOP_SYMLINK returns. Since the locking
protocol for the containing directory is to unlock it *before*
returning, it then becomes possible to see the empty symlink. And that
isn't so good.

mkdir is less clear, but:

 > >> external/cddl/osnet/dist/uts/common/fs/zfs/zfs_replay.c:zfs_replay_create()

This doesn't appear to do anything with the new dir (right now it just
panics...)

 > >> sys/kern/vfs_syscalls.c:do_sys_mkdirat()

This does nothing and is ok.

 > >> sys/nfs/nfs_serv.c:nfsrv_mkdir()

This extracts the nfs file handle for the new dir and calls
VOP_GETATTR. If we don't leave the directory locked, these actions are
no longer atomic with the directory creation. I don't know whether
this is safe or not. My guess is not. In the absence of clear
information it seems prudent to not introduce this race.

 > >> sys/fs/union/union_vnops.c:union_mkdir()

This looks ok.

 > >> sys/fs/union/union_subr.c:union_mkshadow()

...but this doesn't, the caller is trying to create the directory and
also bind it into the onionfs layering atomically. Of course, being
onionfs, it's already only approximate and adding another race to it
may not make things any worse; but it's probably better not to.

 > >> sys/rump/librump/rumpvfs/rumpvnode_if.c:RUMP_VOP_MKDIR()

this shouldn't matter.

 > > I have been putting off answering this because I ought to examine all
 > > these, and I haven't managed to do so yet but figured I ought to
 > > answer the rest.

It looks to me like it would definitely be better to first fix the
VOP_MKDIR interface to not release the directory lock within the call.

Certainly for VOP_SYMLINK.

Therefore, it's probably best to fix all the creation ops.

-- 
David A. Holland
dholland%netbsd.org@localhost


Home | Main Index | Thread Index | Old Index