tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Vnode API cleanup pass 2a



On Jan 13, 2014, at 8:39 AM, David Holland <dholland-tech%netbsd.org@localhost> 
wrote:

> On Tue, Jan 07, 2014 at 11:30:40AM +0100, J. Hannken-Illjes wrote:
>>> (also, while this is minor I think I'd rather have vop_mkdir_args_v2
>>> and/or vop_mkdir_desc_v2 rather than vop_mkdir2_args and
>>> vop_mkdir2_desc. The version doesn't need to be part of the operation
>>> name.)
>> 
>> Looks very good -- changed vnode_if.sh to create xxx_v2_args. Diff attached.
> 
> Would you mind committing the versioning logic right away? It has
> clear value and it would be nice for it to appear in a separate
> changeset.

Done.

<snip>
>>>>> Unlocking and relocking should not itself be a problem since AFAIK
>>>>> there isn't any reason for those locks to be held in the first place;
>>>>> however, I think I'd better go review all the call sites just in case.
>>>>> 
>>>>> Plus I think it would be advisable to make this change first, so that
>>>>> in case we do find a site where it matters we can lock the returned
>>>>> object before anyone else can see it:
>>>>> 
>>>>> #
>>>>> -#% mkdir      dvp     L U U
>>>>> +#% mkdir      dvp     L L L
>>>> 
>>>> Where should it matter? VOP_MKDIR is only used from:
<snip>
>>>> sys/nfs/nfs_serv.c:nfsrv_mkdir()
> 
> This extracts the nfs file handle for the new dir and calls
> VOP_GETATTR. If we don't leave the directory locked, these actions are
> no longer atomic with the directory creation. I don't know whether
> this is safe or not. My guess is not. In the absence of clear
> information it seems prudent to not introduce this race.

RFC 1813 is a bit vague here:

  post_op_attr

  This structure is used for returning attributes in those operations that are
  not directly involved with manipulating attributes. One of the principles of
  this revision of the NFS protocol is to return the real value from the 
indicated
  operation and not an error from an incidental operation. The post_op_attr 
structure
  was designed to allow the server to recover from errors encountered while
  getting attributes.

  This appears to make returning attributes optional. However, server 
implementors
  are strongly encouraged to make best effort to return attributes whenever 
possible,
  even when returning an error.

  post_op_fh3

  One of the principles of this revision of the NFS protocol is to return the 
real
  value from the indicated operation and not an error from an incidental 
operation.
  The post_op_fh3 structure was designed to allow the server to recover from 
errors
  encountered while constructing a file handle.

  This is the structure used to return a file handle from the CREATE, MKDIR, 
SYMLINK,
  MKNOD, and READDIRPLUS requests. In each case, the client can get the file 
handle by
  issuing a LOOKUP request after a successful return from one of the listed 
operations.
  Returning the file handle is an optimization so that the client is not forced 
to
  immediately issue a LOOKUP request to get the file handle.

The "reference implementation" from Solaris is not atomic:

  VOP_MKDIR(dvp, &vp)
  VOP_GETATTR(vp)
  makefh(vp)
  VN_RELE(vp)
  VN_RELE(dvp)

>>>> sys/fs/union/union_subr.c:union_mkshadow()
> 
> ...but this doesn't, the caller is trying to create the directory and
> also bind it into the onionfs layering atomically. Of course, being
> onionfs, it's already only approximate and adding another race to it
> may not make things any worse; but it's probably better not to.


Looking closer it makes things better here.  Currently unionfs will
lock in the wrong order (uppervp -> upperdvp) while it now does it right.

If the shadow directory we just created would disappear because another
thread removed it outside of unionfs we had much more problems than
a race here.


Even though it looks like we don't need creation ops to return with
locked directory at the moment it may be better to do this change now
so atomic operations like the one seen in NFS server are possible.

Thus we would end up with:

-#% create     dvp     L U U
+#% create     dvp     L L L
-#% create     vpp     - L -
+#% create     vpp     - U -
-       IN LOCKED=YES WILLPUT struct vnode *dvp;
+       IN LOCKED=YES struct vnode *dvp;
-#% mknod      dvp     L U U
+#% mknod      dvp     L L L
-#% mknod      vpp     - L -
+#% mknod      vpp     - U -
-       IN LOCKED=YES WILLPUT struct vnode *dvp;
+       IN LOCKED=YES struct vnode *dvp;
-#% mkdir      dvp     L U U
+#% mkdir      dvp     L L L
-#% mkdir      vpp     - L - 
+#% mkdir      vpp     - U - 
-       IN LOCKED=YES WILLPUT struct vnode *dvp;
+       IN LOCKED=YES struct vnode *dvp;
-#% symlink    dvp     L U U
+#% symlink    dvp     L L L
-#% symlink    vpp     - L -
+#% symlink    vpp     - U -
-       IN LOCKED=YES WILLPUT struct vnode *dvp;
+       IN LOCKED=YES struct vnode *dvp;


Please comment soon, I will prepare a new diff then.

--
J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig 
(Germany)



Home | Main Index | Thread Index | Old Index