NetBSD-Users archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: procfs difference between NetBSD and Linux



(please keep me in the Cc: since I'm not currently subscribed to -users)

On Fri, Jun 04, 2021 at 11:41:44PM +0700, Robert Elz wrote:
 > Replying to myself...
 > 
 >   | I think I am going to experiment with simply removing that error case
 >   | and see if anything breaks.
 > 
 > but that cannot work, the issue is that the operations in question return
 > the parent vnode, which, when a mount point has been crossed, isn't possible.
 > Simply returning success in that case won't work at all for all the other
 > uses of the CREAT vnop, which expect that parent vnode.

Right, it's not a "check" but a necessary piece of sanity: for create
ops, what you're _actually_ returning from namei is the parent
directory that you want to do the creation in.

 > At the minute I'm thinking it might be a deficit in the   
 > design of the vnops

It is, and one of the goals of the namei cleanup (that remains
stalled) is to fix that. Basically the right thing to do is move the
final lookup step out of namei. However, that's not happening anytime
soon and wouldn't be suitable for pullup either.

Note that since mounted objects are normally directories, opening them
with O_CREAT doesn't make sense and no issue arises.

It surprises me that FreeBSD and OpenBSD don't have the same problem,
because they have the same "logic" in their namei -- "create" ops do a
full lookup including the final step if it exists -- and need to cope
somehow with that final step being a mount point. We have made some
changes at various points to exactly what happens with mount points,
and I guess one of them must have ended up more restrictive. Or maybe
FreeBSD will explode if you concoct some related scenario that we
discovered at some point :-)

Anyway, since samba apparently insists on doing something stupid, we
need a hack that can go into -9. Much as I hate to add a namei flag,
ISTM that the only reasonably noninvasive way to deal with this is to
mark the O_CREAT-but-not-O_EXCL case and allow it to return with
ni_dvp == NULL if this case is reached.

The patch below has not even been compile-tested and so may need some
adjustments (and might conceivably break rump) but should address the
problem in a way that will, with luck, not explode anything else.

 > My guess at the minute is that to fix this we need a new vnop, OCREATE
 > (optional create) (or something), which works identically to the CREATE
 > operation, except that it doesn't fail if it cannot return the parent
 > vnode - and then callers (which would probably only be open()) using
 > this new op would need to deal with that when it happens.

Adding another op is a better way to do it, but that makes the patch
much more invasive :-|  (and also much more risky for pullup because
it requires auditing every reference to CREATE to make sure it also
addresses the new case...)

   ------

Index: sys/kern/vfs_lookup.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_lookup.c,v
retrieving revision 1.225
diff -u -r1.225 vfs_lookup.c
--- sys/kern/vfs_lookup.c	29 Dec 2020 22:13:40 -0000	1.225
+++ sys/kern/vfs_lookup.c	4 Jun 2021 20:04:58 -0000
@@ -1823,14 +1826,23 @@
 
 			switch (cnp->cn_nameiop) {
 			    case CREATE:
+				if (cnp->cn_flags & NONEXCLHACK) {
+					/*
+					 * open with O_CREAT but not O_EXCL,
+					 * which is not actually a create op
+					 * if the target exists. Allow this
+					 * to return a null parent dir, as
+					 * it won't actually be used.
+					 */
+					break;
+				}
 				return EEXIST;
 			    case DELETE:
 			    case RENAME:
 				return EBUSY;
 			    default:
-				break;
+				panic("Invalid nameiop\n");
 			}
-			panic("Invalid nameiop\n");
 		}
 
 		/*
Index: sys/kern/vfs_vnops.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_vnops.c,v
retrieving revision 1.214
diff -u -r1.214 vfs_vnops.c
--- sys/kern/vfs_vnops.c	9 Nov 2020 18:09:02 -0000	1.214
+++ sys/kern/vfs_vnops.c	4 Jun 2021 20:04:58 -0000
@@ -161,6 +161,8 @@
 		if ((fmode & O_EXCL) == 0 &&
 		    ((fmode & O_NOFOLLOW) == 0))
 			ndp->ni_cnd.cn_flags |= FOLLOW;
+		if ((fmode & O_EXCL) == 0)
+			ndp->ni_cnd.cn_flags |= NONEXCLHACK;
 	} else {
 		ndp->ni_cnd.cn_nameiop = LOOKUP;
 		ndp->ni_cnd.cn_flags |= LOCKLEAF;
@@ -183,7 +185,12 @@
 	error = veriexec_openchk(l, ndp->ni_vp, pathstring, fmode);
 	if (error) {
 		/* We have to release the locks ourselves */
-		if (fmode & O_CREAT) {
+		/*
+		 * 20210604 dholland passing NONEXCLHACK means we can
+		 * get ni_dvp == NULL back if ni_vp exists, and we should
+		 * treat that like the non-O_CREAT case.
+		 */
+		if ((fmode & O_CREAT) != 0 && ndp->ni_dvp != NULL) {
 			if (vp == NULL) {
 				vput(ndp->ni_dvp);
 			} else {
@@ -202,7 +209,10 @@
 	}
 #endif /* NVERIEXEC > 0 */
 
-	if (fmode & O_CREAT) {
+	/*
+	 * 20210604 dholland ditto
+	 */
+	if ((fmode & O_CREAT) != 0 && ndp->ni_dvp != NULL) {
 		if (ndp->ni_vp == NULL) {
 			vattr_null(&va);
 			va.va_type = VREG;
Index: sys/sys/namei.src
===================================================================
RCS file: /cvsroot/src/sys/sys/namei.src,v
retrieving revision 1.58
diff -u -r1.58 namei.src
--- sys/sys/namei.src	30 May 2020 20:16:14 -0000	1.58
+++ sys/sys/namei.src	4 Jun 2021 20:04:58 -0000
@@ -154,7 +154,8 @@
 					   in ni_erootdir */
 NAMEIFL	LOCKSHARED	0x00000100	/* want shared locks if possible */
 NAMEIFL	NOCHROOT	0x01000000	/* no chroot on abs path lookups */
-NAMEIFL	MODMASK		0x010001fc	/* mask of operational modifiers */
+NAMEIFL	NONEXCLHACK	0x02000000	/* open wwith O_CREAT but not O_EXCL */
+NAMEIFL	MODMASK		0x030001fc	/* mask of operational modifiers */
 /*
  * Namei parameter descriptors.
  */

-- 
David A. Holland
dholland%netbsd.org@localhost


Home | Main Index | Thread Index | Old Index