Re: CVS commit: src/sys/kern

To: Christos Zoulas <christos%zoulas.com@localhost>, dholland%netbsd.org@localhost
Subject: Re: CVS commit: src/sys/kern
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Sun, 02 Oct 2022 02:39:00 +0700

    Date:        Sat, 1 Oct 2022 13:00:04 -0400
    From:        Christos Zoulas <christos%zoulas.com@localhost>
    Message-ID:  <8BD3A408-77FD-402C-8D6C-AD4B4A5E860A%zoulas.com@localhost>

  | This is what I meant:
  |
  | RCS file: /cvsroot/src/sys/kern/kern_descrip.c,v
  | retrieving revision 1.251
  | diff -u -u -r1.251 kern_descrip.c
  | --- kern_descrip.c      29 Jun 2021 22:40:53 -0000      1.251
  | +++ kern_descrip.c      1 Oct 2022 16:56:44 -0000
  | @@ -1162,6 +1162,7 @@
  |         KASSERT(ff->ff_allocated);
  |         KASSERT(fd_isused(fdp, fd));
  |         KASSERT(fd >= NDFDFILE || ff == (fdfile_t *)fdp->fd_dfdfile[fd]);
  | +       ff->ff_exclose = (fp->f_flag & O_CLOEXEC) != 0;
  |
  |         /* No need to lock in order to make file initially visible. */
  |         ff->ff_file = fp;

That's not going to work in the situation we're in, as nothing will have
set O_CLOEXEC in fp->f_flag (or shouldn't have anyway, right, O_CLOEXEC isn't
that kind of a flag, it belongs to the descriptor, not the file*).

f_flag is set in vfs_syscalls.c in open_setfp() - which is never called
in the cloning device case, that's the underlying problem I believe.

Even when it is called, the code is:

	fp->f_flag = flags & FMASK;

where FMASK is (from fcntl.h)

#define FMASK           (FREAD|FWRITE|FCNTLFLAGS|FEXEC)

and

#define FCNTLFLAGS      (FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FDSYNC|FRSYNC|FALTIO|\
                         FDIRECT|FNOSIGPIPE)

which all looks exactly as it should be to me - and note that O_CLOEXEC
(which has no Fxxxx equivalent name - it doesn't need one) is not there.

So, fp->f_flag isn't set at all (is probably 0), and even if it were
O_CLOEXEC would not be there, and should not be.

For the vnode actually being opened, none of this matters, as the open
lasts as long (actually not as long) as the open() sys call - when that
returns, the device being opened has been closed again, so what the
file that refers to it looks like (or would have looked like) really
doesn't matter at all, it never becomes visible.   That's my guess as
to why the open_setfp() call is missing in that case.

But what got forgotten (or deliberately was not done) was anything to
affect the modes of the clone which is opened.

  | What does it mean when the open specifies O_CLOEXEC
  | and ff->ff_exclose is false? Can that happen? Is that desirable?

It is what is currently happening whenever we open a cloning device.
(Or that is what it looks like to me).   Desirable, no idea, I didn't
define the semantics of cloning device opens, nor which of the open
flags should apply to the clone that is created.

  | I am fine with the locking to stay where it is. I guess it is probably
  | not needed after dup/clone, since the underlying vnode is shared...

Assuming you mean dup(2) (and dup2()) and clone(2) then no - those have
no way to pass the relevant locking flags, if you have a fd and want to
apply a lock, fcntl() is what does that (and the fcntl operations that
duplicate fds do not also apply locks).

The only issue is O_EXLOCK and O_SHLOCK (and O_CLOEXEC) for cloned devices.

I suspect it makes sense for O_CLOEXEC to be applied in that case, it
makes little sense for open("/dev/ptmx", O_RDWR|O_CLOEXEC) to succeed,
returning an open fd which does not have cloexec set (which is the issue,
along with O_NONBLOCK) which started all of this discussion.

The locking flags I am less sure about.   I don't see how they can fail
to succeed if applied, as the vnode for the device has just been created,
nothing else can possibly have any kind of lock on it.   Whether there's
any benefit in applying the lock so that the node is locked for later,
I don't know - but it certainly should do no harm to do that.

It seems clear to me that what we need is (something like)

Index: vfs_syscalls.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.555
diff -u -r1.555 vfs_syscalls.c
--- vfs_syscalls.c	12 Feb 2022 15:51:29 -0000	1.555
+++ vfs_syscalls.c	1 Oct 2022 19:27:15 -0000
@@ -1763,6 +1763,9 @@
 		error = fd_dupopen(dupfd, dupfd_move, flags, &indx);
 		if (error)
 			return error;
+		error = open_setfp(l, fp, XXXvp, indx, flags);
+		if (error)
+			return error;
 		*fd = indx;
 	} else {
 		error = open_setfp(l, fp, vp, indx, flags);

where XXXvp needs to be extracted from somewhere (it isn't vp, as vp==NULL)
except that what follows in the else case is ...

                if (error)
                        return error;
                VOP_UNLOCK(vp);
                *fd = indx;

That VOP_UNLOCK(vp) is what is bothering me,   It tells me that open_setfp()
is expecting to be called with vp locked - but in the first case (the cloning
case) there is no VOP_UNLOCK() call (and what's more, fd_dupopen() cannot
do it, as it has no vp arg).   That means, I believe, that when vn_open()
returns in the normal case, vp is returned locked, but in the cloning case
the vnode that was created for the clone is not locked.

I'm not sure what is the right way to find the vnode, or how it should
properly be locked so open_setfp() can do its thing.   If I knew all of
that I would have made an attempt at fixing this already.   We need
someone who really understands what is happening here, and the right
way to handle it all (which very likely is nothing like I just suggested).

kre

Follow-Ups:
- Re: CVS commit: src/sys/kern
  - From: Robert Elz
- Re: CVS commit: src/sys/kern
  - From: Christos Zoulas

References:
- Re: CVS commit: src/sys/kern
  - From: Christos Zoulas
- Re: CVS commit: src/sys/kern
  - From: Christos Zoulas
- Re: CVS commit: src/sys/kern
  - From: Christos Zoulas
- Re: CVS commit: src/sys/kern
  - From: Christos Zoulas
- Re: CVS commit: src/sys/kern
  - From: Robert Elz
- Re: CVS commit: src/sys/kern
  - From: Robert Elz
- Re: CVS commit: src/sys/kern
  - From: Robert Elz
- Re: CVS commit: src/sys/kern
  - From: Robert Elz

Prev by Date: Re: CVS commit: src/sys/kern
Next by Date: Re: CVS commit: src/external/cddl/osnet/dist/uts/common/fs/zfs
Previous by Thread: Re: CVS commit: src/sys/kern
Next by Thread: Re: CVS commit: src/sys/kern
Indexes:

Home | Main Index | Thread Index | Old Index