tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Problem with syscall_disestablish() - PR kern/50430



For a detailed discussion of the problem, please read the PR [1]

In a nut-shell, we cannot rely on the struct lwp's l_sysent member
to determine if a particular syscall is "active", since it could be
overwritten during signal handling.  As a result, it is possible to
unload modules which have active syscalls, and when those syscalls
are resumed they'll return into memory that most likely no longer
contains the syscall code.  (A big "Thanks!" goes out to Masao
Uebayashi for identifying this bug.)

An initial draft of a proposed solution is also described in some
detail in the PR, but here is an updated description:

1. In struct sysent, add a new member which contains a flag bit to
   identify the syscall's origin.  When a package of syscalls is
   added by syscall_establish(), an unused bit is allocated and
   stored with each sysent.  If all 64 bits are in use, we will
   return ENFILE and the syscall package will not be established.

   (This bit is local to each emulation - struct emul - so there
   can be up to 64 syscall packages loaded for each emul.)

2. In struct lwp we add a mask which is the logical OR from all
   syscall struct sysent's that have been invoked by the lwp.

3. Also in struct lwp, add a "syscall depth counter" so we can
   determine if the lwp has multiple active syscalls (ie, an
   active signal handler has an active syscall!).

   Both of these new struct lwp members are updated in sycall() in
   sys/syscallvar.h (the same place as where the original l_sysent
   member us updated).

4. In syscall_disestablish() we add a check.  If any lwp has more
   than one active syscall, we check to see if that lwp has ever
   called any syscall with the same flag/mask bit as the syscall
   package being removed.  If so, we disallow the removal.  (For
   any lwp which has only zero or one active syscall, we continue
   to check if that syscall - in l_sysent - is the one being
   removed.)


The attached file XXX.diffs implement this solution.  I have tested
it (on amd64 only) and everything seems to work, at least as well as
before.  I do NOT have a sample program to actually reproduce the
actual bug, so I cannot 100% confirm that the bug has been squashed.
(If someone wants to provide a sample program, it would be very
much appreciated!  I'm also working on one.)

Constructive comments would be greatly appreciated.


In addition to more thorough testing (including a test program to
validate the fix), I do have a few questions that need resolving
before these changes can be considered complete:

1. When a new lwp is created for an existing process, should it's
   mask-of-all-syscalls-ever-used be reset to zero?  Or should the
   new lwp inherit the mask value from the existing lwp?  (I think
   that currently, the new struct lwp is zeroed when allocated.)

2. Is there any way for a lwp, once it has started execution, to
   "move to" a different emulation?  If it did, we would lose track
   of which syscalls it had previously called in the original emul,
   so we would need to make the bits/masks more global, applying to
   all emulations.

3. The implementation allows for up to 64 mask bits (per emulation);
   is this enough?  Or should we allow a larger number?  Does the
   limit need to be dynamic? (blecchhh!)

4. I'm currently returning ENFILE if we run out of mask bits.  Others
   have suggested different error returns, perhaps ESRCH or ENOMEM,
   might be better.  Any other suggestions?

5. I'm wondering if we should add a

	#ifndef __HAVE_MINIMAL_EMUL
	KASSERT(code < em->e_nsysent);
	#endif

   in both syscall_establish() and syscall_disestablish()?  For many
   ports this would have no effect - __HAVE_MINIMAL_EMUL is defined
   (which eliminates the e_nsysent member from struct emul) - but it
   still might be useful from a documentation perspective.)


Oh, if/when this gets committed, it will require a bump of the kernel
version 7.99.xx due to the changes in internal structures.  (Most
importantly, struct emul is shared by the kernel and several exec_*
and compat_*  modules.)



+------------------+--------------------------+-------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+------------------+--------------------------+-------------------------+
Index: kern/kern_syscall.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_syscall.c,v
retrieving revision 1.11
diff -u -p -r1.11 kern_syscall.c
--- kern/kern_syscall.c	9 May 2015 05:56:36 -0000	1.11
+++ kern/kern_syscall.c	17 Nov 2015 02:34:08 -0000
@@ -55,6 +55,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_syscall
 #include <sys/xcall.h>
 #include <sys/ktrace.h>
 #include <sys/ptrace.h>
+#include <sys/bitops.h>
 
 int
 sys_nomodule(struct lwp *l, const void *v, register_t *retval)
@@ -104,10 +105,12 @@ sys_nomodule(struct lwp *l, const void *
 }
 
 int
-syscall_establish(const struct emul *em, const struct syscall_package *sp)
+syscall_establish(struct emul *em, const struct syscall_package *sp)
 {
 	struct sysent *sy;
+	uint64_t mask;
 	int i;
+	int bit;
 
 	KASSERT(kernconfig_is_held());
 
@@ -122,6 +125,7 @@ syscall_establish(const struct emul *em,
 	 * it can become busy and we could be unable to remove it
 	 * on error.
 	 */
+	mask = 0;
 	for (i = 0; sp[i].sp_call != NULL; i++) {
 		if (sy[sp[i].sp_code].sy_call != sys_nomodule) {
 #ifdef DIAGNOSTIC
@@ -130,16 +134,26 @@ syscall_establish(const struct emul *em,
 			return EBUSY;
 		}
 	}
+	/*
+	 * find the first available package-mask bit
+	 */
+	bit = ffs64(~em->e_scpkgs);
+	if (bit == 0)
+		return ENFILE;
+	mask = 1 << (bit - 1);
+
 	/* Everything looks good, patch them in. */
+	em->e_scpkgs |= mask;
 	for (i = 0; sp[i].sp_call != NULL; i++) {
 		sy[sp[i].sp_code].sy_call = sp[i].sp_call;
+		sy[sp[i].sp_code].sy_pkg_mask = mask;
 	}
 
 	return 0;
 }
 
 int
-syscall_disestablish(const struct emul *em, const struct syscall_package *sp)
+syscall_disestablish(struct emul *em, const struct syscall_package *sp)
 {
 	struct sysent *sy;
 	uint64_t where;
@@ -166,14 +180,18 @@ syscall_disestablish(const struct emul *
 	 * Run a cross call to cycle through all CPUs.  This does two
 	 * things: lock activity provides a barrier and makes our update
 	 * of sy_call visible to all CPUs, and upon return we can be sure
-	 * that we see pertinent values of l_sysent posted by remote CPUs.
+	 * that we see pertinent values of l_scdepth, l_sysent and l_scmask
+	 * posted by remote CPUs.
 	 */
 	where = xc_broadcast(0, (xcfunc_t)nullop, NULL, NULL);
 	xc_wait(where);
 
 	/*
 	 * Now it's safe to check l_sysent.  Run through all LWPs and see
-	 * if anyone is still using the system call.
+	 * if anyone is still using the system call.  We only need to
+	 * check the mask bits once, since we assume that the package of
+	 * syscalls was established as a single package, thus all the
+	 * sy[].sy_pkg_mask bits will be identical.
 	 */
 	for (i = 0; sp[i].sp_call != NULL; i++) {
 		mutex_enter(proc_lock);
@@ -181,6 +199,11 @@ syscall_disestablish(const struct emul *
 			if (l->l_sysent == &sy[sp[i].sp_code]) {
 				break;
 			}
+			if (i == 0 &&
+			    l->l_scdepth > 1 &&
+			    l->l_scmask & sy[sp[i].sp_code].sy_pkg_mask) {
+				break;
+			}
 		}
 		mutex_exit(proc_lock);
 		if (l == NULL) {
@@ -197,6 +220,23 @@ syscall_disestablish(const struct emul *
 		}
 		return EBUSY;
 	}
+	/*
+	 * Success - now we can clear the pkg_mask bits so we can
+	 * re-use the value later
+	 *
+	 * We assume that syscall packages are properly grouped, and
+	 * that established packages are disestablished in their
+	 * entirety.  This allows us to simply clear the bit in the
+	 * emul's mask.  Otherwise we'd have to scan all entries in
+	 * the e_sysent table, requiring us to know how many entries
+	 * there are, and thus having to provide e_nsysent for all
+	 * systems, regardless of __HAVE_MINIMAL_EMUL
+	 */
+	if (sp[0].sp_call != NULL)
+		em->e_scpkgs &= ~sy[sp[0].sp_code].sy_pkg_mask;
+
+	for (i = 0; sp[i].sp_call != NULL; i++)
+		sy[sp[i].sp_code].sy_pkg_mask = 0;
 
 	return 0;
 }
Index: sys/lwp.h
===================================================================
RCS file: /cvsroot/src/sys/sys/lwp.h,v
retrieving revision 1.170
diff -u -p -r1.170 lwp.h
--- sys/lwp.h	31 Mar 2015 01:10:02 -0000	1.170
+++ sys/lwp.h	17 Nov 2015 02:34:08 -0000
@@ -187,6 +187,8 @@ struct lwp {
 	int		l_pflag;	/* !: LWP private flags */
 	int		l_dupfd;	/* !: side return from cloning devs XXX */
 	const struct sysent * volatile l_sysent;/* !: currently active syscall */
+	uint64_t	volatile l_scmask; /* !: set mask of used syscalls */
+	int		volatile l_scdepth; /* !: syscall nesting depth */
 	struct rusage	l_ru;		/* !: accounting information */
 	uint64_t	l_pfailtime;	/* !: for kernel preemption */
 	uintptr_t	l_pfailaddr;	/* !: for kernel preemption */
Index: sys/proc.h
===================================================================
RCS file: /cvsroot/src/sys/sys/proc.h,v
retrieving revision 1.323
diff -u -p -r1.323 proc.h
--- sys/proc.h	24 Sep 2015 14:33:31 -0000	1.323
+++ sys/proc.h	17 Nov 2015 02:34:08 -0000
@@ -141,6 +141,7 @@ struct emul {
 	int		e_nsysent;	/* Number of system call entries */
 #endif
 	struct sysent	*e_sysent;	/* System call array */
+	uint64_t	e_scpkgs;	/* Established syscall packages */
 	const char * const *e_syscallnames; /* System call name array */
 					/* Signal sending function */
 	void		(*e_sendsig)(const struct ksiginfo *,
Index: sys/syscallvar.h
===================================================================
RCS file: /cvsroot/src/sys/sys/syscallvar.h,v
retrieving revision 1.11
diff -u -p -r1.11 syscallvar.h
--- sys/syscallvar.h	24 Sep 2015 14:34:22 -0000	1.11
+++ sys/syscallvar.h	17 Nov 2015 02:34:08 -0000
@@ -52,8 +52,8 @@ struct syscall_package {
 };
 
 void	syscall_init(void);
-int	syscall_establish(const struct emul *, const struct syscall_package *);
-int	syscall_disestablish(const struct emul *, const struct syscall_package *);
+int	syscall_establish(struct emul *, const struct syscall_package *);
+int	syscall_disestablish(struct emul *, const struct syscall_package *);
 
 static inline int
 sy_call(const struct sysent *sy, struct lwp *l, const void *uap,
@@ -61,8 +61,11 @@ sy_call(const struct sysent *sy, struct 
 {
 	int error;
 
+	l->l_scmask |= sy->sy_pkg_mask;
 	l->l_sysent = sy;
+	l->l_scdepth++;
 	error = (*sy->sy_call)(l, uap, rval);
+	l->l_scdepth--;
 	l->l_sysent = NULL;
 
 	return error;
Index: sys/systm.h
===================================================================
RCS file: /cvsroot/src/sys/sys/systm.h,v
retrieving revision 1.269
diff -u -p -r1.269 systm.h
--- sys/systm.h	29 Oct 2015 00:27:08 -0000	1.269
+++ sys/systm.h	17 Nov 2015 02:34:08 -0000
@@ -121,6 +121,7 @@ extern struct sysent {		/* system call t
 	short	sy_argsize;	/* total size of arguments */
 	int	sy_flags;	/* flags. see below */
 	sy_call_t *sy_call;     /* implementing function */
+	uint64_t sy_pkg_mask;	/* mask for run-time installed syscalls */
 	uint32_t sy_entry;	/* DTrace entry ID for systrace. */
 	uint32_t sy_return;	/* DTrace return ID for systrace. */
 } sysent[];


Home | Main Index | Thread Index | Old Index