tech-userlevel archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

globals in (out of) crt0



While discussing the Emacs problems with Joerg (see PR 51654 for the
gory details) the conclusion we came to is that the best real fix (as
opposed to the workaround of disabling ASLR) is to shift the
__ps_strings global from crt0 to libc; then it won't be referenced
directly by programs, won't be in the linked program's BSS, and won't
be accidentally saved when Emacs dumps. Instead, libc will provide a
private entry point to assign the value of __ps_strings. (This also
turns out to simplify the piece of libc initialization that is
crashing Emacs.)

In the name of tidiness one can also move the __progname and environ
globals. It is my belief that this will not break compat, because
these symbols always needed to be exported from the executable so libc
could manipulate them. The patch does cause __progname to shift from
libc .bss to .data so it's initialized to ""; the copy relocation to
provide the initialization won't be in old executables, but old
executables have an already-initialized copy, so this shouldn't cause
compat problems either.

(If I'm missing something in this analysis, please speak up.)

The following not-yet-tested draft patch does this.

Note that while this won't fix extant Emacs binaries (the only way to
do that is to find a nonabusive way to have _libc_init detect whether
it's in a statically-linked executable) I expect it to make future
Emacs binaries tolerate different stack addresses; it should be
possible to use ASLR with Emacs as long as one doesn't try to link it
as a PIE. There's some chance it might also fix the problem where
32-bit Emacs binaries built in a chroot on a 64-bit build host don't
run on native 32-bit machines.

Unanswered questions:

 - I put declarations for the hooks in unistd.h. I'd rather use a
private header so they aren't exposed, but there isn't currently one
shared by libc and crt0. Should I set one up?

 - I put the hooks in the .text.startup section like _libc_init but
I'm not sure if that's actually appropriate -- what is and isn't
supposed to go in .text.startup?


   ------

# HG changeset patch
# Parent  f22616a0944a190d204bcce997a8d5a7c199b180
Shift the environ, __progname, and __ps_strings globals to libc only.
Don't define or reference them in crt0; instead use private libc hooks
to assign them from inside libc.

Besides making crt0 tidier this has three effects:
   - unless referenced directly from an executable the symbols remain
     private to libc;
   - in the specific case of __ps_strings it simplifies the
     initialization of __libc_dlauxinfo, which no longer has to worry
     about the various ordering interactions of the ways in which
     _libc_init gets called (see comments in initfini.c for details);
   - because of these points it avoids the problem where Emacs dumps
     core early in startup if the stack mapping is different from what
     it was when temacs ran during the build, fixing PR 51654 while
     allowing (limited) ASLR to be used.

There's some chance that this might also fix the problem where an
Emacs built in a 32-bit chroot on amd64 crashes on i386.

Moving these items out of crt0 will not break existing binaries that
already contain the symbols: the symbols must be dynamically exposed
so libc can get to them because all of them are directly used by
libc. (environ is used by getenv, __progname by err*, __ps_strings
by setproctitle.)

In a prior libc:

000000000037df48 B environ
000000000037dc98 B __ps_strings
000000000037e350 B __progname

In this libc:

000000000037ef48 B environ
000000000037ec98 B __ps_strings
000000000036c418 D __progname

__progname changes from .bss to .data because it needs to be
initialized to point to "". Old binaries won't have the copy
relocation for this; however, old binaries have an already-initialized
copy they got from the crt0.o they were linked with. So this will
still work.

diff -r f22616a0944a include/unistd.h
--- a/include/unistd.h	Sat Nov 26 19:32:27 2016 -0500
+++ b/include/unistd.h	Sat Nov 26 22:12:15 2016 -0500
@@ -392,6 +392,11 @@
 quad_t	 __syscall(quad_t, ...);
 int	 undelete(const char *);
 
+/* these are for the private use of crt0.o */
+void	__assign_environ(char **);
+void	__assign_progname(const char *);
+void	__assign_ps_strings(/*struct ps_strings */ void *);
+
 #if 1 /*INET6*/
 int	 rcmd_af(char **, int, const char *,
 	    const char *, const char *, int *, int);
diff -r f22616a0944a lib/csu/common/crt0-common.c
--- a/lib/csu/common/crt0-common.c	Sat Nov 26 19:32:27 2016 -0500
+++ b/lib/csu/common/crt0-common.c	Sat Nov 26 22:12:15 2016 -0500
@@ -70,12 +70,6 @@
 extern unsigned char __etext, __eprol;
 #endif /* MCRT0 */
 
-char		**environ;
-struct ps_strings *__ps_strings = 0;
-
-static char	 empty_string[] = "";
-char		*__progname = empty_string;
-
 __dead __dso_hidden void ___start(void (*)(void), const Obj_Entry *,
 			 struct ps_strings *);
 
@@ -135,19 +129,21 @@
 
 	if (ps_strings == NULL)
 		_FATAL("ps_strings missing\n");
-	__ps_strings = ps_strings;
+	__assign_ps_strings(ps_strings);
 
-	environ = ps_strings->ps_envstr;
+	__assign_environ(ps_strings->ps_envstr);
 
 	if (ps_strings->ps_argvstr[0] != NULL) {
-		char *c;
-		__progname = ps_strings->ps_argvstr[0];
-		for (c = ps_strings->ps_argvstr[0]; *c; ++c) {
+		char *pn, *c;
+
+		pn = ps_strings->ps_argvstr[0];
+		for (c = pn; *c; ++c) {
 			if (*c == '/')
-				__progname = c + 1;
+				pn = c + 1;
 		}
+		__assign_progname(pn);
 	} else {
-		__progname = empty_string;
+		/* nothing - libc initializes it to "" by default */
 	}
 
 	if (&rtld_DYNAMIC != NULL) {
@@ -174,5 +170,6 @@
 	atexit(_fini);
 	_init();
 
-	exit(main(ps_strings->ps_nargvstr, ps_strings->ps_argvstr, environ));
+	exit(main(ps_strings->ps_nargvstr, ps_strings->ps_argvstr,
+	    ps_strings->ps_envstr));
 }
diff -r f22616a0944a lib/libc/gen/getprogname.c
--- a/lib/libc/gen/getprogname.c	Sat Nov 26 19:32:27 2016 -0500
+++ b/lib/libc/gen/getprogname.c	Sat Nov 26 22:12:15 2016 -0500
@@ -42,12 +42,14 @@
 #include "namespace.h"
 
 #include <stdlib.h>
+#include <unistd.h> /* for __assign_progname proto */
 
 #ifdef __weak_alias
 __weak_alias(getprogname,_getprogname)
 #endif
 
-const char *__progname;
+static const char __empty[] = "";
+const char *__progname = __empty;
 
 const char *
 getprogname(void)
@@ -55,3 +57,9 @@
 
 	return (__progname);
 }
+
+void __section(".text.startup")
+__assign_progname(const char *pn)
+{
+	__progname = pn;
+}
diff -r f22616a0944a lib/libc/misc/initfini.c
--- a/lib/libc/misc/initfini.c	Sat Nov 26 19:32:27 2016 -0500
+++ b/lib/libc/misc/initfini.c	Sat Nov 26 22:12:15 2016 -0500
@@ -40,6 +40,7 @@
 #include <sys/exec.h>
 #include <sys/tls.h>
 #include <stdbool.h>
+#include <unistd.h> /* for __assign_ps_strings proto */
 
 void	_libc_init(void) __attribute__((__constructor__, __used__));
 
@@ -77,26 +78,51 @@
 struct ps_strings *__ps_strings;
 
 /*
+ * Hook for new crt0 to assign __ps_strings without referring to it
+ * directly. This also sets __libc_dlauxinfo based on it.
+ *
+ * This hook is now provided instead of having crt0 reference
+ * __ps_strings directly for two reasons: (1) it allows the setup of
+ * __libc_dlauxinfo to proceed when the value for __ps_strings becomes
+ * available. Otherwise in _libc_init it has to deal with the relative
+ * ordering of the assignment of __ps_strings and the calls to
+ * _libc_init. (See discussion below.) And, (2) it avoids referring to
+ * the __ps_strings symbol directly in crt0, which in general avoids
+ * having __ps_strings live in the linked program's data/bss, which
+ * avoids an issue where the value of __ps_strings was getting saved
+ * by Emacs during the build (when Emacs dumps itself out) which was
+ * causing problems.
+ */
+void __section(".text.startup")
+__assign_ps_strings(void *p)
+{
+	__ps_strings = p;
+	__libc_dlauxinfo = __ps_strings->ps_argvstr +
+	    __ps_strings->ps_nargvstr + __ps_strings->ps_nenvstr + 2;
+}
+
+/*
  * _libc_init is called twice.  One call comes explicitly from crt0.o
  * (for newer versions) and the other is via global constructor handling.
- *
- * In static binaries the explicit call is first; in dynamically linked
- * binaries the global constructors of libc are called from ld.elf_so
- * before crt0.o is reached.
+ * Global constructor handling can happen at either of two points: in
+ * dynamically linked executables the global constructors of libc are
+ * invoked from ld.elf_so before crt0.o is reached, but in statically
+ * linked executables the global constructors of libc become part of
+ * the global constructors of the main program and are called *from*
+ * crt0.o.
  *
- * Note that __ps_strings is set by crt0.o. So in the dynamic case, it
- * hasn't been set yet when we get here, and __libc_dlauxinfo is not
- * (ever) assigned. But this is ok because __libc_dlauxinfo is only
- * used in static binaries, because it's there to substitute for the
- * dynamic linker. In static binaries __ps_strings will have been set
- * up when we get here and we get a valid __libc_dlauxinfo.
+ * The consequence of this is that in static binaries the explicit
+ * call from crt0 happens before the global constructor call, but in
+ * dynamically linked binaries the global constructor call happens
+ * first.
  *
- * This code causes problems for Emacs because Emacs's undump
- * mechanism saves the __ps_strings value from the startup execution;
- * then running the resulting binary it gets here before crt0 has
- * assigned the current execution's value to __ps_strings, and in an
- * environment with ASLR this can cause the assignment of
- * __libc_dlauxinfo to receive SIGSEGV.
+ * The call from crt0 that sets __ps_strings (using the hook
+ * immediately above) always happens before the explicit call of
+ * _libc_init from crt0, but the ordering of it relative to the
+ * /first/ call to _libc_init also depends on whether the program is
+ * statically linked. Therefore, any apparatus that depends on the
+ * __ps_strings value should be initialized above rather than in
+ * _libc_init.
  */
 void __section(".text.startup")
 _libc_init(void)
@@ -107,10 +133,6 @@
 
 	libc_initialised = 1;
 
-	if (__ps_strings != NULL)
-		__libc_dlauxinfo = __ps_strings->ps_argvstr +
-		    __ps_strings->ps_nargvstr + __ps_strings->ps_nenvstr + 2;
-
 	/* For -fstack-protector */
 	__guard_setup();
 
diff -r f22616a0944a lib/libc/stdlib/_env.c
--- a/lib/libc/stdlib/_env.c	Sat Nov 26 19:32:27 2016 -0500
+++ b/lib/libc/stdlib/_env.c	Sat Nov 26 22:12:15 2016 -0500
@@ -44,6 +44,7 @@
 #include <stdlib.h>
 #include <stddef.h>
 #include <string.h>
+#include <unistd.h> /* for __assign_environ proto */
 
 #include "env.h"
 #include "local.h"
@@ -404,3 +405,10 @@
 {
 	rb_tree_init(&env_tree, &env_tree_ops);
 }
+
+/* Initialize the value of "environ" itself; called from crt0 */
+void __section(".text.startup")
+__assign_environ(char **env)
+{
+	environ = env;
+}



-- 
David A. Holland
dholland%netbsd.org@localhost


Home | Main Index | Thread Index | Old Index