Re: kern/51654 (wrong ps_strings breaks emacs20)
The following reply was made to PR kern/51654; it has been noted by GNATS.
From: David Holland <dholland-bugs%netbsd.org@localhost>
Subject: Re: kern/51654 (wrong ps_strings breaks emacs20)
Date: Sun, 27 Nov 2016 00:26:17 +0000
On Sat, Nov 26, 2016 at 09:15:01AM +0000, Martin Husemann wrote:
> The only way I see we could provide backward compat is to mark binaries
> like emacs with a special "fixed VA layout" note including a version
> number (similar to the compiler memory model notes used on sparc64)
> and then exec them with special compat VA layout.
That or back out the stack change that broke them all.
Anyway, we figured out what's going on:
(1) _libc_init is called twice, but the comment on it (since updated)
was wrong. It is called from crt0 explicitly and from global
constructor handling, but the order of these is inverted for
static binaries vs. dynamically linked binaries. In a dynamically
linked binary the global constructors for libc are called from
the dynamic linker before crt0 gets control at all. (This is why
a broken emacs cores before __start.)
(2) Therefore, in a dynamic executable, _libc_init is reached before
__ps_strings is initialized by crt0. Ordinarily, then, it will be
null and the logic that sets __libc_dlauxinfo will be skipped.
This is fine because __libc_dlauxinfo is only used in static
executables; it's there to help substitute for not having the
dynamic linker image.
(3) However, when emacs dumps it saves the static data of crt0, which
is part of the linked program, but not the static data of libc,
(4) Therefore when starting the dumped emacs, libc is not initialized
yet, but the value from the previous startup is still sitting in
__ps_strings. Then _libc_init sees that it's not null and tries
to set __libc_dlauxinfo from it. If the stack location is not the
same as it was for the previous execution, the old __ps_strings
can point off the end of the stack and this results in SIGSEGV.
Thanks to Joerg for providing necessary explanations.
Disabling ASLR for both the original temacs binary and the dumped
emacs binary works around the problem (this fix has been put in place
for emacs20/21 and I think later ones do it on their own) but it isn't
particularly desirable. (While emacs will never run as a PIE, it
should still be possible to randomize the stack and library mappings.
temacs itself runs fine with ASLR enabled.)
There are several possible real fixes, but they all have problems:
(a) Have emacs explicitly zero __ps_strings before dumping. This
will work, but exposes the dirty laundry and makes it that much
harder to tidy the internals up later. However, it works
transparently for all or nearly all NetBSD versions going a long
way back.
(b) Give emacs some otherwise-private function to call before
dumping, something like __crt_reset(). On the plus side, this
doesn't expose the details, but on the minus side it would
require configure tests and other fussing to use properly.
(c) Add another variable to crt0 for libc to use in a way that allows
it to tell that the __ps_strings value is bogus. E.g. add "int
__initcount" to crt0 and have _libc_init increment it; then if
on entry to _libc_init it's > 0 we know we're running after
dumping and shouldn't trust __ps_strings. This has the advantage
of not being exposed, but it's messy and adds more crud to crt0
which we don't want.
(d) Misuse #ifdef __PIC__ or similar to detect at compile time in
_libc_init whether the current execution is statically linked.
(This is abusive, and also, a binary statically linked against
libc_pic.a wouldn't work properly.)
(e) Come up with some other way to distinguish being dynamically or
statically linked in _libc_init, either at compile time or
runtime. The problem is: what/how? I don't have any bright ideas,
but if someone else does, speak up: the advantage of this is that
(unlike the other alternatives) it can fix the crash for already-
existing Emacs binaries.
(f) Kick __ps_strings (and also __environ and __progname) out of
crt0.o entirely, and instead provide private entry points in libc
for crt0 to assign them. This has a number of tidiness advantages
going forward, but Joerg is worried that it might generate compat
David A. Holland