NetBSD-Bugs archive

Re: kern/51654 (wrong ps_strings breaks emacs20)

The following reply was made to PR kern/51654; it has been noted by GNATS.

From: David Holland
Subject: Re: kern/51654 (wrong ps_strings breaks emacs20)
Date: Sun, 27 Nov 2016 00:26:17 +0000

 On Sat, Nov 26, 2016 at 09:15:01AM +0000, Martin Husemann wrote:
  >  The only way I see we could provide backward compat is to mark binaries
  >  like emacs with a special "fixed VA layout" note including a version
  >  number (similar to the compiler memory model notes used on sparc64)
  >  and then exec them with special compat VA layout.
 That, or back out the stack change that broke them all.
 Anyway, we figured out what's going on:
  (1) _libc_init is called twice, but the comment on it (since updated)
      was wrong. It is called from crt0 explicitly and from global
      constructor handling, but the order of these is inverted for
      static binaries vs. dynamically linked binaries. In a dynamically
      linked binary the global constructors for libc are called from
      the dynamic linker before crt0 gets control at all. (This is why
      a broken emacs cores before __start.)
  (2) Therefore, in a dynamic executable, _libc_init is reached before
      __ps_strings is initialized by crt0. Ordinarily, then, it will be
      null and the logic that sets __libc_dlauxinfo will be skipped.
      This is fine because __libc_dlauxinfo is only used in static
      executables; it's there to help substitute for not having the
      dynamic linker image.
  (3) However, when emacs dumps it saves the static data of crt0, which
      is part of the linked program, but not the static data of libc,
      which isn't.
  (4) Therefore when starting the dumped emacs, libc is not initialized
      yet, but the value from the previous startup is still sitting in
      __ps_strings. Then _libc_init sees that it's not null and tries
      to set __libc_dlauxinfo from it. If the stack location is not the
      same as it was for the previous execution, the old __ps_strings
      can point off the end of the stack and this results in SIGSEGV.
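
 The sequence in (1)-(4) can be modelled in a few lines of C. This is
 only a sketch of the ordering problem, not the real libc or crt0 code:
 every name below (model_ps, model_crt0, model_libc, model_libc_init,
 model_exec_dynamic) is hypothetical, and the "stale pointer" case is
 reported as a return value instead of actually faulting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ps_strings; the contents do not matter. */
struct model_ps { int nargv; };

/* Static data of crt0: part of the linked program, so the dump saves it. */
struct model_crt0 { const struct model_ps *ps_strings; };

/* Static data of libc.so: not part of the dump, so fresh on every exec. */
struct model_libc { int initialized; const struct model_ps *dlauxinfo_src; };

enum init_result { INIT_STALE = -1, INIT_SKIPPED = 0, INIT_USED = 1 };

/*
 * Model of _libc_init as described above. "live" is where the kernel
 * really put ps_strings for this execution; a non-null pointer that
 * differs from it stands for the case that ends in SIGSEGV.
 */
static enum init_result
model_libc_init(struct model_libc *libc, const struct model_crt0 *crt0,
    const struct model_ps *live)
{
    if (libc->initialized)
        return INIT_SKIPPED;          /* second call is a no-op */
    libc->initialized = 1;
    if (crt0->ps_strings == NULL)
        return INIT_SKIPPED;          /* normal dynamic startup */
    if (crt0->ps_strings != live)
        return INIT_STALE;            /* left over from before the dump */
    libc->dlauxinfo_src = crt0->ps_strings;
    return INIT_USED;
}

/* One execution of a dynamically linked image; returns the first call's result. */
static enum init_result
model_exec_dynamic(struct model_crt0 *image, const struct model_ps *live)
{
    assert(image != NULL && live != NULL);
    struct model_libc libc = { 0, NULL };    /* libc data is always fresh */

    /* Global constructors run from the dynamic linker, before crt0. */
    enum init_result first = model_libc_init(&libc, image, live);
    if (first == INIT_STALE)
        return first;                        /* process would have died here */

    image->ps_strings = live;                /* crt0 initializes __ps_strings */
    (void)model_libc_init(&libc, image, live);   /* explicit call from crt0 */
    return first;
}
```

 Running model_exec_dynamic() on a zeroed image gives INIT_SKIPPED and
 leaves the pointer set in the image, which is the state the dump
 preserves. Running it again on the same image with a different "live"
 address gives INIT_STALE; with the same address it gives INIT_USED,
 which is why a fixed stack location hides the problem.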
 Thanks to Joerg for providing necessary explanations.
 Disabling ASLR for both the original temacs binary and the dumped
 emacs binary works around the problem (this fix has been put in place
 for emacs20/21 and I think later ones do it on their own) but it isn't
 particularly desirable. (While emacs will never run as a PIE, it
 should still be possible to randomize the stack and library mappings.
 temacs itself runs fine with ASLR enabled.)
 There are several possible real fixes, but they all have problems:
  (a) Have emacs explicitly zero __ps_strings before dumping. This
      will work, but exposes the dirty laundry and makes it that much
      harder to tidy the internals up later. However, it works
      transparently for all or nearly all NetBSD versions going a long
      way back.
  (b) Give emacs some otherwise-private function to call before
      dumping, something like __crt_reset(). On the plus side, this
      doesn't expose the details, but on the minus side it would
      require configure tests and other fussing to use properly.
  (c) Add another variable to crt0 for libc to use in a way that allows
      it to tell that the __ps_strings value is bogus. E.g. add "int
      __initcount" to crt0 and have _libc_init increment it; then if
      on entry to _libc_init it's > 0 we know we're running after
      dumping and shouldn't trust __ps_strings. This has the advantage
      of not being exposed, but it's messy and adds more crud to crt0
      which we don't want.
  (d) Misuse #ifdef __PIC__ or similar to detect at compile time in
      _libc_init whether the current execution is statically linked.
      (This is abusive, and also, a binary statically linked against
      libc_pic.a wouldn't work properly.)
  (e) Come up with some other way to distinguish being dynamically or
      statically linked in _libc_init, either at compile time or
      runtime. The problem is: what/how? I don't have any bright ideas,
      but if someone else does, speak up: the advantage of this is that
      (unlike the other alternatives) it can fix the crash for already-
      existing Emacs binaries.
  (f) Kick __ps_strings (and also __environ and __progname) out of
      crt0.o entirely, and instead provide private entry points in libc
      for crt0 to assign them. This has a number of tidiness advantages
      going forward, but Joerg is worried that it might generate compat
 David A. Holland
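
 For illustration, option (c) above might look roughly like the
 following. This is a sketch under assumptions, not a proposed patch:
 the struct and function names (sketch_ps, sketch_crt0, sketch_libc,
 sketch_libc_init) are hypothetical, and only the "initcount" field
 corresponds to the "int __initcount" mentioned in (c).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ps_strings. */
struct sketch_ps { int nargv; };

/* Lives in crt0, so a dump preserves both fields. */
struct sketch_crt0 {
    const struct sketch_ps *ps_strings;
    int initcount;                 /* models the proposed __initcount */
};

/* Lives in libc.so, so it starts out zeroed on every exec. */
struct sketch_libc {
    int initialized;
    const struct sketch_ps *dlauxinfo_src;
};

/* Returns 1 if ps_strings was trusted and used, 0 otherwise. */
static int
sketch_libc_init(struct sketch_libc *libc, struct sketch_crt0 *crt0)
{
    assert(libc != NULL && crt0 != NULL);
    if (libc->initialized)
        return 0;                  /* repeat call within one execution */
    libc->initialized = 1;

    /* Nonzero on entry means an earlier execution already got here. */
    int after_dump = crt0->initcount > 0;
    crt0->initcount++;

    if (after_dump || crt0->ps_strings == NULL)
        return 0;                  /* do not trust (or have) ps_strings */
    libc->dlauxinfo_src = crt0->ps_strings;
    return 1;
}
```

 The counter has to live in crt0 and not in libc precisely because it
 must survive the dump; the libc-side "initialized" flag does not.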