tech-userlevel archive


Proposal: getexecpath(3)



PROPOSAL: New libc function

	const char *getexecpath(void);

returns the pathname that was passed to execve(2), unmodified.

Thoughts?


DETAILS:

This will be implemented by a new ELF auxiliary vector entry
AT_EXECPATH (or maybe AT_NETBSD_EXECPATH).  Programs can use this to
find files relative to exactly the path by which the caller accessed
the executable.  Call it ${execpath} in the following expressions for
comparison.

This path may not be absolute.  Users who want an absolute path can do
roughly:

	$(pwd)/${execpath}

Users who want some notion of canonical absolute path can do roughly:

	$(cd $(dirname ${execpath}) && pwd)/$(basename ${execpath})

or use readlink(1), realpath(3), &c.

For programs run with fexecve(2), the answer is NULL because there is
no well-defined reliable answer.
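
A caller that wants files relative to the executable might look
roughly like the sketch below.  Since getexecpath() does not exist
yet, the helper takes the would-be return value as a parameter; the
name path_near_executable is hypothetical.  It treats NULL (the
fexecve case) as failure and does no resolution of its own.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<directory part of execpath>/<rel>" textually, without
 * touching the file system.  execpath stands in for what the proposed
 * getexecpath() would return.  Returns 0 on success, -1 if execpath
 * is NULL or the result does not fit in buf.
 */
static int
path_near_executable(const char *execpath, const char *rel,
    char *buf, size_t len)
{
	const char *slash;
	int n;

	if (execpath == NULL)		/* e.g. run via fexecve(2) */
		return -1;
	slash = strrchr(execpath, '/');
	if (slash == NULL) {
		/* No directory part: relative to the cwd at exec time. */
		n = snprintf(buf, len, "./%s", rel);
	} else {
		/* Keep everything before the last slash verbatim. */
		n = snprintf(buf, len, "%.*s/%s",
		    (int)(slash - execpath), execpath, rel);
	}
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With execpath "bin/prog" and rel "../share/data" this yields
"bin/../share/data".  A relative result is only meaningful while the
process still has the working directory it was started with.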


STATUS QUO:

We have several methods to get something similar, but nothing exactly
the same, and none of them is defined independently of concurrent file
system activity:

- AT_SUN_EXECNAME is currently passed as $(pwd)/${execpath}.
  => From 2007 to 2015, AT_SUN_EXECNAME was ${execpath} if absolute
     and omitted altogether otherwise.
  => Since what I propose is currently always a suffix of
     AT_SUN_EXECNAME, we can simply add the new auxiliary vector entry
     as a pointer into the same buffer.

- /proc/self/exe is a symlink to whatever $(pwd)/${execpath} resolved
  to at exec time.

- /proc/curproc/file is the executable vnode itself (as if it were
  hard-linked there).

- sysctl {CTL_KERN, KERN_PROC_ARGS, -1, KERN_PROC_PATHNAME} gives what
  $(pwd)/${execpath} resolved to at exec time.

- For programs run with fexecve, all of these instead return what

	$(cd $(dirname ${execpath}) && pwd)/$(basename ${execpath})

  resolved to at exec time using vnode_to_path (with the caveat that
  namecache eviction may lead this to fail altogether).
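
As a concrete reference point, querying two of the existing methods
from C looks roughly like this.  It is a sketch: the helper name
current_exe_path is made up, the sysctl branch is only compiled on
NetBSD and uses the MIB order listed above, and the fallback assumes
procfs is mounted with a /proc/self/exe symlink.

```c
#include <sys/types.h>
#if defined(__NetBSD__)
#include <sys/param.h>
#include <sys/sysctl.h>
#endif
#include <stdio.h>
#include <unistd.h>

/*
 * Store the resolved path of the running executable in buf.
 * Returns 0 on success, -1 if no method produced an answer.
 */
static int
current_exe_path(char *buf, size_t len)
{
	ssize_t r;

	if (len == 0)
		return -1;
#if defined(__NetBSD__)
	{
		/* NetBSD order: the pid (-1 = self) precedes the selector. */
		const int mib[] = {
			CTL_KERN, KERN_PROC_ARGS, -1, KERN_PROC_PATHNAME
		};
		size_t n = len;

		if (sysctl(mib, 4, buf, &n, NULL, 0) == 0)
			return 0;
	}
#endif
	/* Fallback: read the procfs symlink (requires procfs mounted). */
	r = readlink("/proc/self/exe", buf, len - 1);
	if (r < 0)
		return -1;
	buf[r] = '\0';	/* readlink(2) does not NUL-terminate */
	return 0;
}
```

Either way the answer is a resolved absolute path, not the string the
caller of execve(2) supplied.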

None of these methods offers a reliable way to recover exactly the
original path that was given to exec, because a program given
/foo/bar/baz can't distinguish whether $(pwd) was /foo and ${execpath}
was bar/baz or $(pwd) was /foo/bar and ${execpath} was baz.
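
The ambiguity is easy to see in a few lines of C.  This is only an
illustration; join_cwd and same_joined are hypothetical helpers that
model the $(pwd)/${execpath} concatenation, not kernel code.

```c
#include <stdio.h>
#include <string.h>

/* Model of $(pwd)/${execpath}: plain concatenation with a slash. */
static int
join_cwd(const char *cwd, const char *execpath, char *buf, size_t len)
{
	int n = snprintf(buf, len, "%s/%s", cwd, execpath);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Return 1 if two different (cwd, execpath) pairs produce the same
 * joined string, 0 if they differ, -1 on overflow.
 */
static int
same_joined(const char *cwd1, const char *path1,
    const char *cwd2, const char *path2)
{
	char a[1024], b[1024];

	if (join_cwd(cwd1, path1, a, sizeof(a)) == -1 ||
	    join_cwd(cwd2, path2, b, sizeof(b)) == -1)
		return -1;
	return strcmp(a, b) == 0;
}
```

Here same_joined("/foo", "bar/baz", "/foo/bar", "baz") returns 1: both
pairs collapse to /foo/bar/baz, so the split point is lost.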

In contrast, with procfs mounted, it is possible to recover the
vnode_to_path result even for programs run without fexecve: open
/proc/self/file and use fcntl F_GETPATH.  And, with just getexecpath(),
it is always possible to recover the $(pwd)/${execpath} currently
passed as AT_SUN_EXECNAME, by simply prepending getcwd() output (and
without adding new races, either).


OTHER SYSTEMS:

Other operating systems also have similar but slightly different
methods -- and I would guess that they can all fail to give any answer
at all in some cases of fexecve:

- FreeBSD's AT_EXECPATH is ${execpath} verbatim if it is absolute, or
  roughly what

	$(cd $(dirname ${execpath}) && pwd)/$(basename ${execpath})

  resolves to at _exec_ time if it is relative.

  . FreeBSD ELF AT_EXECPATH:
    https://cgit.freebsd.org/src/tree/sys/kern/imgact_elf.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n1467
  . imgp->execpathp initialization:
    https://cgit.freebsd.org/src/tree/sys/kern/kern_exec.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n1700
  . imgp->execpath initialization:
    https://cgit.freebsd.org/src/tree/sys/kern/kern_exec.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n501

- FreeBSD's sysctl {CTL_KERN, KERN_PROC, KERN_PROC_PATHNAME, -1},
  /proc/self/exe, and /proc/self/file all give roughly what

	$(cd $(dirname ${execpath}) && pwd)/$(basename ${execpath})

  resolves to at _query_ time, rather than at exec time --
  specifically, the directory is resolved at exec time and its vnode
  is persistently stored in the struct proc, but the pwd is resolved
  at query time.  (If the directory or file has been deleted, something
  else happens.)

  I have seen applications explicitly prefer the AT_EXECPATH semantics
  (passing absolute paths through verbatim) because the sysctl and
  /proc semantics `may not return the desired path if there are
  multiple hardlinks to the file'.

  Note that /proc/self/file is _not_ a `hard link' to the executable
  file -- it has the same semantics as /proc/self/exe.

  Note that the MIB ordering is different from NetBSD.  (Yes, I've
  found this bug in pkgsrc patches that were evidently not tested!)

  . sysctl kern.proc.pathname:
    https://cgit.freebsd.org/src/tree/sys/kern/kern_proc.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n3389
    https://cgit.freebsd.org/src/tree/sys/kern/kern_proc.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n2325

  . /proc/*/exe:
    https://cgit.freebsd.org/src/tree/sys/fs/procfs/procfs.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n193
    https://cgit.freebsd.org/src/tree/sys/fs/procfs/procfs.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n74

  . proc_get_binpath:
    https://cgit.freebsd.org/src/tree/sys/kern/kern_proc.c?id=35a2229b67914ff1a4bae6334ad5015aa603967a#n2254

- I think Linux /proc/self/exe has the same semantics as FreeBSD
  /proc/self/exe but the code is unclear and I got bored of chasing it
  or experimenting.

- Solaris's AT_SUN_EXECNAME and getexecname() are ${execpath} but with
  intermediate ./ and ../ components simplified.  Not necessarily
  absolute.  I don't think any symlinks are resolved, just
  intermediate ./ and ../ components, but I'm not sure about symlinks.

  . Oracle documentation:
    https://docs.oracle.com/cd/E36784_01/html/E36874/getexecname-3c.html

  . illumos source reference:
    o lookuppn simplifies intermediate ./ and ../ components: https://github.com/illumos/illumos-gate/blob/d3fbc1f35b71e399da966ef9ed66f66762d4afba/usr/src/uts/common/fs/lookup.c#L504-L547
    o Resolved path is copied to args->pathname in exec: https://github.com/illumos/illumos-gate/blob/d3fbc1f35b71e399da966ef9ed66f66762d4afba/usr/src/uts/common/os/exec.c#L357
    o args->pathname is fed into AT_SUN_EXECNAME in exec: https://github.com/illumos/illumos-gate/blob/d3fbc1f35b71e399da966ef9ed66f66762d4afba/usr/src/uts/common/os/exec.c#L1750-L1765

- macOS's _NSGetExecutablePath gives something that I'm not sure is
  guaranteed to be absolute, but it is documented _not_ to resolve
  symlinks (and I'm guessing may not resolve intermediate ./ or ../
  components either):

  https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/dyld.3.html

  I also got bored trying to chase through the code at
  <https://github.com/apple-opensource/dyld> after this route:

  . _NSGetExecutablePath:
    https://github.com/apple-opensource/dyld/blob/e3f88907bebb8421f50f0943595f6874de70ebe0/dyld3/APIs.cpp#L679-L698
  . AllImages::imagePath(const closure::Image *):
    https://github.com/apple-opensource/dyld/blob/e3f88907bebb8421f50f0943595f6874de70ebe0/dyld3/AllImages.cpp#L987-L997
  . Image::path():
    https://github.com/apple-opensource/dyld/blob/e3f88907bebb8421f50f0943595f6874de70ebe0/dyld3/Closure.cpp#L209-L215
  . pathWithHash, looks like ELF auxv equivalent but I don't know
    where that gets passed in, nothing obvious turned up in a quick
    search of <https://github.com/apple-opensource/xnu>:
    https://github.com/apple-opensource/dyld/blob/e3f88907bebb8421f50f0943595f6874de70ebe0/dyld3/Closure.h#L81


POSTSCRIPT:

This came up while I was investigating why lang/racket stopped
building on NetBSD (which turned out to be because it was trying to
resolve /proc/curproc/file as if it were a symlink, and then trying to
open its own data files relative to that -- under /proc/curproc):

https://github.com/racket/racket/issues/5122

The investigation led me to file a PR (still open) for disagreement
between static executables and dynamic executables over what the main
object name should be according to dl_iterate_phdr, which led me to
find that FreeBSD's /proc/self/exe is slightly different from ours,
and so on:

PR lib/58865: static and dynamic dl_iterate_phdr disagree on main
object name (https://gnats.NetBSD.org/58865)

Maybe we should also record the directory vnode of each process's
executable so a variant of the vnode_to_path logic can be made to work
without relying on the namecache, like FreeBSD does for its semantics.
But it's not clear to me that some notion of canonical absolute path
is the right thing; I think the verbatim access path used by the
execve(2) caller is more likely to be useful, easier to understand,
and clearer to define.

