tech-userlevel archive


Re: Trivial program size inflation

> Date: Sat, 1 Jul 2023 15:11:56 -0000 (UTC)
> From: (Michael van Elst)
> crt0 pulls in
> - atexit
> - environment
> - static TLS
> - stack guard
> which all more or less pull in jemalloc, stdio and string functions.
> You need to replace these with dummies (compile with -fno-common)
> and of course, your program must not make use of the functionality...

A quicker way to address most of it is to just define your own malloc:

$ cat null.c
#include <stddef.h>
void *malloc(size_t n) { return NULL; }
void *realloc(void *p, size_t n) { return NULL; }
void *calloc(size_t n, size_t sz) { return NULL; }
void free(void *p) {}
int main(void) { return 0; }
$ cc -g -O2 -static -o null null.c
$ size null
   text	   data	    bss	    dec	    hex	filename
  26724	   3208	   3184	  33116	   815c	null

This still has printf, rbtree, string, atomic, &c., but not jemalloc,
giving a ~20x size reduction from half a megabyte to 25 KB or so.

If someone really wants to do the work to shrink static binaries without
providing an alternative malloc, or to shrink them further than an
alternative malloc does, here are some avenues that might be worth
pursuing without incurring too much overhead:

> int atexit(void) { return 0; };

The runtime startup logic, csu, relies on atexit.  But perhaps csu
could use an internal __atexit that reserves 4 or 5 static slots, and
libc's atexit could use the last one to call handlers registered in
slots that are dynamically allocated by malloc.

As long as your program doesn't call atexit, this only uses a fixed
amount of space from csu and won't bring in malloc.
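A minimal sketch of that split, assuming a fixed table of five slots
where the last one is held back for libc's atexit dispatcher.  The names
atexit_static, atexit_install_dispatcher and atexit_run_static are
hypothetical, not existing csu or libc symbols, and the malloc-backed
list that the dispatcher would walk is left out:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch: atexit_static, atexit_install_dispatcher,
 * atexit_run_static and ATEXIT_STATIC_SLOTS are illustrative names,
 * not NetBSD's csu/libc API.  Nothing here calls malloc.
 */
#define ATEXIT_STATIC_SLOTS 5

static void (*atexit_slots[ATEXIT_STATIC_SLOTS])(void);
static size_t atexit_nslots;
static bool atexit_dispatcher_installed;

/* csu registers into slots 0..ATEXIT_STATIC_SLOTS-2 only. */
int
atexit_static(void (*fn)(void))
{
	if (atexit_dispatcher_installed ||
	    atexit_nslots >= ATEXIT_STATIC_SLOTS - 1)
		return -1;
	atexit_slots[atexit_nslots++] = fn;
	return 0;
}

/*
 * libc's atexit would call this before adding a handler to its own
 * malloc-backed list; the dispatcher takes the reserved last slot.
 * Idempotent, so atexit can call it on every registration.
 */
int
atexit_install_dispatcher(void (*dispatch)(void))
{
	if (atexit_dispatcher_installed)
		return 0;
	atexit_slots[atexit_nslots++] = dispatch;
	atexit_dispatcher_installed = true;
	return 0;
}

/* Run handlers in reverse order of registration, as atexit requires. */
void
atexit_run_static(void)
{
	while (atexit_nslots > 0)
		atexit_slots[--atexit_nslots]();
}
```

Because the dispatcher is only installed when the program itself calls
atexit, and is registered after everything csu set up before main, it
runs first at exit; program handlers therefore still run before csu's,
in the reverse-registration order atexit requires.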

> char *__allocenvvar() { return 0; };
> bool __canoverwriteenvvar() { return true; };
> size_t __envvarnamelen() { return 0; };
> void *__findenv() { return 0; };
> void *__findenvvar() { return 0; };
> void __freeenvvar() { };
> ssize_t __getenvslot() { return 0; };
> void __libc_env_init() { };
> bool __readlockenv() { return true; };
> bool __unlockenv() { return true; };
> bool __writelockenv() { return false; };

Programs that use only getenv don't need any of the machinery to
allocate environment slots.  The logic that getenv uses could be
isolated to its own .c file with no allocation.

This more or less requires splitting up __getenvslot into two separate
functions, one for the allocate=true case and the other for the
allocate=false case, with a .h file to mediate the global state
between the two .c files.
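A sketch of the lookup-only half, assuming the allocating half lives in
a separate file and the two share only environ.  env_find_slot and
env_getenv_noalloc are hypothetical names; this is not NetBSD's
__getenvslot, and it omits the locking (__readlockenv) that the real
code does:

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

extern char **environ;

/*
 * Hypothetical lookup-only half of a split __getenvslot: return the
 * index of the entry NAME=... (NAME being the first LEN bytes of name)
 * in environ, or -1 if there is none.  Never allocates.
 */
ssize_t
env_find_slot(const char *name, size_t len)
{
	if (environ == NULL)
		return -1;
	for (size_t i = 0; environ[i] != NULL; i++) {
		/* Match the whole name, not just a prefix of a longer one. */
		if (strncmp(environ[i], name, len) == 0 &&
		    environ[i][len] == '=')
			return (ssize_t)i;
	}
	return -1;
}

/* getenv-style lookup built only on the non-allocating half. */
char *
env_getenv_noalloc(const char *name)
{
	size_t len = strcspn(name, "=");
	ssize_t slot;

	if (name[len] != '\0')	/* names containing '=' never match */
		return NULL;
	slot = env_find_slot(name, len);
	return slot < 0 ? NULL : environ[slot] + len + 1;
}
```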

__libc_env_init (which is what pulls all this in even if you don't use
getenv, setenv, &c.) could perhaps be a weak symbol with a strong
alias in the .c file that does allocation and modification.
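The weak-default pattern can be sketched like this, with
env_init_hook and libc_init_sketch as hypothetical stand-ins for
__libc_env_init and its caller:

```c
/*
 * Hypothetical sketch: env_init_hook and libc_init_sketch are
 * illustrative names, not NetBSD's actual symbols.
 *
 * This weak definition is the lookup-only default.  A strong
 * definition of the same symbol in the object implementing
 * setenv/putenv (which needs the allocation machinery) would
 * replace it at link time.
 */
__attribute__((weak)) int
env_init_hook(void)
{
	return 0;	/* nothing to allocate or lock */
}

/* Startup calls the hook unconditionally; the linker picks the body. */
int
libc_init_sketch(void)
{
	return env_init_hook();
}
```

With the usual static-archive rules, a weakly defined symbol does not
cause the linker to extract another archive member just to find a
strong definition, so the strong version would only win when the
setenv object is already being linked because the program references it.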

> void __libc_rtld_tls_allocate() { };
> void __libc_rtld_tls_free() { };
> void __libc_static_tls_setup() { };
> void __libc_tls_get_addr() { };

I'm stumped about this one.  In principle the linker has enough
information to decide whether __libc_static_tls_setup is needed (the
call to it in _libc_init is what pulls all this in), but in practice I
don't know of any path that would let us make its use conditional on
whether the object has any static TLS relocations.  Maybe rtld could be
responsible for mmapping the initial thread's static TLS space so that
libc doesn't have to be, but I'm not sure that will work without a lot
of effort.

> void __chk_fail() { };
> void __guard_setup() { };
> void __stack_chk_fail() { };
> void __stack_chk_fail_local() { };
> int __stack_chk_guard;

The stack protector's failure path calls syslog_ss, which brings in
xsyslog.c.  I'm not sure whether that brings in malloc or anything
exciting beyond vsnprintf_ss (which itself shouldn't malloc or be
exciting, since it has to be async-signal-safe).

But if it does, maybe the call to syslog_ss in __fail in
stack_protector.c could be defined in terms of some weak symbol
__stack_chk_log, which xsyslog.c defines using the syslog machinery,
with a fallback to writing to STDERR_FILENO; that way it only tries to
use syslog if something else in the program already uses syslog.
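A sketch of the weak-reference half, with stack_chk_log_sketch as a
hypothetical stand-in for the proposed __stack_chk_log and
stack_chk_report as a hypothetical stand-in for the logging step in
__fail:

```c
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical sketch: stack_chk_log_sketch and stack_chk_report are
 * illustrative names.  The hook is only declared, weakly; if nothing
 * in the link defines it, its address is null.
 */
extern void stack_chk_log_sketch(const char *msg) __attribute__((weak));

/* Returns 1 if the syslog hook was used, 0 if we fell back to stderr. */
int
stack_chk_report(const char *msg)
{
	if (stack_chk_log_sketch != NULL) {
		stack_chk_log_sketch(msg);
		return 1;
	}
	/* Fallback: one async-signal-safe write(2) to standard error. */
	(void)write(STDERR_FILENO, msg, strlen(msg));
	return 0;
}
```

An undefined weak reference does not make the linker extract an archive
member to resolve it, so the object defining the hook (here, whatever
would wrap xsyslog.c) is only linked when something else pulls it in.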

(But I'm not going to do this work, and I'm not sure if there's going
to be a good way to kick malloc out of the static TLS business without
toolchain and/or rtld support.)
