tech-userlevel archive


Re: Trivial program size inflation



    Date:        Sun, 2 Jul 2023 15:51:06 -0400 (EDT)
    From:        Mouse <mouse%Rodents-Montreal.ORG@localhost>
    Message-ID:  <202307021951.PAA07056%Stone.Rodents-Montreal.ORG@localhost>

  | For example, a program that calls printf but never uses any
  | floating-point values at all will not, in theory, need floating point
  | support.  But we do not have any mechanism by which anything can
  | discover that no floating-point printf formats are used and thus bring
  | in a printf variant that doesn't actually support floating point; this
  | means that a bunch of floating-point stuff will be brought in even
  | though it will never actually get used.

First, a different printf that doesn't support floats isn't needed:
printf (itself) has essentially no knowledge of anything related to
floats.

When everything used to be static (ie: back in my time...) a lot of
effort was expended making small programs stay small (both RAM for the
executing binary, and disc space for the executable file, were scarce
resources) by careful crafting of what was in libc.a and the order it
all appeared.   Keeping that correct took much work, and it was very easy
to end up with multiple symbol definition errors from linking an innocuous
(and correct) program that just happened to be slightly different than
had been expected.

The issue above was solved by having dummy versions of the floating point
to string conversion routines (which did nothing, and so were very small).
The compiler helped, by inserting a reference to a well known symbol, if
the program being compiled contained any float or double references.
The real floating conversion routines defined that symbol, the dummy ones
did not.   libc was constructed (as far as is relevant here) with the
real conversion routines first, then printf, then the dummy conversion
routines following.

If the program used any floating point, then the compiler-inserted undefined
reference to the magic symbol would cause the real conversion routines to
be linked (as they would be if explicitly called by the program, but that's
very unlikely).   Then printf would be linked (we're assuming the program
uses printf, or this issue isn't relevant).   If the floating point conversion
routines were already linked, they satisfy the undefined symbols in the printf
object file(s).   If they weren't, those remained undefined until the dummy
routines were encountered, later in libc, at which point they'd be loaded.
Since we know the program isn't using floating point to get to that point,
they'd never be called.
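
The ordering trick above can be modelled with a toy single-pass archive
scan.  This is only a sketch under stated assumptions: the member names
(fltcvt_real.o, printf.o, fltcvt_dummy.o), the symbol names (__uses_float,
__fltcvt) and the functions link_once() and print_linked() are all made up
for illustration, and a real linker does much more than this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 4      /* per-member list size, including the NULL terminator */
#define MAX_SET  32     /* capacity of the working symbol sets */

/* One archive member: the symbols it defines and the ones it references. */
struct member {
    const char *name;
    const char *defines[MAX_SYMS];
    const char *needs[MAX_SYMS];
};

/* Toy libc.a in the order described: real conversions, printf, dummies. */
static const struct member toy_libc[] = {
    { "fltcvt_real.o",  { "__uses_float", "__fltcvt", NULL }, { NULL } },
    { "printf.o",       { "printf", NULL },                   { "__fltcvt", NULL } },
    { "fltcvt_dummy.o", { "__fltcvt", NULL },                 { NULL } },
};
#define N_MEMBERS (sizeof toy_libc / sizeof toy_libc[0])

static int in_set(const char *const *set, int n, const char *sym)
{
    for (int i = 0; i < n; i++)
        if (strcmp(set[i], sym) == 0)
            return 1;
    return 0;
}

/*
 * One forward pass over the archive.  A member is linked only if it
 * defines a symbol that is referenced and not yet defined at the moment
 * the member is reached.  Returns a bitmask of the linked members.
 */
unsigned link_once(const char *const *prog_undef, int n_undef)
{
    const char *refd[MAX_SET];  /* symbols referenced so far */
    const char *defd[MAX_SET];  /* symbols defined so far */
    int nr = 0, nd = 0;
    unsigned linked = 0;

    assert(n_undef >= 0 && n_undef <= MAX_SET / 2);
    for (int i = 0; i < n_undef; i++)
        refd[nr++] = prog_undef[i];

    for (size_t m = 0; m < N_MEMBERS; m++) {
        const struct member *mem = &toy_libc[m];
        int wanted = 0;

        for (int d = 0; d < MAX_SYMS && mem->defines[d]; d++)
            if (in_set(refd, nr, mem->defines[d]) &&
                !in_set(defd, nd, mem->defines[d]))
                wanted = 1;
        if (!wanted)
            continue;           /* nothing here is needed yet: skip it */

        linked |= 1u << m;
        for (int d = 0; d < MAX_SYMS && mem->defines[d]; d++)
            defd[nd++] = mem->defines[d];
        for (int r = 0; r < MAX_SYMS && mem->needs[r]; r++)
            if (!in_set(refd, nr, mem->needs[r]))
                refd[nr++] = mem->needs[r];
    }
    return linked;
}

/* Print the names of the members selected by link_once(). */
void print_linked(unsigned mask)
{
    for (size_t m = 0; m < N_MEMBERS; m++)
        if (mask & (1u << m))
            printf("%s\n", toy_libc[m].name);
}
```

Called with { "printf", "__uses_float" } as the program's undefined symbols,
link_once() selects fltcvt_real.o and printf.o; called with just { "printf" }
it selects printf.o and fltcvt_dummy.o, which is the behaviour described
above.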

Note that this isn't quite "discover that no floating-point printf formats
are used" - there was never an attempt to do that, but if the program
does

	printf("%f", x);

what is 'x' in a valid program?   What can it be that the compiler would
not know that the program is using floating point?

Even "*(double *)&long_var" is enough for floats to be considered used.

If you manage to call printf with a floating format, and pass it something
that the compiler does not believe is, or is to be treated as, any kind
of floating point data (even if it happens to be) and the program uses no
floats elsewhere, anywhere, then you lose...   Trivial to fix, you just
declare some float variable, somewhere.

I don't know if current compilers provide this kind of assistance, or not,
but they could.

Similar work, with different case-specific details, can be done to handle all
of the other (largish) systems ... eg: when a program exits, exit() or something
it calls, needs to make sure all stdio buffers are flushed (typically
by doing fclose() on each of them, but the close part isn't as important,
the exit sys call accomplishes that - but that cannot ensure that unwritten
buffered data has been flushed to files first).   That means that you get
large chunks of stdio linked, even if your program doesn't include <stdio.h>
or use any of it (and since stdio uses malloc() you get that as well).
You can attempt to avoid this by calling _exit() instead of exit(), but
as "falling off the end of main" is defined as a call of exit(0), the
run time support doesn't know that exit() won't be needed, and links it
anyway (even if the compiler knows the program will never simply fall
off the end of main()).

With enough work that can be handled as well.
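
One shape that work can take (an assumption here, not necessarily what was
done on the systems described above) is for exit() to call through a
function pointer that stdio fills in the first time it is used, so exit()
itself never references stdio.  A minimal single-file sketch, with all names
(toy_cleanup_hook, toy_stdio_init, toy_exit_prepare, toy_flush_calls)
hypothetical:

```c
#include <stddef.h>

/* ---- would live in the exit() object file: no reference to stdio ---- */

/* Filled in by stdio on first use; stays NULL if stdio is never touched. */
void (*toy_cleanup_hook)(void) = NULL;

/* What exit() would do just before making the exit sys call. */
void toy_exit_prepare(void)
{
    if (toy_cleanup_hook != NULL)
        toy_cleanup_hook();
}

/* ---- would live in a stdio object file ---- */

/* Counts flush passes so the effect is observable in this sketch. */
int toy_flush_calls = 0;

static void toy_flush_all(void)
{
    /* A real libc would walk its open streams and flush each one here. */
    toy_flush_calls++;
}

/* Called by stdio the first time any stream is set up. */
void toy_stdio_init(void)
{
    toy_cleanup_hook = toy_flush_all;
}
```

A program that never reaches toy_stdio_init() never gives the linker a
reason to pull in the stdio side, and toy_exit_prepare() does nothing.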

And then on to the next problem ...   and the next ...

Since in practice, almost no-one uses static linking for almost anything
any more (except via crunchgen for /rescue, which has so much linked in
that the whole of libc is a drop in the bucket, and most of it is needed,
by something, anyway) there aren't many people willing to attempt to
manage all of this, and keep it working.    Believe me, while possible,
it isn't easy - and the smallest changes in the oddest of places can
require a lot of work, and playing around, to keep it all working properly.

For some of this you need linker/binary format support so the library
can have routines which define symbols, which resolve references in the
program if that routine is linked - but for which the presence of the
symbol is not advertised, so the routine will not be linked just because
the symbol is unresolved and it is defined in that routine - something
else needs to cause the routine to be linked first.
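
On ELF toolchains a related effect is available from the referencing side
rather than the defining side: an undefined weak reference does not cause an
archive member to be extracted, but it is resolved if the member gets linked
for some other reason.  That is not the same mechanism as the one described
above, only a similar outcome.  A sketch, assuming GCC or Clang (the weak
attribute is an extension) and the hypothetical names toy_real_fltcvt and
toy_format_double:

```c
#include <stddef.h>

/*
 * Weak declaration (GCC/Clang extension).  If no linked object defines
 * toy_real_fltcvt, its address is null instead of being a link error.
 */
extern size_t toy_real_fltcvt(char *buf, size_t len, double v)
    __attribute__((weak));

/*
 * Use the real conversion routine if something else caused it to be
 * linked; otherwise behave like a dummy and produce an empty string.
 */
size_t toy_format_double(char *buf, size_t len, double v)
{
    if (toy_real_fltcvt)
        return toy_real_fltcvt(buf, len, v);
    if (len > 0)
        buf[0] = '\0';
    return 0;
}
```

Built on its own, with nothing defining toy_real_fltcvt, the fallback path
is the one taken.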

Personally, I don't see any point, and I know I won't be working on that
kind of thing, ever again - had enough of that, back when it really mattered,
long long ago.   If you link static binaries, without doing everything needed
to avoid it, and if no-one has done the work to make the static libc able to
handle all of this, and you're not willing to mangle your source code with
all the dummy routines that others have been suggesting, so that the libc
versions never get linked, then you're going to get big binaries.  Live
with it, or do the work to fix it yourself.

kre


