NetBSD-Users archive


shared libraries vs. static binaries (Was: Postfix and local mail delivery - still relevant in 2020?)



At Mon, 8 Jun 2020 08:58:51 +0100, Sad Clouds <cryintothebluesky%gmail.com@localhost> wrote:
Subject: Re: Postfix and local mail delivery - still relevant in 2020?
>
> On Sun, 07 Jun 2020 15:12:56 -0700
> "Greg A. Woods" <woods%planix.com@localhost> wrote:
>
> > However when you put _all_ the code for _all_ the system's programs
> > into one single lone binary, with no shared libraries, then _all_
> > text pages are shared entirely for _all_ processes all of the time,
> > no matter what program they are running as.
>
> OK thanks for the explanation. So as long as it is a single executable,
> text pages are shared, but if you create multiple executables on disk,
> i.e. different inodes, then you do get duplicate text pages loaded into
> RAM?

Yes, though "duplicate text pages" is a bit of a misnomer -- the same
fragments of code (say, some functions from libc) get linked into the
text pages of each binary, and those pages have to be paged in
separately for each binary when a process starts from it.  So when two
different programs are running there will be duplicate code in separate
pages in memory, but only one copy per program, no matter how many
processes are running that program.

In my personal experience the tradeoff of having all static-linked but
separate binaries per program is still well worthwhile (provided you're
not using an extremely RAM-starved system).

The disk use is probably the biggest loss:  ~2GB for an i386 install
(without x11, another 600MB for x11).  Link time and the time to build
sets and their checksums are also longer, but only developers do that
every day.  :-)  There's currently still also a fair chunk of local
diffs in my source tree to make it all work, and that requires merging
on every update, but for now that's just my own personal overhead.

However the static linker is very efficient when given the kinds of
libraries NetBSD has (where each function is in its own compilation
unit, so there's usually only one function per .o file), so each program
contains just the code for the functions it needs.  Demand paging for
executables is also a wondrous idea:  it means each running process can
have a relatively small resident set of text pages in memory.  (It could
be even more efficient if less-used code were put into separate pages,
but that's a tricky proposition -- though I've heard of experiments with
re-adjusting page layout after having collected stats from typical use
of the system.)
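
As a rough illustration of the member-at-a-time linking described above,
here is a toy model in Python.  The archive contents and symbol names
are invented for the example, and the loop is only a sketch of the
general idea (pull in a member only if it defines a symbol that is still
undefined); it is not how ld is actually implemented.

```python
# Toy model of a static linker pulling members out of an archive (.a).
# All member and symbol names are hypothetical; this is not NetBSD's
# real libc layout, nor ld's real algorithm.

# member name -> (symbols it defines, symbols it references)
ARCHIVE = {
    "strlen.o": ({"strlen"}, set()),
    "puts.o": ({"puts"}, {"strlen", "write"}),
    "write.o": ({"write"}, set()),
    "qsort.o": ({"qsort"}, {"memcpy"}),
    "memcpy.o": ({"memcpy"}, set()),
}


def link(undefined, archive):
    """Return the sorted archive members needed to resolve `undefined`."""
    undefined = set(undefined)
    defined = set()
    pulled = set()
    progress = True
    while progress:  # rescan until a full pass pulls in nothing new
        progress = False
        for member, (defs, refs) in archive.items():
            if member in pulled:
                continue
            if defs & undefined:  # member defines something still needed
                pulled.add(member)
                defined |= defs
                undefined |= refs  # its own references must be resolved too
                undefined -= defined
                progress = True
    return sorted(pulled)


if __name__ == "__main__":
    # A program that only calls puts() never drags in qsort.o or memcpy.o.
    print(link({"puts"}, ARCHIVE))
```

With one function per .o file the granularity is a single function, so
the unused members never reach the executable at all.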

So, for example, that little Soekris i386 machine with only 512MB of RAM
running a fully static linked individual binary install still has only
3% of its memory (4400 pages) in use for executable code pages once it's
running a normal multi-user boot and with a couple of shell sessions
open from locally running xterm processes (open to the Xserver on my
desktop machine).
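
The arithmetic behind that figure can be checked with a few lines of
Python.  The 4 KiB page size is an assumption (it is the usual page size
on i386), and the calculation uses the nominal 512MB rather than
whatever the kernel actually reports as usable memory.

```python
# Back-of-the-envelope check of the "3% of memory" figure.
PAGE_SIZE = 4096                 # bytes; assumed 4 KiB pages (usual on i386)
RAM_BYTES = 512 * 1024 * 1024    # nominal RAM, not the kernel's usable total
TEXT_PAGES = 4400                # executable pages in use, from the message


def text_share(pages, page_size, ram_bytes):
    """Fraction of RAM occupied by the given number of pages."""
    return pages * page_size / ram_bytes


share = text_share(TEXT_PAGES, PAGE_SIZE, RAM_BYTES)
print(f"{TEXT_PAGES * PAGE_SIZE / 2**20:.1f} MiB of text, {share:.1%} of RAM")
# prints "17.2 MiB of text, 3.4% of RAM"
```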

Meanwhile because nothing needs to run ld.so for every process exec, the
startup time of commonly used programs (i.e. ones whose executable pages
stay cached in memory) is incredibly fast and snappy, especially for a
system with a 500MHz clock.  It "feels" faster for command-line use than
a machine nearly 10 times as fast that's running all dynamic-linked
binaries (and don't even get me started on the brain damage of paging
that happens in an even more aggressively dynamic-linked OS like macOS
when you exec or re-exec a big program that uses several libraries and
frameworks, even with "prelinking", e.g. a modern browser).  Starting
static-linked X11 processes like xterm is incredibly fast too, and the
gain is larger there since they typically need even more libraries than,
for example, the postfix programs.  Long long ago I posted some raw
performance tests, taken on
an old Sun3 I think, for an X program that didn't have to actually open
a connection and could be run in a shell loop for timing, but still
needed a half dozen libraries or more.  If I remember right the loop
running the static-linked version was nearly two orders of magnitude
faster.
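
A timing loop of that kind could be sketched as follows.  This is not
the original test, just a minimal stand-in:  the function name
time_execs is made up here, and because it measures from Python the
result includes Python's own process-spawning overhead, so it is only
useful for comparing two builds of the same program against each other.

```python
# Minimal sketch of an exec timing loop (not the original Sun3 test).
import subprocess
import sys
import time


def time_execs(argv, runs=50):
    """Return mean wall-clock seconds per spawn-and-wait of argv."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,  # fail loudly if the program exits non-zero
        )
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    if len(sys.argv) > 1:
        mean = time_execs(sys.argv[1:])
        print(f"{mean * 1000:.2f} ms per exec")
    else:
        print("usage: give the command to time as arguments")
```

Run it once against a static-linked binary and once against a
dynamic-linked build of the same program, and compare the two means.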

The real fun begins with a bigger, more modern server, e.g. my Dell
systems with 32GB of RAM and 3,160MHz CPUs.  Now the savings from not
having to run ld.so for every process exec are amazing, and while one
might lose a few percent of that vast memory to exec pages, the tradeoff
is still very good.  A static-linked tool-chain still means faster
builds (despite the longer link and sets times)!

--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>



