Subject: Re: shared library support
To: None <tech-kern@NetBSD.ORG>
From: None <jiho@postal.c-zone.net>
List: tech-kern
Date: 03/17/1998 11:26:38
>> One thing I do know is that with the Mach vm system, shared libraries
>> don't share code pages properly.  I've tested things pretty thoroughly, 
>> and that conclusion sifts out consistently.  Code pages from core process 
>> files are properly shared; those from shared library files are not.  You 
>> are literally better off to run a completely statically linked system, 
>> than to use any shared libraries (especially with X).  
>
> hmm, well, i haven't observed anything along these lines.   if you look
> at how both static and dynamic binaries are run you'll note that it all
> goes through the same memory-mapping mechanism.  for the static part of
> a demand-paged binary, we are routed through vmcmd_map_pagedvn() in
> kern/exec_subr.c to vm_mmap().   for programs that use shared libs, 
> ld.so uses mmap(2) to map the libs in.   the code path on that would
> be sys_mmap() -> vm_mmap() [all in vm/vm_mmap.c].    so, both static
> and dynamic code-related memory mappings boil down to vm_mmap() calls.

Exactly, and you can even verify that both code paths apply the same vm
features (shared vs. private vs. copy-on-write).  As far as I can tell, only
three points distinguish the two cases:

  1. In the exec (core process) case, the proc structure records the vnode
     of the core process file on disk.  But nothing I know of in the vm
     system pays any attention to this record, nor does it seem to affect
     related matters like vnode lookup.

  2. That core process vnode is flagged VTEXT.  The open() call uses this
     flag to deny write access to an active demand-paged process file (see
     the sketch after this list).  But I found nothing in the vm system that
     looks at the flag, and forcing shared libraries to get flagged VTEXT
     makes no difference.

  3. The core process file gets mapped at the beginning of user vm space,
     while shared library files get mapped opportunistically in the user
     mmap area, which starts above the process heap limit.  This looks like
     the most promising clue to the cause, but I've never found anything in
     the vm code that behaves differently as a result of the distinction.

So I'm stumped.
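
(For reference, the VTEXT check mentioned in point 2 lives on the
open()-for-write path.  In effect the kernel does something like the
following; this is sketched from memory, not quoted from the source:

     if ((fmode & FWRITE) && (vp->v_flag & VTEXT))
             return (ETXTBSY);      /* "Text file busy" */

Nothing comparable happens on the vm side, which is the point.)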


> what sort of tests did you run when looking into this?

There are two: one from a Linux user, and one NetBSD-specific test of my own.

The Linux user reported on one of their newsgroups that, with a large Motif
client under X, he could start more statically linked instances than
dynamically linked ones.  He pointed out that under HP-UX the reverse was
true, and indeed, when things are working properly the reverse should always
hold: shared library code pages are shared by every process that maps them,
so dynamically linked instances should cost less memory apiece.  That's
simple common sense.  Aside from any kernel vm similarities, the one thing
Linux and NetBSD have in common is the GNU compiler suite.
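
To put illustrative numbers on it (purely hypothetical sizes): suppose the
client's own text is 1 MB and the X/Motif library text totals 3 MB.
Statically linked, every instance maps the same 4 MB binary, so N instances
share one 4 MB text image.  Dynamically linked with working sharing, N
instances likewise cost about 4 MB of text in total; with broken sharing
they cost about 1 MB plus 3 MB per instance, and the statically linked case
wins, which is exactly what he saw.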

My own test uses the tiny program included at the end of this message.  If the
preceding hint from Linux didn't convince you, perform the following steps:

  1. Compile two versions of the program, one static, one shared.  To compile
     the static case use the following command:

       gcc -O2 -static -nostartfiles -o <case1> /usr/lib/scrt0.o <source>.c

     By default crt0.o is used, which is the shared library startup file and
     builds in code that is useless here.  This is meant to be a tightly
     controlled experiment.  The shared case is simply:

       gcc -O2 -o <case2> <source>.c

  2. Reboot the system, and open a second VT (this assumes virtual terminal
     support is compiled into your kernel).  Start 'systat vmstat', and note
     down the initial vm statistics.  Pay special attention to the free,
     active, wired, and total page counts.  (The sysctl sketch after this
     list samples the free and active counts programmatically, if you prefer
     that to eyeballing systat.)

  3. Switch back to the first VT, and start an instance of <case1> in the
     background (with &).  Switch to the second VT, wait for the numbers to
     stabilize, and note the same set of page counts.

  4. Repeat step 3 several times.

  5. Start over from step 2, but substitute <case2>.

  6. Evaluate the page count behavior for the two cases.  After factoring out
     the kernel's per-process management overhead (wired pages, some two
     dozen), you will find that each new <case1> instance moved just one
     stack page from the free list to the active list, while each new
     <case2> instance moved a few dozen additional pages from the free list
     to the active list.  A few dozen pages per instance is roughly what
     you'd expect if each instance privately instantiated its shared library
     code pages instead of sharing them.
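
Here is the sketch mentioned in step 2.  It reads the free and active page
counts through the sysctl(3) VM_METER interface; this is from memory, so
treat it as a sketch rather than gospel (and note that struct vmtotal
carries no wired page count, so systat is still needed for that):

  #include <sys/types.h>
  #include <sys/sysctl.h>
  #include <sys/vmmeter.h>
  #include <stdio.h>

  int main(void)
    {
    int mib[2]={ CTL_VM, VM_METER };
    struct vmtotal vmt;
    size_t len=sizeof(vmt);

    /* VM_METER fills in a struct vmtotal snapshot of the vm counters. */
    if (sysctl(mib, 2, &vmt, &len, NULL, 0) == -1)
      { perror("sysctl"); return(1); }

    printf("free pages: %ld  active real pages: %ld\n",
           (long)vmt.t_free, (long)vmt.t_arm);

    return(0);
    }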

I welcome serious argument, as I really want to solve this puzzle.

Here follows the test program.


-------CUT HERE---------------------------------------CUT HERE----------------- 

#include <sys/types.h>
#include <unistd.h>


#define NONE   0
#define TRUE   1


int main(int argc, char *argv[])
  {
  pid_t pid=getpid();   /* fetch our pid (unused; just a trivial syscall) */

  while(TRUE)           /* idle forever; kill(1) each instance when done */
    { sleep(5); }

  return(NONE);         /* never reached */
  }

---------CUT HERE-------------------------------------CUT HERE-----------------


--Jim Howard  <jiho@mail.c-zone.net>

