Subject: Re: shared library support
To: None <jiho@postal.c-zone.net>
From: Chuck Cranor <chuck@dworkin.wustl.edu>
List: tech-kern
Date: 03/18/1998 11:06:22
Jim Howard <jiho@mail.c-zone.net>:
>So I'm stumped.
well, let's sort out exactly what is going on based on your test program.
first, rather than using "systat" or "vmstat" let's use the OS source
code instead so we can find out _exactly_ who is mapping what. using
an i386 UVM+PMAP_NEW kernel, i've added a function:
void pmap_dump(struct pmap *pmap, vm_offset_t start_va, vm_offset_t end_va)
this function dumps out all active user-level mappings in a specified
range of a pmap. if "start_va == end_va" it dumps out all user-level
mappings.
> 1. Compile two versions of the program, one static, one shared.
> gcc -O2 -static -nostartfiles -o <case1> /usr/lib/scrt0.o <source>.c
> gcc -O2 -o <case2> <source>.c
ok:
# size c.stat c.dyn
text    data    bss     dec     hex
4096    4096    0       8192    2000    c.stat
4096    4096    0       8192    2000    c.dyn
#
each program has one page of text, one page of data, no BSS, and
most likely one page of stack.
> 2. Reboot the system, and open a second VT (you do have virtual terminal
> support compiled into your kernel). Start 'systat vmstat', and note down
> the initial vm statistics. Pay special attention to the free page count,
> the active page count, the wired page count, and the total page count.
ok, but rather than systat, let us use ddb and pmap_dump():
# ./c.stat &
[1] 20
#
Stopped at _Debugger+0x4: leave
db> ps/a
PID COMMAND STRUCT PROC * UAREA * VMSPACE/VM_MAP
20 c.stat 0xf0416200 0xf2f23000 0xf0420300
<<rest of processes deleted>>
so we know the process' vm_map is at 0xf0420300 and we can dump its
current mappings:
db> show map/f 0xf0420300
MAP 0xf0420300: [0x0->0xeffbf000]
pmap=0xf03fa680, #ent=5, sz=33562624, ref=1, main=T, version=10
- 0xf041fb00: 0x1000->0x2000: obj=0xf041e600/0x0, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=5/7, inh=1, wc=0, adv=0
- 0xf041f180: 0x2000->0x3000: obj=0xf041e600/0x1000, amap=0xf041f400/0
map=F, submap=F, cow=T, nc=F, prot(max)=7/7, inh=1, wc=0, adv=0
- 0xf0417280: 0xedbfe000->0xef9fe000: obj=0x0/0x0, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=0/7, inh=1, wc=0, adv=0
- 0xf041fec0: 0xef9fe000->0xefbf0000: obj=0x0/0x0, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=7/7, inh=1, wc=0, adv=0
- 0xf041f7c0: 0xefbf0000->0xefbfe000: obj=0x0/0x0, amap=0xf041ff80/0
map=F, submap=F, cow=T, nc=F, prot(max)=7/7, inh=1, wc=0, adv=0
note that the process has 5 mappings. they are:
[1] text (read-only/copy-on-write)
[2] data (read-write/copy-on-write)
[3] stack reserved area (prot = 0) [in case we grow ulimit]
[4] stack reserved area (prot = r/w)
[5] currently allocated stack
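
the prot(max)=5/7 style fields in the map dump are bit masks. as a rough
sketch, assuming the usual Mach-style encoding (read=1, write=2,
execute=4), they can be decoded like this. prot_str and the DEMO_ names
are made up for illustration and are not part of the kernel:

```c
#include <assert.h>
#include <stddef.h>

/* assumed Mach-style protection bits (read=1, write=2, execute=4) */
#define DEMO_PROT_READ    0x1
#define DEMO_PROT_WRITE   0x2
#define DEMO_PROT_EXECUTE 0x4

/*
 * prot_str (hypothetical helper): render a protection mask as "rwx" text.
 * buf must have room for 4 bytes, including the terminating NUL.
 */
const char *prot_str(int prot, char buf[4])
{
	assert(buf != NULL);
	buf[0] = (prot & DEMO_PROT_READ) ? 'r' : '-';
	buf[1] = (prot & DEMO_PROT_WRITE) ? 'w' : '-';
	buf[2] = (prot & DEMO_PROT_EXECUTE) ? 'x' : '-';
	buf[3] = '\0';
	return buf;
}
```

read this way, the text entry (5/7) is currently r-x with a maximum of
rwx, the data entry (7/7) is rwx, and the first stack entry (0/7) is
currently inaccessible.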
now let's see what is in the process' pmap mappings:
db> call pmap_dump(0xf03fa680,0,0)
va 0x1000 -> pa 0x89c000 (pte=0x89c425)
va 0x2000 -> pa 0x89f000 (pte=0x89f467)
va 0xefbfd000 -> pa 0x89a000 (pte=0x89a467)
0x29f000
there you've got a page of text, a page of data, and a page of stack.
note that this doesn't count the pages of memory currently being used
as page tables for the process.
> 3. Switch back to the first VT, and start a & instance of <case1>. Switch to
> the second VT, wait for the numbers to stabilize, and note the same set of
> page counts.
# jobs
[1] + Running ./c.stat
# ./c.stat &
[2] 21
#
Stopped at _Debugger+0x4: leave
db> ps/a
PID COMMAND STRUCT PROC * UAREA * VMSPACE/VM_MAP
21 c.stat 0xf0416800 0xf2f26000 0xf041ec00
20 c.stat 0xf0416200 0xf2f23000 0xf0420300
<< etc ... >>
db> show map/f 0xf041ec00
MAP 0xf041ec00: [0x0->0xeffbf000]
pmap=0xf041ffc0, #ent=5, sz=33562624, ref=1, main=T, version=10
- 0xf041f340: 0x1000->0x2000: obj=0xf041e600/0x0, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=5/7, inh=1, wc=0, adv=0
- 0xf041f600: 0x2000->0x3000: obj=0xf041e600/0x1000, amap=0xf041fd80/0
map=F, submap=F, cow=T, nc=F, prot(max)=7/7, inh=1, wc=0, adv=0
- 0xf041f980: 0xedbfe000->0xef9fe000: obj=0x0/0x0, amap=0x0/-264631984
map=F, submap=F, cow=T, nc=T, prot(max)=0/7, inh=1, wc=0, adv=0
- 0xf041f280: 0xef9fe000->0xefbf0000: obj=0x0/0x0, amap=0x0/5
map=F, submap=F, cow=T, nc=T, prot(max)=7/7, inh=1, wc=0, adv=0
- 0xf041f4c0: 0xefbf0000->0xefbfe000: obj=0x0/0x0, amap=0xf041f480/0
map=F, submap=F, cow=T, nc=F, prot(max)=7/7, inh=1, wc=0, adv=0
db> call pmap_dump(0xf041ffc0,0,0)
va 0x1000 -> pa 0x89c000 (pte=0x89c425)
va 0x2000 -> pa 0x8c8000 (pte=0x8c8467)
va 0xefbfd000 -> pa 0x8c5000 (pte=0x8c5467)
0x8a1000
db> c
note that the only shared page in the static case is the text.
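
the comparison of the two dumps can also be done mechanically. this is
only a sketch: count_shared is a made-up helper, and the arrays simply
hold the physical addresses copied from the two c.stat dumps above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * count_shared (hypothetical helper): number of physical page addresses
 * in a[] that also appear in b[].  quadratic, which is fine for a
 * handful of pages.
 */
size_t count_shared(const unsigned long *a, size_t na,
    const unsigned long *b, size_t nb)
{
	size_t i, j, shared = 0;

	assert(a != NULL && b != NULL);
	for (i = 0; i < na; i++) {
		for (j = 0; j < nb; j++) {
			if (a[i] == b[j]) {
				shared++;
				break;	/* count each page of a[] at most once */
			}
		}
	}
	return shared;
}

/* physical addresses from the pmap_dump output of pid 20 and pid 21 */
const unsigned long pid20_pa[] = { 0x89c000, 0x89f000, 0x89a000 };
const unsigned long pid21_pa[] = { 0x89c000, 0x8c8000, 0x8c5000 };
```

count_shared(pid20_pa, 3, pid21_pa, 3) comes to 1: the text page at
pa 0x89c000. the data and stack pages differ between the two processes.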
> 5. Start over from step 2, but substitute <case2>.
now we can repeat for the dynamic case:
# ldd c.dyn
c.dyn:
-lc.12 => /usr/lib/libc.so.12.20 (0x4001a000)
# ./c.dyn &
[1] 22
# ~Stopped at _Debugger+0x4: leave
db> ps/a
PID COMMAND STRUCT PROC * UAREA * VMSPACE/VM_MAP
22 c.dyn 0xf0416e00 0xf2f23000 0xf0420100
<< etc ... >>
db> show map/f 0xf0420100
MAP 0xf0420100: [0x0->0xeffbf000]
pmap=0xf03fa9c0, #ent=12, sz=34103296, ref=1, main=T, version=27
- 0xf041fb40: 0x1000->0x2000: obj=0xf041e000/0x0, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=5/7, inh=1, wc=0, adv=0
- 0xf041f100: 0x2000->0x3000: obj=0xf041e000/0x1000, amap=0xf041f740/0
map=F, submap=F, cow=T, nc=F, prot(max)=7/7, inh=1, wc=0, adv=0
- 0xf041f940: 0x40002000->0x4000e000: obj=0xf0420c00/0x0, amap=0x0/-264632432
map=F, submap=F, cow=T, nc=T, prot(max)=5/7, inh=1, wc=0, adv=0
- 0xf041f900: 0x4000e000->0x4000f000: obj=0xf0420c00/0xc000, amap=0xf041ff40/0
map=F, submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
- 0xf04178c0: 0x4000f000->0x40018000: obj=0x0/0x0, amap=0xf041f1c0/0
map=F, submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
- 0xf0417280: 0x40019000->0x4001a000: obj=0xf0420a00/0x1000, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=1/7, inh=1, wc=0, adv=0
- 0xf041f2c0: 0x4001a000->0x40078000: obj=0xf0420600/0x0, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=5/7, inh=1, wc=0, adv=0
- 0xf041f440: 0x40078000->0x4007b000: obj=0xf0420600/0x5e000, amap=0xf041fec0/0
map=F, submap=F, cow=T, nc=F, prot(max)=7/7, inh=1, wc=0, adv=0
- 0xf041f7c0: 0x4007b000->0x40087000: obj=0x0/0x0, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=7/7, inh=1, wc=0, adv=0
- 0xf0417340: 0xedbfe000->0xef9fe000: obj=0x0/0x0, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=0/7, inh=1, wc=0, adv=0
- 0xf041f680: 0xef9fe000->0xefbf0000: obj=0x0/0x0, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=7/7, inh=1, wc=0, adv=0
- 0xf0417ac0: 0xefbf0000->0xefbfe000: obj=0x0/0x0, amap=0xf041f640/0
map=F, submap=F, cow=T, nc=F, prot(max)=7/7, inh=1, wc=0, adv=0
now here we have 12 mappings:
[1] text
[2] data
[3] ld.so text
[4] ld.so data
[5] ld.so bss
[6] ld.so.hints (second page --- i'm not sure why this is still mapped)
[7] libc text
[8] libc data
[9] libc bss
[10] stack reserved area (prot = 0) [in case we grow ulimit]
[11] stack reserved area (prot = r/w)
[12] currently allocated stack
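
the size of each region follows from the start and end addresses in the
map dump. a small sketch of the arithmetic, assuming 4KB pages;
span_pages and DEMO_PAGE_SIZE are illustrative names only:

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 0x1000ul	/* 4KB i386 pages */

/*
 * span_pages (hypothetical helper): number of pages covered by a map
 * entry [start, end).  both addresses must be page aligned.
 */
unsigned long span_pages(unsigned long start, unsigned long end)
{
	assert(end >= start);
	assert(start % DEMO_PAGE_SIZE == 0 && end % DEMO_PAGE_SIZE == 0);
	return (end - start) / DEMO_PAGE_SIZE;
}
```

for example, span_pages(0x4001a000, 0x40078000) gives 94 pages for the
libc text entry and span_pages(0x40002000, 0x4000e000) gives 12 pages
for ld.so text. the pmap dump that follows lists only 25 and 11 of
those pages respectively, so a map entry covering a range does not mean
every page in it is mapped in the pmap.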
and as you can see from the pmap dump, there are quite a few more pages
mapped than in the static case. note that any PTE ending in "5" is
read-only and any PTE ending in "7" is read-write (the low three bits
are present, writable, and user).
db> call pmap_dump(0xf03fa9c0,0,0)
va 0x1000 -> pa 0x8ef000 (pte=0x8ef425) <<< program text
va 0x2000 -> pa 0x8f2000 (pte=0x8f2467) <<< program data
va 0x40002000 -> pa 0x849000 (pte=0x849425) <<< ld.so text
va 0x40003000 -> pa 0x85c000 (pte=0x85c425) <<< ld.so text
va 0x40004000 -> pa 0x853000 (pte=0x853425) <<< ld.so text
va 0x40005000 -> pa 0x860000 (pte=0x860425) <<< ld.so text
va 0x40006000 -> pa 0x850000 (pte=0x850425) <<< ld.so text
va 0x40007000 -> pa 0x855000 (pte=0x855425) <<< ld.so text
va 0x40008000 -> pa 0x851000 (pte=0x851425) <<< ld.so text
va 0x40009000 -> pa 0x852000 (pte=0x852425) <<< ld.so text
va 0x4000a000 -> pa 0x854000 (pte=0x854425) <<< ld.so text
va 0x4000b000 -> pa 0x84d000 (pte=0x84d425) <<< ld.so text
va 0x4000c000 -> pa 0x84e000 (pte=0x84e425) <<< ld.so text
va 0x4000e000 -> pa 0x8f4000 (pte=0x8f4467) <<< ld.so data
va 0x4000f000 -> pa 0x8f5000 (pte=0x8f5467) <<< ld.so bss
va 0x40010000 -> pa 0x8f6000 (pte=0x8f6467) <<< ld.so bss
va 0x40011000 -> pa 0x8f7000 (pte=0x8f7467) <<< ld.so bss
va 0x40012000 -> pa 0x8f8000 (pte=0x8f8467) <<< ld.so bss
va 0x40013000 -> pa 0x8f9000 (pte=0x8f9467) <<< ld.so bss
va 0x40014000 -> pa 0x8fa000 (pte=0x8fa467) <<< ld.so bss
va 0x40015000 -> pa 0x8fc000 (pte=0x8fc467) <<< ld.so bss
va 0x4001e000 -> pa 0x873000 (pte=0x873425) <<< libc text
va 0x4004f000 -> pa 0x879000 (pte=0x879405) <<< libc text
va 0x40052000 -> pa 0x8ff000 (pte=0x8ff425) <<< libc text
va 0x40053000 -> pa 0x874000 (pte=0x874405) <<< libc text
va 0x40054000 -> pa 0x87e000 (pte=0x87e405) <<< libc text
va 0x40061000 -> pa 0x878000 (pte=0x878405) <<< libc text
va 0x40064000 -> pa 0x87f000 (pte=0x87f425) <<< libc text
va 0x40065000 -> pa 0x884000 (pte=0x884405) <<< libc text
va 0x40066000 -> pa 0x881000 (pte=0x881405) <<< libc text
va 0x40067000 -> pa 0x882000 (pte=0x882405) <<< libc text
va 0x40069000 -> pa 0x87a000 (pte=0x87a405) <<< libc text
va 0x4006a000 -> pa 0x876000 (pte=0x876405) <<< libc text
va 0x4006b000 -> pa 0x86a000 (pte=0x86a425) <<< libc text
va 0x4006c000 -> pa 0x86d000 (pte=0x86d425) <<< libc text
va 0x4006d000 -> pa 0x869000 (pte=0x869425) <<< libc text
va 0x4006e000 -> pa 0x861000 (pte=0x861425) <<< libc text
va 0x4006f000 -> pa 0x866000 (pte=0x866425) <<< libc text
va 0x40070000 -> pa 0x862000 (pte=0x862425) <<< libc text
va 0x40071000 -> pa 0x864000 (pte=0x864425) <<< libc text
va 0x40072000 -> pa 0x867000 (pte=0x867425) <<< libc text
va 0x40073000 -> pa 0x86b000 (pte=0x86b425) <<< libc text
va 0x40074000 -> pa 0x863000 (pte=0x863425) <<< libc text
va 0x40075000 -> pa 0x865000 (pte=0x865425) <<< libc text
va 0x40076000 -> pa 0x868000 (pte=0x868425) <<< libc text
va 0x40077000 -> pa 0x86c000 (pte=0x86c425) <<< libc text
va 0x40078000 -> pa 0x8fb000 (pte=0x8fb467) <<< libc data
va 0x40079000 -> pa 0x8fd000 (pte=0x8fd467) <<< libc data
va 0x4007a000 -> pa 0x8fe000 (pte=0x8fe467) <<< libc data
va 0xefbfd000 -> pa 0x8ed000 (pte=0x8ed467) <<< program stack
0x8c9000
db> c
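
the "ends in 5" / "ends in 7" rule comes straight from the i386 PTE
layout. a minimal sketch of the decoding, using the hardware bit
positions; the PTE_ macro names and helper functions here are made up
for the example and are not the kernel's own definitions:

```c
#include <assert.h>

/* i386 hardware PTE bits (Intel names: P, R/W, U/S, A, D) */
#define PTE_PRESENT  0x001u
#define PTE_WRITE    0x002u
#define PTE_USER     0x004u
#define PTE_ACCESSED 0x020u
#define PTE_DIRTY    0x040u
#define PTE_FRAME    0xfffff000u	/* physical page address */

/* pte_writable (hypothetical helper): 1 if a present PTE allows writes */
int pte_writable(unsigned int pte)
{
	assert(pte & PTE_PRESENT);	/* only meaningful for valid entries */
	return (pte & PTE_WRITE) != 0;
}

/* pte_accessed (hypothetical helper): 1 if the accessed bit is set */
int pte_accessed(unsigned int pte)
{
	return (pte & PTE_ACCESSED) != 0;
}

/* pte_pa (hypothetical helper): physical page address held in the PTE */
unsigned int pte_pa(unsigned int pte)
{
	return pte & PTE_FRAME;
}
```

so 0x8ef425 is a present, user, read-only page at pa 0x8ef000, and
0x8f2467 is the same plus writable and dirty. the libc text entries
that read 0x...405 have the accessed bit clear.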
ok, now we start a second c.dyn program
# jobs
[1] + Running ./c.dyn
# ./c.dyn &
[2] 23
#
Stopped at _Debugger+0x4: leave
db> ps/a
PID COMMAND STRUCT PROC * UAREA * VMSPACE/VM_MAP
23 c.dyn 0xf0416c00 0xf2f26000 0xf0420400
22 c.dyn 0xf0416e00 0xf2f23000 0xf0420100
<< etc ... >>
db> show map/f 0xf0420400
MAP 0xf0420400: [0x0->0xeffbf000]
pmap=0xf041f180, #ent=12, sz=34103296, ref=1, main=T, version=27
- 0xf0417640: 0x1000->0x2000: obj=0xf041e000/0x0, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=5/7, inh=1, wc=0, adv=0
- 0xf041fc80: 0x2000->0x3000: obj=0xf041e000/0x1000, amap=0xf041f6c0/0
map=F, submap=F, cow=T, nc=F, prot(max)=7/7, inh=1, wc=0, adv=0
- 0xf0417540: 0x40002000->0x4000e000: obj=0xf0420c00/0x0, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=5/7, inh=1, wc=0, adv=0
- 0xf041f200: 0x4000e000->0x4000f000: obj=0xf0420c00/0xc000, amap=0xf041ff80/0
map=F, submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
- 0xf041f780: 0x4000f000->0x40018000: obj=0x0/0x0, amap=0xf041f240/0
map=F, submap=F, cow=T, nc=F, prot(max)=3/7, inh=1, wc=0, adv=0
- 0xf041fdc0: 0x40019000->0x4001a000: obj=0xf0420a00/0x1000, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=1/7, inh=1, wc=0, adv=0
- 0xf041f4c0: 0x4001a000->0x40078000: obj=0xf0420600/0x0, amap=0x0/-264632432
map=F, submap=F, cow=T, nc=T, prot(max)=5/7, inh=1, wc=0, adv=0
- 0xf03fa780: 0x40078000->0x4007b000: obj=0xf0420600/0x5e000, amap=0xf041fb80/0
map=F, submap=F, cow=T, nc=F, prot(max)=7/7, inh=1, wc=0, adv=0
- 0xf041f480: 0x4007b000->0x40087000: obj=0x0/0x0, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=7/7, inh=1, wc=0, adv=0
- 0xf041fc40: 0xedbfe000->0xef9fe000: obj=0x0/0x0, amap=0x0/-264632432
map=F, submap=F, cow=T, nc=T, prot(max)=0/7, inh=1, wc=0, adv=0
- 0xf041f8c0: 0xef9fe000->0xefbf0000: obj=0x0/0x0, amap=0x0/0
map=F, submap=F, cow=T, nc=T, prot(max)=7/7, inh=1, wc=0, adv=0
- 0xf0417780: 0xefbf0000->0xefbfe000: obj=0x0/0x0, amap=0xf041fac0/0
map=F, submap=F, cow=T, nc=F, prot(max)=7/7, inh=1, wc=0, adv=0
ok, we've got the same 12 regions, now let's look at the mappings
db> call pmap_dump(0xf041f180,0,0)
va 0x1000 -> pa 0x8ef000 (pte=0x8ef425) << text (SHARED)
va 0x2000 -> pa 0x928000 (pte=0x928467) << data (private)
va 0x40002000 -> pa 0x849000 (pte=0x849425) << ld.so text (SHARED)
va 0x40003000 -> pa 0x85c000 (pte=0x85c425) << ld.so text (SHARED)
va 0x40004000 -> pa 0x853000 (pte=0x853425) << ld.so text (SHARED)
va 0x40005000 -> pa 0x860000 (pte=0x860425) << ld.so text (SHARED)
va 0x40006000 -> pa 0x850000 (pte=0x850425) << ld.so text (SHARED)
va 0x40007000 -> pa 0x855000 (pte=0x855425) << ld.so text (SHARED)
va 0x40008000 -> pa 0x851000 (pte=0x851425) << ld.so text (SHARED)
va 0x40009000 -> pa 0x852000 (pte=0x852425) << ld.so text (SHARED)
va 0x4000a000 -> pa 0x854000 (pte=0x854425) << ld.so text (SHARED)
va 0x4000b000 -> pa 0x84d000 (pte=0x84d425) << ld.so text (SHARED)
va 0x4000c000 -> pa 0x84e000 (pte=0x84e425) << ld.so text (SHARED)
va 0x4000e000 -> pa 0x92a000 (pte=0x92a467) << ld.so data (private)
va 0x4000f000 -> pa 0x92b000 (pte=0x92b467) << ld.so bss (private)
va 0x40010000 -> pa 0x92c000 (pte=0x92c467) << ld.so bss (private)
va 0x40011000 -> pa 0x92d000 (pte=0x92d467) << ld.so bss (private)
va 0x40012000 -> pa 0x92e000 (pte=0x92e467) << ld.so bss (private)
va 0x40013000 -> pa 0x92f000 (pte=0x92f467) << ld.so bss (private)
va 0x40014000 -> pa 0x930000 (pte=0x930467) << ld.so bss (private)
va 0x40015000 -> pa 0x932000 (pte=0x932467) << ld.so bss (private)
va 0x4001e000 -> pa 0x873000 (pte=0x873425) << libc text (SHARED)
va 0x4004f000 -> pa 0x879000 (pte=0x879405) << libc text (SHARED)
va 0x40052000 -> pa 0x8ff000 (pte=0x8ff425) << libc text (SHARED)
va 0x40053000 -> pa 0x874000 (pte=0x874405) << libc text (SHARED)
va 0x40054000 -> pa 0x87e000 (pte=0x87e405) << libc text (SHARED)
va 0x40061000 -> pa 0x878000 (pte=0x878405) << libc text (SHARED)
va 0x40064000 -> pa 0x87f000 (pte=0x87f425) << libc text (SHARED)
va 0x40065000 -> pa 0x884000 (pte=0x884405) << libc text (SHARED)
va 0x40066000 -> pa 0x881000 (pte=0x881405) << libc text (SHARED)
va 0x40067000 -> pa 0x882000 (pte=0x882405) << libc text (SHARED)
va 0x40069000 -> pa 0x87a000 (pte=0x87a405) << libc text (SHARED)
va 0x4006a000 -> pa 0x876000 (pte=0x876405) << libc text (SHARED)
va 0x4006b000 -> pa 0x86a000 (pte=0x86a425) << libc text (SHARED)
va 0x4006c000 -> pa 0x86d000 (pte=0x86d425) << libc text (SHARED)
va 0x4006d000 -> pa 0x869000 (pte=0x869425) << libc text (SHARED)
va 0x4006e000 -> pa 0x861000 (pte=0x861425) << libc text (SHARED)
va 0x4006f000 -> pa 0x866000 (pte=0x866425) << libc text (SHARED)
va 0x40070000 -> pa 0x862000 (pte=0x862425) << libc text (SHARED)
va 0x40071000 -> pa 0x864000 (pte=0x864425) << libc text (SHARED)
va 0x40072000 -> pa 0x867000 (pte=0x867425) << libc text (SHARED)
va 0x40073000 -> pa 0x86b000 (pte=0x86b425) << libc text (SHARED)
va 0x40074000 -> pa 0x863000 (pte=0x863425) << libc text (SHARED)
va 0x40075000 -> pa 0x865000 (pte=0x865425) << libc text (SHARED)
va 0x40076000 -> pa 0x868000 (pte=0x868425) << libc text (SHARED)
va 0x40077000 -> pa 0x86c000 (pte=0x86c425) << libc text (SHARED)
va 0x40078000 -> pa 0x931000 (pte=0x931467) << libc data (private)
va 0x40079000 -> pa 0x933000 (pte=0x933467) << libc data (private)
va 0x4007a000 -> pa 0x934000 (pte=0x934467) << libc data (private)
va 0xefbfd000 -> pa 0x925000 (pte=0x925467) << stack (private)
doing a little counting you can see that each dynamic program has
50 physical pages of memory mapped in. Of those 50 pages, 37 of
them are shared between the two processes, and 13 of them are private
to each process.
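
the counting can be written down as a small table. this is just a
sketch of the arithmetic: the struct, the array, and tally_pages are
illustrative names, and the page counts are read off the two c.dyn
dumps above.

```c
#include <assert.h>
#include <stddef.h>

struct region {
	const char *name;
	int pages;	/* pages listed for this region in the pmap dump */
	int shared;	/* 1 if both processes map the same physical pages */
};

/* page counts taken from the c.dyn pmap_dump output */
const struct region dyn_regions[] = {
	{ "program text",  1, 1 },
	{ "program data",  1, 0 },
	{ "ld.so text",   11, 1 },
	{ "ld.so data",    1, 0 },
	{ "ld.so bss",     7, 0 },
	{ "libc text",    25, 1 },
	{ "libc data",     3, 0 },
	{ "program stack", 1, 0 },
};
const size_t dyn_nregions = sizeof(dyn_regions) / sizeof(dyn_regions[0]);

/* tally_pages (hypothetical helper): sum the pages with the given flag */
int tally_pages(const struct region *r, size_t n, int shared)
{
	size_t i;
	int total = 0;

	assert(r != NULL);
	for (i = 0; i < n; i++)
		if (r[i].shared == shared)
			total += r[i].pages;
	return total;
}
```

tally_pages(dyn_regions, dyn_nregions, 1) comes to 37 and the same call
with 0 comes to 13, which adds up to the 50 pages in each dump.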
clearly for a small program like your example, the extra shared library
overhead of having data and bss for ld.so and libc is going to cost (in
this case 11 extra private pages: 13 per dynamic process versus 2, the
data and stack pages, per static process).
however, it can be seen from the above that the text pages for both
ld.so and libc are in fact shared between processes. so i would
say that the shared library system is working "as expected" and i
believe we can discount the GNU tools as contributing factors in the
process memory usage.
chuck
ps- here is pmap_dump in case anyone wants to try playing with it
some:
void pmap_dump __P((struct pmap *, vm_offset_t, vm_offset_t));

/*
 * pmap_dump: dump all the mappings from a pmap
 *
 * => caller should not be holding any pmap locks
 */

void pmap_dump(pmap, sva, eva)
	struct pmap *pmap;
	vm_offset_t sva, eva;
{
	pt_entry_t *ptes, *pte;
	vm_offset_t blkendva;

	/*
	 * if end is out of range truncate.
	 * if (end <= start) update to max.
	 */

	if (eva > VM_MAXUSER_ADDRESS || eva <= sva)
		eva = VM_MAXUSER_ADDRESS;

	/*
	 * we lock in the pmap => pv_head direction
	 */

	PMAP_MAP_TO_HEAD_LOCK();
	ptes = pmap_map_ptes(pmap);	/* locks pmap */

	/*
	 * dumping a range of pages: we dump in PTP sized blocks (4MB)
	 */

	for (/* null */ ; sva < eva ; sva = blkendva) {

		/* determine range of block */
		blkendva = i386_round_pdr(sva+1);
		if (blkendva > eva)
			blkendva = eva;

		/* valid block? */
		if (!pmap_valid_entry(pmap->pm_pdir[pdei(sva)]))
			continue;

		pte = &ptes[i386_btop(sva)];
		for (/* null */; sva < blkendva ; sva += NBPG, pte++) {
			if (!pmap_valid_entry(*pte))
				continue;
			printf("va %#lx -> pa %#x (pte=%#x)\n",
			    sva, *pte & PG_FRAME, *pte);
		}
	}

	/*
	 * done!
	 */

	pmap_unmap_ptes(pmap);
	PMAP_MAP_TO_HEAD_UNLOCK();
	return;
}
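
for anyone reading the loop outside the kernel tree, here is a sketch
of the address arithmetic it relies on. it assumes the standard i386
layout of 4KB pages with 4MB covered by each page-directory entry (the
"PTP sized blocks" in the comment above). the demo_ functions are
user-space stand-ins written for this example, not the kernel macros
themselves:

```c
#include <assert.h>

/* assumed i386 layout: 4KB pages, 4MB per page-directory entry */
#define DEMO_PGSHIFT 12
#define DEMO_PDSHIFT 22
#define DEMO_NBPD    (1ul << DEMO_PDSHIFT)

/* page-directory index of va (stand-in for pdei()) */
unsigned long demo_pdei(unsigned long va)
{
	return va >> DEMO_PDSHIFT;
}

/* linear page number of va (stand-in for i386_btop()) */
unsigned long demo_btop(unsigned long va)
{
	return va >> DEMO_PGSHIFT;
}

/* round va up to a 4MB boundary (stand-in for i386_round_pdr()) */
unsigned long demo_round_pdr(unsigned long va)
{
	assert(va <= ~0ul - (DEMO_NBPD - 1));	/* avoid wraparound */
	return (va + DEMO_NBPD - 1) & ~(DEMO_NBPD - 1);
}
```

rounding an address that already sits on a 4MB boundary returns it
unchanged, which is why the loop rounds sva+1 rather than sva: that way
blkendva always lands on the next boundary and the outer loop makes
progress.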