Subject: Shrinking NetBSD - deep final distribution re-linker
To: None <tech-userlevel@netbsd.org>
From: Ian Zagorskih <ianzag@megasignal.com>
List: tech-userlevel
Date: 10/19/2004 16:04:16
Sorry for the rather confusing subject, I just don't know how exactly to
express my ideas :) I'm also not sure where exactly I should post this message.

Background. We're using NetBSD in various embedded devices, and sometimes data
storage is limited (usually it's some kind of flash). Today, starting from the
standard NetBSD distribution, I can easily build target installations
of relatively small size, about 8 MB. This includes a non-stripped,
non-compressed kernel (about 1 MB) and some subsets
of /bin, /sbin, /usr/bin and so on. The cut is made on the basis of "what do I
need for a simple terminal server, to administer it and to write/run simple
scripts". Right now it all works just fine :)

Well, I can't say I really need to shrink the existing NetBSD installations
for our projects. Flash is relatively cheap today, and if you use USB/Compact
Flash/Disk On Module cards, the smallest card you can get is most likely
at least 8 MB. On the other hand, I feel that in theory I can make a much
smaller but fully operational NetBSD installation. Just for the sake of
pure art :)

Among the largest files in an installation are libc and some other shared
libraries. I have a feeling that a large part of libc is not actually used by
any application, so it just wastes space. If I removed that part, nothing
would break and I would save space.

Sure, I can build a custom libc from source, switching off this or that
source subtree. But from my point of view this approach brings both minor and
major technical problems, starting with "what to remove?" and ending with how
to update a custom source tree from NetBSD's CVS. I would prefer to leave
the source tree alone, unmodified, just as I fetched it from the vendor.

So why not go the opposite way and shrink the binary libc itself? I have
a set of dynamically linked executables in ELF format. I can read the table of
imported symbols from each of them and build a common table of used symbols.
Next I can search for these symbols in some predefined set of shared
libraries (resolve them). Say, for example, 100 symbols are found in libc while
libc itself contains 500 exported symbols. From my point of view, nothing
stops me from manually "re-linking" libc and dropping the 400 unused symbols,
so that libc contains only the 100 symbols actually required. In the same way
I can "re-link" the other shared libraries used by my executables.
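
To make the bookkeeping concrete, here is a minimal sketch in Python. It
assumes the symbol lists have already been extracted from the ELF files by
some other tool (for example something like `nm -D --undefined-only` for the
executables and `nm -D --defined-only` for the libraries). The function name
`plan_relink` and the sample data are made up for illustration. It only
computes which exported symbols are referenced; it does not do the re-linking
itself, and it ignores symbols that the libraries need from each other.

```python
def plan_relink(imports_by_exe, exports_by_lib):
    """Return {library: (used, unused)} sets of exported symbols."""
    # Common table of every symbol imported by any executable.
    wanted = set()
    for symbols in imports_by_exe.values():
        wanted |= set(symbols)

    plan = {}
    for lib, exported in exports_by_lib.items():
        exported = set(exported)
        used = exported & wanted      # must stay in the shrunken library
        unused = exported - wanted    # candidates for removal
        plan[lib] = (used, unused)
    return plan


# Hypothetical sample data, not taken from a real system.
imports_by_exe = {
    "sh": ["malloc", "free", "printf", "read"],
    "ls": ["malloc", "printf", "opendir"],
}
exports_by_lib = {
    "libc.so": ["malloc", "free", "printf", "read",
                "opendir", "qsort", "strtod"],
}

plan = plan_relink(imports_by_exe, exports_by_lib)
used, unused = plan["libc.so"]
print("keep:", sorted(used))
print("drop:", sorted(unused))
```

In this toy data set five of the seven exported symbols are referenced, so
only `qsort` and `strtod` would be candidates for removal.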

Well, I hope you get the idea I'm talking about :) At the end of this
operation I should have a set of custom shared libraries which contain only
the code that is actually used.

Sure, there are at least two problems I see at the moment:
1) The custom libs quite likely won't work with new apps, which will probably
require missing symbols. So I need to re-link the final set every time I
change it.
2) I don't see a way to determine which symbols are accessed through
dlopen()/dlsym(). Let's forget about them for the moment.
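
Problem 1 can at least be detected mechanically. A small sketch in Python,
again assuming the symbol lists are already available as plain collections of
names; `missing_symbols` and the sample names are hypothetical:

```python
def missing_symbols(new_app_imports, kept_exports_by_lib):
    """Return imports of a new app that no shrunken library still provides."""
    provided = set()
    for kept in kept_exports_by_lib.values():
        provided |= set(kept)
    return set(new_app_imports) - provided


# Hypothetical shrunken libc and a new application that also wants qsort.
kept = {"libc.so": {"malloc", "free", "printf"}}
missing = missing_symbols(["malloc", "qsort"], kept)
print(sorted(missing))
```

A non-empty result means the libraries have to be re-linked before the new
application can run. Symbols looked up at run time through dlsym() would not
show up in such a check.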

My first question is - has anybody seen anything like what I'm describing? Some
kind of "ELF cleanup toolkit", I'd say. The idea itself is quite obvious.
Please note that this is not the same as making a single executable from a set
or making gzipped executables with custom startup code.

My second question is - where did I go wrong in my reasoning? :) What kind of
problems will I eventually face when cutting down shared libraries as described
above?

Well, if anybody is interested in using NetBSD as a base platform for embedded
designs in environments with limited resources, I would be glad to discuss
any technical ideas.

Thanks all.

// wbr