Subject: Re: What's a "real" elf loader like ?
To: Cherry G. Mathew <cherry@zyx.in>
From: Quentin Garnier <cube@cubidou.net>
List: tech-kern
Date: 06/17/2006 10:52:06

[Sorry, this is longish, but it's been on my mind for a while.]

On Fri, Jun 16, 2006 at 04:01:59AM +0530, Cherry G. Mathew wrote:
> I was wondering if someone could put down briefly and to the point,
> what the NetBSD "elf" related shortcomings are.
>
> Here's my shallow understanding of the situation.
>
> - libsa has no standalone support for kernel modules, does not
> understand all elf header types.
> - There are shortcomings related to kernel module linking, which uses
> userspace ld ( for what ? How ? )

Your question is not about having an ELF loader in the kernel (and maybe
also in libsa--but that's another debate), it is about the current
status of kernel modules.

Frankly, our LKMs suck.  The dependency on ld(1) is terrible, because
it means LKMs depend on comp.tgz, even though modload(8) is in /sbin.
That's the main problem we have to fix.

The other problem is that lkm(4) is from another age, and has several
shortcomings.  If you want a module that has several roles, you'll have
to do a lot by yourself and be careful when loading and unloading.  This
interface also has no knowledge whatsoever of dependencies between
modules;  it will happily let you unload a module that other loaded
modules depend on.
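
To make the dependency point concrete, here is a minimal sketch of the
bookkeeping a dependency-aware interface would need.  It is not lkm(4)
code:  struct mod, mod_load() and mod_unload() are hypothetical names,
and the fixed-size dependency array is just an assumption to keep the
example short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOD_MAXDEPS 4   /* arbitrary limit for this sketch */

/* Hypothetical module record; not the lkm(4) structures. */
struct mod {
    const char *name;
    int         loaded;
    int         refcnt;               /* loaded modules depending on us */
    struct mod *deps[MOD_MAXDEPS];    /* modules we depend on */
    size_t      ndeps;
};

/* Load a module: fail unless every dependency is loaded, then pin them. */
static int
mod_load(struct mod *m)
{
    assert(m != NULL && m->ndeps <= MOD_MAXDEPS);
    if (m->loaded)
        return EEXIST;
    for (size_t i = 0; i < m->ndeps; i++)
        if (!m->deps[i]->loaded)
            return ENOENT;
    for (size_t i = 0; i < m->ndeps; i++)
        m->deps[i]->refcnt++;
    m->loaded = 1;
    return 0;
}

/* Unload a module: refuse while any loaded module still depends on it. */
static int
mod_unload(struct mod *m)
{
    assert(m != NULL);
    if (!m->loaded)
        return ENOENT;
    if (m->refcnt > 0)
        return EBUSY;
    for (size_t i = 0; i < m->ndeps; i++)
        m->deps[i]->refcnt--;
    m->loaded = 0;
    return 0;
}
```

With a reference count like this, unloading a module that something
else still needs fails with EBUSY instead of silently succeeding.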

ksyms(4) also has a few issues when dealing with LKMs.  There's an open
PR about its incredible slowness, for instance.

However, the current LKM scheme is very simple:  modules are relocatable
ELF files, which are linked to the kernel after modload(8) has retrieved
a base address from the kernel.

I've been thinking of a new LKM system for a while now, so I'll
describe what I would like to see and what its drawbacks are, and
I'll compare it briefly with FreeBSD's KLD.

Just like Eric (and completely independently), I made a patch to
config(1) something like two years ago to allow specification of modules
directly from inside the kernel tree.  The idea was to have as many
modules as possible.

Having good support for modules is a good thing nowadays, but in some
environments, you really want to avoid them.  My idea at the time (it is
still mostly the same) was to have modules as relocatable files.  Then
you'd just link them together to build the kernel binary.  I think this
would be quite neat, actually:  we would distribute a minimal kernel and
all the modules, and if the user wants to have a monolithic kernel that
only has the drivers for his hardware, he'd just have to re-do the final
link stage with the relevant modules, and drop the rest.  No need to
fetch the sources and recompile.  Adding a 3rd-party binary module would
work the same way.  That's why I like relocatable images, it's a very
flexible way to manage modules.

In my config(1) patch, the granularity for the modularity was the
attribute.  I don't remember if I had started allowing modularity for
options and related (deffs), but it was at least planned.  I should have
the code somewhere anyway.

The kernel options description files could have stuff like this:

    modular defattr foo

    modular device bar

    device baz
    modular attach baz at fol with baz_fol

which would mark respectively the attributes foo, bar and baz_fol as
potentially modular.  Then the user would have e.g.:

    module baz* at fol?

which would have the module baz_fol.o created.

If a file depends on more than one attribute (all have to be modular to
produce a module), the module would group the attributes together (and
all files depending on any subset of those attributes) into a
creatively named module.  That was the weak part, but a long-term goal
is to separate sources more clearly so there are fewer dependencies of
that kind.  Modularity is not a friend of #ifdefs.

So, that's what I had in mind for the past couple of years.  Now, the
subject of the thread is about an ELF loader, right?  Ok, ok.

Code to load an ELF file is not difficult to write per se.  The real
question is to know what kind of file we want to load, because we can't
just load any ELF file.  Relocatable files are nice because you can
really load them about anywhere, section by section.  Static executable
files are not suitable for modules because you can't relocate them,
something that is not acceptable in the kernel context.  Shared objects
are relocatable too, and make the relocator a bit simpler.  However,
you can't group shared objects together to build a new one.
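
The three kinds of file discussed above are told apart by the e_type
field of the ELF header.  As a rough sketch (the function and enum
names are made up for illustration, and only the first 18 bytes of the
header are examined), a loader could classify its input like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification used by this sketch only. */
enum mod_kind { MOD_BAD, MOD_RELOCATABLE, MOD_EXECUTABLE, MOD_SHARED };

/*
 * Classify an ELF image from the start of its header.  e_ident[5]
 * (EI_DATA) gives the byte order; e_type is the 16-bit field that
 * follows the 16-byte e_ident array.
 */
static enum mod_kind
elf_kind(const unsigned char *hdr, size_t len)
{
    uint16_t type;

    assert(hdr != NULL);
    if (len < 18 || memcmp(hdr, "\177ELF", 4) != 0)
        return MOD_BAD;
    switch (hdr[5]) {
    case 1:     /* ELFDATA2LSB: little-endian */
        type = (uint16_t)(hdr[16] | hdr[17] << 8);
        break;
    case 2:     /* ELFDATA2MSB: big-endian */
        type = (uint16_t)(hdr[16] << 8 | hdr[17]);
        break;
    default:
        return MOD_BAD;
    }
    switch (type) {
    case 1:  return MOD_RELOCATABLE;   /* ET_REL */
    case 2:  return MOD_EXECUTABLE;    /* ET_EXEC: fixed addresses */
    case 3:  return MOD_SHARED;        /* ET_DYN */
    default: return MOD_BAD;
    }
}
```

A module loader along the lines described here would accept
MOD_RELOCATABLE (or MOD_SHARED, for a KLD-like design) and reject the
rest.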

Right now we have two needs:  an ELF loader, and a relocator.  Each of
them kind of depends on what the other intends to deal with.
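
To give an idea of what the relocator part amounts to, here is a
simplified sketch of the most basic case, an absolute 64-bit
relocation computed as "symbol value plus addend".  struct reloc is a
hypothetical, already-resolved record (a real one would carry a symbol
index and a relocation type, as Elf64_Rela does), and the value is
stored in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation record (compare Elf64_Rela). */
struct reloc {
    size_t   offset;   /* where to patch, relative to the section start */
    uint64_t symval;   /* S: resolved address of the referenced symbol */
    int64_t  addend;   /* A: constant added to the symbol value */
};

/*
 * Apply "S + A" absolute 64-bit relocations to a loaded section.
 * Returns 0 on success, -1 if a relocation points outside the section.
 */
static int
apply_abs64(unsigned char *sec, size_t seclen,
    const struct reloc *r, size_t n)
{
    assert(sec != NULL && (r != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        uint64_t v;

        if (r[i].offset > seclen ||
            seclen - r[i].offset < sizeof(uint64_t))
            return -1;
        v = r[i].symval + (uint64_t)r[i].addend;
        memcpy(sec + r[i].offset, &v, sizeof v);   /* host byte order */
    }
    return 0;
}
```

The symbol values depend on where the loader put each section, which
is one way the two pieces depend on each other.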

The past few days I started working on writing an in-kernel ELF loader.
I have several train journeys planned for the next two weeks so I will
have some time to work on this :-) (It's just the loader;  someone else
started working on a relocator.)

We actually have a third need, the module manager itself.  Something has
to handle dependencies, initialise the modules and so on.  Information
about all this has to be stored somewhere.

Link sets are an attractive way to store that information, but as they
potentially burn a lot of sections, I'd rather have them disposable as
soon as the module is done using them.  For that I thought of having one
program header for the module code and data (the resident part), and one
for the disposable sections, which are used only during initialisation.
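
The resident/disposable split could look roughly like the following
userland sketch.  Everything in it is an assumption made for
illustration:  struct mod_image, mod_start() and sample_init() are
invented names, and malloc()/free() stand in for whatever the kernel
would use to manage the two regions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical in-memory view of a loaded module. */
struct mod_image {
    void  *resident;      /* code and data kept while the module lives */
    size_t resident_len;
    void  *init;          /* link sets and other init-only data */
    size_t init_len;
};

typedef int (*mod_init_fn)(struct mod_image *);

/* Run the init routine, then throw away the init-only region. */
static int
mod_start(struct mod_image *img, mod_init_fn init)
{
    int error;

    assert(img != NULL && init != NULL);
    error = init(img);      /* may still read img->init */
    free(img->init);        /* not needed any more, success or not */
    img->init = NULL;
    img->init_len = 0;
    return error;
}

/* Example init routine: keep one byte of the init data around. */
static int
sample_init(struct mod_image *img)
{
    if (img->init_len < 1 || img->resident_len < 1)
        return -1;
    *(unsigned char *)img->resident = *(unsigned char *)img->init;
    return 0;
}
```

The point is only that anything the module wants to keep has to be
copied into the resident part before the init region goes away.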

Of course, ld -r is unable to produce files with program headers (and,
why not?  an entry point), so an external tool would be needed, and then
linking together modules would be slightly harder.

Then there is the issue of symbol visibility.  One thing that will have
to be defined at some point is what the kernel API for modules is.
Currently, everything is potentially part of the API, so it is very hard
to keep track of the changes.
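
One possible way to pin the API down, sketched here under the
assumption that the kernel carries an explicit export table, is to let
the relocator resolve module symbols against that table only.  struct
ksym and kexport_lookup() are hypothetical names, and a real table
would probably be sorted or hashed rather than searched linearly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical export table entry: a name and its kernel address. */
struct ksym {
    const char *name;
    uintptr_t   addr;
};

/*
 * Resolve a symbol for a module.  Only names listed in the export
 * table are visible; anything else is treated as kernel-internal.
 * Returns 0 and stores the address on success, -1 otherwise.
 */
static int
kexport_lookup(const struct ksym *tab, size_t n, const char *name,
    uintptr_t *addr)
{
    assert(tab != NULL && name != NULL && addr != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].name, name) == 0) {
            *addr = tab[i].addr;
            return 0;
        }
    }
    return -1;   /* not part of the module API */
}
```

A module referencing a symbol outside the table would then fail to
load, which makes API changes visible instead of accidental.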

FreeBSD went the dynamic way with its KLD system.  The kernel itself is
a dynamic executable, and modules are shared objects.  However, the only
use made of this, I think, is the simpler relocator.  For instance,
module dependencies are not expressed in terms of NEEDED entries in the
.dynamic section.  (Not that they necessarily should;  it's just
something the mere nature of a dynamic object offers.)

One thing to note is that with relocatable files, there are a few archs
that are troublesome, namely arm and ppc.  But automatically generating
trampolines for those is doable, and not that difficult.  However, it's a
problem we'd not have with shared objects.
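
For reference, the trouble on those two comes from the limited reach
of the direct branch-and-link instruction:  both 32-bit ARM and
PowerPC "bl" encode a 24-bit signed word offset, i.e. about +/- 32 MB.
A relocator could decide whether a call needs a trampoline with a
check like the one below.  This is a simplified sketch;
needs_trampoline() is a made-up name, and the ARM pipeline bias (the
offset is relative to the instruction address plus 8) is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* 24-bit signed word offset: displacements in [-2^25, 2^25 - 4]. */
#define BRANCH_REACH ((int64_t)1 << 25)

/*
 * Return 1 if a direct branch at address `from` cannot reach `to`
 * and must go through a trampoline placed within range, 0 otherwise.
 */
static int
needs_trampoline(uint64_t from, uint64_t to)
{
    int64_t disp;

    /* Both architectures use 4-byte aligned instructions here. */
    assert((from & 3) == 0 && (to & 3) == 0);
    disp = (int64_t)(to - from);
    return disp < -BRANCH_REACH || disp >= BRANCH_REACH;
}
```

A module loaded far away from the kernel text would hit this for most
calls into the kernel, hence the need to generate the trampolines
automatically.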

My current plan is to continue experimenting with the manipulation of
relocatable files for a while to see if I can end up with something
satisfactory in terms of memory requirements, ease of use and module
management.  I will always have the fallback of special-casing sections
depending on their name, even though that puts too much knowledge in
the ELF loader itself.

Oh, and one last thing.  The syscall to load a module will take two
const char * arguments:  the name of the file to load, and a proplib
dictionary, to be used to pass parameters.

I hope this answers your questions about the state of this, and with
input from other people we'll end up with a clear objective very soon.

-- 
Quentin Garnier - cube@cubidou.net - cube@NetBSD.org
"When I find the controls, I'll go where I like, I'll know where I want
to be, but maybe for now I'll stay right here on a silent sea."
KT Tunstall, Silent Sea, Eye to the Telescope, 2004.
