tech-kern archive
kernel linker wish
I was reminded while reading the 'options LUA' discussion about a
feature that I wished our kernel had. I'll sketch it out here and hope
that it's interesting enough to someone who has enough spare time that
they can go and program it. :-)
It would be Nice(TM) if the kernel linker understood weak/strong aliases
in the kernel and in kernel modules so that using weak aliases one could
provide a stub implementation for an optional subsystem in the kernel,
and using strong aliases a loadable module could provide a full-fledged
implementation. Taking bpf(4) as an example, the kernel could provide
weak aliases to stub routines for, e.g., bpf_attach(), bpf_mtap(), etc:
__weak_alias(bpf_attach, voidop);
__weak_alias(bpf_mtap, voidop);
.
.
.
and the BPF kernel module could provide strong aliases to the actual
implementation:
__strong_alias(bpf_attach, bpf_attach_impl);
__strong_alias(bpf_mtap, bpf_mtap_impl);
.
.
.
There are a couple of reasons, I think, to prefer this to the scheme
that bpf(4) uses now to provide a stub implementation and a modular
"real" implementation. One, using the aliases scheme lets the kernel
patch in direct calls, so we could avoid indirect calls through the
bpf_ops vector. Also, it's not necessary to create an operations
vector like bpf_ops for every module that we want to provide stub/real
implementations for.
A rough idea for how to implement this in the kernel linker
is this: when the kernel linker finds a strong alias
bpf_attach -> bpf_attach_impl in a kernel module that overrides
an existing weak alias bpf_attach -> voidop, it can "push" the
old alias onto a stack corresponding to the symbol 'bpf_attach',
push(aliases['bpf_attach'], voidop). When it unloads the kernel module,
it re-assigns the alias: bpf_attach -> pop(aliases['bpf_attach']).
Let's say for now that the height of this stack is just 1 or 0.
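To make the push/pop bookkeeping concrete, here is a minimal sketch in plain C. Everything in it is hypothetical (struct alias_entry, alias_override(), alias_restore() are names made up for illustration, not anything the kernel linker has today), and the `current` function pointer merely stands in for "what the symbol is linked to"; a real linker would re-patch call sites instead of going through a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: stack height is 1 or 0, as in the text above. */
#define ALIAS_STACK_MAX 1

typedef void (*alias_target_t)(void);

struct alias_entry {
	const char     *name;                    /* e.g. "bpf_attach" */
	alias_target_t  current;                 /* what the symbol resolves to now */
	alias_target_t  saved[ALIAS_STACK_MAX];  /* overridden targets */
	size_t          depth;                   /* number of saved targets */
};

/* Module load: a strong alias overrides the current target. */
static int
alias_override(struct alias_entry *e, alias_target_t impl)
{
	if (e->depth == ALIAS_STACK_MAX)
		return -1;                       /* already overridden; refuse */
	e->saved[e->depth++] = e->current;       /* push(aliases[name], old) */
	e->current = impl;
	return 0;
}

/* Module unload: re-assign the alias to whatever was overridden. */
static int
alias_restore(struct alias_entry *e)
{
	if (e->depth == 0)
		return -1;                       /* nothing to restore */
	e->current = e->saved[--e->depth];       /* name -> pop(aliases[name]) */
	return 0;
}

/* Stand-ins for the kernel stub and the module's implementation. */
static int calls_stub, calls_impl;
static void voidop(void)          { calls_stub++; }
static void bpf_attach_impl(void) { calls_impl++; }
```

With an entry initialized to { "bpf_attach", voidop }, alias_override(&e, bpf_attach_impl) models loading the BPF module and alias_restore(&e) models unloading it.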
Of course, you don't want to unload a kernel module while the kernel
is in it. That is, you don't want the text of, say, bpf_attach()
to go away while the kernel cv_wait()s inside it. I believe you
can handle that using the entrance/exit-counting scheme for softc's
that I've described earlier (Subject: kicking everybody out of the
softc) in conjunction with a new modcmd, MODULE_CMD_CATCH, that
tells a module to change its behavior while the kernel unloads it.
Roughly, unloading a kernel module would go something like this:
1 Prepare the module to catch new threads as they try to enter the
module, modcmd(MODULE_CMD_CATCH). Preparation may entail creating
a mutex/condvar pair. Threads that subsequently enter the
module may have to acquire the mutex on the way in, signal the
condvar and release the mutex on the way out.
2 Re-link the kernel's stub implementations (e.g.,
bpf_attach -> voidop). In this way, no more threads may enter the
module, so we can hope for the next step to finish.
3 Module-specific cleanup, modcmd(MODULE_CMD_FINI), may acquire the
mutex installed in step 1, and wait for every thread to quit the
module---i.e., entrance count equals exit count---using the condvar
installed in step 1.
Sometimes this step may fail. Putting things back the way they
were should be possible, but it could be tricky.
4 Finish unlinking the module. Reclaim the module's text/data memory.
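The ordering of those four steps, including the roll-back when step 3 fails, can be modeled like this. It is a single-threaded sketch of the bookkeeping only: struct mod_state and its functions are made-up names, the counters would really be guarded by the step-1 mutex, and where this model simply fails on a busy module, step 3 in a kernel would sleep on the condvar (perhaps with a timeout) before giving up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-module state; all names are illustrative. */
struct mod_state {
	bool     catching;  /* step 1 done: module is catching threads */
	bool     linked;    /* module's aliases are in effect */
	bool     loaded;    /* text/data still present */
	unsigned entered;   /* entrance count */
	unsigned exited;    /* exit count */
};

/* Called on the way into and out of the module's code. */
static void mod_enter(struct mod_state *m) { m->entered++; }
static void mod_exit(struct mod_state *m)  { m->exited++; }

/* Returns 0 on success, -1 if threads are still inside the module. */
static int
mod_unload(struct mod_state *m)
{
	m->catching = true;             /* 1: MODULE_CMD_CATCH */
	m->linked = false;              /* 2: re-link the kernel's stubs */

	if (m->entered != m->exited) {  /* 3: MODULE_CMD_FINI fails */
		m->linked = true;       /* put things back the way they were */
		m->catching = false;
		return -1;
	}

	m->loaded = false;              /* 4: reclaim text/data memory */
	return 0;
}
```

The roll-back here is trivial because the model has no call sites to re-patch; in a real linker that is presumably where the tricky part lies.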
Taking this a step further, suppose we want to layer one implementation
on another. I.e., some module is stubbed out in the kernel. We
load a module that provides implementation A. Then we load another
module providing implementation B that refines implementation A.
Or vice versa: we load implementation B first and implementation A
second, and A refines B. I've been contemplating this in the
context of bus_space(9): one module may provide some debug
instrumentation such as an mmap(2)-able ring buffer of bus_space(9)
access records looking sort of like [I/O read | I/O write |
memory read | ..., address, width, value]. A second module may
provide advanced I/O exception handling. And a third module may
re-order or delay reads and writes between bus barriers in order
to simulate important corner-cases of bus operation. Any module
may refine either the behavior of the previously-loaded modules or
the behavior of the default implementation. For example, let's
consider modules that override bus_space_read_4(). Say the default
implementation is in _bus_space_read_4:
__weak_alias(bus_space_read_4, _bus_space_read_4)
The module with debug instrumentation, bus_space_debug.kmod, has
a weak alias, bus_space_read_4, for its implementation called
debug_bus_space_read_4,
__weak_alias(bus_space_read_4, debug_bus_space_read_4)
It also reserves a private symbol for calling the implementation that it
overrides. Call that symbol super_bus_space_read_4.
The module with exception handling, bus_space_xh.kmod, has
xh_bus_space_read_4,
__weak_alias(bus_space_read_4, xh_bus_space_read_4)
and likewise reserves a private symbol for calling the implementation it
overrode, also called super_bus_space_read_4.
If we load the modules bus_space_debug.kmod and bus_space_xh.kmod in
that order, then a call to bus_space_read_4 gets the xh_bus_space_read_4
implementation, which does its work and calls (through its symbol
super_bus_space_read_4) debug_bus_space_read_4, which does its work and
calls (through its super_bus_space_read_4) the default implementation,
_bus_space_read_4.
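The call chain just described can be sketched with function pointers standing in for the linker's work. The assumptions are many: the one-argument signature is a simplification of the real bus_space_read_4(), default_bus_space_read_4 stands in for _bus_space_read_4, debug_super and xh_super stand in for each module's private super_bus_space_read_4, and load_module() is a hypothetical helper that does what the kernel linker would do at load time.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical signature for illustration. */
typedef uint32_t (*read4_fn)(uint32_t addr);

/* Records the order in which implementations run. */
static char call_order[8];
static int  ncalls;

/* Stand-in for the kernel's default, _bus_space_read_4. */
static uint32_t
default_bus_space_read_4(uint32_t addr)
{
	call_order[ncalls++] = 'k';
	return addr + 1;                /* dummy "register value" */
}

/* What the public symbol bus_space_read_4 currently resolves to. */
static read4_fn bus_space_read_4_target = default_bus_space_read_4;

/* bus_space_debug.kmod: logs the access, then calls what it overrode. */
static read4_fn debug_super;
static uint32_t
debug_bus_space_read_4(uint32_t addr)
{
	call_order[ncalls++] = 'd';
	return debug_super(addr);
}

/* bus_space_xh.kmod: exception handling, then calls what it overrode. */
static read4_fn xh_super;
static uint32_t
xh_bus_space_read_4(uint32_t addr)
{
	call_order[ncalls++] = 'x';
	return xh_super(addr);
}

/* Hypothetical load step: bind the module's super, then override. */
static void
load_module(read4_fn impl, read4_fn *super)
{
	*super = bus_space_read_4_target;
	bus_space_read_4_target = impl;
}
```

Loading the debug module and then the xh module, a call through bus_space_read_4_target runs xh, then debug, then the default, matching the order given above.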
I think that to implement loading/unloading modules that refine each
other in this way, you could also use the aliases[symbol] stacks, but
they would grow taller than 0 or 1 items.
It is strange to use a weak alias to override a weak alias (why should a
loadable module's weak alias override the kernel's weak alias?); it may
be necessary to have a new kind of alias or else some meta-information
about each alias so that there is no ambiguity about what the kernel
linker should do.
Dave
--
David Young
dyoung%pobox.com@localhost Urbana, IL (217) 721-9981