[I am not subscribed to this list, so if you want to answer, make sure to CC me]
To explore error branches and test the kernel's ability to cope with
failures, it is often necessary to force such failures to happen.
Here is an implementation [1] of fault(4), a driver that allows triggering
failures in the kernel. Linux has a similar fault-injection facility.
The fault_inject() function decides whether to return true or false,
depending on parameters that userland configures via ioctls on /dev/fault.
The caller of this function should then error out if it returns true.
Typically:
	void *
	whatever_subsystem(void)
	{
		...
		if (fault_inject())
			return NULL;		/* means failure */
		...
		return non_null;	/* means success */
	}
Several modes could be supported; for now I have implemented only one, the
N-th mode: every N-th call to fault_inject() (N being configurable) makes it
return true.
Two scopes are available: global (i.e. system-wide) or the calling LWP.
Examples:
 - mode=NTH scope=GLOBAL: every N-th call to fault_inject() in the whole
   kernel will return true, regardless of the LWP.
 - mode=NTH scope=LWP: every N-th call to fault_inject() made by the LWP that
   enabled the mode will return true. For the other LWPs, fault_inject()
   always returns false.
fault_inject() can be placed at any point of interest. For now I have added
it to pool_cache_get():
	if (flags & PR_NOWAIT) {
		if (fault_inject())
			return NULL;
	}
Running the ATF tests on a kernel with kASan+LOCKDEBUG+fault and
{N=32 scope=GLOBAL} already gives an instant crash:
	kernel diagnostic assertion "radix_tree_empty_tree_p(&pmap->pm_pvtree)"
	failed: file ".../sys/arch/x86/x86/pmap.c"
It looks like radixtree.c does not handle allocation failures correctly
somewhere.
fault(4) seems like the kind of feature that would be useful for
stress-testing and fuzzing. As you can see in the diff, its code is extremely
simple.
Maxime
[1] https://m00nbsd.net/garbage/fault/fault.diff