tech-kern archive


Straw proposal: MI kthread vector/fp unit API



Here's a straw proposal for an MI API to allow a kthread to use any
vector or floating-point unit on the CPU -- call it the `FPU' for
brevity.

The MI concept of `the FPU' encompasses _all_ vector or floating-point
units that userland threads would have access to, so we don't have to
complicate it by distinguishing, e.g., the crypto registers from the
floating-point registers on Cavium -- it's all or nothing.

1. New kthread flag KTHREAD_FPU.

   Any kthread created with this flag will have its FPU state saved
   and restored like a userland thread.

   The implementation would be a new lwp l_pflag, say LP_SYSTEM_FPU.
   MD FPU traps which currently panic on LP_SYSTEM lwps will panic
   only if LP_SYSTEM && !LP_SYSTEM_FPU.

2. New functions

   s = kthread_fpu_enter();
   ...
   kthread_fpu_exit(s);

   Between the two calls, the calling kthread behaves as if it had
   been created with KTHREAD_FPU.  Calls to kthread_fpu_enter/exit
   nest.  kthread_fpu_exit additionally zeroes the FPU registers to
   avoid leaking secrets through Spectre-class vulnerabilities, in
   case an adversary can control speculative FPU execution before the
   next FPU-changing context switch.

3. New workqueue flag WQ_FPU passes KTHREAD_FPU to all the internal
   kthreads.  Threadpools do not have any new flag -- they can use
   kthread_fpu_enter/exit in the job function, since different
   threadpool jobs by design share kthreads with one another.
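
To make items 1-3 concrete, here is a rough userland model, not kernel
code.  The flag bit values, struct model_lwp, and every model_* name
are made up for illustration; only the flag and function names without
the model_ prefix come from the proposal.  It assumes kthread_fpu_enter
returns the previous LP_SYSTEM_FPU state, which is one way to get
nesting.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Placeholder bit values for illustration; not the real l_pflag bits. */
#define LP_SYSTEM      0x01u	/* lwp is a kernel thread */
#define LP_SYSTEM_FPU  0x02u	/* proposed: kthread may use the FPU */

/* Toy stand-in for curlwp plus its FPU register file. */
struct model_lwp {
	unsigned l_pflag;
	unsigned char fpregs[64];
};

static struct model_lwp model_curlwp = { .l_pflag = LP_SYSTEM };

/* Item 1: an MD FPU trap panics only for kthreads without FPU access. */
static bool
model_fpu_trap_panics(const struct model_lwp *l)
{
	return (l->l_pflag & LP_SYSTEM) != 0 &&
	    (l->l_pflag & LP_SYSTEM_FPU) == 0;
}

/* Item 2: return the previous LP_SYSTEM_FPU state so that calls nest. */
static int
model_kthread_fpu_enter(void)
{
	int s = (int)(model_curlwp.l_pflag & LP_SYSTEM_FPU);

	model_curlwp.l_pflag |= LP_SYSTEM_FPU;
	return s;
}

static void
model_kthread_fpu_exit(int s)
{
	assert(model_curlwp.l_pflag & LP_SYSTEM_FPU);
	/* Zero the registers on every exit, as the proposal describes. */
	memset(model_curlwp.fpregs, 0, sizeof(model_curlwp.fpregs));
	/* Restore the flag to its state before the matching enter. */
	model_curlwp.l_pflag &= ~LP_SYSTEM_FPU;
	model_curlwp.l_pflag |= (unsigned)s;
}

/* Item 3: a threadpool-style job brackets its own FPU use. */
static void
model_job(void)
{
	int s = model_kthread_fpu_enter();

	model_curlwp.fpregs[0] = 0xa5;	/* pretend to compute with the FPU */
	model_kthread_fpu_exit(s);
}
```

Calling model_job from inside another enter/exit pair leaves the flag
set until the outermost exit, which is the nesting behaviour item 2
asks for.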

There may also be MD functions like x86 fpu_kern_enter to use the FPU
with preemption disabled.  They may be limited to a single type of FPU
or vector unit, e.g. just Cavium crypto but not MIPS floating-point.
These functions can avoid disabling preemption -- and avoid zeroing
the FPU registers -- in FPU-enabled kthreads.

That way, for example, you can use (say) an AES encryption routine
aes_enc as a subroutine anywhere in the kernel, and an MD definition
of aes_enc can internally use AES-NI with the appropriate MD
fpu_kern_enter -- but it's a little cheaper to use aes_enc in an
FPU-enabled kthread.  This gave a modest but measurable boost to cgd(4)
throughput in my preliminary experiments.
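
A sketch of that fast path, again as a userland model under stated
assumptions: the model_* names, the counters, and the flag bit value
are hypothetical, model_fpu_kern_leave is an assumed counterpart to
fpu_kern_enter, and the memcpy merely stands in for the AES-NI rounds
(it does not encrypt anything).

```c
#include <assert.h>
#include <string.h>

#define LP_SYSTEM_FPU 0x02u	/* placeholder bit value */

static unsigned model_l_pflag;		/* stand-in for curlwp->l_pflag */
static int model_preempt_off;		/* >0 while preemption is disabled */
static int model_slow_path_calls;	/* how often the costly path ran */

static void
model_fpu_kern_enter(void)
{
	if (model_l_pflag & LP_SYSTEM_FPU)
		return;	/* FPU state already follows this kthread */
	model_preempt_off++;
	model_slow_path_calls++;
}

static void
model_fpu_kern_leave(void)
{
	if (model_l_pflag & LP_SYSTEM_FPU)
		return;	/* nothing to zero or re-enable */
	/* A real version would zero the FPU registers here. */
	model_preempt_off--;
}

static void
model_aes_enc(const unsigned char in[16], unsigned char out[16])
{
	model_fpu_kern_enter();
	memcpy(out, in, 16);	/* placeholder for the AES-NI rounds */
	model_fpu_kern_leave();
	assert(model_preempt_off == 0);
}
```

The caller of model_aes_enc does not need to know which path was
taken; only the cost differs.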

Thoughts?

