tech-kern archive


Re: Straw proposal: MI kthread vector/fp unit API



> Date: Mon, 22 Jun 2020 18:45:47 +0000 (UTC)
> From: Eduardo Horvath <eeh%NetBSD.org@localhost>
> 
> I think this is sort of a half-measure since it restricts
> coprocessor usage to a few threads.  If you want to, say, implement
> the kernel memcopy using vector registers (the way sparc64 does)
> this doesn't help and may end up getting in the way.

Why do you think this restricts it to a few threads or gets in the way
of anything?

As I wrote in my original message:

   That way, for example, you can use (say) an AES encryption routine
   aes_enc as a subroutine anywhere in the kernel, and an MD definition
   of aes_enc can internally use AES-NI with the appropriate MD
   fpu_kern_enter -- but it's a little cheaper to use aes_enc in an
   FPU-enabled kthread.  This gave a modest measurable boost to cgd(4)
   throughput in my preliminary experiments.

Note that the subroutine (here aes_enc, but it could in principle be
memcpy too) works `anywhere in the kernel', not just restricted to a
few threads.

The definition of aes_enc with AES-NI CPU instructions on x86 already
works (see https://mail-index.netbsd.org/tech-kern/2020/06/18/msg026505.html
for details); just putting kthread_fpu_enter/exit around cgd_process
in cgd.c improved throughput on a RAM-backed disk by about 20%
(presumably mostly because it avoids zeroing the FPU registers on
every aes_* call in that thread).
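
To make the cost argument concrete, here is a rough user-space model,
not kernel code.  Every model_* name is hypothetical, the int cookie
returned by the enter routine is an assumption, and the cost model
(one register clear per aes_enc call outside an FPU-enabled thread,
one clear when leaving the FPU-enabled state) is only my guess at
where the time goes:

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; every model_* name is hypothetical. */
struct model_lwp {
	bool fpu_enabled;	/* thread is between kthread_fpu_enter/exit */
	unsigned zeroings;	/* times the FPU registers were cleared */
};

static struct model_lwp model_curlwp;

static int
model_kthread_fpu_enter(void)
{
	int s = model_curlwp.fpu_enabled;

	model_curlwp.fpu_enabled = true;
	return s;		/* previous state, so that calls can nest */
}

static void
model_kthread_fpu_exit(int s)
{
	assert(model_curlwp.fpu_enabled);
	if (!s) {
		model_curlwp.fpu_enabled = false;
		model_curlwp.zeroings++;	/* clear once, on the way out */
	}
}

/* Stand-in for an MD aes_enc that brackets its work itself. */
static void
model_aes_enc(void)
{
	/* ... the AES-NI work would go here ... */
	if (!model_curlwp.fpu_enabled)
		model_curlwp.zeroings++;	/* assumed per-call cost */
}

/* Process nblocks blocks, optionally wrapped like cgd_process. */
unsigned
model_cgd_process(bool wrap, unsigned nblocks)
{
	int s = 0;

	model_curlwp = (struct model_lwp){ false, 0 };
	if (wrap)
		s = model_kthread_fpu_enter();
	for (unsigned i = 0; i < nblocks; i++)
		model_aes_enc();
	if (wrap)
		model_kthread_fpu_exit(s);
	return model_curlwp.zeroings;
}
```

In this model, processing eight blocks costs eight register clears
without the wrapper and one with it; aes_enc itself is called the same
way in both cases.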

> I'd do something simpler such as adding a MI routine to allocate or 
> activate a temporary or permanent register save area that can be used by 
> kernel threads.  
> 
> Then, if you want, in the coprocessor trap handler, if you are in
> kernel state, you can check whether a kernel save area has been
> allocated and panic if not.

This sounds like a plausible alternative to disabling kpreemption in
some cases, but it is also orthogonal to my proposal -- in an
FPU-enabled kthread there is simply no need to allocate an extra save
area at all because it's already allocated in the lwp pcb, so if a
subroutine does use the FPU then it's cheaper to call that subroutine
in an FPU-enabled kthread than otherwise.

You say it would be simpler -- can you elaborate on how it would
simplify the implementations that already work on x86 and aarch64 by
just adding and testing a new flag in a couple places, and enabling or
disabling the CPU's FPU-enable bit?

https://anonhg.netbsd.org/src-all/rev/e83ef87e4f53
https://anonhg.netbsd.org/src-all/rev/7ec4225df101
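
For readers who do not want to open the changesets, the shape of
`adding and testing a new flag' plus toggling the FPU-enable bit can
be sketched as a toy user-space model.  The toy_* names, the flag
value, and the structure layouts below are hypothetical and are not
taken from the changesets above:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_LW_KTHREAD_FPU	0x1u	/* hypothetical per-thread flag */

struct toy_lwp { unsigned flags; };
struct toy_cpu { bool fpu_enable_bit; };	/* stand-in for the CPU control bit */

/* Context switch: leave the FPU enabled only for flagged threads. */
void
toy_switch_to(struct toy_cpu *ci, const struct toy_lwp *l)
{
	ci->fpu_enable_bit = (l->flags & TOY_LW_KTHREAD_FPU) != 0;
}

/* Would an FPU instruction executed right now trap? */
bool
toy_fpu_use_traps(const struct toy_cpu *ci)
{
	return !ci->fpu_enable_bit;
}

/* Set the flag and enable the FPU; return the previous flag state. */
int
toy_kthread_fpu_enter(struct toy_cpu *ci, struct toy_lwp *l)
{
	int s = (l->flags & TOY_LW_KTHREAD_FPU) != 0;

	l->flags |= TOY_LW_KTHREAD_FPU;
	ci->fpu_enable_bit = true;
	return s;
}

/* Undo toy_kthread_fpu_enter unless the flag was already set before it. */
void
toy_kthread_fpu_exit(struct toy_cpu *ci, struct toy_lwp *l, int s)
{
	assert(l->flags & TOY_LW_KTHREAD_FPU);
	if (!s) {
		l->flags &= ~TOY_LW_KTHREAD_FPU;
		ci->fpu_enable_bit = false;
	}
}
```

The point of the sketch is only that the state involved is one flag
per thread and one bit per CPU; nothing here allocates a save area.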

