tech-kern archive
Re: Straw proposal: MI kthread vector/fp unit API
> Date: Mon, 22 Jun 2020 18:45:47 +0000 (UTC)
> From: Eduardo Horvath <eeh%NetBSD.org@localhost>
>
> I think this is sort of a half-measure since it restricts
> coprocessor usage to a few threads. If you want to say, implement
> the kernel memcopy using vector registers (the way sparc64 does)
> this doesn't help and may end up getting in the way.
Why do you think this restricts it to a few threads or gets in the way
of anything?
As I wrote in my original message:
   That way, for example, you can use (say) an AES encryption routine
   aes_enc as a subroutine anywhere in the kernel, and an MD definition
   of aes_enc can internally use AES-NI with the appropriate MD
   fpu_kern_enter -- but it's a little cheaper to use aes_enc in an
   FPU-enabled kthread.  This gave a modest measurable boost to cgd(4)
   throughput in my preliminary experiments.
Note that the subroutine (here aes_enc, but it could in principle be
memcpy too) works `anywhere in the kernel', not just restricted to a
few threads.
The definition of aes_enc with AES-NI CPU instructions on x86 already
works (see https://mail-index.netbsd.org/tech-kern/2020/06/18/msg026505.html
for details); just putting kthread_fpu_enter/exit around cgd_process
in cgd.c improved throughput on a RAM-backed disk by about 20%
(presumably mostly because it avoids zeroing the fpu registers on
every aes_* call in that thread).
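
To make the cost argument concrete, here is a rough userland model of
it, not kernel code.  Every mock_* name is a hypothetical stand-in: the
per-lwp flag, the zeroing counter, and the XOR "cipher" are assumptions
made for illustration, and the real fpu_kern_enter/kthread_fpu_enter
do considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userland model only.  Every mock_* name is a hypothetical stand-in,
 * not the NetBSD implementation.
 */
struct mock_lwp {
	bool fpu_kthread;	/* assumed per-lwp "FPU-enabled kthread" flag */
	unsigned nzero;		/* times the FPU registers were zeroed */
};

/* Mark the thread FPU-enabled; return the previous state to restore. */
static int
mock_kthread_fpu_enter(struct mock_lwp *l)
{
	int s = l->fpu_kthread;

	l->fpu_kthread = true;
	return s;
}

/* Restore the previous state; zero the registers once if disabling. */
static void
mock_kthread_fpu_exit(struct mock_lwp *l, int s)
{
	assert(l->fpu_kthread);
	l->fpu_kthread = (s != 0);
	if (!l->fpu_kthread)
		l->nzero++;
}

/* Per-call entry; the real per-call work is not modeled here. */
static void
mock_fpu_kern_enter(struct mock_lwp *l)
{
	(void)l;
}

/* Per-call exit: zero the registers unless the thread is FPU-enabled. */
static void
mock_fpu_kern_leave(struct mock_lwp *l)
{
	if (!l->fpu_kthread)
		l->nzero++;
}

/* Placeholder for aes_enc: XOR with a constant, NOT real AES. */
static void
mock_aes_enc(struct mock_lwp *l, unsigned char *blk, size_t len)
{
	mock_fpu_kern_enter(l);
	for (size_t i = 0; i < len; i++)
		blk[i] ^= 0x5a;
	mock_fpu_kern_leave(l);
}

/* Placeholder for cgd_process: one mock_aes_enc call per 16-byte block. */
static void
mock_cgd_process(struct mock_lwp *l, unsigned char *buf, size_t nblocks)
{
	for (size_t i = 0; i < nblocks; i++)
		mock_aes_enc(l, buf + 16 * i, 16);
}
```

In this model, mock_cgd_process over four blocks zeroes the registers
four times when called bare, and once when the caller brackets it with
mock_kthread_fpu_enter/mock_kthread_fpu_exit.  The subroutine itself is
the same in both cases and works in either kind of thread.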
> I'd do something simpler such as adding a MI routine to allocate or
> activate a temporary or permanent register save area that can be used by
> kernel threads.
>
> Then, if you want, in the coprocessor trap handler, if you
> are in kernel state you can check whether a kernel save area has been
> allocated and panic if not.
This sounds like a plausible alternative to disabling kpreemption in
some cases, but it is also orthogonal to my proposal -- in an
FPU-enabled kthread there is simply no need to allocate an extra save
area at all because it's already allocated in the lwp pcb, so if a
subroutine does use the FPU then it's cheaper to call that subroutine
in an FPU-enabled kthread than otherwise.
You say it would be simpler -- can you elaborate on how it would
simplify the implementations that already work on x86 and aarch64 by
just adding and testing a new flag in a couple places, and enabling or
disabling the CPU's FPU-enable bit?
https://anonhg.netbsd.org/src-all/rev/e83ef87e4f53
https://anonhg.netbsd.org/src-all/rev/7ec4225df101