tech-kern archive
Re: AES leaks, cgd ciphers, and vector units in the kernel
This is a *huge* effort. Thank you.
On Sun, Jun 28, 2020 at 03:27:56AM +0000, Taylor R Campbell wrote:
> > Date: Mon, 22 Jun 2020 23:43:20 +0000
> > From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
> >
> > There is some more room for improvement -- SSSE3 provides PSHUFB which
> > can sequentially speed up parts of AES, and is supported by a good
> > number of amd64 CPUs starting around 14 years ago that lack AES-NI --
> > but there are diminishing returns for increasing implementation and
> > maintenance effort, so I'd like to focus on making an impact on
> > systems that matter. (That includes non-x86 CPUs -- e.g., we could
> > probably easily adapt the Intel SSE2 logic to ARM NEON -- but I would
> > like to focus on systems where there is demand.)
>
> I drafted derivatives of Mike Hamburg's vpaes code using Intel SSSE3
> and using ARM NEON / aarch64 SIMD. In principle the ARM NEON code
> should work on armv7, but I have only compile-tested it there, and
> there are a few kinks to be worked out before it can be used in the
> kernel on armv7.
>
> I pushed it to the riastradh-kernelcrypto topic on hg src-draft, and I
> updated the userland aestest utility if you want to get a rough idea
> of the performance without updating your kernel (see previous message
> for usage instructions):
>
> https://www.NetBSD.org/~riastradh/tmp/20200627/aestest.tgz
>
> The summary of the patch set now is (kernel only -- no userland
> changes):
>
> - every architecture gets constant-time AES, with BearSSL's aes_ct
> 32-bit bitsliced implementation -- there is no more vulnerable AES
> code in the NetBSD kernel, although there is a substantial
> performance hit on many platforms
>
> - every architecture gets new cgd(4) support for Adiantum, which is
> generally as fast as or faster than AES-CBC and AES-XTS were before
> and provides better security (and has lots of room to be sped up;
> any speedups would also be applicable to other purposes too, like
> Wireguard)
>
> - most high-end x86 of the past decade gets much much faster AES with
> AES-NI CPU support (no 32-bit yet)
>
> - almost all x86 of the past decade gets faster or much faster AES
> with a vpaes-style SSSE3-based implementation (32-bit included)
>
> - most x86 of the past two decades, including all amd64, mitigates the
> performance hit with a bitsliced SSE2-based implementation (32-bit
> included)
>
> - VIA gets much faster AES with VIA ACE (for all users in the kernel,
> including cgd, not just those that use opencrypto as we had before
> with the via_padlock.c driver)
>
> - almost all aarch64 (except rpi) gets much much faster AES with
> ARMv8.0-AES CPU support
>
> - 64-bit rpi (and, with a little more work, armv7 with NEON) mitigates
> the performance hit -- and may get faster -- with a vpaes-style
> NEON-based implementation
>
> Some other CPUs like modern POWER have AES CPU instructions these days
> too. The vpaes approach could probably be adapted to PowerPC Altivec,
> and maybe some other vector units I'm not as familiar with (MIPS SIMD
> Architecture, MSA?). BearSSL's aes_ct64 64-bit bitsliced
> implementation might be worth adopting for 64-bit CPUs without a
> vector unit, if anyone cares -- maybe alpha or mips64. But I think
> I'm at the limit of what I'm willing to do for fun with the hardware I
> have easy access to.
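
To make the preference order in the summary above concrete, here is a
rough sketch in C of how a kernel might pick the fastest AES
implementation that a CPU supports and fall back to the portable
constant-time code otherwise. It is not the code on the
riastradh-kernelcrypto branch: the feature bits, the struct, and the
names aes_impls and aes_pick_impl are all hypothetical, a real kernel
would query CPUID rather than take a bitmask, and the relative order of
AES-NI and VIA ACE is arbitrary because no CPU offers both.

```c
#include <stddef.h>

/* Hypothetical CPU feature bits, for illustration only. */
enum cpu_feature {
	FEAT_SSE2    = 1u << 0,
	FEAT_SSSE3   = 1u << 1,
	FEAT_AESNI   = 1u << 2,
	FEAT_VIA_ACE = 1u << 3
};

/* Hypothetical identifiers for the x86 implementations in the summary. */
enum aes_impl_id {
	IMPL_BEARSSL_CT,	/* portable 32-bit bitsliced aes_ct */
	IMPL_SSE2,		/* bitsliced SSE2 */
	IMPL_SSSE3,		/* vpaes-style SSSE3 (PSHUFB) */
	IMPL_VIA_ACE,		/* VIA ACE */
	IMPL_AESNI		/* AES-NI */
};

struct aes_impl {
	enum aes_impl_id id;
	const char *name;
	unsigned required;	/* feature bits that must all be present */
};

/*
 * Ordered from most to least preferred.  The last entry requires no
 * features, so it matches on every CPU.
 */
static const struct aes_impl aes_impls[] = {
	{ IMPL_AESNI,      "AES-NI",            FEAT_AESNI },
	{ IMPL_VIA_ACE,    "VIA ACE",           FEAT_VIA_ACE },
	{ IMPL_SSSE3,      "vpaes-style SSSE3", FEAT_SSSE3 },
	{ IMPL_SSE2,       "bitsliced SSE2",    FEAT_SSE2 },
	{ IMPL_BEARSSL_CT, "BearSSL aes_ct",    0 }
};

/* Return the first (most preferred) entry whose requirements are met. */
static const struct aes_impl *
aes_pick_impl(unsigned features)
{
	size_t i;

	for (i = 0; i < sizeof(aes_impls) / sizeof(aes_impls[0]); i++) {
		if ((features & aes_impls[i].required) ==
		    aes_impls[i].required)
			return &aes_impls[i];
	}
	return NULL;	/* not reached: the last entry always matches */
}
```

With this table, a CPU reporting FEAT_SSE2 | FEAT_SSSE3 but no AES-NI
would get the vpaes-style SSSE3 code, and a CPU reporting no vector
features at all would get BearSSL's aes_ct.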
--
Thor Lancelot Simon tls%panix.com@localhost
"Whether or not there's hope for change is not the question. If you
want to be a free person, you don't stand up for human rights because
it will work, but because it is right." --Andrei Sakharov