tech-kern archive


Re: AES leaks, cgd ciphers, and vector units in the kernel



On Thu, Jun 18, 2020 at 08:21:36PM -0400, Greg Troxel wrote:
> So it remains to make userland AES use also constant time, as a separate
> step?

For userland AES, we are mostly using OpenSSL.

Until recently, a bug in the in-src copy of OpenSSL prevented it from
detecting the CPU type properly; I have now fixed that.

- On x86_64 it uses AES-NI (constant-time) if the CPU supports it;
  otherwise it uses bit-sliced or vector-permute AES (also constant-time)
- On aarch64 it uses vector-permute AES (constant-time)
- On armv7 it uses bit-sliced AES (constant-time)
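
As a rough illustration of what "supported by the CPU" means on x86, here
is a minimal sketch that tests the AES-NI feature bit (CPUID leaf 1, ECX
bit 25). This is not how OpenSSL does its detection; the function name
cpu_has_aesni is made up for this example, and it assumes a GCC- or
Clang-style <cpuid.h>. On any other compiler or architecture it simply
reports "unknown".

```c
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#endif

/*
 * Hypothetical helper, not part of OpenSSL or NetBSD.
 * Returns 1 if the CPU advertises AES-NI, 0 if it does not,
 * and -1 if this build cannot tell.
 */
static int
cpu_has_aesni(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid returns 0 when the requested leaf is unavailable. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return -1;

	/* CPUID.01H:ECX bit 25 is the AES-NI feature flag. */
	return (int)((ecx >> 25) & 1);
#else
	return -1;
#endif
}
```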

Note that much of the constant-time AES code in OpenSSL (unlike the
other asm) does not appear to have been released under the modified
BSD license, so we probably can't reuse it in the kernel?

The standard C implementation in OpenSSL is not constant-time.
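
To make the problem concrete: a table-driven implementation indexes its
S-box with secret-dependent values, so which cache lines get touched
depends on the key. The sketch below contrasts that with one simple
constant-time alternative, scanning the whole table and selecting the
wanted entry with a mask. This is only an illustration, not OpenSSL's
code (the bit-sliced and vector-permute variants avoid tables entirely);
the names lookup_leaky and lookup_ct are made up here, and a real
implementation would also have to check that the compiler does not turn
the masking back into a branch or a direct load.

```c
#include <stdint.h>

/* Variable-time: the load address depends on the secret index. */
static uint8_t
lookup_leaky(const uint8_t table[256], uint8_t secret)
{
	return table[secret];
}

/*
 * Constant-time sketch: read every entry in a fixed order and keep
 * only the one whose index matches, using a mask instead of a branch.
 */
static uint8_t
lookup_ct(const uint8_t table[256], uint8_t secret)
{
	uint8_t result = 0;
	unsigned i;

	for (i = 0; i < 256; i++) {
		/* diff is zero only when i == secret. */
		uint32_t diff = (uint32_t)i ^ (uint32_t)secret;
		/* (diff - 1) wraps to all-ones only for diff == 0. */
		uint8_t mask = (uint8_t)((diff - 1) >> 8);

		result |= (uint8_t)(table[i] & mask);
	}
	return result;
}
```

The scan costs 256 loads per lookup, which is why real constant-time AES
code prefers the approaches listed above over this kind of masking.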

However, many architectures have standard asm variants that appear to
be somewhat resistant to timing attacks (probably not immune; I am not
sure to what degree). For example, the sparc64 implementation carries
this comment:

# The major reason for undertaken effort was to mitigate the hazard of
# cache-timing attack. This is [currently and initially!] addressed in
# two ways. 1. S-boxes are compressed from 5KB to 2KB+256B size each.
# 2. References to them are scheduled for L2 cache latency, meaning
# that the tables don't have to reside in L1 cache. Once again, this
# is an initial draft and one should expect more countermeasures to
# be implemented...
#
# Version 1.1 prefetches T[ed]4 in order to mitigate attack on last
# round.
#
# Even though performance was not the primary goal [on the contrary,
# extra shifts "induced" by compressed S-box and longer loop epilogue
# "induced" by scheduling for L2 have negative effect on performance],
# the code turned out to run in ~23 cycles per processed byte en-/
# decrypted with 128-bit key. This is pretty good result for code
# with mentioned qualities and UltraSPARC core. Compared to Sun C
# generated code my encrypt procedure runs just few percents faster,
# while decrypt one - whole 50% faster [yes, Sun C failed to generate
# optimal decrypt procedure]. Compared to GNU C generated code both
# procedures are more than 60% faster:-)

