Port-sparc64 archive
Re: bswap is slow on SPARC
> Date: Tue, 25 Nov 2025 07:52:23 +0000
> From: Sad Clouds <cryintothebluesky%gmail.com@localhost>
>
> I was just trying to understand why little-endian meta-data was used
> for encrypted swap.
It's a good question.
Of course, for encrypted swap, interoperability isn't important; the
CPU just needs to be able to read back what it wrote since it booted.
But that's not the only consideration.
Code implementing cryptographic primitives is fairly costly to
verify, test, audit, and maintain: if you make a mistake --
especially in a context where there's no opportunity for a noisy
failure to interoperate -- it could just silently destroy any
security. That's why it's important to keep the number of primitives
low, and risky to adopt nonstandard variants like a big-endian version
of standard primitives.
More on that in the subthread about big-endian and Adiantum:
https://mail-index.NetBSD.org/port-sparc/2025/11/26/msg002996.html
(I actually did adopt a big-endian variant for arc4random(3), but I
regret that choice, and if I ever get around to exposing the vectorized
ChaCha code in sys/crypto/chacha to userland to speed up
arc4random(3), I would nix the big-endian option because it's too
risky and painful to validate and maintain.)
In this case, the performance hit nia observed for byte-swapping in
the AES-CBC code we use for encrypted swap wasn't nearly as much as
for Adiantum disk encryption. So it's not as big a deal that the AES
code uses le32dec/enc.
On the other hand, the byte order is also not as fundamental as it is
for Adiantum: AES _doesn't_ rely on the algebraic structure of
Z/(2^32)Z; it's all byte-by-byte substitution and permutations of the
16 byte positions. It's just kind of a pain to find all the shifts
and rotates that would need to be reversed if you went through and
replaced le32dec/enc by be32dec/enc.
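To make the contrast concrete, here is a minimal sketch in portable C.
Everything named ex_* is a hypothetical stand-in written for this
example (the real le32dec/be32dec and friends come from
<sys/endian.h>), and ex_toy_sub is an arbitrary byte map, not the AES
S-box. It shows three things: a purely bytewise step gives the same
output bytes whichever byte order you load and store with; a rotate
gives the same output bytes only if you reverse its direction when you
switch byte order; and addition mod 2^32 does not survive the switch
at all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable stand-ins for le32dec/be32dec/le32enc/be32enc. */
static uint32_t
ex_le32dec(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t
ex_be32dec(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void
ex_le32enc(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

static void
ex_be32enc(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Rotates; only valid for 0 < n < 32. */
static uint32_t ex_rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }
static uint32_t ex_rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Arbitrary byte-to-byte map (NOT the AES S-box), applied to each byte lane. */
static uint8_t ex_toy_sub(uint8_t b) { return (uint8_t)(b * 7u + 3u); }

static uint32_t
ex_sub_word(uint32_t w)
{
	uint32_t r = 0;
	for (int i = 0; i < 32; i += 8)
		r |= (uint32_t)ex_toy_sub((uint8_t)(w >> i)) << i;
	return r;
}

/* Bytewise step: same output bytes under either byte order? */
static int
ex_bytewise_agrees(const uint8_t in[4])
{
	uint8_t le[4], be[4];
	ex_le32enc(le, ex_sub_word(ex_le32dec(in)));
	ex_be32enc(be, ex_sub_word(ex_be32dec(in)));
	return memcmp(le, be, 4) == 0;
}

/* Rotate left by 8 in LE form versus rotate right by 8 in BE form. */
static int
ex_reversed_rotate_agrees(const uint8_t in[4])
{
	uint8_t le[4], be[4];
	ex_le32enc(le, ex_rotl32(ex_le32dec(in), 8));
	ex_be32enc(be, ex_rotr32(ex_be32dec(in), 8));
	return memcmp(le, be, 4) == 0;
}

/* Addition mod 2^32: carries cross byte lanes, so byte order matters. */
static int
ex_add_agrees(const uint8_t in[4], uint32_t k)
{
	uint8_t le[4], be[4];
	ex_le32enc(le, ex_le32dec(in) + k);
	ex_be32enc(be, ex_be32dec(in) + k);
	return memcmp(le, be, 4) == 0;
}

static void
ex_selfcheck(void)
{
	const uint8_t in[4] = { 0xff, 0x00, 0x12, 0x34 };
	assert(ex_bytewise_agrees(in));
	assert(ex_reversed_rotate_agrees(in));
	assert(!ex_add_agrees(in, 1));
}
```

Calling ex_selfcheck() exercises all three cases on one input. The
rotate case is the "kind of a pain" part: every such rotate or shift
would need to be found and flipped by hand.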
If someone wanted to do that, it would still be standard AES and we
already have testing infrastructure in place to verify it. The code
in question for 32-bit sparc lives in these files:
sys/crypto/aes/aes_bear.c
sys/crypto/aes/aes_bear.h
sys/crypto/aes/aes_ct.c
sys/crypto/aes/aes_ct_dec.c
sys/crypto/aes/aes_ct_enc.c
And related newer code which might reasonably be flipped on for
sparc64 or all LP64 platforms lives in:
sys/crypto/aes/aes_bear64.c
sys/crypto/aes/aes_bear64.h
sys/crypto/aes/aes_ct64.c
sys/crypto/aes/aes_ct64_dec.c
sys/crypto/aes/aes_ct64_enc.c
But you'd have to thread it through the bitsliced representation that
we use for timing side channel resistance, and probably sprinkle #if
_BYTE_ORDER == _LITTLE_ENDIAN conditionals all over the place --
hence, kind of a pain. (Not to mention that for sparc64, I bet we can
use the little-endian address space identifier to recover almost all
of the performance without changing anything.)
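For a sense of what one such conditional looks like, here is a rough
sketch of a single load helper, not code from sys/crypto/aes. The name
ex_load32_le is hypothetical, and I'm assuming the compiler-predefined
__BYTE_ORDER__ / __ORDER_LITTLE_ENDIAN__ macros (GCC and Clang) so the
example builds outside the kernel; in-tree code would test _BYTE_ORDER
against _LITTLE_ENDIAN from <sys/endian.h> instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: load a 32-bit little-endian word from memory.
 * On a little-endian host this is a plain copy; otherwise the word is
 * assembled byte by byte, which is where the byte-swapping cost shows up.
 */
static uint32_t
ex_load32_le(const uint8_t *p)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	uint32_t v;
	memcpy(&v, p, sizeof v);	/* host order already matches */
	return v;
#else
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
#endif
}

static void
ex_load_selfcheck(void)
{
	const uint8_t buf[4] = { 0x01, 0x02, 0x03, 0x04 };
	/* Same result on either kind of host. */
	assert(ex_load32_le(buf) == 0x04030201u);
}
```

Both branches return the same value; only the cost differs. Multiply
that by every load, store, shift, and rotate in the bitsliced code and
you get the maintenance burden described above.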