Source-Changes archive
CVS commit: src
Module Name: src
Committed By: riastradh
Date: Sun Nov 23 22:48:27 UTC 2025
Modified Files:
src/sys/arch/x86/x86: identcpu.c
src/sys/conf: copts.mk
src/sys/crypto/aes/arch/x86: files.aessse2
src/tests/sys/crypto/aes: Makefile t_aes.c
Added Files:
src/sys/crypto/aes/arch/x86: aes_sse2_4x32.c aes_sse2_4x32.h
aes_sse2_4x32_dec.c aes_sse2_4x32_enc.c aes_sse2_4x32_impl.c
aes_sse2_4x32_impl.h aes_sse2_4x32_subr.c aes_sse2_4x32_subr.h
Removed Files:
src/sys/crypto/aes/arch/x86: aes_sse2.c aes_sse2.h aes_sse2_dec.c
aes_sse2_enc.c aes_sse2_impl.c aes_sse2_impl.h aes_sse2_subr.c
Log Message:
aes(9): Rewrite x86 SSE2 implementation.
This computes eight AES_k instances simultaneously, using the
bitsliced 32-bit aes_ct logic which computes two blocks at a time in
uint32_t arithmetic, vectorized four ways.
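The "two blocks per uint32_t, vectorized four ways" arithmetic can be sketched in portable C. In BearSSL's aes_ct, the state of two AES blocks lives in eight uint32_t words. A bit-transposition step (br_aes_ct_ortho) converts those words to and from bitsliced form, and every later step is likewise plain 32-bit AND/OR/XOR/shift arithmetic. Widening each word to four independent lanes therefore lets one instruction stream carry 4 x 2 = 8 blocks. This is a sketch of that transposition step only, not the NetBSD code. The names `vec4x32`, `ortho_scalar`, and `ortho_4x32` are hypothetical, and the struct stands in for an SSE2 `__m128i` so the sketch runs on any host.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for one 128-bit SSE2 register (__m128i) viewed
 * as four independent uint32_t lanes.  Portable so the sketch runs on
 * any host; it is not a type from the NetBSD sources.
 */
typedef struct {
	uint32_t lane[4];
} vec4x32;

/* Masks and shift for the three transposition stages, modeled on the
 * SWAP2/SWAP4/SWAP8 steps of BearSSL's br_aes_ct_ortho. */
static const struct {
	uint32_t cl, ch;	/* low/high bit-group masks, ch == cl << s */
	int s;			/* shift amount, also the word-pair distance */
} ortho_stage[3] = {
	{ 0x55555555u, 0xAAAAAAAAu, 1 },
	{ 0x33333333u, 0xCCCCCCCCu, 2 },
	{ 0x0F0F0F0Fu, 0xF0F0F0F0u, 4 },
};

/* Exchange the ch-selected bits of *x with the cl-selected bits of *y. */
static void
swapn(uint32_t *x, uint32_t *y, uint32_t cl, uint32_t ch, int s)
{
	uint32_t a = *x, b = *y;

	*x = (a & cl) | ((b & cl) << s);
	*y = ((a & ch) >> s) | (b & ch);
}

/* Scalar: convert eight words (two AES blocks) to/from bitsliced form.
 * Each stage is an involution and the stages commute, so the whole
 * transform is its own inverse. */
static void
ortho_scalar(uint32_t q[8])
{
	for (int t = 0; t < 3; t++) {
		int d = ortho_stage[t].s;

		for (int i = 0; i < 8; i++) {
			if (i & d)
				continue;	/* q[i] is the high word of a pair */
			swapn(&q[i], &q[i + d], ortho_stage[t].cl,
			    ortho_stage[t].ch, d);
		}
	}
}

/* 4x32: the identical operation sequence, applied lane-wise, so one pass
 * handles four independent two-block states, i.e. eight blocks.  With
 * SSE2 each lane loop would collapse into single instructions such as
 * _mm_and_si128, _mm_or_si128, _mm_slli_epi32 and _mm_srli_epi32. */
static void
ortho_4x32(vec4x32 q[8])
{
	for (int t = 0; t < 3; t++) {
		int d = ortho_stage[t].s;

		for (int i = 0; i < 8; i++) {
			if (i & d)
				continue;
			for (int j = 0; j < 4; j++)
				swapn(&q[i].lane[j], &q[i + d].lane[j],
				    ortho_stage[t].cl, ortho_stage[t].ch, d);
		}
	}
}
```

Running `ortho_4x32` on eight vectors should give the same words, lane by lane, as four separate `ortho_scalar` calls on the corresponding lanes. The vectorized code is the scalar algorithm run in lockstep on four lanes, with no cross-lane shuffling needed in this step.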
Previously, the SSE2 code was a very naive adaptation of aes_ct64,
which computes four blocks at a time in uint64_t arithmetic, without
any 2x vectorization -- I did it that way at the time because:
(a) it was easier to get working,
(b) it only affects really old hardware with neither AES-NI nor SSSE3,
whose implementations are both much, much faster.
But it was bugging me that this was a kind of dumb use of SSE2.
This substantially reduces stack usage (from ~1200 bytes to ~800
bytes) and should approximately double throughput for CBC decryption
and for XTS encryption/decryption.
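One way to read why the expected gain is limited to CBC decryption and XTS is that a multi-block bitsliced core only helps when several independent blocks are ready at once. This interpretation is not stated in the log. Below, E_k/D_k is AES under key k, C_0 = IV, and XTS follows IEEE 1619 with n the data-unit (sector) number, j the block index, and multiplication by alpha^j in GF(2^128):

```latex
\begin{aligned}
\text{CBC enc:}\quad & C_i = E_k(P_i \oplus C_{i-1}) \\
\text{CBC dec:}\quad & P_i = D_k(C_i) \oplus C_{i-1} \\
\text{XTS:}\quad & T_j = E_{k_2}(n) \otimes \alpha^{j}, \quad
  C_j = E_{k_1}(P_j \oplus T_j) \oplus T_j, \quad
  P_j = D_{k_1}(C_j \oplus T_j) \oplus T_j
\end{aligned}
```

CBC encryption is serial, because each E_k call needs the previous ciphertext block. Every D_k(C_i) in CBC decryption, and every block in XTS, depends only on inputs already in hand. Computing eight blocks per pass instead of four can therefore roughly halve the number of passes for those modes.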
I also tried a 2x64 version, but cursory performance measurements
didn't reveal much benefit over 4x32. (If anyone is interested in
doing more serious performance measurements, on ancient hardware for
which it might matter, I also have the 2x64 code around.)
Prompted by:
PR kern/59774: bearssl 32-bit AES is too slow, want 64-bit optimized
version in kernel
To generate a diff of this commit:
cvs rdiff -u -r1.138 -r1.139 src/sys/arch/x86/x86/identcpu.c
cvs rdiff -u -r1.13 -r1.14 src/sys/conf/copts.mk
cvs rdiff -u -r1.2 -r0 src/sys/crypto/aes/arch/x86/aes_sse2.c
cvs rdiff -u -r1.4 -r0 src/sys/crypto/aes/arch/x86/aes_sse2.h \
src/sys/crypto/aes/arch/x86/aes_sse2_subr.c
cvs rdiff -u -r0 -r1.1 src/sys/crypto/aes/arch/x86/aes_sse2_4x32.c \
src/sys/crypto/aes/arch/x86/aes_sse2_4x32.h \
src/sys/crypto/aes/arch/x86/aes_sse2_4x32_dec.c \
src/sys/crypto/aes/arch/x86/aes_sse2_4x32_enc.c \
src/sys/crypto/aes/arch/x86/aes_sse2_4x32_impl.c \
src/sys/crypto/aes/arch/x86/aes_sse2_4x32_impl.h \
src/sys/crypto/aes/arch/x86/aes_sse2_4x32_subr.c \
src/sys/crypto/aes/arch/x86/aes_sse2_4x32_subr.h
cvs rdiff -u -r1.1 -r0 src/sys/crypto/aes/arch/x86/aes_sse2_dec.c \
src/sys/crypto/aes/arch/x86/aes_sse2_enc.c
cvs rdiff -u -r1.5 -r0 src/sys/crypto/aes/arch/x86/aes_sse2_impl.c
cvs rdiff -u -r1.3 -r0 src/sys/crypto/aes/arch/x86/aes_sse2_impl.h
cvs rdiff -u -r1.2 -r1.3 src/sys/crypto/aes/arch/x86/files.aessse2
cvs rdiff -u -r1.9 -r1.10 src/tests/sys/crypto/aes/Makefile
cvs rdiff -u -r1.5 -r1.6 src/tests/sys/crypto/aes/t_aes.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.