NetBSD-Bugs archive


PR/59774 CVS commit: src



The following reply was made to PR kern/59774; it has been noted by GNATS.

From: "Taylor R Campbell" <riastradh%netbsd.org@localhost>
To: gnats-bugs%gnats.NetBSD.org@localhost
Cc: 
Subject: PR/59774 CVS commit: src
Date: Sun, 23 Nov 2025 22:48:27 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Sun Nov 23 22:48:27 UTC 2025
 
 Modified Files:
 	src/sys/arch/x86/x86: identcpu.c
 	src/sys/conf: copts.mk
 	src/sys/crypto/aes/arch/x86: files.aessse2
 	src/tests/sys/crypto/aes: Makefile t_aes.c
 Added Files:
 	src/sys/crypto/aes/arch/x86: aes_sse2_4x32.c aes_sse2_4x32.h
 	    aes_sse2_4x32_dec.c aes_sse2_4x32_enc.c aes_sse2_4x32_impl.c
 	    aes_sse2_4x32_impl.h aes_sse2_4x32_subr.c aes_sse2_4x32_subr.h
 Removed Files:
 	src/sys/crypto/aes/arch/x86: aes_sse2.c aes_sse2.h aes_sse2_dec.c
 	    aes_sse2_enc.c aes_sse2_impl.c aes_sse2_impl.h aes_sse2_subr.c
 
 Log Message:
 aes(9): Rewrite x86 SSE2 implementation.
 
 This computes eight AES_k instances simultaneously, using the
 bitsliced 32-bit aes_ct logic, which computes two blocks at a time in
 uint32_t arithmetic, vectorized four ways.
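To illustrate the 4x32 idea: a bitsliced aes_ct step is just a sequence of AND/OR/XOR and shift operations on uint32_t words. Applying the same sequence with SSE2 intrinsics to __m128i values therefore runs four independent 32-bit computations at once. The sketch below is a minimal illustration, assuming an x86 target with <emmintrin.h>. It uses a mask-and-shift permutation modeled on a bitsliced ShiftRows, in which each byte of the word is rotated by a different number of 2-bit columns. The names permute32, permute4x32, and check_lanes are hypothetical and are not the functions in aes_sse2_4x32_*.c.

```c
#include <assert.h>
#include <emmintrin.h>	/* SSE2 intrinsics (x86/x86-64 only) */
#include <stdint.h>

/* Scalar version: permute the bits of one 32-bit bit-plane word. */
static uint32_t
permute32(uint32_t x)
{
	return (x & 0x000000FF)
	    | ((x & 0x0000FC00) >> 2) | ((x & 0x00000300) << 6)
	    | ((x & 0x00F00000) >> 4) | ((x & 0x000F0000) << 4)
	    | ((x & 0xC0000000) >> 6) | ((x & 0x3F000000) << 2);
}

/* AND every 32-bit lane of x with the same mask. */
static __m128i
and_mask(__m128i x, uint32_t m)
{
	return _mm_and_si128(x, _mm_set1_epi32((int)m));
}

/* Vector version: the same permutation on four 32-bit lanes at once. */
static __m128i
permute4x32(__m128i x)
{
	__m128i r = and_mask(x, 0x000000FF);

	r = _mm_or_si128(r, _mm_srli_epi32(and_mask(x, 0x0000FC00), 2));
	r = _mm_or_si128(r, _mm_slli_epi32(and_mask(x, 0x00000300), 6));
	r = _mm_or_si128(r, _mm_srli_epi32(and_mask(x, 0x00F00000), 4));
	r = _mm_or_si128(r, _mm_slli_epi32(and_mask(x, 0x000F0000), 4));
	r = _mm_or_si128(r, _mm_srli_epi32(and_mask(x, 0xC0000000), 6));
	r = _mm_or_si128(r, _mm_slli_epi32(and_mask(x, 0x3F000000), 2));
	return r;
}

/* Run both versions on the same four words and assert they agree per lane. */
static void
check_lanes(const uint32_t in[4])
{
	uint32_t out[4];
	__m128i v = _mm_loadu_si128((const __m128i *)in);

	_mm_storeu_si128((__m128i *)out, permute4x32(v));
	for (int i = 0; i < 4; i++)
		assert(out[i] == permute32(in[i]));
}
```

Because SSE2 shifts such as _mm_srli_epi32 act on each 32-bit lane separately, no bits leak between lanes. The vector code is therefore a line-for-line transliteration of the scalar code.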
 
 Previously, the SSE2 code was a very naive adaptation of aes_ct64,
 which computes four blocks at a time in uint64_t arithmetic, without
 any 2x vectorization.  I did it that way at the time because:
 
 (a) it was easier to get working,
 (b) it only affects really old hardware with neither AES-NI nor SSSE3,
     both of which are much, much faster.
 
 But it was bugging me that this was a kind of dumb use of SSE2.
 
 This substantially reduces stack usage (from ~1200 bytes to ~800
 bytes) and should approximately double throughput for CBC decryption
 and for XTS encryption/decryption.
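The speedup applies only where the block-cipher calls are independent of each other. In CBC encryption, each block's input depends on the previous ciphertext block, so blocks must be enciphered one at a time. In CBC decryption, all ciphertext blocks are known up front, so a batch of eight can go through one multi-block call before the chaining XOR. XTS has no chaining between blocks, so it batches in both directions. The sketch below shows the CBC case. A toy invertible 64-bit map stands in for AES (it is not AES), and every name, including the batched toy_dec_batch, is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBATCH	8	/* blocks per batched call, like the 4x32 code */

static uint64_t
rotl64(uint64_t x, unsigned r)	/* r must be in 1..63 */
{
	return (x << r) | (x >> (64 - r));
}

/* Toy stand-in for a block cipher on 64-bit blocks: NOT AES. */
static uint64_t
toy_enc1(uint64_t key, uint64_t x)
{
	return rotl64(x ^ key, 13);
}

static uint64_t
toy_dec1(uint64_t key, uint64_t y)
{
	return rotl64(y, 64 - 13) ^ key;	/* inverse of toy_enc1 */
}

/*
 * Hypothetical batched decryption of n <= NBATCH independent blocks.
 * A bitsliced implementation would handle the batch in one pass; this
 * sketch just loops.
 */
static void
toy_dec_batch(uint64_t key, const uint64_t *in, uint64_t *out, size_t n)
{
	for (size_t i = 0; i < n; i++)
		out[i] = toy_dec1(key, in[i]);
}

/* CBC encryption: each block needs the previous ciphertext, so it is serial. */
static void
cbc_enc(uint64_t key, uint64_t iv, const uint64_t *pt, uint64_t *ct, size_t n)
{
	uint64_t prev = iv;

	for (size_t i = 0; i < n; i++) {
		ct[i] = toy_enc1(key, pt[i] ^ prev);
		prev = ct[i];
	}
}

/*
 * CBC decryption: every ciphertext block is available up front, so the
 * cipher calls are independent and can be batched.  In-place operation
 * is not supported (only exact aliasing is checked).
 */
static void
cbc_dec(uint64_t key, uint64_t iv, const uint64_t *ct, uint64_t *pt, size_t n)
{
	uint64_t tmp[NBATCH];

	assert(n == 0 || pt != ct);
	for (size_t i = 0; i < n; i += NBATCH) {
		size_t m = (n - i < NBATCH) ? n - i : NBATCH;

		toy_dec_batch(key, &ct[i], tmp, m);	/* batched cipher calls */
		for (size_t j = 0; j < m; j++)		/* then the chaining XOR */
			pt[i + j] = tmp[j] ^ (i + j == 0 ? iv : ct[i + j - 1]);
	}
}
```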
 
 I also tried a 2x64 version, but cursory performance measurements
 didn't reveal much benefit over 4x32.  (If anyone is interested in
 doing more serious performance measurements on ancient hardware for
 which it might matter, I also have the 2x64 code around.)
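The per-pass block counts in this message come from simple arithmetic. Multiply the number of blocks packed into one bitsliced word by the number of such words processed side by side in a 128-bit SSE2 register. The struct and field names below are made up for illustration. The figures come from the message itself: four blocks per uint64_t for aes_ct64 and two blocks per uint32_t for aes_ct.

```c
#include <assert.h>

/* One way of laying out a bitsliced AES pass in a 128-bit register. */
struct layout {
	const char *name;
	unsigned word_bits;		/* width of one bitsliced word */
	unsigned blocks_per_word;	/* AES blocks packed per word */
	unsigned lanes;			/* words processed side by side */
};

static const struct layout layouts[] = {
	{ "naive ct64 (old)", 64, 4, 1 },	/* no 2x vectorization */
	{ "4x32 (new)",       32, 2, 4 },	/* aes_ct, four lanes */
	{ "2x64 (tried)",     64, 4, 2 },	/* aes_ct64, two lanes */
};

static unsigned
blocks_per_pass(const struct layout *l)
{
	/* All lanes must fit in one 128-bit SSE2 register. */
	assert(l->word_bits * l->lanes <= 128);
	return l->blocks_per_word * l->lanes;
}
```

By this count, the old code handled four blocks per pass, while both the 4x32 and 2x64 layouts handle eight.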
 
 Prompted by:
 
 PR kern/59774: bearssl 32-bit AES is too slow, want 64-bit optimized
 version in kernel
 
 
 To generate a diff of this commit:
 cvs rdiff -u -r1.138 -r1.139 src/sys/arch/x86/x86/identcpu.c
 cvs rdiff -u -r1.13 -r1.14 src/sys/conf/copts.mk
 cvs rdiff -u -r1.2 -r0 src/sys/crypto/aes/arch/x86/aes_sse2.c
 cvs rdiff -u -r1.4 -r0 src/sys/crypto/aes/arch/x86/aes_sse2.h \
     src/sys/crypto/aes/arch/x86/aes_sse2_subr.c
 cvs rdiff -u -r0 -r1.1 src/sys/crypto/aes/arch/x86/aes_sse2_4x32.c \
     src/sys/crypto/aes/arch/x86/aes_sse2_4x32.h \
     src/sys/crypto/aes/arch/x86/aes_sse2_4x32_dec.c \
     src/sys/crypto/aes/arch/x86/aes_sse2_4x32_enc.c \
     src/sys/crypto/aes/arch/x86/aes_sse2_4x32_impl.c \
     src/sys/crypto/aes/arch/x86/aes_sse2_4x32_impl.h \
     src/sys/crypto/aes/arch/x86/aes_sse2_4x32_subr.c \
     src/sys/crypto/aes/arch/x86/aes_sse2_4x32_subr.h
 cvs rdiff -u -r1.1 -r0 src/sys/crypto/aes/arch/x86/aes_sse2_dec.c \
     src/sys/crypto/aes/arch/x86/aes_sse2_enc.c
 cvs rdiff -u -r1.5 -r0 src/sys/crypto/aes/arch/x86/aes_sse2_impl.c
 cvs rdiff -u -r1.3 -r0 src/sys/crypto/aes/arch/x86/aes_sse2_impl.h
 cvs rdiff -u -r1.2 -r1.3 src/sys/crypto/aes/arch/x86/files.aessse2
 cvs rdiff -u -r1.9 -r1.10 src/tests/sys/crypto/aes/Makefile
 cvs rdiff -u -r1.5 -r1.6 src/tests/sys/crypto/aes/t_aes.c
 
 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
 

