Current-Users archive


Re: 9.99.69 panic - libcrypto changes?



> Date: Thu, 2 Jul 2020 23:09:16 +0100
> From: Chavdar Ivanov <ci4ic4%gmail.com@localhost>
> 
> On amd64 9.99.69 from yesterday I get:
> [...]
> System panicked: fpudna from kernel, ip 0xffffffff802292af, trapframe
> 0xffffbe013c564a50
> [...]
> Xtrap07() at Xtrap07+0xbd
> aesni_enc_impl() at aesni_enc_impl+0x1c
> rijndaelEncrypt() at rijndaelEncrypt+0x4b
> ccmp_init_blocks() at ccmp_init_blocks+0xe8
> [...]

I am investigating.  There must be a bug somewhere in the x86 vector
register state management that I used to allow the kernel to use
AES-NI, but I'm not yet sure what it is.

> My WiFi link (iwm) is also visibly slower than usual. The panic
> happened while I was running 'pkgin upgrade' over an NFS mount through
> the iwm adapter.

This is likely an unintended side effect of my recent AES rework
(https://mail-index.netbsd.org/tech-kern/2020/06/18/msg026505.html).

For systems where we can take advantage of hardware AES support, like
yours, after every call into the AES subsystem, the kernel will zero
the vector registers to avoid leaking secrets through Spectre-class
speculative execution attacks.

Although your kernel is evidently now taking advantage of hardware
support for AES (the x86 AES-NI CPU instructions), which is much
faster than software AES, the logic in our 802.11 stack to compute
CCMP (the authenticated cipher used in your WPA setup) calls the AES
block cipher one block at a time.

So it's zeroing all the vector registers for every 16 bytes of data in
every frame -- twice, because AES-CCM involves two block cipher calls
for every block of data (one for the AES-CBC-MAC authenticator, one
for the AES-CTR encryption pad).  I expect this is the source of the
slowdown you're witnessing.
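To put rough numbers on the per-block cost, here is a minimal C model.
It is not NetBSD code, and every name in it is hypothetical.  Each
single-block AES call zeroes a simulated vector register file, and
CCMP makes two such calls per 16-byte data block.  It ignores the
fixed per-frame blocks (B0, the AAD blocks, the MIC) and does no real
encryption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not NetBSD code: a stand-in for the vector
 * register file and a counter of how often it gets zeroed. */
static unsigned char vec_regs[16 * 16];	/* e.g. 16 x 128-bit registers */
static unsigned long zero_count;

/* Zero the simulated vector registers, as the AES subsystem does
 * after every call when hardware AES is in use. */
static void zero_vector_regs(void)
{
	memset(vec_regs, 0, sizeof vec_regs);
	zero_count++;
}

/* Hypothetical single-block entry point: one AES block operation
 * (elided) followed by the zeroing described above. */
static void aes_block_call(void)
{
	/* ... AES-NI encryption of one 16-byte block would go here ... */
	zero_vector_regs();
}

/* Model of CCMP processing one frame a block at a time: for each
 * 16-byte data block, one call for the CBC-MAC and one for the CTR
 * pad.  Per-frame overhead blocks are deliberately ignored. */
static unsigned long ccmp_frame_zeroings(size_t payload_len)
{
	size_t nblocks = (payload_len + 15) / 16;	/* round up */

	zero_count = 0;
	for (size_t i = 0; i < nblocks; i++) {
		aes_block_call();	/* AES-CBC-MAC authenticator */
		aes_block_call();	/* AES-CTR encryption pad */
	}
	assert(zero_count == 2 * nblocks);
	return zero_count;
}
```

Under this model a 1500-byte payload is 94 blocks, so
ccmp_frame_zeroings(1500) returns 188: that many full register wipes
for a single frame, before counting the per-frame overhead blocks.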


There are a few ways we could work around this:

1. Push the AES-CCM computation into the AES subsystem, so we only
   zero the vector registers once per frame, or once per mbuf segment.
   This requires a bit of work but if I can find CCMP test vectors
   then it shouldn't be too hard.  At worst, it will have to be redone
   when the wifi branch is merged.

2. Push ieee80211_crypto_* into a worker thread, and use
   <https://mail-index.netbsd.org/tech-kern/2020/06/20/msg026524.html>
   to avoid zeroing the vector registers.  However, this may require
   some design changes in the 802.11 stack and it's not clear that
   they're the right changes or that this can be done quickly.

3. Invent a new nestable transaction mechanism to defer zeroing the
   vector registers.  However, there might also be a penalty for
   enabling or disabling the FPU, so it might not solve the whole
   problem, and it is not entirely clear what it should mean in an MI
   context.
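A rough sketch of what option 3 might look like, assuming a simple
nesting counter.  Here it is modeled as a single global, so
preemption, interrupts, CPU migration, and the question of what this
should mean in MI code are all left out.  The names vec_begin,
vec_end, and aes_block_call are hypothetical, not existing kernel
interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: a nestable "vector transaction" whose zeroing
 * is deferred until the outermost level exits.  Illustrative only. */
static unsigned char vec_regs[16 * 16];	/* simulated register file */
static unsigned depth;			/* current nesting depth */
static unsigned long zero_count;	/* how many times we zeroed */

static void vec_begin(void)
{
	/* A real implementation would presumably enable the FPU on the
	 * 0 -> 1 transition; that cost is not modeled here. */
	depth++;
}

static void vec_end(void)
{
	assert(depth > 0);
	if (--depth == 0) {
		/* Outermost exit: only now scrub the registers. */
		memset(vec_regs, 0, sizeof vec_regs);
		zero_count++;
	}
}

/* Each block call still brackets itself, so it is safe on its own. */
static void aes_block_call(void)
{
	vec_begin();
	/* ... one 16-byte AES block operation would go here ... */
	vec_end();
}

/* A frame-level caller may open an enclosing transaction so the
 * inner per-block calls do not zero; 'batched' toggles that. */
static unsigned long ccmp_frame_zeroings(size_t payload_len, int batched)
{
	size_t nblocks = (payload_len + 15) / 16;

	zero_count = 0;
	if (batched)
		vec_begin();
	for (size_t i = 0; i < nblocks; i++) {
		aes_block_call();	/* AES-CBC-MAC authenticator */
		aes_block_call();	/* AES-CTR encryption pad */
	}
	if (batched)
		vec_end();
	return zero_count;
}
```

With batching, ccmp_frame_zeroings(1500, 1) zeroes once per frame
instead of 188 times, which is the same end result option 1 aims for
by moving CCM into the AES subsystem.  Whether the FPU enable/disable
on the outermost transition is cheap enough is exactly the open
question noted above.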

Another approach, of course, is to simply use an open wifi network
instead -- generally hop-by-hop authenticated encryption like WPA is
not worth much compared to end-to-end authenticated encryption like
TLS, SSH, or WireGuard.

