tech-kern archive

nvmm & pku



Hi NetBSD devs,

I didn't know whom to approach as it's not clear if there's an active
maintainer of nvmm or not. I felt this list is probably a good starting
point given the topic at hand.

I'm an OpenBSD developer (dv@) who works primarily on our hypervisor
(vmm/vmd), and I have a few things to bring to NetBSD's attention around
nvmm with respect to userland memory protection keys (PKU).

We're in the middle of implementing execute-only (x-only) memory for the
kernel and userland. On amd64 we're using a kernel-managed approach
built on the PKU feature in recent Intel/AMD hardware. If you're not
aware, it's some additional page table bits that allow one to assign a
"key" value to a page table entry, and each "key" corresponds to a bit
pair in a 32-bit register (PKRU). That register dictates access-deny
(no data reads or writes) and write-deny permissions per key, adding
another layer of page-level permissions to amd64 that does *not* impact
the ability of the cpu to do an instruction fetch.
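
To make the bit-pair layout concrete, here is a minimal sketch in C. The
macro and function names are made up for illustration (they are not from
OpenBSD or NetBSD), and it only models user-mode data accesses: bit 2*k
of PKRU is the access-disable bit for key k, bit 2*k+1 is the
write-disable bit, and instruction fetches are never consulted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PKRU holds 16 two-bit fields, one per protection key (0..15). */
#define PKRU_AD(key) (UINT32_C(1) << (2 * (key)))     /* access-disable */
#define PKRU_WD(key) (UINT32_C(1) << (2 * (key) + 1)) /* write-disable */

/* Hypothetical helper: may a user-mode data read touch a page with this key? */
static inline bool pkru_allows_read(uint32_t pkru, unsigned key)
{
    assert(key < 16);
    return (pkru & PKRU_AD(key)) == 0;
}

/* Hypothetical helper: may a user-mode data write touch a page with this key? */
static inline bool pkru_allows_write(uint32_t pkru, unsigned key)
{
    assert(key < 16);
    /* AD blocks every data access; WD blocks writes only. */
    return (pkru & (PKRU_AD(key) | PKRU_WD(key))) == 0;
}
```

Under this model, tagging executable pages with some key k and setting
PKRU_AD(k) gives pages that can be fetched from but not read or written,
which is the x-only behavior described above.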

The challenge is Intel designed this register to be:
 * manipulated by userland (cpl=3) instructions (WRPKRU/RDPKRU)
 * per-cpu core

It has a bit of a VMX VMCS feel to it in a way.

As for hypervisors and why this pertains to nvmm, this does mean *any*
guest can overwrite the permissions bitfield in that register visible to
*all* nvmm guests. As guest workloads move between cores, the value a
guest sees not only changes, but guests can trample each other's
previous values and change page-level read/write permissions for others.

Even if NetBSD isn't using PKU, nvmm guests *can* if the CPU supports it
and they flip CR4.PKE to 1.

I think nvmm needs to implement one of the following things (like we're
doing in OpenBSD shortly) to keep this PKU stuff sane:

 1) disable PKU support for guests by masking the cpuid extended feature
    bit advertising support (I believe this is already done) and *also*
    intercepting and prohibiting flipping the CR4 PKE bit to 1. The
    interception of the CR4 bit flip is key. Without that bit set,
    wrpkru/rdpkru generate #UD, so you can prohibit usage.

    I believe the CR4 intercept is done only in nvmm's VMX functionality
    and luckily is preventing that bit being set, but I'm not 100%
    positive about this. I do know it is *not* done for SVM guests.

or

 2) save and restore the PKRU value at guest entry/exit, similar to FPU
    state.
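
As a rough illustration of option (1), the two checks could look like
the sketch below. The function names are hypothetical and this is not
nvmm's code; the bit positions are the architectural ones (PKU and OSPKE
in CPUID leaf 7, subleaf 0, ECX bits 3 and 4; PKE in CR4 bit 22). What
to do with a rejected write is a policy choice; injecting #GP, as for
any reserved CR4 bit, is assumed here.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID7_ECX_PKU   (UINT32_C(1) << 3)  /* CPUID.(EAX=7,ECX=0):ECX[3] */
#define CPUID7_ECX_OSPKE (UINT32_C(1) << 4)  /* reflects CR4.PKE */
#define CR4_PKE          (UINT64_C(1) << 22)

/* Hypothetical: hide PKU from the guest's view of CPUID leaf 7, subleaf 0. */
static inline uint32_t filter_cpuid7_ecx(uint32_t host_ecx)
{
    return host_ecx & ~(CPUID7_ECX_PKU | CPUID7_ECX_OSPKE);
}

/*
 * Hypothetical: decide whether an intercepted guest CR4 write may proceed.
 * Returning false means the hypervisor refuses it (assumed: inject #GP).
 */
static inline bool guest_cr4_write_allowed(uint64_t new_cr4)
{
    return (new_cr4 & CR4_PKE) == 0;
}
```

Masking CPUID alone only hides the feature; it is the CR4 check that
keeps wrpkru/rdpkru unusable inside the guest.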

I think (2) is the right approach and I have a diff for OpenBSD already
that should be landing shortly. For now, as we're testing the use of PKU
in OpenBSD, we have it disabled when we detect we're running under a
hypervisor.
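
For the sake of discussion, the shape of (2) can be modelled as below.
This is a toy model and not the OpenBSD diff: the structs and function
names are invented, and the per-core register is simulated with a plain
struct field so the logic can be exercised in userland. On real hardware
the reads and writes of `c->pkru` would be RDPKRU and WRPKRU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the per-core PKRU register. */
struct cpu {
    uint32_t pkru;
};

/* Hypothetical per-vCPU save area. */
struct vcpu {
    uint32_t guest_pkru;  /* guest's value while it is not running */
    uint32_t host_pkru;   /* host's value while the guest is running */
};

/* Before guest entry: stash the host value, load the guest's. */
static inline void vcpu_enter(struct cpu *c, struct vcpu *v)
{
    assert(c != NULL && v != NULL);
    v->host_pkru = c->pkru;
    c->pkru = v->guest_pkru;
}

/* After guest exit: capture what the guest left behind, restore the host's. */
static inline void vcpu_exit(struct cpu *c, struct vcpu *v)
{
    assert(c != NULL && v != NULL);
    v->guest_pkru = c->pkru;
    c->pkru = v->host_pkru;
}
```

With the value carried in the vCPU rather than left in the core, a guest
that migrates to another core finds its own PKRU there, and neither the
host nor the next guest scheduled on the old core sees it.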

We plan on enabling it prior to the release of OpenBSD 7.3, and this
will have ramifications under nvmm: OpenBSD guests could end up
trampling or breaking other guests. I'm happy to have a conversation
with any NetBSD devs looking to update nvmm to support these features.

-Dave Voutila (dv%openbsd.org@localhost)


