tech-kern archive


amd64: smap support

Here is a patch that implements SMAP (Supervisor Mode Access Prevention) on
amd64. SMAP is a CPU feature that prevents the kernel from accessing (reading
or writing) a userland page unless it explicitly asks to, and it's a great
exploit mitigation feature.

To function, it relies on two bits: CR4_SMAP in %cr4 and PSL_AC in %rflags.
With CR4_SMAP set, when AC is cleared, any kernel access to a userland page
will generate a page fault, which is treated as fatal. When AC is set, such an
access will succeed without fault. The logic is that when the kernel wants to
touch a userland page (in copyin for example), it needs to set AC, and then
clear it once it's done.
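
To make the rule above concrete, here is a toy model of the check in C. It is
a simplified sketch, not kernel code: struct cpu_model and smap_faults are
hypothetical names, and the model only covers explicit data accesses, ignoring
the other conditions that can make a page fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the CPU state that SMAP looks at; not a real register set. */
struct cpu_model {
	bool cr4_smap;     /* CR4_SMAP is set in %cr4 */
	bool psl_ac;       /* PSL_AC is set in %rflags */
	bool kernel_mode;  /* the access is made by the kernel */
};

/*
 * Returns true if a data access to the page would be refused by SMAP.
 * Only a kernel access to a userland page with AC cleared is refused.
 */
bool
smap_faults(const struct cpu_model *cpu, bool page_is_user)
{
	assert(cpu != NULL);

	if (!cpu->cr4_smap)
		return false;   /* SMAP not enabled at all */
	if (!cpu->kernel_mode)
		return false;   /* userland accesses are not affected */
	if (!page_is_user)
		return false;   /* kernel pages are not affected */
	return !cpu->psl_ac;    /* AC set opens the window */
}
```

With cr4_smap and kernel_mode set, smap_faults() returns true for a userland
page while psl_ac is false, and false once psl_ac is set.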

Userland can set/clear AC as it wishes, because in usermode AC is the
Alignment Check flag, which has nothing to do with smap. The main implication
of this design is that PSL_AC needs to be saved/restored when entering/leaving
the kernel; it becomes part of the kernel context.

The patch works as follows:
 * two functions are added, smap_enable and smap_disable. The former clears
   PSL_AC, the latter sets it.
 * these two functions use the clac and stac instructions, which do not
   exist on CPUs that don't support smap. Therefore, each function is built
   with a three-byte ret+int3+int3 placeholder, which is hot-patched with the
   real instruction at boot time if smap is supported.
 * smap_enable is called from INTRENTRY. Therefore, whenever we enter the
   kernel, we cannot access a userland page - which is the point here.
 * when leaving the kernel, %rflags is already restored entirely, whether
   the return goes through sysretq or iretq. So the AC bit of the previous
   context is put back.
 * in the copy* functions, smap_disable and smap_enable are called to open
   a window where the kernel can touch userland pages. Such a window looks
   like:
       callq   smap_disable
       /* touch userland page */
       callq   smap_enable
   If an interrupt or exception is received in this window, a new context is
   pushed by INTRENTRY, and it won't have PSL_AC set. If the access itself
   faults, the trap handler will return into the original context but will
   jump to a recovery function. In this recovery function, we are back with
   PSL_AC set, so we call smap_enable to clear it and return an error.
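
The copy window and its recovery path can be sketched as a small userland
model in C. This is only an illustration: psl_ac is a plain variable standing
in for the real flag, and model_interrupt, model_copyin and the user_ok
parameter are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the PSL_AC bit of %rflags. */
static bool psl_ac = false;

void smap_enable(void)  { psl_ac = false; }  /* models clac */
void smap_disable(void) { psl_ac = true; }   /* models stac */

/*
 * Models an interrupt or exception: the interrupted flags are saved,
 * INTRENTRY clears AC, and the return path restores the saved flags.
 * Returns the AC value seen while in the handler.
 */
bool
model_interrupt(void)
{
	bool saved_ac = psl_ac;     /* saved with the interrupted context */
	smap_enable();              /* called from INTRENTRY */
	bool ac_in_handler = psl_ac;
	psl_ac = saved_ac;          /* iretq puts the old %rflags back */
	return ac_in_handler;
}

/*
 * Models a copy* function. user_ok == false simulates a fault on the
 * userland page inside the window.
 */
int
model_copyin(const void *uaddr, void *kaddr, size_t len, bool user_ok)
{
	assert(!psl_ac);            /* the window is closed on entry */

	smap_disable();             /* open the window */
	if (!user_ok) {
		/* The fault is handled with AC cleared, then we land in recovery. */
		(void)model_interrupt();
		goto recover;
	}
	memcpy(kaddr, uaddr, len);  /* touch the userland page */
	smap_enable();              /* close the window */
	return 0;

recover:
	assert(psl_ac);             /* back in the original context, AC set */
	smap_enable();              /* clear it before returning the error */
	return EFAULT;
}
```

In both the success and the failure path, psl_ac is false again when
model_copyin() returns, which is the invariant the patch relies on.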

This way, PSL_AC is set exclusively in the copy windows. On CPUs that don't
support SMAP, smap_enable and smap_disable return directly. The performance
cost in this case is a call+ret, so one stack write, one stack read and
potentially an icache line load.
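
The hot-patching of the two functions can be modelled on a byte buffer. This
is a sketch under assumptions: struct stub, stub_init and stub_hotpatch are
hypothetical names, and the layout (a three-byte patch site followed by a
final ret) is a guess at how such a function could be arranged, not
necessarily what the patch does. The byte values are the standard encodings of
ret (c3), int3 (cc), clac (0f 01 ca) and stac (0f 01 cb).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PATCH_LEN 3

static const uint8_t placeholder[PATCH_LEN] = { 0xC3, 0xCC, 0xCC }; /* ret; int3; int3 */
static const uint8_t clac_bytes[PATCH_LEN]  = { 0x0F, 0x01, 0xCA }; /* clac */
static const uint8_t stac_bytes[PATCH_LEN]  = { 0x0F, 0x01, 0xCB }; /* stac */

/* Assumed layout: the patch site, then the ret that ends the function. */
struct stub {
	uint8_t text[PATCH_LEN + 1];
};

/* Builds the function as it exists before boot-time patching. */
void
stub_init(struct stub *s)
{
	memcpy(s->text, placeholder, PATCH_LEN);
	s->text[PATCH_LEN] = 0xC3;  /* final ret */
}

/*
 * Replaces the placeholder with the given three-byte instruction, but
 * only if the CPU supports smap. Returns true if the stub was patched.
 */
bool
stub_hotpatch(struct stub *s, const uint8_t *insn, bool cpu_has_smap)
{
	if (!cpu_has_smap)
		return false;   /* keep the leading ret: the call returns at once */

	assert(memcmp(s->text, placeholder, PATCH_LEN) == 0);
	memcpy(s->text, insn, PATCH_LEN);
	return true;
}
```

Without smap the stub still starts with ret, so calling it costs only the
call+ret described above. With smap the three placeholder bytes become clac or
stac and execution falls through to the final ret.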

A few notes:
 * there are a few places where smap_* is called twice. This could be
   optimized, but the patch is kept simple for now.
 * on Xen, smap (and smep) are not enabled, because they are used by the
   hypervisor to protect itself from dom kernels (us).
 * i386 requires a little more work, so I'm not adding smap there yet.

I've tested this patch mostly on QEMU, since my most recent CPU only has smep.
Feel free to test it too, and I'll commit it in a few days, perhaps with a
few modifications.


