NetBSD-Users archive


Re: What is the difference between nvmm-netbsd and kvm-linux?



I'm not subscribed to this list, but I saw this thread while searching for
an unrelated thing, so here are a few remarks:

> On 06.03.2020 12:35, Kamil Rytarowski wrote:
> > A hypervisor backend shall implement instruction decoder for MMIO/PIO
> > operations. NVMM performs this emulation in userspace, while others like
> > HAXM perform this inside the kernel.
> >
> > There are pros and cons but it is a distinct property of NVMM, but it is
> > definitely a more secure approach.
>
> Of course, I'm not an expert, but IMHO, this is more of an implementation
> feature than an architectural feature. Definitely it is safer, but
> probably more expensive, more context switching or maybe not. But this is
> quite interesting and it is not clear why other hypervisors do not have
> this.

The architectural implication of having MMIO/PIO emulated in userland is
that the kernel never accesses the guest memory. It creates it, it makes
it available to userland, but it never reads or writes to it. Every bug or
vulnerability is therefore moved from the kernel to userland, where the
impact is much more limited. Recent example:
	https://marc.info/?l=openbsd-tech&m=158176939604512&w=2
These are some vulns in OpenBSD (self-proclaimed "ultra secure OS"), the
biggest being a memory vulnerability that allows the guest to overwrite
host kernel memory. Structurally, this class of vulnerability cannot exist
in NVMM.
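
To make that concrete, here is a minimal sketch of what a guest memory
access looks like when it lives in the emulator process. This is not NVMM
or libnvmm code: struct guest_ram, guest_read() and guest_write() are
names made up for the example, and the guest RAM is assumed to be a single
contiguous mapping. The point is only that the range check, and any bug in
it, sits in an unprivileged process rather than in the host kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of guest RAM as mapped in the emulator process. */
struct guest_ram {
	uint8_t *hva;		/* host virtual address of the mapping */
	uint64_t gpa_base;	/* guest-physical address of its first byte */
	size_t size;		/* length of the mapping in bytes */
};

/* Return 0 if [gpa, gpa+len) lies inside the mapping, -1 otherwise. */
static int
guest_range_ok(const struct guest_ram *ram, uint64_t gpa, size_t len,
    uint64_t *off)
{
	if (gpa < ram->gpa_base)
		return -1;
	*off = gpa - ram->gpa_base;
	/* Written this way so that off + len cannot overflow. */
	if (*off > ram->size || len > ram->size - *off)
		return -1;
	return 0;
}

/* Copy len bytes out of guest memory, entirely in userland. */
static int
guest_read(const struct guest_ram *ram, uint64_t gpa, void *dst, size_t len)
{
	uint64_t off;

	if (guest_range_ok(ram, gpa, len, &off) == -1)
		return -1;
	memcpy(dst, ram->hva + off, len);
	return 0;
}

/* Copy len bytes into guest memory, entirely in userland. */
static int
guest_write(const struct guest_ram *ram, uint64_t gpa, const void *src,
    size_t len)
{
	uint64_t off;

	if (guest_range_ok(ram, gpa, len, &off) == -1)
		return -1;
	memcpy(ram->hva + off, src, len);
	return 0;
}
```

If the check in guest_range_ok() were wrong, the guest could corrupt the
emulator's own address space, but not host kernel memory.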

In terms of performance, there are several points. First of all, when an
MMIO/PIO is done by the guest, it always ends up in the emulator (QEMU, or
VirtualBox, or whatever), so there is necessarily one kernel<->userland
round trip. Since we have to return to userland anyway,
doing the emulation in userland doesn't introduce an extra transition, so
it doesn't reduce performance. Actually, on the contrary, doing that in
userland is more performant than doing it in the kernel, because in the
kernel it must be done with preemption disabled (see it as holding a host
CPU lock), whereas in userland there is no CPU locking involved.
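
The shape of that loop can be sketched as follows. This is a toy model,
not the NVMM API: fake_vcpu_run() stands in for the "run the VCPU" syscall
and simply replays a scripted list of exits, and struct vm_exit, struct
toy_device and DEV_BASE are invented for the example. What it shows is
that each MMIO exit costs one return to userland no matter where the
emulation is done, and that the emulation then runs as ordinary userland
code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exit reasons; not the real NVMM definitions. */
enum exit_reason { EXIT_MMIO, EXIT_HALT };

struct vm_exit {
	enum exit_reason reason;
	uint64_t gpa;		/* guest-physical address of the access */
	uint8_t size;		/* access width in bytes */
	int is_write;
	uint64_t value;		/* data written, or slot for the data read */
};

/* A toy device: four 32-bit registers starting at DEV_BASE. */
#define DEV_BASE	0xfee00000u
#define DEV_NREGS	4
struct toy_device { uint32_t regs[DEV_NREGS]; };

/* Stand-in for the kernel side: replays scripted exits, counts transitions. */
struct fake_vcpu {
	const struct vm_exit *script;
	size_t len, pos;
	unsigned transitions;	/* kernel->userland returns so far */
};

static void
fake_vcpu_run(struct fake_vcpu *v, struct vm_exit *out)
{
	v->transitions++;
	if (v->pos >= v->len) {
		out->reason = EXIT_HALT;
		return;
	}
	*out = v->script[v->pos++];
}

/* Userland emulation of one access; returns -1 if it is not ours. */
static int
emulate_mmio(struct toy_device *d, struct vm_exit *e)
{
	uint64_t off;

	if (e->size != 4 || e->gpa < DEV_BASE)
		return -1;
	off = e->gpa - DEV_BASE;
	if (off % 4 != 0 || off / 4 >= DEV_NREGS)
		return -1;
	if (e->is_write)
		d->regs[off / 4] = (uint32_t)e->value;
	else
		e->value = d->regs[off / 4]; /* would go back to a guest reg */
	return 0;
}

/* The emulator loop: run, look at the exit, emulate, run again. */
static unsigned
run_guest(struct fake_vcpu *v, struct toy_device *d)
{
	struct vm_exit e;
	unsigned handled = 0;

	for (;;) {
		fake_vcpu_run(v, &e);
		if (e.reason == EXIT_HALT)
			break;
		if (e.reason == EXIT_MMIO && emulate_mmio(d, &e) == 0)
			handled++;
	}
	return handled;
}
```

In this model the transition counter depends only on the number of exits,
not on whether emulate_mmio() runs before or after the return to userland.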

Having said that, indeed KVM will architecturally perform fewer syscalls,
because it emulates certain devices in kernel mode -- which can increase
performance because it avoids a kernel<->userland cycle, but can decrease
security (see bug class above).

What we do in NVMM to try to gain some performance in that regard, is (1)
we use a shared communication page in order to reduce data movements during
syscalls, and (2) we batch together certain syscalls and "commit" them only
in "burst mode", all at once in one userland->kernel transition. So even if
NVMM architecturally needs to perform more syscalls than KVM, we actually
manage, in the implementation, to reduce the number of syscalls, along with
reducing their individual cost.
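
The batching idea can be sketched like this. Again this is a simplified
model and not the actual NVMM comm page layout: struct comm_page, its
dirty bitmask and the fake_kernel counter are assumptions made for the
example. Userland records state changes in the shared page without
entering the kernel, and the next "run" call commits all of them in a
single transition.

```c
#include <stdint.h>

enum { REG_RAX, REG_RBX, REG_RIP, REG_COUNT };

/* Hypothetical page shared between the emulator and the kernel. */
struct comm_page {
	uint64_t regs[REG_COUNT];
	uint32_t dirty;		/* bit i set: regs[i] is pending commit */
};

/* Stand-in for the kernel side, counting userland->kernel transitions. */
struct fake_kernel {
	uint64_t vcpu_regs[REG_COUNT];
	unsigned syscalls;
};

/* Naive approach: one syscall per register update. */
static void
set_reg_syscall(struct fake_kernel *k, int reg, uint64_t val)
{
	k->syscalls++;
	k->vcpu_regs[reg] = val;
}

/* Batched approach: only touch the shared page, no transition. */
static void
set_reg_batched(struct comm_page *p, int reg, uint64_t val)
{
	p->regs[reg] = val;
	p->dirty |= 1u << reg;
}

/* The "run" syscall: commit everything pending, then enter the guest. */
static void
vcpu_run_syscall(struct fake_kernel *k, struct comm_page *p)
{
	int i;

	k->syscalls++;
	for (i = 0; i < REG_COUNT; i++) {
		if (p->dirty & (1u << i))
			k->vcpu_regs[i] = p->regs[i];
	}
	p->dirty = 0;
	/* A real implementation would now resume guest execution. */
}
```

Setting three registers and then running costs four transitions with
set_reg_syscall() and one with set_reg_batched(), for the same final VCPU
state.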

In the end, NVMM is able to deliver good performance (much better than I had
initially hoped), while also maintaining good security properties. If we
compare to other hypervisor implementations of the same size, NVMM is
technically very advanced. However, it doesn't compare with KVM or Hyper-V,
because these are very big (cloud) solutions, backed by big corporations
which throw a lot of money into them.

Other than the security/performance aspects, NVMM has a few extras,
like the virtualization API, which has no open source equivalent so far
(HVF on macOS is proprietary, WHPX on Windows is proprietary).

One general remark about design/performance. I recall seeing Linux slides
about having KVM run in its own address space within the kernel (a kind of
"userland in the kernel"); the "KVM"<->kernel transitions will have a cost.
I also know that Microsoft is moving large parts of the Hyper-V emulation
into dedicated userland processes, where a similar cost also exists.

It is possible that we will see bigger hypervisors switch to "everything in
another domain" architectures like NVMM does currently, in order to
increase security.

Maxime

