
Re: attaching cpu via lapic



On Mon, Aug 21, 2017 at 04:39:09PM +0530, Cherry G.Mathew wrote:
> Martin Husemann <martin%duskware.de@localhost> writes:
> 
> > On Mon, Aug 21, 2017 at 12:50:32AM +0000, Cherry G. Mathew wrote:
> >> In this case, the CPU is actually exported to the domU as an x86 CPU
> >> as well as a VCPU. This has nothing to do with the baremetal CPUs,
> >> which are invisible to HVM domain guests. So you can have up to the
> >> currently supported maximum of these CPUs regardless of the baremetal
> >> number.
> >
> >
> > Sorry, my Xen knowledge is minimal and I fail to parse the above - could
> > you elaborate?
> >
> 
> There are two modes that Xen operates under, broadly speaking: PV and
> HVM. There are also "sub-modes" which are a 'mix' of these - PVHVM and
> PVH. These are well documented on the Xen Wiki. [1]
> 
> PV aka "Paravirtualised" - is where the hypervisor runs at the highest
> privilege level, and the kernel runs either in an intermediate PL, or at
> the same PL as user programs (yes, security bugs have been found because
> of this). The advantage here is that the CPU hardware does not need to
> support any special virtualisation instructions. However, the OS kernel
> needs to pretty much explicitly call the hypervisor using a mechanism
> analogous to how userland calls the OS itself. Since both OS and
> userland calls go directly to Xen, it mediates this by recognising
> whether the OS kernel or the user program under an OS made the call, and
> behaves accordingly. All this is pretty expensive, which is why, despite
> XenMP, our current PV-based Xen system is comparatively slow.
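> 
> A minimal sketch of what such a call looks like from a PV kernel,
> assuming the XENVER_version command from Xen's public headers and a
> HYPERVISOR_xen_version() hypercall wrapper in the guest kernel (header
> paths and wrapper names vary per OS):
> 
>     #include <xen/hypervisor.h>   /* hypercall wrappers; name varies */
>     #include <xen/version.h>      /* XENVER_version */
> 
>     /*
>      * Analogous to a userland syscall: the kernel traps into Xen,
>      * Xen services the request and returns to the caller.
>      */
>     static void
>     print_xen_version(void)
>     {
>         /* XENVER_version returns (major << 16) | minor. */
>         int ver = HYPERVISOR_xen_version(XENVER_version, NULL);
> 
>         printf("running on Xen %d.%d\n", ver >> 16, ver & 0xffff);
>     }
> 
> The same trap-and-mediate path is taken for every privileged operation
> a PV kernel performs.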
> 
> One of the things that is virtualised in this case is the abstraction of
> the CPU itself - called a VCPU. VCPUs are managed by Xen, and you can do
> things like start them, stop them, or put them to sleep. These virtual
> CPUs are essentially CPU state contexts that are scheduled onto the
> actual CPUs, much the same way that thread contexts are scheduled onto
> CPUs by an OS. From the OS point of view, it sees only VCPUs as the
> actual CPUs, since these are the only things that can be used to
> schedule threads. So cpu_info_list is basically a list of these VCPUs,
> and *NOT* the underlying baremetal CPUs - which only the hypervisor can
> manage.
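> 
> The "start them, stop them" part maps onto the vcpu_op hypercall. A
> rough sketch, assuming the VCPUOP_* commands from Xen's public vcpu.h
> and a HYPERVISOR_vcpu_op() wrapper in the guest kernel:
> 
>     #include <xen/hypervisor.h>   /* hypercall wrappers; name varies */
>     #include <xen/vcpu.h>         /* VCPUOP_up, VCPUOP_down */
> 
>     /* Bring a previously initialised VCPU online, PV-style: no LAPIC,
>      * no INIT/SIPI, just a hypercall. */
>     static int
>     vcpu_start(int vcpuid)
>     {
>         return HYPERVISOR_vcpu_op(VCPUOP_up, vcpuid, NULL);
>     }
> 
>     /* Take it offline again; Xen simply stops scheduling it. */
>     static int
>     vcpu_stop(int vcpuid)
>     {
>         return HYPERVISOR_vcpu_op(VCPUOP_down, vcpuid, NULL);
>     }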
> 
> EXCEPT:
> 
> dom0, which is the controlling domain that collaborates with Xen to
> provide things like device drivers and filesystem support, is allowed
> access to the MADT and other ACPI tables (or a subset of them), which
> are firmware tables that enumerate baremetal CPUs. In the interest of
> providing driver support for things like frequency scaling or
> temperature control (I'm not 100% sure if Xen allows access to this, but
> I'm trying to illustrate the situation), our dom0 does access these
> tables and bring up the respective drivers, seen as "xxx at cpu" in the
> xen/conf/files.xen config file.
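> 
> Illustrative config(9) attachment lines of the kind meant here (a
> hypothetical excerpt; the real files.xen entries may differ):
> 
>     # CPU feature drivers hanging off the cpu node
>     coretemp*  at cpu?    # on-die temperature sensor
>     est*       at cpu?    # Enhanced SpeedStep frequency scaling
>     acpicpu*   at cpu?    # ACPI C-/P-/T-state handling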
> 
> Remember that the cpus enumerated by these tables are *NOT* schedulable
> entities - therefore they don't get on the cpu_info_list, and they are
> purely used as a node on the config(9) chain. The actual schedulable
> cpus are still the VCPUs, mentioned above, which attach via a completely
> unrelated boot path. John's point was that these VCPUs, since they are
> scheduled contexts, can be, and often are, more numerous than the
> underlying baremetal CPUs (not on dom0 though - although it's
> technically possible). So you could bring up a domU running on a
> uniprocessor board with 8 vCPUs, if you like. dom0's VCPUs have a 1:1
> correspondence with the underlying physical CPUs.
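> 
> For example, an xl domain config along these lines (a sketch; the name
> and kernel path are made up) would give a domU 8 vCPUs even on a
> uniprocessor box:
> 
>     name   = "netbsd-domu"          # hypothetical guest name
>     kernel = "/path/to/netbsd-DOMU" # hypothetical PV kernel path
>     memory = 512
>     vcpus  = 8    # more vCPUs than physical CPUs is fine for a domU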
> 
> Enter HVM,
> 
> Here, *everything* is virtualised, including the CPUs. The guest OS
> cannot differentiate between native hardware (except via explicit
> interfaces like virtio) and the virtualised container it is running
> under. If we ran NetBSD under HVM, the "physical"/"native" CPUs that we
> see are basically VCPUs from Xen's PoV. NetBSD uses the standard x86
> native code (mpacpi/mpbios) to probe for, detect, and schedule these
> CPUs.
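> 
> Concretely, "probing for these CPUs" means walking the ACPI MADT that
> Xen synthesises for the guest. A self-contained sketch of that walk
> (structures abridged; field layout per the ACPI spec rather than any
> particular kernel's headers):
> 
>     #include <stdint.h>
> 
>     struct madt_entry {            /* common subtable header */
>         uint8_t type;              /* 0 = Processor Local APIC */
>         uint8_t length;
>     };
> 
>     struct madt_lapic {            /* type 0 subtable */
>         struct madt_entry hdr;
>         uint8_t  acpi_proc_id;
>         uint8_t  apic_id;
>         uint32_t flags;            /* bit 0: processor enabled */
>     };
> 
>     /* Count usable CPUs in the MADT subtable region [p, end). */
>     static int
>     count_madt_cpus(const uint8_t *p, const uint8_t *end)
>     {
>         int ncpu = 0;
> 
>         while (p + sizeof(struct madt_entry) <= end) {
>             const struct madt_entry *e = (const void *)p;
> 
>             if (e->length == 0)    /* malformed table; bail out */
>                 break;
>             if (e->type == 0) {    /* Processor Local APIC */
>                 const struct madt_lapic *l = (const void *)p;
>                 if (l->flags & 1)  /* enabled? */
>                     ncpu++;
>             }
>             p += e->length;
>         }
>         return ncpu;
>     }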
> 
> EXCEPT
> 
> PVHVM - "Paravirtualised HVM" is a mode where *some* Xen functions are
> exported to the user domain. This makes the previously mentioned
> virtualised HVM CPUs that NetBSD thinks are "baremetal" CPUs also
> accessible by a hypervisor-mediated API. In this case, both the fully
> virtualised acpi/mpbios API and the 'xvcpu' API access schedulable
> VCPUs. This means that 'xvcpu's are merely aliases to the ACPI-probed
> CPUs, thus 1:1. These CPUs have nothing to do with the underlying
> baremetal CPUs.
> 
> 
> EXCEPT 
> 
> PVH mode is where the PVHVM CPUs are also actually the underlying
> baremetal CPUs.
> 
> Q.E.D

vCPUs are always virtualized, whether the guest is PV, HVM, PVHVM or
PVH. Xen is the only entity that controls the physical CPUs (pCPUs);
everything else is an illusion created and controlled by Xen.

The main difference between the modes is how the vCPUs are presented
to and used by the guest (the "AP Bringup" row is sketched in code
after the table):

                        +-----------------------------------------+
                        | HVM/NATIVE |      PV        |    PVH    |
    +-------------------------------------------------------------+
    | Enumeration       | ACPI MADT  | Hypercalls     | ACPI MADT |
    +-------------------------------------------------------------+
    | AP Bringup        | LAPIC      | Hypercalls     | LAPIC     |
    +-------------------------------------------------------------+
    | Hotplug           | ACPI GPE   | XenStore       | ?         |
    +-------------------------------------------------------------+
    | Interrupt routing | LAPIC/IDT  | Event channels | LAPIC/IDT |
    +-------------------------------------------------------------+
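
To make the "AP Bringup" row concrete: on the HVM/native (and PVH)
side it is the classic LAPIC INIT/SIPI sequence, while under PV it is
a single vcpu_op hypercall, as sketched in the quoted mail above. A
rough sketch of the LAPIC side, with hypothetical lapic_write() and
delay_us() helpers (a real implementation also sends an INIT deassert
and a second SIPI, per the Intel SDM):

    #include <stdint.h>

    /* xAPIC MMIO registers, as 32-bit word offsets from 0xfee00000. */
    static volatile uint32_t *lapic = (volatile uint32_t *)0xfee00000;
    #define LAPIC_ICR_LO  (0x300 / 4)  /* interrupt command, low half */
    #define LAPIC_ICR_HI  (0x310 / 4)  /* destination field */

    static void
    lapic_write(unsigned reg, uint32_t val)
    {
        lapic[reg] = val;
    }

    static void
    delay_us(unsigned us)
    {
        /* a platform timer would go here; stubbed for the sketch */
    }

    /* Kick an application processor: INIT, then a startup IPI whose
     * vector is the page number of the real-mode trampoline. */
    static void
    lapic_start_ap(uint8_t apic_id, uint32_t trampoline_pa)
    {
        lapic_write(LAPIC_ICR_HI, (uint32_t)apic_id << 24);
        lapic_write(LAPIC_ICR_LO, 0x00004500);          /* INIT, assert */
        delay_us(10000);

        lapic_write(LAPIC_ICR_HI, (uint32_t)apic_id << 24);
        lapic_write(LAPIC_ICR_LO, 0x00004600 | (trampoline_pa >> 12));
        delay_us(200);                                  /* SIPI */
    }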

Dom0 has always been kind of special, because it is allowed to access
the unmodified ACPI tables present on the hardware (i.e., it's allowed
to parse the full MADT and the processor objects in the DSDT).

This is going to change with PVH Dom0: the ACPI tables provided to
Dom0 will match the environment where Dom0 is running as closely as
possible. This is already in Xen upstream, and a PVH Dom0 gets a
crafted MADT that reflects the number of CPUs available to Dom0 [0],
instead of seeing all the CPUs in the underlying hardware.
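
In code terms, "crafted" means Xen emits one Local APIC subtable per
Dom0 vCPU instead of copying the host MADT. A loose sketch of the
shape of it, not the actual dom0_build.c code (layout as in the MADT
sketch in the quoted mail, flattened here for brevity):

    #include <stdint.h>

    struct madt_lapic {          /* ACPI Processor Local APIC subtable */
        uint8_t  type, length;
        uint8_t  acpi_proc_id, apic_id;
        uint32_t flags;
    };

    /* Emit one enabled entry per Dom0 vCPU; physical CPUs beyond
     * nr_vcpus simply never appear in the guest-visible table. */
    static void
    craft_madt_lapics(struct madt_lapic *out, unsigned int nr_vcpus)
    {
        for (unsigned int i = 0; i < nr_vcpus; i++) {
            out[i].type         = 0;           /* Local APIC */
            out[i].length       = sizeof(out[i]);
            out[i].acpi_proc_id = i;
            out[i].apic_id      = i * 2;       /* HVM vCPU -> APIC ID */
            out[i].flags        = 1;           /* enabled */
        }
    }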

Roger.

[0] http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/hvm/dom0_build.c;h=020c355faf509d5681792744276abb6f0a5f1a64;hb=HEAD#l662


