pkgsrc-Changes archive


CVS commit: pkgsrc/sysutils/xenkernel411



Module Name:    pkgsrc
Committed By:   bouyer
Date:           Wed Nov 28 14:00:49 UTC 2018

Modified Files:
        pkgsrc/sysutils/xenkernel411: Makefile distinfo
Added Files:
        pkgsrc/sysutils/xenkernel411/patches: patch-XSA269 patch-XSA275-1
            patch-XSA275-2 patch-XSA276-1 patch-XSA276-2 patch-XSA277
            patch-XSA278 patch-XSA279 patch-XSA280-1 patch-XSA280-2
            patch-XSA282-1 patch-XSA282-2

Log Message:
Apply available security patches relevant for Xen 4.11, up to XSA282.
Bump PKGREVISION


To generate a diff of this commit:
cvs rdiff -u -r1.2 -r1.3 pkgsrc/sysutils/xenkernel411/Makefile
cvs rdiff -u -r1.1 -r1.2 pkgsrc/sysutils/xenkernel411/distinfo
cvs rdiff -u -r0 -r1.1 pkgsrc/sysutils/xenkernel411/patches/patch-XSA269 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA275-1 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA275-2 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA276-1 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA276-2 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA277 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA278 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA279 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA280-1 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA280-2 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA282-1 \
    pkgsrc/sysutils/xenkernel411/patches/patch-XSA282-2

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: pkgsrc/sysutils/xenkernel411/Makefile
diff -u pkgsrc/sysutils/xenkernel411/Makefile:1.2 pkgsrc/sysutils/xenkernel411/Makefile:1.3
--- pkgsrc/sysutils/xenkernel411/Makefile:1.2   Tue Jul 24 17:29:09 2018
+++ pkgsrc/sysutils/xenkernel411/Makefile       Wed Nov 28 14:00:49 2018
@@ -1,7 +1,7 @@
-# $NetBSD: Makefile,v 1.2 2018/07/24 17:29:09 maya Exp $
+# $NetBSD: Makefile,v 1.3 2018/11/28 14:00:49 bouyer Exp $
 
 VERSION=       4.11.0
-#PKGREVISION=  4
+PKGREVISION=   1
 DISTNAME=      xen-${VERSION}
 PKGNAME=       xenkernel411-${VERSION}
 CATEGORIES=    sysutils

Index: pkgsrc/sysutils/xenkernel411/distinfo
diff -u pkgsrc/sysutils/xenkernel411/distinfo:1.1 pkgsrc/sysutils/xenkernel411/distinfo:1.2
--- pkgsrc/sysutils/xenkernel411/distinfo:1.1   Tue Jul 24 13:40:11 2018
+++ pkgsrc/sysutils/xenkernel411/distinfo       Wed Nov 28 14:00:49 2018
@@ -1,10 +1,22 @@
-$NetBSD: distinfo,v 1.1 2018/07/24 13:40:11 bouyer Exp $
+$NetBSD: distinfo,v 1.2 2018/11/28 14:00:49 bouyer Exp $
 
 SHA1 (xen411/xen-4.11.0.tar.gz) = 32b0657002bcd1992dcb6b7437dd777463f3b59a
 RMD160 (xen411/xen-4.11.0.tar.gz) = a2195b67ffd4bc1e6fc36bfc580ee9efe1ae708c
 SHA512 (xen411/xen-4.11.0.tar.gz) = 33d431c194f10d5ee767558404a1f80a66b3df019012b0bbd587fcbc9524e1bba7ea04269020ce891fe9d211d2f81c63bf78abedcdbe1595aee26251c803a50a
 Size (xen411/xen-4.11.0.tar.gz) = 25131533 bytes
 SHA1 (patch-Config.mk) = 9372a09efd05c9fbdbc06f8121e411fcb7c7ba65
+SHA1 (patch-XSA269) = baf135f05bbd82fea426a807877ddb1796545c5c
+SHA1 (patch-XSA275-1) = 7097ee5e1c073a0029494ed9ccf8c786d6c4034f
+SHA1 (patch-XSA275-2) = e286286a751c878f5138e3793835c61a11cf4742
+SHA1 (patch-XSA276-1) = 0b1e4b7620bb64f3a82671a172810c12bad91154
+SHA1 (patch-XSA276-2) = ef0e94925f1a281471b066719674bba5ecca8a61
+SHA1 (patch-XSA277) = 845afbe1f1cfdad5da44029f2f3073e1d45ef259
+SHA1 (patch-XSA278) = f344db46772536bb914ed32f2529424342cb81b0
+SHA1 (patch-XSA279) = 6bc022aba315431d916b2d9f6ccd92942e74818a
+SHA1 (patch-XSA280-1) = 401627a7cc80d77c4ab4fd9654a89731467b0bdf
+SHA1 (patch-XSA280-2) = 8317f7d8664fe32a938470a225ebb33a78edfdc6
+SHA1 (patch-XSA282-1) = e790657be970c71ee7c301b7f16bd4e4d282586a
+SHA1 (patch-XSA282-2) = 8919314eadca7e5a16104db1c2101dc702a67f91
 SHA1 (patch-xen_Makefile) = 465388d80de414ca3bb84faefa0f52d817e423a6
 SHA1 (patch-xen_Rules.mk) = c743dc63f51fc280d529a7d9e08650292c171dac
 SHA1 (patch-xen_arch_x86_Rules.mk) = 0bedfc53a128a87b6a249ae04fbdf6a053bfb70b

Added files:

Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA269
diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA269:1.1
--- /dev/null   Wed Nov 28 14:00:49 2018
+++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA269   Wed Nov 28 14:00:49 2018
@@ -0,0 +1,114 @@
+$NetBSD: patch-XSA269,v 1.1 2018/11/28 14:00:49 bouyer Exp $
+
+From: Andrew Cooper <andrew.cooper3%citrix.com@localhost>
+Subject: x86/vtx: Fix the checking for unknown/invalid MSR_DEBUGCTL bits
+
+The VPMU_MODE_OFF early-exit in vpmu_do_wrmsr() introduced by c/s
+11fe998e56 bypasses all reserved bit checking in the general case.  As a
+result, a guest can enable BTS when it shouldn't be permitted to, and
+lock up the entire host.
+
+With vPMU active (not a security supported configuration, but useful for
+debugging), the reserved bit checking is broken, caused by the original
+BTS changeset 1a8aa75ed.
+
+From a correctness standpoint, it is not possible to have two different
+pieces of code responsible for different parts of value checking, if
+there isn't an accumulation of bits which have been checked.  A
+practical upshot of this is that a guest can set any value it
+wishes (usually resulting in a vmentry failure for bad guest state).
+
+Therefore, fix this by implementing all the reserved bit checking in the
+main MSR_DEBUGCTL block, and removing all handling of DEBUGCTL from the
+vPMU MSR logic.
+
+This is XSA-269
+
+Signed-off-by: Andrew Cooper <andrew.cooper3%citrix.com@localhost>
+Reviewed-by: Jan Beulich <jbeulich%suse.com@localhost>
+
+diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
+index 207e2e7..d4444f0 100644
+--- xen/arch/x86/cpu/vpmu_intel.c.orig
++++ xen/arch/x86/cpu/vpmu_intel.c
+@@ -535,27 +535,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
+     uint64_t *enabled_cntrs;
+ 
+     if ( !core2_vpmu_msr_common_check(msr, &type, &index) )
+-    {
+-        /* Special handling for BTS */
+-        if ( msr == MSR_IA32_DEBUGCTLMSR )
+-        {
+-            supported |= IA32_DEBUGCTLMSR_TR | IA32_DEBUGCTLMSR_BTS |
+-                         IA32_DEBUGCTLMSR_BTINT;
+-
+-            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
+-                supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS |
+-                             IA32_DEBUGCTLMSR_BTS_OFF_USR;
+-            if ( !(msr_content & ~supported) &&
+-                 vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
+-                return 0;
+-            if ( (msr_content & supported) &&
+-                 !vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
+-                printk(XENLOG_G_WARNING
+-                       "%pv: Debug Store unsupported on this CPU\n",
+-                       current);
+-        }
+         return -EINVAL;
+-    }
+ 
+     ASSERT(!supported);
+ 
+diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
+index 9707514..ae028dd 100644
+--- xen/arch/x86/hvm/vmx/vmx.c.orig
++++ xen/arch/x86/hvm/vmx/vmx.c
+@@ -3032,11 +3032,14 @@ void vmx_vlapic_msr_changed(struct vcpu *v)
+ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
+ {
+     struct vcpu *v = current;
++    const struct cpuid_policy *cp = v->domain->arch.cpuid;
+ 
+     HVM_DBG_LOG(DBG_LEVEL_MSR, "ecx=%#x, msr_value=%#"PRIx64, msr, msr_content);
+ 
+     switch ( msr )
+     {
++        uint64_t rsvd;
++
+     case MSR_IA32_SYSENTER_CS:
+         __vmwrite(GUEST_SYSENTER_CS, msr_content);
+         break;
+@@ -3091,16 +3094,26 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
+ 
+     case MSR_IA32_DEBUGCTLMSR: {
+         int i, rc = 0;
+-        uint64_t supported = IA32_DEBUGCTLMSR_LBR | IA32_DEBUGCTLMSR_BTF;
+ 
+-        if ( boot_cpu_has(X86_FEATURE_RTM) )
+-            supported |= IA32_DEBUGCTLMSR_RTM;
+-        if ( msr_content & ~supported )
++        rsvd = ~(IA32_DEBUGCTLMSR_LBR | IA32_DEBUGCTLMSR_BTF);
++
++        /* TODO: Wire vPMU settings properly through the CPUID policy */
++        if ( vpmu_is_set(vcpu_vpmu(v), VPMU_CPU_HAS_BTS) )
+         {
+-            /* Perhaps some other bits are supported in vpmu. */
+-            if ( vpmu_do_wrmsr(msr, msr_content, supported) )
+-                break;
++            rsvd &= ~(IA32_DEBUGCTLMSR_TR | IA32_DEBUGCTLMSR_BTS |
++                      IA32_DEBUGCTLMSR_BTINT);
++
++            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
++                rsvd &= ~(IA32_DEBUGCTLMSR_BTS_OFF_OS |
++                          IA32_DEBUGCTLMSR_BTS_OFF_USR);
+         }
++
++        if ( cp->feat.rtm )
++            rsvd &= ~IA32_DEBUGCTLMSR_RTM;
++
++        if ( msr_content & rsvd )
++            goto gp_fault;
++
+         if ( msr_content & IA32_DEBUGCTLMSR_LBR )
+         {
+             const struct lbr_info *lbr = last_branch_msr_get();
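
As an aside to the XSA-269 description above: the fix boils down to accumulating one reserved-bit mask in a single place and rejecting any MSR_DEBUGCTL write that touches it. A minimal standalone C sketch of that pattern (bit names and feature flags here are illustrative stand-ins, not the patch's code):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative DEBUGCTL bit names (positions as documented for Intel CPUs). */
    #define DBG_LBR   (1ULL << 0)
    #define DBG_BTF   (1ULL << 1)
    #define DBG_TR    (1ULL << 6)
    #define DBG_BTS   (1ULL << 7)
    #define DBG_BTINT (1ULL << 8)
    #define DBG_RTM   (1ULL << 15)

    /* One accumulated reserved-bit mask; any write touching it is rejected. */
    static bool debugctl_write_ok(uint64_t val, bool has_bts, bool has_rtm)
    {
        uint64_t rsvd = ~(DBG_LBR | DBG_BTF);

        if (has_bts)
            rsvd &= ~(DBG_TR | DBG_BTS | DBG_BTINT);
        if (has_rtm)
            rsvd &= ~DBG_RTM;

        return !(val & rsvd);   /* reserved bit set => would raise #GP */
    }

    int main(void)
    {
        printf("%d\n", debugctl_write_ok(DBG_BTS, false, false)); /* 0: rejected */
        printf("%d\n", debugctl_write_ok(DBG_LBR, false, false)); /* 1: allowed */
        return 0;
    }

Having exactly one mask checked in one place is what closes the gap the description mentions, where two separate pieces of code each assumed the other had done the checking.
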
Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA275-1
diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA275-1:1.1
--- /dev/null   Wed Nov 28 14:00:49 2018
+++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA275-1 Wed Nov 28 14:00:49 2018
@@ -0,0 +1,106 @@
+$NetBSD: patch-XSA275-1,v 1.1 2018/11/28 14:00:49 bouyer Exp $
+
+From: Roger Pau Monné <roger.pau%citrix.com@localhost>
+Subject: amd/iommu: fix flush checks
+
+Flush checking for AMD IOMMU didn't check whether the previous entry
+was present, or whether the flags (writable/readable) changed in order
+to decide whether a flush should be executed.
+
+Fix this by taking the writable/readable/next-level fields into account,
+together with the present bit.
+
+Along these lines the flushing in amd_iommu_map_page() must not be
+omitted for PV domains. The comment there was simply wrong: Mappings may
+very well change, both their addresses and their permissions. Ultimately
+this should honor iommu_dont_flush_iotlb, but to achieve this
+amd_iommu_ops first needs to gain an .iotlb_flush hook.
+
+Also make clear_iommu_pte_present() static, to demonstrate there's no
+caller omitting the (subsequent) flush.
+
+This is part of XSA-275.
+
+Reported-by: Paul Durrant <paul.durrant%citrix.com@localhost>
+Signed-off-by: Roger Pau Monné <roger.pau%citrix.com@localhost>
+Signed-off-by: Jan Beulich <jbeulich%suse.com@localhost>
+
+--- xen/drivers/passthrough/amd/iommu_map.c.orig
++++ xen/drivers/passthrough/amd/iommu_map.c
+@@ -35,7 +35,7 @@ static unsigned int pfn_to_pde_idx(unsig
+     return idx;
+ }
+ 
+-void clear_iommu_pte_present(unsigned long l1_mfn, unsigned long gfn)
++static void clear_iommu_pte_present(unsigned long l1_mfn, unsigned long gfn)
+ {
+     u64 *table, *pte;
+ 
+@@ -49,23 +49,42 @@ static bool_t set_iommu_pde_present(u32
+                                     unsigned int next_level,
+                                     bool_t iw, bool_t ir)
+ {
+-    u64 addr_lo, addr_hi, maddr_old, maddr_next;
++    uint64_t addr_lo, addr_hi, maddr_next;
+     u32 entry;
+-    bool_t need_flush = 0;
++    bool need_flush = false, old_present;
+ 
+     maddr_next = (u64)next_mfn << PAGE_SHIFT;
+ 
+-    addr_hi = get_field_from_reg_u32(pde[1],
+-                                     IOMMU_PTE_ADDR_HIGH_MASK,
+-                                     IOMMU_PTE_ADDR_HIGH_SHIFT);
+-    addr_lo = get_field_from_reg_u32(pde[0],
+-                                     IOMMU_PTE_ADDR_LOW_MASK,
+-                                     IOMMU_PTE_ADDR_LOW_SHIFT);
+-
+-    maddr_old = (addr_hi << 32) | (addr_lo << PAGE_SHIFT);
+-
+-    if ( maddr_old != maddr_next )
+-        need_flush = 1;
++    old_present = get_field_from_reg_u32(pde[0], IOMMU_PTE_PRESENT_MASK,
++                                         IOMMU_PTE_PRESENT_SHIFT);
++    if ( old_present )
++    {
++        bool old_r, old_w;
++        unsigned int old_level;
++        uint64_t maddr_old;
++
++        addr_hi = get_field_from_reg_u32(pde[1],
++                                         IOMMU_PTE_ADDR_HIGH_MASK,
++                                         IOMMU_PTE_ADDR_HIGH_SHIFT);
++        addr_lo = get_field_from_reg_u32(pde[0],
++                                         IOMMU_PTE_ADDR_LOW_MASK,
++                                         IOMMU_PTE_ADDR_LOW_SHIFT);
++        old_level = get_field_from_reg_u32(pde[0],
++                                           IOMMU_PDE_NEXT_LEVEL_MASK,
++                                           IOMMU_PDE_NEXT_LEVEL_SHIFT);
++        old_w = get_field_from_reg_u32(pde[1],
++                                       IOMMU_PTE_IO_WRITE_PERMISSION_MASK,
++                                       IOMMU_PTE_IO_WRITE_PERMISSION_SHIFT);
++        old_r = get_field_from_reg_u32(pde[1],
++                                       IOMMU_PTE_IO_READ_PERMISSION_MASK,
++                                       IOMMU_PTE_IO_READ_PERMISSION_SHIFT);
++
++        maddr_old = (addr_hi << 32) | (addr_lo << PAGE_SHIFT);
++
++        if ( maddr_old != maddr_next || iw != old_w || ir != old_r ||
++             old_level != next_level )
++            need_flush = true;
++    }
+ 
+     addr_lo = maddr_next & DMA_32BIT_MASK;
+     addr_hi = maddr_next >> 32;
+@@ -687,10 +706,7 @@ int amd_iommu_map_page(struct domain *d,
+     if ( !need_flush )
+         goto out;
+ 
+-    /* 4K mapping for PV guests never changes, 
+-     * no need to flush if we trust non-present bits */
+-    if ( is_hvm_domain(d) )
+-        amd_iommu_flush_pages(d, gfn, 0);
++    amd_iommu_flush_pages(d, gfn, 0);
+ 
+     for ( merge_level = IOMMU_PAGING_MODE_LEVEL_2;
+           merge_level <= hd->arch.paging_mode; merge_level++ )
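
Stripped of the register-field accessors, the new flush decision in this XSA-275 patch reduces to: flush only when a previously present entry actually changes in address, permissions, or next-level field. A rough standalone C model of that check (field layout invented here, not the real AMD IOMMU entry):

    #include <stdbool.h>
    #include <stdint.h>

    /* Simplified PTE model; the real entry packs these into hardware bitfields. */
    struct pte {
        bool present, readable, writable;
        unsigned int next_level;
        uint64_t maddr;
    };

    /* Flush only if an existing (present) translation changes in any relevant way. */
    static bool update_needs_flush(const struct pte *old, const struct pte *upd)
    {
        if (!old->present)
            return false;

        return old->maddr != upd->maddr ||
               old->readable != upd->readable ||
               old->writable != upd->writable ||
               old->next_level != upd->next_level;
    }

    int main(void)
    {
        struct pte old = { .present = true, .readable = true, .maddr = 0x1000 };
        struct pte upd = { .present = true, .readable = true, .writable = true,
                           .maddr = 0x1000 };

        return update_needs_flush(&old, &upd) ? 0 : 1;  /* permission change => flush */
    }
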
Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA275-2
diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA275-2:1.1
--- /dev/null   Wed Nov 28 14:00:49 2018
+++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA275-2 Wed Nov 28 14:00:49 2018
@@ -0,0 +1,70 @@
+$NetBSD: patch-XSA275-2,v 1.1 2018/11/28 14:00:49 bouyer Exp $
+
+From: Jan Beulich <jbeulich%suse.com@localhost>
+Subject: AMD/IOMMU: suppress PTE merging after initial table creation
+
+The logic is not fit for this purpose, so simply disable its use until
+it can be fixed / replaced. Note that this re-enables merging for the
+table creation case, which was disabled as a (perhaps unintended) side
+effect of the earlier "amd/iommu: fix flush checks". It relies on no
+page getting mapped more than once (with different properties) in this
+process, as that would still be beyond what the merging logic can cope
+with. But arch_iommu_populate_page_table() guarantees this afaict.
+
+This is part of XSA-275.
+
+Reported-by: Paul Durrant <paul.durrant%citrix.com@localhost>
+Signed-off-by: Jan Beulich <jbeulich%suse.com@localhost>
+
+--- xen/drivers/passthrough/amd/iommu_map.c.orig
++++ xen/drivers/passthrough/amd/iommu_map.c
+@@ -702,11 +702,24 @@ int amd_iommu_map_page(struct domain *d,
+                                        !!(flags & IOMMUF_writable),
+                                        !!(flags & IOMMUF_readable));
+ 
+-    /* Do not increase pde count if io mapping has not been changed */
+-    if ( !need_flush )
+-        goto out;
++    if ( need_flush )
++    {
++        amd_iommu_flush_pages(d, gfn, 0);
++        /* No further merging, as the logic doesn't cope. */
++        hd->arch.no_merge = true;
++    }
+ 
+-    amd_iommu_flush_pages(d, gfn, 0);
++    /*
++     * Suppress merging of non-R/W mappings or after initial table creation,
++     * as the merge logic does not cope with this.
++     */
++    if ( hd->arch.no_merge || flags != (IOMMUF_writable | IOMMUF_readable) )
++        goto out;
++    if ( d->creation_finished )
++    {
++        hd->arch.no_merge = true;
++        goto out;
++    }
+ 
+     for ( merge_level = IOMMU_PAGING_MODE_LEVEL_2;
+           merge_level <= hd->arch.paging_mode; merge_level++ )
+@@ -780,6 +793,10 @@ int amd_iommu_unmap_page(struct domain *
+ 
+     /* mark PTE as 'page not present' */
+     clear_iommu_pte_present(pt_mfn[1], gfn);
++
++    /* No further merging in amd_iommu_map_page(), as the logic doesn't cope. */
++    hd->arch.no_merge = true;
++
+     spin_unlock(&hd->arch.mapping_lock);
+ 
+     amd_iommu_flush_pages(d, gfn, 0);
+--- xen/include/asm-x86/iommu.h.orig
++++ xen/include/asm-x86/iommu.h
+@@ -40,6 +40,7 @@ struct arch_iommu
+ 
+     /* amd iommu support */
+     int paging_mode;
++    bool no_merge;
+     struct page_info *root_table;
+     struct guest_iommu *g_iommu;
+ };
Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA276-1
diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA276-1:1.1
--- /dev/null   Wed Nov 28 14:00:49 2018
+++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA276-1 Wed Nov 28 14:00:49 2018
@@ -0,0 +1,122 @@
+$NetBSD: patch-XSA276-1,v 1.1 2018/11/28 14:00:49 bouyer Exp $
+
+From bcc115ba39d2985dcf356ba8a9ac291e314f1f0f Mon Sep 17 00:00:00 2001
+From: Jan Beulich <JBeulich%suse.com@localhost>
+Date: Thu, 11 Oct 2018 04:00:26 -0600
+Subject: [PATCH 1/2] x86/hvm/ioreq: fix page referencing
+
+The code does not take a page reference in hvm_alloc_ioreq_mfn(), only a
+type reference. This can lead to a situation where a malicious domain with
+XSM_DM_PRIV can engineer a sequence as follows:
+
+- create IOREQ server: no pages as yet.
+- acquire resource: page allocated, total 0.
+- decrease reservation: -1 ref, total -1.
+
+This will cause Xen to hit a BUG_ON() in free_domheap_pages().
+
+This patch fixes the issue by changing the call to get_page_type() in
+hvm_alloc_ioreq_mfn() to a call to get_page_and_type(). This change
+in turn requires an extra put_page() in hvm_free_ioreq_mfn() in the case
+that _PGC_allocated is still set (i.e. a decrease reservation has not
+occurred) to avoid the page being leaked.
+
+This is part of XSA-276.
+
+Reported-by: Julien Grall <julien.grall%arm.com@localhost>
+Reported-by: Julien Grall <julien.grall%arm.com@localhost>
+Signed-off-by: Paul Durrant <paul.durrant%citrix.com@localhost>
+Signed-off-by: Jan Beulich <jbeulich%suse.com@localhost>
+---
+ xen/arch/x86/hvm/ioreq.c | 46 +++++++++++++++++++++++++++-------------
+ 1 file changed, 31 insertions(+), 15 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
+index f39f391929..bdc2687014 100644
+--- xen/arch/x86/hvm/ioreq.c.orig
++++ xen/arch/x86/hvm/ioreq.c
+@@ -327,6 +327,7 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+ static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+ {
+     struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
++    struct page_info *page;
+ 
+     if ( iorp->page )
+     {
+@@ -349,27 +350,33 @@ static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+      * could fail if the emulating domain has already reached its
+      * maximum allocation.
+      */
+-    iorp->page = alloc_domheap_page(s->emulator, MEMF_no_refcount);
++    page = alloc_domheap_page(s->emulator, MEMF_no_refcount);
+ 
+-    if ( !iorp->page )
++    if ( !page )
+         return -ENOMEM;
+ 
+-    if ( !get_page_type(iorp->page, PGT_writable_page) )
+-        goto fail1;
++    if ( !get_page_and_type(page, s->emulator, PGT_writable_page) )
++    {
++        /*
++         * The domain can't possibly know about this page yet, so failure
++         * here is a clear indication of something fishy going on.
++         */
++        domain_crash(s->emulator);
++        return -ENODATA;
++    }
+ 
+-    iorp->va = __map_domain_page_global(iorp->page);
++    iorp->va = __map_domain_page_global(page);
+     if ( !iorp->va )
+-        goto fail2;
++        goto fail;
+ 
++    iorp->page = page;
+     clear_page(iorp->va);
+     return 0;
+ 
+- fail2:
+-    put_page_type(iorp->page);
+-
+- fail1:
+-    put_page(iorp->page);
+-    iorp->page = NULL;
++ fail:
++    if ( test_and_clear_bit(_PGC_allocated, &page->count_info) )
++        put_page(page);
++    put_page_and_type(page);
+ 
+     return -ENOMEM;
+ }
+@@ -377,15 +384,24 @@ static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+ static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+ {
+     struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
++    struct page_info *page = iorp->page;
+ 
+-    if ( !iorp->page )
++    if ( !page )
+         return;
+ 
++    iorp->page = NULL;
++
+     unmap_domain_page_global(iorp->va);
+     iorp->va = NULL;
+ 
+-    put_page_and_type(iorp->page);
+-    iorp->page = NULL;
++    /*
++     * Check whether we need to clear the allocation reference before
++     * dropping the explicit references taken by get_page_and_type().
++     */
++    if ( test_and_clear_bit(_PGC_allocated, &page->count_info) )
++        put_page(page);
++
++    put_page_and_type(page);
+ }
+ 
+ bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
+-- 
+2.19.1
+
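
The reference-count sequence described for XSA-276 can be pictured with two counters: the old code only ever took a type reference, so the later general-reference drop from decrease-reservation underflows. A toy C model of the two call sequences (all names hypothetical, heavily simplified relative to Xen's real refcounting):

    #include <stdio.h>

    struct page { int count; int type_count; };

    /* Old behaviour: only a type reference is taken. */
    static void alloc_old(struct page *pg) { pg->type_count++; }

    /* Fixed behaviour: a general reference is taken together with the type one. */
    static void alloc_new(struct page *pg) { pg->count++; pg->type_count++; }

    /* decrease-reservation drops a general reference. */
    static void decrease_reservation(struct page *pg) { pg->count--; }

    int main(void)
    {
        struct page a = { 0 }, b = { 0 };

        alloc_old(&a);
        decrease_reservation(&a);
        printf("old: count=%d (underflow, would trip the BUG_ON)\n", a.count);

        alloc_new(&b);
        decrease_reservation(&b);
        printf("fixed: count=%d (no underflow)\n", b.count);
        return 0;
    }
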
Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA276-2
diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA276-2:1.1
--- /dev/null   Wed Nov 28 14:00:49 2018
+++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA276-2 Wed Nov 28 14:00:49 2018
@@ -0,0 +1,85 @@
+$NetBSD: patch-XSA276-2,v 1.1 2018/11/28 14:00:49 bouyer Exp $
+
+From 0bb2969630fbc92a0510bf120578b58efb74cdab Mon Sep 17 00:00:00 2001
+From: Paul Durrant <Paul.Durrant%citrix.com@localhost>
+Date: Thu, 1 Nov 2018 17:30:20 +0000
+Subject: [PATCH 2/2] x86/hvm/ioreq: use ref-counted target-assigned shared
+ pages
+
+Passing MEMF_no_refcount to alloc_domheap_pages() will allocate, as
+expected, a page that is assigned to the specified domain but is not
+accounted for in tot_pages. Unfortunately there is no logic for tracking
+such allocations and avoiding any adjustment to tot_pages when the page
+is freed.
+
+The only caller of alloc_domheap_pages() that passes MEMF_no_refcount is
+hvm_alloc_ioreq_mfn() so this patch removes use of the flag from that
+call-site to avoid the possibility of a domain using an ioreq server as
+a means to adjust its tot_pages and hence allocate more memory than it
+should be able to.
+
+However, the reason for using the flag in the first place was to avoid
+the allocation failing if the emulator domain is already at its maximum
+memory limit. Hence this patch switches to allocating memory from the
+target domain instead of the emulator domain. There is already an extra
+memory allowance of 2MB (LIBXL_HVM_EXTRA_MEMORY) applied to HVM guests,
+which is sufficient to cover the pages required by the supported
+configuration of a single IOREQ server for QEMU. (Stub-domains do not,
+so far, use resource mapping). It is also the case that QEMU will have
+mapped the IOREQ server pages before the guest boots, hence it is not
+possible for the guest to inflate its balloon to consume these pages.
+
+Reported-by: Julien Grall <julien.grall%arm.com@localhost>
+Signed-off-by: Paul Durrant <paul.durrant%citrix.com@localhost>
+---
+ xen/arch/x86/hvm/ioreq.c | 12 ++----------
+ xen/arch/x86/mm.c        |  6 ------
+ 2 files changed, 2 insertions(+), 16 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
+index bdc2687014..fd10ee6146 100644
+--- xen/arch/x86/hvm/ioreq.c.orig
++++ xen/arch/x86/hvm/ioreq.c
+@@ -342,20 +342,12 @@ static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+         return 0;
+     }
+ 
+-    /*
+-     * Allocated IOREQ server pages are assigned to the emulating
+-     * domain, not the target domain. This is safe because the emulating
+-     * domain cannot be destroyed until the ioreq server is destroyed.
+-     * Also we must use MEMF_no_refcount otherwise page allocation
+-     * could fail if the emulating domain has already reached its
+-     * maximum allocation.
+-     */
+-    page = alloc_domheap_page(s->emulator, MEMF_no_refcount);
++    page = alloc_domheap_page(s->target, 0);
+ 
+     if ( !page )
+         return -ENOMEM;
+ 
+-    if ( !get_page_and_type(page, s->emulator, PGT_writable_page) )
++    if ( !get_page_and_type(page, s->target, PGT_writable_page) )
+     {
+         /*
+          * The domain can't possibly know about this page yet, so failure
+diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
+index 7d4871b791..24b215d785 100644
+--- xen/arch/x86/mm.c.orig
++++ xen/arch/x86/mm.c
+@@ -4396,12 +4396,6 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
+ 
+             mfn_list[i] = mfn_x(mfn);
+         }
+-
+-        /*
+-         * The frames will have been assigned to the domain that created
+-         * the ioreq server.
+-         */
+-        *flags |= XENMEM_rsrc_acq_caller_owned;
+         break;
+     }
+ 
+-- 
+2.19.1
+
Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA277
diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA277:1.1
--- /dev/null   Wed Nov 28 14:00:49 2018
+++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA277   Wed Nov 28 14:00:49 2018
@@ -0,0 +1,49 @@
+$NetBSD: patch-XSA277,v 1.1 2018/11/28 14:00:49 bouyer Exp $
+
+From: Andrew Cooper <andrew.cooper3%citrix.com@localhost>
+Subject: x86/mm: Put the gfn on all paths after get_gfn_query()
+
+c/s 7867181b2 "x86/PoD: correctly handle non-order-0 decrease-reservation
+requests" introduced an early exit in guest_remove_page() for unexpected p2m
+types.  However, get_gfn_query() internally takes the p2m lock, and must be
+matched with a put_gfn() call later.
+
+Fix the erroneous comment beside the declaration of get_gfn_query().
+
+This is XSA-277.
+
+Reported-by: Paul Durrant <paul.durrant%citrix.com@localhost>
+Signed-off-by: Andrew Cooper <andrew.cooper3%citrix.com@localhost>
+
+diff --git a/xen/common/memory.c b/xen/common/memory.c
+index 987395f..26b7123 100644
+--- xen/common/memory.c.orig
++++ xen/common/memory.c
+@@ -305,7 +305,11 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
+ #ifdef CONFIG_X86
+     mfn = get_gfn_query(d, gmfn, &p2mt);
+     if ( unlikely(p2mt == p2m_invalid) || unlikely(p2mt == p2m_mmio_dm) )
++    {
++        put_gfn(d, gmfn);
++
+         return -ENOENT;
++    }
+ 
+     if ( unlikely(p2m_is_paging(p2mt)) )
+     {
+diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
+index ac33f50..6d849a5 100644
+--- xen/include/asm-x86/p2m.h.orig
++++ xen/include/asm-x86/p2m.h
+@@ -448,10 +448,7 @@ static inline mfn_t __nonnull(3) get_gfn_type(
+     return get_gfn_type_access(p2m_get_hostp2m(d), gfn, t, &a, q, NULL);
+ }
+ 
+-/* Syntactic sugar: most callers will use one of these. 
+- * N.B. get_gfn_query() is the _only_ one guaranteed not to take the
+- * p2m lock; none of the others can be called with the p2m or paging
+- * lock held. */
++/* Syntactic sugar: most callers will use one of these. */
+ #define get_gfn(d, g, t)         get_gfn_type((d), (g), (t), P2M_ALLOC)
+ #define get_gfn_query(d, g, t)   get_gfn_type((d), (g), (t), 0)
+ #define get_gfn_unshare(d, g, t) get_gfn_type((d), (g), (t), \
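
The underlying rule in XSA-277 is the usual acquire/release pairing: once get_gfn_query() has taken the p2m lock, every exit path, including the new early-error one, must call put_gfn(). A generic C sketch of that structure (stub functions only, not the Xen API):

    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Stubs standing in for get_gfn_query()/put_gfn() and the type check. */
    static void get_ref(void)        { puts("get"); }
    static void put_ref(void)        { puts("put"); }
    static bool type_is_usable(void) { return false; }

    /* Every path after the acquire funnels through the single release. */
    static int remove_page(void)
    {
        int rc = 0;

        get_ref();

        if (!type_is_usable()) {
            rc = -ENOENT;        /* the early-exit case the patch fixes */
            goto out;
        }

        /* ... act on the page ... */

     out:
        put_ref();
        return rc;
    }

    int main(void) { return remove_page() == -ENOENT ? 0 : 1; }
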
Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA278
diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA278:1.1
--- /dev/null   Wed Nov 28 14:00:49 2018
+++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA278   Wed Nov 28 14:00:49 2018
@@ -0,0 +1,328 @@
+$NetBSD: patch-XSA278,v 1.1 2018/11/28 14:00:49 bouyer Exp $
+
+From: Andrew Cooper <andrew.cooper3%citrix.com@localhost>
+Subject: x86/vvmx: Disallow the use of VT-x instructions when nested virt is disabled
+
+c/s ac6a4500b "vvmx: set vmxon_region_pa of vcpu out of VMX operation to an
+invalid address" was a real bugfix as described, but has a very subtle bug
+which results in all VT-x instructions being usable by a guest.
+
+The toolstack constructs a guest by issuing:
+
+  XEN_DOMCTL_createdomain
+  XEN_DOMCTL_max_vcpus
+
+and optionally later, HVMOP_set_param to enable nested virt.
+
+As a result, the call to nvmx_vcpu_initialise() in hvm_vcpu_initialise()
+(which is what makes the above patch look correct during review) is actually
+dead code.  In practice, nvmx_vcpu_initialise() first gets called when nested
+virt is enabled, which is typically never.
+
+As a result, the zeroed memory of struct vcpu causes nvmx_vcpu_in_vmx() to
+return true before nested virt is enabled for the guest.
+
+Fixing the order of initialisation is a work in progress for other reasons,
+but not viable for security backports.
+
+A compounding factor is that the vmexit handlers for all instructions, other
+than VMXON, pass 0 into vmx_inst_check_privilege()'s vmxop_check parameter,
+which skips the CR4.VMXE check.  (This is one of many reasons why nested virt
+isn't a supported feature yet.)
+
+However, the overall result is that when nested virt is not enabled by the
+toolstack (i.e. the default configuration for all production guests), the VT-x
+instructions (other than VMXON) are actually usable, and Xen very quickly
+falls over the fact that the nvmx structure is uninitialised.
+
+In order to fail safe in the supported case, re-implement all the VT-x
+instruction handling using a single function with a common prologue, covering
+all the checks which should cause #UD or #GP faults.  This deliberately
+doesn't use any state from the nvmx structure, in case there are other lurking
+issues.
+
+This is XSA-278
+
+Reported-by: Sergey Dyasli <sergey.dyasli%citrix.com@localhost>
+Signed-off-by: Andrew Cooper <andrew.cooper3%citrix.com@localhost>
+Reviewed-by: Sergey Dyasli <sergey.dyasli%citrix.com@localhost>
+
+diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
+index a6415f0..a4d2829 100644
+--- xen/arch/x86/hvm/vmx/vmx.c.orig
++++ xen/arch/x86/hvm/vmx/vmx.c
+@@ -3982,57 +3982,17 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+         break;
+ 
+     case EXIT_REASON_VMXOFF:
+-        if ( nvmx_handle_vmxoff(regs) == X86EMUL_OKAY )
+-            update_guest_eip();
+-        break;
+-
+     case EXIT_REASON_VMXON:
+-        if ( nvmx_handle_vmxon(regs) == X86EMUL_OKAY )
+-            update_guest_eip();
+-        break;
+-
+     case EXIT_REASON_VMCLEAR:
+-        if ( nvmx_handle_vmclear(regs) == X86EMUL_OKAY )
+-            update_guest_eip();
+-        break;
+- 
+     case EXIT_REASON_VMPTRLD:
+-        if ( nvmx_handle_vmptrld(regs) == X86EMUL_OKAY )
+-            update_guest_eip();
+-        break;
+-
+     case EXIT_REASON_VMPTRST:
+-        if ( nvmx_handle_vmptrst(regs) == X86EMUL_OKAY )
+-            update_guest_eip();
+-        break;
+-
+     case EXIT_REASON_VMREAD:
+-        if ( nvmx_handle_vmread(regs) == X86EMUL_OKAY )
+-            update_guest_eip();
+-        break;
+- 
+     case EXIT_REASON_VMWRITE:
+-        if ( nvmx_handle_vmwrite(regs) == X86EMUL_OKAY )
+-            update_guest_eip();
+-        break;
+-
+     case EXIT_REASON_VMLAUNCH:
+-        if ( nvmx_handle_vmlaunch(regs) == X86EMUL_OKAY )
+-            update_guest_eip();
+-        break;
+-
+     case EXIT_REASON_VMRESUME:
+-        if ( nvmx_handle_vmresume(regs) == X86EMUL_OKAY )
+-            update_guest_eip();
+-        break;
+-
+     case EXIT_REASON_INVEPT:
+-        if ( nvmx_handle_invept(regs) == X86EMUL_OKAY )
+-            update_guest_eip();
+-        break;
+-
+     case EXIT_REASON_INVVPID:
+-        if ( nvmx_handle_invvpid(regs) == X86EMUL_OKAY )
++        if ( nvmx_handle_vmx_insn(regs, exit_reason) == X86EMUL_OKAY )
+             update_guest_eip();
+         break;
+ 
+diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
+index e97db33..88cb58c 100644
+--- xen/arch/x86/hvm/vmx/vvmx.c.orig
++++ xen/arch/x86/hvm/vmx/vvmx.c
+@@ -1470,7 +1470,7 @@ void nvmx_switch_guest(void)
+  * VMX instructions handling
+  */
+ 
+-int nvmx_handle_vmxon(struct cpu_user_regs *regs)
++static int nvmx_handle_vmxon(struct cpu_user_regs *regs)
+ {
+     struct vcpu *v=current;
+     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+@@ -1522,7 +1522,7 @@ int nvmx_handle_vmxon(struct cpu_user_regs *regs)
+     return X86EMUL_OKAY;
+ }
+ 
+-int nvmx_handle_vmxoff(struct cpu_user_regs *regs)
++static int nvmx_handle_vmxoff(struct cpu_user_regs *regs)
+ {
+     struct vcpu *v=current;
+     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+@@ -1611,7 +1611,7 @@ static int nvmx_vmresume(struct vcpu *v, struct cpu_user_regs *regs)
+     return X86EMUL_OKAY;
+ }
+ 
+-int nvmx_handle_vmresume(struct cpu_user_regs *regs)
++static int nvmx_handle_vmresume(struct cpu_user_regs *regs)
+ {
+     bool_t launched;
+     struct vcpu *v = current;
+@@ -1645,7 +1645,7 @@ int nvmx_handle_vmresume(struct cpu_user_regs *regs)
+     return nvmx_vmresume(v,regs);
+ }
+ 
+-int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
++static int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
+ {
+     bool_t launched;
+     struct vcpu *v = current;
+@@ -1688,7 +1688,7 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
+     return rc;
+ }
+ 
+-int nvmx_handle_vmptrld(struct cpu_user_regs *regs)
++static int nvmx_handle_vmptrld(struct cpu_user_regs *regs)
+ {
+     struct vcpu *v = current;
+     struct vmx_inst_decoded decode;
+@@ -1759,7 +1759,7 @@ int nvmx_handle_vmptrld(struct cpu_user_regs *regs)
+     return X86EMUL_OKAY;
+ }
+ 
+-int nvmx_handle_vmptrst(struct cpu_user_regs *regs)
++static int nvmx_handle_vmptrst(struct cpu_user_regs *regs)
+ {
+     struct vcpu *v = current;
+     struct vmx_inst_decoded decode;
+@@ -1784,7 +1784,7 @@ int nvmx_handle_vmptrst(struct cpu_user_regs *regs)
+     return X86EMUL_OKAY;
+ }
+ 
+-int nvmx_handle_vmclear(struct cpu_user_regs *regs)
++static int nvmx_handle_vmclear(struct cpu_user_regs *regs)
+ {
+     struct vcpu *v = current;
+     struct vmx_inst_decoded decode;
+@@ -1836,7 +1836,7 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
+     return X86EMUL_OKAY;
+ }
+ 
+-int nvmx_handle_vmread(struct cpu_user_regs *regs)
++static int nvmx_handle_vmread(struct cpu_user_regs *regs)
+ {
+     struct vcpu *v = current;
+     struct vmx_inst_decoded decode;
+@@ -1878,7 +1878,7 @@ int nvmx_handle_vmread(struct cpu_user_regs *regs)
+     return X86EMUL_OKAY;
+ }
+ 
+-int nvmx_handle_vmwrite(struct cpu_user_regs *regs)
++static int nvmx_handle_vmwrite(struct cpu_user_regs *regs)
+ {
+     struct vcpu *v = current;
+     struct vmx_inst_decoded decode;
+@@ -1926,7 +1926,7 @@ int nvmx_handle_vmwrite(struct cpu_user_regs *regs)
+     return X86EMUL_OKAY;
+ }
+ 
+-int nvmx_handle_invept(struct cpu_user_regs *regs)
++static int nvmx_handle_invept(struct cpu_user_regs *regs)
+ {
+     struct vmx_inst_decoded decode;
+     unsigned long eptp;
+@@ -1954,7 +1954,7 @@ int nvmx_handle_invept(struct cpu_user_regs *regs)
+     return X86EMUL_OKAY;
+ }
+ 
+-int nvmx_handle_invvpid(struct cpu_user_regs *regs)
++static int nvmx_handle_invvpid(struct cpu_user_regs *regs)
+ {
+     struct vmx_inst_decoded decode;
+     unsigned long vpid;
+@@ -1980,6 +1980,81 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
+     return X86EMUL_OKAY;
+ }
+ 
++int nvmx_handle_vmx_insn(struct cpu_user_regs *regs, unsigned int exit_reason)
++{
++    struct vcpu *curr = current;
++    int ret;
++
++    if ( !(curr->arch.hvm_vcpu.guest_cr[4] & X86_CR4_VMXE) ||
++         !nestedhvm_enabled(curr->domain) ||
++         (vmx_guest_x86_mode(curr) < (hvm_long_mode_active(curr) ? 8 : 2)) )
++    {
++        hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
++        return X86EMUL_EXCEPTION;
++    }
++
++    if ( vmx_get_cpl() > 0 )
++    {
++        hvm_inject_hw_exception(TRAP_gp_fault, 0);
++        return X86EMUL_EXCEPTION;
++    }
++
++    switch ( exit_reason )
++    {
++    case EXIT_REASON_VMXOFF:
++        ret = nvmx_handle_vmxoff(regs);
++        break;
++
++    case EXIT_REASON_VMXON:
++        ret = nvmx_handle_vmxon(regs);
++        break;
++
++    case EXIT_REASON_VMCLEAR:
++        ret = nvmx_handle_vmclear(regs);
++        break;
++
++    case EXIT_REASON_VMPTRLD:
++        ret = nvmx_handle_vmptrld(regs);
++        break;
++
++    case EXIT_REASON_VMPTRST:
++        ret = nvmx_handle_vmptrst(regs);
++        break;
++
++    case EXIT_REASON_VMREAD:
++        ret = nvmx_handle_vmread(regs);
++        break;
++
++    case EXIT_REASON_VMWRITE:
++        ret = nvmx_handle_vmwrite(regs);
++        break;
++
++    case EXIT_REASON_VMLAUNCH:
++        ret = nvmx_handle_vmlaunch(regs);
++        break;
++
++    case EXIT_REASON_VMRESUME:
++        ret = nvmx_handle_vmresume(regs);
++        break;
++
++    case EXIT_REASON_INVEPT:
++        ret = nvmx_handle_invept(regs);
++        break;
++
++    case EXIT_REASON_INVVPID:
++        ret = nvmx_handle_invvpid(regs);
++        break;
++
++    default:
++        ASSERT_UNREACHABLE();
++        domain_crash(curr->domain);
++        ret = X86EMUL_UNHANDLEABLE;
++        break;
++    }
++
++    return ret;
++}
++
+ #define __emul_value(enable1, default1) \
+     ((enable1 | default1) << 32 | (default1))
+ 
+diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
+index 9ea35eb..fc4a8d1 100644
+--- xen/include/asm-x86/hvm/vmx/vvmx.h.orig
++++ xen/include/asm-x86/hvm/vmx/vvmx.h
+@@ -94,9 +94,6 @@ void nvmx_domain_relinquish_resources(struct domain *d);
+ 
+ bool_t nvmx_ept_enabled(struct vcpu *v);
+ 
+-int nvmx_handle_vmxon(struct cpu_user_regs *regs);
+-int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
+-
+ #define EPT_TRANSLATE_SUCCEED       0
+ #define EPT_TRANSLATE_VIOLATION     1
+ #define EPT_TRANSLATE_MISCONFIG     2
+@@ -191,15 +188,7 @@ enum vmx_insn_errno set_vvmcs_real_safe(const struct vcpu *, u32 encoding,
+ uint64_t get_shadow_eptp(struct vcpu *v);
+ 
+ void nvmx_destroy_vmcs(struct vcpu *v);
+-int nvmx_handle_vmptrld(struct cpu_user_regs *regs);
+-int nvmx_handle_vmptrst(struct cpu_user_regs *regs);
+-int nvmx_handle_vmclear(struct cpu_user_regs *regs);
+-int nvmx_handle_vmread(struct cpu_user_regs *regs);
+-int nvmx_handle_vmwrite(struct cpu_user_regs *regs);
+-int nvmx_handle_vmresume(struct cpu_user_regs *regs);
+-int nvmx_handle_vmlaunch(struct cpu_user_regs *regs);
+-int nvmx_handle_invept(struct cpu_user_regs *regs);
+-int nvmx_handle_invvpid(struct cpu_user_regs *regs);
++int nvmx_handle_vmx_insn(struct cpu_user_regs *regs, unsigned int exit_reason);
+ int nvmx_msr_read_intercept(unsigned int msr,
+                                 u64 *msr_content);
+ 
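
The shape of the XSA-278 fix is a single entry point that performs the #UD/#GP checks every VT-x instruction needs before dispatching to the individual handlers, so none of them can be reached with the checks skipped. A condensed standalone C sketch of that structure (enum values and fields invented for illustration):

    #include <stdbool.h>
    #include <stdio.h>

    enum insn { INSN_VMXON, INSN_VMREAD, INSN_VMWRITE };

    struct vcpu_state { bool nested_enabled; bool cr4_vmxe; unsigned int cpl; };

    /* Common prologue: feature/CR4 checks (#UD) and privilege check (#GP). */
    static int handle_vmx_insn(const struct vcpu_state *v, enum insn i)
    {
        if (!v->nested_enabled || !v->cr4_vmxe)
            return -1;                       /* inject #UD */
        if (v->cpl > 0)
            return -2;                       /* inject #GP(0) */

        switch (i) {
        case INSN_VMXON:   /* per-instruction handler */ return 0;
        case INSN_VMREAD:  /* per-instruction handler */ return 0;
        case INSN_VMWRITE: /* per-instruction handler */ return 0;
        }
        return -3;                           /* unexpected exit reason */
    }

    int main(void)
    {
        struct vcpu_state guest = { .nested_enabled = false, .cpl = 3 };

        printf("%d\n", handle_vmx_insn(&guest, INSN_VMREAD));  /* -1: #UD */
        return 0;
    }
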
Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA279
diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA279:1.1
--- /dev/null   Wed Nov 28 14:00:49 2018
+++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA279   Wed Nov 28 14:00:49 2018
@@ -0,0 +1,39 @@
+$NetBSD: patch-XSA279,v 1.1 2018/11/28 14:00:49 bouyer Exp $
+
+From: Andrew Cooper <andrew.cooper3%citrix.com@localhost>
+Subject: x86/mm: Don't perform flush after failing to update a guests L1e
+
+If the L1e update hasn't occurred, the flush cannot do anything useful.  This
+skips the potentially expensive vcpumask_to_pcpumask() conversion, and
+broadcast TLB shootdown.
+
+More importantly however, we might be in the error path due to a bad va
+parameter from the guest, and this should not propagate into the TLB flushing
+logic.  The INVPCID instruction for example raises #GP for a non-canonical
+address.
+
+This is XSA-279.
+
+Reported-by: Matthew Daley <mattd%bugfuzz.com@localhost>
+Signed-off-by: Andrew Cooper <andrew.cooper3%citrix.com@localhost>
+Reviewed-by: Jan Beulich <jbeulich%suse.com@localhost>
+
+diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
+index 703f330..75663c6 100644
+--- xen/arch/x86/mm.c.orig
++++ xen/arch/x86/mm.c
+@@ -4155,6 +4155,14 @@ static int __do_update_va_mapping(
+     if ( pl1e )
+         unmap_domain_page(pl1e);
+ 
++    /*
++     * Any error at this point means that we haven't change the L1e.  Skip the
++     * flush, as it won't do anything useful.  Furthermore, va is guest
++     * controlled and not necesserily audited by this point.
++     */
++    if ( rc )
++        return rc;
++
+     switch ( flags & UVMF_FLUSHTYPE_MASK )
+     {
+     case UVMF_TLB_FLUSH:
Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA280-1
diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA280-1:1.1
--- /dev/null   Wed Nov 28 14:00:49 2018
+++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA280-1 Wed Nov 28 14:00:49 2018
@@ -0,0 +1,118 @@
+$NetBSD: patch-XSA280-1,v 1.1 2018/11/28 14:00:49 bouyer Exp $
+
+From: Jan Beulich <jbeulich%suse.com@localhost>
+Subject: x86/shadow: move OOS flag bit positions
+
+In preparation of reducing struct page_info's shadow_flags field to 16
+bits, lower the bit positions used for SHF_out_of_sync and
+SHF_oos_may_write.
+
+Instead of also adjusting the open coded use in _get_page_type(),
+introduce shadow_prepare_page_type_change() to contain knowledge of the
+bit positions to shadow code.
+
+This is part of XSA-280.
+
+Signed-off-by: Jan Beulich <jbeulich%suse.com@localhost>
+Reviewed-by: Tim Deegan <tim%xen.org@localhost>
+---
+v2: Rename function and pass full type.
+
+--- xen/arch/x86/mm.c.orig
++++ xen/arch/x86/mm.c
+@@ -2712,17 +2712,8 @@ static int _get_page_type(struct page_in
+         {
+             struct domain *d = page_get_owner(page);
+ 
+-            /*
+-             * Normally we should never let a page go from type count 0
+-             * to type count 1 when it is shadowed. One exception:
+-             * out-of-sync shadowed pages are allowed to become
+-             * writeable.
+-             */
+-            if ( d && shadow_mode_enabled(d)
+-                 && (page->count_info & PGC_page_table)
+-                 && !((page->shadow_flags & (1u<<29))
+-                      && type == PGT_writable_page) )
+-               shadow_remove_all_shadows(d, page_to_mfn(page));
++            if ( d && shadow_mode_enabled(d) )
++               shadow_prepare_page_type_change(d, page, type);
+ 
+             ASSERT(!(x & PGT_pae_xen_l2));
+             if ( (x & PGT_type_mask) != type )
+--- xen/arch/x86/mm/shadow/common.c.orig
++++ xen/arch/x86/mm/shadow/common.c
+@@ -749,6 +749,9 @@ int sh_unsync(struct vcpu *v, mfn_t gmfn
+          || !v->domain->arch.paging.shadow.oos_active )
+         return 0;
+ 
++    BUILD_BUG_ON(!(typeof(pg->shadow_flags))SHF_out_of_sync);
++    BUILD_BUG_ON(!(typeof(pg->shadow_flags))SHF_oos_may_write);
++
+     pg->shadow_flags |= SHF_out_of_sync|SHF_oos_may_write;
+     oos_hash_add(v, gmfn);
+     perfc_incr(shadow_unsync);
+@@ -2413,6 +2416,26 @@ void sh_remove_shadows(struct domain *d,
+     paging_unlock(d);
+ }
+ 
++void shadow_prepare_page_type_change(struct domain *d, struct page_info *page,
++                                     unsigned long new_type)
++{
++    if ( !(page->count_info & PGC_page_table) )
++        return;
++
++#if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
++    /*
++     * Normally we should never let a page go from type count 0 to type
++     * count 1 when it is shadowed. One exception: out-of-sync shadowed
++     * pages are allowed to become writeable.
++     */
++    if ( (page->shadow_flags & SHF_oos_may_write) &&
++         new_type == PGT_writable_page )
++        return;
++#endif
++
++    shadow_remove_all_shadows(d, page_to_mfn(page));
++}
++
+ static void
+ sh_remove_all_shadows_and_parents(struct domain *d, mfn_t gmfn)
+ /* Even harsher: this is a HVM page that we thing is no longer a pagetable.
+--- xen/arch/x86/mm/shadow/private.h.orig
++++ xen/arch/x86/mm/shadow/private.h
+@@ -285,8 +285,8 @@ static inline void sh_terminate_list(str
+  * codepath is called during that time and is sensitive to oos issues, it may
+  * need to use the second flag.
+  */
+-#define SHF_out_of_sync (1u<<30)
+-#define SHF_oos_may_write (1u<<29)
++#define SHF_out_of_sync (1u << (SH_type_max_shadow + 1))
++#define SHF_oos_may_write (1u << (SH_type_max_shadow + 2))
+ 
+ #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC) */
+ 
+--- xen/include/asm-x86/shadow.h.orig
++++ xen/include/asm-x86/shadow.h
+@@ -81,6 +81,10 @@ void shadow_final_teardown(struct domain
+ 
+ void sh_remove_shadows(struct domain *d, mfn_t gmfn, int fast, int all);
+ 
++/* Adjust shadows ready for a guest page to change its type. */
++void shadow_prepare_page_type_change(struct domain *d, struct page_info *page,
++                                     unsigned long new_type);
++
+ /* Discard _all_ mappings from the domain's shadows. */
+ void shadow_blow_tables_per_domain(struct domain *d);
+ 
+@@ -105,6 +109,10 @@ int shadow_set_allocation(struct domain
+ static inline void sh_remove_shadows(struct domain *d, mfn_t gmfn,
+                                      int fast, int all) {}
+ 
++static inline void shadow_prepare_page_type_change(struct domain *d,
++                                                   struct page_info *page,
++                                                   unsigned long new_type) {}
++
+ static inline void shadow_blow_tables_per_domain(struct domain *d) {}
+ 
+ static inline int shadow_domctl(struct domain *d,
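
The BUILD_BUG_ON()s added in this XSA-280 patch simply assert, at compile time, that the relocated flags still fit the soon-to-be 16-bit shadow_flags field. The same check in plain C11 (the SH_TYPE_MAX value is picked arbitrarily for the sketch, not taken from Xen):

    #include <assert.h>
    #include <stdint.h>

    /* Flags live just above the per-type bits, as in the patch. */
    #define SH_TYPE_MAX        13
    #define SHF_OUT_OF_SYNC   (1u << (SH_TYPE_MAX + 1))
    #define SHF_OOS_MAY_WRITE (1u << (SH_TYPE_MAX + 2))

    /* Both flags must be representable in a uint16_t shadow_flags field. */
    static_assert(SHF_OUT_OF_SYNC   <= UINT16_MAX, "SHF_out_of_sync out of range");
    static_assert(SHF_OOS_MAY_WRITE <= UINT16_MAX, "SHF_oos_may_write out of range");

    int main(void) { return 0; }
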
Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA280-2
diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA280-2:1.1
--- /dev/null   Wed Nov 28 14:00:49 2018
+++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA280-2 Wed Nov 28 14:00:49 2018
@@ -0,0 +1,143 @@
+$NetBSD: patch-XSA280-2,v 1.1 2018/11/28 14:00:49 bouyer Exp $
+
+From: Jan Beulich <jbeulich%suse.com@localhost>
+Subject: x86/shadow: shrink struct page_info's shadow_flags to 16 bits
+
+This is to avoid it overlapping the linear_pt_count field needed for PV
+domains. Introduce a separate, HVM-only pagetable_dying field to replace
+the sole one left in the upper 16 bits.
+
+Note that the accesses to ->shadow_flags in shadow_{pro,de}mote() get
+switched to non-atomic, non-bitops operations, as {test,set,clear}_bit()
+are not allowed on uint16_t fields and hence their use would have
+required ugly casts. This is fine because all updates of the field ought
+to occur with the paging lock held, and other updates of it use |= and
+&= as well (i.e. using atomic operations here didn't really guard
+against potentially racing updates elsewhere).
+
+This is part of XSA-280.
+
+Reported-by: Prgmr.com Security <security%prgmr.com@localhost>
+Signed-off-by: Jan Beulich <jbeulich%suse.com@localhost>
+Reviewed-by: Tim Deegan <tim%xen.org@localhost>
+
+--- xen/arch/x86/mm/shadow/common.c.orig
++++ xen/arch/x86/mm/shadow/common.c
+@@ -1028,10 +1028,14 @@ void shadow_promote(struct domain *d, mf
+ 
+     /* Is the page already shadowed? */
+     if ( !test_and_set_bit(_PGC_page_table, &page->count_info) )
++    {
+         page->shadow_flags = 0;
++        if ( is_hvm_domain(d) )
++            page->pagetable_dying = false;
++    }
+ 
+-    ASSERT(!test_bit(type, &page->shadow_flags));
+-    set_bit(type, &page->shadow_flags);
++    ASSERT(!(page->shadow_flags & (1u << type)));
++    page->shadow_flags |= 1u << type;
+     TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_PROMOTE);
+ }
+ 
+@@ -1040,9 +1044,9 @@ void shadow_demote(struct domain *d, mfn
+     struct page_info *page = mfn_to_page(gmfn);
+ 
+     ASSERT(test_bit(_PGC_page_table, &page->count_info));
+-    ASSERT(test_bit(type, &page->shadow_flags));
++    ASSERT(page->shadow_flags & (1u << type));
+ 
+-    clear_bit(type, &page->shadow_flags);
++    page->shadow_flags &= ~(1u << type);
+ 
+     if ( (page->shadow_flags & SHF_page_type_mask) == 0 )
+     {
+@@ -2921,7 +2925,7 @@ void sh_remove_shadows(struct domain *d,
+     if ( !fast && all && (pg->count_info & PGC_page_table) )
+     {
+         SHADOW_ERROR("can't find all shadows of mfn %"PRI_mfn" "
+-                     "(shadow_flags=%08x)\n",
++                     "(shadow_flags=%04x)\n",
+                       mfn_x(gmfn), pg->shadow_flags);
+         domain_crash(d);
+     }
+--- xen/arch/x86/mm/shadow/multi.c.orig
++++ xen/arch/x86/mm/shadow/multi.c
+@@ -3299,8 +3299,8 @@ static int sh_page_fault(struct vcpu *v,
+ 
+     /* Unshadow if we are writing to a toplevel pagetable that is
+      * flagged as a dying process, and that is not currently used. */
+-    if ( sh_mfn_is_a_page_table(gmfn)
+-         && (mfn_to_page(gmfn)->shadow_flags & SHF_pagetable_dying) )
++    if ( sh_mfn_is_a_page_table(gmfn) && is_hvm_domain(d) &&
++         mfn_to_page(gmfn)->pagetable_dying )
+     {
+         int used = 0;
+         struct vcpu *tmp;
+@@ -4254,9 +4254,9 @@ int sh_rm_write_access_from_sl1p(struct
+     ASSERT(mfn_valid(smfn));
+ 
+     /* Remember if we've been told that this process is being torn down */
+-    if ( curr->domain == d )
++    if ( curr->domain == d && is_hvm_domain(d) )
+         curr->arch.paging.shadow.pagetable_dying
+-            = !!(mfn_to_page(gmfn)->shadow_flags & SHF_pagetable_dying);
++            = mfn_to_page(gmfn)->pagetable_dying;
+ 
+     sp = mfn_to_page(smfn);
+ 
+@@ -4572,10 +4572,10 @@ static void sh_pagetable_dying(struct vc
+                    : shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l2_pae_shadow);
+         }
+ 
+-        if ( mfn_valid(smfn) )
++        if ( mfn_valid(smfn) && is_hvm_domain(d) )
+         {
+             gmfn = _mfn(mfn_to_page(smfn)->v.sh.back);
+-            mfn_to_page(gmfn)->shadow_flags |= SHF_pagetable_dying;
++            mfn_to_page(gmfn)->pagetable_dying = true;
+             shadow_unhook_mappings(d, smfn, 1/* user pages only */);
+             flush = 1;
+         }
+@@ -4612,9 +4612,9 @@ static void sh_pagetable_dying(struct vc
+     smfn = shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l4_64_shadow);
+ #endif
+ 
+-    if ( mfn_valid(smfn) )
++    if ( mfn_valid(smfn) && is_hvm_domain(d) )
+     {
+-        mfn_to_page(gmfn)->shadow_flags |= SHF_pagetable_dying;
++        mfn_to_page(gmfn)->pagetable_dying = true;
+         shadow_unhook_mappings(d, smfn, 1/* user pages only */);
+         /* Now flush the TLB: we removed toplevel mappings. */
+         flush_tlb_mask(d->dirty_cpumask);
+--- xen/arch/x86/mm/shadow/private.h.orig
++++ xen/arch/x86/mm/shadow/private.h
+@@ -292,8 +292,6 @@ static inline void sh_terminate_list(str
+ 
+ #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC) */
+ 
+-#define SHF_pagetable_dying (1u<<31)
+-
+ static inline int sh_page_has_multiple_shadows(struct page_info *pg)
+ {
+     u32 shadows;
+--- xen/include/asm-x86/mm.h.orig
++++ xen/include/asm-x86/mm.h
+@@ -259,8 +259,15 @@ struct page_info
+          * Guest pages with a shadow.  This does not conflict with
+          * tlbflush_timestamp since page table pages are explicitly not
+          * tracked for TLB-flush avoidance when a guest runs in shadow mode.
++         *
++         * pagetable_dying is used for HVM domains only. The layout here has
++         * to avoid re-use of the space used by linear_pt_count, which (only)
++         * PV guests use.
+          */
+-        u32 shadow_flags;
++        struct {
++            uint16_t shadow_flags;
++            bool pagetable_dying;
++        };
+ 
+         /* When in use as a shadow, next shadow in this hash chain. */
+         __pdx_t next_shadow;
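
The layout change in this second XSA-280 patch can be pictured as one 32-bit slot split into a 16-bit flags field plus a separate HVM-only boolean, so the PV-only linear_pt_count no longer shares bits with shadow state. A small C illustration of the before/after field sizes (simplified, not the real page_info union):

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Before: one 32-bit field, with a high bit doubling as "pagetable dying". */
    struct shadow_old { uint32_t shadow_flags; };

    /* After: 16 bits of flags plus an explicit HVM-only flag. */
    struct shadow_new {
        uint16_t shadow_flags;
        bool     pagetable_dying;
    };

    /* The new pair still fits in the space the old field occupied. */
    static_assert(sizeof(struct shadow_new) <= sizeof(struct shadow_old),
                  "layout grew");

    int main(void) { return 0; }
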
Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA282-1
diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA282-1:1.1
--- /dev/null   Wed Nov 28 14:00:49 2018
+++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA282-1 Wed Nov 28 14:00:49 2018
@@ -0,0 +1,149 @@
+$NetBSD: patch-XSA282-1,v 1.1 2018/11/28 14:00:49 bouyer Exp $
+
+From: Jan Beulich <jbeulich%suse.com@localhost>
+Subject: x86: extend get_platform_badpages() interface
+
+Use a structure so along with an address (now frame number) an order can
+also be specified.
+
+This is part of XSA-282.
+
+Signed-off-by: Jan Beulich <jbeulich%suse.com@localhost>
+Reviewed-by: Andrew Cooper <andrew.cooper3%citrix.com@localhost>
+
+--- xen/arch/x86/guest/xen.c.orig
++++ xen/arch/x86/guest/xen.c
+@@ -40,7 +40,7 @@ bool __read_mostly xen_guest;
+ static __read_mostly uint32_t xen_cpuid_base;
+ extern char hypercall_page[];
+ static struct rangeset *mem;
+-static unsigned long __initdata reserved_pages[2];
++static struct platform_bad_page __initdata reserved_pages[2];
+ 
+ DEFINE_PER_CPU(unsigned int, vcpu_id);
+ 
+@@ -326,7 +326,7 @@ void __init hypervisor_fixup_e820(struct
+         panic("Unable to get " #p);             \
+     mark_pfn_as_ram(e820, pfn);                 \
+     ASSERT(i < ARRAY_SIZE(reserved_pages));     \
+-    reserved_pages[i++] = pfn << PAGE_SHIFT;    \
++    reserved_pages[i++].mfn = pfn;              \
+ })
+     MARK_PARAM_RAM(HVM_PARAM_STORE_PFN);
+     if ( !pv_console )
+@@ -334,7 +334,7 @@ void __init hypervisor_fixup_e820(struct
+ #undef MARK_PARAM_RAM
+ }
+ 
+-const unsigned long *__init hypervisor_reserved_pages(unsigned int *size)
++const struct platform_bad_page *__init hypervisor_reserved_pages(unsigned int *size)
+ {
+     ASSERT(xen_guest);
+ 
+--- xen/arch/x86/mm.c.orig
++++ xen/arch/x86/mm.c
+@@ -5768,23 +5768,23 @@ void arch_dump_shared_mem_info(void)
+             mem_sharing_get_nr_saved_mfns());
+ }
+ 
+-const unsigned long *__init get_platform_badpages(unsigned int *array_size)
++const struct platform_bad_page *__init get_platform_badpages(unsigned int *array_size)
+ {
+     u32 igd_id;
+-    static unsigned long __initdata bad_pages[] = {
+-        0x20050000,
+-        0x20110000,
+-        0x20130000,
+-        0x20138000,
+-        0x40004000,
++    static const struct platform_bad_page __initconst snb_bad_pages[] = {
++        { .mfn = 0x20050000 >> PAGE_SHIFT },
++        { .mfn = 0x20110000 >> PAGE_SHIFT },
++        { .mfn = 0x20130000 >> PAGE_SHIFT },
++        { .mfn = 0x20138000 >> PAGE_SHIFT },
++        { .mfn = 0x40004000 >> PAGE_SHIFT },
+     };
+ 
+-    *array_size = ARRAY_SIZE(bad_pages);
++    *array_size = ARRAY_SIZE(snb_bad_pages);
+     igd_id = pci_conf_read32(0, 0, 2, 0, 0);
+-    if ( !IS_SNB_GFX(igd_id) )
+-        return NULL;
++    if ( IS_SNB_GFX(igd_id) )
++        return snb_bad_pages;
+ 
+-    return bad_pages;
++    return NULL;
+ }
+ 
+ void paging_invlpg(struct vcpu *v, unsigned long va)
+--- xen/common/page_alloc.c.orig
++++ xen/common/page_alloc.c
+@@ -270,7 +270,7 @@ void __init init_boot_pages(paddr_t ps,
+     unsigned long bad_spfn, bad_epfn;
+     const char *p;
+ #ifdef CONFIG_X86
+-    const unsigned long *badpage = NULL;
++    const struct platform_bad_page *badpage;
+     unsigned int i, array_size;
+ 
+     BUILD_BUG_ON(8 * sizeof(frame_table->u.free.first_dirty) <
+@@ -299,8 +299,8 @@ void __init init_boot_pages(paddr_t ps,
+     {
+         for ( i = 0; i < array_size; i++ )
+         {
+-            bootmem_region_zap(*badpage >> PAGE_SHIFT,
+-                               (*badpage >> PAGE_SHIFT) + 1);
++            bootmem_region_zap(badpage->mfn,
++                               badpage->mfn + (1U << badpage->order));
+             badpage++;
+         }
+     }
+@@ -312,8 +312,8 @@ void __init init_boot_pages(paddr_t ps,
+         {
+             for ( i = 0; i < array_size; i++ )
+             {
+-                bootmem_region_zap(*badpage >> PAGE_SHIFT,
+-                                   (*badpage >> PAGE_SHIFT) + 1);
++                bootmem_region_zap(badpage->mfn,
++                                   badpage->mfn + (1U << badpage->order));
+                 badpage++;
+             }
+         }
+--- xen/include/asm-x86/guest/xen.h.orig
++++ xen/include/asm-x86/guest/xen.h
+@@ -37,7 +37,7 @@ void hypervisor_ap_setup(void);
+ int hypervisor_alloc_unused_page(mfn_t *mfn);
+ int hypervisor_free_unused_page(mfn_t mfn);
+ void hypervisor_fixup_e820(struct e820map *e820);
+-const unsigned long *hypervisor_reserved_pages(unsigned int *size);
++const struct platform_bad_page *hypervisor_reserved_pages(unsigned int *size);
+ uint32_t hypervisor_cpuid_base(void);
+ void hypervisor_resume(void);
+ 
+@@ -65,7 +65,7 @@ static inline void hypervisor_fixup_e820
+     ASSERT_UNREACHABLE();
+ }
+ 
+-static inline const unsigned long *hypervisor_reserved_pages(unsigned int *size)
++static inline const struct platform_bad_page *hypervisor_reserved_pages(unsigned int *size)
+ {
+     ASSERT_UNREACHABLE();
+     return NULL;
+--- xen/include/asm-x86/mm.h.orig
++++ xen/include/asm-x86/mm.h
+@@ -348,7 +348,13 @@ void zap_ro_mpt(mfn_t mfn);
+ 
+ bool is_iomem_page(mfn_t mfn);
+ 
+-const unsigned long *get_platform_badpages(unsigned int *array_size);
++struct platform_bad_page {
++    unsigned long mfn;
++    unsigned int order;
++};
++
++const struct platform_bad_page *get_platform_badpages(unsigned int *array_size);
++
+ /* Per page locks:
+  * page_lock() is used for two purposes: pte serialization, and memory sharing.
+  *
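
The interface extension in this XSA-282 patch lets one entry describe a power-of-two range of frames instead of a single page. A minimal standalone C sketch of how an (mfn, order) pair expands into the blacklisted range (struct name suffixed to stress it is only an illustration of the idea):

    #include <stdio.h>

    #define PAGE_SHIFT 12

    /* The extended interface: a frame number plus an order (2^order frames). */
    struct platform_bad_page_like {
        unsigned long mfn;
        unsigned int order;
    };

    /* An (mfn, order) entry blacklists the frame range [mfn, mfn + 2^order). */
    static void show_range(const struct platform_bad_page_like *bp)
    {
        printf("zap frames [%#lx, %#lx)\n",
               bp->mfn, bp->mfn + (1UL << bp->order));
    }

    int main(void)
    {
        /* The 4MiB-at-1GiB entry used by the follow-up HLE patch: order 10. */
        struct platform_bad_page_like hle = {
            .mfn = 0x40000000UL >> PAGE_SHIFT, .order = 10
        };

        show_range(&hle);   /* prints [0x40000, 0x40400) */
        return 0;
    }
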
Index: pkgsrc/sysutils/xenkernel411/patches/patch-XSA282-2
diff -u /dev/null pkgsrc/sysutils/xenkernel411/patches/patch-XSA282-2:1.1
--- /dev/null   Wed Nov 28 14:00:49 2018
+++ pkgsrc/sysutils/xenkernel411/patches/patch-XSA282-2 Wed Nov 28 14:00:49 2018
@@ -0,0 +1,44 @@
+$NetBSD: patch-XSA282-2,v 1.1 2018/11/28 14:00:49 bouyer Exp $
+
+From: Jan Beulich <jbeulich%suse.com@localhost>
+Subject: x86: work around HLE host lockup erratum
+
+XACQUIRE prefixed accesses to the 4Mb range of memory starting at 1Gb
+are liable to lock up the processor. Disallow use of this memory range.
+
+Unfortunately the available Core Gen7 and Gen8 spec updates are pretty
+old, so I can only guess that they're similarly affected when Core Gen6
+is and the Xeon counterparts are, too.
+
+This is part of XSA-282.
+
+Signed-off-by: Jan Beulich <jbeulich%suse.com@localhost>
+Reviewed-by: Andrew Cooper <andrew.cooper3%citrix.com@localhost>
+---
+v2: Don't apply the workaround when running ourselves virtualized.
+
+--- xen/arch/x86/mm.c.orig
++++ xen/arch/x86/mm.c
+@@ -5853,6 +5853,22 @@ const struct platform_bad_page *__init g
+         { .mfn = 0x20138000 >> PAGE_SHIFT },
+         { .mfn = 0x40004000 >> PAGE_SHIFT },
+     };
++    static const struct platform_bad_page __initconst hle_bad_page = {
++        .mfn = 0x40000000 >> PAGE_SHIFT, .order = 10
++    };
++
++    switch ( cpuid_eax(1) & 0x000f3ff0 )
++    {
++    case 0x000406e0: /* erratum SKL167 */
++    case 0x00050650: /* erratum SKZ63 */
++    case 0x000506e0: /* errata SKL167 / SKW159 */
++    case 0x000806e0: /* erratum KBL??? */
++    case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
++        *array_size = (cpuid_eax(0) >= 7 &&
++                       !(cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR)) &&
++                       (cpuid_count_ebx(7, 0) & cpufeat_mask(X86_FEATURE_HLE)));
++        return &hle_bad_page;
++    }
+ 
+     *array_size = ARRAY_SIZE(snb_bad_pages);
+     igd_id = pci_conf_read32(0, 0, 2, 0, 0);
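
The switch in this second XSA-282 patch matches CPU signatures with the stepping bits masked off, so every stepping of an affected model is caught. A small C sketch of that masking (the sample leaf-1 EAX value is an assumed Skylake signature, used only to show the effect of the mask):

    #include <stdio.h>

    /* Keep model, family, type and extended model; drop stepping and ext family. */
    static unsigned int cpu_signature(unsigned int leaf1_eax)
    {
        return leaf1_eax & 0x000f3ff0;
    }

    int main(void)
    {
        unsigned int eax = 0x000506e3;   /* assumed: family 6, model 0x5e, stepping 3 */

        printf("%#x\n", cpu_signature(eax));   /* 0x506e0, matching the SKL167 case */
        return 0;
    }
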
