tech-kern archive


Problems with UVM pagefaults



Hello,

I'm trying to develop a gntdev for NetBSD. I posted a first version
of the device some time ago, but it had problems (mainly, it was unable
to work with HVM domains).

Maybe it would be good to start with a short introduction to the gnt
device. It is used by Xen userspace programs to map memory from other
domains (it does not allow sharing memory from the current domain, only
mapping memory from other domains that have previously allowed it). It is
mainly used to run device backends in userspace.

This device works on a memory region previously allocated with
mmap(NULL, size, prot, MAP_ANON | MAP_SHARED, -1, 0). The region is
passed to the device using an ioctl, and then inside the device we get
the machine addresses of the ptes that back the region and pass them to
Xen, so the hypervisor can modify the ptes to point to the right mfns
from the other domain.
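
A minimal userspace sketch of that sequence, assuming the ioctl_gntdev_*
structures from the attached patch (copied here so the example compiles on
its own). The helper name gnt_prepare_request is hypothetical, touching
every page up front is an assumption based on the driver rejecting ptes
that are not yet valid, and the ioctl call itself is only shown as a
comment because it needs a Xen dom0 with /dev/gntdev:

```c
#define _DEFAULT_SOURCE         /* exposes MAP_ANON on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copied from the xenio.h hunk of the attached patch. */
typedef struct ioctl_gntdev_grant_ref {
    uint32_t domid;             /* domain that issued the grant */
    uint32_t ref;               /* grant reference to map */
} ioctl_gntdev_grant_ref;

typedef struct ioctl_gntdev_map_grant_ref {
    uint32_t count;             /* IN: number of grants */
    uint32_t pad;
    uint64_t vaddr;             /* IN: start of the mmap()ed region */
    uint64_t index;             /* OUT: filled in by the driver */
    ioctl_gntdev_grant_ref *refs;
} ioctl_gntdev_map_grant_ref;

/*
 * Hypothetical helper: allocate the anonymous shared region and fill in
 * the request. Returns the region, or MAP_FAILED on error.
 */
static void *
gnt_prepare_request(ioctl_gntdev_map_grant_ref *req,
    ioctl_gntdev_grant_ref *refs, uint32_t count)
{
    size_t len;
    void *region;

    assert(req != NULL && refs != NULL && count > 0);
    len = (size_t)count * (size_t)sysconf(_SC_PAGESIZE);
    region = mmap(NULL, len, PROT_READ | PROT_WRITE,
        MAP_ANON | MAP_SHARED, -1, 0);
    if (region == MAP_FAILED)
        return MAP_FAILED;

    /* Touch every page so that each one is backed by a valid pte. */
    memset(region, 0, len);

    memset(req, 0, sizeof(*req));
    req->count = count;
    req->vaddr = (uint64_t)(uintptr_t)region;
    req->refs = refs;
    /* Next step on a dom0: ioctl(fd, IOCTL_GNTDEV_MAP_GRANT_REF, req); */
    return region;
}
```

One entry in refs is needed per page, so count is both the number of
grants and the number of pages in the region.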

So far I've been able to get the ptes, pass them to Xen and establish the
mapping. Writing to that memory area from userspace seems to work fine
(for example, filling it with pread), but the problem comes when the
userspace program executes something like:

pwrite(fd, buf,...)

Where "buf" is a region of the memory mapped by the gnt device. This
triggers a page fault in UVM, and the fault handler tries to modify the
pte of the mapped memory region. This pte must not be modified: if its
content changes, Xen will probably complain and crash, and even if Xen
doesn't crash we won't be able to unmap the pte later on, since it no
longer contains the value that Xen expects.
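
One possible userspace stopgap, offered only as an untested assumption and
not as a fix for the driver: copy the data out of the grant-mapped region
into an ordinary private buffer and hand that buffer to pwrite(), so that
any wiring done by the kernel happens on normal anonymous memory. The
helper name pwrite_bounced is hypothetical, and the extra copy costs
performance:

```c
#define _XOPEN_SOURCE 700       /* for pwrite() under strict C modes */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from a grant-mapped region to fd
 * at offset off, going through a private bounce buffer.
 * Returns what pwrite() returns, or -1 with errno set on failure.
 */
static ssize_t
pwrite_bounced(int fd, const void *granted, size_t len, off_t off)
{
    void *bounce;
    ssize_t n;
    int saved;

    assert(granted != NULL);
    if (len == 0)
        return 0;

    bounce = malloc(len);
    if (bounce == NULL) {
        errno = ENOMEM;
        return -1;
    }

    /* Plain loads from the granted pages; the ptes are only read. */
    memcpy(bounce, granted, len);

    n = pwrite(fd, bounce, len, off);
    saved = errno;              /* free() may clobber errno */
    free(bounce);
    errno = saved;
    return n;
}
```

This only sidesteps the fault seen on the write path; it does not explain
why UVM wants to re-enter the mapping in the first place.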

I've added a little hack to my gnt device to find out who is trying to
change the content of the pte, and I got the following trace:

breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1f2
printf_nolog() at netbsd:printf_nolog
xpq_flush_queue() at netbsd:xpq_flush_queue+0x180
pmap_enter_ma() at netbsd:pmap_enter_ma+0x5c1
pmap_enter() at netbsd:pmap_enter+0x35
uvm_fault_upper_enter.clone.4() at netbsd:uvm_fault_upper_enter.clone.4+0x22a
uvm_fault_internal() at netbsd:uvm_fault_internal+0x28f4
uvm_fault_wire() at netbsd:uvm_fault_wire+0x53
genfs_directio() at netbsd:genfs_directio+0x16a
ffs_write() at netbsd:ffs_write+0x43a
VOP_WRITE() at netbsd:VOP_WRITE+0x55
vn_write() at netbsd:vn_write+0xf9
do_filewritev() at netbsd:do_filewritev+0x1fd
sys_pwritev() at netbsd:sys_pwritev+0x2b
syscall() at netbsd:syscall+0x94
--- syscall (number 290) ---
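
The check behind this trace can be modelled in plain C. This is a
simplified userspace sketch, not the kernel code: update_t, protect_pte,
unprotect_pte and find_clobbered are hypothetical names, and the table size
of 2048 mirrors the constant used in the attached patch. It records the
machine addresses of the ptes handed to Xen and scans a queue of pending
updates for any that target one of them (the real hack panics at that
point to obtain the backtrace):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPROTECTED 2048         /* same size as grant_pte[] in the patch */

/* Simplified stand-in for one queued pte update (hypothetical type). */
typedef struct {
    uint64_t ptr;               /* machine address of the pte to change */
    uint64_t val;               /* new pte value */
} update_t;

static uint64_t protected_pte[NPROTECTED];  /* 0 marks a free slot */

/* Remember a pte that now holds a grant mapping; -1 if the table is full. */
static int
protect_pte(uint64_t maddr)
{
    size_t i;

    assert(maddr != 0);
    for (i = 0; i < NPROTECTED; i++) {
        if (protected_pte[i] == 0) {
            protected_pte[i] = maddr;
            return 0;
        }
    }
    return -1;
}

/* Forget a pte once its grant has been unmapped. */
static void
unprotect_pte(uint64_t maddr)
{
    size_t i;

    for (i = 0; i < NPROTECTED; i++) {
        if (protected_pte[i] == maddr) {
            protected_pte[i] = 0;
            break;
        }
    }
}

/* Index of the first queued update that hits a protected pte, or -1. */
static int
find_clobbered(const update_t *queue, int n)
{
    size_t i;
    int j;

    for (j = 0; j < n; j++) {
        if (queue[j].ptr == 0)
            continue;           /* 0 is the free-slot marker, never tracked */
        for (i = 0; i < NPROTECTED; i++) {
            if (protected_pte[i] == queue[j].ptr)
                return j;
        }
    }
    return -1;
}
```

The scan is quadratic in the worst case, which is acceptable for a debug
aid but would be too slow to leave in a real flush path.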

Is there any way to prevent UVM from faulting? The mapping for that VA is
already set up AFAIK, but I know almost nothing about how UVM works,
so I would like to ask if someone could help me with this.

I'm attaching the code of the gntdev. The main function of interest is
gntmap_grant_ref; that's where I try to get the ptes and set up the
mapping. This is not finished code, but I would like to understand why
these page faults happen and how I can solve the problem.

Thanks, Roger.
From 1e6fb3749453810be6bec2e14e8f8abd371e9a6f Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau%citrix.com@localhost>
Date: Tue, 8 Jan 2013 19:42:29 +0100
Subject: [PATCH] xen: add gntdev

This is a basic (and experimental) gntdev implementation for NetBSD.

The gnt device allows usermode applications to map grant references in
userspace. It is mainly used by Qemu to implement a Xen backend (that
runs in userspace).

Because qemu-upstream is not yet functional on NetBSD, the only way
to try this gntdev is to use the old qemu (qemu-traditional).

This device allows mapping memory from guest domains that request it,
but it doesn't allow mapping memory from the current domain into
another one.
---
 etc/MAKEDEV.tmpl                   |    5 +
 etc/etc.amd64/MAKEDEV.conf         |    2 +-
 etc/etc.i386/MAKEDEV.conf          |    2 +-
 sys/arch/amd64/conf/XEN3_DOM0      |    1 +
 sys/arch/amd64/conf/majors.amd64   |    1 +
 sys/arch/i386/conf/XEN3_DOM0       |    1 +
 sys/arch/i386/conf/majors.i386     |    1 +
 sys/arch/xen/conf/files.xen        |    2 +
 sys/arch/xen/include/xen_shm.h     |    3 +
 sys/arch/xen/include/xenio.h       |   76 ++++++
 sys/arch/xen/x86/x86_xpmap.c       |   24 ++
 sys/arch/xen/x86/xen_shm_machdep.c |   70 +++++-
 sys/arch/xen/xen/gntdev.c          |  492 ++++++++++++++++++++++++++++++++++++
 sys/dev/DEVNAMES                   |    1 +
 sys/rump/librump/rumpkern/devsw.c  |    1 +
 15 files changed, 679 insertions(+), 3 deletions(-)
 create mode 100644 sys/arch/xen/xen/gntdev.c

diff --git a/etc/MAKEDEV.tmpl b/etc/MAKEDEV.tmpl
index 21b0568..00029c6 100644
--- a/etc/MAKEDEV.tmpl
+++ b/etc/MAKEDEV.tmpl
@@ -289,6 +289,7 @@
 #      wsfont* console font control
 #      wsmux*  wscons event multiplexor
 #      xenevt  Xen event interface
+#      gntdev  Xen grant table interface
 #
 # iSCSI communication devices
 #      iscsi*  iSCSI driver and /sbin/iscsid communication
@@ -1020,6 +1021,10 @@ xsd_kva)
        mkdev xsd_kva c %xenevt_chr% 1
        ;;
 
+gntdev)
+       mkdev gntdev c %gntdev_chr% 0
+       ;;
+
 xencons)
        mkdev xencons c %xencons_chr% 0
        ;;
diff --git a/etc/etc.amd64/MAKEDEV.conf b/etc/etc.amd64/MAKEDEV.conf
index a4a831c..5e2098c 100644
--- a/etc/etc.amd64/MAKEDEV.conf
+++ b/etc/etc.amd64/MAKEDEV.conf
@@ -44,5 +44,5 @@ all_md)
        ;;
 
 xen)
-       makedev xenevt xencons xsd_kva
+       makedev xenevt xencons xsd_kva gntdev
        ;;
diff --git a/etc/etc.i386/MAKEDEV.conf b/etc/etc.i386/MAKEDEV.conf
index ba3e2cc..bd38673 100644
--- a/etc/etc.i386/MAKEDEV.conf
+++ b/etc/etc.i386/MAKEDEV.conf
@@ -48,7 +48,7 @@ all_md)
        ;;
 
 xen)
-       makedev xenevt xencons xsd_kva
+       makedev xenevt xencons xsd_kva gntdev
        ;;
 
 floppy)
diff --git a/sys/arch/amd64/conf/XEN3_DOM0 b/sys/arch/amd64/conf/XEN3_DOM0
index e5f9f1f..1807dd2 100644
--- a/sys/arch/amd64/conf/XEN3_DOM0
+++ b/sys/arch/amd64/conf/XEN3_DOM0
@@ -838,6 +838,7 @@ pseudo-device       wsfont
 pseudo-device  drvctl
 
 # xen pseudo-devices
+pseudo-device  gntdev
 pseudo-device  xenevt
 pseudo-device  xvif
 pseudo-device  xbdback
diff --git a/sys/arch/amd64/conf/majors.amd64 b/sys/arch/amd64/conf/majors.amd64
index 9e6b1ac..cf15f7d 100644
--- a/sys/arch/amd64/conf/majors.amd64
+++ b/sys/arch/amd64/conf/majors.amd64
@@ -96,6 +96,7 @@ device-major  nsmb            char 98                 nsmb
 # - they appear in the i386 MAKEDEV
 #
 
+device-major   gntdev          char 140                gntdev
 device-major   xenevt          char 141                xenevt
 device-major   xbd             char 142 block 142      xbd
 device-major   xencons         char 143                xencons
diff --git a/sys/arch/i386/conf/XEN3_DOM0 b/sys/arch/i386/conf/XEN3_DOM0
index 8b5cf99..be28bbc 100644
--- a/sys/arch/i386/conf/XEN3_DOM0
+++ b/sys/arch/i386/conf/XEN3_DOM0
@@ -820,6 +820,7 @@ pseudo-device       wsfont
 pseudo-device  drvctl
 
 # xen pseudo-devices
+pseudo-device  gntdev
 pseudo-device  xenevt
 pseudo-device  xvif
 pseudo-device  xbdback
diff --git a/sys/arch/i386/conf/majors.i386 b/sys/arch/i386/conf/majors.i386
index 38c043f..9aab728 100644
--- a/sys/arch/i386/conf/majors.i386
+++ b/sys/arch/i386/conf/majors.i386
@@ -111,6 +111,7 @@ device-major        mt              char 107 block 24       mt
 # - they appear in the i386 MAKEDEV
 #
 
+device-major   gntdev          char 140                gntdev
 device-major   xenevt          char 141                xenevt
 device-major   xbd             char 142 block 142      xbd
 device-major   xencons         char 143                xencons
diff --git a/sys/arch/xen/conf/files.xen b/sys/arch/xen/conf/files.xen
index e022db5..91ff858 100644
--- a/sys/arch/xen/conf/files.xen
+++ b/sys/arch/xen/conf/files.xen
@@ -198,6 +198,7 @@ attach      xencons at xendevbus
 file   arch/xen/xen/xencons.c          xencons needs-flag
 
 # Xen event peudo-device
+defpseudo gntdev
 defpseudo xenevt
 defpseudo xvif
 defpseudo xbdback
@@ -390,6 +391,7 @@ include     "dev/pcmcia/files.pcmcia"
 # Domain-0 operations
 defflag        opt_xen.h                       DOM0OPS
 file   arch/xen/xen/privcmd.c          dom0ops
+file   arch/xen/xen/gntdev.c           dom0ops
 file   arch/xen/x86/xen_shm_machdep.c  dom0ops
 file   arch/x86/pci/pci_machdep.c      hypervisor & pci & dom0ops
 file   arch/xen/xen/pci_intr_machdep.c hypervisor & pci
diff --git a/sys/arch/xen/include/xen_shm.h b/sys/arch/xen/include/xen_shm.h
index e2d89d0..6416ca1 100644
--- a/sys/arch/xen/include/xen_shm.h
+++ b/sys/arch/xen/include/xen_shm.h
@@ -37,7 +37,10 @@
  */
 
 int  xen_shm_map(int, int, grant_ref_t *, vaddr_t *, grant_handle_t *, int);
+int xen_shm_map_pte(int nentries, int *domid, grant_ref_t *grefp,
+       pt_entry_t **pte, grant_handle_t *handlep, int flags);
 void xen_shm_unmap(vaddr_t, int, grant_handle_t *);
+int xen_shm_unmap_pte(int, pt_entry_t **, grant_handle_t *);
 int xen_shm_callback(int (*)(void *), void *);
 
 /* flags for xen_shm_map() */
diff --git a/sys/arch/xen/include/xenio.h b/sys/arch/xen/include/xenio.h
index 6b25733..87cd376 100644
--- a/sys/arch/xen/include/xenio.h
+++ b/sys/arch/xen/include/xenio.h
@@ -122,4 +122,80 @@ typedef struct oprivcmd_hypercall
 /* EVTCHN_UNBIND: Unbind from the specified event-channel port. */
 #define EVTCHN_UNBIND _IOW('E', 3, unsigned long)
 
+/* Interface to /dev/gntdev */
+
+typedef struct ioctl_gntdev_grant_ref {
+    /* The domain ID of the grant to be mapped. */
+    uint32_t domid;
+    /* The grant reference of the grant to be mapped. */
+    uint32_t ref;
+} ioctl_gntdev_grant_ref;
+
+typedef struct ioctl_gntdev_map_grant_ref {
+    /* IN parameters */
+    /* The number of grants to be mapped. */
+    uint32_t count;
+    uint32_t pad;
+    uint64_t vaddr;
+    /* OUT parameters */
+    /* The offset to be used on a subsequent call to mmap(). */
+    uint64_t index;
+    /* Variable IN parameter. */
+    /* Array of grant references, of size @count. */
+    ioctl_gntdev_grant_ref *refs;
+} ioctl_gntdev_map_grant_ref;
+
+typedef struct ioctl_gntdev_unmap_grant_ref {
+    /* IN parameters */
+    /* The offset was returned by the corresponding map operation. */
+    uint64_t index;
+    /* The number of pages to be unmapped. */
+    uint32_t count;
+    uint32_t pad;
+} ioctl_gntdev_unmap_grant_ref;
+
+typedef struct ioctl_gntdev_get_offset_for_vaddr {
+    /* IN parameters */
+    /* The virtual address of the first mapped page in a range. */
+    uint64_t vaddr;
+    /* OUT parameters */
+    /* The offset that was used in the initial mmap() operation. */
+    uint64_t offset;
+    /* The number of pages mapped in the VM area that begins at @vaddr. */
+    uint32_t count;
+    uint32_t pad;
+} ioctl_gntdev_get_offset_for_vaddr;
+
+/*
+ * Inserts the grant references into the mapping table of an instance
+ * of gntdev. N.B. This does not perform the mapping, which is deferred
+ * until mmap() is called with @index as the offset.
+ */
+#define IOCTL_GNTDEV_MAP_GRANT_REF \
+    _IOWR('G', 0, ioctl_gntdev_map_grant_ref)
+
+/*
+ * Removes the grant references from the mapping table of an instance of
+ * gntdev. N.B. munmap() must be called on the relevant virtual address(es)
+ * before this ioctl is called, or an error will result.
+ */
+#define IOCTL_GNTDEV_UNMAP_GRANT_REF \
+    _IOW('G', 1, ioctl_gntdev_unmap_grant_ref)
+
+/*
+ * Returns the offset in the driver's address space that corresponds
+ * to @vaddr. This can be used to perform a munmap(), followed by an
+ * UNMAP_GRANT_REF ioctl, where no state about the offset is retained by
+ * the caller. The number of pages that were allocated at the same time as
+ * @vaddr is returned in @count.
+ *
+ * N.B. Where more than one page has been mapped into a contiguous range, the
+ *      supplied @vaddr must correspond to the start of the range; otherwise
+ *      an error will result. It is only possible to munmap() the entire
+ *      contiguously-allocated range at once, and not any subrange thereof.
+ */
+#define IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR \
+    _IOWR('G', 2, ioctl_gntdev_get_offset_for_vaddr)
+
+
 #endif /* __XEN_XENIO_H__ */
diff --git a/sys/arch/xen/x86/x86_xpmap.c b/sys/arch/xen/x86/x86_xpmap.c
index ebb0567..2c71a8a 100644
--- a/sys/arch/xen/x86/x86_xpmap.c
+++ b/sys/arch/xen/x86/x86_xpmap.c
@@ -173,6 +173,9 @@ void xpq_debug_dump(void);
 static mmu_update_t xpq_queue_array[MAXCPUS][XPQUEUE_SIZE];
 static int xpq_idx_array[MAXCPUS];
 
+paddr_t grant_pte[XPQUEUE_SIZE];
+static int initialized = 0;
+
 #ifdef i386
 extern union descriptor tmpgdt[];
 #endif /* i386 */
@@ -180,6 +183,7 @@ void
 xpq_flush_queue(void)
 {
        int i, ok = 0, ret;
+       int j;
 
        mmu_update_t *xpq_queue = xpq_queue_array[curcpu()->ci_cpuid];
        int xpq_idx = xpq_idx_array[curcpu()->ci_cpuid];
@@ -189,6 +193,26 @@ xpq_flush_queue(void)
                XENPRINTK2(("%d: 0x%08" PRIx64 " 0x%08" PRIx64 "\n", i,
                    xpq_queue[i].ptr, xpq_queue[i].val));
 
+       if (initialized == 0) {
+               memset(grant_pte, 0, sizeof(grant_pte[0]) * XPQUEUE_SIZE);
+               initialized = 1;
+       }
+
+       /* XXX: This is the other part of the lame hack,
+        * Ptes that hold references to grant frames should not
+        * be modified, or we will not be able to unmap them!
+        */
+       for (i = 0; i < XPQUEUE_SIZE; i++) {
+               if (grant_pte[i] == 0)
+                       continue;
+               for(j = 0; j < xpq_idx; j++) {
+                       if (xpq_queue[j].ptr == grant_pte[i]) {
+                               panic("bang: %p -> %p", (void *) xpq_queue[j].ptr,
+                                     (void *) xpq_queue[j].val);
+                       }
+               }
+       }
+
 retry:
        ret = HYPERVISOR_mmu_update_self(xpq_queue, xpq_idx, &ok);
 
diff --git a/sys/arch/xen/x86/xen_shm_machdep.c b/sys/arch/xen/x86/xen_shm_machdep.c
index d47745c..ba99b7c 100644
--- a/sys/arch/xen/x86/xen_shm_machdep.c
+++ b/sys/arch/xen/x86/xen_shm_machdep.c
@@ -35,6 +35,7 @@ __KERNEL_RCSID(0, "$NetBSD: xen_shm_machdep.c,v 1.10 2011/09/02 22:25:08 dyoung
 #include <sys/queue.h>
 #include <sys/vmem.h>
 #include <sys/kernel.h>
+#include <sys/malloc.h>
 #include <uvm/uvm.h>
 
 #include <machine/pmap.h>
@@ -116,7 +117,6 @@ xen_shm_init(void)
        }
 }
 
-int
 xen_shm_map(int nentries, int domid, grant_ref_t *grefp, vaddr_t *vap,
     grant_handle_t *handlep, int flags)
 {
@@ -185,6 +185,74 @@ xen_shm_map(int nentries, int domid, grant_ref_t *grefp, vaddr_t *vap,
        return 0;
 }
 
+int
+xen_shm_map_pte(int nentries, int *domid, grant_ref_t *grefp,
+       pt_entry_t **pte, grant_handle_t *handlep, int flags)
+{
+       int i;
+       int err;
+       gnttab_map_grant_ref_t op[XENSHM_MAX_PAGES_PER_REQUEST];
+
+#ifdef DIAGNOSTIC
+       if (nentries > XENSHM_MAX_PAGES_PER_REQUEST) {
+               printf("xen_shm_map_pte: %d entries\n", nentries);
+               panic("xen_shm_map_pte");
+       }
+#endif
+
+       for (i = 0; i < nentries; i++) {
+               op[i].host_addr = xpmap_ptetomach(pte[i]);
+               op[i].dom = domid[i];
+               op[i].ref = grefp[i];
+               op[i].flags = GNTMAP_host_map | GNTMAP_contains_pte |
+                             GNTMAP_application_map |
+                             ((flags & XSHM_RO) ? GNTMAP_readonly : 0);
+       }
+       err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, op, nentries);
+       if (__predict_false(err))
+               panic("xen_shm_map_pte: HYPERVISOR_grant_table_op failed");
+       for (i = 0; i < nentries; i++) {
+               if (__predict_false(op[i].status)) {
+                       /* On error, unmap mapped grefs and return */
+                       xen_shm_unmap_pte(i, pte, handlep);
+                       return op[i].status;
+               }
+               handlep[i] = op[i].handle;
+       }
+       return 0;
+}
+
+int
+xen_shm_unmap_pte(int nentries, pt_entry_t **pte, grant_handle_t *handlep)
+{
+       gnttab_unmap_grant_ref_t op[XENSHM_MAX_PAGES_PER_REQUEST];
+       int ret;
+       int i;
+
+#ifdef DIAGNOSTIC
+       if (nentries > XENSHM_MAX_PAGES_PER_REQUEST) {
+               printf("xen_shm_unmap_pte: %d entries\n", nentries);
+               panic("xen_shm_unmap_pte");
+       }
+#endif
+
+       for (i = 0; i < nentries; i++) {
+               op[i].host_addr = xpmap_ptetomach(pte[i]);
+               op[i].dev_bus_addr = 0;
+               op[i].handle = handlep[i];
+       }
+       ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
+           op, nentries);
+       if (__predict_false(ret))
+               panic("xen_shm_unmap_pte: unmap failed");
+       for (i = 0; i < nentries; i++) {
+               if(__predict_false(op[i].status)) {
+                       return op[i].status;
+               }
+       }
+       return 0;
+}
+
 void
 xen_shm_unmap(vaddr_t va, int nentries, grant_handle_t *handlep)
 {
diff --git a/sys/arch/xen/xen/gntdev.c b/sys/arch/xen/xen/gntdev.c
new file mode 100644
index 0000000..5ac5098
--- /dev/null
+++ b/sys/arch/xen/xen/gntdev.c
@@ -0,0 +1,492 @@
+/*
+ * Copyright (c) 2012 Roger Pau Monné.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+#include <sys/cdefs.h>
+
+#include "opt_xen.h"
+
+#include <sys/param.h>
+#include <sys/malloc.h>
+#include <sys/mutex.h>
+#include <sys/file.h>
+#include <sys/filedesc.h>
+#include <sys/conf.h>
+
+#include <uvm/uvm.h>
+
+#include <xen/xen_shm.h>
+#include <xen/xenio.h>
+
+extern paddr_t grant_pte[2048];
+
+void gntdevattach(int n);
+
+#define freem(va) \
+       do { if (va) free(va, M_DEVBUF); } while (0)
+
+#define GNTDEBUG
+#ifdef GNTDEBUG
+       #define debug(M, ...) \
+               printk("gntdev:%d: " M "\n", __LINE__, ##__VA_ARGS__)
+#else
+       #define debug(M, ...)
+#endif
+
+#define error(M, ...) \
+       printk("gntdev:%d error:" M "\n", __LINE__, ##__VA_ARGS__)
+
+#define VA_FREE 0
+
+static int gntdev_fioctl(struct file *, u_long, void *);
+static int gntdev_fclose(struct file *);
+
+static const struct fileops gntdev_fileops = {
+       .fo_read = fbadop_read,
+       .fo_write = fbadop_write,
+       .fo_ioctl = gntdev_fioctl,
+       .fo_fcntl = fnullop_fcntl,
+       .fo_poll = fnullop_poll,
+       .fo_stat = fbadop_stat,
+       .fo_close = gntdev_fclose,
+       .fo_kqfilter = fnullop_kqfilter,
+       .fo_restart = fnullop_restart,
+};
+
+dev_type_open(gntdev_open);
+
+const struct cdevsw gntdev_cdevsw = {
+       gntdev_open, nullclose, noread, nowrite, noioctl,
+       nostop, notty, nopoll, nommap, nokqfilter, D_OTHER
+};
+
+struct gntmap {
+       struct uvm_object uobj;
+       struct vm_map *vmap;
+       LIST_ENTRY(gntmap) next_map;
+       int index;
+       int count;
+       grant_ref_t *grants;
+       int *domids;
+       vaddr_t va;
+       grant_handle_t *handles;
+       pd_entry_t **pte;
+       bool ro;
+       bool mapped;
+};
+
+struct gntproc {
+       LIST_HEAD(,gntmap) maps;
+       kmutex_t lock;
+       struct lwp *lwp;
+       unsigned int num_maps;
+};
+
+static void
+gntdev_insert_map(struct gntproc *proc, struct gntmap *map);
+static struct gntmap *
+gntdev_find_map(struct gntproc *proc, int index, int count);
+static struct gntmap *
+gntdev_find_vaddr(struct gntproc *proc, vaddr_t va);
+static void
+gntdev_remove_map(struct gntproc *proc, struct gntmap *map);
+
+/* --- Helpers --- */
+
+static void
+gntdev_insert_map(struct gntproc *proc, struct gntmap *map)
+{
+       struct gntmap *tmap;
+
+       mutex_enter(&proc->lock);
+       proc->num_maps++;
+       if (LIST_EMPTY(&proc->maps)) {
+               LIST_INSERT_HEAD(&proc->maps, map, next_map);
+               goto out;
+       }
+       LIST_FOREACH(tmap, &proc->maps, next_map) {
+               if (map->index + map->count < tmap->index) {
+                       LIST_INSERT_BEFORE(tmap, map, next_map);
+                       goto out;
+               }
+               map->index = tmap->index + tmap->count;
+               if (LIST_NEXT(tmap, next_map) == NULL) {
+                       LIST_INSERT_AFTER(tmap, map, next_map);
+                       goto out;
+               }
+       }
+
+out:
+       mutex_exit(&proc->lock);
+       return;
+}
+
+static struct gntmap *
+gntdev_find_map(struct gntproc *proc, int index, int count)
+{
+       struct gntmap *map = NULL;
+
+       mutex_enter(&proc->lock);
+       if (LIST_EMPTY(&proc->maps))
+               goto out;
+
+       LIST_FOREACH(map, &proc->maps, next_map) {
+               if (index != map->index) {
+                       continue;
+               }
+               if (count && count != map->count) {
+                       continue;
+               }
+               goto out;
+       }
+       map = NULL;
+
+out:
+       mutex_exit(&proc->lock);
+       return map;
+}
+
+static struct gntmap *
+gntdev_find_vaddr(struct gntproc *proc, vaddr_t va)
+{
+       struct gntmap *map = NULL;
+
+       mutex_enter(&proc->lock);
+       if (LIST_EMPTY(&proc->maps))
+               goto out;
+
+       LIST_FOREACH(map, &proc->maps, next_map) {
+               if (va >= map->va && va < (map->va + (map->count * PAGE_SIZE)))
+                       goto out;
+       }
+       map = NULL;
+
+out:
+       mutex_exit(&proc->lock);
+       return map;
+}
+
+static void
+gntdev_remove_map(struct gntproc *proc, struct gntmap *map)
+{
+       int i, j;
+
+       mutex_enter(&proc->lock);
+       LIST_REMOVE(map, next_map);
+       proc->num_maps--;
+       mutex_exit(&proc->lock);
+       if (map->mapped) {
+               debug("unmapping map at index: %d", map->index);
+               if (xen_shm_unmap_pte(map->count, map->pte, map->handles)) {
+                       error("unable to unmap grant references for index %d", map->index);
+               }
+               /* XXX: Since we have unmapped the grants, remove the protection */
+               for (i = 0; i < map->count; i++) {
+                       for (j = 0; j < 2048; j++) {
+                               if (grant_pte[j] == xpmap_ptetomach(map->pte[i])) {
+                                       grant_pte[j] = 0;
+                                       break;
+                               }
+                       }
+               }
+               map->mapped = false;
+       }
+       free(map->grants, M_DEVBUF);
+       free(map->handles, M_DEVBUF);
+       free(map->domids, M_DEVBUF);
+       free(map->pte, M_DEVBUF);
+       free(map, M_DEVBUF);
+}
+
+static int
+gntmap_grant_ref(struct gntmap *map)
+{
+       int i, j, rc;
+       pt_entry_t *ptep;
+       pd_entry_t *ptes;
+       pd_entry_t * const *pdes;
+       pmap_t pmap = vm_map_pmap(map->vmap);
+       struct pmap *pmap2;
+
+       memset(map->handles, -1, sizeof(map->handles[0]) * map->count);
+
+       /* Lock pmap for the operation */
+       kpreempt_disable();
+       pmap_map_ptes(pmap, &pmap2, &ptes, &pdes);
+       for (i = 0; i < map->count; i++) {
+               /* Get ptes to pass to the grant table operation */
+               ptep = &ptes[pl1_i(map->va + (i * PAGE_SIZE))];
+               if (!pmap_valid_entry(*ptep)) {
+                       error("pte at %p not valid", ptep);
+                       rc = EINVAL;
+                       goto out;
+               }
+               map->pte[i] = ptep;
+       }
+
+       rc = xen_shm_map_pte(map->count, map->domids, map->grants, map->pte,
+                            map->handles, map->ro ? XSHM_RO : 0);
+       if (rc) {
+               error("unable to map ptes");
+               goto out;
+       }
+
+       /* XXX: This is a lame debug hack to check if someone (UVM)
+        * is modifying those ptes behind our back.
+        *
+        * Ptes used to map grant refs should not be modified, or we will
+        * not be able to unmap them!
+        */
+       for (i = 0; i < map->count; i++) {
+               debug("VA: %p *pte: %p pte: %p *pte maddr: %p",
+                         map->va + (i * PAGE_SIZE), map->pte[i], *(map->pte[i]),
+                         xpmap_ptetomach(map->pte[i]));
+               for (j = 0; j < 2048; j++) {
+                       if (grant_pte[j] == 0) {
+                               grant_pte[j] = xpmap_ptetomach(map->pte[i]);
+                               break;
+                       }
+               }
+       }
+
+       rc = 0;
+out:
+       pmap_unmap_ptes(pmap, pmap2);
+       kpreempt_enable();
+       return rc;
+}
+
+/* --- ioctl handlers --- */
+
+static int
+gntdev_ioctl_map_grant_ref(struct gntproc *proc,
+       ioctl_gntdev_map_grant_ref *map_grants)
+{
+       grant_ref_t *refs = NULL;
+       grant_handle_t *handles = NULL;
+       int *domids = NULL;
+       pt_entry_t **pte = NULL;
+       struct gntmap *map = NULL;
+       struct vm_map *vmm;
+       ioctl_gntdev_grant_ref ioctl_map;
+       int i, rc;
+       vaddr_t va0;
+
+       if (gntdev_find_vaddr(proc, map_grants->vaddr)) {
+               error("memory area %p already in use", (void *) map_grants->vaddr);
+               rc = EINVAL;
+               goto error;
+       }
+
+       debug("mapping %d refs", map_grants->count);
+
+       refs = malloc(sizeof(*refs) * map_grants->count, M_DEVBUF,
+                    M_WAITOK | M_ZERO);
+       handles = malloc(sizeof(*handles) * map_grants->count, M_DEVBUF,
+                    M_WAITOK | M_ZERO);
+       domids = malloc(sizeof(*domids) * map_grants->count, M_DEVBUF,
+                    M_WAITOK | M_ZERO);
+       pte = malloc(sizeof(*pte) * map_grants->count, M_DEVBUF,
+                    M_WAITOK | M_ZERO);
+
+       for (i = 0; i < map_grants->count; i++) {
+               rc = copyin(&map_grants->refs[i], &ioctl_map, sizeof(ioctl_map));
+               if (rc != 0) {
+                       error("unable to copyin grant ref info %d", i);
+                       goto error;
+               }
+               debug("mapping ref: %u Dom: %u", ioctl_map.ref, ioctl_map.domid);
+               refs[i] = ioctl_map.ref;
+               domids[i] = ioctl_map.domid;
+       }
+       map = malloc(sizeof(*map), M_DEVBUF,
+                                    M_WAITOK | M_ZERO);
+       vmm = &proc->lwp->l_proc->p_vmspace->vm_map;
+       va0 = map_grants->vaddr & ~PAGE_MASK;
+       vm_map_lock_read(vmm);
+       if (uvm_map_checkprot(vmm, va0, va0 + (map_grants->count << PGSHIFT) - 1,
+           VM_PROT_WRITE)) {
+               map->ro = false;
+       } else if (uvm_map_checkprot(vmm, va0,
+           va0 + (map_grants->count << PGSHIFT) - 1, VM_PROT_READ)) {
+               map->ro = true;
+       } else {
+               error("unable to check protection");
+               rc = EINVAL;
+               vm_map_unlock_read(vmm);
+               goto error;
+       }
+       vm_map_unlock_read(vmm);
+       map->grants = refs;
+       map->handles = handles;
+       map->pte = pte;
+       map->domids = domids;
+       map->va = map_grants->vaddr;
+       map->count = map_grants->count;
+       map->vmap = vmm;
+       map->index = 0;
+       map->mapped = false;
+
+       rc = gntmap_grant_ref(map);
+       if (rc) {
+               error("map_grant_ref failed");
+               goto error;
+       }
+       map->mapped = true;
+       gntdev_insert_map(proc, map);
+       map_grants->index = map->index << PAGE_SHIFT;
+       debug("gntrefs mapped at index %d", map->index);
+       return 0;
+
+error:
+       freem(refs);
+       freem(handles);
+       freem(pte);
+       freem(domids);
+       freem(map);
+       error("unable to map grant refs");
+       return rc;
+}
+
+static int
+gntdev_ioctl_unmap_grant_ref(struct gntproc *proc,
+       ioctl_gntdev_unmap_grant_ref *unmap_grants)
+{
+       struct gntmap *map;
+       uint64_t index = unmap_grants->index >> PAGE_SHIFT;
+       int rc = 0;
+
+       map = gntdev_find_map(proc, index, unmap_grants->count);
+       if (map == NULL) {
+               error("unable to find index %" PRIu64, index);
+               rc = EINVAL;
+               goto out;
+       }
+       gntdev_remove_map(proc, map);
+out:
+       return rc;
+}
+
+static int
+gntdev_ioctl_get_offset_vaddr(struct gntproc *proc,
+       ioctl_gntdev_get_offset_for_vaddr *offset_vaddr)
+{
+       struct gntmap *map;
+       int rc = 0;
+
+       debug("find offset va: %p", (void *)offset_vaddr->vaddr);
+
+       map = gntdev_find_vaddr(proc, offset_vaddr->vaddr);
+       if (map == NULL) {
+               error("unable to find vaddr");
+               rc = EINVAL;
+               goto out;
+       }
+
+       offset_vaddr->offset = map->index << PAGE_SHIFT;
+       offset_vaddr->count = map->count;
+
+out:
+       return rc;
+}
+
+/* --- Device ops handlers --- */
+
+static int
+gntdev_fioctl(struct file *fp, u_long cmd, void *addr)
+{
+       struct gntproc *proc = fp->f_data;
+       ioctl_gntdev_map_grant_ref *map_grants;
+       ioctl_gntdev_unmap_grant_ref *unmap_grants;
+       ioctl_gntdev_get_offset_for_vaddr *offset_vaddr;
+       int rc;
+
+       switch (cmd) {
+       case IOCTL_GNTDEV_MAP_GRANT_REF:
+               map_grants = addr;
+               rc = gntdev_ioctl_map_grant_ref(proc, map_grants);
+               break;
+       case IOCTL_GNTDEV_UNMAP_GRANT_REF:
+               unmap_grants = addr;
+               rc = gntdev_ioctl_unmap_grant_ref(proc, unmap_grants);
+               break;
+       case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR:
+               offset_vaddr = addr;
+               rc = gntdev_ioctl_get_offset_vaddr(proc, offset_vaddr);
+               break;
+       default:
+               error("unknown ioctl 0x%08lx", cmd);
+               rc = EINVAL;
+       }
+       return rc;
+}
+
+int
+gntdev_open(dev_t dev, int flags, int mode, struct lwp *l)
+{
+       struct gntproc *proc;
+       struct file *fp;
+       int fd, rc;
+
+       rc = fd_allocfile(&fp, &fd);
+       if (rc)
+               return rc;
+
+       proc = malloc(sizeof(*proc), M_DEVBUF, M_WAITOK | M_ZERO);
+       mutex_init(&proc->lock, MUTEX_DEFAULT, IPL_NONE);
+       LIST_INIT(&proc->maps);
+       proc->lwp = l;
+       proc->num_maps = 0;
+       debug("opened for proc %p", l);
+       return fd_clone(fp, fd, flags, &gntdev_fileops, proc);
+}
+
+static int
+gntdev_fclose(struct file *fp)
+{
+       struct gntproc *proc = fp->f_data;
+       struct gntmap *map;
+
+       mutex_enter(&proc->lock);
+       while (LIST_FIRST(&proc->maps) != NULL) {
+               map = LIST_FIRST(&proc->maps);
+               mutex_exit(&proc->lock);
+               gntdev_remove_map(proc, map);
+               mutex_enter(&proc->lock);
+       }
+       KASSERT(proc->num_maps == 0);
+       mutex_exit(&proc->lock);
+       mutex_destroy(&proc->lock);
+       debug("closed device for proc %p", proc->lwp);
+       free(proc, M_DEVBUF);
+       return 0;
+}
+
+void
+gntdevattach(int n)
+{
+       debug("attached");
+       return;
+}
diff --git a/sys/dev/DEVNAMES b/sys/dev/DEVNAMES
index 45cf018..765fe45 100644
--- a/sys/dev/DEVNAMES
+++ b/sys/dev/DEVNAMES
@@ -1517,6 +1517,7 @@ xdc                       MI
 xdc                    sun3
 xe                     next68k
 xel                    x68k
+gntdev                 xen
 xencons                        xen
 xenevt                 xen
 xennet                 xen
diff --git a/sys/rump/librump/rumpkern/devsw.c b/sys/rump/librump/rumpkern/devsw.c
index 5a1af01..e513885 100644
--- a/sys/rump/librump/rumpkern/devsw.c
+++ b/sys/rump/librump/rumpkern/devsw.c
@@ -134,6 +134,7 @@ struct devsw_conv devsw_conv0[] = {
        { "rd", 22, 105, DEVNODE_DONTBOTHER, 0, { 0, 0 }},
        { "ct", 23, 106, DEVNODE_DONTBOTHER, 0, { 0, 0 }},
        { "mt", 24, 107, DEVNODE_DONTBOTHER, 0, { 0, 0 }},
+       { "gntdev", -1, 140, DEVNODE_DONTBOTHER, 0, { 0, 0 }},
        { "xenevt", -1, 141, DEVNODE_DONTBOTHER, 0, { 0, 0 }},
        { "xbd", 142, 142, DEVNODE_DONTBOTHER, 0, { 0, 0 }},
        { "xencons", -1, 143, DEVNODE_DONTBOTHER, 0, { 0, 0 }},
-- 
1.7.7.5 (Apple Git-26)


