tech-kern archive

Re: Problems with UVM pagefaults

On 09/01/13 17:01, Chuck Silvers wrote:
> On Tue, Jan 08, 2013 at 08:16:30PM +0100, Roger Pau Monné wrote:
>> So far I've been able to get the ptes, pass them to Xen and establish the
>> mapping. Writing to that memory area from userspace seems to work fine
>> (using pread), but the problem comes when the userspace program executes
>> something like:
>> pwrite(fd, buf,...)
>> Where "buf" is a region of the memory mapped by the gnt device. This
>> triggers a page fault in UVM, and this fault will try to modify the pte
>> of the mapped memory region. This pte should not be modified, because if
>> we modify the content of the pte, Xen will probably complain and crash,
>> and if Xen doesn't crash we won't be able to unmap the pte later on,
>> since the pte doesn't contain the value that Xen expects.
>> I've added a little hack to my gnt device, to be able to know who is
>> trying to change the content of the pte, and I've got the following trace:
>> breakpoint() at netbsd:breakpoint+0x5
>> vpanic() at netbsd:vpanic+0x1f2
>> printf_nolog() at netbsd:printf_nolog
>> xpq_flush_queue() at netbsd:xpq_flush_queue+0x180
>> pmap_enter_ma() at netbsd:pmap_enter_ma+0x5c1
>> pmap_enter() at netbsd:pmap_enter+0x35
>> uvm_fault_upper_enter.clone.4() at netbsd:uvm_fault_upper_enter.clone.4+0x22a
>> uvm_fault_internal() at netbsd:uvm_fault_internal+0x28f4
>> uvm_fault_wire() at netbsd:uvm_fault_wire+0x53
>> genfs_directio() at netbsd:genfs_directio+0x16a
>> ffs_write() at netbsd:ffs_write+0x43a
>> VOP_WRITE() at netbsd:VOP_WRITE+0x55
>> vn_write() at netbsd:vn_write+0xf9
>> do_filewritev() at netbsd:do_filewritev+0x1fd
>> sys_pwritev() at netbsd:sys_pwritev+0x2b
>> syscall() at netbsd:syscall+0x94
>> --- syscall (number 290) ---
>> Is there any way to prevent UVM from faulting? The address on that VA is
>> already set AFAIK, but I know almost nothing about how UVM works,
>> so I would like to ask if someone could help me with that.
>> I'm attaching the code of the gntdev, the main function that contains
>> interesting code is gntmap_grant_ref, that's where I try to get the ptes
>> and set the mapping. This is not finished code, but I would like to
>> understand why these page faults happen, and how I can solve this problem.
> hi roger,

Hi Chuck, thanks for the description, it has been really helpful!

> the problem is that UVM doesn't expect PTEs to be modified behind its back
> like this.  operations that want memory to be wired (such as the O_DIRECT
> I/O that you're doing above) will simulate page faults on the range
> in order to update the page wired counters, and this ends up replacing
> the pmap entries (ie. PTEs on x86) with what it thinks should be there,
> which are the original MAP_ANON pages instead of the gntdev ones.
> there's currently no way to prevent this from happening.
> it would be better if we could arrange for the PTEs that are created by
> the normal UVM page-fault logic to be the ones that are actually wanted.
> one way of doing that would be to use a mapping of a character device
> instead of a MAP_ANON mapping.  I see that you've got a new "gntdev" device
> driver in your patch, you can just mmap that and have the d_mmap method
> return the paddr_t values for the PTEs that you want.  the userspace side
> of this would change a bit, instead of creating a memory mapping with mmap
> and then replacing what that mapping points to with an ioctl, the new approach
> would call the ioctl first to establish the device-offset-to-PTE mapping
> inside the gntdev driver and return the new device offset to the user code,
> then mmap the gntdev device with the offset returned from the ioctl.
> will this approach work for the other components of this system?
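
The ioctl-then-mmap ordering described above could look roughly like this from userspace. This is only a sketch: the request structure, its field names, the ioctl number GNTDEV_MAP_GRANT and the helper gnt_map_page() are all hypothetical and are not taken from the gntdev patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* HYPOTHETICAL request layout, invented for this sketch. */
struct gnt_map_req {
	uint32_t domid;		/* in: domain that granted the page */
	uint32_t ref;		/* in: grant reference */
	uint64_t offset;	/* out: device offset to hand to mmap() */
};

/* HYPOTHETICAL ioctl number, chosen only for illustration. */
#define GNTDEV_MAP_GRANT _IOWR('G', 0, struct gnt_map_req)

/*
 * Step 1: the ioctl asks the driver to remember which grant belongs
 *         to which device offset, and returns that offset.
 * Step 2: mmap() the device at that offset, so the page-fault path
 *         asks the driver's d_mmap method what to map.
 * Returns the mapped address, or MAP_FAILED on any error.
 */
void *
gnt_map_page(int fd, uint32_t domid, uint32_t ref, size_t pagesz)
{
	struct gnt_map_req req = { .domid = domid, .ref = ref, .offset = 0 };

	if (ioctl(fd, GNTDEV_MAP_GRANT, &req) == -1)
		return MAP_FAILED;

	return mmap(NULL, pagesz, PROT_READ | PROT_WRITE, MAP_SHARED,
	    fd, (off_t)req.offset);
}
```

The point of this ordering is that the mapping is a device mapping from the start, so nothing has to be replaced after mmap() returns.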

That's how it is done in Linux, and that was my first attempt to implement
this device under NetBSD. However, I've also found problems with this
implementation. In d_mmap you have to return physical addresses. The
problem is that under Xen there are both physical addresses and machine
addresses, with a mapping between them (physical addresses are
translated to machine addresses, so the OS thinks memory is contiguous
when it is not).
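
To make the physical/machine distinction concrete, here is a toy model of the translation. It is not the NetBSD or Xen code: the table toy_p2m, its frame numbers and the helper toy_pa_to_ma() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_NPAGES	4

/* Toy table: physical frame number -> machine frame number (made-up values). */
static const uint64_t toy_p2m[TOY_NPAGES] = { 0x8a3, 0x112, 0x7f0, 0x045 };

/*
 * Translate a physical address into a machine address.
 * Returns 0 on success, -1 if the frame is not in the table.
 */
int
toy_pa_to_ma(uint64_t pa, uint64_t *ma)
{
	uint64_t pfn = pa >> TOY_PAGE_SHIFT;
	uint64_t off = pa & ((1ULL << TOY_PAGE_SHIFT) - 1);

	if (pfn >= TOY_NPAGES)
		return -1;
	*ma = (toy_p2m[pfn] << TOY_PAGE_SHIFT) | off;
	return 0;
}
```

In this model the physical pages 0 and 1 are adjacent, while the machine frames behind them (0x8a3 and 0x112) are not, which is the contiguity illusion described above.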

When returning physical addresses from the d_mmap handler I've found
that the memory system extracts the machine address behind that physical
address and tries to assign that machine address to a VA in the calling
process's virtual memory. This did not work, because the machine address
behind the physical address returned in the handler doesn't belong to
the same domain, so trying to map this to a different VA in userspace is
not allowed (Xen only allows this by using a set of specific hypercalls,
as you can see in gntmap_grant_ref).

Is there any way to obtain the VA or pte that's going to be used in
userspace from inside the d_mmap handler? If so, I could do the mapping
myself inside the d_mmap handler.

> if we can't (or don't want to) change the userspace part of this, then
> another way to achieve this goal would be to have the ioctl that changes
> the PTEs also change the UVM mapping to match, ie. to be a mapping of
> the right offset of the gntdev device.  then any simulated page faults
> on that mapping would do the right thing, ie. not change the PTEs
> since the existing ones already have the values that UVM wants.
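
A toy model of why keeping UVM's view in sync matters, assuming (as described above) that a wiring fault re-enters whatever the VM layer believes backs the mapping. The structure toy_mapping and the function toy_fault_wire() are hypothetical and only mimic the behaviour; they are not UVM code.

```c
#include <assert.h>
#include <stdint.h>

struct toy_mapping {
	uint64_t backing;	/* what the VM layer believes should be mapped */
	uint64_t pte;		/* what is actually installed */
	unsigned wired;		/* wire count */
};

/*
 * Simulated wiring fault: bump the wire count and re-enter the
 * backing value. Returns 1 if the installed PTE value was changed,
 * 0 if it already matched.
 */
int
toy_fault_wire(struct toy_mapping *m)
{
	int changed = (m->pte != m->backing);

	m->wired++;
	m->pte = m->backing;
	return changed;
}
```

If only pte was modified behind the VM layer's back, the fault overwrites it. If backing was updated to match, the same fault leaves the value as it was.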

Is there any example of another device driver that does this? I don't
have much idea about how to manipulate the UVM region (although having
option 1 working would be preferred).

> it would be best to abstract any code that manipulates vm_map structures
> into the UVM code proper, and similarly move the code that manipulates
> the pmap datastructures into pmap.c or xen_pmap.c.

Yes, I've just put everything inside the device driver to have all the
code contained in one place for now, so it is easier to understand.
