tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: AMAP_SHARED (was Re: XIP)



On Mon, Nov 01, 2010 at 02:32:23AM +0900, Masao Uebayashi wrote:
> Hmmmmm.  I may be missing something seriously.  I think I've
> understood the following description you wrote, and I got hit this
> in the very early development stage ... >1 year ago.  It was
> uvm_fault() -> uvmfault_promote() -> amap_add() -> pmap_page_protect()
> Since then I've assumed that shared amap is a pretty much common
> thing...  Now I realize it should not be, as you describe.  Worse,
> I can't reproduce that code path...
> 
> Now I have to *really* understand how this works...

I still have no idea how I was hit by this.  Strange.

Considering how AMAP_SHARED is used, especially it's backed by
vnode, this is only for highly tuned server/client programs which
share initialized data (.data).  While such a use case may have
some value, I'd say this is a rare feature.

I append a test program.

> 
> (I'll respond about this topic again later.)
> 
> Masao
> 
> On Tue, Oct 26, 2010 at 02:06:38PM -0700, Chuck Silvers wrote:
> (snip)
> > now here's the explanation I promised for how to treat XIP pages
> > as unmanaged instead of managed.
> > 
> > first, some background for other people who don't know all this:
> > the only reason that treating XIP pages as managed pages is
> > relevant at all is because of the AMAP_SHARED flag in UVM,
> > which allows anonymous memory to be shared between processes
> > such that the changes made by one process are seen by the other.
> > this impacts XIP pages (which are not anonymous) because a
> > PROT_WRITE, MAP_PRIVATE mapping of an XIP vnode should point to
> > the XIP pages as long as all access to the mapping is for reads,
> > but when the mapping is written to then the XIP page should be
> > copied to an anonymous page (the normal COW operation) but that
> > new anonymous page should still be shared between all processes
> > that are sharing the AMAP_SHARED mapping.  to force those other
> > processes to take another page fault the next time they access
> > their copy of the mapping (which we need to do so that they will
> > start accessing the new anonymous page instead of the XIP page),
> > we must invalidate all the other pmap entries for the XIP page,
> > which we do by calling pmap_page_protect() on it.  the pmap layer
> > tracks all the mappings of the page and thus it can find them all.
> > 
> > there are several ways that the AMAP_SHARED functionality is used,
> > and unfortunately they would need to changed in different ways to
> > make this work for XIP pages without needing to track mappings:
> > 
> >  - uvm_io(), which copies data between the kernel or current process
> >    and an arbitrary other process address space.
> >    currently this works by sharing the other address space with
> >    the kernel via uvm_map_extract() and then just using uiomove()
> >    to transfer the data.  this could be done instead by using part
> >    of the uvm_fault() code to find the physical page in the other
> >    address space that we want to access and lock it (ie. set PG_BUSY),
> >    map the page into the kernel (perhaps using uvm_pager_mapin()),
> >    transfer the data, then unmap the page and unlock it.
> > 
> >  - uvm_mremap(), which resizes an existing mapping.
> >    this uses uvm_map_extract() internally, which uses AMAP_SHARED,
> >    but the mremap operation doesn't actually need the semantics of
> >    AMAP_SHARED since as mremap doesn't create any additional mappings
> >    as far as applications are concerned.  the usage of AMAP_SHARED
> >    is just a side-effect of the current implementation, which bends
> >    over backward to call a bunch of existing higher-level functions
> >    rather than doing something more direct (which would be simpler
> >    and more efficient as well).
> > 
> >  - MAP_INHERIT_SHARE, used to implement minherit().
> >    this is the one that is the most trouble, since it's what AMAP_SHARED
> >    was invented for.  however, it's also of least importance since
> >    some searching with google finds absolutely no evidence of
> >    any application actually using it, just lots of copies of
> >    the implementations and manpages for various operating systems.
> > 
> >    with that in mind, there are several ways this could be handled.
> >     (1) just drop support for minherit() entirely.
> >     (2) reject attempts to set MAP_INHERIT_SHARE on XIP mappings.
> >     (3) copy XIP pages into normal anon pages when setting
> >             MAP_INHERIT_SHARE.
> >     (4) copy XIP pages into normal vnode pages when setting
> >             MAP_INHERIT_SHARE.  this would mean that the getpages
> >             code would need to look in the normal page cache
> >             before using XIP pages.  I think this option would also
> >             need getpages to know about the inherit flag to
> >             correctly handle later faults on XIP mappings,
> >             and there are probably other sublte complications.
> > 
> >    of these choices, (2) sounds like the best compromise to me.
> > 
> > 
> > this approach would also bring back some issues where our previous
> > discussion went around in circles, such as callers of VOP_GETPAGES()
> > wanting vm_page pointers but XIP pages not having them, but
> > I'll leave that additional discussion for future email if necessary.

So you're OK to address this AMAP_SHARED mess after the merge.

I'll come back these interesting topics later.

Masao

> > 
> > I'm just going over all this now so that it's clear to everyone
> > that this kind of approach is possible if the memory overhead of
> > the full vm_page structures for XIP pages is deemed too high.
> > 
> > 
> > -Chuck
> 
> -- 
> Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635
/*      $Id$    */

#include <err.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/cdefs.h>
#include <sys/mman.h>

#define SIZE    4096    /* PAGE_SIZE */

char buf_a[SIZE] __aligned(SIZE) = { [0] = 1 };
char buf_b[SIZE] __aligned(SIZE) = { [0] = 1 };

int
main(int ac, char *av[])
{
        int error = 0;
        pid_t pid;
        int status;

        error = minherit(buf_b, SIZE, MAP_INHERIT_SHARE);
        if (error != 0)
                err(1, "minherit(MAP_INHERIT_SHARE)");

        if ((pid = fork()) < 0)
                err(1, "fork");
        switch (pid) {
        case 0:
                sleep(1);
                printf("buf_a[0] = %d\n", buf_a[0]);
                printf("buf_b[0] = %d\n", buf_b[0]);
                break;
        default:
                buf_a[0] = -1;
                buf_b[0] = -1;
                (void)wait(&status);
                break;
        }

        return 0;
}


Home | Main Index | Thread Index | Old Index