Subject: Re: uvm_loan(), page daemon, pmap_kenter
To: Jason Thorpe <thorpej@nas.nasa.gov>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 04/06/1999 17:20:57
Jason Thorpe writes:
> Hi folks...
> 
> So, I've been thinking of using uvm_loan() for a few things, including
> zero-copy transmit path of network connections (receive is a little
> harder, and I will think more about that later :-).
> 
> The major problem with uvm_loan() as I see it is that it requires PMAP_NEW
> in order to work properly if you're loaning pages to the kernel.  (It will
> function without it, just not correctly, specifically, when memory is scarce.)

how does !PMAP_NEW break things?
(we should just finish implementing PMAP_NEW everywhere anyway.)


> I discussed this briefly w/ Chuck, who outlined an example of why it's
> required:
> 
> 	1. User loans page from vnode object to kernel (for an mbuf,
> 	   for example).
> 
> 	2. Pages get scarce, page daemon starts scanning active pages.
> 
> 	3. Page daemon decides to deactivate the page (can happen to active
> 	   pages if there is an inactive shortage).  This causes a
> 	   pmap_page_protect(..., VM_PROT_NONE) to happen on the page.
> 
> 	4. Because of the way loaning works (i.e. the kernel loan mappings
> 	   aren't "real", because that would cause really painful map
> 	   fragmentation), the kernel will die when it faults on the
> 	   revoked mapping, because it can't look up the object that
> 	   the page belongs to.
> 
> As I see it, the problem is step #3.  When a page is loaned to the kernel,
> the kernel mapping is wired.  This should cause wire_count to increase for
> the page.  Now, take a look at the following code block from uvmpd_scan():
> 
>                 /*
>                  * deactivate this page if there's a shortage of
>                  * inactive pages.
>                  */
>                 if (inactive_shortage > 0) {
>                         pmap_page_protect(PMAP_PGARG(p), VM_PROT_NONE);
>                         /* no need to check wire_count as pg is "active" */
>                         uvm_pagedeactivate(p);
>                         uvmexp.pddeact++;
>                         inactive_shortage--;
>                 }
> 
> I think that block is code is bogus.  If you look at uvm_pagedeactivate(),
> you'll note that it expects the caller to make sure the page is not wired.
> 
> In fact, uvm_pagedeactivate() includes a DIAGNOSTIC panic for the case
> where wire_count is != 0!  HOWEVER, that case is not triggered here
> because that path is NOT taken if the page was on the active list, which
> is the case here in uvmpd_scan().
> 
> Wired pages, by definition, should not take faults, and given that the page
> was loaned to a wired kernel mapping, the page should not take a fault
> in the kernel.  Therefore, I think that uvmpd_scan() should not deactivate
> pages that are wired.
> 
> I think the following diff is all that's needed to solve this case:
> 
> Index: uvm_pdaemon.c
> ===================================================================
> RCS file: /cvsroot/src/sys/uvm/uvm_pdaemon.c,v
> retrieving revision 1.15
> diff -c -r1.15 uvm_pdaemon.c
> *** uvm_pdaemon.c	1999/03/30 10:12:01	1.15
> --- uvm_pdaemon.c	1999/04/06 19:05:19
> ***************
> *** 1093,1103 ****
>    
>   		/*
>   		 * deactivate this page if there's a shortage of
> ! 		 * inactive pages.
>   		 */
> ! 		if (inactive_shortage > 0) {
>   			pmap_page_protect(PMAP_PGARG(p), VM_PROT_NONE);
> - 			/* no need to check wire_count as pg is "active" */
>   			uvm_pagedeactivate(p);
>   			uvmexp.pddeact++;
>   			inactive_shortage--;
> --- 1093,1102 ----
>    
>   		/*
>   		 * deactivate this page if there's a shortage of
> ! 		 * inactive pages and it's not wired.
>   		 */
> ! 		if (inactive_shortage > 0 && p->wire_count == 0) {
>   			pmap_page_protect(PMAP_PGARG(p), VM_PROT_NONE);
>   			uvm_pagedeactivate(p);
>   			uvmexp.pddeact++;
>   			inactive_shortage--;
> 
> Comments on this part?

wired pages should not be on the active queue (or any paging queue).
uvm_pagewire() deals with this.


> I do think, however, that there is a more general issue to be dealt with.
> 
> What to do with pmap_page_protect(..., VM_PROT_NONE) when there are
> wired mappings for a page?  Some pmaps currently skip revoking a mapping
> (revoking, not removing ... pmap_remove() always works :-) if the mapping
> is wired (usually indicated by a software PTE bit).  Here's an example
> from the hp300 pmap:
> 
> /*      
>  * pmap_page_protect:           [ INTERFACE ]
>  *      
>  *      Lower the permission for all mappings to a given page to
>  *      the permissions specified.
>  */             
> void                
> pmap_page_protect(pa, prot)
>         paddr_t         pa;
>         vm_prot_t       prot;
> {
>         struct pv_entry *pv;
>         int s;  
> 
>         .
>         .
>         .
> 
>         /* [VM_PROT_NONE case] */
>         pv = pa_to_pvh(pa);
>         s = splimp();
>         while (pv->pv_pmap != NULL) {
>                 pt_entry_t *pte;
>  
>                 pte = pmap_pte(pv->pv_pmap, pv->pv_va);
> #ifdef DEBUG    
>                 if (!pmap_ste_v(pv->pv_pmap, pv->pv_va) ||
>                     pmap_pte_pa(pte) != pa)
>                         panic("pmap_page_protect: bad mapping");
> #endif      
>                 if (!pmap_pte_w(pte))
>                         pmap_remove_mapping(pv->pv_pmap, pv->pv_va,
>                                             pte, PRM_TFLUSH|PRM_CFLUSH);
>                 else {
>                         pv = pv->pv_next;
> #ifdef DEBUG
>                         if (pmapdebug & PDB_PARANOIA)
>                                 printf("%s wired mapping for %lx not removed\n",
>                                        "pmap_page_protect:", pa);
> #endif
>                         if (pv == NULL)
>                                 break;
>                 }
>         }
>         splx(s);
> }
> 
> I think this is correct behavior, and in some sense, would prevent part
> of this problem.  All pmaps should probably be modified in this way (actually,
> it might be nice to revisit the pmap_page_protect() interface and change it
> into two functions: pmap_copy_on_write() [which is what it used to be, to
> handle the VM_PROT_READ case] and pmap_revoke() [to handle the VM_PROT_NONE
> case]).  XXX Must consider VM_PROT_EXEC for systems where the I-cache must
> be flushed, e.g. Alpha.  Should instrument pmap_page_protect() to see if
> VM_PROT_EXEC is ever actually passed to that function.

the problem is that pmap_page_protect() is not well-defined for wired pages.
I would argue that pmap_page_protect() should never be called on a wired page,
and that pmap_page_protect() should panic if that happens.  this assertion
probably doesn't play well with the current loaning code, but I think it would
be a good direction to head.

copy-on-write and wired pages don't mix.  wiring a COW page should force
an immediate copy, as should COWing a wired page.  so if pmap_page_protect()
is split, the pmap_copy_on_write() variety should break the COW for wired
pages and the pmap_revoke() variety should panic for wired pages.


> But another thing to consider is what happens when you mark a page as
> copy-on-write while it's loaned to the kernel.  I haven't thought about
> all of those issues yet...  It's certainly wrong to not revoke write perms
> on a page if it's being marked COW, and pmap_kenter gets this wrong.  I
> guess loaned pages are meant to be read-only, yes?

COW and loaning don't mix either, at least not the loan-to-kernel form.
COW depends on the page not being wired, and loan-to-kernel depends on
the page being wired.  can't have both at the same time.

the other forms of loaning could probably be mixed with COW mappings.

loans are basically a fancy form of COW which allow sharing of pages
between different types of vm constructs (uvm_objects, anons, wired
kernel pages).

-Chuck (the other)


>         -- Jason R. Thorpe <thorpej@nas.nasa.gov>