Subject: uvm_loan(), page daemon, pmap_kenter
To: None <tech-kern@netbsd.org>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: tech-kern
Date: 04/06/1999 12:38:03
Hi folks...

So, I've been thinking of using uvm_loan() for a few things, including
zero-copy transmit path of network connections (receive is a little
harder, and I will think more about that later :-).

The major problem with uvm_loan() as I see it is that it requires PMAP_NEW
in order to work properly if you're loaning pages to the kernel.  (It will
function without it, just not correctly, specifically, when memory is scarce.)

I discussed this briefly w/ Chuck, who outlined an example of why it's
required:

	1. User loans page from vnode object to kernel (for an mbuf,
	   for example).

	2. Pages get scarce, page daemon starts scanning active pages.

	3. Page daemon decides to deactivate the page (can happen to active
	   pages if there is an inactive shortage).  This causes a
	   pmap_page_protect(..., VM_PROT_NONE) to happen on the page.

	4. Because of the way loaning works (i.e. the kernel loan mappings
	   aren't "real", because that would cause really painful map
	   fragmentation), the kernel will die when it faults on the
	   revoked mapping, because it can't look up the object that
	   the page belongs to.

As I see it, the problem is step #3.  When a page is loaned to the kernel,
the kernel mapping is wired.  This should cause wire_count to increase for
the page.  Now, take a look at the following code block from uvmpd_scan():

                /*
                 * deactivate this page if there's a shortage of
                 * inactive pages.
                 */
                if (inactive_shortage > 0) {
                        pmap_page_protect(PMAP_PGARG(p), VM_PROT_NONE);
                        /* no need to check wire_count as pg is "active" */
                        uvm_pagedeactivate(p);
                        uvmexp.pddeact++;
                        inactive_shortage--;
                }

I think that block is code is bogus.  If you look at uvm_pagedeactivate(),
you'll note that it expects the caller to make sure the page is not wired.

In fact, uvm_pagedeactivate() includes a DIAGNOSTIC panic for the case
where wire_count is != 0!  HOWEVER, that case is not triggered here
because that path is NOT taken if the page was on the active list, which
is the case here in uvmpd_scan().

Wired pages, by definition, should not take faults, and given that the page
was loaned to a wired kernel mapping, the page should not take a fault
in the kernel.  Therefore, I think that uvmpd_scan() should not deactivate
pages that are wired.

I think the following diff is all that's needed to solve this case:

Index: uvm_pdaemon.c
===================================================================
RCS file: /cvsroot/src/sys/uvm/uvm_pdaemon.c,v
retrieving revision 1.15
diff -c -r1.15 uvm_pdaemon.c
*** uvm_pdaemon.c	1999/03/30 10:12:01	1.15
--- uvm_pdaemon.c	1999/04/06 19:05:19
***************
*** 1093,1103 ****
   
  		/*
  		 * deactivate this page if there's a shortage of
! 		 * inactive pages.
  		 */
! 		if (inactive_shortage > 0) {
  			pmap_page_protect(PMAP_PGARG(p), VM_PROT_NONE);
- 			/* no need to check wire_count as pg is "active" */
  			uvm_pagedeactivate(p);
  			uvmexp.pddeact++;
  			inactive_shortage--;
--- 1093,1102 ----
   
  		/*
  		 * deactivate this page if there's a shortage of
! 		 * inactive pages and it's not wired.
  		 */
! 		if (inactive_shortage > 0 && p->wire_count == 0) {
  			pmap_page_protect(PMAP_PGARG(p), VM_PROT_NONE);
  			uvm_pagedeactivate(p);
  			uvmexp.pddeact++;
  			inactive_shortage--;

Comments on this part?

I do think, however, that there is a more general issue to be dealt with.

What to do with pmap_page_protect(..., VM_PROT_NONE) when there are
wired mappings for a page?  Some pmaps currently skip revoking a mapping
(revoking, not removing ... pmap_remove() always works :-) if the mapping
is wired (usually indicated by a software PTE bit).  Here's an example
from the hp300 pmap:

/*      
 * pmap_page_protect:           [ INTERFACE ]
 *      
 *      Lower the permission for all mappings to a given page to
 *      the permissions specified.
 */             
void                
pmap_page_protect(pa, prot)
        paddr_t         pa;
        vm_prot_t       prot;
{
        struct pv_entry *pv;
        int s;  

        .
        .
        .

        /* [VM_PROT_NONE case] */
        pv = pa_to_pvh(pa);
        s = splimp();
        while (pv->pv_pmap != NULL) {
                pt_entry_t *pte;
 
                pte = pmap_pte(pv->pv_pmap, pv->pv_va);
#ifdef DEBUG    
                if (!pmap_ste_v(pv->pv_pmap, pv->pv_va) ||
                    pmap_pte_pa(pte) != pa)
                        panic("pmap_page_protect: bad mapping");
#endif      
                if (!pmap_pte_w(pte))
                        pmap_remove_mapping(pv->pv_pmap, pv->pv_va,
                                            pte, PRM_TFLUSH|PRM_CFLUSH);
                else {
                        pv = pv->pv_next;
#ifdef DEBUG
                        if (pmapdebug & PDB_PARANOIA)
                                printf("%s wired mapping for %lx not removed\n",
                                       "pmap_page_protect:", pa);
#endif
                        if (pv == NULL)
                                break;
                }
        }
        splx(s);
}

I think this is correct behavior, and in some sense, would prevent part
of this problem.  All pmaps should probably be modified in this way (actually,
it might be nice to revisit the pmap_page_protect() interface and change it
into two functions: pmap_copy_on_write() [which is what it used to be, to
handle the VM_PROT_READ case] and pmap_revoke() [to handle the VM_PROT_NONE
case]).  XXX Must consider VM_PROT_EXEC for systems where the I-cache must
be flushed, e.g. Alpha.  Should instrument pmap_page_protect() to see if
VM_PROT_EXEC is ever actually passed to that function.

But another thing to consider is what happens when you mark a page as
copy-on-write while it's loaned to the kernel.  I haven't thought about
all of those issues yet...  It's certainly wrong to not revoke write perms
on a page if it's being marked COW, and pmap_kenter gets this wrong.  I
guess loaned pages are meant to be read-only, yes?

        -- Jason R. Thorpe <thorpej@nas.nasa.gov>