Subject: Re: pmap problems in netbsd-4 on arm omap
To: None <tech-kern@netbsd.org>
From: Bucky Katz <bucky@picovex.com>
List: tech-kern
Date: 09/05/2007 14:47:30
Bucky Katz <bucky@picovex.com> writes:

I now have more detail on the problem, attached next, and a concrete
question:

Are the mappings always supposed to be anonymous for asynchronous I/O?

----------------------------------------------------------------------

The problem always seems to happen when the fault handler promotes a
page shortly after asynchronous I/O unbusies a batch of pages.  I can't
figure out how asynchronous I/O is supposed to remove the mappings at
the pmap level.  Are the mappings always supposed to be anonymous for
asynchronous I/O?

I see that uvm_aio_aiodone() marks the pages that can be freed
here:

/*
 * do accounting for pagedaemon i/o and arrange to free
 * the pages instead of just unbusying them.
 */
if (pg->flags & PG_PAGEOUT) {
        pg->flags &= ~PG_PAGEOUT;
        uvmexp.paging--;
        uvmexp.pdfreed++;
        pg->flags |= PG_RELEASED;
}

Then uvm_page_unbusy() is called, which calls uvm_pagefree() on the
pages with PG_RELEASED set:

if (pg->flags & PG_RELEASED) {
        UVMHIST_LOG(ubchist, "releasing pg %p", pg,0,0,0);
        KASSERT(pg->uobject != NULL ||
            (pg->uanon != NULL && pg->uanon->an_ref > 0));
        pg->flags &= ~PG_RELEASED;

        // XXXX debug added XXXX
        if (pmap_has_mappings(pg->phys_addr)) {
                printf("uvm_page_unbusy : physical page 0x%08x has "
                    "mappings\n", (unsigned int)pg->phys_addr);
        }

        uvm_pagefree(pg);
}

It then returns, with no pmap_remove() as far as I can tell.
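To make that sequence concrete, here is a minimal user-space model of the two excerpts. The flag values, struct model_page, its nmappings counter and the model_* functions are hypothetical stand-ins, not the UVM definitions; the point is only that nothing on this path drops the page's mappings before it is marked free.

```c
#include <assert.h>   /* used by the usage checks */

/* Hypothetical flag values for the model; NOT the real UVM constants. */
#define PG_BUSY     0x01
#define PG_PAGEOUT  0x02
#define PG_RELEASED 0x04
#define PG_FREE     0x08   /* model only: page is on the free list */

struct model_page {
	int flags;
	int nmappings;         /* stands in for the length of the pv list */
};

struct model_stats {
	int paging;
	int pdfreed;
};

/* Mirrors the uvm_aio_aiodone() excerpt: pageout pages become releasable. */
static void
model_aiodone(struct model_page *pg, struct model_stats *st)
{
	if (pg->flags & PG_PAGEOUT) {
		pg->flags &= ~PG_PAGEOUT;
		st->paging--;
		st->pdfreed++;
		pg->flags |= PG_RELEASED;
	}
}

/*
 * Mirrors the uvm_page_unbusy() excerpt: a released page is freed.
 * Returns the number of mappings still present when the page is freed,
 * i.e. what the added debug printf would report.
 */
static int
model_unbusy(struct model_page *pg)
{
	pg->flags &= ~PG_BUSY;
	if (pg->flags & PG_RELEASED) {
		pg->flags &= ~PG_RELEASED;
		/* Nothing here touches nmappings: no pmap_remove() equivalent. */
		pg->flags |= PG_FREE;
		return pg->nmappings;
	}
	return 0;
}
```

Running a busy PG_PAGEOUT page with one mapping through model_aiodone() and then model_unbusy() leaves it on the modeled free list with that mapping still counted, which is the pattern in the log below. In the real kernel, pmap_page_protect(pg, VM_PROT_NONE) is the pmap(9) call that drops every mapping of a single page; whether something is expected to have done that before the pageout was started is the open question.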

Here is my debug log and the panic:

uvm_page_unbusy : physical page 0x1107b000 has mappings
XXXXX : placing page 0x1107b000 with mappings on freelist
uvm_page_unbusy : physical page 0x1107a000 has mappings
XXXXX : placing page 0x1107a000 with mappings on freelist
uvm_page_unbusy : physical page 0x11079000 has mappings
XXXXX : placing page 0x11079000 with mappings on freelist
uvm_page_unbusy : physical page 0x11078000 has mappings
XXXXX : placing page 0x11078000 with mappings on freelist
uvm_page_unbusy : physical page 0x11077000 has mappings
XXXXX : placing page 0x11077000 with mappings on freelist
uvm_page_unbusy : physical page 0x11076000 has mappings
XXXXX : placing page 0x11076000 with mappings on freelist
uvm_page_unbusy : physical page 0x11075000 has mappings
XXXXX : placing page 0x11075000 with mappings on freelist
uvm_page_unbusy : physical page 0x11074000 has mappings
XXXXX : placing page 0x11074000 with mappings on freelist
uvm_page_unbusy : physical page 0x11073000 has mappings
XXXXX : placing page 0x11073000 with mappings on freelist
uvm_page_unbusy : physical page 0x11072000 has mappings
XXXXX : placing page 0x11072000 with mappings on freelist
uvm_page_unbusy : physical page 0x11071000 has mappings
XXXXX : placing page 0x11071000 with mappings on freelist
uvm_page_unbusy : physical page 0x11070000 has mappings
XXXXX : placing page 0x11070000 with mappings on freelist
uvm_page_unbusy : physical page 0x1106f000 has mappings
XXXXX : placing page 0x1106f000 with mappings on freelist

panic: pmap_zero_page: page (0x11074000) has mappings

0 -> panic+0x110
1 -> pmap_zero_page_generic+0x148
2 -> uvm_pagealloc_strat+0x2a4
3 -> uvmfault_promote+0x168
4 -> uvm_fault_internal+0x12b8
5 -> data_abort_handler+0x31c
6 -> address_exception_entry+0x50
7 -> 0x253815
8 -> 0x11770c
9 -> 0x116210
10 -> 0x1162a8
11 -> 0x159fc4
  


> Hi,
>
> One of our developers is working on a new omap dev board and running
> into problems with pmap issues.  He asked the following questions, and
> I'm afraid I don't know the answers. Any help is most welcome.
>
> He is seeing a pmap panic periodically:
>
> pmap_zero_page: page xxxx has mappings.
>
> Preliminary investigation with uvm_hist indicated that a physical
> page first had a managed mapping.  It then got an anonymous
> mapping.  The anonymous mapping was later removed, and the physical
> page was placed on the free queue.  Later, the page was selected by
> uvm_pagealloc_strat(), and pmap_zero_page() was called, resulting in
> the panic.
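The uvm_hist sequence can be replayed in a small user-space model. Everything here (struct model_page, the OWNER_* tags and the model_* functions) is hypothetical: a fixed-size array stands in for the pv list, and model_alloc_and_zero() returns -1 where pmap_zero_page would panic. It shows how removing only the anonymous mapping leaves the managed one behind, so the page reaches the free list still mapped.

```c
#include <assert.h>

/* Hypothetical model; the array replaces the real pv list. */
#define MAXMAP 4

enum owner { OWNER_MANAGED = 1, OWNER_ANON = 2 };

struct model_page {
	enum owner map[MAXMAP];
	int nmap;
	int on_freelist;
};

static void
model_enter(struct model_page *pg, enum owner o)
{
	assert(pg->nmap < MAXMAP);
	pg->map[pg->nmap++] = o;
}

/* Remove only the mappings owned by 'o', leaving any others in place. */
static void
model_remove(struct model_page *pg, enum owner o)
{
	int i, j = 0;

	for (i = 0; i < pg->nmap; i++)
		if (pg->map[i] != o)
			pg->map[j++] = pg->map[i];
	pg->nmap = j;
}

/* Like uvm_pagefree(), assumes (but does not check) that no mappings remain. */
static void
model_free(struct model_page *pg)
{
	pg->on_freelist = 1;
}

/* Take the page off the free list and "zero" it; -1 where the kernel panics. */
static int
model_alloc_and_zero(struct model_page *pg)
{
	assert(pg->on_freelist);
	pg->on_freelist = 0;
	return pg->nmap == 0 ? 0 : -1;
}
```

If the managed mapping is also removed before model_free(), model_alloc_and_zero() returns 0 instead.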
>
> Secondary investigation involved adding a panic where the page is
> actually placed on the free queue.  The comment for uvm_pagefree()
> states that it assumes all valid mappings of pg are gone.  I created
> a pmap_has_mappings() function that returns nonzero if
> pg->mdpage.pvh_list != NULL for the page's struct vm_page.  This
> causes a panic as soon as /etc/init starts up.
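A version of the pmap_has_mappings() check described above might look like the sketch below. The structure definitions are minimal hypothetical stand-ins that keep only the pvh_list field of the ARM pmap's per-page data, pmap_count_mappings() is an invented debug helper, and the sketch ignores the pv-list locking a real kernel check would need. It takes a struct vm_page, as described here, whereas the debug code in uvm_page_unbusy() passes pg->phys_addr, so that helper presumably converts the address back to a page first (for example with PHYS_TO_VM_PAGE()).

```c
#include <assert.h>   /* used by the usage checks */
#include <stddef.h>

/* Minimal hypothetical stand-ins; the real structures have more fields. */
struct pv_entry {
	struct pv_entry *pv_next;
};

struct vm_page_md {
	struct pv_entry *pvh_list;
};

struct vm_page {
	struct vm_page_md mdpage;
};

/* Nonzero if any pv entry (any pmap mapping) still refers to pg. */
static int
pmap_has_mappings(const struct vm_page *pg)
{
	return pg->mdpage.pvh_list != NULL;
}

/* Hypothetical debug helper: count the mappings for log messages. */
static int
pmap_count_mappings(const struct vm_page *pg)
{
	const struct pv_entry *pv;
	int n = 0;

	for (pv = pg->mdpage.pvh_list; pv != NULL; pv = pv->pv_next)
		n++;
	return n;
}
```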
>
> I found that uvm_km_free(), for instance, orders the calls like
> this:
>
>                 uvm_km_pgremove(addr, addr + size);
>                 pmap_remove(pmap_kernel(), addr, addr + size);
>
>
> Therefore, my check in uvm_pagefree() runs before pmap_remove()
> removes the managed mapping.  Is there a reason for this ordering?
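The effect of the call order can be reproduced with a toy model. model_pgremove() and model_pmap_remove() are hypothetical stand-ins for uvm_km_pgremove() and pmap_remove(), and model_pagefree() carries the same kind of debug check that was added to uvm_pagefree(). With the order used by uvm_km_free(), every page is freed while still mapped; with the order reversed, none is.

```c
#include <assert.h>   /* used by the usage checks */

#define NPAGES 4

/* Hypothetical model of a kernel VA range backed by NPAGES pages. */
struct model_page {
	int mapped;      /* 1 while a pmap mapping refers to the page */
	int freed;
};

static int stale_frees;   /* frees that happened while still mapped */

/* Stands in for uvm_pagefree() plus the added debug check. */
static void
model_pagefree(struct model_page *pg)
{
	if (pg->mapped)
		stale_frees++;
	pg->freed = 1;
}

/* Stands in for uvm_km_pgremove(): frees every page in the range. */
static void
model_pgremove(struct model_page *pgs, int n)
{
	for (int i = 0; i < n; i++)
		model_pagefree(&pgs[i]);
}

/* Stands in for pmap_remove(): drops the mappings for the range. */
static void
model_pmap_remove(struct model_page *pgs, int n)
{
	for (int i = 0; i < n; i++)
		pgs[i].mapped = 0;
}

/* Returns how many pages were freed while still mapped. */
static int
model_km_free(int pmap_first)
{
	struct model_page pgs[NPAGES];

	for (int i = 0; i < NPAGES; i++) {
		pgs[i].mapped = 1;
		pgs[i].freed = 0;
	}
	stale_frees = 0;
	if (pmap_first) {
		model_pmap_remove(pgs, NPAGES);
		model_pgremove(pgs, NPAGES);
	} else {
		model_pgremove(pgs, NPAGES);      /* order used by uvm_km_free() */
		model_pmap_remove(pgs, NPAGES);
	}
	return stale_frees;
}
```

By itself this only shows why the debug check fires under the original order; it becomes a real bug only if something can allocate one of the freed pages before pmap_remove() runs, which is the locking question raised below.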
>
> For testing purposes, I've reversed the order, and I get further
> along, but eventually hit a similar problem in uvm_page_unbusy().  I'm
> working around this now.
>
> I'm not sure whether uvm_km_free() acquires any locks, and I'm
> wondering if there is a locking hole somewhere that lets a physical
> page be placed on the free list before its pmap-level mapping is
> removed.  Has anyone encountered such problems before?