Subject: Recursive PT mapping strangeness?
To: None <port-arm@netbsd.org>
From: Jason R Thorpe <thorpej@wasabisystems.com>
List: port-arm
Date: 02/05/2002 19:44:27
There are bogons lurking in the ARM pmap... (wow, now there's an
understatement...)

Attached is the patch I committed today that makes the ARM pmap
use vm_page_md rather than pmap_physseg to store pmap-specific
data on a per-page basis.  vm_page_md is much more efficient: it is
embedded in struct vm_page, and calls that involve managed pages
already hand the pmap the vm_page, so the pmap gets its per-page data
directly instead of having to look it up somewhere else.
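Here is a rough sketch of why the embedded per-page data is cheaper. The structures are simplified stand-ins, and `struct physseg`, `md_from_physseg()` and `md_from_page()` are hypothetical names, not the real NetBSD declarations. The old path has to find the physical segment that contains the page before it can index that segment's attribute array; this sketch uses a linear scan for that. The new path just follows a pointer the caller already holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSHIFT 12

struct pv_entry;                        /* opaque in this sketch */

/* Simplified stand-in; the fields are illustrative only. */
struct vm_page_md {
	struct pv_entry *pvh_list;      /* mappings of this page */
	int pvh_attrs;                  /* referenced/modified bits */
};

struct vm_page {
	uintptr_t phys_addr;
	struct vm_page_md mdpage;       /* embedded: no lookup needed */
};

/* Old style (hypothetical): per-segment arrays indexed by page frame. */
struct physseg {
	uintptr_t start, end;           /* page frame numbers [start, end) */
	struct vm_page_md *attrs;       /* one entry per page in the segment */
};

/* Find the segment holding 'pa', then index into its array. */
struct vm_page_md *
md_from_physseg(struct physseg *segs, size_t nsegs, uintptr_t pa)
{
	uintptr_t pfn = pa >> PGSHIFT;

	for (size_t i = 0; i < nsegs; i++)
		if (pfn >= segs[i].start && pfn < segs[i].end)
			return &segs[i].attrs[pfn - segs[i].start];
	return NULL;                    /* not a managed page */
}

/* New style: the caller already hands us the vm_page. */
struct vm_page_md *
md_from_page(struct vm_page *pg)
{
	assert(pg != NULL);
	return &pg->mdpage;
}
```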

This worked fine on my XScale board, and it seemed to work fine on
my Shark.  However, I guess I didn't pound on it hard enough: after
Chris Gilbert said he was having trouble with his CATS after the
change, I checked my Shark again and, poof, putting it under heavy
memory load made it crash.

Chris and I seem to be having similar (though not quite identical)
failure modes.

I'm crashing in pmap_handled_emulation(), here:

        /* Get the pte */
        pte = pmap_pte(pmap, va);
        if (!pte) {
                PDEBUG(2, printf("no pte\n"));
                return(0);
        }

        PDEBUG(1, printf("*pte=%08x\n", *pte));

        /* Check for a zero pte */
        if (*pte == 0)
                return(0);

The "*pte" faults.

Chris is losing in pmap_clearbit():

                pte = pmap_pte(pv->pv_pmap, va);
                KASSERT(pte != NULL);
                if (maskbits & (PT_Wr|PT_M)) {
                        if ((pv->pv_flags & PT_NC)) {
                                /*
                                 * Entry is not cacheable: reenable
                                 * the cache, nothing to flush

The KASSERT() fails.
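The two symptoms can be made concrete with a pmap_pte()-style walk that, as I understand it, only returns non-NULL when the L1 slot for the VA points at an L2 table. The sketch below runs in user space over a simulated two-level table in the ARMv4/v5 short-descriptor layout: 1MB per L1 slot, 1KB coarse L2 tables of 256 entries. `sim_pmap_pte()`, `sim_l2_tables` and the "table i lives at physical address i * 1KB" layout are hypothetical. The sketch does not model the recursive window, so it only shows the NULL path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t pd_entry_t;    /* L1 (first-level) descriptor */
typedef uint32_t pt_entry_t;    /* L2 (second-level) descriptor */

#define L1_ENTRIES        4096  /* each L1 slot maps 1MB */
#define L2_ENTRIES        256   /* coarse table: 256 x 4KB = 1MB */
#define L1_TYPE_MASK      0x3u
#define L1_TYPE_COARSE    0x1u  /* bits [1:0] == 01: coarse page table */
#define L1_COARSE_ADDR(d) ((d) & 0xfffffc00u)  /* bits [31:10] */

/*
 * Simulated physical memory holding a few 1KB coarse tables.
 * Table i is pretended to live at physical address i * 1KB.
 */
#define NTABLES 4
pt_entry_t sim_l2_tables[NTABLES][L2_ENTRIES];

/* Hypothetical pmap_pte()-like walk: NULL if no L2 table covers va. */
pt_entry_t *
sim_pmap_pte(const pd_entry_t *l1, uint32_t va)
{
	pd_entry_t pde = l1[va >> 20];  /* one L1 slot per 1MB */

	if ((pde & L1_TYPE_MASK) != L1_TYPE_COARSE)
		return NULL;    /* fault or section: no L2 table to index */

	size_t table = L1_COARSE_ADDR(pde) / (L2_ENTRIES * sizeof(pt_entry_t));
	assert(table < NTABLES);
	return &sim_l2_tables[table][(va >> 12) & (L2_ENTRIES - 1)];
}
```

In these terms, Chris's failed KASSERT() means pmap_pte() returned NULL for a VA that still had a pv_entry. My fault means pmap_pte() returned a non-NULL pointer that was not actually backed by a valid mapping, presumably a page of the recursive window.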

Now, on my Shark, the fault looks like:

[u]vm_fault(0xf01c0eec, effc3000, 1, 0) -> e
Unhandled trap (frame = 0xf5366d7c)
Data abort: 'Translation fault (page)' status=007 address=effc3200 PC=f0166224
Stopped in pid 6 (pagedaemon) at        0xf0166224:     ldr     r0, [r8, #0x0000]
db> 
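For reading the status field, here is a small decoder for the ARMv4/v5 fault status register. Bits [3:0] are the fault status and bits [7:4] are the domain. It covers only the encodings I'm sure of and lumps the rest into "other". `fsr_describe()` and the macros are made-up names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ARMv4/v5 FSR layout: [3:0] = fault status, [7:4] = domain. */
#define FSR_STATUS(fsr) ((fsr) & 0xfu)
#define FSR_DOMAIN(fsr) (((fsr) >> 4) & 0xfu)

const char *
fsr_describe(uint32_t fsr)
{
	switch (FSR_STATUS(fsr)) {
	case 0x1: case 0x3: return "alignment";
	case 0x5: return "translation (section)";  /* L1 descriptor invalid */
	case 0x7: return "translation (page)";     /* L2 descriptor invalid */
	case 0x9: return "domain (section)";
	case 0xb: return "domain (page)";
	case 0xd: return "permission (section)";
	case 0xf: return "permission (page)";
	case 0xc: return "external abort on translation (first level)";
	case 0xe: return "external abort on translation (second level)";
	default:  return "other";
	}
}
```

So status=007 decodes as a page translation fault in domain 0. The L1 entry covering effc3200 was valid, but the L2 descriptor for that page was a fault entry.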

This says to me that the recursive PT mapping is somehow screwed up.
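For reference, here is the arithmetic a recursive ("self-mapped") page-table window relies on, sketched x86-style. Every PTE appears at a fixed offset from a base address, so the pmap can read it with an ordinary load, but only if the window page holding that PTE is itself mapped. `PTE_BASE` and the function names are hypothetical and chosen for illustration; the ARM pmap's real macros and addresses may differ.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pt_entry_t;

#define PGSHIFT          12
#define NBPG             (1u << PGSHIFT)
#define PTE_BASE         0xc0000000u  /* hypothetical, 4MB-aligned */
#define PTE_WINDOW_SIZE  0x00400000u  /* 2^20 PTEs x 4 bytes = 4MB */

/* Address at which the PTE for 'va' appears in the window. */
uint32_t
pte_va(uint32_t va)
{
	return PTE_BASE + (va >> PGSHIFT) * (uint32_t)sizeof(pt_entry_t);
}

/* Window page that must itself be mapped before that PTE can be read. */
uint32_t
pte_page(uint32_t va)
{
	return pte_va(va) & ~(NBPG - 1);
}

/* Inverse: which VA is described by the PTE at window address 'p'? */
uint32_t
pte_to_va(uint32_t p)
{
	assert(p >= PTE_BASE && p - PTE_BASE < PTE_WINDOW_SIZE);
	return ((p - PTE_BASE) / (uint32_t)sizeof(pt_entry_t)) << PGSHIFT;
}
```

With 4-byte PTEs, one 4KB window page holds 1024 PTEs, which cover 4MB of VA. A single missing window mapping therefore makes every PTE read for that 4MB range fault.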

Now, for recursive PT mappings to work properly, the L1 and L2
descriptors must have the same format, because the same table ends up
being walked by the MMU both as an L1 table and as an L2 table.  This
is generally not the case with ARM descriptors.
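To make the format mismatch concrete, here is a decoder for just the type bits [1:0] of ARMv4/v5 first- and second-level descriptors. The encodings are as I read them in the ARM Architecture Reference Manual, and `l1_type()`/`l2_type()` are made-up names. The same two bits mean different things at the two levels.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* First-level (L1) descriptor type, bits [1:0]. */
const char *
l1_type(uint32_t desc)
{
	static const char *const names[4] = {
		"fault", "coarse page table", "section", "fine page table"
	};
	return names[desc & 3];
}

/* Second-level (L2) descriptor type, bits [1:0]. */
const char *
l2_type(uint32_t desc)
{
	static const char *const names[4] = {
		"fault", "large page (64KB)", "small page (4KB)", "tiny page (1KB)"
	};
	return names[desc & 3];
}
```

For example, an L1 coarse-table descriptor (type 01), read as an L2 descriptor, looks like a 64KB large page. The sizes don't line up either: coarse L2 tables are 1KB while pages are 4KB.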

Is there an app note or any papers put out by the Gods Of ARM that
describe suggested methods of page table access for ARM CPUs?  I'm
beginning to suspect that whatever strategy we're using is fundamentally
flawed...

-- 
        -- Jason R. Thorpe <thorpej@wasabisystems.com>