
Re: earmhf issues on Beaglebone Black



On Thu, Jul 24, 2014 at 03:21:57PM -0700, Matt Thomas wrote:
> 
> On Jul 24, 2014, at 2:25 PM, Manuel Bouyer <bouyer%antioche.eu.org@localhost> 
> wrote:
> 
> > I see the L1 data cache is write-back; would it be possible that
> > the MMU table walk engine reads directly from L2 or RAM and doesn't
> > see the value still in the L1 cache ?
> > The problem seems to happen mostly with new pmaps. If the new first-level
> > table has been zeroed out but not flushed, the MMU could get stale data from
> > RAM ...
> 
> That's what PTE_SYNC is for.

Yes, when a PTE is added or updated.
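(Roughly, the usual pattern when a PTE is written is

	*ptep = npte;
	PTE_SYNC(ptep);		/* clean the D-cache line holding the PTE
				 * so the table walker sees the new value */

modulo the exact helpers pmap.c uses these days.)
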
Here's what I think happens:
when a new process is started, pmap_create() calls pmap_alloc_l1() to get the
L1 page. On this CPU, uvm_pagealloc(NULL, 0, NULL, UVM_PGA_ZERO) is used for
this. uvm_pagealloc() may then use pmap_zero_page_generic() to zero out the
page, if no zero-initialized page is already available.
On this CPU, pmap_zero_page_generic() uses the direct map to bzero_page()
the page. This zeroes the page in the cache, but not in RAM, until the cache
lines are written back, so the MMU's table walker may read stale data from
RAM. A cache write-back of this page is missing before it is used as a page
directory.
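To make that path concrete, here is a rough sketch of the direct-mapped
branch of pmap_alloc_l1() as I read it (not the actual code; names follow
pmap.c, error handling omitted):

	/*
	 * Allocate and zero the L1 page; the zeroing may go through
	 * pmap_zero_page_generic() and land only in the (write-back)
	 * L1 D-cache.
	 */
	struct vm_page *pg = uvm_pagealloc(NULL, 0, NULL, UVM_PGA_ZERO);
	pm->pm_l1_pa = VM_PAGE_TO_PHYS(pg);

	bool ok;
	vaddr_t va = pmap_direct_mapped_phys(pm->pm_l1_pa, &ok, 0xdeadbeef);

	/*
	 * Missing step: write the zeroed lines back to RAM, so the
	 * table-walk engine (which apparently doesn't look in the L1
	 * D-cache on this CPU) sees a zeroed table.
	 */
	cpu_dcache_wb_range(va, PAGE_SIZE);

	pm->pm_l1 = (pd_entry_t *)va;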

Note that L2 allocations get the cache write-back in pmap_l2ptp_ctor().
I would expect that L2 pages returned to the pool cache have already been
properly written back to RAM (as all their PTEs would have been invalidated
one by one), but I haven't double-checked this.
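(From memory, the relevant part of that ctor looks roughly like:

	static int
	pmap_l2ptp_ctor(void *arg, void *v, int flags)
	{
		/* ... remap the page with the page-table cache mode,
		 * including a cpu_dcache_wb_range(), if needed ... */
		memset(v, 0, L2_TABLE_SIZE_REAL);
		PTE_SYNC_RANGE((pt_entry_t *)v,
		    L2_TABLE_SIZE_REAL / sizeof(pt_entry_t));
		return 0;
	}

so a freshly constructed L2 table is always pushed out to RAM before use.)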

The RPI CPU has a VIVT L1 data cache, which could explain the difference in
behavior (pmap_zero_page_generic() behaves differently there).
But I'm still surprised that it didn't cause more problems.

With the attached patch I've been able to build several packages with the BB
white booted multiuser, without error messages or spurious signals.
This has never been possible for me without this patch.
But I won't commit it as is, because I'm not sure it won't cause extra
write-backs that could be avoided (it's hard to tell, there are a lot of
#ifdefs here ...).

> 
> > BTW, is the L2 cache enabled ? I couldn't find code to enable it ...
> 
> L2 is write-through.

Yes, I've noticed this. My question was more "can we get some performance
boost by writing code to enable it?"

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference
--
Index: pmap.c
===================================================================
RCS file: /cvsroot/src/sys/arch/arm/arm32/pmap.c,v
retrieving revision 1.294
diff -u -p -u -r1.294 pmap.c
--- pmap.c      15 Jun 2014 04:04:01 -0000      1.294
+++ pmap.c      25 Jul 2014 14:37:22 -0000
@@ -1307,16 +1307,19 @@ pmap_alloc_l1(pmap_t pm)
        vaddr_t va = pmap_direct_mapped_phys(pm->pm_l1_pa, &ok, 0xdeadbeef);
        KASSERT(ok);
        KASSERT(va >= KERNEL_BASE);
+       /* the page may not have been flushed to ram by pmap_zero_page */
+       cpu_dcache_wb_range(va, PAGE_SIZE);
 
-#else
+
+#else /* __HAVE_MM_MD_DIRECT_MAPPED_PHYS */
        KASSERTMSG(kernel_map != NULL, "pm %p", pm);
        vaddr_t va = uvm_km_alloc(kernel_map, PAGE_SIZE, 0,
            UVM_KMF_WIRED|UVM_KMF_ZERO);
        KASSERT(!va);
        pmap_extract(pmap_kernel(), va, &pm->pm_l1_pa);
-#endif
+#endif /* !__HAVE_MM_MD_DIRECT_MAPPED_PHYS */
        pm->pm_l1 = (pd_entry_t *)va;
-#else
+#else /* ARM_MMU_EXTENDED */
        struct l1_ttable *l1;
        uint8_t domain;
 
@@ -1349,7 +1352,7 @@ pmap_alloc_l1(pmap_t pm)
         */
        pm->pm_l1 = l1;
        pm->pm_domain = domain + 1;
-#endif
+#endif /* !ARM_MMU_EXTENDED */
 }
 
 /*

