NetBSD-Bugs archive
Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on -current MP domU (amd64)
On Tue, Feb 14, 2012 at 11:41:02PM +0530, Cherry G. Mathew wrote:
> Hi Manuel,
>
>
> > I've made progress on this; I think I understood what's going on and
> > I have a fix.
> >
> >
> > The page is indeed still used as a page table; it's still in another CPU's
> > ci_kpm_pdir. The reason is that xen_kpm_sync() is not working as expected,
> > leading to races between CPUs.
> > 1) the check (xpq_cpu != &x86_curcpu) is always false because we
> > have different x86_curcpu symbols with different addresses in the kernel.
> > Fortunately, all the addresses disassemble to the same code.
>
> Ok, this is my mess-up - I was looking for a way to avoid using %gs/%fs
> before they are set up via cpu_init_msrs(). The reason it doesn't work,
> as dh pointed out elsewhere, is that x86_curcpu() is defined in a
> header as static inline.
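For the record, the pitfall reduces to this (a hypothetical two-file
demo, with f standing in for x86_curcpu; not our actual header):

    /* inline.h */
    static inline int f(void) { return 42; }

    /* a.c */
    #include "inline.h"
    int (*saved)(void) = f;         /* stores a.c's private copy of f */

    /* b.c */
    #include <stdio.h>
    #include "inline.h"
    extern int (*saved)(void);
    int
    main(void)
    {
            /*
             * Each translation unit that takes the address of a static
             * inline function gets its own out-of-line instance, so
             * saved (a.c's copy) and f (b.c's copy) typically compare
             * unequal and this prints "different".
             */
            printf("%s\n", saved == f ? "same" : "different");
            return 0;
    }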
>
>
>
>
> >
> > 3) pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
> > needs to know on which CPU a pmap is loaded *now*:
> > pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
> > to a new pmap, leaving a window where a pmap is still in a CPU's
> > ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
> > by the hypervisor at any time, this window can be large enough to let
> > another CPU free the PTP and reuse it as a normal page.
> >
>
> Makes sense.
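To spell the window out, the old ordering is roughly this (a simplified
sketch of the pmap switch path, not the exact code):

    /* on the CPU switching away from oldpmap */
    atomic_and_32(&oldpmap->pm_cpus, ~ci->ci_cpumask);
    /*
     * Window: the hypervisor may preempt this virtual CPU here.
     * oldpmap is no longer visible in pm_cpus, so another CPU freeing
     * one of oldpmap's PTPs will skip this CPU in xen_kpm_sync() and
     * can recycle the page while ci->ci_kpm_pdir still references it.
     */
    cpu_load_pmap(pmap);            /* ci_kpm_pdir updated only now */
    atomic_or_32(&pmap->pm_cpus, ci->ci_cpumask);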
>
> >
> > To fix 2) I chose to avoid cross-calls and IPIs completely, and instead
> > use a mutex to update all CPUs' ci_kpm_pdir from the local CPU.
> > It's safe because we just need to update the page table page; a tlbflush
> > IPI will happen later. As a side effect, we don't need different code for
> > bootstrap.
> >
> >
> > To fix 3), I introduced a pm_xen_ptp_cpus field which is updated from
> > cpu_load_pmap(), with the ci_kpm_mtx mutex held. Checking it with
> > ci_kpm_mtx held avoids overwriting the wrong pmap's ci_kpm_pdir.
> >
> >
> > While there, I removed the unused pmap_is_active() function
> > and added some more detail to the DIAGNOSTIC panics.
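Regarding 3), the cpu_load_pmap() side then becomes, in sketch form (the
attached patch has the real code):

    mutex_enter(&ci->ci_kpm_mtx);
    /* drop the old pmap while holding the lock... */
    atomic_and_32(&oldpmap->pm_xen_ptp_cpus, ~ci->ci_cpumask);
    /* ...switch ci_kpm_pdir over to the new pmap's PTPs... */
    /* ...and only then advertise the new pmap, still under the lock */
    atomic_or_32(&pmap->pm_xen_ptp_cpus, ci->ci_cpumask);
    mutex_exit(&ci->ci_kpm_mtx);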
> >
>
> Can we have these broken down into separate patch sets?
Not easily; they're touching the same part of the code.
>
> >
> > The attached patch implements this; it has been tested on a 4-CPU domU
> > (on a physical dual-core box, so the race described in 2) is more likely
> > to happen) and I couldn't trigger the panic any more (a build.sh -j8
> > release would never complete for me before).
> >
>
> ...
>
> > @@ -387,68 +360,30 @@ pmap_kpm_sync_xcall(void *arg1, void *ar
> > void
> > xen_kpm_sync(struct pmap *pmap, int index)
> > {
> > - uint64_t where;
> > + CPU_INFO_ITERATOR cii;
> > + struct cpu_info *ci;
> >
> >
> > KASSERT(pmap != NULL);
> >
> >
> > pmap_pte_flush();
> >
> >
> > - if (__predict_false(xpq_cpu != &x86_curcpu)) { /* Too early to xcall */
> > - CPU_INFO_ITERATOR cii;
> > - struct cpu_info *ci;
> > - int s = splvm();
> > - for (CPU_INFO_FOREACH(cii, ci)) {
> > - if (ci == NULL) {
> > - continue;
> > - }
> > - if (pmap == pmap_kernel() ||
> > - ci->ci_cpumask & pmap->pm_cpus) {
> > - pmap_kpm_setpte(ci, pmap, index);
> > - }
> > + for (CPU_INFO_FOREACH(cii, ci)) {
> > + if (ci == NULL) {
> > + continue;
> > }
> > - pmap_pte_flush();
> > - splx(s);
> > - return;
> > - }
> > -
> > - if (pmap == pmap_kernel()) {
> > - where = xc_broadcast(XC_HIGHPRI,
> > - pmap_kpm_sync_xcall, pmap, &index);
> > - xc_wait(where);
> > - } else {
> > - KASSERT(mutex_owned(pmap->pm_lock));
> > - KASSERT(kpreempt_disabled());
> > -
> > - CPU_INFO_ITERATOR cii;
> > - struct cpu_info *ci;
> > - for (CPU_INFO_FOREACH(cii, ci)) {
> > - if (ci == NULL) {
> > - continue;
> > - }
> > - while (ci->ci_cpumask & pmap->pm_cpus) {
> > -#ifdef MULTIPROCESSOR
> > -#define CPU_IS_CURCPU(ci) __predict_false((ci) == curcpu())
> > -#else /* MULTIPROCESSOR */
> > -#define CPU_IS_CURCPU(ci) __predict_true((ci) == curcpu())
> > -#endif /* MULTIPROCESSOR */
> > -#if 0 /* XXX: Race with remote pmap_load() */
> > - if (ci->ci_want_pmapload &&
> > - !CPU_IS_CURCPU(ci)) {
> > - /*
> > - * XXX: make this more cpu
> > - * cycle friendly/co-operate
> > - * with pmap_load()
> > - */
> > - continue;
> > - }
> > -#endif /* 0 */
> > - where = xc_unicast(XC_HIGHPRI, pmap_kpm_sync_xcall,
> > - pmap, &index, ci);
> > - xc_wait(where);
> > - break;
> > - }
> > + if (pmap != pmap_kernel() &&
> > + (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) == 0)
> > + continue;
> > +
> > + /* take the lock and check again */
> > + mutex_enter(&ci->ci_kpm_mtx);
> > + if (pmap == pmap_kernel() ||
> > + (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) != 0) {
> > + pmap_kpm_setpte(ci, pmap, index);
>
> Isn't a TLB shootdown needed after this, to make sure the old PTE is not
> still referenced in the TLB?
The pmap code does that. Plus, I suspect that the hypervisor does TLB
flushes itself, to make sure you can't accidentally write to a PTP page
through stale TLB entries.
>
> Separately, please note that the TLB shootdown code ignores shootdown
> requests when the pmap is being destroyed.
This shouldn't be a problem.
>
> http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap_tlb.c#222
>
> Could stale TLB entries involving PTPs in the ci_kpm_pdir[] be
> responsible for the "other" bug that riz@ is seeing?
I don't think so. The bug is that a freshly freed page is still registered
as a PTP with the hypervisor. It's not a TLB issue (a TLB issue would live
in the CPU's cache; this one is in the hypervisor's data structures). I
don't know yet whether NetBSD freed the page without clearing the PTE
(which would be a NetBSD bug) or whether Xen failed to track it properly
(which would be a hypervisor bug).
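For reference, the hypercall behind the panic in the Subject has this
general shape (a simplified sketch of the Xen mmu_update interface;
pte_ma and npte stand for the PTE's machine address and the new PTE
value):

    mmu_update_t u;

    u.ptr = pte_ma | MMU_NORMAL_PT_UPDATE; /* PTE to write, by machine address */
    u.val = npte;                          /* new PTE value */
    /*
     * Xen validates the pages involved; if it still has a page
     * recorded as a page table while the domain is using it as a
     * normal page, the update is refused and the hypercall returns
     * -22 (-EINVAL).
     */
    if (HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF) < 0)
            panic("HYPERVISOR_mmu_update failed");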
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference