NetBSD-Bugs archive


Re: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on -current MP domU (amd64)



The following reply was made to PR port-xen/45975; it has been noted by GNATS.

From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: "Cherry G. Mathew" <cherry%zyx.in@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, port-xen-maintainer%NetBSD.org@localhost,
        gnats-admin%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost
Subject: Re: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update
 failed, ret: -22 on -current MP domU (amd64)
Date: Tue, 14 Feb 2012 19:19:44 +0100

 On Tue, Feb 14, 2012 at 11:41:02PM +0530, Cherry G. Mathew wrote:
 > Hi Manuel,
 > 
 > 
 > > I've made progress on this; I think I understood what's going on and
 > > I have a fix.
 > >
 > >
 > > The page is indeed still used as a page table; it's still in another CPU's
 > > ci_kpm_pdir. The reason is that xen_kpm_sync() is not working as expected,
 > > leading to races between CPUs.
 > > 1 the check (xpq_cpu != &x86_curcpu) is always false because we
 > >   have different x86_curcpu symbols with different addresses in the kernel.
 > >   Fortunately, all addresses disassemble to the same code.
 > 
 > Ok, this is my mess-up - I was looking for a way to not use %gs/%fs
 > before they are set up via cpu_init_msrs(). The reason it doesn't work,
 > as dh pointed out elsewhere, is because x86_curcpu() is defined in a
 > header as static inline.
 > 
 > 
 > 
 > 
 > >
 > > 3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
 > >   needs to know on which CPU a pmap is loaded *now*:
 > >   pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
 > >   to a new pmap, leaving a window where a pmap is still in a CPU's
 > >   ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
 > >   by the hypervisor at any time, this window can be large enough to
 > >   let another CPU free the PTP and reuse it as a normal page.
 > >
 > 
 > Makes sense.
 > 
 > >
 > > To fix 2) I chose to avoid cross-calls and IPIs completely, and instead
 > > use a mutex to update all CPUs' ci_kpm_pdir from the local CPU.
 > > It's safe because we just need to update the table page; a tlbflush IPI
 > > will happen later. As a side effect, we don't need different code for
 > > bootstrap.
 > >
 > >
 > > To fix 3), I introduced a pm_xen_ptp_cpus which is updated from
 > > cpu_load_pmap(), with the ci_kpm_mtx mutex held. Checking it with
 > > ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
 > >
 > >
 > > While there I removed the unused pmap_is_active() function;
 > > and added some more details to DIAGNOSTIC panics.
 > >
 > 
 > Can we have these broken down into separate patch sets?
 
 Not easy, they're touching the same part of the code.
 
 > 
 > >
 > > The attached patch implements this; it has been tested on a 4-CPU domU
 > > (on a physical dual-core box, so the race described in 2) is more likely
 > > to happen) and I couldn't trigger the panic any more (a build.sh -j8
 > > release never completed for me before).
 > >
 > 
 > ...
 > 
 > > @@ -387,68 +360,30 @@ pmap_kpm_sync_xcall(void *arg1, void *ar
 > >  void
 > >  xen_kpm_sync(struct pmap *pmap, int index)
 > >  {
 > > -      uint64_t where;
 > > +      CPU_INFO_ITERATOR cii;
 > > +      struct cpu_info *ci;
 > >
 > >
 > >        KASSERT(pmap != NULL);
 > >
 > >
 > >        pmap_pte_flush();
 > >
 > >
 > > -      if (__predict_false(xpq_cpu != &x86_curcpu)) { /* Too early to xcall */
 > > -              CPU_INFO_ITERATOR cii;
 > > -              struct cpu_info *ci;
 > > -              int s = splvm();
 > > -              for (CPU_INFO_FOREACH(cii, ci)) {
 > > -                      if (ci == NULL) {
 > > -                              continue;
 > > -                      }
 > > -                      if (pmap == pmap_kernel() ||
 > > -                          ci->ci_cpumask & pmap->pm_cpus) {
 > > -                              pmap_kpm_setpte(ci, pmap, index);
 > > -                      }
 > > +      for (CPU_INFO_FOREACH(cii, ci)) {
 > > +              if (ci == NULL) {
 > > +                      continue;
 > >                }
 > > -              pmap_pte_flush();
 > > -              splx(s);
 > > -              return;
 > > -      }
 > > -
 > > -      if (pmap == pmap_kernel()) {
 > > -              where = xc_broadcast(XC_HIGHPRI,
 > > -                  pmap_kpm_sync_xcall, pmap, &index);
 > > -              xc_wait(where);
 > > -      } else {
 > > -              KASSERT(mutex_owned(pmap->pm_lock));
 > > -              KASSERT(kpreempt_disabled());
 > > -
 > > -              CPU_INFO_ITERATOR cii;
 > > -              struct cpu_info *ci;
 > > -              for (CPU_INFO_FOREACH(cii, ci)) {
 > > -                      if (ci == NULL) {
 > > -                              continue;
 > > -                      }
 > > -                      while (ci->ci_cpumask & pmap->pm_cpus) {
 > > -#ifdef MULTIPROCESSOR
 > > -#define CPU_IS_CURCPU(ci) __predict_false((ci) == curcpu())
 > > -#else /* MULTIPROCESSOR */
 > > -#define CPU_IS_CURCPU(ci) __predict_true((ci) == curcpu())
 > > -#endif /* MULTIPROCESSOR */
 > > -#if 0 /* XXX: Race with remote pmap_load() */
 > > -                              if (ci->ci_want_pmapload &&
 > > -                                  !CPU_IS_CURCPU(ci)) {
 > > -                                      /*
 > > -                                       * XXX: make this more cpu
 > > -                                       *  cycle friendly/co-operate
 > > -                                       *  with pmap_load()
 > > -                                       */
 > > -                                      continue;
 > > -                                  }
 > > -#endif /* 0 */
 > > -                              where = xc_unicast(XC_HIGHPRI, pmap_kpm_sync_xcall,
 > > -                                  pmap, &index, ci);
 > > -                              xc_wait(where);
 > > -                              break;
 > > -                      }
 > > +              if (pmap != pmap_kernel() &&
 > > +                  (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) == 0)
 > > +                      continue;
 > > +
 > > +              /* take the lock and check again */
 > > +              mutex_enter(&ci->ci_kpm_mtx);
 > > +              if (pmap == pmap_kernel() ||
 > > +                  (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) != 0) {
 > > +                      pmap_kpm_setpte(ci, pmap, index);
 > 
 > Isn't a TLB shootdown needed after this, to make sure the old PTE is
 > not still referenced from the TLB?
 
 The pmap code does it. Plus, I suspect that the hypervisor does tlb
 flushes itself, to make sure you can't accidentally write to a ptp page
 using stale TLB entries.
 
 > 
 > Separately, please note that the tlb shootdown code ignores requests to
 > shootdown when the pmap is being destroyed. 
 
 This shouldn't be a problem.
 
 > 
 > http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap_tlb.c#222
 > 
 > I have a hypothesis that stale TLB entries involving PTPs in the
 > ci_kpm_pdir[] may be responsible for the "other" bug that riz@ is seeing
 > ?
 
 I don't think so. The bug is that a new, free page is still registered
 as a PTP with the hypervisor. It's not a TLB issue (this is in the CPU's
 cache, not in the hypervisor's data structures). I don't know yet whether
 it's because NetBSD freed a page without clearing the PTE (that would be
 a NetBSD bug) or because Xen didn't track it properly (that would be a
 hypervisor bug).
 
 -- 
 Manuel Bouyer <bouyer%antioche.eu.org@localhost>
      NetBSD: 26 years of experience will always make the difference
 --
 

