Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on -current MP domU (amd64)

To: bouyer%NetBSD.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, riz%NetBSD.org@localhost
Subject: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on -current MP domU (amd64)
From: Cherry G. Mathew <cherry%zyx.in@localhost>
Date: Tue, 14 Feb 2012 18:15:03 +0000 (UTC)

The following reply was made to PR port-xen/45975; it has been noted by GNATS.

From: Cherry G. Mathew <cherry%zyx.in@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: port-xen-maintainer%NetBSD.org@localhost, gnats-admin%NetBSD.org@localhost, 
netbsd-bugs%NetBSD.org@localhost
Subject: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: 
-22 on -current MP domU (amd64)
Date: Tue, 14 Feb 2012 23:41:02 +0530

 Hi Manuel,
 
 
 > I've made progress on this; I think I understood what's going on and
 > I have a fix.
 >
 >
 > The page is inded still used as a page table; it's still in another CPU's
 > ci_kpm_pdir. The reason is that xen_kpm_sync() is not working as expected,
 > leading to races between CPUs.
 > 1 the check (xpq_cpu != &x86_curcpu) is always false because we
 >   have different x86_curcpu symbols with different addresses in the kernel.
 >   Fortunably, all addresses dissaemble to the same code.
 
 Ok, this is my messup - I was looking for a way to not use %gs/%fs
 before they are setup via cpu_init_msrs(). The reason it doesn't work,
 as dh pointed out elsewhere, is because x86_curcpu() is defined in a
 header as static inline.
 
 
 
 
 >
 > 3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
 >   needs to know on which CPU a pmap is loaded *now*:
 >   pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
 >   to a new pmap, leaving a window where a pmap is still in a CPU's
 >   ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
 >   by the hypervisor at any time, it can be large enough to let another
 >   CPU free the PTP and reuse it as a normal page.
 >
 
 Makes sense.
 
 >
 > To fix 2) I choose to avoid cross-calls and IPIs completely, and instead
 > use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
 > It's safe because we just need to update the table page, a tlbflush IPI will
 > happen later. As a side effect, we don't need a different code for bootstrap.
 >
 >
 > to fix 3), I introduced a pm_xen_ptp_cpus which is updated from
 > cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
 > ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
 >
 >
 > While there I removed the unused pmap_is_active() function;
 > and added some more details to DIAGNOSTIC panics.
 >
 
 Can we add these broken down into separate patch sets ?
 
 >
 > The attached patch implements this; it has been tested on 4-CPUs domU
 > (on a physical dual-core box, so the race described in 2) is more likely to
 > happen) and I couldn't trigger the panic any more (a build.sh -j8 release
 > would never complete before for me).
 >
 
 ...
 
 > @@ -387,68 +360,30 @@ pmap_kpm_sync_xcall(void *arg1, void *ar
 >  void
 >  xen_kpm_sync(struct pmap *pmap, int index)
 >  {
 > -      uint64_t where;
 > +      CPU_INFO_ITERATOR cii;
 > +      struct cpu_info *ci;
 >
 >
 >        KASSERT(pmap != NULL);
 >
 >
 >        pmap_pte_flush();
 >
 >
 > -      if (__predict_false(xpq_cpu != &x86_curcpu)) { /* Too early to xcall 
 > */
 > -              CPU_INFO_ITERATOR cii;
 > -              struct cpu_info *ci;
 > -              int s = splvm();
 > -              for (CPU_INFO_FOREACH(cii, ci)) {
 > -                      if (ci == NULL) {
 > -                              continue;
 > -                      }
 > -                      if (pmap == pmap_kernel() ||
 > -                          ci->ci_cpumask & pmap->pm_cpus) {
 > -                              pmap_kpm_setpte(ci, pmap, index);
 > -                      }
 > +      for (CPU_INFO_FOREACH(cii, ci)) {
 > +              if (ci == NULL) {
 > +                      continue;
 >                }
 > -              pmap_pte_flush();
 > -              splx(s);
 > -              return;
 > -      }
 > -
 > -      if (pmap == pmap_kernel()) {
 > -              where = xc_broadcast(XC_HIGHPRI,
 > -                  pmap_kpm_sync_xcall, pmap, &index);
 > -              xc_wait(where);
 > -      } else {
 > -              KASSERT(mutex_owned(pmap->pm_lock));
 > -              KASSERT(kpreempt_disabled());
 > -
 > -              CPU_INFO_ITERATOR cii;
 > -              struct cpu_info *ci;
 > -              for (CPU_INFO_FOREACH(cii, ci)) {
 > -                      if (ci == NULL) {
 > -                              continue;
 > -                      }
 > -                      while (ci->ci_cpumask & pmap->pm_cpus) {
 > -#ifdef MULTIPROCESSOR
 > -#define CPU_IS_CURCPU(ci) __predict_false((ci) == curcpu())
 > -#else /* MULTIPROCESSOR */
 > -#define CPU_IS_CURCPU(ci) __predict_true((ci) == curcpu())
 > -#endif /* MULTIPROCESSOR */
 > -#if 0 /* XXX: Race with remote pmap_load() */
 > -                              if (ci->ci_want_pmapload &&
 > -                                  !CPU_IS_CURCPU(ci)) {
 > -                                      /*
 > -                                       * XXX: make this more cpu
 > -                                       *  cycle friendly/co-operate
 > -                                       *  with pmap_load()
 > -                                       */
 > -                                      continue;
 > -                                  }
 > -#endif /* 0 */
 > -                              where = xc_unicast(XC_HIGHPRI, 
 > pmap_kpm_sync_xcall,
 > -                                  pmap, &index, ci);
 > -                              xc_wait(where);
 > -                              break;
 > -                      }
 > +              if (pmap != pmap_kernel() &&
 > +                  (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) == 0)
 > +                      continue;
 > +
 > +              /* take the lock and check again */
 > +              mutex_enter(&ci->ci_kpm_mtx);
 > +              if (pmap == pmap_kernel() ||
 > +                  (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) != 0) {
 > +                      pmap_kpm_setpte(ci, pmap, index);
 
 Isn't a tlb shootdown needed after this, to make sure the old pte is not
 referred to in the TLB ?
 
 Separately, please note that the tlb shootdown code ignores requests to
 shootdown when the pmap is being destroyed. 
 
 http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap_tlb.c#222
 
 I have a hypothesis that stale TLB entries involving PTPs in the
 ci_kpm_pdir[] may be responsible for the "other" bug that riz@ is seeing
 ?
 
 Cheers,
 
 Cherry.

Prev by Date: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on -current MP domU (amd64)
Next by Date: Re: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on -current MP domU (amd64)
Previous by Thread: Re: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on -current MP domU (amd64)
Next by Thread: Re: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on -current MP domU (amd64)
Indexes:

Home | Main Index | Thread Index | Old Index