NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on -current MP domU (amd64)



On Tue, Feb 14, 2012 at 11:41:02PM +0530, Cherry G. Mathew wrote:
> Hi Manuel,
> 
> 
> > I've made progress on this; I think I understood what's going on and
> > I have a fix.
> >
> >
> > The page is inded still used as a page table; it's still in another CPU's
> > ci_kpm_pdir. The reason is that xen_kpm_sync() is not working as expected,
> > leading to races between CPUs.
> > 1 the check (xpq_cpu != &x86_curcpu) is always false because we
> >   have different x86_curcpu symbols with different addresses in the kernel.
> >   Fortunably, all addresses dissaemble to the same code.
> 
> Ok, this is my messup - I was looking for a way to not use %gs/%fs
> before they are setup via cpu_init_msrs(). The reason it doesn't work,
> as dh pointed out elsewhere, is because x86_curcpu() is defined in a
> header as static inline.
> 
> 
> 
> 
> >
> > 3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
> >   needs to know on which CPU a pmap is loaded *now*:
> >   pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
> >   to a new pmap, leaving a window where a pmap is still in a CPU's
> >   ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
> >   by the hypervisor at any time, it can be large enough to let another
> >   CPU free the PTP and reuse it as a normal page.
> >
> 
> Makes sense.
> 
> >
> > To fix 2) I choose to avoid cross-calls and IPIs completely, and instead
> > use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
> > It's safe because we just need to update the table page, a tlbflush IPI will
> > happen later. As a side effect, we don't need a different code for 
> > bootstrap.
> >
> >
> > to fix 3), I introduced a pm_xen_ptp_cpus which is updated from
> > cpu_pmap_load(), whith the ci_kpm_mtx mutex held. Checking it with
> > ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
> >
> >
> > While there I removed the unused pmap_is_active() function;
> > and added some more details to DIAGNOSTIC panics.
> >
> 
> Can we add these broken down into separate patch sets ?

Not easy, they're touching the same part of the code.

> 
> >
> > The attached patch implements this; it has been tested on 4-CPUs domU
> > (on a physical dual-core box, so the race described in 2) is more likely to
> > happen) and I couldn't trigger the panic any more (a build.sh -j8 release
> > would never complete before for me).
> >
> 
> ...
> 
> > @@ -387,68 +360,30 @@ pmap_kpm_sync_xcall(void *arg1, void *ar
> >  void
> >  xen_kpm_sync(struct pmap *pmap, int index)
> >  {
> > -      uint64_t where;
> > +      CPU_INFO_ITERATOR cii;
> > +      struct cpu_info *ci;
> >
> >
> >        KASSERT(pmap != NULL);
> >
> >
> >        pmap_pte_flush();
> >
> >
> > -      if (__predict_false(xpq_cpu != &x86_curcpu)) { /* Too early to xcall 
> > */
> > -              CPU_INFO_ITERATOR cii;
> > -              struct cpu_info *ci;
> > -              int s = splvm();
> > -              for (CPU_INFO_FOREACH(cii, ci)) {
> > -                      if (ci == NULL) {
> > -                              continue;
> > -                      }
> > -                      if (pmap == pmap_kernel() ||
> > -                          ci->ci_cpumask & pmap->pm_cpus) {
> > -                              pmap_kpm_setpte(ci, pmap, index);
> > -                      }
> > +      for (CPU_INFO_FOREACH(cii, ci)) {
> > +              if (ci == NULL) {
> > +                      continue;
> >                }
> > -              pmap_pte_flush();
> > -              splx(s);
> > -              return;
> > -      }
> > -
> > -      if (pmap == pmap_kernel()) {
> > -              where = xc_broadcast(XC_HIGHPRI,
> > -                  pmap_kpm_sync_xcall, pmap, &index);
> > -              xc_wait(where);
> > -      } else {
> > -              KASSERT(mutex_owned(pmap->pm_lock));
> > -              KASSERT(kpreempt_disabled());
> > -
> > -              CPU_INFO_ITERATOR cii;
> > -              struct cpu_info *ci;
> > -              for (CPU_INFO_FOREACH(cii, ci)) {
> > -                      if (ci == NULL) {
> > -                              continue;
> > -                      }
> > -                      while (ci->ci_cpumask & pmap->pm_cpus) {
> > -#ifdef MULTIPROCESSOR
> > -#define CPU_IS_CURCPU(ci) __predict_false((ci) == curcpu())
> > -#else /* MULTIPROCESSOR */
> > -#define CPU_IS_CURCPU(ci) __predict_true((ci) == curcpu())
> > -#endif /* MULTIPROCESSOR */
> > -#if 0 /* XXX: Race with remote pmap_load() */
> > -                              if (ci->ci_want_pmapload &&
> > -                                  !CPU_IS_CURCPU(ci)) {
> > -                                      /*
> > -                                       * XXX: make this more cpu
> > -                                       *  cycle friendly/co-operate
> > -                                       *  with pmap_load()
> > -                                       */
> > -                                      continue;
> > -                                  }
> > -#endif /* 0 */
> > -                              where = xc_unicast(XC_HIGHPRI, 
> > pmap_kpm_sync_xcall,
> > -                                  pmap, &index, ci);
> > -                              xc_wait(where);
> > -                              break;
> > -                      }
> > +              if (pmap != pmap_kernel() &&
> > +                  (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) == 0)
> > +                      continue;
> > +
> > +              /* take the lock and check again */
> > +              mutex_enter(&ci->ci_kpm_mtx);
> > +              if (pmap == pmap_kernel() ||
> > +                  (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) != 0) {
> > +                      pmap_kpm_setpte(ci, pmap, index);
> 
> Isn't a tlb shootdown needed after this, to make sure the old pte is not
> referred to in the TLB ?

The pmap code does it. Plus, I suspect that the hypervisor does tlb
flushes itself, to make sure you can't write accidentally to a ptp page
using stale TLB entries.

> 
> Separately, please note that the tlb shootdown code ignores requests to
> shootdown when the pmap is being destroyed. 

this shouldn't be a problem.

> 
> http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap_tlb.c#222
> 
> I have a hypothesis that stale TLB entries involving PTPs in the
> ci_kpm_pdir[] may be responsible for the "other" bug that riz@ is seeing
> ?

I don't think so. The bug is that a new, free page is still registered
as PTP to the hypervisor. It's not a TLB issue (this is in the CPU's cache,
not in the hypervisor's data structure). I don't know yet if it's because
NetBSD freed a page without clearing the PTE (this would be a NetBSD
bug) or if Xen didn't track it properly (this would be a hypervisor bug).

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference
--


Home | Main Index | Thread Index | Old Index