tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: CVS commit: src/sys/arch/xen

On 29.08.2011 15:03, Cherry G. Mathew wrote:
> I'm a bit confused now - are we assuming that the pmap lock protects the
> (pte update op queue-push(es) + pmap_pte_flush()) as a single atomic
> operation (ie; no invlpg/tlbflush or out-of-band-read can occur between
> the update(s) and the pmap_pte_flush()) ?
> If so, I think I've slightly misunderstood the scope of the mmu queue
> design - I assumed that the queue is longer-lived, and the sync points
> (for the queue flush) can span across pmap locking - a sort of lazy pte
> update, with the queue being flushed at out-of-band or in-band read
> time ( I guess that won't work though - how does one know when the
> hardware walks the page table ?) . It seems that the queue is meant for
> pte updates in loops, for eg:, quickly followed by a flush. Is this
> correct ? 

IMHO, this should be regarded this way, and nothing else. x86 and xen
pmap(9) share a lot in common, low level operations (like these: PT/PD
editing, TLB flushes, MMU updates...) should not leak through this

Said differently, the way Xen handles MMU must remain transparent to
pmap, except in a few places. Remember, although we are adding a level
of indirection through hypervisor, the calls should remain close to
native x86 semantic.

However, for convenience, Xen offers "multiseats" MMU hypercalls, where
you can schedule more than one op at a time to avoid unneeded context
switches, like in the pmap_alloc_level() function. This is our
problematic part.

> If so, there's just one hazard afaict - the synchronous TLB_FLUSH_MULTI
> could beat the race between the queue update and the queue flush via
> pmap_tlb_shootnow() (see pmap_tlb.c on the cherry-xenmp branch, and *if*
> other CPUs reload their TLBs before the flush, they'll have stale info.

What stale info? If a VCPU queue isn't empty while another VCPU has
scheduled a TLB_FLUSH_MULTI op, the stale content of the queue will
eventually be depleted later after a pmap_pte_flush() followed by a
local invalidation. This is the part that should be carefully reviewed.

For clarity, I would expect a queue to be empty when leaving pmap (e.g.
when releasing its lock). Assertions might be necessary to catch all
corner cases.

> So the important question (rmind@ ?) is, is pmap_tlb_shootnow()
> guaranteed to be always called with the pmap lock held ?
> In real life, I removed the global xpq_queue_lock() and the pmap was
> falling apart. So a bit of debugging ahead.

Did you keep the same queue for all CPUs? If that was the case, yes,
this is a call for trouble.

Anyway, we can't put a giant lock around xpq_queue. This doesn't make
any sense in a MP system, especially for operations that are frequently
called and still may take a while to complete. Just imagine: all CPUs
waiting for one to finish its TLB flush before they can edit their PD/PT
again... ouch.

Jean-Yves Migeon

Home | Main Index | Thread Index | Old Index