tech-kern archive
Re: CVS commit: src/sys/arch/xen
On Mon, Aug 29, 2011 at 12:07:05PM +0200, Cherry G. Mathew wrote:
> JM> On Mon, 22 Aug 2011 12:47:40 +0200, Manuel Bouyer wrote:
> >>> This is slightly more complicated than it appears. Some of the
> >>> "ops" in a per-cpu queue may have ordering dependencies with
> >>> other cpu queues, and I think this would be hard to express
> >>> trivially. (An example would be a pte update on one queue, and
> >>> a read of the same pte on another queue - these cases are
> >>> quite analogous, although completely unrelated.)
> >>
>
> Hi,
>
> So I had a better look at this - implemented per-cpu queues and messed
> with locking a bit:
>
>
> >> reads don't go through the xpq queue, do they ?
>
> JM> Nope, PTE are directly obtained from the recursive mappings
> JM> (vtopte/kvtopte).
>
> Let's call this "out of band" reads. But see below for "in-band" reads.
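For illustration, such an out-of-band read through the recursive mapping
looks roughly like the sketch below (PTE_BASE and pl1_i() are placeholder
names here, not the exact identifiers from the NetBSD headers):

/*
 * Sketch of an "out of band" PTE read.  With one page-directory slot
 * mapped onto itself, every PTE in the address space shows up in a
 * fixed virtual window starting at PTE_BASE, so the kernel can read a
 * PTE directly: no hypercall and no xpq queue involved.
 */
static inline pt_entry_t
xpmap_pte_peek(vaddr_t va)
{
	volatile pt_entry_t *pte = PTE_BASE + pl1_i(va);	/* i.e. vtopte(va) */

	return *pte;
}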
>
> JM> Content is "obviously" only writable by the hypervisor (so it can
> JM> keep control of its mappings alone).
> >> I think this is similar to a tlb flush but the other way round, I
> >> guess we could use a IPI for this too.
>
> JM> IIRC that's what the current native x86 code does: it uses an
> JM> IPI to signal other processors that a shootdown is necessary.
>
> Xen's TLB_FLUSH operation is synchronous, and doesn't require an IPI
> (within the domain), which makes the queue ordering even more important
> (to make sure that stale ptes are not reloaded before the per-cpu queue
> has made progress). Yes, we can implement a roundabout IPI-driven
> queueflush + tlbflush scheme (described below), but that would be
> performance-sensitive, and the basic issue won't go away, imho.
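To make that roundabout scheme concrete, here is a rough sketch of what an
IPI-driven queueflush + tlbflush could look like (the IPI broadcast helper
and the overall sequence are assumptions for illustration, not existing
code):

/*
 * Hypothetical IPI-driven "drain every queue, then flush" sequence.
 * Each CPU drains its own per-cpu xpq queue in the IPI handler; only
 * once all queues are empty does the initiator issue the Xen TLB
 * flush, so a stale pte cannot be reloaded ahead of a queued update.
 */
static volatile unsigned int xpq_sync_pending;

static void
xpq_sync_ipi(void)
{
	xpq_flush_queue();			/* drain this CPU's MMU ops */
	atomic_dec_uint(&xpq_sync_pending);	/* tell the initiator we're done */
}

static void
xpq_flush_all_queues_and_tlbs(void)
{
	xpq_sync_pending = ncpu - 1;
	xpq_flush_queue();			/* drain our own queue first */
	xen_broadcast_ipi(xpq_sync_ipi);	/* assumed IPI primitive */
	while (xpq_sync_pending > 0)
		x86_pause();			/* wait until every queue is drained */
	xpq_queue_tlb_flush();			/* only now is the TLB flush safe */
	xpq_flush_queue();
}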
>
> Let's stick to the xpq ops for a second, ignoring "out-of-band" reads
> (for which I agree that your assertion, that locking needs to be done at
> a higher level, holds true).
>
> The question here, really, is: what are the global ordering requirements
> of per-cpu memory op queues, given the following basic "ops":
>
> i) write memory (via MMU_NORMAL_PT_UPDATE, MMU_MACHPHYS_UPDATE)
> ii) read memory
> via:
> MMUEXT_PIN_L1_TABLE
> MMUEXT_PIN_L2_TABLE
> MMUEXT_PIN_L3_TABLE
> MMUEXT_PIN_L4_TABLE
> MMUEXT_UNPIN_TABLE
This is when adding/removing a page table from a pmap. When this occurs,
the pmap is locked, isn't it ?
> MMUEXT_NEW_BASEPTR
> MMUEXT_NEW_USER_BASEPTR
This is a context switch.
> MMUEXT_TLB_FLUSH_LOCAL
> MMUEXT_INVLPG_LOCAL
> MMUEXT_TLB_FLUSH_MULTI
> MMUEXT_INVLPG_MULTI
> MMUEXT_TLB_FLUSH_ALL
> MMUEXT_INVLPG_ALL
> MMUEXT_FLUSH_CACHE
This may, or may not, cause a read. This usually happens after updating
the pmap, and I guess this also happens with the pmap locked (I have not
carefully checked).
So couldn't we just use the pmap lock for this ?
I suspect the same lock will also be needed for out-of-band reads at some
point (right now they're protected by splvm()).
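To sketch that idea (illustrative only; whether every caller really holds
the pmap lock at these points is exactly the open question, and the helper
usage is simplified):

/*
 * Sketch: both queued updates and out-of-band reads run under the
 * pmap's lock, and the per-cpu queue is drained before the lock is
 * released, so a reader holding the lock never observes a pte with
 * queued but not yet applied updates.
 */
static void
pmap_pte_set_locked(struct pmap *pmap, vaddr_t va, pt_entry_t npte)
{
	KASSERT(mutex_owned(&pmap->pm_lock));
	xpq_queue_pte_update(xpmap_ptetomach(vtopte(va)), npte);
	xpq_flush_queue();	/* applied before pm_lock can be dropped */
}

static pt_entry_t
pmap_pte_get_locked(struct pmap *pmap, vaddr_t va)
{
	KASSERT(mutex_owned(&pmap->pm_lock));
	return *vtopte(va);	/* out-of-band read, consistent under the lock */
}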
> [...]
>
> >>> I'm thinking that it might be easier and more justifiable to
> >>> nuke the current queue scheme and implement shadow page tables,
> >>> which would fit more naturally and efficiently with CAS pte
> >>> updates, etc.
> >>
> >> I'm not sure this would completely fix the issue: with shadow
> >> page tables you can't use a CAS to ensure atomic operation with
> >> the hardware TLB, as this is, precisely, a shadow PT and not the
> >> one used by the hardware.
>
> Definitely worth looking into, imho. I'm not very comfortable with
> the queue-based scheme for MP.
>
> The CAS doesn't provide any guarantees with respect to the TLB on native
> h/w, afaict.
What makes you think so ? I think the hw TLB also does CAS to update
referenced and dirty bits in the PTE, otherwise we couldn't rely on these
bits; this would be bad especially for the dirty bit.
> If you do a CAS pte update, and the update succeeded, it's a
> good idea to invalidate + shootdown anyway (even on baremetal).
Yes, of course inval is needed after updating the PTE. But using a true CAS
is important to get the referenced and dirty bits right.
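For reference, the kind of CAS loop meant here looks roughly like this (a
sketch only; it assumes a pt_entry_t that fits in a u_long, so a PAE build
would need atomic_cas_64, and it uses NetBSD's PG_U/PG_M names for the
referenced/dirty bits):

#include <sys/atomic.h>

/*
 * Sketch of a CAS-based pte update.  The MMU may set the referenced
 * (PG_U) or dirty (PG_M) bit between our read and our write; the CAS
 * then fails and we retry with the fresh value, so neither bit is
 * lost.  A plain store could silently drop a concurrent dirty bit.
 */
static pt_entry_t
pmap_pte_cas_update(volatile pt_entry_t *ptep, pt_entry_t npte)
{
	pt_entry_t opte;

	do {
		opte = *ptep;
	} while (atomic_cas_ulong((volatile u_long *)ptep, opte,
	    npte | (opte & (PG_U | PG_M))) != opte);

	return opte;	/* caller still needs invlpg + shootdown */
}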
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference