Re: Some questions about x86/xen pmap

To: Andrew Doran <ad%netbsd.org@localhost>, Manuel Bouyer <bouyer%antioche.eu.org@localhost>
Subject: Re: Some questions about x86/xen pmap
From: Jean-Yves Migeon <jeanyves.migeon%free.fr@localhost>
Date: Thu, 07 May 2009 01:03:19 +0200

Andrew Doran wrote:

Hi,

Sorry for taking so long to reply.


No problem, I have been short on time myself lately ;) Thanks for your reply.

Just to set the background, to make Xen save/restore/migrate work under a NetBSD domU, I have to do some slight modifications to the pmap, for two reasons:
- Xen (and xentools) do not handle alternative recursive mappings (the APDP entries in our PD).
I'm planning to eliminate these for native x86. Instead of using the alternate
space, native x86 will temporarily switch the pmap onto the CPU. I discussed
this briefly with Manuel and he suggested that there may be problems with
this approach for xen. My patch leaves it unchanged for xen.

Alright, then I am adding Manuel to the loop, as I would like to know why removing the APDP would be a problem compared to temporarily switching pmaps. Is it a performance or a design issue, NetBSD-specific or anything else?

using a MFN, the thread has to acquire a reader lock, and release it when it has finished with it. The suspension thread acquires a writer lock, so that every other thread that wants to manipulate MFNs has to wait for the suspension to release the writer, which is done during resume when it is deemed safe.
I think I have seen a patch where you did this?

It is (somewhat) found in my branch, jym-xensuspend, where we can see how I am using rwlock(9) to protect these parts.

It will change though, as it depends on the APDP issue, and on the way the x86 pmap is structured. I'd like to get the locking right before having to re-design it due to modifications that could affect suspend/resume in unexpected places.

There used to be a global RW
lock in the pmap and we worked quite hard to get rid of it. Global RW locks
are bad news. Even if there is no contention on the lock itself, they cause
lots of cache line ping-ponging between CPUs and lots of writebacks to main
memory. For heavyweight stuff that's OK but the pmap really gets hammered
on at runtime. As an alternative I would suggest locking all existing pmaps
and also preventing new pmaps from being created while your code runs. Would
that work for you?


Looks like it would. I will have a look.
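
For readers following the thread, the alternative suggested above (lock every existing pmap and block creation of new ones while suspend runs) could be sketched roughly as below. This is a userland analogue only: it uses POSIX mutexes in place of the kernel's mutex(9), and struct toy_pmap, toy_pmap_register(), toy_pmap_freeze_all() and toy_pmap_thaw_all() are hypothetical names invented for illustration, not functions from the NetBSD pmap.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a pmap: only the fields this sketch needs. */
struct toy_pmap {
	pthread_mutex_t  lock;	/* per-pmap lock */
	struct toy_pmap *next;	/* link in the global list */
};

/* Global list of pmaps. Creation must hold list_lock to insert. */
static pthread_mutex_t  list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct toy_pmap *all_pmaps = NULL;

/* Creation path: blocks for as long as a freeze holds list_lock. */
static void
toy_pmap_register(struct toy_pmap *pm)
{
	pthread_mutex_init(&pm->lock, NULL);
	pthread_mutex_lock(&list_lock);
	pm->next = all_pmaps;
	all_pmaps = pm;
	pthread_mutex_unlock(&list_lock);
}

/*
 * Suspend side: stop creation first, then lock every pmap in list
 * order. Returns the number of pmaps locked. list_lock stays held.
 */
static size_t
toy_pmap_freeze_all(void)
{
	size_t n = 0;

	pthread_mutex_lock(&list_lock);
	for (struct toy_pmap *pm = all_pmaps; pm != NULL; pm = pm->next) {
		pthread_mutex_lock(&pm->lock);
		n++;
	}
	return n;
}

/* Resume side: unlock every pmap, then allow creation again. */
static size_t
toy_pmap_thaw_all(void)
{
	size_t n = 0;

	for (struct toy_pmap *pm = all_pmaps; pm != NULL; pm = pm->next) {
		int rc = pthread_mutex_unlock(&pm->lock);
		assert(rc == 0);
		(void)rc;
		n++;
	}
	pthread_mutex_unlock(&list_lock);
	return n;
}
```

The point of the design is that the hot path only ever touches its own per-pmap lock, so no globally shared lock word bounces between CPUs at runtime; the cost is paid once, by the suspend code. A real implementation would also have to respect the kernel's lock ordering between the list lock and the pmap locks, which this sketch does not address.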

- in pmap_unmap_ptes() (used to unlock the pmap used earlier by pmap_map_ptes()), the APDP is only unmapped when MULTIPROCESSOR is defined. While it is now the case for GENERIC, we do not have SMP support inside port-xen currently. Why is it necessary to empty the entry for MP and not for UP?
From memory this is a NULL assignment? I don't know.

My point was to move the APDP unmap code outside of the MULTIPROCESSOR #define. From my current understanding of the pmap, APDP entries are also used with UP, but we do not need the global TLB flush to keep it in sync with other CPUs in case of MP.

The same goes for the pmap_pte_flush() call, which is a no-op under traditional x86 (under Xen, it is used to batch updates to the queue used for MMU operations).

The initial goal was to expand my locking to include protection for the pmap_map_ptes/pmap_unmap_ptes calls, so that APDP entries are cleared by pmap_unmap_ptes(). That way, I could avoid the pmap iterations during save, as I could guarantee that all APDP entries are left unmapped.

- what would be the fastest way to iterate through pmaps? My current code is inspired by the one found in pmap_extract(), but it requires mapping/unmapping each pmap to check for non-0 APDP slots. As they are not that common, this may be costly when the pmap list is lengthy. I tried to access the PD of each pmap through their pm_pdir member, but in some cases it ends badly with a fault exception.
I believe that there is a global list of pmaps. Have a look at
pmap_growkernel().


Correct, thanks :)
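
As an illustration of the list walk discussed above, here is a minimal sketch that visits every pmap on a global list and clears any non-zero APDP slot. Everything in it is a toy: struct toy_pd_pmap, TOY_PD_ENTRIES, TOY_APDP_SLOT and toy_clear_apdp() are hypothetical names, the page directory is a plain array, and the sketch assumes each PD is directly readable and writable. That last assumption is exactly the part that faults in practice, and under Xen the update would have to go through the MMU operation queue rather than a direct store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PD_ENTRIES	8	/* real PDs are far larger; illustrative only */
#define TOY_APDP_SLOT	6	/* hypothetical index of the alternate slot */

/* Hypothetical pmap: a tiny page directory plus a list link. */
struct toy_pd_pmap {
	uint64_t	    pdir[TOY_PD_ENTRIES];
	struct toy_pd_pmap *next;
};

/*
 * Walk the list once and zero every non-empty APDP slot.
 * Returns how many pmaps had a live APDP entry.
 */
static size_t
toy_clear_apdp(struct toy_pd_pmap *head)
{
	size_t cleared = 0;

	assert(TOY_APDP_SLOT < TOY_PD_ENTRIES);
	for (struct toy_pd_pmap *pm = head; pm != NULL; pm = pm->next) {
		if (pm->pdir[TOY_APDP_SLOT] != 0) {
			pm->pdir[TOY_APDP_SLOT] = 0;
			cleared++;
		}
	}
	return cleared;
}
```

Since live APDP entries are expected to be rare, the walk is mostly a read of one slot per pmap; the expensive part in the real pmap is making each PD accessible, which this sketch deliberately leaves out.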

- rwlock(9) clearly states that callers must not /recursively/ acquire read locks. Is a thread allowed to acquire a reader lock multiple times, provided it is not done recursively?
In short: no. You can do it with rw_tryenter(), but note that rw_tryenter()
can't be called in a loop. Why: even if you already hold the lock as a reader, it may
be "closed" for new read holds, because a writer is waiting on the lock and
the RW lock code has decided that the writer should get priority. It's to
prevent "starvation".

Good to know. I was not sure about the ownership + counter use of our rwlock(9) implementation.
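
To make the starvation argument above concrete, here is a small single-threaded model of a writer-priority RW lock. It is not the rwlock(9) implementation; struct toy_rwlock and the toy_* functions are hypothetical names that only model the behaviour described: once a writer is waiting, the lock is closed to new read holds, even for a thread that already holds it as a reader.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a writer-priority RW lock (no real blocking). */
struct toy_rwlock {
	unsigned readers;	 /* current number of read holds */
	bool	 writer_waiting; /* a writer is queued behind the readers */
	bool	 write_held;	 /* a writer owns the lock */
};

/* Non-blocking read acquire: refused while a writer holds or waits. */
static bool
toy_try_read(struct toy_rwlock *l)
{
	if (l->write_held || l->writer_waiting)
		return false;
	l->readers++;
	return true;
}

/* Drop one read hold. */
static void
toy_read_exit(struct toy_rwlock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

/*
 * A writer shows up: it takes the lock if it is free, otherwise it
 * is marked as waiting, which closes the lock to new readers.
 */
static bool
toy_writer_arrive(struct toy_rwlock *l)
{
	if (l->write_held)
		return false;
	if (l->readers > 0) {
		l->writer_waiting = true;
		return false;
	}
	l->writer_waiting = false;
	l->write_held = true;
	return true;
}
```

In this model, a thread that holds one read hold and then spins on toy_try_read() after a writer has arrived will spin forever: the writer waits for readers to reach zero, and the reader never releases because it is waiting for a second hold. That is the loop the reply warns against.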


--
Jean-Yves Migeon
jeanyves.migeon%free.fr@localhost
