Port-arm archive


aarch64 pmap tweaks for review


I made some tweaks to the aarch64 pmap based on lessons learned in the x86
pmap recently.  They reduce memory consumption and speed up things like
fork/exec/exit/UBC a little:


Approximate times for kernel build on RK3399 with all 6 cores running at

	before	1354.07s real  6092.55s user  1591.35s system
	after	1307.90s real  6026.60s user  1432.83s system

Description below.  Comments welcome.


- Fix a lock order reversal via pmap_page_protect().

- Align struct pmap to a cache line boundary.

- Move the wired/resident count update out of the PMAPCOUNTERS ifdef in one
  place.  The counts shouldn't depend on that option.

- Make sure pmap is always locked when updating stats.  Then atomics are no
  longer needed to update stats.

- Remove unneeded traversal of PV lists in pmap_enter_pv().

- Shrink struct vm_page from 136 to 128 bytes (cache line sized - reduce
  cache misses).

- Shrink struct pv_entry from 48 to 32 bytes (power of 2 sized - reduce cache

- Embed a pv_entry in each vm_page.  That means PV entries don't need to be
  allocated for pages that are mapped only once, for example private
  anonymous memory / COW pages / most UBC mappings.  Dynamic PV entries are
  then used only for stuff like shared libraries and shared memory.

- Comment out PMAPCOUNTERS option because global counters are costly on MP
  due to cache coherency overhead.  The problem gets exponentially worse as
  more CPUs are added.

- Use the pmap as a source of pre-zeroed pages for the VM system.

- Do unlocked checks in pmap_page_protect() and pmap_clear_modify(): avoid
  taking the lock if the page has no mappings.
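
To make the embedded pv_entry and the unlocked check more concrete, here is
a minimal userland sketch.  It is not the actual patch: every type and
function name in it (pmap_sketch, pv_sketch, page_sketch, page_enter_pv,
page_protect_sketch) is hypothetical, the "lock" is just a counter, and a
real kernel would also have to argue why racing with a concurrent
pmap_enter() is harmless for the unlocked test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified types; these are not the NetBSD definitions. */
struct pmap_sketch { int unused; };

struct pv_sketch {
	struct pv_sketch *pv_next;	/* further mappings of the same page */
	struct pmap_sketch *pv_pmap;	/* owning pmap; NULL = slot unused */
	uintptr_t pv_va;		/* virtual address of the mapping */
};

struct page_sketch {
	struct pv_sketch pg_pv;		/* embedded entry for the first mapping */
	unsigned pg_lock_taken;		/* lock stand-in: counts acquisitions */
};

/*
 * Record a mapping.  Returns 0 if the embedded slot was used, 1 if a
 * dynamic entry had to be allocated, -1 if the allocation failed.
 * A real pmap would hold the page's PV lock around this.
 */
int
page_enter_pv(struct page_sketch *pg, struct pmap_sketch *pm, uintptr_t va)
{
	struct pv_sketch *pv;

	if (pg->pg_pv.pv_pmap == NULL) {
		/* First mapping: no allocation needed. */
		pg->pg_pv.pv_pmap = pm;
		pg->pg_pv.pv_va = va;
		return 0;
	}

	/* Page is already mapped once: chain a dynamic entry. */
	pv = malloc(sizeof(*pv));
	if (pv == NULL)
		return -1;
	pv->pv_pmap = pm;
	pv->pv_va = va;
	pv->pv_next = pg->pg_pv.pv_next;
	pg->pg_pv.pv_next = pv;
	return 1;
}

/*
 * Drop all mappings of a page.  Returns false, without taking the lock,
 * when the page has none.  In this sketch dynamic entries only exist
 * while the embedded slot is in use, so one pointer test is enough.
 */
bool
page_protect_sketch(struct page_sketch *pg)
{
	struct pv_sketch *pv, *next;

	if (pg->pg_pv.pv_pmap == NULL)	/* unlocked check */
		return false;

	pg->pg_lock_taken++;		/* "lock" */
	for (pv = pg->pg_pv.pv_next; pv != NULL; pv = next) {
		next = pv->pv_next;
		free(pv);
	}
	pg->pg_pv.pv_next = NULL;
	pg->pg_pv.pv_pmap = NULL;
	pg->pg_pv.pv_va = 0;
	return true;			/* "unlock" would go here */
}
```

In this sketch a page that is mapped only once never touches the allocator,
and a page with no mappings never touches the lock.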

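The alignment and size items can be checked at compile time.  The following
is only a sketch under assumptions: SKETCH_LINE takes the 128-byte figure
from the vm_page item above as the unit to align to, and the struct and its
members are hypothetical stand-ins, not the real struct pmap.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define SKETCH_LINE 128	/* assumed cache line / coherency unit, in bytes */

/* Hypothetical stand-in for a per-pmap structure with its stats. */
struct pmap_aligned_sketch {
	alignas(SKETCH_LINE) unsigned long resident_count;
	unsigned long wired_count;
};

/* Fail the build if the layout drifts from what was intended. */
static_assert(alignof(struct pmap_aligned_sketch) == SKETCH_LINE,
    "struct must start on a line boundary");
static_assert(sizeof(struct pmap_aligned_sketch) % SKETCH_LINE == 0,
    "struct must not share its last line with a neighbour");
```

Checks like these cost nothing at run time and catch a struct quietly
growing past its intended size when a member is added later.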