aarch64 pmap tweaks for review
Hi,
I made some tweaks to the aarch64 pmap based on lessons learned recently in
the x86 pmap. They reduce memory consumption and speed up things like
fork/exec/exit/UBC a little:
http://www.netbsd.org/~ad/2020/aarch64.diff
Approximate times for kernel build on RK3399 with all 6 cores running at
600MHz:
before 1354.07s real 6092.55s user 1591.35s system
after 1307.90s real 6026.60s user 1432.83s system
Description below. Comments welcome.
Thanks,
Andrew
- Fix a lock order reversal via pmap_page_protect().
- Align struct pmap to a cache line boundary (sketch 1 below).
- Move the wired/resident count update out of the PMAPCOUNTERS ifdef in one
place; the update shouldn't depend on that option.
- Make sure the pmap is always locked when updating stats, so atomics are no
longer needed for the updates (sketch 2 below).
- Remove unneeded traversal of PV lists in pmap_enter_pv().
- Shrink struct vm_page from 136 to 128 bytes (cache line sized - reduce
cache misses).
- Shrink struct pv_entry from 48 to 32 bytes (power of 2 sized - reduce cache
misses). Sketch 3 below pins both sizes with compile-time asserts.
- Embed a pv_entry in each vm_page (sketch 4 below). That means PV entries
don't need to be allocated for pages that are mapped only once, for example
private anonymous memory / COW pages / most UBC mappings. Dynamic PV entries
are then used only for stuff like shared libraries and shared memory.
- Comment out the PMAPCOUNTERS option, because global counters are costly on
MP: every update drags the counter's cache line over to the updating CPU,
and that coherency overhead keeps growing as more CPUs are added.
- Use the pmap as a source of pre-zeroed pages for the VM system.
- Do unlocked checks in pmap_page_protect() and pmap_clear_modify(): avoid
taking the lock if the page has no mappings (sketch 5 below).
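
Sketch 1: the struct alignment in miniature. __aligned() and COHERENCY_UNIT
are the standard NetBSD macros; the fields shown here are stand-ins, not the
real struct pmap layout.

	#include <sys/param.h>		/* COHERENCY_UNIT: cache line size */
	#include <sys/mutex.h>		/* kmutex_t */

	struct pmap {
		kmutex_t	pm_lock;	/* serializes pmap updates */
		long		pm_resident;	/* stand-in stats fields */
		long		pm_wired;
		/* ... the rest of the pmap ... */
	} __aligned(COHERENCY_UNIT);	/* each pmap starts on its own line */

Provided the allocator honours the alignment, two pmaps never share a cache
line, so updates to one can't false-share with a neighbour.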
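Sketch 2: stats updates under the pmap lock. The assertion style is standard
kernel practice; the field and function names are invented for illustration.

	#include <sys/atomic.h>		/* atomic_add_long() */
	#include <sys/mutex.h>		/* mutex_owned() */
	#include <sys/systm.h>		/* KASSERT() */

	static inline void
	pmap_stats_resident_inc(struct pmap *pm)
	{
		/* Before: atomics, because some paths updated the
		 * stats without holding the pmap lock. */
		/* atomic_add_long(&pm->pm_resident, 1); */

		/* After: every updater holds the pmap lock, so a plain
		 * increment works and the atomic RMW cost disappears. */
		KASSERT(mutex_owned(&pm->pm_lock));
		pm->pm_resident++;
	}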
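Sketch 3: compile-time guards along these lines keep the new sizes honest, so
the structures can't silently grow past their cache-friendly targets
(numbers from the diff).

	#include <sys/cdefs.h>		/* __CTASSERT() */

	__CTASSERT(sizeof(struct vm_page) == 128);	/* cache line sized */
	__CTASSERT(sizeof(struct pv_entry) == 32);	/* power of 2 */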
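Sketch 4: the embedded PV entry. The field names are invented; the idea is
that the first mapping of a page lives in the vm_page's MD area itself, so
pv_entry allocation only happens from the second mapping onward.

	#include <sys/types.h>		/* vaddr_t */
	#include <sys/queue.h>		/* LIST_* */

	struct pv_entry {
		LIST_ENTRY(pv_entry)	pv_link;	/* extra mappings */
		struct pmap		*pv_pmap;	/* mapping's pmap */
		vaddr_t			pv_va;		/* mapping's VA */
	};

	struct vm_page_md {
		struct pv_entry		mdpg_first;	/* embedded 1st mapping */
		LIST_HEAD(, pv_entry)	mdpg_pvlist;	/* dynamic, 2nd+ */
	};

With this layout the common case of a singly mapped page never touches the
pv_entry allocator at all.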
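Sketch 5: the unlocked emptiness check, assuming the layout from sketch 4.
pmap_pv_lock()/pmap_pv_unlock() are stand-ins for whatever lock covers the
PV list, and the real function does more; the point is the early return.

	#include <uvm/uvm.h>		/* struct vm_page, vm_prot_t */

	void
	pmap_page_protect(struct vm_page *pg, vm_prot_t prot)
	{
		struct vm_page_md *md = VM_PAGE_TO_MD(pg);

		if (prot == VM_PROT_NONE) {
			/* Unlocked peek: a page with no mappings has
			 * nothing to remove, so don't bounce the lock's
			 * cache line around just to learn that. A mapping
			 * added after this check is the same race as one
			 * added just after we return, which callers must
			 * already tolerate. */
			if (md->mdpg_first.pv_pmap == NULL)
				return;
			pmap_pv_lock(md);
			/* ... remove and invalidate all mappings ... */
			pmap_pv_unlock(md);
		}
	}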