Subject: port-i386/37193: x86 pmap concurrency strategy could use improvement
To: None <email@example.com>, <firstname.lastname@example.org>
From: None <email@example.com>
Date: 10/24/2007 08:10:01
>Synopsis: x86 pmap concurrency strategy could use improvement
>Arrival-Date: Wed Oct 24 08:10:01 +0000 2007
>Originator: Andrew Doran
>Organization: The NetBSD Project
>Description:
This applies to the vmlocking branch, but the same (unused) strategy
is in HEAD.
- It should be possible to use atomics to adjust the pmap reference
count, instead of adjusting the count under lock. The uvm_object
is passed into MI code in one or two places. Need to check any
reference count changes made by those calls.
- The per-CPU pv cache generates contention and adds an extra 4 bytes
to pv_head. Since the allocations are now done without holding pmap
locks, it should be possible to change it to use pool_cache instead.
- pmap_main_lock is 'cache hot' because it is taken on nearly every
pmap operation; it would be nice to get rid of it.
- pmap_test_attrs/pmap_clear_attrs acquire too many locks. The global
pmap_main_lock is write locked by these routines, which causes
contention on pmap_main_lock and back-pressure on other locks like
uvm_pageqlock. Also, they walk all the pmaps that have the
target page mapped and lock/unlock each one. pmap_page_remove has a
similar problem but it does not appear to be called much.
- The splay tree should probably be replaced by a red-black tree.
Lookup operations modify a splay tree, and that likely causes
false sharing of cache lines between CPUs.
>How-To-Repeat:
Code inspection / testing.
>Fix:
See above. For pmap_test_attrs/pmap_clear_attrs:
1. Lock the current pmap in order to make use of its APTE space.
   Provide a per-CPU APTE space and disable preemption to use it.
2. Lock the pv_head. This prevents the referenced pmaps from
   disappearing while we operate on them.
3. Walk each pmap, mapping whatever is necessary into the APTE
   space, and perform the operation.