Subject: port-i386/37193: x86 pmap concurrency strategy could use improvement
To: None <port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,>
From: None <ad@netbsd.org>
List: netbsd-bugs
Date: 10/24/2007 08:10:01
>Number: 37193
>Category: port-i386
>Synopsis: x86 pmap concurrency strategy could use improvement
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-i386-maintainer
>State: open
>Class: change-request
>Submitter-Id: net
>Arrival-Date: Wed Oct 24 08:10:01 +0000 2007
>Originator: Andrew Doran
>Release: 4.99.34
>Organization:
The NetBSD Project
>Environment:
N/A
>Description:
This applies to the vmlocking branch but the same (unused) strategy
is in HEAD.
- It should be possible to use atomics to adjust the pmap reference
count, instead of adjusting the count under lock. The uvm_object
is passed into MI code in one or two places. Need to check any
reference count changes made by those calls.
- The per-CPU pv cache generates contention and adds an extra 4 bytes
to pv_head. Since the allocations are now done without holding pmap
locks, it should be possible to change it to use pool_cache instead.
- pmap_main_lock is 'cache hot' due to it being taken on nearly every
pmap operation; it would be nice to get rid of it.
- pmap_test_attrs/pmap_clear_attrs acquire too many locks. The global
pmap_main_lock is write locked by these routines, so it causes
contention on pmap_main_lock and back-pressure on other locks like
uvm_pageqlock. Also, they walk all the pmaps that have the
target page mapped and lock/unlock them. pmap_page_remove has a
similar problem but it does not appear to be called much.
- The splay tree should probably be replaced by a red-black tree.
Lookups modify a splay tree, and that is likely to cause false
sharing of cache lines between CPUs.
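As a rough illustration of the first item, a lock-free reference
count could look like the sketch below. It is a user-space stand-in
written with C11 <stdatomic.h>, not the pmap code itself: toy_pmap,
toy_pmap_reference and toy_pmap_release are hypothetical names, and
a kernel version would use the kernel's own atomic primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reference count embedded in a pmap. */
struct toy_pmap {
	atomic_uint refs;
};

/* A new object starts with one reference, held by its creator. */
static void
toy_pmap_init(struct toy_pmap *pm)
{
	atomic_init(&pm->refs, 1);
}

/* Take a reference; the caller must already hold one. */
static void
toy_pmap_reference(struct toy_pmap *pm)
{
	atomic_fetch_add_explicit(&pm->refs, 1, memory_order_relaxed);
}

/*
 * Drop a reference.  Returns true if this was the last one, in which
 * case the caller is responsible for tearing the object down.
 */
static bool
toy_pmap_release(struct toy_pmap *pm)
{
	unsigned old;

	old = atomic_fetch_sub_explicit(&pm->refs, 1, memory_order_acq_rel);
	assert(old > 0);	/* catch a release without a reference */
	return old == 1;
}
```

The decrement uses acquire/release ordering so that whichever thread
drops the last reference sees all earlier stores to the object
before destroying it. Any MI code that adjusts the count through the
uvm_object would have to follow the same protocol.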
>How-To-Repeat:
Code inspection / testing.
>Fix:
See above. For pmap_test_attrs/pmap_clear_attrs:
1. Lock the current pmap in order to make use of its APTE space.
-or-
Provide a per-CPU APTE space and disable preemption to use it.
2. Lock the pv_head. This prevents the referenced pmaps from
disappearing while we operate on them.
3. Walk each pmap, mapping whatever is necessary into
the APTE space, and do the operation.
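The steps above might be sketched roughly as follows. This is a
user-space toy under loud assumptions, not a patch: the toy_* types
and functions are hypothetical, the pv_head lock is modelled as a
simple spin lock, and the "pte" pointer stands in for a PTE that has
already been made reachable through the APTE space (steps 1 and 3).
TLB shootdown after clearing bits is omitted entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PG_U 0x020u	/* x86 PTE "accessed" bit */
#define TOY_PG_M 0x040u	/* x86 PTE "dirty" bit */

/* One mapping of the page; hypothetical layout. */
struct toy_pv_entry {
	struct toy_pv_entry *next;
	_Atomic uint32_t *pte;	/* assumed already mapped and reachable */
};

/* Per-page list head with its own lock (step 2). */
struct toy_pv_head {
	atomic_flag lock;
	struct toy_pv_entry *first;
};

static void
toy_pv_head_init(struct toy_pv_head *pvh)
{
	atomic_flag_clear(&pvh->lock);
	pvh->first = NULL;
}

static void
toy_pv_lock(struct toy_pv_head *pvh)
{
	while (atomic_flag_test_and_set_explicit(&pvh->lock,
	    memory_order_acquire))
		continue;	/* spin until the holder releases */
}

static void
toy_pv_unlock(struct toy_pv_head *pvh)
{
	atomic_flag_clear_explicit(&pvh->lock, memory_order_release);
}

/* Return true if any mapping of the page has one of 'bits' set. */
static bool
toy_test_attrs(struct toy_pv_head *pvh, uint32_t bits)
{
	struct toy_pv_entry *pv;
	bool found = false;

	assert(bits != 0);
	toy_pv_lock(pvh);
	for (pv = pvh->first; pv != NULL && !found; pv = pv->next)
		found = (atomic_load(pv->pte) & bits) != 0;
	toy_pv_unlock(pvh);
	return found;
}

/* Clear 'bits' in every mapping; return true if any was set. */
static bool
toy_clear_attrs(struct toy_pv_head *pvh, uint32_t bits)
{
	struct toy_pv_entry *pv;
	bool found = false;

	assert(bits != 0);
	toy_pv_lock(pvh);
	for (pv = pvh->first; pv != NULL; pv = pv->next) {
		/* Atomic so a concurrent hardware-style update is not lost. */
		uint32_t old = atomic_fetch_and(pv->pte, ~bits);
		if ((old & bits) != 0)
			found = true;
	}
	toy_pv_unlock(pvh);
	return found;
}
```

The point of the sketch is that only the pv_head lock is taken:
there is no global lock and no per-pmap lock inside the loop.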