Patch: significantly reduce TLB shootdowns in x86 pmap
The patch below does a few different things.
- It adds a TLBSTATS option for developers that adds event counters to track
why TLB shootdowns are occurring.
- It defers freeing ptp pages for a dying pmap until pmap_destroy(), unless
the system is low on memory. This allows us to skip one whole-tlb flush
during process exit (woo hoo).
- Instead of issuing shootdowns at time of request, they are batched in a
per-CPU structure and issued when pmap_update() is called. Batching allows
us to collapse many operations into one. With the patch, no more than 6
single 'invlpg' instructions will be issued per batch. Any more will be
collapsed into a single whole-tlb flush. TLB refills from L1 or L2 cache are
relatively cheap but single page invalidations are expensive. This is a big
win for fork(), where tens to thousands of invlpg calls are replaced with a
single tlbflush() when mappings are being set up to fault for COW.
- It eliminates the APTE space, which is expensive because its use causes
many global TLB shootdowns. The APTE space is mainly used during exit()
when we are destroying the caller's pmap. Instead of using the APTE space,
the patch temporarily switches the target pmap onto the CPU. Without the
patch, the APTE space is used during exit when we switch to the kernel's
pmap in order to tear down the caller's pmap. However, due to lazy pmap
switching the dying pmap is ~always still installed on the CPU, so with the
patch no pmap switch or shootdowns are required during exit().
- It disables lockless pmap_extract() for user pmaps. It is not currently
safe for this to be lockless due to races with pmap_page_remove(). It
could potentially be made safe with the UVM patch I posted to tech-kern.
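
To illustrate the batching described above, here is a minimal sketch in C. It is not the code from the patch: the names tlb_batch, tlb_counters, tlb_batch_add() and tlb_batch_update() are hypothetical, and counters stand in for the real invlpg and whole-tlb flush instructions so that the logic can be run in userland. The only detail taken from the description is the limit of 6 single-page invalidations before collapsing to one full flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Limit from the description: at most 6 single-page invalidations. */
#define TLB_BATCH_MAX 6

/* Hypothetical per-CPU batch of pending invalidations. */
struct tlb_batch {
	uintptr_t va[TLB_BATCH_MAX];	/* pages queued for invalidation */
	size_t    count;		/* number of valid entries in va[] */
	bool      flush_all;		/* batch overflowed: flush everything */
};

/* Counters standing in for the real instructions (assumption). */
struct tlb_counters {
	unsigned invlpg_calls;
	unsigned full_flushes;
};

/* Queue one page; nothing is invalidated yet. */
static void
tlb_batch_add(struct tlb_batch *b, uintptr_t va)
{
	assert(b != NULL);
	if (b->flush_all)
		return;			/* already collapsed, nothing to record */
	if (b->count == TLB_BATCH_MAX) {
		b->flush_all = true;	/* a 7th request collapses the batch */
		b->count = 0;
		return;
	}
	b->va[b->count++] = va;
}

/* Issue the batch, as would happen from pmap_update(), then reset it. */
static void
tlb_batch_update(struct tlb_batch *b, struct tlb_counters *c)
{
	assert(b != NULL && c != NULL);
	if (b->flush_all) {
		c->full_flushes++;	/* one whole-tlb flush */
	} else {
		for (size_t i = 0; i < b->count; i++)
			c->invlpg_calls++;	/* real code would invalidate b->va[i] */
	}
	b->count = 0;
	b->flush_all = false;
}
```

In this sketch, queueing up to 6 pages results in that many single-page invalidations at update time, while queueing any larger number (for example the thousands of pages touched by fork()) results in exactly one full flush.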
The result is 50% fewer TLB shootdown interrupts during build.sh on an 8-cpu
system, and ?? million fewer calls to invlpg() during the same (I have not
counted these). It seems to reduce the time for build.sh by about 1%.
- I get an error on one system during autoconf. I think this is probably
due to multiple invlpg() calls being collapsed into a tlbflush() instead
of a tlbflushg(). I have not investigated yet.
- I have only enabled the APTE trick for the native i386 and amd64 ports.
I do not know if it can easily be done for xen; xen/amd64 looks tricky.
Manuel, do you have time to take a look at some point if I commit it?
- The patch is against 5.0 and may not cleanly apply to -current.