Subject: misc. MMU: NUMA, big pages, idle zero, ring buffers, PAE, ...
From: Edward B. DREGER
List: tech-kern
Date: 04/29/2006 05:23:37
Greetings all,

[ Apologies for earlier cross-posting.  No idea why I sent to -perform
instead of -kern. ]

Note:  Some of these ramblings are ia32/amd64-focused, but the principles
are general.

While exploring PAE last November, I wound up browsing through uvm/pmap
code.  I've had a few additional ideas, and would like some [more]
feedback.

/* Big Pages */

Begin by allocating memory in 2M/4M strides (the former iff PAE, the
latter iff !PAE).  Track wasted 4K [sub]pages.  Split big pages into
smaller ones when needed, but avoid using page tables until then.
Coalesce smaller pages into bigger ones when free RAM permits.

Rationale:  Hopefully less MMU management overhead and fewer TLB misses
while memory is plentiful.  Fall back to standard behavior when needed.
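
A minimal sketch of the bookkeeping this implies, assuming the 2M (PAE)
case and assuming the allocator knows which 4K subpages the owner was
actually handed.  The names (struct bigpage, bp_touch, bp_wasted,
bp_split) are hypothetical and do not come from uvm/pmap; building the
page table on split is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SUBPAGE_SIZE 4096u
#define BIGPAGE_SIZE (2u * 1024u * 1024u)           /* 2M, the PAE case */
#define NSUBPAGES    (BIGPAGE_SIZE / SUBPAGE_SIZE)  /* 512 */

/* Hypothetical per-big-page record; not an existing uvm/pmap structure. */
struct bigpage {
	uint64_t touched[NSUBPAGES / 64]; /* one bit per 4K subpage in use */
	unsigned ntouched;                /* number of bits set above */
	bool     split;                   /* true once page tables back it */
};

static void
bp_init(struct bigpage *bp)
{
	memset(bp, 0, sizeof(*bp));
}

/* Record that the owner actually uses subpage idx. */
static void
bp_touch(struct bigpage *bp, unsigned idx)
{
	assert(idx < NSUBPAGES);
	uint64_t bit = (uint64_t)1 << (idx % 64);
	if ((bp->touched[idx / 64] & bit) == 0) {
		bp->touched[idx / 64] |= bit;
		bp->ntouched++;
	}
}

/* Subpages held by the big mapping but never used: the "waste". */
static unsigned
bp_wasted(const struct bigpage *bp)
{
	return bp->split ? 0 : NSUBPAGES - bp->ntouched;
}

/*
 * Fall back to 4K mappings.  Returns how many subpages become available
 * to other users.  A real implementation would build the page table here.
 */
static unsigned
bp_split(struct bigpage *bp)
{
	unsigned reclaimed = bp_wasted(bp);
	bp->split = true;
	return reclaimed;
}
```

Under memory pressure, the big pages with the highest bp_wasted() count
would be the natural first candidates for bp_split().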

/* Fractional/Checkpointed Zeroing of Big Pages */

I whipped up a crude program that performed 1000 bzero(3) iterations on
a 2M chunk.  Each iteration took about 9 ms on a PIII/500 notebook.
Should the idle-zero loop zero a fraction of a big page?  What about
dedicating a PDE slot (Intel terminology) to the zero code?

Rationale:  Several milliseconds -- although certainly less than 9 ms
on a faster CPU and with optimized zeroing code -- is an eternity.
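
One way to picture the fractional approach: keep a checkpoint per big
page and zero one slice per pass.  If cost scales linearly (an
assumption), 9 ms per 2M works out to roughly 18 us per 4K slice, or
about 0.28 ms per 64K slice.  The sketch below is userland C with
hypothetical names (struct zero_ckpt, zero_step); it is not the
existing idle-zero code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical checkpoint state for one big page being zeroed. */
struct zero_ckpt {
	unsigned char *base;  /* start of the big page */
	size_t         size;  /* total bytes, e.g. 2M */
	size_t         done;  /* bytes already zeroed */
};

/*
 * Zero at most `slice` more bytes, then return so the caller can check
 * for runnable work.  Returns 1 once the whole page has been zeroed.
 */
static int
zero_step(struct zero_ckpt *z, size_t slice)
{
	assert(z->done <= z->size);
	size_t left = z->size - z->done;
	size_t n = left < slice ? left : slice;

	memset(z->base + z->done, 0, n);
	z->done += n;
	return z->done == z->size;
}
```

An idle loop would call zero_step() once per iteration and only move
the page to the zeroed list when it returns 1.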

/* Per-CPU Management */

Both of the above, as well as free page lists, should be per-CPU.  Can a
CPU be forced to work with the memory closest to it?  (Consider NUMA
performance, such as multiprocessor Opteron systems.)

Rationale:  Reduced inter-CPU contention.  Assuming processes have
significant CPU affinity, using "nearby" memory would reduce both
interconnect bandwidth use and memory access time.
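
A rough sketch of per-CPU free lists with a remote fallback.  Everything
here is hypothetical (NCPU, struct page, page_alloc, page_free), locking
is omitted, and the fallback simply walks CPUs in index order; a real
NUMA-aware version would presumably walk them in order of distance.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU 4  /* assumed CPU count for this sketch */

struct page {
	struct page *next;
	int          home_cpu;  /* CPU whose memory this page is nearest to */
};

struct cpu_freelist {
	struct page *head;
	size_t       count;
};

static struct cpu_freelist freelists[NCPU];

/* Return a page to the free list of its home CPU. */
static void
page_free(struct page *pg)
{
	assert(pg->home_cpu >= 0 && pg->home_cpu < NCPU);
	struct cpu_freelist *fl = &freelists[pg->home_cpu];

	pg->next = fl->head;
	fl->head = pg;
	fl->count++;
}

/* Prefer the caller's own list; fall back to other CPUs only if empty. */
static struct page *
page_alloc(int cpu)
{
	for (int i = 0; i < NCPU; i++) {
		struct cpu_freelist *fl = &freelists[(cpu + i) % NCPU];

		if (fl->head != NULL) {
			struct page *pg = fl->head;

			fl->head = pg->next;
			fl->count--;
			pg->next = NULL;
			return pg;
		}
	}
	return NULL;  /* every list is empty */
}
```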

/* Ring Buffers */

A native mapping for ring buffers would be nice:

  	u_char *ringbuf = mmapringbuf(..., MAP_RINGBUF, ...) ;

would allocate a memory region from <base> to <base + 2 * size>, where

  	base
  	base + size

would both be aliased to the same physical pages.  Voila!  Simple,
linear ringbuf where the MMU handles wraparound at the region's end.

Rationale:  It's just so much easier this way. :-)

/* mremap() */

Zero-copy allocation-size changes are convenient.

Rationale:  Obvious.

Everquick Internet
A division of Brotsman & Dreger, Inc.
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita