Subject: New ARM32 pmap code
To: None <port-arm@netbsd.org>
From: Steve Woodford <scw@wasabisystems.com>
List: port-arm
Date: 04/17/2003 00:26:36
Hi folks,

Some of you may be aware that Wasabi Systems has been working on a new,
improved ARM32 pmap module. I'm pleased to say that Wasabi Systems has
decided to contribute the new pmap back to the NetBSD community.

To that end, I plan on merging the changes into the NetBSD tree some time
over this coming weekend.

Note: This will not automatically replace the existing pmap; the new pmap
will be selectable at kernel config time using "options ARM32_PMAP_NEW".
Without this option, the old pmap will still be used. This is because the
new code requires changes to each port's initarm() to properly bootstrap
the pmap.
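
Once a port's initarm() has been converted, selecting the new pmap is a
one-line kernel config change (purely illustrative; the rest of the
config file stays as it is):

    # Select the new pmap module; leave this out to keep the old pmap.
    options         ARM32_PMAP_NEW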

A few arm32 ports (shark and some evbarm boards) have already been
converted, but the remaining ports will have to be converted by people
with access to hardware.

There are two levels of new pmap "conversion". The first, and simplest,
requires only a few changes to initarm() and uses a similar kernel virtual
memory layout to the old pmap. The second level is enabled by dropping
"options ARM32_NEW_VM_LAYOUT" into the kernel config file, in addition to
ARM32_PMAP_NEW. This requires a bunch more changes to initarm(), the net
effect of which is to completely replace the legacy KVM spaces defined by
{A,}PTE_BASE, KERNEL_BASE, and KERNEL_VM_BASE, with just a single linear
kernel virtual memory space starting at KERNEL_BASE.
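
A fully-converted port would thus end up with both options in its kernel
config (again just a sketch; ARM32_NEW_VM_LAYOUT is only meaningful on
top of ARM32_PMAP_NEW):

    # Level one: the new pmap itself (needs the initarm() changes).
    options         ARM32_PMAP_NEW
    # Level two: the single linear KVM space starting at KERNEL_BASE.
    options         ARM32_NEW_VM_LAYOUT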

I have converted one evbarm port to this new layout so far, as an example
for other ports.

There are a couple of reasons for having two options, the main one being
that it makes the transition simpler for whoever does the leg work: start
with the new pmap, then go for the new VM layout once things stabilise.
In the long run, *all* ports should switch to both the new pmap and the
new VM layout.

So, what are the main features/benefits of the new pmap?

 - It allows L1 descriptor tables to be shared efficiently between
   multiple processes. A typical "maxusers 32" kernel, where NPROC is set
   to 532, requires 35 L1s. A "maxusers 2" kernel runs quite happily
   with just 4 L1s. This completely solves the problem of running out
   of contiguous physical memory for allocating new L1s at runtime on a
   busy system. (See the back-of-the-envelope sketch after this list.)

 - Much improved cache/TLB management "smarts". This change ripples
   out to encompass the low-level context switch code, which is also
   much smarter about when to flush the cache/TLB, and when not to.

 - Faster allocation of L2 page tables and associated metadata, thanks
   in part to the recent pool_cache changes contributed to NetBSD by
   Wasabi Systems a week or so ago.

 - Faster VM space teardown due to accurate reference tracking of L2
   page tables.

 - Better/faster cache-alias tracking.
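
To put the L1 numbers in the first bullet into perspective, here's the
rough arithmetic as a throwaway userland C program. It's only a
back-of-the-envelope sketch: the NPROC formula is the stock NetBSD one
and the 16KB (4-page) L1 size is an architectural constant; neither is
specific to the new pmap code.

#include <stdio.h>

/* Standard NetBSD sizing: NPROC = 20 + 16 * maxusers. */
#define NPROC(maxusers)  (20 + 16 * (maxusers))

/* Each ARM L1 translation table is 16KB: 4 contiguous, aligned 4KB pages. */
#define L1_SIZE          (16 * 1024)

int
main(void)
{
        /* Old pmap: roughly one private L1 per process. */
        printf("maxusers 32: NPROC = %d, one L1 each = %d KB contiguous\n",
            NPROC(32), NPROC(32) * L1_SIZE / 1024);

        /* New pmap: processes share L1s; 35 suffice in practice. */
        printf("new pmap:    35 shared L1s = %d KB\n", 35 * L1_SIZE / 1024);

        return 0;
}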

I'd also like to think the code is a bit more understandable. ;-)

Cheers, Steve

-- 

Wasabi Systems Inc. - The NetBSD Company - http://www.wasabisystems.com/