Subject: Re: pmap tweaking (was Re: Things to work on)
To: Jason R Thorpe <thorpej@zembu.com>
From: Chris Gilbert <chris@paradox.demon.co.uk>
List: port-arm
Date: 06/01/2001 11:29:42
On Friday 01 June 2001  1:09 am, Jason R Thorpe wrote:
> On Fri, Jun 01, 2001 at 12:40:15AM +0100, Chris Gilbert wrote:
>  > I've had a play with the pmap stuff.  I've managed to get my head around
>  > how most of it works, and what the terminology is.  I've managed to get
>  > the ./lat_proc fork  (from lmbench) time down to half on a cats.  down
>  > to 8000 microseconds from 16000 microseconds, my PII 333 gets 1000
>  > microseconds. Note that this is by no means accurate or a real world
>  > test, just a show that something is better (note that this is with a
>  > PMAP_DEBUG on, DIAGNOSTICS on, and a whole pile of other debug stuff
>  > on).  I'll retest with something more realistic at some point, eg time
>  > make configure for gmake.
>
> Sweet.

I think the results above may be bogus.  Timing make configure (having 
already done make patch) shows:
20.4u 45.2s 1:25.39 76.9% 0+0k 40+180io 360pf+0w
with the new pmap, and:
20.1u 46.9s 1:25.20 78.7% 0+0k 40+206io 360pf+0w
with the old.  So it looks like I've lowered the load slightly, and not 
noticeably increased the time taken (can anyone here tell .19 of a second? :)  
I've still got more tweaking to do though.  Note that I now do some proper 
locking (which may account for some of the slowdown), and I've also 
implemented pmap_map_ptes (based on the version from Richard), which helps 
speed some areas up :)

>  > One reason for the above is that pmap_release currently scans the whole
>  > of the L1 table for entries, however by using a uvm_object and
>  > allocating the L2 tables and associating them with the uvm object, you
>  > can walk the uvm_object's list and free them off :)
>
> Actually, when a pmap is destroyed, you can assume there are no mappings
> in it at all.  Feel free to add some sort of assertion to this effect. 
> This is documented in pmap(9) (have you read through that document?  If so,
> please feel free to ask me questions and point out places where it can be
> clarified.)

Sadly there are.  I've just taken the idea from the i386 pmap of using a 
uvm_object, and allocating pages with that object, so that I can free them 
off faster.  It doesn't actually surprise me, as pmap_remove doesn't free 
any PTPs.  (The code that would is in pmap_pte_addref/delref, and that's 
inside an #if 0 at the moment.)  I've not actually played with pmap_remove 
yet, so perhaps it's about time I did :)
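
To illustrate the difference between scanning the L1 table and walking a 
list, here is a toy user-space sketch.  It is not the actual arm32 pmap 
code: struct toy_pmap, struct l2_page and the toy_* functions are made-up 
names, malloc stands in for allocating pages against the uvm_object, and it 
pretends there is one L2 table per L1 slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define L1_ENTRIES 4096u   /* 16K L1 table, 4 bytes per entry */

/* Hypothetical stand-in for an L2 table page owned by the pmap's object. */
struct l2_page {
    struct l2_page *next;      /* next page on the object's list */
    unsigned        l1_index;  /* L1 slot this L2 table is hooked into */
};

struct toy_pmap {
    struct l2_page *l1[L1_ENTRIES]; /* NULL means no L2 table in that slot */
    struct l2_page *pages;          /* list of every L2 table allocated */
    size_t          npages;
};

/* Allocate an L2 table for an L1 slot and record it on the list. */
static int toy_alloc_l2(struct toy_pmap *pm, unsigned idx)
{
    struct l2_page *pg;

    assert(idx < L1_ENTRIES);
    if (pm->l1[idx] != NULL)
        return 0;               /* slot already has an L2 table */
    pg = malloc(sizeof(*pg));
    if (pg == NULL)
        return -1;
    pg->l1_index = idx;
    pg->next = pm->pages;
    pm->pages = pg;
    pm->l1[idx] = pg;
    pm->npages++;
    return 0;
}

/* Scan every L1 slot looking for tables to free; returns slots visited. */
static size_t toy_release_scan(struct toy_pmap *pm)
{
    size_t visited = 0;

    for (unsigned i = 0; i < L1_ENTRIES; i++) {
        visited++;
        free(pm->l1[i]);        /* free(NULL) is a no-op */
        pm->l1[i] = NULL;
    }
    pm->pages = NULL;
    pm->npages = 0;
    return visited;
}

/* Walk only the list of allocated tables; returns pages visited. */
static size_t toy_release_walk(struct toy_pmap *pm)
{
    size_t visited = 0;
    struct l2_page *pg, *next;

    for (pg = pm->pages; pg != NULL; pg = next) {
        next = pg->next;
        pm->l1[pg->l1_index] = NULL;
        free(pg);
        visited++;
    }
    pm->pages = NULL;
    pm->npages = 0;
    return visited;
}
```

With only a couple of L2 tables allocated, toy_release_walk touches just 
those entries, while toy_release_scan always visits all 4096 slots.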

>  > However I'm having some issues with it that I need to look into.
>  >
>  > I'm also playing with some other tweaks to the code, but I really need
>  > to sit down again and work on making some clear notes, eg how to tell a
>  > page is wired, modified, referenced etc.
>  >
>  > I also implemented a pool for the pmap objects, so that we don't keep
>  > allocating and freeing them.
>
> You might want to consider some sort of home-grown cache of L1 tables.
> IIRC, the ARM uses a 16K table, so they can be tricky to allocate as
> memory gets fragmented.  The cache would also allow you to keep the L1
> tables "constructed", i.e. the kernel L1 PTEs always valid in the top
> N slots of the table (the ARM is a single-address-space system, right?)

Currently we use a list of l1pt's which have that 16k attached to them. 
This is allocated at startup.  We use them until we run out, after which we 
allocate from available RAM, which as you say fragments.  However, I'm not 
sure if it's the VM space that can't find the 16k or the real memory; I need 
to look into it further.  I might even consider dropping the base of the 
kernel down to 0xE0000000 to gain oodles more space for kernel VM.  However, 
before doing that I'd want to implement pmap_growkernel.
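
For reference, the kind of "constructed" L1 cache being suggested could look 
roughly like the sketch below.  It is user-space toy code under stated 
assumptions, not what the tree does: struct l1pt here, l1pt_get, l1pt_put 
and KERNEL_L1_START are names invented for illustration, the kernel is 
assumed to start at 0xF0000000 (1MB per L1 slot), and calloc stands in for 
finding a 16K-aligned, physically contiguous 16K region.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define L1_ENTRIES      4096u   /* 16K table, 4 bytes per entry */
#define KERNEL_L1_START 0xF00u  /* assumed: 0xF0000000 >> 20 */

struct l1pt {
    struct l1pt *next;               /* link on the free cache */
    uint32_t     table[L1_ENTRIES];  /* real tables need 16K alignment */
};

static struct l1pt *l1_cache;        /* constructed tables ready for reuse */
static unsigned     l1_cache_count;

/*
 * Get an L1 table.  A cached table already has the kernel entries in
 * place and the user slots cleared, so nothing needs copying.
 */
static struct l1pt *l1pt_get(const uint32_t *kernel_l1)
{
    struct l1pt *pt = l1_cache;

    if (pt != NULL) {
        l1_cache = pt->next;
        l1_cache_count--;
        pt->next = NULL;
        return pt;
    }

    /* Cache empty: allocate a zeroed table and construct it. */
    pt = calloc(1, sizeof(*pt));
    if (pt == NULL)
        return NULL;
    memcpy(&pt->table[KERNEL_L1_START], &kernel_l1[KERNEL_L1_START],
        (L1_ENTRIES - KERNEL_L1_START) * sizeof(uint32_t));
    return pt;
}

/* Return a table to the cache, keeping its kernel entries intact. */
static void l1pt_put(struct l1pt *pt)
{
    assert(pt != NULL);
    /* Only the user slots are cleared; the kernel slots stay valid. */
    memset(pt->table, 0, KERNEL_L1_START * sizeof(uint32_t));
    pt->next = l1_cache;
    l1_cache = pt;
    l1_cache_count++;
}
```

One catch with keeping tables constructed: anything that changes the kernel 
L1 entries (pmap_growkernel, for instance) would have to update the cached 
tables as well as the live ones.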

Cheers,
Chris