Crazy idea #1: uvm_fpageqlock

To: tech-kern%netbsd.org@localhost
Subject: Crazy idea #1: uvm_fpageqlock
From: Andrew Doran <ad%netbsd.org@localhost>
Date: Sat, 2 Apr 2011 13:10:02 +0000

Ok, so relating to the freelist, I have an idea for uvm_fpageqlock.  This
would be a mid-term solution that does not take direct account of NUMA and
so forth.

Looking at all of the accounting data that decides how the system
behaves, such as uvmexp.free and so on (currently protected by
uvm_fpageqlock), we don't really need locked access when reading it
because we continually check and recheck those values.  So the system
will eventually sort itself out even if we get a bad picture of things from
time to time.  All that really matters is that we maintain the values
consistently, using atomics or locks.  So uvm_fpageqlock's not needed there.
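
To make the unlocked-read idea concrete, here is a rough userland sketch
using C11 atomics.  It is not kernel code: free_pages stands in for
uvmexp.free, and free_count_adjust(), free_count_peek() and
need_pagedaemon() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the uvmexp.free counter. */
static atomic_int free_pages;

/* Writers keep the value consistent with an atomic read-modify-write. */
static void
free_count_adjust(int delta)
{
	atomic_fetch_add_explicit(&free_pages, delta, memory_order_relaxed);
}

/* Readers take an unlocked snapshot; it may be stale once it is used. */
static int
free_count_peek(void)
{
	return atomic_load_explicit(&free_pages, memory_order_relaxed);
}

/*
 * Hypothetical policy check.  Callers are assumed to re-evaluate this
 * regularly, so a stale answer only delays the decision.
 */
static int
need_pagedaemon(int freemin)
{
	return free_count_peek() < freemin;
}
```

The point is only that updates are atomic while reads need no lock; the
real counters and thresholds may well be handled differently.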

uvm_fpageqlock also protects one set of data that is not directly related to
free memory and that's the pagedaemon wakeup and pageout state.  We could
put in a new low-traffic mutex there, say uvm_pageout_lock.  (Incidentally
it looks like updates to uvmexp.pdpending might be racy, just noting it here
so I remember.)

So that leaves only the page allocator needing uvm_fpageqlock.

Currently the page allocator maintains per-CPU and global lists of free
pages.  Pages reside on both lists.  We prefer to hand out pages from the
per-CPU list: on machines with physically indexed caches, it's likely that
we'll have lines from those pages in cache on the local CPU, which is
beneficial when it comes time to fill the pages.  All lists are protected by
uvm_fpageqlock.

What I propose is to maintain the global list of pages pretty much as is,
but to split off the per-CPU lists so that they would have their own locks. 
With the exception of uvm_pglistalloc() they would only be accessed by the
local CPU, effectively functioning as a local cache of free pages.

When allocating, we'd try the local list first and then try the global list
if no pages are available.  When freeing, we'd always put back to the local
list.  When allocating from and freeing back to this local list of free
pages we would not touch any global state, even uvmexp.free.  The idlezero
code would only consider the local list of pages.
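
The allocate and free paths described above might look roughly like the
following single-threaded sketch.  All names here (struct cpu_cache,
page_alloc(), page_free() and so on) are hypothetical, page coloring is
left out, and locking is only indicated in comments.  It also assumes that
the global list's count plays the role of uvmexp.free, i.e. that pages
sitting in a local cache are not counted there.

```c
#include <assert.h>
#include <stddef.h>

struct page {
	struct page *next;
};

struct pagelist {
	struct page *head;
	int count;
};

/* Global free list; its count is a stand-in for uvmexp.free. */
static struct pagelist global_free;

/* Per-CPU cache of free pages; would carry its own lock. */
struct cpu_cache {
	struct pagelist free;
};

static void
pl_push(struct pagelist *pl, struct page *pg)
{
	pg->next = pl->head;
	pl->head = pg;
	pl->count++;
}

static struct page *
pl_pop(struct pagelist *pl)
{
	struct page *pg = pl->head;

	if (pg != NULL) {
		pl->head = pg->next;
		pg->next = NULL;
		pl->count--;
	}
	return pg;
}

static struct page *
page_alloc(struct cpu_cache *cc)
{
	struct page *pg;

	/* Fast path: local cache only, no global state touched. */
	pg = pl_pop(&cc->free);
	if (pg != NULL)
		return pg;

	/* Slow path: would take uvm_fpageqlock around this. */
	return pl_pop(&global_free);
}

static void
page_free(struct cpu_cache *cc, struct page *pg)
{
	/* Always back to the local cache; global accounting is untouched. */
	pl_push(&cc->free, pg);
}
```

Note that a page freed on one CPU stays invisible to the others until it
is handed back to the global list.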

At some point we'd need to redistribute those cached pages back to the
global list of free pages.  This would be a fairly neat and tidy operation
as all we'd need to do is go through the color buckets, chop the list of
pages out and splice it into the head of the global list, then do some
accounting updates (e.g. uvmexp.free).
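
A sketch of that hand-back step, assuming each color bucket keeps head and
tail pointers so a whole chain can be spliced in constant time.  NCOLORS,
struct bucket, cache_drain() and free_total (standing in for uvmexp.free)
are all made up for illustration, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define NCOLORS 4	/* hypothetical number of page colors */

struct page {
	struct page *next;
};

struct bucket {
	struct page *head;
	struct page *tail;
	int count;
};

struct freelist {
	struct bucket color[NCOLORS];
};

static int free_total;	/* stand-in for uvmexp.free */

static void
bucket_push(struct bucket *b, struct page *pg)
{
	pg->next = b->head;
	if (b->head == NULL)
		b->tail = pg;
	b->head = pg;
	b->count++;
}

/* Move every cached page to the head of the matching global bucket. */
static int
cache_drain(struct freelist *local, struct freelist *global)
{
	int moved = 0;

	for (int c = 0; c < NCOLORS; c++) {
		struct bucket *src = &local->color[c];
		struct bucket *dst = &global->color[c];

		if (src->head == NULL)
			continue;

		/* Chop the local chain out and splice it in at the head. */
		src->tail->next = dst->head;
		if (dst->head == NULL)
			dst->tail = src->tail;
		dst->head = src->head;
		dst->count += src->count;
		moved += src->count;

		src->head = src->tail = NULL;
		src->count = 0;
	}

	/* One accounting update for the whole batch. */
	free_total += moved;
	return moved;
}
```

The cost is one pass over the color buckets regardless of how many pages
are cached, plus a single update to the global count.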

I'm thinking this redistribution should happen fairly regularly so perhaps
we could change the xcall thread on each CPU to awaken once per second.
Change cv_wait() in xc_thread() into a cv_timedwait(), and have it hand back
cached pages if (a) not done recently or (b) the system is struggling.
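
The decision taken on each wakeup could be as simple as the sketch below.
This only models the "(a) or (b)" test; the names (struct drain_state,
should_drain(), periodic_wakeup()) and the tick and threshold parameters
are hypothetical, and the actual hand-back is left as a comment.

```c
#include <assert.h>
#include <stdbool.h>

struct drain_state {
	unsigned last_drain;	/* tick of the last hand-back */
};

static bool
should_drain(const struct drain_state *ds, unsigned now, unsigned interval,
    int nfree, int freemin)
{
	/* (a) not done recently; unsigned subtraction tolerates wraparound */
	bool stale = (now - ds->last_drain) >= interval;
	/* (b) the system is struggling */
	bool struggling = nfree < freemin;

	return stale || struggling;
}

/* Would be called each time the per-CPU thread wakes up. */
static bool
periodic_wakeup(struct drain_state *ds, unsigned now, unsigned interval,
    int nfree, int freemin)
{
	if (!should_drain(ds, now, interval, nfree, freemin))
		return false;

	/* ... hand the cached pages back to the global list here ... */
	ds->last_drain = now;
	return true;
}
```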

The pagedaemon would get code to directly trigger the redistribution when
under pressure but I am thinking that some sort of rate limiting would be
needed.

Thoughts?

Follow-Ups:
- Re: Crazy idea #1: uvm_fpageqlock
  - From: David Laight

References:
- Re: lockstat from pathological builds Re: high sys time, very very slow builds on new 24-core system
  - From: Andrew Doran
- re: lockstat from pathological builds Re: high sys time, very very slow builds on new 24-core system
  - From: matthew green
