Subject: Re: Moving scheduler semantics from cpu_switch() to kern_synch.c
To: matthew green <mrg@eterna.com.au>
From: Jason Thorpe <thorpej@shagadelic.org>
List: tech-kern
Date: 09/21/2006 10:33:03
On Sep 21, 2006, at 10:05 AM, matthew green wrote:

> so every process is bound to a cpu?  i guess i don't understand how
> this works to avoid cpus idling while lwps are waiting for "their"
> cpu to become free...  who runs a new process first?  right now it
> is who ever tries to first.

There are strategies for handling this, and plenty of papers written  
on the subject.  Note that waiting for "their" CPU to become free can  
actually perform BETTER than running on some other random CPU, because  
you avoid all of the wasted cache usage and nasty cache coherency  
logic that the hardware has to deal with.

David Black, I think, wrote a paper on this some years ago in the  
context of Mach, described something called "processor sets":

	David L. Black.  Scheduling Support for Concurrency and
	Parallelism in the Mach Operating System.  IEEE Computer,
	23(5):35-43, May 1990

The idea is that you have the notion of a "processor set", which  
contains a list of "processors".  The processor set contains the run  
queues.  A task is bound to a "processor set".  This gives you a  
convenient way to handle asymmetric processor topology (think: x86  
HT), and you can have strict affinity by putting only one processor in  
each set.  Note this would actually work REALLY well for multi-HT CPU  
systems... each processor set would get both virtual CPUs for each  
physical CPU, and since it is the PROCESS that is bound to the  
processor set, you ensure that the LWPs for that process are only run  
on that one CPU, which is optimal for multi-threaded processes on HT  
systems.

Note that I'm hand-waving the situation where a task's concurrently is  
greater than the number of processors in its processor set ... I need  
to re-read the paper ... but I figured I might as well start the  
discussion :-)

Note that processor sets also provides infrastructure for dynamically  
adding and removing CPUs.

As for "which processor set runs it first", you could implement a  
number of policies -- random, round-robin, fewest-bound-tasks, lowest- 
average-load, etc.  There might even be some good paper material  
here :-)

There is also literature out there that describes strategies for when  
you decide to pay the penalty to migrate a task from one processor (or  
processor set) to another, but I'd have to refresh my memory on them  
before discussing it further.  But, basically, the scheduler could  
periodically look at the load on each CPU and perform some sort of re- 
balancing.

Anyway, there are a lot of good lessons to be learned from other  
systems, here...

-- thorpej