tech-kern archive


Re: Making run queues independent of the pluggable scheduler



On Sat, Apr 05, 2008 at 09:49:43PM +0200, Christoph Egger wrote:

> Andrew Doran wrote:
> | Hi,
> |
> | The diff below extracts the per-CPU run queue code from the M2 scheduler
> | and makes it non-optional, removing the 4BSD scheduler's global run queue.
> | With the patch, the pluggable scheduler is responsible only for
> | adjusting the priority of timeshared jobs.
> |
> | Reasons for doing this:
> |
> | - 4BSD gains processor sets/affinity, although I haven't tested that yet.
> | - 4BSD gets a huge performance boost on producer/consumer workloads like
> |   sysbench OLTP.
> | - We have less code to maintain.
> |
> | There are a couple of other changes:
> |
> | - It makes sched_enqueue responsible for causing a preemption if needed.
> |   Previously this was left up to the caller and was only done at one site
> |   (sleepq_remove).
> |
> | - It changes the CPU selection algorithm slightly. Weak affinity is not
> |   considered until the job has context switched a preset number of times,
> |   currently 5. This is to try and better distribute jobs among the CPUs.
> |   It uses the new call idle_pick to find an idle CPU if possible. If there
> |   are no idle CPUs, it does a circular scan of CPUs instead of always
> |   starting at the first CPU. That's to try and ensure that we don't
> |   unfairly overload one CPU. I will make the CPU selection changes a
> |   separate commit if they have been demonstrated to be worthwhile.
> 
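
[Illustration: the CPU selection steps quoted above can be sketched roughly
as below. This is not the code from the diff. The struct layouts, the names
pick_cpu and idle_pick_sketch, the "load" field, and the choice of the least
loaded CPU during the circular scan are all assumptions made for the example;
the quoted text only states the switch threshold of 5, the use of idle_pick,
and that the scan does not always start at the first CPU. "Weak affinity" is
modelled here simply as preferring the CPU the job last ran on.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Threshold from the quoted description ("currently 5"). */
#define AFFINITY_MIN_SWITCHES 5

/* Hypothetical per-CPU state; not the kernel's real structures. */
struct cpu {
	bool idle;		/* true if the CPU has nothing to run */
	unsigned load;		/* assumed metric: number of queued jobs */
};

/* Hypothetical per-job state. */
struct job {
	size_t last_cpu;	/* index of the CPU the job last ran on */
	unsigned nswitches;	/* context switches so far */
};

/*
 * Stand-in for idle_pick: return the index of an idle CPU,
 * or ncpu if none is idle.
 */
static size_t
idle_pick_sketch(const struct cpu *cpus, size_t ncpu)
{
	for (size_t i = 0; i < ncpu; i++) {
		if (cpus[i].idle)
			return i;
	}
	return ncpu;
}

/*
 * Choose a CPU for job j.  *scan_start remembers where the next
 * circular scan begins, so scans do not always start at CPU 0.
 */
static size_t
pick_cpu(const struct cpu *cpus, size_t ncpu, const struct job *j,
    size_t *scan_start)
{
	assert(ncpu > 0);

	/* 1. Weak affinity applies only once the job has switched enough. */
	if (j->nswitches >= AFFINITY_MIN_SWITCHES && j->last_cpu < ncpu)
		return j->last_cpu;

	/* 2. Otherwise prefer an idle CPU if there is one. */
	size_t c = idle_pick_sketch(cpus, ncpu);
	if (c < ncpu)
		return c;

	/*
	 * 3. No idle CPU: scan circularly from *scan_start and take the
	 *    least loaded CPU (assumption); ties go to the first one seen.
	 */
	size_t best = *scan_start % ncpu;
	for (size_t n = 1; n < ncpu; n++) {
		size_t i = (*scan_start + n) % ncpu;
		if (cpus[i].load < cpus[best].load)
			best = i;
	}

	/* Advance the start point so the next scan begins elsewhere. */
	*scan_start = (*scan_start + 1) % ncpu;
	return best;
}
```

[Because ties are broken in scan order and the start point rotates, equally
loaded CPUs take turns being chosen, which is one way to avoid piling jobs
onto the first CPU.]
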
> That sounds like the new CPU selection algorithm runs most efficiently
> on a single-socket multi-core machine. Can you elaborate on how it is
> intended to scale on NUMA machines, please?

It doesn't take NUMA into consideration, although on some systems the CPU
scan is more likely to find a nearby CPU "by accident". We're going to branch
for 5.0 soon, so unless we get code to handle NUMA systems before then, it
will be something we do for 6.0. :-) Also, I think Joerg mentioned that we may
want to take power consumption into consideration, although I don't know
exactly what he has in mind.

Andrew

