tech-kern archive


Making run queues independent of the pluggable scheduler


The diff below extracts the per-CPU run queue code from the M2 scheduler and
makes it non-optional, removing the 4BSD scheduler's global run queue. With
the patch, the pluggable scheduler is responsible only for adjusting the
priority of timeshared jobs.

Reasons for doing this:

- 4BSD gains processor sets/affinity, although I haven't tested that yet.
- 4BSD gets a huge performance boost on producer/consumer workloads like
  sysbench OLTP.
- We have less code to maintain.

There are a couple of other changes:

- It makes sched_enqueue responsible for causing a preemption if needed.
  Previously this was left up to the caller and was only done at one site.

- It changes the CPU selection algorithm slightly. Weak affinity is not
  considered until the job has context switched a preset number of times,
  currently 5. This is to try to distribute jobs better among the CPUs. It
  uses the new call idle_pick to find an idle CPU if possible. If there are
  no idle CPUs, it does a circular scan of CPUs instead of always starting
  at the first CPU, to try to ensure that we don't unfairly overload one
  CPU. I will make the CPU selection changes a separate commit if they are
  demonstrated to be worthwhile.
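
To illustrate the CPU selection order described above, here is a rough
user-space sketch. It is not the code from the diff. Every name in it
(cpu_sketch, job_sketch, pick_cpu_sketch, idle_pick_sketch, nqueued,
scan_start) is hypothetical, including the stand-in for idle_pick, whose
real signature is not shown here. It also assumes that "considering weak
affinity" means preferring the CPU the job last ran on, that the threshold
test is ">= 5", and that the circular scan picks the CPU with the fewest
queued jobs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPU 4
#define AFFINITY_MIN_SWITCHES 5	/* the "preset number", currently 5 */

struct cpu_sketch {
	bool idle;		/* nothing runnable on this CPU */
	unsigned nqueued;	/* jobs waiting on this CPU's run queue */
};

struct job_sketch {
	int last_cpu;		/* CPU the job last ran on, -1 if none */
	unsigned nswitches;	/* context switches so far */
};

static struct cpu_sketch cpus[NCPU];
static int scan_start;		/* where the next circular scan begins */

/* Hypothetical stand-in for idle_pick: an idle CPU index, or -1. */
static int
idle_pick_sketch(void)
{
	for (int i = 0; i < NCPU; i++)
		if (cpus[i].idle)
			return i;
	return -1;
}

static int
pick_cpu_sketch(const struct job_sketch *j)
{
	int best, ci;

	assert(j != NULL);

	/* Weak affinity only counts once the job has switched enough. */
	if (j->nswitches >= AFFINITY_MIN_SWITCHES && j->last_cpu >= 0)
		return j->last_cpu;

	/* Otherwise prefer an idle CPU if there is one. */
	if ((ci = idle_pick_sketch()) >= 0)
		return ci;

	/*
	 * No idle CPU: scan all CPUs, starting from a rotating position
	 * rather than CPU 0, and take the least loaded one (assumption).
	 * Ties go to the CPU seen first, so they rotate with scan_start.
	 */
	best = scan_start;
	for (int k = 1; k < NCPU; k++) {
		ci = (scan_start + k) % NCPU;
		if (cpus[ci].nqueued < cpus[best].nqueued)
			best = ci;
	}
	scan_start = (scan_start + 1) % NCPU;
	return best;
}
```

With equal load on every CPU, successive calls for a young job return
CPU 0, then CPU 1, and so on, which is the point of not always starting
the scan at the first CPU.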

... and a couple of notes:

- Some or all of the items in runqueue_t could be safely merged into 
  schedstate_percpu, but I think it's better to integrate things piecemeal
  if possible.

- Previously M2's per-CPU approach performed poorly on but with
  yesterday's changes to rwlocks and turnstiles it matches the global run
  queue used by 4BSD. This shows the number of seconds to complete
  -j16 release on an 8-core machine:


