Subject: Re: Scheduler project status and further discussion
To: Daniel Sieger <dsieger@TechFak.Uni-Bielefeld.DE>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 01/16/2007 08:29:10

Hi Daniel,

On Sun, Jan 14, 2007 at 08:49:14PM +0100, Daniel Sieger wrote:

> here's a quick summary of the current status of my scheduler project
> as well as some questions about where to go from here. The good news
> is that I've accomplished the minimal goals for my university
> project. That is:
>
> 1. There is a first scheduler API which allows different algorithms to
>    be implemented.
> 2. I've implemented one other scheduler, though this is only a dumb
>    fixed priority scheduler, but with more runqueues than our current
>    scheduler.
> 
> Patches can be found at [1]. csf.diff is against -current and
> includes idle lwp stuff, scheduler.diff is against the idle lwp patch
> for easier review. The above mentioned test scheduler is not included,
> since it is admittedly quite hackish and not relevant for further
> progress (although I intend to port the increased number of runqueues
> to sched_4bsd).
> 
> In short, the patch does the following:
> 
> - Separate functions and definitions specific to the 4.4BSD scheduler
>   from those independent of scheduler implementation.
>   
> - Define a first scheduler API in sched.h
> 
> - Adapt 4.4BSD scheduler as well as the rest of the kernel to the 
>   interface.
> 
> - Add a kernel option to select which scheduler to use at
>   compile-time.
> 
> - Make userland independent from scheduler implementation (e.g. top(1)
>   and ps(1)).
> 
> The CHANGES file contained in the tarball has some more details.

Cool stuff. I have some comments on the diff if you're interested.

- kinfo_proc2 is intended to be 'highly compatible', so we shouldn't
  change the size of it.

- Changing the size of struct proc isn't as big of a problem, but it
  shouldn't change because of a kernel option (consider LKMs). It would be
  good to be able to use the 'specificdata' system for processes, but it
  doesn't look like that's suitable for use from interrupt context just
  yet. Perhaps allocate some memory per process using pools, or have a
  fixed size block in struct proc?

- Is there a reason for the run queues to be tied in with the scheduler
  implementation?

- Eventually I'd really like to see the scheduler set up so that different
  LWPs in a process can have a different scheduler, like the SVR4
  approach.  One spanner in the works is the 4.4BSD scheduler, or at least
  the way NetBSD implements it for LWPs. A lot of state is tied to the
  process, which doesn't lend itself to the per-LWP scheduler idea very
  well. Maybe that's just something that we have to live with though.

- To my mind sched_add() and rem() aren't very descriptive names. How
  about sched_enqueue() and sched_dequeue()?
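
A rough sketch of the "fixed size block in struct proc" idea, using the
enqueue/dequeue naming. Everything here is an assumption made for
illustration: struct proc_sketch, SCHED_PRIVATE_SIZE, p_sched,
struct sched_4bsd_state and sched_proc_init() are hypothetical and are not
taken from the patch or from the NetBSD tree.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical size, picked so that any scheduler's state would fit. */
#define SCHED_PRIVATE_SIZE 32

/* Stand-in for struct proc; not the real NetBSD definition. */
struct proc_sketch {
    int p_pid;
    /* Scheduler-owned block whose size never depends on kernel options. */
    union {
        unsigned char bytes[SCHED_PRIVATE_SIZE];
        long align;             /* forces suitable alignment */
    } p_sched;
};

/* Private state of one hypothetical scheduler implementation. */
struct sched_4bsd_state {
    unsigned int estcpu;        /* CPU usage estimate */
    int queued;                 /* non-zero while on a run queue */
};

/* Compile-time check (C11) that the state fits in the fixed block. */
_Static_assert(sizeof(struct sched_4bsd_state) <= SCHED_PRIVATE_SIZE,
    "scheduler state does not fit in p_sched");

/* The scheduler views the opaque block as its own state. */
static struct sched_4bsd_state *
sched_state(struct proc_sketch *p)
{
    return (struct sched_4bsd_state *)(void *)&p->p_sched;
}

/* Zero the private block when a process is created. */
static void
sched_proc_init(struct proc_sketch *p)
{
    memset(&p->p_sched, 0, sizeof(p->p_sched));
}

/* Mark the process as being on a run queue. */
static void
sched_enqueue(struct proc_sketch *p)
{
    struct sched_4bsd_state *s = sched_state(p);

    assert(!s->queued);
    s->queued = 1;
}

/* Mark the process as no longer being on a run queue. */
static void
sched_dequeue(struct proc_sketch *p)
{
    struct sched_4bsd_state *s = sched_state(p);

    assert(s->queued);
    s->queued = 0;
}
```

With this layout sizeof(struct proc_sketch) is the same whichever scheduler
is compiled in, so a module built against one kernel configuration still
sees the same structure layout under another.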

> 1. What to do about priority range? It is currently fixed to
>    0..127. Different scheduler implementations may want to use a
>    different priority range. But I'm not sure if allowing a variable
>    priority range is really sane. All other systems have a fixed range
>    (FreeBSD has 255, Solaris has 169, Linux has 139), IIRC.
> 
>    Personally, I'd vote to increase MAXPRI at least to 159, which
>    should be fairly enough for any scheduler.

I'm stating the obvious, but I think the main considerations are keeping
traversal of the queues cheap, while still providing decent granularity.
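
One common way to keep the lookup cheap is a bitmap with one bit per run
queue, so that finding the best non-empty queue never walks more than one
machine word. The sketch below assumes 128 priorities folded into 32 queues
and treats lower numbers as better; those figures and the names
(runq_sketch, runq_insert(), runq_remove(), runq_best()) are hypothetical,
not what the patch does.

```c
#include <assert.h>
#include <stdint.h>

#define NPRI 128                /* priority range 0..127 */
#define NQS  32                 /* number of run queues (assumed) */
#define PPQ  (NPRI / NQS)       /* priorities sharing one queue */

struct runq_sketch {
    uint32_t bitmap;            /* bit q set => queue q is non-empty */
    int count[NQS];             /* stand-in for the per-queue lists */
};

/* Add one runnable entry at the given priority. */
static void
runq_insert(struct runq_sketch *rq, int pri)
{
    int q;

    assert(pri >= 0 && pri < NPRI);
    q = pri / PPQ;
    rq->count[q]++;
    rq->bitmap |= (uint32_t)1 << q;
}

/* Remove one entry; clear the bit when the queue drains. */
static void
runq_remove(struct runq_sketch *rq, int pri)
{
    int q;

    assert(pri >= 0 && pri < NPRI);
    q = pri / PPQ;
    assert(rq->count[q] > 0);
    if (--rq->count[q] == 0)
        rq->bitmap &= ~((uint32_t)1 << q);
}

/*
 * Return the lowest-numbered non-empty queue, or -1 if all are empty.
 * A find-first-set instruction could replace the loop; the portable
 * loop is used here to avoid assuming a particular helper.
 */
static int
runq_best(const struct runq_sketch *rq)
{
    uint32_t b = rq->bitmap;
    int q = 0;

    if (b == 0)
        return -1;
    while ((b & 1) == 0) {
        b >>= 1;
        q++;
    }
    return q;
}
```

More queues give finer granularity but need more bitmap words to scan,
which is the trade-off between traversal cost and granularity.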
 
> 2. Do we want to have SystemV scheduling classes? While this approach
>    has some very nice features and is quite flexible, it does not come
>    without a cost. We'd need to introduce a whole bunch of additional
>    abstractions, which could make things much more complicated and
>    could possibly have a negative impact on performance ([2] also
>    mentions this).

I for one think that this is worthwhile as an eventual goal.
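
For illustration, SVR4-style classes usually come down to a table of
function pointers attached to each LWP. The sketch below is a minimal
version under that assumption; struct sched_class, struct lwp_sketch, the
two example classes and sched_tick() are hypothetical names, and the
priority handling is deliberately simplistic.

```c
#include <assert.h>
#include <stddef.h>

struct lwp_sketch;

/* Operations a scheduling class provides (only one shown). */
struct sched_class {
    const char *sc_name;
    void (*sc_tick)(struct lwp_sketch *);
};

/* Stand-in for struct lwp; each LWP points at its own class. */
struct lwp_sketch {
    const struct sched_class *l_class;
    int l_priority;             /* 0..127; larger = less favoured (assumed) */
};

/* Timeshare class: running for a tick makes the LWP less favoured. */
static void
ts_tick(struct lwp_sketch *l)
{
    if (l->l_priority < 127)
        l->l_priority++;
}

/* Fixed-priority class: a tick never changes the priority. */
static void
fx_tick(struct lwp_sketch *l)
{
    (void)l;
}

static const struct sched_class sched_class_ts = { "timeshare", ts_tick };
static const struct sched_class sched_class_fx = { "fixed", fx_tick };

/* Class-independent entry point: one extra indirect call per tick. */
static void
sched_tick(struct lwp_sketch *l)
{
    assert(l != NULL && l->l_class != NULL);
    l->l_class->sc_tick(l);
}
```

The indirect call in sched_tick() is the kind of per-operation overhead
that the performance concern above refers to.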

Thanks for all your work on this so far.

Cheers,
Andrew