Subject: Re: Discussing the future of the NetBSD scheduler
To: Eduardo Horvath <eeh@netbsd.org>
From: Andrew S. Gardner <andrewfromaz@mac.com>
List: tech-kern
Date: 03/09/2005 12:38:05
On Wednesday, March 09, 2005, at 10:14AM, Eduardo Horvath <eeh@NetBSD.org> wrote:
>> > or something more
>> > sophisticated like a pluggable scheduler framework found in Solaris or
>> > Linux?

I like the idea of a pluggable scheduler framework for lots of reasons.  IIRC, one of the reasons that the first patches to improve preemption latency in the Linux kernel weren't accepted into the main tree was that people with high-throughput workloads (like file servers) would rather have a longer mean time between preemptions, even at the expense of some priority inversion.  Since there's a wide spectrum of opinion on what makes a good scheduler, I think machine-independent and pluggable is the way to go.
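
To make "pluggable" a bit more concrete, here is a rough userland sketch of the usual shape: machine-independent code calls through a table of function pointers, and each policy fills in its own table. Every name here (struct sched_ops, struct runq, sched_next, and so on) is hypothetical and made up for illustration; none of it is taken from NetBSD, Solaris, or Linux.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 8

/* Hypothetical task and run-queue types, not NetBSD's real structures. */
struct task {
    int id;
    int priority;               /* larger value = more urgent */
};

struct runq {
    struct task *slots[MAX_TASKS];
    size_t count;
};

/* The "plug": each scheduling policy supplies its own pick_next. */
struct sched_ops {
    const char *name;
    struct task *(*pick_next)(struct runq *rq);
};

/* Append a task to the tail of the run queue. */
static void runq_enqueue(struct runq *rq, struct task *t)
{
    assert(rq->count < MAX_TASKS);
    rq->slots[rq->count++] = t;
}

/* Remove and return the entry at index i, preserving arrival order. */
static struct task *runq_remove(struct runq *rq, size_t i)
{
    struct task *t = rq->slots[i];
    for (size_t j = i + 1; j < rq->count; j++)
        rq->slots[j - 1] = rq->slots[j];
    rq->count--;
    return t;
}

/* Policy 1: fixed priority. Highest priority wins; ties go to the
 * earliest arrival. */
static struct task *fixed_prio_pick(struct runq *rq)
{
    if (rq->count == 0)
        return NULL;
    size_t best = 0;
    for (size_t i = 1; i < rq->count; i++)
        if (rq->slots[i]->priority > rq->slots[best]->priority)
            best = i;
    return runq_remove(rq, best);
}

/* Policy 2: plain FIFO. Ignores priority entirely. */
static struct task *fifo_pick(struct runq *rq)
{
    if (rq->count == 0)
        return NULL;
    return runq_remove(rq, 0);
}

static const struct sched_ops sched_fixed_prio = { "fixed_prio", fixed_prio_pick };
static const struct sched_ops sched_fifo = { "fifo", fifo_pick };

/* The machine-independent core only ever sees the ops pointer. */
static struct task *sched_next(const struct sched_ops *ops, struct runq *rq)
{
    assert(ops != NULL && ops->pick_next != NULL);
    return ops->pick_next(rq);
}
```

Swapping policies is then just a matter of handing sched_next() a different table, which is also roughly what having two schedulers plugged in at once would look like: one table per scheduling class.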

>Most of the neat features that could be added would be in the
>real-time or SMP areas.  But that will require kernel preemption
>and fine-grained locking which would result in many years of
>teething problems like Solaris and FreeBSD.

Some interesting things to ponder, although implementing any of them would mean going down the long (and possibly very unpleasant) road to fine-grained locking:

http://www.timesys.com/products/ldk/reservations/
http://www-2.cs.cmu.edu/~rajkumar/Raj.html#ResourceKernels
TimeSys implements reservations of CPU, disk, and network bandwidth in a Linux kernel.  Their 2.4 kernel also has interrupts split out into separate, schedulable kernel threads under a fixed-priority, preemptive scheduler.
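
As a rough illustration of what a CPU reservation could look like, the sketch below assumes the common "budget per period" model: a task asks for so many microseconds of CPU in every period, and the kernel admits the request only if the total reserved share stays under a cap. This is an assumption for illustration, not a description of how TimeSys does it; cpu_reserve() and struct cpu_reservations are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PPM UINT64_C(1000000)   /* shares are tracked in parts per million */

/* Hypothetical bookkeeping for one CPU. */
struct cpu_reservations {
    uint64_t cap_ppm;           /* maximum share that may be reserved */
    uint64_t used_ppm;          /* share already handed out */
};

/* Try to reserve budget_us of CPU time in every period_us.
 * Returns 1 if the reservation is admitted, 0 if it is rejected. */
static int cpu_reserve(struct cpu_reservations *r,
                       uint64_t budget_us, uint64_t period_us)
{
    assert(r != NULL);

    /* Reject nonsensical requests outright. */
    if (period_us == 0 || budget_us == 0 || budget_us > period_us)
        return 0;

    /* Round the share up so the total is never underestimated. */
    uint64_t share_ppm = (budget_us * PPM + period_us - 1) / period_us;

    if (r->used_ppm + share_ppm > r->cap_ppm)
        return 0;               /* would overcommit the CPU */

    r->used_ppm += share_ppm;
    return 1;
}
```

With a cap of 800000 ppm (80%), two reservations of 2 ms and 5 ms per 10 ms period are admitted, and a further 2 ms request is refused because it would push the total to 90%.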

http://www.mvista.com/products/cge/features.html
http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=bks&fname=/SGI_Developer/books/REACT_PG/sgi_html/front.html&srch=REACT
IIRC, the MontaVista kernel has two schedulers: a fixed-priority, preemptive one, and the default fairness one.  Being able to plug two schedulers in at the same time would be a nice feature, and somewhat like the REACT package for SGI IRIX.  Being able to use sysmp() to restrict the processes that can run on a given processor is important in building big real-time simulations (which is an admittedly small niche), but I can imagine it being useful in other SMP or NUMA situations.
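
The processor restriction part can be sketched with a per-task CPU bitmask that the per-CPU pick routine consults. This is only an illustration of the idea, not the sysmp() interface itself; struct task, task_may_run_on(), and pick_for_cpu() are hypothetical names, and the mask is limited to 32 CPUs to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task with one bit per CPU it is allowed to run on. */
struct task {
    int id;
    uint32_t cpu_mask;
};

/* Nonzero if task t is allowed to run on the given CPU. */
static int task_may_run_on(const struct task *t, unsigned cpu)
{
    return cpu < 32 && (t->cpu_mask & (UINT32_C(1) << cpu)) != 0;
}

/* Return the first task in the array that may run on this CPU,
 * or NULL if every task is restricted away from it. */
static const struct task *pick_for_cpu(const struct task *tasks, size_t n,
                                       unsigned cpu)
{
    for (size_t i = 0; i < n; i++)
        if (task_may_run_on(&tasks[i], cpu))
            return &tasks[i];
    return NULL;
}
```

Dedicating a processor to a real-time simulation then amounts to clearing that CPU's bit in the mask of every other task.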

http://www.arinc.com/aeec/draft_documents/653_d1s2.pdf
It's also fun (I think) to look at the extreme cases.  As far as real-time goes, the requirements don't get much stricter than ARINC-653.  ARINC-653 partitioning of a system might be useful for more than just avionics systems, but I imagine this would be pretty far down on the list of things to implement.

http://www.dragonflybsd.org/goals/threads.cgi
There is, of course, also the DragonFly path, which means not doing fine-grained locking and still getting more than one thread running in the kernel address space at a time.

Andrew

(I've been lurking on tech-kern for a while, but I've never had anything to contribute until now.  I hope some of this is helpful.)