Subject: Re: Moving scheduler semantics from cpu_switch() to kern_synch.c
To: Jason Thorpe <>
From: Daniel Carosone <>
List: tech-kern
Date: 09/22/2006 11:47:56
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Sep 21, 2006 at 10:33:03AM -0700, Jason Thorpe wrote:
> >so every process is bound to a cpu?  i guess i don't understand how
> >this works to avoid cpus idling while lwps are waiting for "their"
> >cpu to become free...  who runs a new process first?  right now it
> >is who ever tries to first.
> There are strategies for handling this, and plenty of papers written
> on the subject.

I like the solaris model, with per-cpu run queues.  The idle thread on
one cpu looks at the queues of other cpus for runnable items that
aren't getting serviced, and steals them to rebalance the load.

> Note this would actually work REALLY well for multi-HT CPU
> systems... each processor set would get both virtual CPUs for each
> physical CPU, and since it is the PROCESS that is bound to the
> processor set, you ensure that the LWPs for that process are only run
> on that one CPU, which is optimal for multi-threaded processes on HT
> systems.

I'm not sure it's optimal at all.  HT(TM Intel) processors share a
cache (so affinity is good) but they also share internal processor
resources like integer & fp units. If the process has several threads
processing similar workloads, they may be more likely to collide on
these resources than on data - and if they differ on data they will
suffer more by tossing each other's working set out of the shared
cache.  Such programs are of course also most likely to be the ones
that will hit the case you mentioned - they're probably threaded this
way precisely to try and use all available CPUs on a parallelisable
workload.

I don't know how best to recognise this in something a scheduler can
use, though hints from the program will be a big factor, where given.
Perhaps LWPs that share a processor/group with a high rate of cache
misses should be the most likely to be split off to other groups, and
processes with LWPs on multiple processors/groups, where several
processors in each group show high miss rates, should be the most
likely to be rejoined when there are other processes to run too.

Whatever the specific use and granularity of scheduling, the idea of
processor sets, or of multiple nested groupings of processors to
reflect common resources, is useful.  We might also need another
level: some multi-core chips can share or optimise cache between local
cores, with those cores also being multi-threaded.

