Subject: Re: Scheduler project status and further discussion
To: Mindaugas <unex@linija.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 01/16/2007 10:49:25
On Mon, Jan 15, 2007 at 01:09:32AM +0200, Mindaugas wrote:

> > - Add locking rules to scheduler API. We need to make sure certain API
> >   functions are only called with appropriate locks held etc. Any
> >   suggestions how to do this best are more than welcome.
> I was thinking of locking and other rules needed for the API... let's
> summarize a few:
> 
> o When an LWP sleeps, i.e. goes onto a sleep queue, ltsleep/mtsleep (and
>   other, if any) functions must call sched_rem(). When the LWP is awakened,
>   wakeup/wakeup_one (and other, if any) functions must call sched_add(),
>   which would put the LWP back on the run queue.
> Why? For keeping l_back/l_forw consistent in other schedulers - see below.
> 
> o When an LWP is not on a run queue, i.e. not yet added or already removed
>   by sched_rem(), its l_back and l_forw variables may be used as needed.
>   When the LWP is on a run queue, i.e. added with sched_add(), l_back and
>   l_forw may be modified only by the scheduler module, i.e. by the
>   implementation-specific functions.
> Why? Same reason, though after Andrew's kern_sleepq.c introduction it is
> probably not important (it seems the "outside" doesn't use l_back/l_forw).

The locking scheme that I worked out is described briefly here:

http://nxr.netbsd.org/source/xref/sys/kern/kern_lwp.c?r=1.40.2.10#144

In a nutshell, the same lock used to guard a synchronisation object like a
run queue also guards LWPs associated with the object. So from one side:

	lwp_lock(l);
	if (l->l_stat == LSSLEEP)
		/* the sleep queue is also locked. */;
	lwp_unlock(l);

And the other:

	sleepq_lock(sq);
	LIST_FOREACH(l, sq->sq_list, l_sleepchain)
		/* the LWPs on the queue are also locked */;
	sleepq_unlock(sq);

LWPs change locks in step with their state except in the uniprocessor case,
where there is only ever one lock (sched_mutex).
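
In code, changing an LWP's lock might look roughly like the below. This is
only a sketch: the helper name (lwp_setlock), the l_mutex field and the
mutex_spin_* calls are illustrative, not necessarily what will be committed.

	/*
	 * Point the LWP at the lock guarding its new synchronisation
	 * object and release the previously held lock; the LWP is left
	 * unlocked.  Called with the LWP's current lock held.  On a
	 * uniprocessor both pointers name sched_mutex, so the
	 * assignment changes nothing and the effect is just an unlock.
	 */
	static void
	lwp_setlock(struct lwp *l, kmutex_t *new)
	{
		kmutex_t *old = l->l_mutex;

		l->l_mutex = new;
		mutex_spin_exit(old);
	}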

> o mi_switch() calls the scheduler-specific sched_nextlwp(), which must be
>   responsible for removing the new (next) LWP from the run queue and adding
>   the old (current) LWP back to it. Note that idle LWPs are not managed in
>   the run queues (neither added nor removed). If the run queue is empty,
>   sched_nextlwp() must return NULL.
> Why? There may be more than just remove/add operations on the run queue. For
> example, for the Linux-like scheduler which I'm writing, doing plain
> sched_rem/sched_add is not enough.
> 
> Talking about locking, one should note that we will soon have run queues per
> CPU, so locks will be per run queue (the currently used sched_mutex could
> probably be removed?)

The scheme outlined above is designed to handle that and it should be easy
enough to adapt it. Anywhere there is a sched_lock() now you need to have
it take the per-CPU lock, and remember to change the LWP's lock pointer
whenever it migrates to another CPU.
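
Roughly, and only as a sketch (the field names ci_schedstate/spc_mutex and
the function name here are placeholders, not settled API), migration would
look like:

	/*
	 * Sketch: move a runnable LWP to another CPU's run queue.  Both
	 * per-CPU locks are taken, in a fixed order to avoid deadlock,
	 * so the LWP's combined state stays consistent while its lock
	 * pointer is switched over.
	 */
	static void
	sched_migrate(struct lwp *l, struct cpu_info *from, struct cpu_info *to)
	{
		kmutex_t *m1 = from->ci_schedstate.spc_mutex;
		kmutex_t *m2 = to->ci_schedstate.spc_mutex;

		/* Lock both run queues, lower address first. */
		if (m1 < m2) {
			mutex_spin_enter(m1);
			mutex_spin_enter(m2);
		} else {
			mutex_spin_enter(m2);
			mutex_spin_enter(m1);
		}

		sched_rem(l);		/* off the old CPU's queue */
		l->l_cpu = to;
		l->l_mutex = m2;	/* the lock follows the LWP */
		sched_add(l);		/* onto the new CPU's queue */

		mutex_spin_exit(m1);
		mutex_spin_exit(m2);
	}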

> and these locks are in the scheduler-specific part. Hence, these locks
> should probably be handled inside sched_tick, sched_nextlwp, sched_add,
> sched_rem and other functions. I am not sure about mi_switch(): it uses the
> kernel lock and disables interrupts in critical sections, so there probably
> shouldn't be a problem (keep in mind that LWPs are separated per CPU).
> Well, if I am wrong - someone will correct me.

You need to protect not only the individual objects or actions in isolation
(the run queues, or LWPs, or whatnot) but also their combined state and
effect. So I think that we need to take the locks before calling into these
routines.
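
For instance, mi_switch() would take the run queue lock before calling into
the scheduler module, roughly like this (sched_lock() and ci_idlelwp are
placeholder names; sched_nextlwp()'s behaviour follows your description
above):

	/*
	 * Sketch: pick the next LWP to run with the (per-CPU) run queue
	 * lock held, so that choosing the next LWP and requeueing the
	 * current one are covered by one critical section.
	 */
	struct cpu_info *ci = curcpu();
	struct lwp *newl;

	sched_lock(ci);
	newl = sched_nextlwp();		/* removes newl, requeues curlwp */
	if (newl == NULL)
		newl = ci->ci_idlelwp;	/* idle LWP is never on a run queue */
	/* ... context switch to newl; the lock is released after the switch. */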

Cheers,
Andrew