Subject: Re: Further works on yamt-idlelwp
To: Mindaugas R. <rmind@NetBSD.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 03/05/2007 16:50:43
On Mon, Mar 05, 2007 at 02:27:16AM +0200, Mindaugas R. wrote:

> Hello,
> I have made a small patch for further scheduler work. Briefly, what it does:
> 
> 1) Add sched_rq, sched_proc_sd and sched_lwp_sd to struct cpu_data, proc
> and lwp respectively. These would be pointers to dynamically allocated
> scheduler-specific data. They are not used for SCHED_4BSD (yet).
> For now, there are dummy functions in sched_4bsd.c; IMHO it would be good
> to nano-optimize these useless calls away, even if they aren't expensive.

These should be prefixed as is done for the existing members of those
structures, e.g. p_sched, l_sched and so on. I think schedstate_percpu is
probably a better place for any per-CPU data than struct cpu_data itself.
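
Roughly what I have in mind (an untested sketch; the member names here are
placeholders, not committed code):

    /* sys/sched.h: per-CPU scheduler state. */
    struct schedstate_percpu {
            /* ... existing members ... */
            void *spc_sched_info;   /* scheduler-private per-CPU data */
    };

    /* sys/proc.h */
    struct proc {
            /* ... existing members ... */
            void *p_sched_info;     /* scheduler-private per-process data */
    };

    /* sys/lwp.h */
    struct lwp {
            /* ... existing members ... */
            void *l_sched_info;     /* scheduler-private per-LWP data */
    };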

> 2) Add additional sched_lwp_fork/sched_lwp_exit hooks for allocating
> per-LWP scheduler-specific data. Also, add a sched_slept() hook, which
> would handle LWP sleeping/blocking.

If sched_slept() needs to be called with the sched_mutex held, then it may
need to happen earlier, perhaps in sleepq_enter() - or later, in mi_switch().
Otherwise it could mean acquiring sched_mutex again, which is expensive. Note
that until mi_switch() is entered, it's not certain that the LWP is going to
sleep: if there is a pending signal, sleepq_block() will be exited early.
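
To make the ordering concrete, something like this in mi_switch() (a sketch
only; I'm assuming the hook takes the LWP as its argument):

    /*
     * kern_synch.c, in mi_switch(): sched_mutex is already held
     * here, and by this point we know for certain whether the LWP
     * really is going to sleep, so the hook is cheap to call.
     */
    if (l->l_stat == LSSLEEP)
            sched_slept(l);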

> 4) Move the enqueueing of the old LWP from mi_switch() to sched_nextlwp().
> Other implementations of the scheduler could (and will) do that differently.
> In this case, maybe it would be worth changing the name, e.g. to
> sched_switch()?

Is it still worth having a "newl" argument passed to mi_switch()? I think it
should be removed, and new LWPs to run should always be selected from the
run queues.
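
In other words, something like this (a sketch only; the exact signature and
the cpu_switchto() call are illustrative, not a patch):

    /* kern_synch.c: no "newl" argument any more. */
    void
    mi_switch(struct lwp *l)
    {
            struct lwp *newl;

            /*
             * Always take the next LWP from the run queues.  With
             * the idle LWPs in place, sched_nextlwp() can always
             * return something to run.
             */
            newl = sched_nextlwp();
            cpu_switchto(l, newl);
    }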

> 1. Currently there are two general functions, sched_lock() and
> sched_unlock(), for run queue (and overall scheduler) locking. From now on
> there is going to be a run queue per CPU, hence this should be changed.
> a) Add a kmutex_t to struct cpu_data (where the general MI data lives) and
> have it be a generic lock for the run queue.
> b) Add a kmutex_t in the scheduler-specific area and move
> sched_lock/sched_unlock to the scheduler module. This would be more
> flexible, IMHO.
> In any case, the prototype would probably change to:
>     static inline void sched_lock(struct cpu_info *ci, const int heldmutex);
> Any other suggestions?

I don't like the idea of having two locks per CPU; it complicates things
somewhat and increases overhead. Is there a particular reason you want to do
that? One change I have made but not checked in is to rename
sched_lock/unlock to spc_lock/unlock and add a cpu_info argument, as you
mention. The idea being that there would be very little remaining global
state, if any - meaning no global run queue.
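
That is, something along these lines (untested; it assumes a kmutex_t named
spc_mutex embedded in schedstate_percpu, which is not in the tree yet):

    /* sys/sched.h */
    static inline void
    spc_lock(struct cpu_info *ci)
    {
            mutex_spin_enter(&ci->ci_schedstate.spc_mutex);
    }

    static inline void
    spc_unlock(struct cpu_info *ci)
    {
            mutex_spin_exit(&ci->ci_schedstate.spc_mutex);
    }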

> 2. As already discussed, there is currently a problem with p_estcpu and
> p_pctcpu, which should be scheduler-independent. As mentioned, there is
> also the ccpu value, which is SCHED_4BSD-specific. These are used in ps(1),
> top(1), etc. It would be good to find a solution for what to do with these,
> and how. All suggestions are welcome.

I think both of those should be per-LWP, and the sysctl/libkvm interface
should mimic the old behaviour by summing the values across all LWPs in a
process. Why does p_pctcpu need to be scheduler-specific? Like you say, it's
only for reporting by tools like /bin/ps.
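
For example, the sysctl/kvm code could do something like this when filling
in the process info (a sketch; l_pctcpu is the hypothetical per-LWP field):

    /* Sum the per-LWP values to present the old per-process view. */
    fixpt_t pctcpu = 0;
    struct lwp *l;

    LIST_FOREACH(l, &p->p_lwps, l_sibling)
            pctcpu += l->l_pctcpu;
    /* Report pctcpu where p_pctcpu was reported before. */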

> --- sys/sysctl.h	27 Feb 2007 16:55:18 -0000	1.166.2.1
> +++ sys/sysctl.h	4 Mar 2007 03:46:22 -0000
> @@ -239,7 +239,7 @@ struct ctlname {
>  #define	KERN_PROC2		47	/* struct: process entries */
>  #define	KERN_PROC_ARGS		48	/* struct: process argv/env */
>  #define	KERN_FSCALE		49	/* int: fixpt FSCALE */
> -#define	KERN_CCPU		50	/* int: fixpt ccpu */
> +#define	KERN_OLDCCPU		50	/* old: fixpt ccpu */

I'm not sure we should rename the define in sysctl.h. However unlikely it
may be, there could be applications or scripts that depend on access to
kern.ccpu, and renaming the define would break them at compile time. If
access to the value is no longer possible, perhaps the best thing to do is
to have them fail at runtime instead.

If we're adding new sysctl values for schedulers, I think it would make
sense to denote which scheduler they belong to; however, from a Google
search it looks like FreeBSD just puts the values under "kern.sched", so I
think in that instance it's better to follow their lead.
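
That would mean creating the parent node roughly like this (a sketch using
sysctl_createv(9); the description string is made up):

    sysctl_createv(clog, 0, NULL, NULL,
            CTLFLAG_PERMANENT,
            CTLTYPE_NODE, "sched",
            SYSCTL_DESCR("Scheduler options"),
            NULL, 0, NULL, 0,
            CTL_KERN, CTL_CREATE, CTL_EOL);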

Cheers,
Andrew