Subject: sched_changepri, and priority levels
To: None <tech-kern@netbsd.org>
From: Andrew Doran <ad@netbsd.org>
List: tech-kern
Date: 03/06/2007 17:19:51
On Tue, Mar 06, 2007 at 04:47:23AM +0900, YAMAMOTO Takashi wrote:

> > > > Right now the drop in
> > > > priority takes place in userret(). I think that should be moved back into
> > > > remrunqueue() once cpu_switch() is eliminated.
> > > 
> > > i think userret() is a better place to unboost an lwp than remrunqueue().
> > > if it was in remrunqueue() and we support in-kernel preemption,
> > > an lwp can be preempted immediately after scheduled.
> > 
> > Perhaps not remrunqueue() but at the point where the LWP is picked and set
> > running, so mi_switch(). What I was thinking is that we only want to do
> > kernel preemption for LWPs running with priority above the user/kernel
> > level. Were you thinking of something else?
> 
> i'm not sure in which case kernel preemption should happen.
> i'm just wondering what's wrong with restoring it in userret().

To change it in userret() we would need to acquire a lock or use an atomic
sequence, which is too expensive to do on every syscall. I think your idea
of having a flag that says "the LWP has kernel priority" could be a good
way to deal with this. Dropping the level in mi_switch() would be another.
Either way we would need to reset one of those before hitting userret() in
order to avoid the atomic operations.
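
As a rough sketch of the flag idea, assuming the flag and the priority are
only ever written with the scheduler lock held (for example on the
mi_switch() path), userret() would be left with nothing more than a plain
read. The structure, field and function names below are made up for
illustration and are not the real struct lwp or scheduler interfaces:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down LWP; not the real struct lwp. */
struct lwp_sketch {
	int	l_priority;	/* current effective priority */
	int	l_usrpri;	/* priority to run at in user mode */
	bool	l_kpri;		/* "the LWP has kernel priority" */
};

/*
 * Boost on sleeping in the kernel. Assumed to be called with the
 * scheduler lock held, so no atomic operations are needed here.
 */
static void
sketch_boost(struct lwp_sketch *l, int kpri)
{
	l->l_priority = kpri;
	l->l_kpri = true;
}

/*
 * Drop the boost when the LWP is picked and set running. Also assumed
 * to run under the scheduler lock, as part of the switch path.
 */
static void
sketch_unboost_on_switch(struct lwp_sketch *l)
{
	if (l->l_kpri) {
		l->l_priority = l->l_usrpri;
		l->l_kpri = false;
	}
}

/*
 * The userret() side: a plain read of the LWP's own flag. If the
 * boost was already dropped at switch time this is always false and
 * no lock or atomic sequence is taken on the syscall return path.
 */
static bool
sketch_userret_must_unboost(const struct lwp_sketch *l)
{
	return l->l_kpri;
}
```

The point of the split is only that the expensive part happens where the
lock is already held; which of the two (flag or mi_switch()) does the reset
is left open here.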

On the subject of priority levels, here is one suggestion and some notes:

160 - 191       Interrupt (32)
96 - 159        Real time (64)
64 - 95         Kernel (32)
0 - 63          User (64)

o FreeBSD positions "real time" below kernel; this layout is the other way
  around.
o FreeBSD also has the concept of idle priority levels. Is this something
  we also want? Are multiple levels required?
o Low really does mean low, and high means high: a numerically larger value
  is a higher priority. Traditionally the priority has been "inverted".
o It's fairly arbitrary!
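
For illustration, the suggested layout could be written down as constants
plus a classifier. This is only a sketch: the macro and function names are
invented here, and the real-time band is taken as 96 to 159 so that it holds
64 levels and does not overlap the interrupt band.

```c
#include <assert.h>

/* Hypothetical names; higher number means higher priority. */
#define	PRI_USER_MIN	0
#define	PRI_USER_MAX	63
#define	PRI_KERN_MIN	64
#define	PRI_KERN_MAX	95
#define	PRI_RT_MIN	96
#define	PRI_RT_MAX	159
#define	PRI_INTR_MIN	160
#define	PRI_INTR_MAX	191

enum pri_class {
	PRI_CLASS_USER,
	PRI_CLASS_KERN,
	PRI_CLASS_RT,
	PRI_CLASS_INTR
};

/* Map a priority in [0, 191] to the band it falls in. */
static enum pri_class
pri_classify(int pri)
{
	assert(pri >= PRI_USER_MIN && pri <= PRI_INTR_MAX);

	if (pri >= PRI_INTR_MIN)
		return PRI_CLASS_INTR;
	if (pri >= PRI_RT_MIN)
		return PRI_CLASS_RT;
	if (pri >= PRI_KERN_MIN)
		return PRI_CLASS_KERN;
	return PRI_CLASS_USER;
}
```

Because the bands are contiguous and ordered, comparing two priorities with
a plain ">" is enough to decide which LWP should run first.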

Here's what FreeBSD currently provides:

 * Priorities range from 0 to 255, but differences of less then 4 (RQ_PPQ)
 * are insignificant.  Ranges are as follows:
 *
 * Interrupt threads:		0 - 63
 * Top half kernel threads:	64 - 127
 * Realtime user threads:	128 - 159
 * Time sharing user threads:	160 - 223
 * Idle user threads:		224 - 255
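
The "differences of less than 4" remark follows from how priorities are
grouped into run queues: with RQ_PPQ priorities per queue, 256 priorities
collapse onto 64 queues. A minimal sketch of that mapping, where the
function name is made up and only the division by RQ_PPQ is taken from the
comment above:

```c
#include <assert.h>

#define	RQ_PPQ	4			/* priorities per queue */
#define	RQ_NQS	(256 / RQ_PPQ)		/* number of run queues: 64 */

/*
 * Hypothetical helper: priorities that land in the same queue are
 * indistinguishable to the scheduler.
 */
static int
sketch_runq_index(int pri)
{
	assert(pri >= 0 && pri <= 255);
	return pri / RQ_PPQ;
}
```

So priorities 128 through 131 all select queue 32, while 132 is the first
to select queue 33.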

And Linux:

Normal 			100 - 139 (40)
Batch (idle, fixed) 	100 - 139 (40)
Real time		0 - 99 (100)

Comments?

Cheers,
Andrew