Subject: Re: Questions
To: None <mika@cs.caltech.edu, peter@netplex.com.au>
From: Ross Harvey <ross@ghs.com>
List: tech-kern
Date: 08/13/1999 19:31:46
> From: Mika Nystrom <mika@cs.caltech.edu>
>:::
>::: [ discussion of FreeBSD scheduler problems involving background
>:::   nice(3) jobs ]
>:::

> Hello,
>   We have a user here who insists on running rc5des on a lab full of
> NetBSD/i386 machines (all UP, of course).  I made a simple hack to
> the scheduler to solve this problem.  The scheduler on our machines
> simply reserves the bottom-most run queue for nice 20 processes.
> I PR'd it for NetBSD, but it's such a crude hack that I'm not surprised
> it hasn't gotten any attention.  Well, it's crude, but it really does
> do the trick as far as these rc5 clients are concerned.  You simply 
> cannot tell that one is running in the background.
>
> I was going to apply the patch to FreeBSD but FreeBSD has code that does
> exactly the same thing using the rtprio mechanism.. I would guess that
> using idprio on the FreeBSD des clients would work the same as my NetBSD
> hack.  Of course, processor affinity, etc. is a much bigger question
> that this two-liner does nothing to address.
>
>    Mika 
>
> void
> resetpriority(p)
>         register struct proc *p;
> {
>         register unsigned int newpriority;
>
>         newpriority = PUSER + p->p_estcpu / 4 + 2 * (p->p_nice - NZERO);
> #ifdef HARDNICE
>         newpriority = min(newpriority, MAXPRI-PPQ-1);
>         if (p->p_nice == (PRIO_MAX + NZERO) ) newpriority = MAXPRI;
> #else
>         newpriority = min(newpriority, MAXPRI);
> #endif
>         p->p_usrpri = newpriority;
>         if (newpriority < curpriority)
>                 need_resched();
> }
>
>

Hi Mika! Still walking around Booth with Starbucks mugs of LN02?

Yes, your NetBSD PR is still open, but only because I forgot to close
it last February when I fixed this problem as part of a general scheduler
rototilling. (It's fixed in NetBSD 1.4, as a result.)

(You did propose in the PR "rewrite the scheduler from scratch". It didn't
require quite that... :-)

Anyway, NetBSD has solved this problem in a general way; the attached
commit log from last February explains it.

Best regards...

	ross.harvey@computer.org

> revision 1.55
> date: 1999/02/23 02:56:03;  author: ross;  state: Exp;  lines: +39 -10
> Scheduler bug fixes and reorganization
> * fix the ancient nice(1) bug, where nice +20 processes incorrectly
>   steal 10-20% of the CPU (or even more, depending on load average)
> * provide a new schedclk() mechanism at a new clock at schedhz, so high
>   platform hz values don't cause nice +0 processes to look like they are
>   niced
> * change the algorithm slightly, and reorganize the code a lot
> * fix percent-CPU calculation bugs, and eliminate some no-op code
> 
> === nice bug === Correctly divide the scheduler queues between niced and
> compute-bound processes. The current nice weight of two (sort of, see
> `algorithm change' below) neatly divides the USRPRI queues in half; this
> should have been used to clip p_estcpu, instead of UCHAR_MAX.  Besides
> being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
> and it was done after decay_cpu() which can only _reduce_ the value.  It
> has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
> scheduler-penalize themselves onto the same queue as nice +20 processes.
> (Or even a higher one.)
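
As a rough illustration of the clipping described above, here is a minimal
userland sketch. It is not the committed code. The constant values are
assumptions (typical of BSD headers of that period, so check your own tree),
the names ESTCPU_LIMIT, clip_estcpu(), user_priority() and run_queue() are
hypothetical, and the priority formula assumes the unscaled p_estcpu that the
log describes further down.

```c
#include <assert.h>

/* Assumed values, typical of BSD headers of the period; not from the log. */
#define PUSER       50   /* base user priority */
#define MAXPRI      127  /* numerically largest (worst) priority */
#define PPQ         4    /* priorities per run queue */
#define PRIO_MAX    20   /* largest nice value */
#define NZERO       20   /* p_nice value stored for nice +0 */
#define NICE_WEIGHT 2    /* priorities per nice level */

/* The bound from the log: NICE_WEIGHT * PRIO_MAX - PPQ. */
#define ESTCPU_LIMIT (NICE_WEIGHT * PRIO_MAX - PPQ)

/* Hypothetical helper: keep the CPU-usage estimate below the nice +20 band. */
static int
clip_estcpu(int estcpu)
{
	assert(estcpu >= 0);
	return estcpu < ESTCPU_LIMIT ? estcpu : ESTCPU_LIMIT;
}

/* Hypothetical helper: user priority from clipped estcpu and stored nice. */
static int
user_priority(int estcpu, int p_nice)
{
	int pri;

	assert(p_nice >= 0 && p_nice <= NZERO + PRIO_MAX);
	pri = PUSER + clip_estcpu(estcpu) + NICE_WEIGHT * (p_nice - NZERO);
	return pri < MAXPRI ? pri : MAXPRI;
}

/* Hypothetical helper: which run queue a priority lands on. */
static int
run_queue(int pri)
{
	return pri / PPQ;
}
```

With these assumed constants, a nice +0 process cannot be pushed past
priority 86 (queue 21) however much CPU it burns, while an idle nice +20
process starts at priority 90 (queue 22), so the two never share a queue.
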
> 
> === New schedclk() mechanism === Some platforms should be cutting down
> stathz before hitting the scheduler, since the scheduler algorithm only
> works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
> back and forth by 4 every time p_estcpu is touched (each occurrence an
> abstraction violation), use p_estcpu without scaling and require schedhz
> to be generated directly at the right frequency. Use a default stathz (well,
> actually, profhz) / 4, so nothing changes unless a platform defines schedhz
> and a new clock.  Define these for alpha, where hz==1024, and nice was
> totally broken.
> 
> === Algorithm change === The nice value used to be added to the
> exponentially-decayed scheduler history value p_estcpu, in _addition_ to
> being incorporated directly (with greater weight) into the priority calculation.
> At first glance, it appears to be a pointless increase of 1/8 the nice
> effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
> because it will ramp up linearly but be decayed only exponentially, thus
> converging to an additional .75 nice for a load average of one. I killed
> this; it makes the behavior hard to control and almost impossible to analyze,
> and the effect (~~nothing at all for the first second, then somewhat increased
> niceness after three seconds or more, depending on load average) is pointless.
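
The .75 figure can be reproduced with a small floating-point model. This is
a sketch under stated assumptions, not the kernel code: it assumes the nice
value was added to p_estcpu once per second and that the decay factor has the
classic 4.4BSD form 2*load / (2*load + 1), neither of which is spelled out in
the log. It also ignores the process's own CPU ticks, integer truncation and
the UCHAR_MAX clip. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed per-second decay factor: 2*load / (2*load + 1). */
static double
decay_factor(double loadavg)
{
	assert(loadavg >= 0.0);
	return (2.0 * loadavg) / (2.0 * loadavg + 1.0);
}

/* Iterate x = decay * x + nice for the given number of seconds, from 0. */
static double
nice_feedback(double loadavg, double nice, int seconds)
{
	double d = decay_factor(loadavg);
	double x = 0.0;
	int i;

	for (i = 0; i < seconds; i++)
		x = d * x + nice;
	return x;
}

/*
 * Extra priority, in units of nice, contributed by the feedback term
 * after the old p_estcpu / 4 scaling, for a nice value of 1.
 */
static double
extra_nice(double loadavg, int seconds)
{
	return nice_feedback(loadavg, 1.0, seconds) / 4.0;
}
```

At a load average of one the assumed decay factor is 2/3, so the recurrence
settles at 3 * nice, and dividing by 4 gives the additional .75 nice quoted
in the log. Higher load averages push the fixed point higher, which fits the
"depending on load average" remark.
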
> 
> === Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
> Collect scheduler functionality. Try to put each abstraction in just one
> place.