tech-kern archive


kernel: scheduler issue

I sent the following mail to yamt@ a week ago and he didn't answer (that's
fine, probably busy or something). But meanwhile I still believe there's an
issue in our 4BSD scheduler. I'm posting it here in the hope that someone else
can review all of this.

In short, l_estcpu is increased by the cpu the lwp is running on. But then, it
is sometimes decreased with a loadfactor divided by ncpu, as if it had
previously been increased by all of the cpus. This results in a loadfactor
which is sometimes lower than expected, which in turn implies the lwp will
sometimes have a higher priority than normal. The loadfactor should *not* be
divided by ncpu.
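
To make the effect concrete, here is a minimal userland sketch of the decay
arithmetic, not the kernel code itself. It assumes the classic 4BSD form
estcpu * loadfac / (loadfac + 1) with loadfac = 2 * loadavg in fixed point;
every sketch_* name and the SKETCH_FSHIFT value are hypothetical and chosen
only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point scale for this sketch (hypothetical value). */
#define SKETCH_FSHIFT 11
#define SKETCH_FSCALE (1u << SKETCH_FSHIFT)

typedef uint32_t sketch_fixpt_t;

/* Load factor without per-CPU scaling: 2 * loadavg. */
static sketch_fixpt_t
sketch_loadfactor(sketch_fixpt_t loadavg)
{
	return 2 * loadavg;
}

/* Load factor divided by ncpu: the variant this mail argues against. */
static sketch_fixpt_t
sketch_loadfactor_ncpu(sketch_fixpt_t loadavg, unsigned ncpu)
{
	assert(ncpu > 0);
	return 2 * loadavg / ncpu;
}

/* One decay step: estcpu * loadfac / (loadfac + 1.0), in fixed point. */
static sketch_fixpt_t
sketch_decay_cpu(sketch_fixpt_t loadfac, sketch_fixpt_t estcpu)
{
	/* Widen to 64 bits so the multiplication cannot overflow. */
	return (sketch_fixpt_t)((uint64_t)estcpu * loadfac /
	    (loadfac + SKETCH_FSCALE));
}
```

With a load average of 4.0 (4 * SKETCH_FSCALE) and an estcpu of 9000, one
step with sketch_loadfactor() leaves 8000, while sketch_loadfactor_ncpu()
with ncpu = 4 leaves 6000. The estimate was charged by one cpu only, yet it
decays as if the machine were four times less loaded, so the lwp ends up
with a better priority than it should.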

The reason I'm saying 'sometimes' is because the loadfactor() macro is inlined
in sched_pstats_hook, and it (mistakenly too, I believe) didn't get updated by

This priority issue is only triggered where loadfactor() is used, mainly when
an lwp wakes up from a > 1s sleep.
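
The wake-up case can be illustrated the same way. The sketch below assumes
that waking from a sleep longer than one second applies one decay step per
second slept; the real kernel may batch or approximate this differently.
The wake_* names and WAKE_FSCALE are hypothetical, and the load factor is
passed in already computed so that both variants can be compared.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point representation of 1.0 (hypothetical value). */
#define WAKE_FSCALE (1u << 11)

/* One decay step: estcpu * loadfac / (loadfac + 1.0), in fixed point. */
static uint32_t
wake_decay_once(uint32_t loadfac, uint32_t estcpu)
{
	/* Guard the 32-bit addition in the denominator. */
	assert(loadfac <= UINT32_MAX - WAKE_FSCALE);
	return (uint32_t)((uint64_t)estcpu * loadfac /
	    (loadfac + WAKE_FSCALE));
}

/*
 * Catch up on the decay missed while sleeping. Sleeps of one second or
 * less leave estcpu untouched, matching the "> 1s" condition above.
 */
static uint32_t
wake_update_estcpu(uint32_t loadfac, uint32_t estcpu, unsigned slept_seconds)
{
	unsigned i;

	if (slept_seconds <= 1)
		return estcpu;
	for (i = 0; i < slept_seconds && estcpu != 0; i++)
		estcpu = wake_decay_once(loadfac, estcpu);
	return estcpu;
}
```

For a load average of 4.0, an estcpu of 6561 and a 3 second sleep, a load
factor of 2 * loadavg (16384 here) gives 4608, whereas the same load factor
divided by ncpu = 4 (4096) gives 1944. The gap between the two variants
grows with every second slept.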

I would like to revert r1.26. Then, sched_pstats_hook will have to be fixed to
use loadfactor().


On 02/07/2017 at 20:39, Maxime Villard wrote:
I believe that your fix [1] is wrong. You created PR/31966 in 2005, because
there was an asymmetry between 'p_estcpu' and 'loadavg'. As I understand it,
you were referring back then to what is now sched_schedclock(), where l_estcpu
is increased by a constant.

It may have been true that back then the asymmetry existed; probably because
each thread of a multi-threaded process would increment the common value and
not reduce it proportionally.

But as far as I can tell, Andrew Doran fixed it in 2007:

And sched_schedclock() is always called with curlwp, so there is no reason the
LWP's l_estcpu would get incremented ncpu times. If I'm wrong please explain a
little more what you had in mind, because dividing the loadfactor by ncpu
inherently breaks the logic behind the algorithm.

Also, I find it highly suspicious that we are not using the loadfactor() macro
in sched_pstats_hook().

(please do not commit anything yet, I have other changes pending)

Thanks & regards,

