Subject: Re: sucky performance on i386/1.3I
To: None <>
From: Bill Sommerfeld <>
List: tech-kern
Date: 01/23/1999 11:57:56
[This is a revised version of a note I sent to a smaller set of people
earlier.  I got some details wrong in that note, with the result that
the old behavior is even worse than I initially thought.]

I found an additional patch, to kern_fork.c, which improves
interactive performance even beyond what Ross's patch achieves.

Anyhow, with our scheduler, p_usrpri (the effective priority) is
recomputed about once a second, based on the nice level of a process
and p_estcpu.  p_estcpu increases for every clock tick the process
runs, and slowly decays when it doesn't run; there's a large comment
in kern_synch.c above the schedcpu() function which describes this.

p_estcpu is not currently copied to the child on fork(), but p_usrpri
is, so the child *initially* runs at the parent's (lower) priority.
Once it has run during four statclock() ticks, though, its priority
gets recomputed from its own near-zero p_estcpu and the child gets a
boost, which gradually decays over roughly the next (UCHAR_MAX * ldavg)
statclock ticks until p_estcpu maxes out again.

This is essentially a loophole: a cpu-bound job can avoid about half
of the p_estcpu penalty by forking and doing its work in the child.
Adding the child's estcpu back to the parent slows down how quickly
the parent can fork, and how much CPU the child initially gets, but
the effect doesn't last all that long in the child.

A variant of the program which (a) spawns multiple children, and (b)
has them buzzloop for more like 1s rather than 250ms, still shows

Starting the child process off with the same penalty as the parent
closes the "loophole".  If the child isn't cpu-bound, the inherited
penalty decays away fairly quickly.

Here's a patch which does this under the control of a global flag
(just like Ross's change); patching the variable in real time shows a
dramatic difference in interactive performance (specifically, moving
windows around on a shark):

--- kern_fork.c	1998/11/11 22:44:25	1.50
+++ kern_fork.c	1999/01/23 16:00:02
@@ -111,6 +111,8 @@
 	return (fork1(p, FORK_PPWAIT|FORK_SHAREVM, retval, NULL));
 }
 
+int slowchild = 1;
+
 int
 fork1(p1, flags, retval, rnewprocp)
 	register struct proc *p1;
@@ -275,6 +277,12 @@
 	memcpy(p2->p_cred, p1->p_cred, sizeof(*p2->p_cred));
 	p2->p_cred->p_refcnt = 1;
+	/*
+	 * slow us down if parent was cpu-bound
+	 */
+	if (slowchild)
+		p2->p_estcpu = p1->p_estcpu;
 	/* bump references to the text vnode (for procfs) */
 	p2->p_textvp = p1->p_textvp;

Here's my test program, which lets you play with more parameters from
the command line.  Try `./slowme 2000 5' while toggling chargeparent
and slowchild from gdb to observe the difference each tweak makes.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/wait.h>

int delaytime;
int slewtime;
int nkids;
int maxkids;

/* send us a SIGALRM after t milliseconds */
void alarmafter (int t)
{
  struct itimerval it;

  memset(&it, 0, sizeof(it));
  it.it_value.tv_sec =  t / 1000;
  /* tv_usec is in microseconds, so scale the millisecond remainder */
  it.it_value.tv_usec = (t % 1000) * 1000;

  setitimer(ITIMER_REAL, &it, 0);
}

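/* sleep until any signal arrives (body is assumed: only the decl survived) */
void waitforsig ()
{
  sigset_t nosigs;

  sigemptyset(&nosigs);
  sigsuspend(&nosigs);
}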

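/* the child's buzz time is up (body is assumed to simply exit) */
void childsigalrm(int ignore)
{
  _exit(0);
}

/* buzzloop until the alarm fires after delaytime ms */
void child()
{
  signal(SIGALRM, childsigalrm);

  alarmafter (delaytime);

  for (;;)                      /* assumed: spin until SIGALRM */
    ;
}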

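/* nothing to do; the signal only has to wake up waitforsig() */
void parentsigalrm(int ignore)
{
}

/* reap any children that have exited */
void byebye(int ignore)
{
  int status;
  printf("got sigchld\n");
  while (waitpid (-1, &status, WNOHANG) > 0)
    nkids--;                    /* assumed: one fewer live child */
}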

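int main(int argc, char **argv)
{
  signal (SIGCHLD, byebye);
  signal (SIGALRM, parentsigalrm);

  if (argc < 3) {
    fprintf(stderr, "usage: %s buzzms nkids\n", argv[0]);
    exit(1);
  }
  delaytime = atoi(argv[1]);
  maxkids = atoi(argv[2]);

  slewtime = delaytime / maxkids;

  nkids = 0;

  /* lines tagged "assumed" are a best-guess reconstruction */
  for (;;) {
    switch(fork()) {
    case 0:
      child();                  /* assumed; never returns */
      exit(0);                  /* assumed */
    case -1:
      perror("fork");           /* assumed */
      exit(1);                  /* assumed */
    default:                    /* assumed */
      nkids++;                  /* assumed */
      printf("nkids now %d\n", nkids);
      if (nkids < maxkids) {
        alarmafter (slewtime);  /* assumed: stagger the next fork */
        waitforsig ();          /* assumed */
      }
    }
    while (nkids >= maxkids) {
      printf("nkids now %d\n", nkids);
      waitforsig ();            /* assumed: wait for a child to exit */
    }
  }
}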