Subject: port-hp300/10571: process statistics bug
To: None <gnats-bugs@gnats.netbsd.org>
From: None <swp@alumni.rice.edu>
List: netbsd-bugs
Date: 07/11/2000 18:22:17
>Number:         10571
>Category:       port-hp300
>Synopsis:       process time stats can be wildly bogus
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-hp300-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jul 11 18:23:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Steve Peurifoy
>Release:        NetBSD 1.5_ALPHA (20000624)
>Organization:
>Environment:
System: NetBSD loop 1.5_ALPHA NetBSD 1.5_ALPHA (loop) #0: Wed Jun 28 03:37:23 MDT 2000 swp@loop:/usr/src/sys/arch/hp300/compile/loop hp300


>Description:
    If a newly created process runs for just the right length of time
    before mi_switch() is invoked, its time statistics can be set to
    ridiculous values.  This is because mi_switch() calls microtime()
    at splstatclock (== splclock) or splhigh.  This can result in
    catching the timer just after it rolls over but before the value
    of the global time is updated, thus making time "go backward" as
    compared to the time value in spc->spc_runtime.  Running calcru()
    on the resulting p->p_rtime produces bogus results (a -1 long
    converted to a u_quad_t becomes a big number).

>How-To-Repeat:
    Running "time find /usr -type f -exec ls -l {} \; >/dev/null" on
    my 380 reliably produces numbers such as -28092518 seconds of
    system or user time because several of the child processes trip
    on this bug and wreak havoc on the job totals.  This particular
    test may not be reliable for other testers though.

>Fix:
    The following patch to sys/arch/hp300/hp300/clock.c
    isn't all that pretty but it works for me.  Doing
    something more correct for the hp300 would be possible
    but quite a bit more complicated.  Perhaps a sanity
    check in calcru() as well wouldn't be a bad idea.

*** /tmp/clock.c	Tue Jul 11 15:20:24 2000
--- /tmp/newclock.c	Tue Jul 11 19:11:57 2000
***************
*** 320,325 ****
--- 320,326 ----
  {
  	volatile struct clkreg *clk;
  	int s, u, t, u2, s2;
+ 	static struct timeval prevtime = {0, 0};
  
  	/*
  	 * Read registers from slowest-changing to fastest-changing,
***************
*** 342,351 ****
--- 343,361 ----
  	} while (u != u2 || s != s2);
  
  	u += (clkint - t) * CLK_RESOLUTION;
+ 	/*
+ 	 * If called with clock interrupts blocked, the timer can
+ 	 * roll over without the global time being updated.  Try
+ 	 * to adjust for this.
+ 	 */
+ 	if ((u <= prevtime.tv_usec) && (s == prevtime.tv_sec))
+ 		u += tick;
  	if (u >= 1000000) {		/* normalize */
  		s++;
  		u -= 1000000;
  	}
+ 	prevtime.tv_sec = s;
+ 	prevtime.tv_usec = u;
  	tvp->tv_sec = s;
  	tvp->tv_usec = u;
  }

>Release-Note:
>Audit-Trail:
>Unformatted: