Subject: bin/10095: systat vmstat displays invalid data on long-running 64-bit systems
To: None <gnats-bugs@gnats.netbsd.org>
From: None <mhitch@montana.edu>
List: netbsd-bugs
Date: 05/10/2000 20:18:13
>Number:         10095
>Category:       bin
>Synopsis:       systat vmstat displays invalid data on long-running 64-bit systems
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed May 10 20:19:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Michael L. Hitch
>Release:        NetBSD-current as of May 6, 2000
>Organization:
	Montana State University
>Environment:
	
System: NetBSD alpha.msu.montana.edu 1.4W NetBSD 1.4W (PC164) #17: Tue Apr 11 18:36:05 MDT 2000 mhitch@alpha.msu.montana.edu:/usr/cvsroot/src/sys/arch/alpha/compile/PC164 alpha


>Description:
	After a 64-bit system such as an Alpha has been running long enough
	for some counters to exceed 31 bits, the systat vmstat display begins
	showing bogus data.  The cause is a 32-bit variable used as a
	temporary when computing the change since the previous display for
	several items (the interrupt counters and the time spent in each CPU
	state, for example).  The variable is named t and is defined as a
	time_t.
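
	A minimal sketch of the failure mode, not the actual vmstat.c code.
	It assumes the temporary holds the current sample so that it can
	become the baseline for the next refresh; struct counter,
	delta_narrow(), delta_wide() and second_delta() are hypothetical
	names.  The wide version uses int64_t rather than long so that it
	behaves the same on hosts where long is only 32 bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 64-bit kernel counter and its saved sample. */
struct counter {
	int64_t cur;	/* latest value read from the kernel */
	int64_t prev;	/* value saved at the previous refresh */
};

/* Delta computed through a 32-bit temporary, as the report describes. */
static int64_t
delta_narrow(struct counter *c)
{
	int32_t t;	/* stands in for the 32-bit time_t temporary */

	/*
	 * Out-of-range conversion is implementation-defined in C; common
	 * compilers simply drop the high bits.
	 */
	t = (int32_t)c->cur;
	c->cur -= c->prev;
	c->prev = t;	/* the truncated value becomes the new baseline */
	return c->cur;
}

/* Same computation with a temporary as wide as the counter. */
static int64_t
delta_wide(struct counter *c)
{
	int64_t t;

	t = c->cur;
	c->cur -= c->prev;
	c->prev = t;
	return c->cur;
}

/* Run two refreshes and return the delta reported by the second one. */
static int64_t
second_delta(int64_t (*delta)(struct counter *), int64_t first, int64_t second)
{
	struct counter c = { 0, 0 };

	c.cur = first;
	(void)delta(&c);
	c.cur = second;
	return delta(&c);
}
```

	With samples just above 2^31 that are 300 apart,
	second_delta(delta_wide, ...) reports 300, while
	second_delta(delta_narrow, ...) reports a value unrelated to the real
	change because the saved baseline was truncated.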

	There is a second potential loss of data in copyinfo(), which uses
	sizeof(int) to compute the size of an array of long items, so only
	part of the array is copied where long is wider than int.  It did
	not cause a problem on my AlphaPC 164 because the number of items in
	the interrupt count array is sufficiently larger than the highest
	active interrupt.
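
	A small illustration of the short copy, under the assumption of a
	hypothetical 8-entry array (NINTR) and a made-up helper
	elements_copied(); it is not the systat code.  It counts how many
	long elements actually arrive in the destination for a given
	element size passed to memmove().

```c
#include <stddef.h>
#include <string.h>

#define NINTR 8		/* hypothetical number of interrupt counters */

/*
 * Copy an array of NINTR longs using the given per-element size and
 * return how many elements of the destination match the source.
 */
static size_t
elements_copied(size_t elem_size)
{
	long src[NINTR], dst[NINTR];
	size_t i, ok = 0;

	for (i = 0; i < NINTR; i++) {
		src[i] = (long)(i + 1);	/* distinct non-zero counts */
		dst[i] = -1;		/* marker for "never written" */
	}

	/* Same shape as the call in copyinfo(): count * element size. */
	memmove(dst, src, NINTR * elem_size);

	for (i = 0; i < NINTR; i++)
		if (dst[i] == src[i])
			ok++;
	return ok;
}
```

	On an LP64 system such as the Alpha, elements_copied(sizeof(int))
	covers only the first half of the array, while
	elements_copied(sizeof(long)) covers all of it.  Where int and long
	are the same size the two calls are equivalent, which is why the
	problem is specific to 64-bit systems.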
>How-To-Repeat:
	Run an Alpha system for several weeks with high network and disk
	activity.  Wonder why the system shows a constant 512 interrupts
	per second for both Ethernet interfaces, and a constant 50% system
	time, and no disk transfers when it's very obvious the disk is
	quite busy.
>Fix:
	This patch declares a local variable named t (the variable used to
	calculate the value differences) as a long instead of a time_t.  The
	patch also uses sizeof(long) instead of sizeof(int) in copyinfo() when
	copying the array of interrupt counts.

	It might be better to use a different variable name than t in the
	macros used to calculate the display values to remove confusion
	with the static variable t.
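
	A sketch of why a local declaration is enough, assuming macros that
	refer to whatever t is in scope.  SAVE_AND_DIFF and both functions
	are hypothetical, and int64_t stands in for the Alpha's 64-bit long
	so the sketch behaves the same on any host.

```c
#include <stdint.h>

static int32_t t;	/* stands in for the static time_t named t */

/* Hypothetical macro: it uses whichever `t` is visible at the call site. */
#define SAVE_AND_DIFF(cur, prev) \
	do { t = (cur); (cur) -= (prev); (prev) = t; } while (0)

/* The macro binds to the 32-bit static t, so the saved baseline is narrowed. */
static int64_t
diff_with_static_t(int64_t cur, int64_t *prev)
{
	SAVE_AND_DIFF(cur, *prev);
	return cur;
}

/* A wider local t shadows the static one, as the patch's `long t` does. */
static int64_t
diff_with_local_t(int64_t cur, int64_t *prev)
{
	int64_t t;

	SAVE_AND_DIFF(cur, *prev);
	return cur;
}
```

	Giving the macro temporary its own name would make the same fix
	without relying on shadowing.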

--- /opt/src/usr.bin/systat/vmstat.c	Sat Jan 22 05:46:43 2000
+++ ./vmstat.c	Wed May 10 20:45:17 2000
@@ -349,6 +349,7 @@
 	int psiz, inttotal;
 	int i, l, c;
 	static int failcnt = 0;
+	long t;
 	
 	if (state == TIME)
 		dkswap();
@@ -647,7 +648,7 @@
 
 	intrcnt = to->intrcnt;
 	*to = *from;
-	memmove(to->intrcnt = intrcnt, from->intrcnt, nintr * sizeof (int));
+	memmove(to->intrcnt = intrcnt, from->intrcnt, nintr * sizeof (long));
 }
 
 static void
>Release-Note:
>Audit-Trail:
>Unformatted: