Subject: bin/29819: lastcomm mangles elapsed time calculation
To: None <gnats-admin@netbsd.org, netbsd-bugs@netbsd.org>
From: None <kre@munnari.OZ.AU>
List: netbsd-bugs
Date: 03/28/2005 21:36:00
>Number:         29819
>Category:       bin
>Synopsis:       lastcomm mangles elapsed time calculation
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 28 21:36:00 +0000 2005
>Originator:     Robert Elz
>Release:        NetBSD 2.0.1 ( *all* active NetBSD versions affected )
>Organization:
	Prince of Soungla University
>Environment:
System: NetBSD delta.noi.kre.to 2.0.1 NetBSD 2.0.1 (DELTA-1.11-20050129) #1: Sat Jan 29 15:31:14 ICT 2005 kre@lilac.noi.kre.to:/usr/obj/2.0/kernels/DELTA i386
Architecture: i386
Machine: i386
>Description:
	Lastcomm does inappropriate rounding of values it prints when
	printing the elapsed time for a command.

	For example, consider the following output, shown as a sequence
	of consecutive lines from lastcomm output, but with lots before
	and after deleted (or course) ...

sh         -       root     __         0.00 secs Tue Mar 29 01:59 (0:00:00.00)
date       -       root     __         0.00 secs Tue Mar 29 01:59 (0:00:00.00)
hostname   -       root     __         0.00 secs Tue Mar 29 01:59 (0:00:00.00)
sh         -       kre      __         0.00 secs Tue Mar 29 01:59 (0:00:00.22)
xauth      -       kre      __         0.00 secs Tue Mar 29 01:59 (0:00:00.19)
sshd       -SF     sshd     __         0.17 secs Tue Mar 29 01:59 (0:00:01.66)
sshd       -SF     root     __         0.05 secs Tue Mar 29 00:24 (2:35:36.00)
sshd       -SF     kre      __       178.80 secs Tue Mar 29 00:24 (2:35:35.00)
rsync      -       kre      __       223.92 secs Tue Mar 29 00:24 (2:35:33.00)
rsync      -F      kre      __       501.38 secs Tue Mar 29 00:37 (1:22:36.00)
sh         -       root     __         0.00 secs Tue Mar 29 01:59 (0:00:00.23)
date       -       root     __         0.00 secs Tue Mar 29 01:59 (0:00:00.06)
hostname   -       root     __         0.00 secs Tue Mar 29 01:59 (0:00:00.02)

	The "sh/date/lastcomm" sequence that brackets this is included just
	as an indicator of the absolute time - since those things run for
	(essentially) no time, their start times (shown) and end times
	are identical (to the nearest minute for sure.   This provides
	context for the lines between.   (One day I'll spend a few minutes
	and work out why my systems feel the need to run date & hostname
	every minute - but for now that turns out to be convenient).

	So, at 01:59 we can see that an rsync over ssh completed, the end
	time for that command (set of commands) must have been 01:59
	because of the sequence in the lastcomm output.

	The start time of the commands was 00:24 (for 3 of them) and 00:37
	for the child rsync process (the one that does most of the work).

	This tells us that the elapsed time should be about 1 hour 35 mins
	for 3 of the commands, and 1 hour 22 mins for the other one (simply
	by manually subtracting the known start time from the known end time).

	And sure enough, lastcomm prints 1:22:36 for the command that ran
	a few minutes less than the others - but 2:35:36 for the others.
	Somehow, between 00:24 and 01:59 the system managed to cram in
	2 and a half hours of elapsed time (and no, DST did not alter,
	there is no DST at all in the local timezone)

	The problem is the way the values are calculated in lastcomm.
	(The data provided by the kernel is fine).

>How-To-Repeat:

	Simply run lastcomm.   Loot for any command that ran for between
	30 and 60 minutes, or between 90 and 120 minutes, or between
	150 and 180 minutes (etc).   Notice the bogus output.

	You can do the same thing with the minutes value - any command
	running between 30 & 60 seconds, or between 90 and 120 seconds,
	(etc) - but for those ones it is not quite as obvious that the
	values are bogus.

>Fix:

	Apply the patch below.

	The output for the above test with the patched lastcomm will
	change to:

sh         -       root     __         0.00 secs Tue Mar 29 01:59 (0:00:00.00)
date       -       root     __         0.00 secs Tue Mar 29 01:59 (0:00:00.00)
hostname   -       root     __         0.00 secs Tue Mar 29 01:59 (0:00:00.00)
sh         -       kre      __         0.00 secs Tue Mar 29 01:59 (0:00:00.22)
xauth      -       kre      __         0.00 secs Tue Mar 29 01:59 (0:00:00.19)
sshd       -SF     sshd     __         0.17 secs Tue Mar 29 01:59 (0:00:01.66)
sshd       -SF     root     __         0.05 secs Tue Mar 29 00:24 (1:34:36.00)
sshd       -SF     kre      __       178.80 secs Tue Mar 29 00:24 (1:34:35.00)
rsync      -       kre      __       223.92 secs Tue Mar 29 00:24 (1:34:33.00)
rsync      -F      kre      __       501.38 secs Tue Mar 29 00:37 (1:21:36.00)
sh         -       root     __         0.00 secs Tue Mar 29 01:59 (0:00:00.23)
date       -       root     __         0.00 secs Tue Mar 29 01:59 (0:00:00.06)
hostname   -       root     __         0.00 secs Tue Mar 29 01:59 (0:00:00.02)

	which is a lot more rational.   The rsync that ran for less time
	started about 13 minutes later, and now is reported as running 13
	minutes less (give or take the odd few seconds).

	Then notice that this bug has been in lastcomm ever since the
	code to print the elapsed time was added (in October 1995 !)

	This tells us two things.

	First, almost no-one bothers to actually look at lastcomm output
	(I do, and this bug had annoyed me for ages, but not until today
	enough to actually go find the cause).

	Second, the bug will be in all current (and most obsoleted)
	versions of NetBSD.

	That is, the patch below should be pulled up to all active
	versions of NetBSD.   I believe that curently means that
	pull-up requests should be sent by whoever applies this patch
	for
		netbsd-3
		netbsd-2
		netbsd-2-0
	and	netbsd-1-6

	And yes, even the NetBSD 2.0.x series should have this one
	fixed (incorrect output from lastcomm could cause problems for
	someone using the results as evidence in a security related
	hacking incident...)

	The relevant code has not changed since 1997, so the patch below
	should apply untouched (other than line number fuzz) to all of
	these brances (and to current of course, which is where the
	patch was built).

	The actual problem is obvious once it is clear what is happening,
	if not, the change in the patch below should make it clear I
	hope.

	kre


Index: lastcomm.c
===================================================================
RCS file: /cvsroot/NetBSD/src/usr.bin/lastcomm/lastcomm.c,v
retrieving revision 1.16
diff -u -r1.16 lastcomm.c
--- lastcomm.c	30 Oct 2004 19:34:01 -0000	1.16
+++ lastcomm.c	28 Mar 2005 20:52:23 -0000
@@ -144,8 +144,8 @@
 			     t / (double)AHZ, ctime(&ab.ac_btime));
 			delta = expand(ab.ac_etime) / (double)AHZ;
 			printf(" (%1.0f:%02.0f:%05.2f)\n",
-			       delta / SECSPERHOUR,
-			       fmod(delta, SECSPERHOUR) / SECSPERMIN,
+			       floor(delta / SECSPERHOUR),
+			       floor(fmod(delta, SECSPERHOUR) / SECSPERMIN),
 			       fmod(delta, SECSPERMIN));
 		}
 		/* are we at the beginning of the file yet? */