Subject: bin/29819: lastcomm mangles elapsed time calculation
To: None <gnats-admin@netbsd.org, netbsd-bugs@netbsd.org>
From: None <kre@munnari.OZ.AU>
List: netbsd-bugs
Date: 03/28/2005 21:36:00
>Number: 29819
>Category: bin
>Synopsis: lastcomm mangles elapsed time calculation
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Mar 28 21:36:00 +0000 2005
>Originator: Robert Elz
>Release: NetBSD 2.0.1 ( *all* active NetBSD versions affected )
>Organization:
Prince of Soungla University
>Environment:
System: NetBSD delta.noi.kre.to 2.0.1 NetBSD 2.0.1 (DELTA-1.11-20050129) #1: Sat Jan 29 15:31:14 ICT 2005 kre@lilac.noi.kre.to:/usr/obj/2.0/kernels/DELTA i386
Architecture: i386
Machine: i386
>Description:
Lastcomm does inappropriate rounding of values it prints when
printing the elapsed time for a command.
For example, consider the following output, shown as a sequence
of consecutive lines from lastcomm output, but with lots before
and after deleted (or course) ...
sh - root __ 0.00 secs Tue Mar 29 01:59 (0:00:00.00)
date - root __ 0.00 secs Tue Mar 29 01:59 (0:00:00.00)
hostname - root __ 0.00 secs Tue Mar 29 01:59 (0:00:00.00)
sh - kre __ 0.00 secs Tue Mar 29 01:59 (0:00:00.22)
xauth - kre __ 0.00 secs Tue Mar 29 01:59 (0:00:00.19)
sshd -SF sshd __ 0.17 secs Tue Mar 29 01:59 (0:00:01.66)
sshd -SF root __ 0.05 secs Tue Mar 29 00:24 (2:35:36.00)
sshd -SF kre __ 178.80 secs Tue Mar 29 00:24 (2:35:35.00)
rsync - kre __ 223.92 secs Tue Mar 29 00:24 (2:35:33.00)
rsync -F kre __ 501.38 secs Tue Mar 29 00:37 (1:22:36.00)
sh - root __ 0.00 secs Tue Mar 29 01:59 (0:00:00.23)
date - root __ 0.00 secs Tue Mar 29 01:59 (0:00:00.06)
hostname - root __ 0.00 secs Tue Mar 29 01:59 (0:00:00.02)
The "sh/date/lastcomm" sequence that brackets this is included just
as an indicator of the absolute time - since those things run for
(essentially) no time, their start times (shown) and end times
are identical (to the nearest minute for sure. This provides
context for the lines between. (One day I'll spend a few minutes
and work out why my systems feel the need to run date & hostname
every minute - but for now that turns out to be convenient).
So, at 01:59 we can see that an rsync over ssh completed, the end
time for that command (set of commands) must have been 01:59
because of the sequence in the lastcomm output.
The start time of the commands was 00:24 (for 3 of them) and 00:37
for the child rsync process (the one that does most of the work).
This tells us that the elapsed time should be about 1 hour 35 mins
for 3 of the commands, and 1 hour 22 mins for the other one (simply
by manually subtracting the known start time from the known end time).
And sure enough, lastcomm prints 1:22:36 for the command that ran
a few minutes less than the others - but 2:35:36 for the others.
Somehow, between 00:24 and 01:59 the system managed to cram in
2 and a half hours of elapsed time (and no, DST did not alter,
there is no DST at all in the local timezone)
The problem is the way the values are calculated in lastcomm.
(The data provided by the kernel is fine).
>How-To-Repeat:
Simply run lastcomm. Loot for any command that ran for between
30 and 60 minutes, or between 90 and 120 minutes, or between
150 and 180 minutes (etc). Notice the bogus output.
You can do the same thing with the minutes value - any command
running between 30 & 60 seconds, or between 90 and 120 seconds,
(etc) - but for those ones it is not quite as obvious that the
values are bogus.
>Fix:
Apply the patch below.
The output for the above test with the patched lastcomm will
change to:
sh - root __ 0.00 secs Tue Mar 29 01:59 (0:00:00.00)
date - root __ 0.00 secs Tue Mar 29 01:59 (0:00:00.00)
hostname - root __ 0.00 secs Tue Mar 29 01:59 (0:00:00.00)
sh - kre __ 0.00 secs Tue Mar 29 01:59 (0:00:00.22)
xauth - kre __ 0.00 secs Tue Mar 29 01:59 (0:00:00.19)
sshd -SF sshd __ 0.17 secs Tue Mar 29 01:59 (0:00:01.66)
sshd -SF root __ 0.05 secs Tue Mar 29 00:24 (1:34:36.00)
sshd -SF kre __ 178.80 secs Tue Mar 29 00:24 (1:34:35.00)
rsync - kre __ 223.92 secs Tue Mar 29 00:24 (1:34:33.00)
rsync -F kre __ 501.38 secs Tue Mar 29 00:37 (1:21:36.00)
sh - root __ 0.00 secs Tue Mar 29 01:59 (0:00:00.23)
date - root __ 0.00 secs Tue Mar 29 01:59 (0:00:00.06)
hostname - root __ 0.00 secs Tue Mar 29 01:59 (0:00:00.02)
which is a lot more rational. The rsync that ran for less time
started about 13 minutes later, and now is reported as running 13
minutes less (give or take the odd few seconds).
Then notice that this bug has been in lastcomm ever since the
code to print the elapsed time was added (in October 1995 !)
This tells us two things.
First, almost no-one bothers to actually look at lastcomm output
(I do, and this bug had annoyed me for ages, but not until today
enough to actually go find the cause).
Second, the bug will be in all current (and most obsoleted)
versions of NetBSD.
That is, the patch below should be pulled up to all active
versions of NetBSD. I believe that curently means that
pull-up requests should be sent by whoever applies this patch
for
netbsd-3
netbsd-2
netbsd-2-0
and netbsd-1-6
And yes, even the NetBSD 2.0.x series should have this one
fixed (incorrect output from lastcomm could cause problems for
someone using the results as evidence in a security related
hacking incident...)
The relevant code has not changed since 1997, so the patch below
should apply untouched (other than line number fuzz) to all of
these brances (and to current of course, which is where the
patch was built).
The actual problem is obvious once it is clear what is happening,
if not, the change in the patch below should make it clear I
hope.
kre
Index: lastcomm.c
===================================================================
RCS file: /cvsroot/NetBSD/src/usr.bin/lastcomm/lastcomm.c,v
retrieving revision 1.16
diff -u -r1.16 lastcomm.c
--- lastcomm.c 30 Oct 2004 19:34:01 -0000 1.16
+++ lastcomm.c 28 Mar 2005 20:52:23 -0000
@@ -144,8 +144,8 @@
t / (double)AHZ, ctime(&ab.ac_btime));
delta = expand(ab.ac_etime) / (double)AHZ;
printf(" (%1.0f:%02.0f:%05.2f)\n",
- delta / SECSPERHOUR,
- fmod(delta, SECSPERHOUR) / SECSPERMIN,
+ floor(delta / SECSPERHOUR),
+ floor(fmod(delta, SECSPERHOUR) / SECSPERMIN),
fmod(delta, SECSPERMIN));
}
/* are we at the beginning of the file yet? */