NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

kern/54236: process accounting ridiculous times



>Number:         54236
>Category:       kern
>Synopsis:       process accounting ridiculous times
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun May 26 19:55:00 +0000 2019
>Originator:     David A. Holland
>Release:        NetBSD 8.99.25 (20181103)
>Organization:
>Environment:
System: NetBSD valkyrie 8.99.25 NetBSD 8.99.25 (VALKYRIE) #28: Thu Jan 17 20:46:50 EST 2019  dholland@valkyrie:/usr/src/sys/arch/amd64/compile/VALKYRIE amd64
Architecture: x86_64
Machine: amd64
>Description:

Now and then totally bogus time entries appear in process accounting,
like this one:

hostname   -       dholland pts/1    1099264.00 secs Sun May 26 13:04 (0:00:00.00)

(as printed by lastcomm(1))

This seems to happen once in a fairly long while but mostly during
builds with lots of small files or other bursts of small processes,
like starting a large X session. It does not seem to be correlated
with any particular program.

The symptom is that the value 0xa863 (always 0xa863) appears in one of
the time fields, either user or system time, in the accounting file.
This decodes to 1099264.00 seconds via the custom float encoding used
in this stuff and prints as shown.

The bogus value is coming from the kernel; the problem is not in the
decoding in lastcomm.

I don't see anything wrong with the encoding logic (encode_comp_t in
kern_acct.c) so probably we're getting this large number from
calcru().

If the encoding logic is correct the time coming from calcru must be
between 1099232 and 1099296 (roughly), but whatever it is it's most
likely still the same every time.

I don't have time to dig any deeper this weekend :-/

This behavior would be rapidly fatal for anyone trying to use process
accounting data for any kind of actual time accounting or billing
(rather than just logging), not that I expect anyone is.

% dmesg | grep timecounter
[     1.000000] timecounter: Timecounters tick every 10.000 msec
[     1.000000] timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
[     1.000003] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0

>How-To-Repeat:

Turn on accounting, watch, be patient.

>Fix:

Dunno.



Home | Main Index | Thread Index | Old Index