Current-Users archive


Re: What to do about "WARNING: negative runtime; monotonic clock has gone backwards"



> Date: Sun, 30 Jul 2023 14:56:53 -0400
> From: Brad Spencer <brad%anduin.eldar.org@localhost>
> 
> Taylor R Campbell <riastradh%NetBSD.org@localhost> writes:
> 
> > Can you please try running with the attached patch and share the
> > warnings it produces?  Should give slightly more information.
> 
> Caught another one.  As far as I know the system is up to date with all
> of the requested patches:
> 
> [ 19419.647972] WARNING: lwp 16 (system idle/1) flags 0xa0000020: timecounter went backwards from (19420 + 0x9e37cf0149d8f7bb/2^64) sec at netbsd:mi_switch+0x11e on cpu1 to (19419 + 0xad917b77bd0a7cd3/2^64) sec at netbsd:mi_switch+0x11e on cpu1
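
For reference, the two timestamps in that warning are printed as whole
seconds plus a 64-bit binary fraction, so the size of the backwards
step can be worked out directly.  A minimal sketch in Python, not
kernel code; the helper name bintime_to_fraction is made up for
illustration, and the constants are copied from the warning line:

```python
from fractions import Fraction

def bintime_to_fraction(sec, frac):
    """Convert (sec + frac/2^64) to an exact rational number of seconds."""
    return sec + Fraction(frac, 1 << 64)

# Values copied verbatim from the WARNING line above.
before = bintime_to_fraction(19420, 0x9e37cf0149d8f7bb)
after = bintime_to_fraction(19419, 0xad917b77bd0a7cd3)

# Positive result means the clock reading moved backwards by this much.
backwards = before - after
print(f"timecounter stepped back by about {float(backwards):.3f} s")
```

That works out to a step back of roughly 0.94 seconds.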

Can you run this dtrace script for a while (say, for a day, or from
boot until you see the WARNING above, which only happens once per
boot), and then hit ^C?

dtrace -x nolibs -n 'sdt:xen:hardclock:jump { @ = quantize(arg1 - arg0) } sdt:xen:hardclock:jump /arg2 >= 430/ { printf("hardclock jump violated timecounter contract") }'

If my hypothesis is correct, you can just leave this running over any
particular workload and you'll get:

(a) a message printed whenever the hardclock delay is too long, and
(b) when you hit ^C at the end, a histogram of all the >1-tick
    hardclock jump delays.
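
To read the histogram in (b): quantize() groups values into
power-of-two buckets, so each row counts the samples from that row's
value up to just below the next row's value.  A rough sketch of that
bucketing in Python, handling non-negative integers only; the sample
values are hypothetical, not measurements, and are in whatever units
the probe arguments use:

```python
from collections import Counter

def quantize_bucket(value):
    """Lower bound of the power-of-two bucket holding a non-negative int."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    if value == 0:
        return 0
    # Largest power of two that is <= value.
    return 1 << (value.bit_length() - 1)

def quantize(values):
    """Count how many values fall into each power-of-two bucket."""
    return Counter(quantize_bucket(v) for v in values)

# Hypothetical (arg1 - arg0) samples, for illustration only.
samples = [2, 3, 3, 5, 9, 400, 431]
hist = quantize(samples)
for lower in sorted(hist):
    print(f"{lower:>6} | {'@' * hist[lower]} {hist[lower]}")
```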

(Avoiding the tick-10s probe that I used in the last dtrace
suggestion means you won't get updates printed every 10 seconds to
your terminal -- you'll have to hit ^C to see the results -- but as an
upside it won't instantly crash your kernel owing to the Xen/!Xen
module ABI mismatch for CLKF_USERMODE/PC.)

