tech-kern archive


Re: TSC improvement



Hi.

On 2020/06/15 7:08, Andrew Doran wrote:
On Thu, Jun 11, 2020 at 04:50:40AM +0000, Taylor R Campbell wrote:

What's trickier is synchronizing per-CPU timecounters so that they all
give a reasonably consistent view of absolute wall clock time -- and
so it's not just one CPU that leads while the others play catchup
every time they try to read the clock.  (In other words, adding atomic
catchup logic certainly does not obviate the need to synchronize
per-CPU timecounters!)

But neither synchronization nor global monotonicity is always
necessary -- e.g., for rusage we really only need a local view of time
since we're only measuring relative time durations spent on the
current CPU anyway.

    This is what the timecounter(9) API per se expects of timecounters,
    and right now tsc (along with various other per-CPU cycle counters)
    fails to guarantee that.

How so, do you see a bug?  I think it's okay.  The TSC is only used for the
timecounter where it's known to be insensitive to core speed variations and
driven by a PLL related to the bus clock.  Fortunately that covers most x86
systems, excepting a window of some years starting roughly around the time
of the Pentium 4.
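
For context, "insensitive to core speed variations" is the property CPUID
advertises as an invariant TSC.  A minimal userland sketch of that check
(CPUID leaf 0x80000007, EDX bit 8; this is not the kernel's actual probe
code) might look like:

    /* invtsc.c: report whether the CPU advertises an invariant TSC. */
    #include <cpuid.h>
    #include <stdio.h>

    int
    main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 0x80000007: Advanced Power Management Information. */
        if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx)) {
            printf("CPUID leaf 0x80000007 not supported\n");
            return 1;
        }

        /* EDX bit 8 is the invariant TSC flag: the TSC runs at a constant
         * rate regardless of core frequency changes. */
        printf("invariant TSC: %s\n", (edx & (1u << 8)) ? "yes" : "no");
        return 0;
    }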

If tc_get_timecount goes backward by a little, e.g. because you
queried it on cpu0 the first time and on cpu1 the second time,
kern_tc.c will interpret that to mean that it has instead jumped
forward by a lot -- nothing in the timecounter abstraction copes with
a timecounter that goes backwards at all.

I thought about it some more and I just don't think we have this problem on
x86 anyway.  The way I see it, with any counter, if you make explicit
comparisons on a global basis the counter could appear to go a tiny bit
backwards due to timing differences in execution, unless you go to some
lengths to work around that.
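
For reference, the reason a small backwards step looks like a huge forward
jump is the unsigned, masked subtraction the timecounter layer uses to turn
two raw reads into a delta.  A simplified model of that computation (not the
exact kern_tc.c source) is:

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Simplified model of how the timecounter layer computes elapsed
     * ticks: unsigned subtraction, masked to the counter width.  A read
     * slightly behind the previous one shows up as an enormous forward
     * jump, never as a small negative step.
     */
    static uint32_t
    tc_delta_model(uint32_t now, uint32_t prev, uint32_t counter_mask)
    {
        return (now - prev) & counter_mask;
    }

    int
    main(void)
    {
        uint32_t mask = 0xffffffffu;    /* full 32-bit counter */
        uint32_t prev = 1000000;
        uint32_t now  = 999990;         /* 10 ticks "backwards" */

        /* Prints 4294967286: a ~4.29e9-tick apparent forward jump. */
        printf("apparent delta: %u ticks\n", tc_delta_model(now, prev, mask));
        return 0;
    }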

I think all you can really expect is for the clock to not go backwards
within a single thread of execution.  By my understanding that's all the
timecounter code expects, and the TSC code on x86 makes sure of that.  I
changed tsc_get_timecount so it'll print a message if a backwards step is
ever observed.
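
Conceptually that check is just a per-CPU "did this read fall behind the
last one on this CPU" test, along these lines (a userland sketch of the idea
only; the names here are illustrative, and the real x86 code keeps the last
value in per-CPU data and has to deal with preemption):

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>          /* __rdtsc() */

    static uint64_t last_tsc;       /* per-CPU in the real code */

    static uint64_t
    tsc_read_checked(void)
    {
        uint64_t cur = __rdtsc();

        if (cur < last_tsc)
            printf("tsc: went backwards by %llu\n",
                (unsigned long long)(last_tsc - cur));
        else
            last_tsc = cur;
        return cur;
    }

    int
    main(void)
    {
        for (int i = 0; i < 5; i++)
            printf("tsc sample: %llu\n",
                (unsigned long long)tsc_read_checked());
        return 0;
    }
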
(There's also an issue where the `monotonic' clock goes backwards
sometimes, as reported by sched_pstats.  I'm not sure anyone has
tracked down where that's coming from -- it seems unlikely to be
related to cross-CPU tsc synchronization because lwp rtime should
generally be computed from differences between samples on a single CPU
at a time, but I don't know.)

Hmm.  There was a race condition with rusage and softints that I fixed about
6 months ago where proc0 had absurd times in ps/top, but I have not seen the
"clock has gone backwards" one in a long time.  I wonder if it's related.

Andrew


I've committed my change now.

Some notes:

 - sys/kern/kern_cctr.c was originally based on x86's timecounter code.
   Some architectures use it, and it might be possible to improve it in a
   similar way.

 - dtrace/amd64/dtrace_subr.c has its own rdtsc code.  I've not touched
   it; someone(TM) is welcome to modify it.

Thanks.

--
-----------------------------------------------
                SAITOH Masanobu (msaitoh%execsw.org@localhost
                                 msaitoh%netbsd.org@localhost)

