Re: TSC improvement

To: Taylor R Campbell <campbell+netbsd-tech-kern%NetBSD.org@localhost>
Subject: Re: TSC improvement
From: Andrew Doran <ad%netbsd.org@localhost>
Date: Sun, 14 Jun 2020 22:08:44 +0000

On Thu, Jun 11, 2020 at 04:50:40AM +0000, Taylor R Campbell wrote:

> What's trickier is synchronizing per-CPU timecounters so that they all
> give a reasonably consistent view of absolute wall clock time -- and
> so it's not just one CPU that leads while the others play catchup
> every time they try to read the clock.  (In other words, adding atomic
> catchup logic certainly does not obviate the need to synchronize
> per-CPU timecounters!)
>
> But neither synchronization nor global monotonicity is always
> necessary -- e.g., for rusage we really only need a local view of time
> since we're only measuring relative time durations spent on the
> current CPU anyway.
> 
> > >    This is what the timecounter(9) API per se expects of timecounters,
> > >    and right now tsc (along with various other per-CPU cycle counters)
> > >    fails to guarantee that.
> > 
> > Howso, do you see a bug?  I think it's okay.  The TSC is only used for the
> > timecounter where it's known that it's insensitive to core speed variations
> > and is driven by PLL related to the bus clock.  Fortunately that means most
> > x86 systems, excepting a window of some years from roughly around the time
> > of the Pentium 4 onwards.
> 
> If tc_get_timecount goes backward by a little, e.g. because you
> queried it on cpu0 the first time and on cpu1 the second time,
> kern_tc.c will interpret that to mean that it has instead jumped
> forward by a lot -- nothing in the timecounter abstraction copes with
> a timecounter that goes backwards at all.
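
(To illustrate the quoted point: a minimal sketch, assuming the delta
between two readings is formed as an unsigned, masked subtraction.  The
helper name tc_delta_sketch and its parameters are made up for this
example and are not the kern_tc.c interface.)

```c
#include <stdint.h>

/*
 * Hypothetical helper: difference between two counter readings,
 * computed modulo the counter width.  Unsigned arithmetic means a
 * reading that is slightly *behind* the previous one wraps around
 * and shows up as a very large forward step.
 */
static inline uint32_t
tc_delta_sketch(uint32_t now, uint32_t last, uint32_t mask)
{
	return (now - last) & mask;
}
```

With a 32-bit mask, tc_delta_sketch(1010, 1000, 0xffffffff) is 10, but
tc_delta_sketch(990, 1000, 0xffffffff) is 4294967286: ten ticks backwards
reads as nearly a full counter period forwards.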

I thought about it some more and I just don't think we have this problem on
x86 anyway.  The way I see it, with any counter, if you make explicit
comparisons on a global basis the counter could appear to go a tiny bit
backwards due to timing differences in execution, unless you want to go to
some lengths to work around that.

I think all you can really expect is for the clock to not go backwards
within a single thread of execution.  By my understanding that's all the
timecounter code expects and the TSC code on x86 makes sure of that.  I
changed tsc_get_timecount so it'll print a message if the counter is ever
observed going backwards.
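
(A rough sketch of that kind of check, not the actual tsc.c change: it
assumes each thread remembers the last raw value it saw and complains if a
later reading on the same thread is smaller.  struct sketch_thread,
sketch_get_timecount and sketch_backwards are hypothetical names, and the
raw value is passed in rather than read from the hardware so the example
stays self-contained.)

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-thread bookkeeping: last raw counter value seen. */
struct sketch_thread {
	uint64_t last;
};

/* Number of backward steps observed so far (for diagnostics only). */
static unsigned sketch_backwards;

/*
 * Report if the counter appears to go backwards within one thread,
 * then return the low 32 bits of the raw value as the timecount.
 * 'raw' stands in for a hardware counter read.
 */
static uint32_t
sketch_get_timecount(struct sketch_thread *t, uint64_t raw)
{
	if (raw < t->last) {
		sketch_backwards++;
		fprintf(stderr,
		    "counter went backwards: %" PRIu64 " < %" PRIu64 "\n",
		    raw, t->last);
	}
	t->last = raw;
	return (uint32_t)raw;
}
```

The comparison is only against the calling thread's own previous reading,
which matches the weaker guarantee described above; it says nothing about
ordering between readings taken on different CPUs.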

> (There's also an issue where the `monotonic' clock goes backwards
> sometimes, as reported by sched_pstats.  I'm not sure anyone has
> tracked down where that's coming from -- it seems unlikely to be
> related to cross-CPU tsc synchronization because lwp rtime should
> generally be computed from differences between samples on a single CPU
> at a time, but I don't know.)

Hmm.  There was a race condition with rusage and softints that I fixed about
6 months ago where proc0 had absurd times in ps/top but I have not seen the
"clock has gone backwards" one in a long time.  I wonder if it's related.

Andrew
