Port-xen archive


Re: timekeeping regression?



At Wed, 06 Mar 2024 04:25:45 +0000, "Mathew, Cherry G.*" <c%bow.st@localhost> wrote:
Subject: Re: timekeeping regression?
>
> >>>>> Greg A Woods <woods%planix.ca@localhost> writes:
>
>     > At Sun, 18 Feb 2024 01:40:52 -0800, "Greg A. Woods" <woods%planix.ca@localhost> wrote:
>     > Subject: Re: timekeeping regression?
>     >>
>     >> Still looking for a Round Tuit to do the investigation into why
>     >> "clocksource=tsc" isn't taking effect, and it'll have to wait a
>     >> couple more weeks now, so if anyone else beats me to it.....
>
>     > I did some code-reading and added some printk's to the Xen kernel
>     > and discovered the reason "clocksource=tsc" didn't work is because
>     > none of my Xen machines have X86_FEATURE_TSC_RELIABLE, and when I
>     > faked it the "warp" detection check invalidated it anyway.
>
>     > I think I just discovered the difference between the "good" and
>     > "bad" machines.  The "good" ones both still had
>     > "dom0_vcpus_pin=true".
>
>     > I'll reboot the bad one with that added again soon and see how it
>     > does.
>
> This is interesting. I imagine that the current timecounter(9) MD code
> doesn't factor in the backing hardware physical CPU being yanked from
> under it, assuming that it then relies on the TSC from it, for
> timekeeping.
>
> Did you say that this is only relevant for "clocksource=tsc" again ?

No, I'm saying "clocksource=tsc" has no effect whatsoever on any of the
machines I have.  The CPUs are Xeon 54xx and 56xx, and Xen finds their
TSC registers can "warp" backwards in time (i.e. they really are
unreliable), so Xen refuses to use them as its platform timer even if I
hack the code to fake a TSC_RELIABLE id bit.

	TSC warp detected, disabling TSC_RELIABLE

So the platform timer stays as HPET.

	Platform timer is 14.318MHz HPET
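
To illustrate what a "warp" check of this kind looks for, here is a
minimal sketch in Python.  It is not Xen's actual code: the function
name, the sample format, and the numbers are all hypothetical.  It
assumes counter readings are taken on different CPUs and recorded in
the real-time order in which they were taken, and it flags any reading
that is smaller than the one taken just before it.

```python
def find_warps(samples):
    """Return (index, prev_cpu, cpu, delta) for every backwards step.

    samples: sequence of (cpu_id, counter_value) pairs, listed in the
    real-time order in which the readings were taken (an assumption of
    this sketch, not something taken from the Xen source).
    """
    warps = []
    prev = None
    for i, (cpu, value) in enumerate(samples):
        # A later reading that is smaller than the previous one means
        # the counter appeared to run backwards between the two CPUs.
        if prev is not None and value < prev[1]:
            warps.append((i, prev[0], cpu, prev[1] - value))
        prev = (cpu, value)
    return warps


# Hypothetical readings: CPU 1's counter lags behind CPU 0's.
samples = [(0, 1000), (1, 960), (0, 1020), (1, 980), (0, 1040)]
print(find_warps(samples))
```

With these made-up samples every hand-off from CPU 0 to CPU 1 is
reported as a backwards step of 40 ticks, while a sequence that only
ever increases yields an empty list.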

However, that only seems to work reliably when the dom0 CPUs are
"pinned", or maybe if there's only one dom0 CPU.  Here I'm running all
dom0s with multiple CPUs (normally 2, but up to 8 in one case).

Normally I had always used "dom0_vcpus_pin=true", but I had removed it
on the one machine following what turns out to be incomplete advice in
the NetBSD Xen HowTo about this option.

Since I'm typing this mail on a VM of the "bad" (not-pinned) machine
I'll reboot it after it is sent.

I suspect the problem might be in how the dom0 timecounter is sourced
from the Xen kernel, but I don't know the code and I don't know how or
why not being pinned to a pCPU might affect it.

Note, I found some old discussion related to the origin of the
dom0_vcpus_pin option that suggested it was necessary to allow dom0
vCPUs to actually be pinned to pCPUs and in effect to prevent the Xen
kernel from trying to do CPU clock scaling (on those pCPUs), but I don't
think clock scaling is even possible in the first place on any of the
CPUs I have running Xen.

--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>
