[[ repost -- seems every time I include code in a message it goes into a black hole and never appears on the list ]]

Current status: I've been rewriting some chunks of xen_clock.c (in a separate post [[hopefully!]]) and getting some good results, but some confusion remains.

I've also reverted my Xen kernels to running the "credit" scheduler, which was the default in 4.11 and earlier before "credit2" became the new default.  It'll be a few more days before I know if this is the cause of the ~7.5 days of running OK before losing track of time in domUs.

W.r.t. my xen_clock.c changes, the main benefit so far seems to be that I've reduced the rate of "timecounter ran backwards" events (which I've renamed to "prevented global_ns from running backwards") to effectively zero in domUs.  It still happens, but probably not often enough to worry about.  However in dom0 it happens at a high rate (up into the 1000s, on some but not all vCPUs), even though I have my dom0 vCPUs pinned to individual pCPUs.  I don't really understand this.

The main difference between domUs and dom0 with respect to this anomaly is that in a domU the TSC is emulated (by trapping RDTSC), with the purpose of scaling it to a consistent 1 GHz count rate.  However, as I understand the Xen code, the emulated RDTSC value is still based on the real TSC register of the pCPU the guest vCPU is currently running on.  If anything the behaviour in domUs should be more variable due to the time it takes to do the trap and calculate this scaling.  In dom0 RDTSC is not emulated, so we get the "native" TSC value, but because Xen still provides scaling factors for dom0 vCPUs we can still calculate nanoseconds (i.e. a 1 GHz rate) from a delta in TSC values.  Indeed the code does not differentiate whether it is running in dom0 or in a domU -- it does the same thing in all cases.

There may be some difference in how and when Xen assigns the system_time and tsc_timestamp values for each vCPU in dom0 and in a domU, but I've not deciphered enough of the Xen code to know for sure yet.  The only thing I think I know so far is that each vCPU's values are assigned separately, so there will likely be a wee bit of drift between the system_time values for each vCPU (and of course without an invariant TSC the TSC values will be different for each pCPU).

My current version of xen_clock.c does not yet make any improvement to the high rate of "missed hardclock" events.  I'm not sure if the "lateness" of Xen timer callbacks (a comment in FreeBSD says "Xen timers may fire up to 100us off", and a comment in xen_delay() gives the number 110us) is enough to account for the high rate (~50-80% of the HZ rate!), but my gut feeling is no.  Note I'm using HZ=1000 in both my XEN3_DOM0 and XEN3_DOMU kernels.
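For anyone who hasn't stared at this code: the calculation in question has roughly the shape sketched below.  This is only an illustration, not my actual diff -- the struct layout is the Xen public ABI, but the helper names (scale_tsc_delta, read_vcpu_ns, xen_global_ns_monotonic) are made up here, and the event counting is elided.  The first part extrapolates nanoseconds from a TSC delta using the per-vCPU scaling factors Xen publishes, and the second part is the forward-only clamp on the shared global_ns value.

#include <sys/types.h>
#include <sys/atomic.h>		/* atomic_cas_64(), membar_consumer() */
#include <machine/cpufunc.h>	/* rdtsc() on x86 */

/* Layout of the per-vCPU time record, from the Xen public ABI. */
struct vcpu_time_info {
	uint32_t version;		/* odd while Xen is updating the record */
	uint32_t pad0;
	uint64_t tsc_timestamp;		/* raw TSC at the last update */
	uint64_t system_time;		/* Xen system time (ns) at the last update */
	uint32_t tsc_to_system_mul;	/* fixed-point ns per 2^32 TSC ticks */
	int8_t   tsc_shift;		/* binary shift applied to the TSC delta first */
	int8_t   pad1[3];
};

/* Scale a raw TSC delta to nanoseconds using Xen's per-vCPU factors. */
static uint64_t
scale_tsc_delta(uint64_t delta, uint32_t mul_frac, int8_t shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	/* (delta * mul_frac) >> 32, via a 128-bit intermediate (fine on amd64) */
	return (uint64_t)(((__uint128_t)delta * mul_frac) >> 32);
}

/* Snapshot this vCPU's idea of nanoseconds since boot. */
static uint64_t
read_vcpu_ns(volatile struct vcpu_time_info *t)
{
	uint32_t v;
	uint64_t ns;

	do {
		v = t->version;		/* retry if Xen updates underneath us */
		membar_consumer();
		ns = t->system_time +
		    scale_tsc_delta(rdtsc() - t->tsc_timestamp,
			t->tsc_to_system_mul, t->tsc_shift);
		membar_consumer();
	} while ((v & 1) != 0 || v != t->version);

	return ns;
}

/* Forward-only global value: never hand out a smaller ns than last time. */
static volatile uint64_t xen_global_ns;

static uint64_t
xen_global_ns_monotonic(uint64_t ns)
{
	uint64_t last;

	do {
		last = xen_global_ns;
		if (ns <= last) {
			/* would run backwards: count the event, reuse the old value */
			return last;
		}
	} while (atomic_cas_64(&xen_global_ns, last, ns) != last);

	return ns;
}

The point is that exactly the same extrapolation is done whether the vCPU belongs to dom0 or a domU; only the source of the RDTSC value (native vs. trapped-and-scaled) differs.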
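To make the "missed hardclock" numbers below concrete, the accounting has roughly this shape (again just an illustrative sketch with invented names; the real per-CPU state and the event counters are declared in struct cpu_info via sys/arch/x86/include/cpu.h):

/* Per-vCPU tick accounting state (illustrative only). */
struct xen_tick_state {
	uint64_t last_hardclock_ns;	/* ns up to which ticks have been accounted */
	uint64_t missed_hardclock;	/* ticks delivered at least a full period late */
};

/*
 * Called from the Xen timer event handler with the current per-vCPU
 * nanosecond count (derived from the per-vCPU time info).
 */
static void
xen_tick(struct xen_tick_state *ts, uint64_t now_ns, uint64_t ns_per_tick,
    void (*do_hardclock)(void))
{
	uint64_t delta = now_ns - ts->last_hardclock_ns;

	/*
	 * Run hardclock once per whole tick that has elapsed.  Every
	 * tick beyond the first means the callback arrived at least one
	 * full period late, which is what "missed hardclock" counts.
	 */
	while (delta >= ns_per_tick) {
		if (delta >= 2 * ns_per_tick)
			ts->missed_hardclock++;
		(*do_hardclock)();
		ts->last_hardclock_ns += ns_per_tick;
		delta -= ns_per_tick;
	}

	/* ... then re-arm the one-shot Xen timer for the next tick ... */
}

With HZ=1000 a tick is 1 ms, so a callback that is only ~100-110 us late should not normally push the delta past a second whole tick, which is part of why callback lateness alone seems unlikely to explain a 50-80% miss rate.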
Example event counters after almost two days of uptime:

# xl vcpu-list
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   -b-     547.6  0 / all
Domain-0                             0     1    1   -b-     520.1  1 / all
Domain-0                             0     2    2   -b-     310.3  2 / all
Domain-0                             0     3    3   -b-     194.6  3 / all
Domain-0                             0     4    4   -b-    1412.0  4 / all
Domain-0                             0     5    5   -b-    1483.1  5 / all
Domain-0                             0     6    6   -b-    1332.4  6 / all
Domain-0                             0     7    7   r--    1683.4  7 / all
nbtest                               1     0    3   -b-    1293.1  all / all
nbtest                               1     1    0   -b-     743.3  all / all
nbtest                               1     2    3   -b-     725.0  all / all
nbtest                               1     3    2   -b-     711.5  all / all

dom0 $ vmstat -e | fgrep xen | fgrep -v xenev0
vcpu0 xen missed hardclock                              166808411   976 intr
vcpu0 xen global_ns prevented from running backwards    179771595  1052 intr
vcpu1 xen missed hardclock                              136315011   798 intr
vcpu1 xen global_ns prevented from running backwards     34779858   203 intr
vcpu2 xen missed hardclock                              155390573   909 intr
vcpu2 xen global_ns prevented from running backwards     18325188   107 intr
vcpu3 xen missed hardclock                              155508679   910 intr
vcpu3 xen global_ns prevented from running backwards     17938070   105 intr
vcpu4 xen missed hardclock                                 106451     0 intr
vcpu4 xen global_ns prevented from running backwards     75363203   441 intr
vcpu5 xen missed hardclock                               12909193    75 intr
vcpu5 xen global_ns prevented from running backwards     87882918   514 intr
vcpu6 xen missed hardclock                                 103603     0 intr
vcpu6 xen global_ns prevented from running backwards    163124346   954 intr
vcpu7 xen missed hardclock                                 106193     0 intr
vcpu7 xen global_ns prevented from running backwards     24936411   145 intr

domU $ vmstat -e | fgrep xen | fgrep -v xenev0
vcpu0 xen missed hardclock                               73033335   424 intr
vcpu0 xen global_ns prevented from running backwards         3974     0 intr
vcpu1 xen missed hardclock                              108331673   629 intr
vcpu1 xen global_ns prevented from running backwards         7863     0 intr
vcpu2 xen missed hardclock                              108601754   630 intr
vcpu2 xen global_ns prevented from running backwards         6487     0 intr
vcpu3 xen missed hardclock                              108329599   629 intr
vcpu3 xen global_ns prevented from running backwards         5042     0 intr

Note there are some associated changes to sys/arch/x86/include/cpu.h but they are pretty self-evident.

--
				Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>      Avoncote Farms <woods%avoncote.ca@localhost>