Current-Users archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Weird clock behaviour with current (amd64) kernel



    Date:        Sun, 14 Aug 2022 00:28:38 +0200
    From:        Joerg Sonnenberger <joerg%bec.de@localhost>
    Message-ID:  <YvgllnAXjXt6afcR%bec.de@localhost>

  | Does the interrupt rate match HZ?

Yes.   I rebooted the system I tested yesterday (which behaved just the
same as reported then .. which I guess was technically the early hours
of this morning, but to me, that was yesterday).

I ran the same tests (actually, more of them - this time I also tested
switching to ACPI-Fast and i8254 as well (the next two highest q= values
in the kern.timecounter.choice output, after TSC and hpet0).

This time I watched interrupt rates while all of that was happening, and
the cpu0 timer interrupt rate stuck on a nice constant 100/sec for all of
this (didn't even see a 99 or 101).

The results of the tests were unchanged while TSC and hpet0 were being used
(with TSC, time ran about 2.6 times slower than real time, and sleep 1
paused for 1 of those long seconds; with hpet0 (after TSC) time ran at the
same rate as any other clock in the room, but sleep 1 would take 2 or 3
seconds (pattern of 3 3 2 3 3 2 3 3 2 ... indicating it was probably really
taking ~2.66 seconds, but I didn't run the test long enough to calculate to
2 decimal places).

Changing to ACPI-Fast or i8254 changed nothing, time kept running at
correct clock speed, and sleep kept on being slow.

    Date: Sat, 13 Aug 2022 21:32:15 -0000 (UTC)
    From: mlelstv%serpens.de@localhost (Michael van Elst)
    Message-id: <td958n$elf$1%serpens.de@localhost>

  | >In your case, you say it takes ~6 minutes between attachment and
  | >calibration and your hpet runs at 19.2MHz.

Well, to be strictly accurate, you concluded the former (I don't know
enough about the process to specify attachment and calibration) and
extracted the latter from the dmesg fragment I sent - but yes, that all
sounds correct.

  | >This is enough for HPET_MCOUNT_LO to overflow.

  | This patch adds a separate delay of ~0.1 seconds to calibrate the timers.
  | This should avoid any overflow.

I applied that patch (to dev/ic/hpet.c) and also updated x86/x86/cpu.c
to version 1.204, which apart from the RCSID stuff, looks to be identical
to the patched version I ran in the previous test (better to have fewer
patched files, so that's good).

This one worked fine (still boring sitting waiting for all that PCI CONFIG
DUMP data to scroll past) - an extract from the dmesg for this one
(selected manually, so it is more likely that I may have missed something,
but also more likely to contain data that doesn't say "TSC"):

[     1.000000] NetBSD 9.99.99 (JACARANDA:1.1-20220811) #15: Sun Aug 14 16:23:30 +07 2022
[     1.000000]         kre%jacaranda.noi.kre.to@localhost:/usr/obj/testing/kernels/amd64/JACARANDA_PCI_DUMP
[...]
[     1.000000] timecounter: Timecounters tick every 10.000 msec 
[...]
[     1.000000] timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
[...]
[     1.000003] cpu0: lapic_per_second set to 38400000
[     1.000003] cpu0: TSC freq CPUID 3417600000 Hz
[     1.000003] cpu0: TSC freq from CPUID 3417600000 Hz
[     1.000003] cpu0: [re]calibrating local timer
[     1.000003] cpu0: apic clock running at 38 MHz
[...]
[     1.000003] timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
[     1.027976] acpi0: ACPI-Fast 24-bit timer
[     1.027976] hpet0 at acpi0: high precision event timer (mem 0xfed00000-0xfed00400)
[     1.027976] timecounter: Timecounter "hpet0" frequency 19200000 Hz quality 2000
[...]
[     1.064455] attimer1 at acpi0 (TIMR, PNP0100): io 0x40-0x43,0x50-0x53 irq 0
[...]
[... pci config dump starts happening here ]
[...]
[     1.064455] cpu0: TSC freq from HPET 3417601000 Hz
[     1.064455] cpu0: CPU base freq 3400000000 Hz
[     1.064455] cpu0: CPU max freq 5500000000 Hz
[     1.064455] cpu0: TSC freq CPUID 3417600000 Hz
[     1.064455] cpu0: TSC freq calibrated 3417601000 Hz
[     1.064455] cpu0: [re]calibrating local timer 
[     1.064455] cpu0: apic clock running at 38 MHz
[     1.064455] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
[     1.064455] UVM: using package allocation scheme, 1 package(s) per bucket
[     2.073341] cpu1: TSC skew=4 drift=0 
[     2.073341] cpu1: running
[     2.091459] cpu2: TSC skew=0 drift=2 
[     2.091459] cpu2: running
[     2.161459] cpu3: TSC skew=16 drift=67
[...]
[... omitting most of the cpu TSC skew/drift debug output ]
[...]
[     3.561459] cpu22: TSC skew=-34 drift=3
[     3.681459] cpu23: TSC skew=-30 drift=2 
[     3.681459] cpu23: running
[     3.751459] timecounter: Timecounter "TSC" frequency 3417601000 Hz quality 3000

That's it for relevant info from there (that I could see anyway - OK, I know
the UVM package allocation scheme line isn't relevant, but it was just 1 line
amongst a block of lines that were relevant!)

The "while sleep 1; do date; done" test worked just fine ie: behaved as it
would on a kernel without the pci config dump.

The interrupt cpu timer rate remained a nice constant 100 for this as well
(this time, I didn't see the point in switching to some other clock source
and repeating the loop there, so didn't).

So, success.    Thanks.

Pity this won't help with PR 43997 - but I conclude from your response about
that, that if the host running qemu had HZ set significantly higher than 100
then qemu (hosting a kernel with HZ==100) would probably work just fine?

If that is the way it is, then it would seem that the test run host system
should be running a kernel with HZ=1000 or something like that - I believe
it is a reasonably modern amd64 system, right?   It must be able to cope
with 1000 clock interrupts/sec without issue (that's more or less what the
alpha port uses, isn't it - alpha has been that way for decades, ie: on
much slower systems than what is available today).

Making that simple change would (if my conclusion is right) allow all of
the tests which are currently skipped because of 43997 to be reenabled, and
also remove all doubt whether or not some of the other tests which randomly
fail for apparently unknown reasons, are actually more side effects of the
clock issues in qemu as it is currently being run.

But surely there must be some defect in this reasoning, or that would
already be happening, wouldn't it?

kre



Home | Main Index | Thread Index | Old Index