NetBSD-Bugs archive


Re: kern/53016: Clock not stable



Joerg Sonnenberger writes:
> On Wed, Apr 04, 2018 at 05:50:01AM +0000, Michael van Elst wrote:
> >  He already mentioned the deviations for all clock sources. 'lapic' and
> >  also 'clockinterrupt' were bad. The rest looked like an average PC that
> >  is not kept in sync by ntpd.
> 
> ...this makes me suspect that the system goes into deep sleep states
> occasionally, where the lapic is stopped. That would also explain time
> keeping problems of ntpd, since the ntp frequency adjustments are done
> in the time interrupt, not by fudging the time counter frequencies.

That is my guess too, as ntpd does seem to be in sync with the servers:

vielako (11:53) ~>ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*194.100.49.151  194.100.49.132   2 u   36   64  377    0.974   -0.065   8.740
+ntp1.aalto.fi   194.100.49.134   2 u   38   64  377   14.225   -0.096   0.031
-mail.bbnetworks 185.103.110.248  3 u   35   64  377    0.503   -0.142   0.108
+ftp.mikes.fi    194.100.49.133   2 u   38   64  377    1.227   -0.105   0.087
-ntp2.aalto.fi   194.100.49.132   2 u   39   64  377   14.111   -0.242   0.083
vielako (11:53) ~>

Also, when we ran the tests we DID NOT have ntpd running, but the
tests were only run for a few minutes at a time, and we didn't see any
jumps during that (except for clockinterrupt and lapic).

On the other hand, the behavior I saw yesterday was that the time was
1.7 seconds off, got fixed by ntpd (which most likely decided the
drift was too big and stepped the clock to the correct time), then
seemed to be mostly correct for a few minutes while I kept checking it
(drift on the order of a few ms), but when I came back an hour later
the time was 2.7 seconds off again.

I did not see when this jump happened. I will put a test script on the
machine now to measure the time every 10 seconds, so I can see how it
behaves over a longer period of time.
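Something along these lines is what I have in mind; a minimal sketch
of the idea, not the actual script. sleep() is driven by hardclock
ticks while CLOCK_REALTIME is read from the timecounter, so if
hardclock loses ticks the measured interval should come out noticeably
longer than the nominal 10 seconds (the 100ms threshold is arbitrary):

        #include <stdio.h>
        #include <time.h>
        #include <unistd.h>

        int
        main(void)
        {
                struct timespec prev, now;
                double delta;

                clock_gettime(CLOCK_REALTIME, &prev);
                for (;;) {
                        sleep(10);
                        clock_gettime(CLOCK_REALTIME, &now);
                        delta = (now.tv_sec - prev.tv_sec) +
                            (now.tv_nsec - prev.tv_nsec) / 1e9;
                        /* Flag intervals more than 100ms off nominal. */
                        printf("%lld.%09ld delta %.6f%s\n",
                            (long long)now.tv_sec, now.tv_nsec, delta,
                            (delta < 9.9 || delta > 10.1) ? " <-- jump" : "");
                        fflush(stdout);
                        prev = now;
                }
        }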

The current timecounter I am using is the default, i.e., TSC.

As I said in my original bug report, this might be related to
#53017, where the machine crashes because it cannot save the FPU
context.

Looking at that code in sys/arch/x86/x86/fpu.c: it spins until ticks
!= hardclock_ticks (ticks being the value saved at the start), and the
loop body is just x86_pause() and spins++.

The machine then panics if we managed to spin more than 100000000
times in that loop within one hardclock tick. Now, if hardclock_ticks
is missing ticks, one tick can take so long that this counter overruns
and panics the whole machine.

I think a hardclock tick is 10ms, so to be able to run 100M loop
iterations within 10ms I would need a clock speed of over 10GHz (and
of course the actual loop takes more than one clock cycle per
iteration).
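Spelled out, the arithmetic is:

        100,000,000 iterations / 0.01 s = 1e10 iterations/s ~ 10 GHz
        (at one cycle per iteration; the real loop takes several)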

Hmm... now looking at the code I think there is a bug there too: it
does not reread ticks when the inner loop ends because of ticks !=
hardclock_ticks. This does not cause the crash, it just turns the wait
into a different busy loop.

I.e., the code reads ticks from hardclock_ticks and starts the outer
loop:

        spins = 0;
        ticks = hardclock_ticks;
        for (;;) {
                ...

tries to do the save, does not manage to, and sends an IPI:

                ...
                x86_send_ipi(oci, X86_IPI_SYNCH_FPU);
                ...

and then waits for that to happen:

                while (pcb->pcb_fpcpu == oci && ticks == hardclock_ticks) {
                        x86_pause();
                        spins++;
                }

Now, if this exits because ticks != hardclock_ticks, and spins is
still small enough, it goes around the outer loop again: tries to save
the context again, does not succeed, sends the IPI, and comes back to
this wait loop. But as ticks is not reread, the wait loop now exits
immediately every time, so the code just spins around the outer loop
until it can finally save the context.

                if (spins > 100000000) {
                        panic("fpusave_lwp: did not");
                }
        }

And this panic, with its "very descriptive" message, is the one that
is also happening on this machine.
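The reread bug itself should just be a matter of refreshing ticks on
every pass of the outer loop; an untested sketch (this only removes
the degenerate busy loop, the lost ticks are a separate problem):

        spins = 0;
        for (;;) {
                ticks = hardclock_ticks;        /* reread every pass */
                ...
                x86_send_ipi(oci, X86_IPI_SYNCH_FPU);
                while (pcb->pcb_fpcpu == oci && ticks == hardclock_ticks) {
                        x86_pause();
                        spins++;
                }
                if (spins > 100000000) {
                        panic("fpusave_lwp: did not");
                }
        }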

So if hardclock_ticks does not get advanced, or pcb->pcb_fpcpu isn't
updated for some reason for a long time, then we will hit this panic.

I assume hardclock_ticks is updated by one CPU, and perhaps that same
CPU is in some kind of deep sleep state while it also holds the FPU
context, so that's why another CPU has enough time to loop 100M times
here, or something like that? I will put some debugging prints there
to see what happens when it actually crashes...
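Something like this is what I plan to add before the panic (a sketch;
I am assuming oci->ci_index and the variables from the snippets above
are available at this point):

                if (spins > 100000000) {
                        /* Dump the state that led us here. */
                        printf("fpusave_lwp: spins=%d ticks=%d "
                            "hardclock_ticks=%d oci=%u\n",
                            spins, ticks, hardclock_ticks,
                            oci->ci_index);
                        panic("fpusave_lwp: did not");
                }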


-- 
kivinen%iki.fi@localhost

