NetBSD-Bugs archive


Re: kern/53016: Clock not stable



The following reply was made to PR kern/53016; it has been noted by GNATS.

From: Tero Kivinen <kivinen%iki.fi@localhost>
To: Joerg Sonnenberger <joerg%bec.de@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
        netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/53016: Clock not stable
Date: Wed, 4 Apr 2018 12:41:39 +0300

 Joerg Sonnenberger writes:
 > On Wed, Apr 04, 2018 at 05:50:01AM +0000, Michael van Elst wrote:
 > >  He already mentioned the deviations for all clock sources. 'lapic' and
 > >  also 'clockinterrupt' were bad. The rest looked like an average PC that
 > >  is not kept in sync by ntpd.
 > 
 > ...this makes me suspect that the system goes into deep sleep states
 > occasionally, where the lapic is stopped. That would also explain time
 > keeping problems of ntpd, since the ntp frequency adjustments are done
 > in the time interrupt, not by fudging the time counter frequencies.
 
 That is my guess, as ntpd seems to be synced to the servers:
 
 vielako (11:53) ~>ntpq -p
      remote           refid      st t when poll reach   delay   offset  jitter
 ==============================================================================
 *194.100.49.151  194.100.49.132   2 u   36   64  377    0.974   -0.065   8.740
 +ntp1.aalto.fi   194.100.49.134   2 u   38   64  377   14.225   -0.096   0.031
 -mail.bbnetworks 185.103.110.248  3 u   35   64  377    0.503   -0.142   0.108
 +ftp.mikes.fi    194.100.49.133   2 u   38   64  377    1.227   -0.105   0.087
 -ntp2.aalto.fi   194.100.49.132   2 u   39   64  377   14.111   -0.242   0.083
 vielako (11:53) ~>
 
 Also, when we ran the tests we DID NOT have ntpd running, but the
 tests were only run for a few minutes at a time, and we did not see
 any jumps during that (except for clockinterrupt and lapic).
 
 On the other hand, the behavior I saw yesterday was that the time was
 1.7 seconds off, got fixed by ntpd (which most likely noticed the
 drift was too big and stepped the clock to the correct time), then
 seemed to be mostly correct for a few minutes while I kept checking
 it (drift on the order of a few ms), but when I came back an hour
 later the time was 2.7 seconds off again.
 
 I did not see when this jump happened. I will now put a test script
 on the machine to measure the time every 10 seconds, so I can see how
 it behaves over a longer period of time.
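 
 Something along these lines, i.e. log both the wall clock and the
 monotonic clock every 10 seconds so that any step in the wall clock
 shows up as a change in their difference (just a sketch, not the
 final script; gradual drift still has to be compared against an
 outside reference such as the NTP servers):
 
     /* clockmon.c -- print CLOCK_REALTIME, CLOCK_MONOTONIC and their
      * difference every 10 seconds; a jump in the last column means
      * the wall clock was stepped between two samples. */
     #include <stdio.h>
     #include <time.h>
     #include <unistd.h>
 
     int
     main(void)
     {
             struct timespec rt, mono;
 
             for (;;) {
                     clock_gettime(CLOCK_REALTIME, &rt);
                     clock_gettime(CLOCK_MONOTONIC, &mono);
                     printf("%lld.%09ld %lld.%09ld %.6f\n",
                         (long long)rt.tv_sec, rt.tv_nsec,
                         (long long)mono.tv_sec, mono.tv_nsec,
                         (double)(rt.tv_sec - mono.tv_sec) +
                         (rt.tv_nsec - mono.tv_nsec) / 1e9);
                     fflush(stdout);
                     sleep(10);
             }
     }
 
 Redirecting the output to a file and looking for lines where the last
 column changes should show when the steps happen.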
 
 The current timecounter I am using is the default, i.e., TSC.
 
 As I said in my original bug report, this might be related to PR
 #53017, where the machine crashes because it cannot save the FPU
 context.
 
 Looking at that code in sys/arch/x86/x86/fpu.c, it spins until ticks
 != hardclock_ticks (with ticks saved at the beginning), and the loop
 body is just x86_pause() and spins++.
 
 The machine then panics if we managed to spin more than 100000000
 times in that loop within one hardclock tick. Now, if hardclock_ticks
 is missing ticks, the loop can run long enough that this threshold is
 exceeded and the whole machine panics.
 
 I think one hardclock tick is 10 ms, so to be able to run 100M loop
 iterations within 10 ms I would need a clock speed over 10 GHz (and
 of course the actual loop takes more than one clock cycle per
 iteration).
 
 Hmm... now looking at the code, I think there is a bug there too: it
 does not reread ticks when the inner loop ends because of ticks !=
 hardclock_ticks. That does not cause a crash, it just turns the wait
 into a different busy loop.
 
 I.e., the code reads ticks from hardclock_ticks and starts the outer
 loop:
 
         spins = 0;
         ticks = hardclock_ticks;
         for (;;) {
                 ...
 
 It tries to save the context, does not manage to do that, and sends
 an IPI:
 
                 ...
                 x86_send_ipi(oci, X86_IPI_SYNCH_FPU);
                 ...
 
 and then goes to wait for this to happen:
 
                 while (pcb->pcb_fpcpu == oci && ticks == hardclock_ticks) {
                         x86_pause();
                         spins++;
                 }
 
 Now, if this loop exits because ticks != hardclock_ticks, and spins
 is still small enough, it goes back to the top of the main loop,
 tries to save the context again, does not succeed, sends the IPI, and
 comes back to this inner loop. But since ticks is not reread, the
 inner loop now exits immediately every time, so the code effectively
 spins in the outer loop until the context can finally be saved.
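 
 Rereading ticks after the inner loop would avoid that secondary busy
 loop; just as a sketch of the idea (not a tested patch):
 
                 while (pcb->pcb_fpcpu == oci && ticks == hardclock_ticks) {
                         x86_pause();
                         spins++;
                 }
                 /* sketch: pick up the new tick count so the next round
                  * actually waits here again instead of falling through
                  * immediately */
                 ticks = hardclock_ticks;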
 
                 if (spins > 100000000) {
                         panic("fpusave_lwp: did not");
                 }
 
 And this panic, with its "very descriptive" text, is the one that is
 also happening on this machine.
 
 So if hardclock_ticks does not get advanced, or pcb->pcb_fpcpu is not
 updated for some reason for a long time, then we will hit this.
 
 I assume hardclock_ticks is updated by one cpu, and perhaps that same
 cpu is in some kind of deep sleep state while also holding the fpu
 context, so that is why another cpu has enough time to loop 100M
 times here, or something like that? I will put some debugging prints
 there to see what happens when it actually crashes...
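 
 For example something along these lines right before the panic check
 (just a sketch of the kind of debug print I mean, using the names
 from the current code):
 
                 if (spins == 50000000) {
                         /* sketch: dump the state halfway to the panic
                          * threshold so we can see whether hardclock_ticks
                          * is still advancing and who still owns the fpu */
                         printf("fpusave_lwp: spins=%d ticks=%d "
                             "hardclock_ticks=%d fpcpu=%p\n",
                             spins, ticks, hardclock_ticks,
                             pcb->pcb_fpcpu);
                 }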
 
 
 -- 
 kivinen%iki.fi@localhost
 

