Re: kern/53016: Clock not stable

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,kivinen%iki.fi@localhost
Subject: Re: kern/53016: Clock not stable
From: Tero Kivinen <kivinen%iki.fi@localhost>
Date: Wed, 4 Apr 2018 16:35:01 +0000 (UTC)

The following reply was made to PR kern/53016; it has been noted by GNATS.

From: Tero Kivinen <kivinen%iki.fi@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/53016: Clock not stable
Date: Wed, 4 Apr 2018 19:31:06 +0300

 Tero Kivinen writes:
 >  I did not see when this jump happened. I will put test script in the
 >  machine now to measure the time every 10 seconds, so I can see how it
 >  behaves over longer period of time.
 
 I put a loop running which queries the clock every 10 seconds from ntp
 server and prints out the results.
 
 It worked without issues for 8 hours and then the clock jumped:
 
  4 Apr 18:01:13 ntpdate: adjust time server 194.100.49.151 offset -0.000322 sec
  4 Apr 18:01:30 ntpdate: adjust time server 194.100.49.152 offset -0.000314 sec
  4 Apr 18:01:46 ntpdate: adjust time server 130.233.224.52 offset -0.000330 sec
  4 Apr 18:02:02 ntpdate: adjust time server 130.233.224.60 offset -0.000338 sec
  4 Apr 18:02:20 ntpdate: step time server 2a00:1c30:1:100::10 offset 1.788490 sec
  4 Apr 18:02:37 ntpdate: step time server 194.100.49.151 offset 1.789461 sec
  4 Apr 18:02:54 ntpdate: step time server 194.100.49.152 offset 1.789493 sec
  4 Apr 18:03:13 ntpdate: step time server 130.233.224.52 offset 1.789210 sec
 
 So the issue is NOT related to TCS timecounter being inaccurate, but
 it is related for the machine skipping ticks for 1-2 seconds every now
 and then.
 
 During this the ntp looses sync:
 
 
 vielako (18:27) ~>ntpq -p
      remote           refid      st t when poll reach   delay   offset  jitter
 ==============================================================================
  194.100.49.151  194.100.49.134   2 u  189 1024  277    1.847  1789.52 1656.97
 *ntp1.aalto.fi   193.166.4.49     2 u  933 1024  377   14.269   -0.314 676.392
 +mail.bbnetworks 84.20.147.187    3 u  964 1024  377    0.617   -1.125 676.472
  ftp.mikes.fi    194.100.49.132   2 u   98 1024  377    1.438  1789.56 1656.91
  ntp2.aalto.fi   193.166.4.49     2 u   63 1024  377   14.191  1789.06 1714.93
 vielako (18:27) ~>
 
 
 I.e. others are already 1.7 seconds off, but the master and backup
 have not noticed this yet. 
 
 
 vielako (18:28) ~>ntpq -p                     
      remote           refid      st t when poll reach   delay   offset  jitter
 ==============================================================================
  194.100.49.151  194.100.49.134   2 u  323 1024  277    1.847  1789.52 1656.97
 +ntp1.aalto.fi   193.166.4.49     2 u 1067 1024  377   14.269   -0.314 676.392
  mail.bbnetworks 84.20.147.187    3 u   56 1024  377    0.609  1788.69 1657.05
  ftp.mikes.fi    194.100.49.132   2 u  230 1024  377    1.438  1789.56 1656.91
  ntp2.aalto.fi   193.166.4.49     2 u  195 1024  377   14.191  1789.06 1714.93
 
 
 After a while master notices issue, and then bit later the backup sees
 the issue too: 
 
 
 vielako (18:29) ~>ntpq -p
      remote           refid      st t when poll reach   delay   offset  jitter
 ==============================================================================
  194.100.49.151  194.100.49.134   2 u  344 1024  277    1.847  1789.52 1656.97
  ntp1.aalto.fi   193.166.4.49     2 u    2 1024  377   14.287  1789.42 1657.04
  mail.bbnetworks 84.20.147.187    3 u   77 1024  377    0.609  1788.69 1657.05
  ftp.mikes.fi    194.100.49.132   2 u  251 1024  377    1.438  1789.56 1656.91
  ntp2.aalto.fi   193.166.4.49     2 u  216 1024  377   14.191  1789.06 1714.93
 
 
 and then we have lost sync, and need to wait for a while to readjust
 everything and fix the situation. I.e., this is the reason it takes so
 long to fix the issue...
 
 On the other hand ntpd is not designed to cope with machine clock
 loosing 1.7 seconds during 10 seconds...
 
 I will leave the script running and see if this occurs close to the
 exact hour later too (this time it happened between
 18:02:02-18:02:20).
 
 This is our user database machine and every hour it checks whether it
 needs to regenerate the config files, and it did regenerate some of
 those including dnssec signing and so on. Looking at the times it
 seems this time jump might have happened during the dnssec signing
 operation using dnssec-signzone... The signed file was created
 18:02:02 which matches time when clock drift happened.
 -- 
 kivinen%iki.fi@localhost

Prev by Date: Re: kern/53016: Clock not stable
Next by Date: Re: kern/53140 (Setting up etherip(4) results in panic)
Previous by Thread: Re: kern/53016: Clock not stable
Next by Thread: Re: kern/53016: Clock not stable
Indexes:

Home | Main Index | Thread Index | Old Index