Port-vax archive


Re: PSA: Clock drift and pkgin



On 2023-12-21 13:47, Jan-Benedict Glaw wrote:
On Thu, 2023-12-21 12:07:56 +0000, Maciej W. Rozycki <macro%orcam.me.uk@localhost> wrote:
On Thu, 21 Dec 2023, Jan-Benedict Glaw wrote:
  I've seen that too, but I have no answer to this one.

Why was reachability so limited two days ago?

  It's the normal procedure when `ntpd' cannot cope with the drift and the
sync drops.  It then adjusts system tick duration and tries to resync from
scratch.  Reachability will cycle through 1, 3, 7, 17, 37, 77, 177, to 377
octal then.  This also means `ntpd' was still in sync two days ago, though
my samples didn't actually catch it (at a 1024-second poll interval I'd have
to wait long).

The "377" is an octal representation of a (shifted) bitlist showing
the most recent 8 time requests for that peer. As you wrote, you
synced (ntpdate?) that box about a week ago, so I would have expected
(as the two oldest `peers` stats are waaay after that) that it, since
then, always reached its peers. But the non-377 values tell that ntpd
thinks it didn't reach that peer at those times.

Right. However, when ntp resets its state, because it either changed the time or the synced peer (and, I think, under some other circumstances as well), it clears out the mask. So that non-377 value is probably just because you are looking at it a short time after such a reset.
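
As an illustration of the mask described above, here is a minimal sketch of an 8-bit reach register, assuming the usual shift-left-and-OR behaviour. The function names `update_reach` and `reach_history` are made up for this example and are not taken from the ntpd sources.

```python
def update_reach(reach: int, got_reply: bool) -> int:
    """Shift the 8-bit register left and record the newest poll in bit 0."""
    return ((reach << 1) | int(got_reply)) & 0xFF


def reach_history(replies, reach=0):
    """Return the octal reach value after each poll, starting from `reach`.

    reach=0 models the cleared mask right after a reset (an assumption
    based on the description in this thread).
    """
    history = []
    for ok in replies:
        reach = update_reach(reach, ok)
        history.append(format(reach, "o"))
    return history


# Eight successful polls after a reset.
print(reach_history([True] * 8))
# ['1', '3', '7', '17', '37', '77', '177', '377']

# One missed poll on a fully reachable peer.
print(format(update_reach(0o377, False), "o"))
# 376
```

So a value like 17 or 77 shortly after a reset does not have to mean that polls were lost; it may simply mean that fewer than eight polls have happened since the mask was cleared.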

...and finally: Why didn't ntpd react with increasing the poll
interval?

  Well, 1024 is already the maximum AFAIK.

Eh, that was expressed misleadingly :(  I meant: It recognized trouble
keeping time, so why didn't it increase the poll rate (i.e. _reduce_ the
poll _interval_ from its maximum of 1024 seconds)?

ntp usually doesn't, unless it is actively trying to get into sync with a source, at which time it wants to poll more frequently for a while. To me it looks more like it's not syncing at any time, and has basically resigned itself to just pulling information on a regular basis.

Do you, by chance, know the approximate times when you fetched
"yesterday's" and "today's" numbers? Between those times, it drifted away
by some 30 sec, but over what interval?
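
To show why the interval matters, here is a rough sketch that converts an observed offset into a drift rate in parts per million. The 30 s figure is from this thread; the candidate intervals are purely hypothetical, since the real one is not known here. The 500 ppm constant is the frequency correction limit commonly documented for ntpd; the names `drift_ppm` and `NTPD_MAX_SLEW_PPM` are made up for this example.

```python
NTPD_MAX_SLEW_PPM = 500.0  # commonly documented ntpd frequency correction limit


def drift_ppm(offset_seconds: float, interval_seconds: float) -> float:
    """Average clock drift in parts per million over the given interval."""
    return offset_seconds / interval_seconds * 1e6


# Hypothetical intervals; the actual one is what is being asked above.
for hours in (12, 24, 48):
    ppm = drift_ppm(30.0, hours * 3600)
    verdict = "beyond" if ppm > NTPD_MAX_SLEW_PPM else "within"
    print(f"30 s over {hours:2d} h = {ppm:6.1f} ppm ({verdict} the 500 ppm limit)")
```

Under these assumptions, 30 s over a full day is about 347 ppm, while the same offset over half a day would be about 694 ppm, more than a frequency correction alone could absorb.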

  I can restart `ntpd' with logging enabled, but really what has to be done
at this point is fixing the high-resolution timer frequency set in the
kernel, and only then will it make sense to fiddle with NTP further.

  I do hope to have some time next week or maybe the one after to patch up
the kernel and rebuild (I've never done that before and need to figure out
if I am able to cross-build the kernel on my POWER9/Linux system (with GCC
14, to make things more interesting) or will I have to resort to a native
build, which I suppose can take forever (but will be closer to how GENERIC
has been built)).

That's just a GENERIC kernel? You can easily cross-build NetBSD on any
Linux box and copy over the kernel to the NetBSD system. I usually
just let it build on my CI host and fetch whatever I need from stored
build artifacts. NetBSD will use the host compiler to build its
internal toolchain (GCC 10 based) and use that to actually cross-build
the release artifacts.  Feel free to drop me patches, you'll get free
compilation-as-a-service. :)

As soon as I have completed my native build, I plan to start digging into the whole time issue as well. It seems clear it's a problem not only on something like simh, but on real hardware as well. So we might very well also have multiple problems... Next week I will be able to fire up my real 4000/90, to rule out any issues caused by simh.

  Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt%softjar.se@localhost             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

