Re: number of seconds since the epoch

To: Robert Elz <kre%munnari.OZ.AU@localhost>
Subject: Re: number of seconds since the epoch
From: Taylor R Campbell <campbell+netbsd%mumble.net@localhost>
Date: Fri, 10 Dec 2010 00:50:28 +0000
(Apologies for the long message.  99.999998% of the time, this subject
doesn't matter much.  But during a leap second, catastrophes that
shouldn't happen sometimes do happen -- not because of leap seconds,
but because of POSIX time, which fundamentally corrupts the clock seen
by programs.

If everyone used leap seconds only for calendrical purposes, and for
timing purposes disregarded whether any second is labelled as a leap
second in UTC, everything would be hunky-dory.  But alas, POSIX made a
disastrous mistake in its definition of `POSIX time' which has
probably cost many parties billions of dollars.)

   Date: Thu, 09 Dec 2010 18:50:23 +0700
   From: Robert Elz <kre%munnari.OZ.AU@localhost>

   What I suggested was using the zic tool (its on every NetBSD system)
   to build your own private copy of your particular timezone file.
   I didn't say you should install it anywhere special, just anywhere you
   know where it is (definitely not /usr/share/zoneinfo).

OK.  Nevertheless, I should like to avoid installing my own modified
time zone database, since all I really want is a TAI clock, which the
NetBSD kernel does maintain, provided that ntpd informs it correctly.

(When I say `TAI clock' here, and for the rest of this message, I mean
a hardware clock that is intended to tick SI seconds and be
synchronized, within the tolerable margin of error of the system in
question, to the time reported by the official TAI clocks around the
world.)

     | Also, it looks
     | like even if I do replace the zoneinfo files, time, gettimeofday, and
     | clock_gettime will still behave badly on a leap second, and rewind.

   Ah, no, leap seconds don't work like that.   The system as a whole has
   never heard of leap seconds, and will just ignore them.   That means, the
   leap second is treated exactly the same as any other second (not that we
   pretend it never happened and time stood still).

That's what I assumed for a long time.  Unfortunately, it doesn't
match my observations, the definition of POSIX time, or my
understanding of the relevant code in NetBSD.

A leap second is simply a second on the time line that has a funny-
looking name in UTC-based systems for naming time.  The second passes;
the TAI clocks tick it just like any other second.  (For a negative
leap second, UTC simply skips the name that would normally be used for
that second: ...23:59:57, ...23:59:58, ...24:00:00/00:00:00.)

`The system will just ignore leap seconds' can have one of two
different meanings:

(a) The clock doesn't even know whether any particular second is
    written as a leap second in UTC.

(b) The clock knows about the second, and deliberately doesn't count
    it, or rewinds just before it.

The sensible choice is (a) -- leap seconds are a calendrical issue,
just like February 29th, not a clock issue.  We don't rewind our
clocks at the end of February 28th in a leap year; we use a funny-
looking name on our calendars for the next day.  UTC doesn't rewind
just before a leap second; it uses a funny-looking name for the next
second, such as 2008-12-31T23:59:60Z (written in the ISO 8601 format).

Unfortunately, the mind-boggling brain damage that is called POSIX
time means (b).  In particular, POSIX time is either

. a system for naming second-duration intervals of time, that lacks
  names for some seconds, and may have names for seconds that don't
  exist; or

. a system for naming variable-duration intervals of time.

Whichever interpretation one takes, POSIX time is a poor system for
naming time.  And POSIX clocks tend to behave very badly during leap
seconds -- often by just rewinding by a second.

I haven't personally observed a NetBSD system during a leap second,
but I know that on Linux systems, starting at 2008-12-31T23:59:58Z,
time, gettimeofday, and clock_gettime returned the following numbers;
I have written an ISO 8601 name in UTC for each second next to the
number given by time/gettimeofday/clock_gettime on Linux during that
second:

1230767998      (2008-12-31T23:59:58Z)
1230767999      (2008-12-31T23:59:59Z)
1230768000      (2008-12-31T23:59:60Z)
1230768000      (2009-01-01T00:00:00Z)
1230768001      (2009-01-01T00:00:01Z)
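
The collision in that list follows directly from the arithmetic POSIX
uses to define `seconds since the epoch'.  Here is a minimal sketch of
that formula; the struct and function names are my own, chosen for
illustration, and the fields follow the struct tm conventions (years
since 1900, day of the year counted from 0):

```c
#include <stdint.h>

/* Broken-down UTC time, using the same field conventions as struct tm:
 * year counts from 1900 and yday counts from 0 (January 1st).
 * Hypothetical type, for illustration only. */
struct utc_fields {
    int year, yday, hour, min, sec;
};

/* "Seconds since the Epoch" as POSIX defines it: plain arithmetic on
 * the calendar fields, with every day counted as exactly 86400 seconds
 * whether or not it contained a leap second. */
static int64_t
posix_seconds(struct utc_fields t)
{
    return t.sec + t.min * 60 + t.hour * 3600
        + (int64_t)t.yday * 86400
        + (int64_t)(t.year - 70) * 31536000
        + (int64_t)((t.year - 69) / 4) * 86400
        - (int64_t)((t.year - 1) / 100) * 86400
        + (int64_t)((t.year + 299) / 400) * 86400;
}
```

Feeding it 2008-12-31T23:59:60Z (year 108, yday 365, sec 60) and
2009-01-01T00:00:00Z (year 109, yday 0) gives 1230768000 both times:
two distinct seconds, one name.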

Although I haven't observed a NetBSD system during a leap second, or
during a simulated leap second, here's what I see in the kernel.
hardclock calls tc_ticktock; tc_ticktock calls tc_windup in kern_tc.c.
tc_windup calls ntp_update_second to find the clock adjustment from
the NTP state, and bumps the integral part of the time base (our
notion of the POSIX time when we booted) by the amount requested by
the NTP subsystem:

        static void
        tc_windup(void)
        {
                struct bintime bt;
                time_t t;
                ...
                        t = bt.sec;
                        ntp_update_second(&th->th_adjustment, &bt.sec);
                        ...
                        if (bt.sec != t)
                                timebasebin.sec += bt.sec - t;
                ...
        }

ntp_update_second rewinds bt.sec at the end of the day if the NTP
state said that there was a leap second pending (i.e., if time_state ==
TIME_INS, which the kernel learns from ntp_adjtime, which ntpd calls
when the NTP server has announced a pending leap second):

        void
        ntp_update_second(int64_t *adjustment, time_t *newsec)
        {
                ...
                /*
                 * Leap second processing. If in leap-insert state at
                 * the end of the day, the system clock is set back one
                 * second; if in leap-delete state, the system clock is
                 * set ahead one second. The nano_time() routine or
                 * external clock driver will insure that reported time
                 * is always monotonic.
                 */
                switch (time_state) {
                        ...
                        /*
                         * Insert second 23:59:60 following second
                         * 23:59:59.
                         */
                        case TIME_INS:
                                if (!(time_status & STA_INS))
                                        time_state = TIME_OK;
                                else if ((*newsec) % 86400 == 0) {
                                        (*newsec)--;
                                        time_state = TIME_OOP;
                                        time_tai++;
                                }
                                break;
                        ...
                }
                ...
        }

Note, by the way, that after decrementing *newsec, this code increments
time_tai, the current TAI - UTC offset.  So NetBSD does offer a smooth
clock, through ntp_gettime, which tracks TAI provided that time_tai is
set correctly when the clock is first synchronized (which ntpd will
do, as Daniel Hagerty explained, if configured so).
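
To see why the sum stays smooth while the POSIX count does not, here
is a small user-space model of the insert-leap step.  It is a sketch,
not kernel code: the type and function names are hypothetical, and the
handling of the state after the leap second is my simplification of
the parts elided above.

```c
#include <stdint.h>

enum leap_state { LEAP_OK, LEAP_INS, LEAP_OOP, LEAP_WAIT };

/* Hypothetical user-space model of the clock state; not kernel code. */
struct model_clock {
    int64_t posix_sec;      /* what time() would report */
    int64_t tai_offset;     /* TAI - UTC, like the kernel's time_tai */
    enum leap_state state;
};

/* Advance the model by one SI second, applying the insert-leap step
 * at the same point as the quoted ntp_update_second does. */
static void
model_tick(struct model_clock *c)
{
    c->posix_sec++;
    switch (c->state) {
    case LEAP_INS:
        if (c->posix_sec % 86400 == 0) {
            c->posix_sec--;     /* the POSIX count repeats a value */
            c->tai_offset++;    /* ...but posix_sec + tai_offset does not */
            c->state = LEAP_OOP;
        }
        break;
    case LEAP_OOP:
        c->state = LEAP_WAIT;   /* assumed: the leap second is over */
        break;
    default:
        break;
    }
}

/* The smooth count: POSIX seconds plus the TAI - UTC offset. */
static int64_t
model_tai(const struct model_clock *c)
{
    return c->posix_sec + c->tai_offset;
}
```

Starting the model at 1230767998 with an offset of 33 in LEAP_INS and
ticking four times, posix_sec reads 1230767999, 1230767999,
1230768000, 1230768001, while model_tai advances by exactly one each
tick.  In this model the repeated value is the one just before
midnight; which value repeats on a real system depends on where the
implementation applies the step.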

Returning from my analysis to the conversation...

   If you're using ntp (or something) to sync the time, then as soon as the
   server notices that your system has "drifted away" from the true time,
   it will begin to bring it back in line.  That is, future seconds will all
   be a little longer than real seconds (or shorter, if it was a deleted
   leap second), until your system is back in sync with real time (which may
   take a few hours to accomplish).   True elastic seconds...

The system's hardware clock hasn't drifted from the `true' time, i.e.
TAI; only the system's notion of POSIX time has drifted from the rest
of the world's notion of POSIX time.  NTP gives enough information to
clients to keep their TAI clocks synchronized, whether or not the NTP
server or the client even knows about the leap second table.  The
time reported by the NTP server won't jump by a second.

(The format of NTP timestamps in the wire protocol will jump, but at
the same time the `pending leap second' flag will lower, after having
been raised for some time (about a day), so clients can still get a
smooth view of TAI if they poll the NTP server more than once per day
(or, more than once per 86400 SI seconds).)
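
The `pending leap second' flag is the two-bit leap indicator at the
top of the first byte of every NTP packet header.  A minimal sketch of
reading it, with enum and function names of my own invention:

```c
#include <stdint.h>

/* Leap indicator values carried in the top two bits of the first byte
 * of an NTP packet header.  The names are hypothetical. */
enum ntp_leap {
    NTP_LEAP_NONE = 0,      /* no warning */
    NTP_LEAP_INSERT = 1,    /* last minute of the day has 61 seconds */
    NTP_LEAP_DELETE = 2,    /* last minute of the day has 59 seconds */
    NTP_LEAP_UNSYNC = 3     /* server clock not synchronized */
};

/* Extract the leap indicator; the remaining six bits of the byte hold
 * the protocol version and the mode, which are ignored here. */
static enum ntp_leap
ntp_leap_indicator(uint8_t first_byte)
{
    return (enum ntp_leap)((first_byte >> 6) & 0x3);
}
```

A client that sees NTP_LEAP_INSERT on one poll and NTP_LEAP_NONE on a
later poll after midnight can conclude that a second was inserted in
between, and adjust its own TAI - UTC offset accordingly.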

It is reasonable, or at least understandable, for the operating system
to slightly adjust the speed of its POSIX clock near a leap second so
that time, gettimeofday, and clock_gettime don't exhibit such erratic
behaviour as rewinding by a second.  This would be a compromise
between sane clock behaviour and the insanity of POSIX time.

That said, I hope that if NetBSD is ever modified to do this, then
ntp_gettime will still reflect a TAI clock, not something whose
fractional part is halfway between a TAI clock and a POSIX clock,
kinda like a UT1 clock.  (It would be quite a trick if NetBSD had a
UT1 clock device: open("/dev/sundial0")...)

   But unless someone goes explicitly setting the clock (causing time to jump)
   you'll get a relatively smooth progression of exactly 86400 ticks every day.
   (This is why posix seconds, and time_t values, are defined the way they are,
   as truly processing leap seconds correctly is a real shock to most 
   applications).

Applications that deal with timing can safely ignore leap seconds --
in sense (a) above -- without any shock at all: the ticking of a TAI
clock goes on unfazed by how the second was named in UTC.  But POSIX
doesn't let applications ignore leap seconds: the POSIX clock behaves
badly during them.
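
For what it's worth, here is roughly how a timing application might
read such a clock through ntp_gettime.  This is a sketch under
assumptions: the helper names are hypothetical, it treats a zero TAI
offset as `the kernel was never told', and it relies on the tai field
of struct ntptimeval being filled in, which I would not count on
outside NetBSD (hence the memset).

```c
#include <stdint.h>
#include <string.h>
#include <sys/timex.h>

/* Combine a POSIX seconds count with a TAI - UTC offset. */
static int64_t
tai_from_posix(int64_t posix_sec, long tai_offset)
{
    return posix_sec + tai_offset;
}

/* Read a TAI-like seconds count into *out.  Returns 0 on success, or
 * -1 if ntp_gettime fails or no TAI offset is available (tai == 0). */
static int
read_tai_seconds(int64_t *out)
{
    struct ntptimeval ntv;

    /* Zero the struct first so that a library which leaves tai
     * untouched is reported as "no offset" rather than as garbage. */
    memset(&ntv, 0, sizeof(ntv));
    if (ntp_gettime(&ntv) == -1 || ntv.tai == 0)
        return -1;
    *out = tai_from_posix((int64_t)ntv.time.tv_sec, ntv.tai);
    return 0;
}
```

An application would call read_tai_seconds for its timestamps and fall
back to refusing to run, rather than to POSIX time, when it returns -1.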

Applications that deal with calendars need to know about leap seconds
anyway, just like they need to know about February 29th or the number
of days in a month or time zones and daylight saving time rules, or
else they will fail to interoperate with reality.

For instance, the GNU `date' utility doesn't understand UTC:

% date -u +%Y-%m-%dT%H:%M:%SZ -d '2008-12-31 23:59:59'
2008-12-31T23:59:59Z
% date -u +%Y-%m-%dT%H:%M:%SZ -d '2008-12-31 23:59:60'
date: invalid date `2008-12-31 23:59:60'

Our `date' utility accepts this input, but passes it through a broken
system for naming times, and fails to represent it correctly:

% date -u -j +%Y-%m-%dT%H:%M:%SZ 200812312359.60
2009-01-01T00:00:00Z

   I'd be kind of interested to learn what kind of application you're needing
   this for...

Any robust application that needs subsecond time synchronization
between networked agents.  When I say `robust', I mean, for example,
that it should proceed happily even in the face of an NTP server whose
leap second table is outdated (I won't name names here).

Assume all the clocks in a network of POSIX agents and NTP servers are
accurate and synchronized to within a margin of error far smaller than
one second.  The agents will commit suicide if they think their clocks
are in error with respect to the other agents by more than the
tolerable margin.  One of the NTP
servers has an outdated leap second table -- not a bad clock (all the
clocks are accurate and synchronized), just an outdated leap second
table.  This is important -- every clock in the network is good; it's
only the leap second table (and POSIX time) that went wrong.  That
should be OK: leap seconds matter only for calendars, not for timing
or timestamping.

Now a leap second happens.  Here are three possible scenarios:

(a) An agent polling the ill-informed NTP server and the well-
    informed NTP server doesn't know what to do: POSIX dictates that
    it should skip a leap second, but the two NTP servers disagree
    about whether the current second is a leap second.  The agent
    oscillates between the two options and crashes.

(b) One group of the agents polls ill-informed NTP servers only, and
    the rest of the agents poll well-informed NTP servers only.  The
    minority group sees a POSIX time that is off by a second from the
    majority group's, and every agent in the minority group
    simultaneously commits suicide because it has fallen out of
    synchrony with the majority.

(c) Like (b), but the operating system does what you and others have
    suggested: it slows the clock down a little near a leap second,
    and speeds it up again, in order to give the simulacrum of a
    smooth time transition.  Unfortunately, the minority group *still*
    sees that its view of POSIX time has drifted by more than the
    tolerable margin of error, and commits suicide (though maybe at a
    slightly different time).

Remember: all of the clocks involved are good, and there are no
calendars involved.  Only the leap second table (and POSIX time) went
wrong.
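
Scenario (b) can be boiled down to a few lines.  In this sketch, which
uses made-up names and millisecond units of my own choosing, two
agents have identical good clocks and differ only in how many leap
seconds they have been told about:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical agent: a good clock plus a belief about leap seconds. */
struct agent {
    int64_t tai_ms;         /* milliseconds counted by a good clock */
    int64_t leaps_applied;  /* leap seconds this agent was told about */
};

/* POSIX-style reading: the good clock minus the leap seconds the agent
 * believes in.  (Any constant offset at the epoch is omitted, since it
 * cancels when two agents are compared.) */
static int64_t
agent_posix_ms(const struct agent *a)
{
    return a->tai_ms - 1000 * a->leaps_applied;
}

/* True if two readings differ by more than the tolerated margin. */
static bool
out_of_sync(int64_t x_ms, int64_t y_ms, int64_t tolerance_ms)
{
    int64_t d = x_ms > y_ms ? x_ms - y_ms : y_ms - x_ms;

    return d > tolerance_ms;
}
```

With one agent at 34 leap seconds and the other at 33, and a tolerance
of, say, 50 ms, out_of_sync is true for the two agent_posix_ms
readings and false for the two tai_ms readings, even though neither
clock has moved relative to the other.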

Even if a hardware clock did go bad, it is not likely to happen to a
large collection of agents simultaneously.  That's the real danger of
POSIX time: its bogosity manifests infrequently, but simultaneously to
everyone.

I would give more details about an actual event, but I am a moron and
I signed an NDA.  This event wouldn't have happened if the agents
involved used a TAI clock instead of a POSIX clock.