tech-userlevel archive


Re: strftime(3) oddities with %s, %z



> Date: Wed, 2 Nov 2022 15:59:00 +0300
> From: Valery Ushakov <uwe%stderr.spb.ru@localhost>
> 
> In other words, class tm doesn't have a public constructor that
> provides a way to specify TZ info.  There are other factory methods
> that allow one to obtain an instance of tm that has the TZ info (in
> its private parts). ...

Suppose you create a struct tm _without_ gmtime(3) or localtime(3),
zero-initializing it with designated initializers or memset and
setting only the members that POSIX specifies:

struct tm tm = {
	.tm_sec = 56,
	.tm_min = 34,
	.tm_hour = 12,
	.tm_mday = 1,
	.tm_mon = 12 - 1,	/* December */
	.tm_year = 2021 - 1900,
	.tm_wday = 3,		/* Wednesday */
	.tm_yday = 334,		/* zero-based day of year (%j - 1) */
	.tm_isdst = 0,
};

Nothing I've found in POSIX suggests you can't construct a struct tm
like this and use it with mktime, and the EXAMPLES section of
<https://pubs.opengroup.org/onlinepubs/009695399/functions/mktime.html>
certainly suggests you can -- indeed, tm_wday and tm_yday could even
be omitted.  (If you think otherwise: Why do you think you can't
construct a struct tm like this?)

This struct tm doesn't specify a time zone in which to interpret the
calendar date.  So what time_t do you get out of mktime(&tm), or what
number is represented by the string you get out of strftime(..., "%s",
&tm)?

First, in any particular context, I would expect these to be the same!
(If you think otherwise: Why should they be different?)

If TZ=UTC, I think we can all agree that the answer should be
1638362096.  (If you think otherwise: What should the answer be?)
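
As a sanity check on that number, here is a minimal sketch that
evaluates the "Seconds Since the Epoch" expression from the POSIX Base
Definitions directly on the tm fields, without consulting TZ at all.
The name posix_epoch_seconds is made up for illustration, and the
sketch assumes the fields are already normalized and tm_yday is filled
in:

```c
#include <time.h>

/*
 * Hypothetical helper: evaluate the POSIX "Seconds Since the Epoch"
 * expression on broken-down UTC fields.  Assumes normalized fields, a
 * valid tm_yday, and tm_year >= 70; it never looks at TZ, tm_gmtoff,
 * or tm_zone.
 */
long long
posix_epoch_seconds(const struct tm *tm)
{
	long long y = tm->tm_year;

	return tm->tm_sec + tm->tm_min*60LL + tm->tm_hour*3600LL
	    + tm->tm_yday*86400LL
	    + (y - 70)*31536000LL
	    + ((y - 69)/4)*86400LL
	    - ((y - 1)/100)*86400LL
	    + ((y + 299)/400)*86400LL;
}
```

For the struct tm above this is 45296 seconds into the day, plus 334
days, plus 51 years of 365 days, plus 13 leap days, i.e. 1638362096.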

Now what if TZ is not UTC, say TZ=Europe/Berlin?  It obviously depends
on whether mktime and strftime examine TZ or tm_gmtoff.  Here are some
possible rules to answer this:

1. mktime and strftime ignore tm_gmtoff and respect TZ, as if
   tm_gmtoff did not exist.

   In that case, we should get 1638358496, which is 1638362096 - 3600
   because Europe/Berlin is +0100 at that calendar date, 1h ahead of
   UTC.

   This is the semantics that portable applications currently rely on,
   so whatever the rule is had better agree with this!

2. mktime and strftime respect tm_gmtoff and ignore TZ.

   In that case, we should get 1638362096 because tm_gmtoff=0 in this
   code.  But suddenly this is different from what portable
   applications can rely on, so this can't be the right rule.

3. mktime and strftime respect tm_gmtoff if it is nonzero, meaning it
   has been initialized by something not currently portable in POSIX,
   and use TZ if tm_gmtoff=0.

   In that case, we should get 1638358496, because tm_gmtoff=0 in this
   code, so it interprets the struct tm in TZ=Europe/Berlin.

   However, this has a funny side effect.  Suppose we get struct tm
   values from the following pseudocode:

	TZ=Atlantic/Reykjavik localtime_r(1638362096, &tm_is);
	TZ=Europe/Rome localtime_r(1638362096, &tm_it);
	TZ=Israel localtime_r(1638362096, &tm_il);
	TZ=Asia/Baghdad localtime_r(1638362096, &tm_iq);

   Since these have filled in the time zones, it shouldn't matter what
   TZ is set to when we feed tm_is/it/il/iq into mktime or
   strftime("%s"), right?

   Unfortunately, it _does_ matter.  If we have, say, TZ=Europe/Rome,
   then under this rule we would get:

	tm_is: 1638358496
	tm_it: 1638362096
	tm_il: 1638362096
	tm_iq: 1638362096

   That's because with TZ=Atlantic/Reykjavik, tm_gmtoff=0.  (Same with
   some others like TZ=Europe/London, at least during the winter.)

   So although this rule preserves the semantics of portably
   constructed struct tm, it has wacky semantics for struct tm
   constructed with a tm_gmtoff-aware localtime(3) -- I think we can
   all agree this is obviously wrong.

4. mktime and strftime respect tm_gmtoff if tm_zone is nonnull, and
   use TZ if tm_zone is null.

   First, I'm not sure if tm_zone is always initialized to something
   nonnull by localtime and gmtime -- it's unclear to me what naming
   standard it follows, and I wouldn't be surprised if that included
   sometimes leaving it as null.  But let's suppose it is always
   initialized to nonnull.

   In that case, we should get 1638358496, because tm_zone is null in
   this code (it was zero-initialized), so it interprets the struct tm
   in TZ=Europe/Berlin.

   This also avoids conflating zero-initialized tm_gmtoff with a
   baked-in time zone of UTC, so with the various localtime calls in
   case (3) we would always get 1638362096 out of mktime.

   I think this might be closer to what uwe@ and dholland@ want: if
   you didn't specify a time zone, then mktime uses TZ, but you can
   specify a time zone or let localtime record what TZ was and it will
   be passed on to mktime no matter what TZ is later.

   However, this still changes semantics that portable applications
   can currently rely on _even if they don't construct their own
   struct tm objects_.

   For example, POSIX currently guarantees that the following program
   prints 1638362096 -- but under this rule, it would print
   1638358496:

	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>

	int
	main(void)
	{
		time_t t = 1638358496;
		struct tm tm;

		setenv("TZ", "Europe/Berlin", 1);
		tzset();
		localtime_r(&t, &tm);
		setenv("TZ", "UTC", 1);
		tzset();
		if ((t = mktime(&tm)) == -1) {
			perror("mktime");
			return 1;
		}
		printf("%lld\n", (long long)t);
		fflush(stdout);
		return ferror(stdout);
	}

   (This program may be a little silly, but it could be used to find
   how long you have to wait from when it's a certain local time in
   one place to when it is the `same' local time in another place.)

   So while this rule might be a more sensible API design, it still
   substantively changes the semantics of portable programs.
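
To experiment with the rules above, the TZ switching can be wrapped in
a pair of helpers.  This is only a sketch: the names use_tz,
localtime_in_tz and mktime_in_tz are hypothetical, the helpers change
TZ for the whole process (so they are not thread-safe), and what
mktime_in_tz returns depends on which rule your libc implements.

```c
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper: set TZ for the whole process and reload the
 * time zone rules.  Not thread-safe; returns 0 on success.
 */
static int
use_tz(const char *tz)
{
	if (setenv("TZ", tz, 1) == -1)
		return -1;
	tzset();
	return 0;
}

/* Break t down into local time as seen under the given TZ. */
int
localtime_in_tz(const char *tz, time_t t, struct tm *out)
{
	if (use_tz(tz) == -1 || localtime_r(&t, out) == NULL)
		return -1;
	return 0;
}

/*
 * Convert tm with mktime under the given TZ.  tm is passed by value
 * because mktime normalizes its argument in place.
 */
time_t
mktime_in_tz(const char *tz, struct tm tm)
{
	if (use_tz(tz) == -1)
		return (time_t)-1;
	return mktime(&tm);
}
```

Calling localtime_in_tz("Europe/Berlin", 1638358496, &tm) and then
mktime_in_tz("UTC", tm) reproduces the program under rule 4; where
mktime ignores tm_gmtoff and tm_zone, as in rule 1, the result is
1638362096.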

If you want a map from struct tm to time_t that recognizes the
difference between an input obtained by localtime and an input
obtained by gmtime, I don't think you can do that with mktime or
strftime("%s") without changing the semantics that existing programs
might rely on, silly as the original semantics may seem.

It seems to me either we need a new API, or we risk breaking existing
programs.
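
To make the "new API" option concrete, here is one possible shape,
purely as a sketch: the caller passes the UTC offset explicitly, so
neither TZ nor any hidden struct tm member affects the result.  The
name tm_to_time_at_offset is hypothetical and is not an existing or
proposed interface; the sketch assumes timegm(3) is available, which
is the case on the BSDs and glibc.

```c
#include <time.h>

/*
 * Hypothetical interface: convert broken-down fields to time_t using
 * an offset the caller passes explicitly (seconds east of UTC).
 * Relies on timegm(3), which this sketch assumes is available.
 */
time_t
tm_to_time_at_offset(struct tm tm, long gmtoff)
{
	time_t t = timegm(&tm);	/* interpret the fields as UTC */

	/* Like mktime, (time_t)-1 is treated as the error value. */
	if (t == (time_t)-1)
		return t;
	return t - gmtoff;
}
```

With the struct tm from the top of this message, an offset of 0 gives
1638362096 and an offset of 3600 (+0100) gives 1638358496, whatever TZ
is set to.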

