tech-userlevel archive


Re: strftime(3) oddities with %s, %z



    Date:        Wed, 2 Nov 2022 02:39:39 +0000
    From:        David Holland <dholland-tech%netbsd.org@localhost>
    Message-ID:  <Y2HYaxFVSyDJFqxs%netbsd.org@localhost>


  | (a) It follows from the observations so far that if I set ->tm_gmtoff
  | to 101 and ->tm_zone to "abracadabra" (as well as populating the rest
  | of the structure), and ask strftime to print those fields, it will not
  | print them but print something else. I continue to fail to see how
  | this is anything other than a bug.

OK, I have looked into that now, and it turns out not to be a bug,
but by design.   The whole story wrt these two is a little complex...

First, the basic definition of %Z (%z is essentially the same issue, just
with different data, I will concentrate on %Z so I can give specifics):

   %Z  Replaced by the timezone name or abbreviation, or by no bytes if
       no timezone information exists. [tm_isdst]

That's from POSIX, but should be more or less identical to what the C
standard says (I don't have a copy of, any version of, the C standard
to verify that, but it is generally possible from POSIX to tell when
something is intended to agree with C, and when things are being modified).

That by itself is not all that illuminating.   But the standard also says

     Local timezone information shall be set as though strftime()
     called tzset().

That's POSIX text, but I believe the C standard has a similar requirement,
worded a different way (as it does not have tzset()).

tzset() accesses "TZ" if set (or the system's default local timezone if not)
and loads the data needed to access that timezone.

So, from just this, it is clear (I believe) that %Z is intended to print the
local timezone name or abbreviation, always - with the only variation being
whether the standard or summer time name/abbr is printed (which is based upon
tm_isdst).   Note that the '[tm_isdst]' suffix on the definition of %Z says
that that field is the only one from the struct tm passed in that strftime()
will access to provide %Z information - and hence if only %Z were in the
format string, tm_isdst would be the only field the application need set.

At first glance that's why you're seeing EST/EDT (in an east coast US
timezone) despite having created the struct tm with gmtime().


But this is where things start getting more interesting.

POSIX also says:

    If a struct tm broken-down time structure is created or modified by
    gmtime() or gmtime_r(), it is unspecified whether the result of the %Z
    and %z conversion specifiers shall refer to UTC or the current local
    timezone, when strftime( ) is called with such a broken-down time
    structure.

Note that this is a variation from the C standard - but it would seem to
allow the behaviour that you want (at least for gmtime(), not for simply
setting the tm_zone field to an arbitrary string then expecting %Z to
print that string).
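
A minimal sketch of what that "unspecified" means in practice (this is
not from the libc sources; the TZ string and the helper names are just
assumptions made for illustration). It forces a US east coast local zone,
builds the struct tm with gmtime_r(), and asks for %Z. Depending on the
implementation the answer may name the local zone or UTC, and both are
allowed by the text quoted above:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Format %Z for a struct tm produced by gmtime_r(), with the local zone
 * forced to a US east coast rule.  Returns buf, or NULL on failure.
 * (Hypothetical helper, for illustration only.)
 */
static const char *
zone_after_gmtime(char *buf, size_t len)
{
	time_t when = 0;
	struct tm tm;

	if (setenv("TZ", "EST5EDT,M3.2.0,M11.1.0", 1) != 0)
		return NULL;
	tzset();
	if (gmtime_r(&when, &tm) == NULL)
		return NULL;
	if (strftime(buf, len, "%Z", &tm) == 0)
		return NULL;
	return buf;
}

/* Either kind of answer is permitted by the POSIX text quoted above. */
static int
is_permitted_answer(const char *z)
{
	return strcmp(z, "EST") == 0 ||		/* current local timezone */
	    strcmp(z, "UTC") == 0 ||		/* UTC, under either name */
	    strcmp(z, "GMT") == 0;
}
```

A portable program therefore cannot rely on either result when it passes
a gmtime() result to strftime() with %Z or %z.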

Further, at first glance, that's what it looks like the libc/time/strftime.c
code that we have should do:

                        case 'Z':
#ifdef TM_ZONE
                                pt = _add(t->TM_ZONE, pt, ptlim);
#elif HAVE_TZNAME

(the _add() internal function just adds string data to the output buffer,
nothing interesting about that one).  't' is the struct tm * passed to
strftime().

and in "private.h" which is included by strftime.c, we have:

/* NetBSD defaults */
#define TM_GMTOFF       tm_gmtoff
#define TM_ZONE         tm_zone

So, it would appear, at first glance anyway, that we should be adding
t->tm_zone to the buffer when we see a %Z conversion.

Yet that clearly is not happening.

The reason why appears earlier in strftime.c, after private.h
is included, but before there is any code:

/* 
** We don't use these extensions in strftime operation even when
** supported by the local tzcode configuration.  A strictly
** conforming C application may leave them in undefined state.
*/

#ifdef _LIBC
#undef TM_ZONE
#undef TM_GMTOFF
#endif

That is, inside strftime.c TM_ZONE is not defined after all, and
in the %Z code, we instead fall into that #elif branch
(HAVE_TZNAME is defined, so the code there is executed).

That code extracts the zone name from the current local timezone,
as loaded by tzset(), and inserts that (standard or summer, depending
upon tm_isdst, the one (and only) field of the struct tm that strftime()
is permitted to access when converting %Z).
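
Roughly, the idea of that branch can be sketched like this. This is my
own illustration using the POSIX tzname[] array, not the actual libc
code; the helper names are hypothetical, and treating a negative
tm_isdst as "no information" is an assumption of the sketch:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: select the local zone via TZ, then load it. */
static int
set_local_zone(const char *tz)
{
	if (setenv("TZ", tz, 1) != 0)
		return -1;
	tzset();
	return 0;
}

/*
 * Hypothetical helper, not the libc source: pick the local zone
 * abbreviation using nothing from the struct tm except tm_isdst.
 */
static const char *
local_zone_abbr(const struct tm *t)
{
	assert(t != NULL);
	tzset();			/* load TZ, or the system default */
	if (t->tm_isdst < 0)
		return "";		/* DST status unknown: no bytes */
	return tzname[t->tm_isdst != 0];	/* [0] standard, [1] summer */
}
```

Note that tm_zone and tm_gmtoff are never looked at, so whatever the
caller stored there has no effect on the result.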

That is, what is happening is what the C standard requires, and what
is clearly at least permitted by POSIX, if not required (there must
be some other implementation of the *time* functions which alters
tzname[] when gmtime() is called, I would guess - but that is pure
speculation).

Or to be more blunt, it must be possible to write this code (which I
have not compiled, so it could have immaterial errors)

	#include <stdio.h>
	#include <time.h>

	int
	main(int argc, char **argv)
	{
		struct tm t;	/* deliberately left uninitialised ... */
		char buf[128];

		t.tm_isdst = 0;	/* ... except the one field %Z may read */
		(void) strftime(buf, sizeof buf, "%Z", &t);
		(void) printf("The local standard timezone name is: %s\n", buf);
		return 0;
	}

The standards promise that will work, and applications are entitled to
take advantage of that.   strftime() must not cause the program to
abort by accessing the uninitialised value (random stack garbage) in
the tm_zone field of the struct tm.

This is really why you're getting EST when you use %Z on the results
of gmtime (as gmtime() sets tm_isdst to 0, and EST is the abbreviation
for standard time in the timezone you're using).  (Similarly -0500
as the %z offset).   This is not a bug -- it is doing both what it is
designed, and what it is required by the standards, to do.   That might
not have been what you would have designed the functions to do, but
that doesn't make it broken, just different from what you want, and
so you need to use something different.

I would note in concluding this issue, that our strftime.3 says in the
STANDARDS section:

   STANDARDS
     The strftime() function conforms to ISO/IEC 9899:1999 (?ISO C99?).

(The '?' are fancy opening+closing double quotes, that my cut&paste
using just ASCII cannot duplicate).   Note that we do not claim that
strftime() complies with any POSIX standard.

     [if you have a 9.99.x version that is recent enough, but not very recent,
      there might be noise in there about phases of the moon - that was some
      kind of merge error, just ignore that gibberish, the same line appears
      again later, where it belongs (or at least where it was put) in the BUGS
      section.  When you ignore the interloper text, the text quoted here
      is still there]

What that means is that we promise that we will do what the C standard
(C99 version) requires (of course we can add extensions, but cannot break
anything which is required to work).


Finally, one concluding remark about all of this:

campbell+netbsd-tech-userlevel%mumble.net@localhost said:
  | It seems to me either we need a new API, or we risk breaking existing
  | programs.

What's most amazing here is that it appears that no-one participating
in this debate has even bothered to go look at our man page.   If that
was done (the man page for strftime() in this case, though you can do
the same for mktime()) you would see:

     size_t
     strftime_z(const timezone_t tz, char * restrict buf, size_t maxsize,
         const char * restrict format, const struct tm * restrict timeptr);

This is an addition to both C and POSIX; neither has this, and even the
tzcode reference implementation doesn't (though it does have mktime_z()
I think .. not certain about that one).

That's a variation of strftime() where you can tell it which time zone
you want to use, instead of local time, for the conversions, so if you
were to do

	z = tzalloc("UTC");
	t = gmtime(&some_time_t_variable);
	strftime_z(z, buf, sizeof buf, "whatever .. including %s %z and %Z", t);
	
then you'd get %s/%z/%Z values as specified by the UTC timezone, instead
of current local time.   (Don't forget to tzfree(z) somewhere - or just 
exit()).
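
Spelled out a little more fully, with error checks, that might look like
the sketch below. The format_utc() name is hypothetical, and the
non-NetBSD fallback to plain strftime() is purely an assumption made so
the sketch builds elsewhere; with the fallback only the zone-independent
conversions are reliable, for the reasons given earlier:

```c
#include <assert.h>
#include <time.h>

/*
 * Format a time_t as UTC.  On NetBSD this uses strftime_z() with an
 * explicitly allocated "UTC" zone.  Elsewhere (an assumption of this
 * sketch) it falls back to plain strftime() on the gmtime() result.
 * Returns the number of bytes written, or 0 on failure.
 */
static size_t
format_utc(time_t when, char *buf, size_t len, const char *fmt)
{
	struct tm *t;
	size_t n;

	assert(buf != NULL && fmt != NULL);
	t = gmtime(&when);
	if (t == NULL)
		return 0;
#ifdef __NetBSD__
	timezone_t z = tzalloc("UTC");
	if (z == NULL)
		return 0;
	n = strftime_z(z, buf, len, fmt, t);
	tzfree(z);			/* don't leak the zone */
#else
	n = strftime(buf, len, fmt, t);	/* %Z and %z unspecified here */
#endif
	return n;
}
```

Allocating and freeing the zone on every call is simple but wasteful; a
program that formats many timestamps would more likely tzalloc() once
and keep the timezone_t around.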

That is, we don't need a new API, we already have the new API, just none
of the participants here seem to have bothered to notice it.

We also have strftime_l() (a POSIX, but not C, function, added in a
later version of POSIX than the one we claim to support, though we seem
to support it anyway), which allows the locale to be specified (rather
than just using the default), and even strftime_lz() (which allows both,
and is neither POSIX nor C).

   [Aside: a truly bizarre factoid - when we do eventually claim, in
    unistd.h, that we support Posix.7 (ie: 2008) rather than Posix.6 (2001)
    the support we have for strftime_l() will actually vanish - the function
    will still be there, but will simply be a clone of strftime(),
    ignoring the passed in locale.   That wacky logic comes from
    tzcode, as many users of it have no locale handling at all,
    so our implementation will need local fixing sometime].

All the real code is in strftime_lz(); the others just call it with the
appropriate locale and timezone parameters.

We do not, however, seem to have documentation for strftime_l() or
strftime_lz(), and that probably should get fixed (by the proverbial
someone).

kre



