tech-userlevel archive

Re: strftime(3) oddities with %s, %z



On Sat, Nov 05, 2022 at 08:49:57PM +0700, Robert Elz wrote:
 >   | (a) It follows from the observations so far that if I set ->tm_gmtoff
 >   | to 101 and ->tm_zone to "abracadabra" (as well as populating the rest
 >   | of the structure), and ask strftime to print those fields, it will not
 >   | print them but print something else. I continue to fail to see how
 >   | this is anything other than a bug.
 > 
 > OK, I have looked into that now, and it turns out not to be a bug,
 > but by design.   The whole story wrt these two is a little complex...
 > [...]

Ok, so it turns out that C99 does exactly specify which fields of
struct tm are used by each strftime conversion, which I missed on the
first go and nobody bothered to point out explicitly.

This means strftime must ignore any timezone fields present in struct
tm other than tm_isdst when printing %z and %Z, even though C99 leaves
all other timezone behavior as implementation-defined.

Therefore it is impossible to have timezone fields in struct tm, use
them in strftime for %z and %Z in what any reasonable person would
conclude is the correct manner, and still be C99-compliant. Nor is it
possible to be POSIX-compliant, since POSIX imports the same
restrictions.
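
For concreteness, the experiment in (a) can be sketched roughly as
follows. This is only an illustration, not code from the thread:
format_bogus_zone is a made-up name, tm_gmtoff and tm_zone are BSD/GNU
extensions rather than C99, and the _DEFAULT_SOURCE define is an
assumption about what glibc-style headers need in order to expose
them. What ends up in the buffer depends on the implementation, which
is exactly the point in dispute.

```c
#define _DEFAULT_SOURCE	/* assumption: exposes tm_gmtoff/tm_zone on glibc-style headers */
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Fill a struct tm with a fixed date plus deliberately bogus zone
 * fields, then format it with "%z %Z". Returns what strftime
 * returned (0 if the result did not fit in buf).
 * tm_gmtoff and tm_zone are extensions, not part of C99.
 */
size_t
format_bogus_zone(char *buf, size_t len)
{
	struct tm t;

	memset(&t, 0, sizeof(t));	/* leave no field uninitialized */
	t.tm_year = 2022 - 1900;
	t.tm_mon = 10;			/* November */
	t.tm_mday = 5;
	t.tm_hour = 20;
	t.tm_min = 49;
	t.tm_sec = 57;
	t.tm_wday = 6;			/* Saturday */
	t.tm_yday = 308;
	t.tm_isdst = 0;
	t.tm_gmtoff = 101;		/* bogus on purpose */
	t.tm_zone = "abracadabra";	/* bogus on purpose */

	return strftime(buf, len, "%z %Z", &t);
}
```

A libc that honors the fields would be expected to print an offset
derived from 101 seconds and the name abracadabra; one that follows
the C99 field list prints whatever it considers the local zone.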

This is unfortunate. It seems to orphan this text in POSIX:

 :     If a struct tm broken-down time structure is created or modified by
 :     gmtime() or gmtime_r(), it is unspecified whether the result of the %Z
 :     and %z conversion specifiers shall refer to UTC or the current local
 :     timezone, when strftime( ) is called with such a broken-down time
 :     structure.

since in the absence of significant magic this clearly requires
having and using timezone fields in struct tm. However, it wouldn't be
the first time a standard intended to allow certain behavior without
actually making it legal.

I do wonder, however, since it was mentioned earlier in this thread
that POSIX is planning to add timezone fields to struct tm, what
they're intending in this regard. Do you have a link to that proposal?

Meanwhile, however, this doesn't affect %s since it isn't in C99 and
POSIX is free to define it as derived from whatever struct tm fields
they want. Furthermore, though I gather this has not actually been
done, it would be reasonable to extend the above permission to use UTC
to %s.
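
To make "derived from" concrete: under the proposed definition, %s
should agree with mktime applied to the same broken-down time. The
following is a rough sketch of a check for that, under the assumption
that the libc supports %s at all (it is an extension, not C99).
percent_s_matches_mktime is a made-up name for illustration.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Return 1 if strftime's "%s" output for *tp matches mktime() applied
 * to a copy of *tp, 0 if the two differ, -1 on error.
 * "%s" is an extension; C99 does not define it.
 */
int
percent_s_matches_mktime(const struct tm *tp)
{
	struct tm copy = *tp;	/* mktime normalizes its argument */
	char buf[32], expect[32];
	time_t secs;

	secs = mktime(&copy);
	if (secs == (time_t)-1)
		return -1;
	if (strftime(buf, sizeof(buf), "%s", tp) == 0)
		return -1;
	snprintf(expect, sizeof(expect), "%lld", (long long)secs);
	return strcmp(buf, expect) == 0;
}
```

An implementation that took tm_gmtoff into account for %s but not for
mktime (or vice versa) could return 0 here for a struct tm whose zone
fields disagree with the local zone.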

I also don't see that any of this constitutes permission to pass a
struct tm containing uninitialized fields, especially if the format
contains %s. Nothing in the C99 description of mktime (or in POSIX)
specifies which fields it uses, so passing a struct tm with
uninitialized fields to mktime is UB, and so, given the proposed
definition of %s in terms of mktime, is passing one to strftime with
%s. Since both C99 and POSIX allow additional fields, there must be
some form of initialization that covers them.

Consequently I don't see any reason, other than the question of
extending the above permission, why %s and also mktime can't use the
timezone fields in struct tm, and therefore produce the correct answer
in the various scenarios that have been described.

If the state of the real world is that code that conses up struct tm
doesn't actually bzero it first (or use an explicit struct
initializer), then using the timezone fields is perhaps unwise in the
near term. I also wonder how POSIX plans to handle this if they're
proposing adding those fields.

(Note that questions about bzero vs. explicit initializers and all
bits zero are a red herring -- we do not run on the DS9000.)
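
For reference, the kind of initialization meant here looks roughly
like this. It is a sketch only: make_tm and utc_to_epoch are made-up
helper names, and pinning TZ to "UTC0" is just a way to make the
result reproducible, not something the thread proposes.

```c
#define _POSIX_C_SOURCE 200112L	/* for setenv() and tzset() */
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Build a struct tm for the given civil time with every member,
 * including any implementation-specific extras, zeroed first.
 * tm_isdst = -1 asks mktime to work out DST itself.
 */
struct tm
make_tm(int year, int mon, int mday, int hour, int min, int sec)
{
	struct tm t;

	memset(&t, 0, sizeof(t));
	t.tm_year = year - 1900;
	t.tm_mon = mon - 1;
	t.tm_mday = mday;
	t.tm_hour = hour;
	t.tm_min = min;
	t.tm_sec = sec;
	t.tm_isdst = -1;
	return t;
}

/*
 * Hypothetical helper: convert a UTC civil time to seconds since the
 * Epoch by pinning TZ to UTC for the whole process. Not thread-safe.
 */
time_t
utc_to_epoch(int year, int mon, int mday, int hour, int min, int sec)
{
	struct tm t = make_tm(year, mon, mday, hour, min, sec);

	if (setenv("TZ", "UTC0", 1) != 0)
		return (time_t)-1;
	tzset();
	return mktime(&t);
}
```

With this, utc_to_epoch(2022, 11, 5, 13, 49, 57) should yield
1667656197, which is the timestamp of the quoted message
(20:49:57 +0700) expressed in UTC.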

 > That by itself is not all that illuminating.   But the standard also says
 > 
 >      Local timezone information shall be set as though strftime()
 >      called tzset().
 > 
 > That's POSIX text, but I believe the C standard has a similar requirement,
 > worded a different way (as it does not have tzset()).

C99 doesn't say anything about it. It is all implementation-defined.

However, all that text says is that the local timezone information is
set, so that it's available; it does _not_ say the local timezone
information is _used_, either there or in the definitions of %z and
%Z, and given the paragraph that attempts to give permission to print
GMT if the struct tm notionally reflects UTC, it does not seem
reasonable to assert that %z and %Z necessarily print the _local_
timezone.

(C99 doesn't even have the words "the time zone"; the only text it has
that says anything at all about what zone might be intended is "if no
time zone is determinable". So if you can determine a time zone, you
can use it to print %z or %Z; but you can't determine it based on
extra fields in struct tm.)

 > The reason why appears earlier in strftime.c, after private.h
 > is included, but before there is any code:
 > 
 > /* 
 > ** We don't use these extensions in strftime operation even when
 > ** supported by the local tzcode configuration.  A strictly
 > ** conforming C application may leave them in undefined state.
 > */
 > 
 > #ifdef _LIBC
 > #undef TM_ZONE
 > #undef TM_GMTOFF
 > #endif
 > 
 > That is, inside strftime.c TM_ZONE is not defined after all, and
 > in the %Z code, we instead fall into that #elif
 > [...]

Like I said above, it isn't clear to me that anything grants
permission to pass uninitialized fields to strftime, and it _is_ clear
(given the proposed and de facto definition of %s) that passing
uninitialized fields to strftime while using %s is UB.

 > Or to be more blunt, it must be possible to write this code (which I
 > have not compiled, so is missing required #include, and could have
 > other immaterial errors)
 > 
 > 	int
 > 	main(int argc, char **argv)
 > 	{
 > 		struct tm t;
 > 		char buf[128];
 > 
 > 		t.tm_isdst = 0;
 > 		(void) strftime(buf, sizeof buf, "%Z", &t);
 > 		(void) printf("The local standard timezone name is: %s\n", buf);
 > 		return 0;
 > 	}

Or maybe not.

Except that if there's significant amounts of code in the wild that
does this sort of thing (which I expect is probably the case) we
can't, as a pragmatic matter, break it without making a reasonable
attempt to find and patch it.

 > What's most amazing here, is that it appears that no-one participating
 > in this debate has even bothered to go look at our man page.   If that
 > was done (the man page for strftime() in this case, though you can do
 > the same for mktime()) you would see:
 > 
 >      size_t
 >      strftime_z(const timezone_t tz, char * restrict buf, size_t maxsize,
 >          const char * restrict format, const struct tm * restrict timeptr);

It came up earlier, but got left by the wayside because it's not
portable and the question was what the portable functions should do.

 >    [Aside: a truly bizarre factoid - when we do eventually claim, in
 >     unistd.h, that we support Posix.7 (ie: 2008) rather than Posix.6 (2001)
 >     the support we have for strftime_l() will actually vanish - the function
 >     will still be there, but will simply be a clone of strftime(),
 >     ignoring the passed in locale.   That wacky logic comes from
 >     tzcode, as many users of it have no locale handling at all,
 >     so our implementation will need local fixing sometime].

Can you file a PR on that? It seems likely that if nobody does
anything about it sooner or later the way we'll discover it again is
by random barely-debuggable browser or JVM misbehavior.

(Also, most people are afraid to touch tzcode, with good reason...)

 > We do not, however, seem to have documentation for strftime_l() or
 > strftime_lz(), and that probably should get fixed (by the proverbial
 > someone).

That too.

-- 
David A. Holland
dholland%netbsd.org@localhost

