tech-userlevel archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: strftime(3) oddities with %s, %z



    Date:        Sun, 6 Nov 2022 17:31:59 +0000
    From:        David Holland <dholland-tech%netbsd.org@localhost>
    Message-ID:  <Y2fvj/+E7OOx4rbO%netbsd.org@localhost>

  | It seems to orphan this text in POSIX:
  |
  |  :  If a struct tm broken-down time structure is created or modified by
  |  :  gmtime() or gmtime_r(), it is unspecified whether the result of the %Z
  |  :  and %z conversion specifiers shall refer to UTC or the current local
  |  :  timezone, when strftime( ) is called with such a broken-down time
  |  :  structure.

Not necessarily:

  | since in the absence of significant magic this clearly requires
  | having and using timezone fields in struct tm.

No, it doesn't.   That text is there now, the tm_zone and tm_gmtoff
fields are only just being added in the next version.  The same text
was there in Issue6 TC1 of POSIX (2003) which is where it seems to
have been added (it isn't in the original Issue 6 (2001)).   All of
that is LONG before anyone in the standards community would have
been taking any notice of tm_zone or tm_gmtoff, the reason for it
needs to be something completely different than those.

As I said in an earlier message, my guess (and certainly, this is not
anything more than that) is that there are (were) implementations of gmtime
which set the tzaddr[] array to "UTC" (or "GMT") at least for tzaddr[0].
If one calls gmtime() after, and then strftime, it is entirely possible
that %z may produce 0 and %Z "UTC" (or "GMT") on such an implementation.

Since POSIX likes (mostly) to standardise what exists, allowing either the
local timezone info, or that for GMT in such a case is exactly the kind of
thing they would do.   No assumptions about tm_zone (etc) required.

  | I do wonder, however, since it was mentioned earlier in this thread
  | that POSIX is planning to add timezone fields to struct tm, what
  | they're intending in this regard. Do you have a link to that proposal?

Certainly:
	https://austingroupbugs.net/view.php?id=1533

which is a 2021 proposal - note which is much more recent than
the proposal to add %s to strftime, which was more or less complete
in 2009 (it just needed to wait for a new standard version to
actually be incorporated).

That bug forgot to update mktime() which needs doing as part of the
addition of those extra fields (because of the lazy way it was
specified).   I filed

	https://austingroupbugs.net/view.php?id=1613

just a few days ago to correct that.   There has been no reaction to
that one, which I suspect probably means it will be accepted more or
less as written (the actual wording might be altered, I do not always
to a very good job of explaining things clearly, so that often happens).

On the other hand

	https://austingroupbugs.net/view.php?id=1614

which is another defect report I filed against mktime - one which I
expected to be a no-brainer (much less controversial, I thought) than
1613, has turned out to generate a lot of ire...   (This one has nothing
at all to do with anything we have been discussing in this thread.)

  | Meanwhile, however, this doesn't affect %s since it isn't in C99 and
  | POSIX is free to define it as derived from whatever struct tm fields
  | they want.

They could have done, yes, but unless there is a very good reason to be
different from the implementations, they don't.   %s is defined to use
mktime() - what that uses I will come to below (and is the subject of
bug 1613 above).

  | Furthermore, though I gather this has not actually been
  | done, it would be reasonable to extend the above permission to use UTC
  | to %s.

No, only if implementations actually do that, and they don't.

Sice %s is specified to use mktime() it uses exactly the fields
of the struct tm that mktime() uses, and those are defined (defined
in a rather stupid and lazy way, which is harder than it needs to
be to understand, and not future proof, which it should be - that's
what bug 1613 is intended to fix).   But they are specified.

You are, of course, perfectly entitled to ask the POSIX maintainers
(the Austin Group) to modify things that way, if you like - you'd
need to supply exactly the language needed to change to make that
happen, and into what, and generally, note implementations which
already act as you are planning (and for this, NetBSD isn't really
generally considered to be a big enough player to count for much).

  | I also don't see that any of this constitutes permission to pass a
  | struct tm containing uninitialized fields, especially if the format
  | contains %s.

There is, actually.

  | There is nothing in the C99 description of mktime (or
  | POSIX either) that specifies which fields it uses,

I don't know what's in the C standards, but in POSIX that is specified.
I would assume (since nothing in POSIX indicates that this is an
extension) that the C standard is probably the same.   Maybe someone
with access to it (or a late draft, hint, hint) could tell us if it is
substantially the same or not.

What is said in XSH 3.mktime (this from the current POSIX standard,
Issue7 TC2, 2018 I think - but this text is much older than that) is:

	The mktime( ) function shall convert the broken-down time,
	expressed as local time, in the structure pointed to by timeptr,
	into a time since the Epoch value with the same encoding as that
	of the values returned by time( ). The original values of the
	tm_wday and tm_yday components of the structure shall be ignored,
	and the original values of the other components shall not be
	restricted to the ranges described in <time.h>.

Before anything else, note the "expressed as a local time" words (though
those are not relevant to this particular point).

That is, it uses the values from a struct tm, as defined by <time.h>
except that it ignores tm_wday and tm_yday (hence those fields need
not be set to anything, as they will not be referenced - they will
be set before a non-error return however, that is stated later).

To understand what that really means, one needs to also look at <time.h>

The relevant part of XBD 13.<time.h> is:

	The <time.h> header shall declare the tm structure, which shall
	include at least the following members:

		int tm_sec Seconds [0,60].
		int tm_min Minutes [0,59].
		int tm_hour Hour [0,23].
		int tm_mday Day of month [1,31].
		int tm_mon Month of year [0,11].
		int tm_year Years since 1900.
		int tm_wday Day of week [0,6] (Sunday =0).
		int tm_yday Day of year [0,365].
		int tm_isdst Daylight Savings flag.

	The value of tm_isdst shall be positive if Daylight Savings Time
	is in effect, 0 if Daylight Savings Time is not in effect, and
	negative if the information is not available.

While implementations are permitted to add fields (which is how we
get tm_zone and tm_gmtoff) conforming applications are not permitted
to expect any others to exist, and hence cannot expressly set them to
anything (they'd need to be relying on something extra, like NETBSD_SOURCE
or GNU_SOURCE or whatever it is called) which defines the extra fields,
and that would make a non-standards conforming application.

Hence, when XSH 3.mktime says "in the structure pointed to by timeptr"
(a struct tm) it can only possibly be requiring the fields it has
defined - and hence conforming applications only need set those (the
7 that remain after tm_wday and tm_yday are ignored).

I know this all sounds rather roundabout and convoluted, but trust me,
this is exactly how the standards are interpreted.

  | If the state of the real world is that code that conses up struct tm
  | doesn't actually bzero it first (or use an explicit struct
  | initializer), then using the timezone fields is perhaps unwise in the
  | near term. I also wonder how POSIX plans to handle this if they're
  | proposing adding those fields.

They are adding the fields as outputs from localtime(), mktime() etc.
They are not being used as input values to anything - if they were, then
existing conforming applications would automatically break, and that
isn't something that happens (application people would veto the idea).

Existing conforming applications can simply ignore the new fields
completely, and as long as they stick to the standard interfaces
(even including strftime("%s") since that's been around and coming
for 13 years now - an plenty of people knew that) the application
will continue to work just fine (just as such applications do when
compiled on NetBSD which has had the extra fields sice conception).

Applications that conform to the coming standard will be able to
use the tm_gmtoff and tm_zone fields (subject particularly to the
lifetime restrictions on the string tm_zone points to) however
they see fit - if desired, instead of using %Z and %z.

[Aside: the 'E' and 'O' modifiers for the strftime() conversions
currently have no defined meaning for %z or %Z, so while it would
be a wild variation to the current meanings of those modifiers on the
conversions to which they can be currently applied, an implementation
could, I suppose, modify strftime so that one of %EZ or %OZ (and %Ez
or %Oz) behaved the way you'd like them too - %s is more difficult,
though could also be done but would require more specification, and
the implementation would be very messy I suspect.]

  | (Note that questions about bzero vs. explicit initializers and all
  | bits zero are a red herring -- we do not run on the DS9000.)

Agreed, that's noise.

  | Like I said above, it isn't clear to me that anything grants
  | permission to pass uninitialized fields to strftime, and it _is_ clear
  | (given the proposed and de facto definition of %s) that passing
  | uninitialized fields to strftime while using %s is UB.

Only for the fields that mktime() uses, as %s is defined to give
(as a decimal character string) the same value that mktime() returns
as a time_t


  | It came up earlier, but got left by the wayside because it's not
  | portable and the question was what the portable functions should do.

OK - I mentioned it just as it was noted that we'd need a new API,
and it turns out that we already have it.

  | Can you file a PR on that? It seems likely that if nobody does
  | anything about it sooner or later the way we'll discover it again is
  | by random barely-debuggable browser or JVM misbehavior.

I could, but first let me deal with your followup message on this point
(there isn't likely to be any hurry here, we cannot alter the POSIX
version we support without auditing everything in the newer version of
the standard, and our implementations of it all, to make sure that we
do in fact implement all of that - that's not happening (finishing) any
time this year).


dholland-tech%netbsd.org@localhost said in a followup message:
  | The plot thickens! This is not true. There are two copies of strftime_l
  | in strftime.c,

Yes, I know.

  | one that works and one that behaves as you describe;

I'm not sure which you mean there ... the one we use currently is the
one that uses the locale, the one we don't, is the one which ignores the
passed in locale (which is currently simply omitted).

  | the latter one is inside #if HAVE_STRFTIME_L, which I guess is false.

Yes, I think so.

  | So I guess if the _POSIX_VERSION gets bumped it'll just break the build. :-)

You mean because then there would be two ....   But no, I think not
(while I haven't actually tested this, and should, and certainly would
before filing a PR) I believe what happens in that case, is that the
strftime_l() which is currently omitted becomes strftime_l() instead
of the one we currently used, and the other one, by this piece of
magic in private.h

	# if HAVE_STRFTIME_L
	#  undef  strftime_l 
	#  define strftime_l tz_strftime_l
	# endif

gets renamed to be tz_strftime_l() instead.   Or at least that is what
is intended to happen.

dholland-tech%netbsd.org@localhost said in that same follup message:
   | Any reason not to remove that one?

I think you already answered that in the message to which I am
replying (the one quoted through most of this reply)

  | (Also, most people are afraid to touch tzcode, with good reason...)

Fortunately, as we have additions/changes to tzcode already (some of them,
controlled by #if NETBSD_INSPIRED in the tzcode sources) we have
some brave souls amongst the NetBSD developers, so all hope is not lost.

kre



Home | Main Index | Thread Index | Old Index