tech-userlevel archive


Re: strftime(3) oddities with %s, %z



    Date:        Sun, 30 Oct 2022 20:27:43 +0000
    From:        David Holland <dholland-tech%netbsd.org@localhost>
    Message-ID:  <Y17ePwxjzN6Ob3tS%netbsd.org@localhost>

  | "%s    is replaced by the number of seconds since the Epoch"

Yes, though as has been mentioned before, that's incomplete (no matter
what you think should happen here, that isn't enough info to conclude
what that would be).

  | That is, it's supposed to print the time_t value.

Yes, absolutely.

  | In the context of
  | time_t as it's universally known, there is no timezone offset and no
  | timezone handling; it is the absolute time (modulo leap second issues,
  | which are an entirely different issue)

Yes, agreed, that's all correct.

  | and timezones are only introduced when times are prepared for printing.

Not quite.   While output of a time is perhaps the most common reason for
converting to local time (either the user's local time, or anyone else's)
it is not the only one.   But that quibble isn't important here.

  | This is a fundamental design property of Unix time handling. Changing
  | it would be a mistake.

Of course, and no-one is proposing changing that.

  | Changing it just in this particular corner because of an
  | implementation defect is particularly silly.

No-one is changing it here either.   Further, if there is an implementation
defect, it would require intrusive changes to correct it.

That's because you have forgotten one vital issue in all of this.

Go back to the first sentence (quote) from your message - which is also the
first (quoted from your message) in this reply.

"The number of seconds since the epoch" obtained from ????   Where does
the time_t that this represents come from?   There's no time_t arg to
strftime() (one could be added, but that would be a significant API change)
and no time_t in the struct tm which strftime() does have as its arg (and
changing that would be a massive ABI change, probably an API change for
some users as well).

All we have is the broken down time in the struct tm, which needs to be
converted back to a time_t in order to print it as the %s value of strftime().

Note: there is (definitely was when the interface was created) no guarantee
that there is any timezone information in the struct tm at all, so to create
a portable definition of that conversion, none can be assumed.   You also
cannot assume that this info must have been saved somewhere, as %z and %Z
aren't required to be able to produce meaningful values at all.   Note that
the struct tm that is passed to strftime() doesn't need to have come from
localtime() or from gmtime() (nor any of their siblings with additional args
for the various reasons those exist).   It can be created entirely by

	tm.tm_sec=23; tm.tm_min=19; tm.tm_mday=11; tm.tm_mon=2; tm.tm_year=122;

(though one would probably write the "122" as 2022-1900 to make it more obvious)

plus
	tm.tm_isdst = -1;

That is a nominally complete specification of a wallclock time & date.
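
A minimal sketch of filling in a struct tm by hand and formatting it, with
no localtime() or gmtime() involved.  The helper name format_handmade and
the tm_hour value are invented for illustration (the fragment above sets no
hour); tm_year counts years since 1900.

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper: build a struct tm field by field and format it.
 * Returns strftime()'s result (bytes written, 0 on failure). */
size_t format_handmade(char *buf, size_t len)
{
	struct tm tm;

	memset(&tm, 0, sizeof tm);	/* fields not named below are 0 */
	tm.tm_sec = 23;
	tm.tm_min = 19;
	tm.tm_hour = 12;		/* assumed value, for illustration */
	tm.tm_mday = 11;
	tm.tm_mon = 2;			/* months count from 0: March */
	tm.tm_year = 2022 - 1900;	/* tm_year counts from 1900 */
	tm.tm_isdst = -1;		/* "don't know" */

	/* These conversions only copy fields out of the struct tm. */
	return strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm);
}
```

None of the conversions used here needs a zone; that only becomes a
question once something like %s or %z is asked for.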

It should be possible (and in fact is) to take just that info, and use
strftime upon it (including %s).   strftime() requires only the fields
that are to be used by the format to be initialised, so if you want to
know what the French call Monday, you can set the LC_TIME locale to one
that gives French (along with any others that turn out to be needed,
perhaps just set LANG instead), and do
	tm.tm_wday=1; strftime(buf, sizeof buf, "%A", &tm);
Nothing more is required.
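
Spelled out a little more fully, as a sketch only: the helper name
weekday_name is made up, and a locale name such as "fr_FR.UTF-8" is an
assumption about what happens to be installed on the system.

```c
#include <locale.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: name of weekday `wday` (0 = Sunday) in the given
 * LC_TIME locale, or in the current locale if `locale` is NULL or is
 * not installed. */
size_t weekday_name(const char *locale, int wday, char *buf, size_t len)
{
	struct tm tm;

	if (locale != NULL)
		(void)setlocale(LC_TIME, locale); /* on failure: unchanged */

	memset(&tm, 0, sizeof tm);
	tm.tm_wday = wday;		/* the only field "%A" looks at */
	return strftime(buf, len, "%A", &tm);
}
```

Calling weekday_name("fr_FR.UTF-8", 1, buf, sizeof buf) would be expected
to give the French name where that locale exists; with a NULL locale in a
program that never called setlocale(), the C locale's name is used.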

To make a time_t from a struct tm we need to make some assumption
about the zone, and as struct tm (normally) represents local time (somewhere),
rather than UTC, that is the decision that was made when the %s interface
was added.   Simply call mktime() on the tm, and use its answer.
mktime() is the inverse of localtime(), so in effect, to use %s, the
struct tm must represent a time in the current local timezone.
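
A sketch of that relationship, assuming a C library whose strftime()
supports %s (it is an extension in the libraries that have it, not ISO C).
Both function names are invented for illustration.

```c
#include <stdlib.h>
#include <time.h>

/* What "%s" is described as printing: mktime() applied to the struct
 * tm, i.e. the fields read as local time.  Works on a copy because
 * mktime() may normalise the fields it is given. */
time_t seconds_via_mktime(const struct tm *tm)
{
	struct tm copy = *tm;

	return mktime(&copy);
}

/* The value strftime("%s") produces, parsed back into a number.
 * Returns -1 if strftime() produced nothing. */
long long seconds_via_strftime(const struct tm *tm)
{
	char buf[32];

	if (strftime(buf, sizeof buf, "%s", tm) == 0)
		return -1;
	return strtoll(buf, NULL, 10);
}
```

For a struct tm that came from localtime(), both should give back the
time_t that went in, which is why date +%s behaves as people expect.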

One could have defined strftime() to assume the struct tm is in UTC (or GMT,
as if it came from gmtime()) if one had wanted, but I suspect that most
people would find that to not be what they want.

There has to be some zone applied.   The only two that always (in some form
or other) exist are local time and UTC (though sometimes getting one of
those is not as easy as we'd like), and so the designers of the interface
picked local time as the time reference (when it matters).
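
To make the two choices concrete, here is a sketch that reads the same
fields both ways.  The function names are invented; timegm() is a
long-standing BSD/glibc extension rather than classic ISO C, and setenv()
and tzset() are POSIX, so all of that is an assumption about the platform.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* asks glibc to declare timegm(), setenv() */
#endif
#include <stdlib.h>
#include <time.h>

/* Read the fields as local time in zone `tz` (a TZ string), which is
 * what mktime() does once TZ is set.  Changes the process's TZ. */
time_t read_as_local(const char *tz, struct tm tm)
{
	setenv("TZ", tz, 1);
	tzset();
	tm.tm_isdst = -1;	/* let the library decide about DST */
	return mktime(&tm);
}

/* Read the very same fields as UTC. */
time_t read_as_utc(struct tm tm)
{
	return timegm(&tm);
}
```

The two results differ by the zone's offset from UTC at that moment, so
the same struct tm yields a different %s depending on which reading the
interface picks.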

Do note that this interface (%s) was added to strftime() quite a long time
ago now (it is new to POSIX, but not elsewhere) and was added at a time when
many implementations of struct tm did not include the zone information, so
it really never had a chance to depend upon that.   That isn't easy anyway,
as even today, the zone info in struct tm (ie: tm_gmtoff and tm_zone) is not
sufficient to actually locate the appropriate timezone.  tm_gmtoff ought to
be enough to just create the time_t, in a simple case, but I believe it
isn't always - particularly when out of range values in the struct tm are
permitted.
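
The simple case might look like the sketch below.  It assumes the BSD/glibc
tm_gmtoff field and timegm(), neither of which is classic ISO C, and the
function name is invented.  It is not what any strftime() is known to do.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* asks glibc to expose tm_gmtoff, timegm() */
#endif
#include <time.h>

/* Sketch: rebuild the time_t from a struct tm using only its own
 * tm_gmtoff (seconds east of UTC), never consulting the local zone.
 * Only meaningful if tm_gmtoff was filled in, e.g. by localtime(),
 * and the other fields have not been edited since. */
time_t seconds_from_gmtoff(const struct tm *tm)
{
	struct tm copy = *tm;
	long off = copy.tm_gmtoff;

	/* timegm() reads the fields as UTC; wall clock = UTC + offset. */
	return timegm(&copy) - off;
}
```

Once a caller adjusts tm_mday or tm_mon so that the time lands on the other
side of a DST change, the stored offset is stale and this gives the wrong
answer, which is the out of range problem mentioned above.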

I suspect that you're going to say that now POSIX are adding tm_gmtoff,
they should have changed the definition of what %s produces to take that
into account - but doing so would make almost all of the implementations
of %s non-compliant.  That they are not going to do.  Making the output
of %s be implementation defined or unspecified would be the same as the
situation we're in now (where %s isn't in the current standard at all)
and so, what it produces if used is simply unspecified.   It might be
possible to make it better than "anything" by having it say "converted
to the string representation of an unknown integer", but that doesn't
really help does it?   For anyone to be able to reliably use %s, it needs
a proper definition, and that definition needs to agree with the 
implementations, or the whole thing is useless.   %s is far too useful
in scripts ( via T=$(date +%s) ) to keep it a hidden buried undefined
interface.   Any definition is better than that.


  | No, there is no obligation to standardize incorrect behavior.

Of course not.  But there is nothing incorrect here.  It is simply not
the interface you might have liked to see.   Kind of surprising that you're
only complaining about it now, since it has been like this in NetBSD since
at least 1997 (when strftime.c was moved from wherever it was before into
src/lib/libc/time/ - I didn't bother to track down its previous location)

You've had 25 years to complain about it, and until now, never have.  It
cannot really have been much of an issue.

  | When existing implementations have incorrect or variant behavior they
  | don't want to bother to fix, the correct response by the standards
  | committee is to make the result unspecified, or implementation-defined,
  | or put it in some other similar category.

Sure, and that is what happens.   But "not what David Holland expects" is
not the definition of incorrect.   There is essentially no variant behaviour
here, certainly not enough to (effectively) discard the feature entirely.

  | For my part, I had never had any idea you couldn't use strftime with
  | gmtime() results,

You can, mostly.   In general strftime() is simply an appropriate
printf format applied to fields from the struct tm by the format
string.   For most things that works.   For %z and %Z things get
messier (along with some of the other invented/calculated data, such
as "week of the year") - for those the struct tm must represent an
actual time that can have occurred, and so out of range values need
to be handled.   They could be considered as erroneous (and to an
extent are if you ask to print the name of month 23 - not an actual
error return, but just a "?" for the name - at least in our implementation)
but in general are not - when required, things get normalised instead,
and doing that requires a zone, and we're back where we started.
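
As an illustration of that normalisation, a sketch using mktime(), which
reads the fields as local time and folds out of range values back into
range.  The function name is made up.

```c
#include <time.h>

/* Normalise a struct tm in place by passing it through mktime().
 * Returns the time_t, or (time_t)-1 if it cannot be represented. */
time_t normalise_tm(struct tm *tm)
{
	tm->tm_isdst = -1;	/* the zone rules decide whether DST applies */
	return mktime(tm);
}
```

Given tm_mon=0 and tm_mday=32 this comes back as the first of February,
with tm_wday and tm_yday recomputed; the zone matters because mktime() has
to decide which local time the fields refer to.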

  |  > If you want something different, implement that, and then get the rest
  |  > of the implementations to actually adopt it - and then propose it to
  |  > POSIX, you never know, it might appear in the standard following the
  |  > next one (in another 5-10 years).
  |
  | Standardizing what is clearly _wrong_ behavior precludes fixing it.

For this, which is not wrong at all, just not what you want, it wouldn't
matter even if it were wrong.   It's entrenched.   Kind of like having
O_RDONLY==0 O_WRONLY==1 and O_RDWR==2 (and the value 3 being the error case)
where any sane definition would have O_RDONLY==1 O_WRONLY==2 and
O_RDWR==(O_RDONLY|O_WRONLY) (with the 0 value either being the error
case, or possibly one, or both, or what is currently O_DIRECTORY or O_EXEC,
which is more or less "open, but neither of read() or write() work").
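
What the entrenched values cost in practice, as a sketch (the function
names are invented): the access mode cannot be tested like a flag, it has
to be masked out with O_ACCMODE and compared.

```c
#include <fcntl.h>
#include <stdbool.h>

/* The access mode is an enumeration packed into the O_ACCMODE bits,
 * not a pair of independent flags.  Where O_RDONLY is 0, the test
 * "flags & O_RDONLY" is always false, so mask and compare instead. */
bool opened_for_reading(int flags)
{
	int mode = flags & O_ACCMODE;

	return mode == O_RDONLY || mode == O_RDWR;
}

bool opened_for_writing(int flags)
{
	int mode = flags & O_ACCMODE;

	return mode == O_WRONLY || mode == O_RDWR;
}
```

The flags would typically come from fcntl(fd, F_GETFL).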

A saner interface would have been better, but far too late to change it now.
It was too late to change, even in the mid 1970's.   Once used, interfaces
like this are almost impossible to alter.   %s in strftime() is like that.

The something different that you might like to invent would need to be a
different strftime() conversion, or perhaps a completely new function,
with different args.

  |  > I will look into the timezone stuff in strftime(),
  |  > It would help if you shared the "./test" program you used
  | Nothing surprising:

Wasn't expecting surprises, but small details could make a
difference in how things work here.

  | In the local environment here, $TZ isn't set explicitly by default,

Not for me either.

  | and /etc/localtime points to US/Eastern.

Mine isn't that, but while that would affect the actual timezone related
values, it shouldn't affect the variations.   (But I can use that for
testing).   (Aside: US/Eastern is an ancient name, which is unlikely to
go away, but could.   The canonical name is America/New_York).

I'm also assuming that you have no (relevant) locale vars set, so
everything is in the POSIX (aka C) locale  -- I normally have
LC_TIME=en_GB.UTF-8 which produces a completely different string for
the %c conversion than the default one (POSIX locale).   I have that
set as I also have LANG=en_AU.UTF-8 but if I allow that one to affect
time output, I get things in the truly absurd 12 hour clock representation
which I cannot personally abide.   The GB time definitions are much saner.

I will work out what is going on with the zones, and if it is a bug (I
suspect it might not be) I will see if it can be fixed.

kre


