tech-userlevel archive


Re: strftime(3) oddities with %s, %z



    Date:        Mon, 31 Oct 2022 17:56:43 +0000
    From:        David Holland <dholland-tech%netbsd.org@localhost>
    Message-ID:  <Y2AMW1Dgphx21MwF%netbsd.org@localhost>

  | Yes and no. The number of seconds since the epoch is only one value,
  | regardless of timezone.

Don't be absurd.   The value depends upon the particular time being
considered.   That's the whole issue here.   If a time_t could only
ever hold one value, then we wouldn't need %s; we'd just write 0,
as that would be the obvious (indeed the only rational) value to which
the number of seconds since the epoch could be pegged.

I know that is not what you meant, but it is what you said.   Not
being clear about what we say is how we get into these sorts of issues.

And while (when interpreted as I expect you intended) what you said is
kind of right, it is also wrong.   A particular struct tm (especially
one complying with the current standards) doesn't represent a single
time_t regardless of the timezone.   For the world's 30 (or however many
there actually are) different zone offsets, you'll get 30 different
time_t values from that single (unmodified) struct tm.
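A minimal C sketch of that point, assuming a POSIX system on which TZ may be set to a POSIX-style zone string (the strings "UTC0", "EST5" and "JST-9" are arbitrary fixed-offset examples, and time_t_in_zone() is a hypothetical helper): the same struct tm, given to mktime() under different TZ settings, yields a different time_t each time.

```c
#define _POSIX_C_SOURCE 200809L /* for setenv() */
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper: treat *tm as local time in the zone described
 * by the POSIX TZ string tz, and return the time_t mktime() computes.
 * mktime() normalizes (modifies) its argument, so it gets a copy and
 * the caller's struct tm stays unmodified.  Note that this changes
 * the process-wide TZ and does not restore it.
 */
time_t time_t_in_zone(const struct tm *tm, const char *tz)
{
	struct tm copy = *tm;

	setenv("TZ", tz, 1);
	tzset();
	return mktime(&copy);
}
```

With tm set to 2022-10-31 12:00:00 (tm_isdst = -1), "UTC0" gives 1667217600, "EST5" gives five hours more, and "JST-9" nine hours less.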

  | The question is what the interpretation of the timezone in struct tm is,

Yes.

  | and the fact remains that the principle of least surprise dictates
  | that a struct tm produced by gmtime() refers to UTC.

Then prepare to be surprised.   Like it or not, in the 1980s (maybe,
just perhaps, the early 1990s) when %s was being added, there was nothing
in a standard struct tm (as used by most systems of the time) which gave
any clue at all about what function created it, or what the local
timezone (if that was used) might have been.

How do you propose that a portable implementation of strftime(),
written today, before the next POSIX standard is released (maybe next
year, but perhaps not until 2024), would achieve what you're requesting?

How do you tell what produced a struct tm?   Magic?
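To make that concrete, here is a small C sketch (assuming POSIX gmtime_r(), localtime_r() and a TZ of "EST5", a fixed UTC-5 zone with no summer time picked only for illustration; both function names are hypothetical): the struct tm that gmtime() produces for some time t is, field for field, the same as the one localtime() produces for t plus five hours, so nothing in the standard members records which function filled it in.

```c
#define _POSIX_C_SOURCE 200809L /* setenv(), gmtime_r(), localtime_r() */
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Compare only the members that ISO C guarantees in struct tm. */
static bool same_standard_fields(const struct tm *a, const struct tm *b)
{
	return a->tm_sec == b->tm_sec && a->tm_min == b->tm_min &&
	    a->tm_hour == b->tm_hour && a->tm_mday == b->tm_mday &&
	    a->tm_mon == b->tm_mon && a->tm_year == b->tm_year &&
	    a->tm_wday == b->tm_wday && a->tm_yday == b->tm_yday &&
	    a->tm_isdst == b->tm_isdst;
}

/*
 * Hypothetical demonstration: with TZ="EST5" (UTC-5, no summer time),
 * gmtime_r(t) and localtime_r(t + 5 hours) yield indistinguishable
 * standard fields.
 */
bool gmtime_looks_like_localtime(time_t t)
{
	struct tm from_gmtime, from_localtime;
	time_t later = t + 5 * 60 * 60;

	setenv("TZ", "EST5", 1);
	tzset();
	gmtime_r(&t, &from_gmtime);
	localtime_r(&later, &from_localtime);
	return same_standard_fields(&from_gmtime, &from_localtime);
}
```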

  | Furthermore, the principle of least surprise also fairly strongly
  | suggests that if in this case strftime prints an hour and zone name
  | that are inconsistent (meaning that the timezone informs the printing
  | of the hour but not the printing of the timezone itself), it is
  | wrong.

I haven't had time to look into what is happening there yet, but I will.

  | Yes, they are. They are proposing that %s print a nonsense value that
  | is not the number of seconds since the epoch.

Rubbish.   Every value can be a number of seconds since the epoch.
There isn't a single answer which is invalid by that criterion.

What matters is whether the result represents the value of the time
in the tm, and it does, when that tm is considered as a local time,
which is how it is supposed to work, whether you approve of that or not.

  | For that matter, since they're apparently also adding the timezone
  | info to struct tm, you can prepare a struct tm with the timezone info
  | you intended and you will then still get the wrong value out from %s.

Is it possible to write buggy code, and make false assumptions?  Yes.  So?

The implementations (and other documentation, just not ours, though our
implementation also works like this), and what is going into POSIX, say
that %s produces the time_t that mktime() would produce.   mktime() uses
only the tm fields I mentioned in the example I gave in a previous
message, and performs (well, more than, but still) the inverse of
localtime().  The zone fields aren't used.   They really cannot be, as
mktime() is an existing interface, and the zone fields aren't in the
existing standard struct tm.
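As a rough sketch of that definition (not NetBSD's, glibc's or anyone else's actual strftime() code; format_epoch_seconds() and use_zone() are hypothetical names), %s can be modelled as: copy the struct tm, hand the copy to mktime(), and print the result, with no zone member consulted anywhere.

```c
#define _POSIX_C_SOURCE 200809L /* setenv() */
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper for trying the sketch under a given TZ. */
static void use_zone(const char *tz)
{
	setenv("TZ", tz, 1);
	tzset();
}

/*
 * Model of %s as described above: the time_t that mktime() would
 * return for *tm, i.e. the fields are taken as local time and any
 * zone members (tm_gmtoff, tm_zone) are ignored.  mktime() normalizes
 * its argument, hence the copy.  Returns snprintf()'s result, or -1
 * if mktime() reports an error (which, for simplicity, also rejects
 * the one legitimate time_t of -1).
 */
int format_epoch_seconds(char *buf, size_t size, const struct tm *tm)
{
	struct tm copy = *tm;
	time_t t = mktime(&copy);

	if (t == (time_t)-1)
		return -1;
	return snprintf(buf, size, "%jd", (intmax_t)t);
}
```

Feeding it gmtime()'s result for 1667249193 under TZ="EST5" gives 1667267193, the same value the gmtime line in the test run later in this message shows for %s; under "UTC0" it gives back 1667249193.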

  | No, I haven't. I am aware of what the implementation issue is, which
  | is that apparently in some implementations struct tm lacks timezone
  | info so it's impossible for it to work correctly.

No, it is impossible for them to work the way you'd like them to.
That's precisely why it doesn't work that way.   The implementations
only implement what is possible (and generally, portable), not fantasy dreams.

  | However, as I've already said, there's a well-established path for
  | standardization in the presence of implementation issues that someone
  | doesn't want to have to fix.

Yes, there is, but that isn't the case here.

  | Standardizing _wrong_ behavior so it's impossible to produce an
  | implementation that is both correct and standards-compliant is
  | extremely foolish.

It would be, but the behaviour isn't wrong, it just differs from your
opinion of what it should be.

I (well, sometimes) don't like the rounding that printf %f does.   Should
I file a bug report about how printf(3) is broken, just because it
doesn't round towards 0 when that's appropriate for the data I am printing?

  | "%s is replaced with the number of seconds since the Epoch. If the
  | time in the struct tm argument is not expressed in the current local
  | time zone, such as returned by the localtime() function, the value
  | output is unspecified."

Feel free to submit a POSIX bug requesting a change like that.   I doubt
it will go very far, as the implementations in general are consistent.

Further, that would require applications using implementations which
happen to have the zone fields (tm_gmtoff in particular) to set them.
That isn't currently a requirement, isn't possible to do portably
without conditional compilation, and might break currently conforming
applications.   This is truly unlikely to happen.   If you want a
different interface, you create a new one; you don't break one that
exists and does what it is intended to do, regardless of whether you
agree with what that is, particularly not decades after it was created.
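For illustration only, here is roughly what reading the offset looks like when code has to cope with both kinds of struct tm (a C sketch; utc_offset() and use_zone() are hypothetical names, and HAVE_STRUCT_TM_TM_GMTOFF is assumed to be supplied by the build system, which is how autoconf's AC_CHECK_MEMBERS spells it). Setting tm_gmtoff before calling strftime() would need the same #ifdef, since neither ISO C nor POSIX.1-2008 defines the member.

```c
#define _POSIX_C_SOURCE 200809L /* setenv(), gmtime_r(), localtime_r() */
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper for trying the sketch under a given TZ. */
static void use_zone(const char *tz)
{
	setenv("TZ", tz, 1);
	tzset();
}

/* Seconds east of UTC in effect locally at time t. */
long utc_offset(time_t t)
{
	struct tm local;

	localtime_r(&t, &local);
#ifdef HAVE_STRUCT_TM_TM_GMTOFF
	/*
	 * Assumed to be defined by the build system.  On glibc the member
	 * is only visible under this name when _DEFAULT_SOURCE is also
	 * defined; otherwise it is called __tm_gmtoff.
	 */
	return local.tm_gmtoff;
#else
	/*
	 * Portable fallback: reinterpret the UTC fields for t as local
	 * time (with the same summer-time flag); the difference from t
	 * is the offset.
	 */
	struct tm utc;

	gmtime_r(&t, &utc);
	utc.tm_isdst = local.tm_isdst;
	return (long)difftime(t, mktime(&utc));
#endif
}
```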

  | Rubbish. In order to notice that it doesn't work you have to either
  | audit the code very carefully (since to a casual inspection it looked
  | like it was doing the right thing, only it doesn't actually work) or
  | have the bad luck to explicitly stumble on the particular strftime
  | formats that don't work. Since most of the ones likely to be used all
  | function correctly, and in particular the ones you _will_ use if you
  | call gmtime() and then strftime() and expect to print UTC do, this is
  | highly unlikely.

What you're saying there is that the current definition is just fine for
any practical purpose.

  | (That is: if you mean to print UTC and call gmtime and then strftime,
  | why would you ever include the zone name or zone offset in the output
  | format? You wouldn't.)

Agreed, you wouldn't.   Further, if you have just called localtime() or
gmtime() there's no point using %s either: you know the time_t value,
since that's what you started with, so you can simply print it.

  | "Inconsistent with other parts of the strftime output" is incorrect.
  | "Inconsistent with expectations about time_t" is also incorrect.

Neither of which is true, or particularly remarkable.  It is trivial
to generate inconsistent strftime output.

E.g., I just ran a slightly modified version of your test program
(one line was added) as...

(unset LANG LC_TIME; TZ=America/New_York ./tt $(date +%s) )

("tt" is the test program; I don't like using "test" as a name.)

The results were:

gmtime: Sat Oct 31 20:46:33 2022 EST -0500 (1667267193)
localtime: Sat Oct 31 16:46:33 2022 EDT -0400 (1667249193)

The issues that you are complaining about are still there.
But look carefully at the rest of it.

I made another quick change to your test program to make it print the
time_t as well (as mentioned just above); this is added to the printf()
call, as:
	   printf("%s: %s [from: %jd]\n", desc, buf, (intmax_t)t);

where t is an additional time_t argument to your print() function.

Here's what I got when I ran it again (as above, cut&paste) just now
with that additional modification:

gmtime: Sat Oct 31 21:29:06 2022 EST -0500 (1667269746) [from: 1667251746]
localtime: Sat Oct 31 17:29:06 2022 EDT -0400 (1667251746) [from: 1667251746]
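The test program itself isn't reproduced here, so as a hypothetical reconstruction only (describe(), use_zone() and the TEST_FMT format string are guesses, not the actual code), a print()-style helper with the extra time_t argument might look like this; it writes into a buffer rather than to stdout so the result can be checked. Note that %s is an extension to ISO C strftime(), accepted by the BSDs and glibc.

```c
#define _POSIX_C_SOURCE 200809L /* setenv(), gmtime_r() */
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* A guess at the test program's format; %s is not ISO C. */
#define TEST_FMT "%a %b %d %H:%M:%S %Y %Z %z (%s)"

/* Hypothetical helper for trying the sketch under a given TZ. */
static void use_zone(const char *tz)
{
	setenv("TZ", tz, 1);
	tzset();
}

/*
 * Format *tm with TEST_FMT, then append the time_t it was derived
 * from, so the %s value and the original time_t sit side by side.
 */
int describe(char *out, size_t size, const char *desc,
    const struct tm *tm, time_t t)
{
	char buf[128];

	if (strftime(buf, sizeof buf, TEST_FMT, tm) == 0)
		return -1;
	return snprintf(out, size, "%s: %s [from: %jd]", desc, buf,
	    (intmax_t)t);
}
```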

Further, there's nothing inconsistent with any expectations about a time_t.
A time_t always counts UTC seconds (ignoring leap seconds) from the
epoch to some particular time, but that can be any particular time.
You're just not interpreting the struct tm used to generate the %s
value the way it is intended to be interpreted (and the way it should
be documented to be interpreted).

If you have some bizarre expectation that a struct tm is required to be
internally consistent in some way, then you're in la la land.
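A short C illustration (normalize_jan32(), use_zone() and the deliberately odd field values are made up for the example, and TZ is assumed to be a UTC zone when it is checked): mktime() is happy to take a struct tm that is not internally consistent, such as January 32nd with a wrong day of the week; it ignores tm_wday and tm_yday on input and normalizes the rest into range.

```c
#define _POSIX_C_SOURCE 200809L /* setenv() */
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper for trying the sketch under a given TZ. */
static void use_zone(const char *tz)
{
	setenv("TZ", tz, 1);
	tzset();
}

/*
 * Build an inconsistent struct tm (1970-01-32, claiming to be a
 * Saturday) and let mktime() normalize it.  The normalized fields are
 * stored in *out and the computed time_t is returned.
 */
time_t normalize_jan32(struct tm *out)
{
	struct tm tm = {0};
	time_t t;

	tm.tm_year = 70;	/* 1970 */
	tm.tm_mon = 0;		/* January */
	tm.tm_mday = 32;	/* one past the end of January */
	tm.tm_wday = 6;		/* wrong on purpose; mktime() ignores it */
	tm.tm_isdst = 0;
	t = mktime(&tm);
	*out = tm;
	return t;
}
```

Under TZ="UTC0" the result is 2678400 (1970-02-01 00:00:00 UTC), and the returned fields say February 1st, a Sunday.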

  | Producing an unexpected result because of structural issues internal
  | to the implementation

But that's not what it is.   It is the definition of how the interface
works.   It just happens to have been omitted from our man page.   That's
a bug (in strftime.3, not strftime.c).

  | It is in no way entrenched, or we would have been having this debate
  | 25 years ago; we've had %s in strftime(3) at least that long, so it's
  | taken 25 years for anyone to notice that there's a discrepancy here.
  | Therefore it's by no means too late to change it.

Who says it hasn't been noticed in all of that time?   You mean that
you didn't notice it?   I have known that %s used mktime() more or less
forever, and what the implications of that were.   I very much doubt
that I am the only one, particularly if the glibc doc actually says
that, as reported (in general I don't use, or even look at, gnu stuff
if I can avoid it, so I don't know for sure).

Further, the POSIX change request that led to %s being added to
strftime() was filed in October 2009, and a proposed resolution mentions
mktime() in a note added in October 2009 (and not changed since).   That
was long, long before the zone fields were ever considered for addition.
It has taken this long to actually appear (which it still hasn't actually
done) because it is something new, not just a bug fix to the spec, and
there has been no new version of the standard since 2008, just TCs
(technical corrigenda).

The final wording didn't appear until November 2009 (and had some slight
edits in 2016) but has retained the reference to mktime() throughout.

The request, in bug #191, was for %s to be added to date(1).   The
response was to add it to strftime(3), which is where date(1) gets its
%-conversions from, and which is obviously the place to add it.
Apparently, though, when the request was filed, that wasn't made clear
in the spec for date; that was fixed by another bug resolution.   (This
is all way before my time watching what happens in POSIX; it's
fascinating to see what happened.)

That's actually important, in a way.  The intent is (was) to make the
current time_t (i.e., the time now) available to scripts via date(1),
which is just how I use it all the time.   I would expect actual use
of %s in strftime() calls in C programs to be exceedingly rare.   The
usage in date (and other programs offering similar interfaces to
strftime()) is fine just as it is defined now.

  | I'm aware of that. I'm not sure why on three different machines I have
  | variously (a) US/Eastern, (b) America/New_York, and (c) EST5EDT. All
  | three machines were set up long ago. Realistically, none of these
  | names are ever actually going to be removed.

You're right, they're not.   Backwards compatibility is important.
(Though I'd look carefully at (c) to make sure that it's actually
specifying the summer time rules that you want.   It probably is, but it
certainly isn't required to: that's one of the POSIX-style TZ
specifications, just being used as an Olson-type zone name.   It will
also be the wrong name to use, even though it might still work, if the
US decides to abandon "daylight saving", which I believe is a genuine
possibility.)
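If there is any doubt about what a bare "EST5EDT" picks up on a given system, one option is to spell the rules out in the POSIX TZ syntax. A minimal C sketch, assuming the current US rules (summer time from the second Sunday in March to the first Sunday in November, switching at 02:00 local time, which is the default when no time is given); is_summer_time() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L /* setenv(), localtime_r() */
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/*
 * Is summer time in effect at t under the zone spec tz?  A spec such
 * as "EST5EDT,M3.2.0,M11.1.0" states the transition rules explicitly
 * instead of relying on whatever a bare "EST5EDT" resolves to.
 */
bool is_summer_time(time_t t, const char *tz)
{
	struct tm tm;

	setenv("TZ", tz, 1);
	tzset();
	localtime_r(&t, &tm);
	return tm.tm_isdst > 0;
}
```

With "EST5EDT,M3.2.0,M11.1.0", 2022-07-01 12:00 UTC (1656676800) is in summer time and 2022-01-01 12:00 UTC (1641038400) is not.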

That's the same issue as here: backwards compatibility with more than
25 years of strftime(%s) is also important.   No-one sane is going to
change it.

kre


