tech-userlevel archive

Re: strftime(3) oddities with %s, %z



On Mon, Oct 31, 2022 at 07:25:34PM +0700, Robert Elz wrote:
 >     Date:        Sun, 30 Oct 2022 20:27:43 +0000
 >     From:        David Holland <dholland-tech%netbsd.org@localhost>
 >     Message-ID:  <Y17ePwxjzN6Ob3tS%netbsd.org@localhost>
 > 
 >   | "%s    is replaced by the number of seconds since the Epoch"
 > 
 > Yes, though as has been mentioned before, that's incomplete (no matter
 > what you think should happen here, that isn't enough info to conclude
 > what that would be).

Yes and no. The number of seconds since the epoch is only one value,
regardless of timezone. The question is what the interpretation of the
timezone in struct tm is, and the fact remains that the principle of
least surprise dictates that a struct tm produced by gmtime() refers
to UTC.
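
To illustrate that a time_t names one instant and only the broken-down
fields depend on the zone, here is a minimal sketch. It assumes a
POSIX-like libc that provides timegm(), which is a common extension and
not part of ISO C. The helper names roundtrip_utc and roundtrip_local
are made up for this example.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc-style feature macro, exposes timegm() */
#include <assert.h>
#include <time.h>

/* Break t down as UTC with gmtime_r(), then convert the fields back
 * with timegm(), which interprets a struct tm as UTC. */
static time_t roundtrip_utc(time_t t)
{
    struct tm tm;
    struct tm *p = gmtime_r(&t, &tm);
    assert(p != NULL);
    (void)p;
    return timegm(&tm);
}

/* Break t down in the current local zone with localtime_r(), then
 * convert back with mktime(), which interprets a struct tm as local
 * time. tm_isdst as filled in by localtime_r() disambiguates DST. */
static time_t roundtrip_local(time_t t)
{
    struct tm tm;
    struct tm *p = localtime_r(&t, &tm);
    assert(p != NULL);
    (void)p;
    return mktime(&tm);
}
```

Both helpers should hand back the value they were given, whatever TZ is
set to. Trouble only starts when the fields from one path are fed to
the inverse of the other, for example gmtime() output into mktime().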

Furthermore, the principle of least surprise also fairly strongly
suggests that if in this case strftime prints an hour and zone name
that are inconsistent (meaning that the timezone informs the printing
of the hour but not the printing of the timezone itself), it is
wrong.

 >   | Changing it just in this particular corner because of an
 >   | implementation defect is particularly silly.
 > 
 > No-one is changing it here either.

Yes, they are. They are proposing that %s print a nonsense value that
is not the number of seconds since the epoch.

For that matter, since they're apparently also adding the timezone
info to struct tm, you can prepare a struct tm with the timezone info
you intended and you will then still get the wrong value out from %s.
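
A minimal sketch for observing the behavior in question on a given
system. It assumes a POSIX-like libc whose strftime() accepts %s (an
extension, not in ISO C) and uses the POSIX TZ string "EST5" purely as
an illustration. The helper names are hypothetical, and the result
depends on the C library in use.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc-style feature macro for setenv(), gmtime_r() */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Format tm with "%s" and parse the printed digits back to an integer. */
static long long percent_s(const struct tm *tm)
{
    char buf[32];
    size_t n = strftime(buf, sizeof buf, "%s", tm);
    assert(n > 0);
    (void)n;
    return strtoll(buf, NULL, 10);
}

/* Return what "%s" prints for the struct tm produced by gmtime_r(t)
 * while the process-wide local zone is set to tz.
 * Note: this changes TZ for the whole process and does not restore it. */
static long long percent_s_of_gmtime(time_t t, const char *tz)
{
    struct tm tm;
    struct tm *p;

    setenv("TZ", tz, 1);
    tzset();
    p = gmtime_r(&t, &tm);
    assert(p != NULL);
    (void)p;
    return percent_s(&tm);
}
```

With TZ set to "UTC0" the result should equal t. With "EST5", an
implementation that computes %s by passing the fields through mktime()
would be expected to return t + 18000, since it reads the UTC fields as
local time; an implementation that honors zone information carried in
struct tm would be expected to return t.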

 > Further, if there is an implementation defect, it would require
 > intrusive changes to correct it.

Not necessarily.

 > That's because you have forgotten one vital issue in all of this.

No, I haven't. I am aware of what the implementation issue is, which
is that apparently in some implementations struct tm lacks timezone
info so it's impossible for it to work correctly.

However, as I've already said, there's a well-established path for
standardization in the presence of implementation issues that someone
doesn't want to have to fix.

Standardizing _wrong_ behavior so it's impossible to produce an
implementation that is both correct and standards-compliant is
extremely foolish.

 > Making the output
 > of %s be implementation defined or unspecified would be the same as the
 > situation we're in now (where %s isn't in the current standard at all)
 > and so, what it produces if used is simply unspecified.   It might be
 > possible to make it better than "anything" by having it say "converted
 > to the string representation of an unknown integer", but that doesn't
 > really help does it?   For anyone to be able to reliably use %s, it needs
 > a proper definition, and that definition needs to agree with the 
 > implementations, or the whole thing is useless.   %s is far too useful
 > in scripts ( via T=$(date +%s) ) to keep it a hidden buried undefined
 > interface.   Any definition is better than that.

Nonsense.

"%s is replaced with the number of seconds since the Epoch. If the
time in the struct tm argument is not expressed in the current local
time zone, such as returned by the localtime() function, the value
output is unspecified."

That makes everybody's code compliant and leaves people with
non-broken implementations free to make it work correctly. And then we
can make it work correctly in NetBSD, and add something to the man
page like "Portable software should call tzset(3) before printing
values in other timezones." And then maybe in ten years after some
other people fix their code and others drop off the market, the
unspecified behavior can be removed from the standard.
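
A sketch of what that portable idiom might look like, assuming a
POSIX-like system with setenv(), tzset(), and localtime_r(), and a
strftime() that supports %s. The function name format_in_zone and the
TZ strings shown are made up for the example. Because the struct tm
comes from localtime_r() after tzset(), it is expressed in the current
local time zone, which is the case the wording above leaves defined.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc-style feature macro for setenv(), localtime_r() */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Format t according to fmt as seen in time zone tz.
 * Returns the strftime() result, or 0 on failure.
 * Note: this changes TZ for the whole process and does not restore it. */
static size_t format_in_zone(char *buf, size_t len, const char *fmt,
                             time_t t, const char *tz)
{
    struct tm tm;

    assert(buf != NULL && fmt != NULL && tz != NULL);

    if (setenv("TZ", tz, 1) != 0)
        return 0;
    tzset();   /* localtime_r() is not required to do this itself */

    if (localtime_r(&t, &tm) == NULL)
        return 0;
    return strftime(buf, len, fmt, &tm);
}
```

For example, format_in_zone(buf, sizeof buf, "%H:%M:%S %s", t, "EST5")
prints the hour as seen five hours west of UTC, and %s should still be
the original t, since the struct tm and the process's local zone agree.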

 >   | No, there is no obligation to standardize incorrect behavior.
 > 
 > Of course not.  But there is nothing incorrect here.  It is simply not
 > the interface you might have liked to see.   Kind of surprising that you're
 > only complaining about it now, since it has been like this in NetBSD since
 > at least 1997 (when strftime.c was moved from wherever it was before into
 > src/lib/libc/time/ - I didn't bother to track down its previous location)
 > 
 > You've had 25 years to complain about it, and until now, never have.  It
 > cannot really have been much of an issue.

Rubbish. To notice that it doesn't work you have to either audit the
code very carefully (on casual inspection it looks like it does the
right thing, but it doesn't actually work) or have the bad luck to
stumble on the particular strftime formats that don't work. Most of
the formats likely to be used function correctly, in particular the
ones you _will_ use if you call gmtime() and then strftime() and
expect to print UTC, so stumbling on the broken ones is highly
unlikely.

(That is: if you mean to print UTC and call gmtime and then strftime,
why would you ever include the zone name or zone offset in the output
format? You wouldn't.)

 > Sure, and that is what happens.   But "not what David Holland expects" is
 > not the definition of incorrect.   There is essentially no variant behaviour
 > here, certainly not enough to (effectively) discard the feature entirely.

"Inconsistent with other parts of the strftime output" is incorrect.
"Inconsistent with expectations about time_t" is also incorrect.

Producing an unexpected result because of structural issues internal
to the implementation that are invisible in the interface -- that is
practically the definition of a bug.

 > For this, which is not wrong at all, just not what you want, it wouldn't
 > matter even if it were wrong.   It's entrenched.

It is in no way entrenched, or we would have been having this debate
25 years ago; we've had %s in strftime(3) at least that long, so it's
taken 25 years for anyone to notice that there's a discrepancy here.
Therefore it's by no means too late to change it.

 >   | and /etc/localtime points to US/Eastern.
 > 
 > Mine isn't that, but while that would affect the actual timezone related
 > values, it shouldn't affect the variations.   (But I can use that for
 > testing).   (Aside: US/Eastern is an ancient name, which is unlikely to
 > go away, but could.   The canonical name is America/New_York).

I'm aware of that. I'm not sure why on three different machines I have
variously (a) US/Eastern, (b) America/New_York, and (c) EST5EDT. All
three machines were set up long ago. Realistically, none of these
names are ever actually going to be removed.

-- 
David A. Holland
dholland%netbsd.org@localhost

