NetBSD-Bugs archive
Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields correctly)
The following reply was made to PR lib/36528; it has been noted by GNATS.
From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: David Holland <dholland-bugs%netbsd.org@localhost>
Cc: gnats-bugs%netbsd.org@localhost
Subject: Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields correctly)
Date: Mon, 15 May 2023 01:59:50 +0700
    Date:        Sun, 14 May 2023 17:17:40 +0000
    From:        David Holland <dholland-bugs%netbsd.org@localhost>
    Message-ID:  <ZGEXtDniLI8JsFSD%netbsd.org@localhost>
| I don't see how that's relevant to the example in this PR. The values
| passed govern all the outputs that are printed.
You mean they could, but that's not how strptime() is defined to work.
    The strptime( ) function shall convert the character string pointed
    to by buf to values which are stored in the tm structure pointed to
    by tm, using the format specified by format.
    [...]
    a   The day of the week, using the locale's weekday names;
(that's tm_wday)
    b   The month, using the locale's month names;
(tm_mon)
    d   The day of the month [01,31];
(tm_mday)
etc. What's telling is this (and a few others like it)
    g   The last 2 digits of the week-based year (see below) as a decimal
        number [...]. The effect of this year, if any, on the tm structure
        pointed to by tm is unspecified.
That is, given that, nothing needs to be done to the tm at all, as it
has no field for that info (in strftime() that value is computed from
others - in strptime() the implementation is not required to invert that
calculation, even if it has the necessary information available).
%G %U %V %W %z and %Z all have the same qualification (though %z and %Z
probably need to be fixed now that tm_gmtoff and tm_zone have been added
to struct tm).
Note that strptime()'s format parameter isn't required to have any
conversions in it at all - it could be used simply to match strings,
in a weird way where white space in the format matches any amount of
white space in the input.
    The format is composed of zero or more directives. Each directive is
    composed of one of the following: one or more white-space bytes;
    an ordinary character (neither '%' nor a white-space byte); or a
    conversion specification.
    [...]
    A conversion specification composed of white-space bytes is executed
    by scanning input up to the first non-white-space byte (which remains
    unscanned), or until no more characters can be scanned.
    A conversion specification that is an ordinary character is executed
    by scanning the next character from the buffer. If the character
    scanned from the buffer differs from the one comprising the directive,
    the directive fails, and the differing and subsequent characters
    remain unscanned.
[%n %t processing spec omitted here, not relevant]
    Any other conversion specification is executed by scanning characters
    until a character matching the next directive is scanned, or until no
    more characters can be scanned. These characters, except the one
    matching the next directive, are then compared to the locale values
    associated with the conversion specifier. If a match is found,
    values for the appropriate tm structure members are set to values
    corresponding to the locale information.
The plural on "tm structure members" is because some directives (eg: %T,
which is defined as %H:%M:%S) cause multiple fields to be set.
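To make the two points above concrete, here is a minimal sketch (the helper names matches_literal() and parse_clock() are made up for illustration, they are not part of any library). The first helper uses a format with no conversions at all, purely as a string matcher; the second shows a single %T conversion filling in three members. The struct tm is zeroed first only to keep the example tidy; the spec does not require that.

```c
#define _XOPEN_SOURCE 700   /* asks <time.h> to declare strptime() */
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: returns 1 if the whole of `input` is matched by
 * `format`, 0 otherwise.  With a format containing no conversions,
 * strptime() just compares ordinary characters and skips white space. */
static int matches_literal(const char *input, const char *format)
{
    struct tm scratch;
    const char *end;

    memset(&scratch, 0, sizeof scratch);
    end = strptime(input, format, &scratch);
    return end != NULL && *end == '\0';
}

/* Hypothetical helper: %T is shorthand for %H:%M:%S, so this one
 * conversion sets tm_hour, tm_min and tm_sec.  Returns 1 on success. */
static int parse_clock(const char *input, struct tm *out)
{
    const char *end;

    memset(out, 0, sizeof *out);
    end = strptime(input, "%T", out);
    return end != NULL && *end == '\0';
}
```

For example, matches_literal("begin   end", "begin end") should succeed, since the single space in the format absorbs the run of spaces in the input, while matches_literal("begin fin", "begin end") should fail at the first differing character.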
That's all it says about what happens to struct tm - nothing at all about
calculating values for other fields out of what was received for the ones
that were generated (so while %j, in combination with %C%Y, might convey
enough information to allow all of the date related fields to be set,
that isn't required to happen). And then there is the text that I quoted
previously:
    It is unspecified whether multiple calls to strptime( ) using the
    same tm structure will update the current contents of the structure
    or overwrite all contents of the structure.
That is, an implementation can, if it wants, allow you to write
    p = strptime(buf, " %j", &tm);
    p = strptime(p, " %Y", &tm);
    p = strptime(p, " %b", &tm);
    p = strptime(p, " %a", &tm);
    p = strptime(p, " %d", &tm);
and, given buf containing "209 2023 Feb Wed 30" (assuming the POSIX/C locale),
might end up setting tm such that tm_yday == 208 (tm_yday counts from 0),
tm_year == 123, tm_mon == 1, tm_wday == 3, and tm_mday == 30 ... despite
there not being a 30th of Feb (in any year), and as Feb 28 2023 was a Tue,
the 30th, if it did exist, could not be a Wed, and further nothing anytime
in Feb or Mar is the 209th day of anyone's year.
Applications cannot rely upon that working that way, but an implementation
is permitted to make that happen.
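As a rough sketch of that multiple-call pattern (parse_in_steps() is a name invented here, and zeroing the struct tm first is a choice of this example rather than something the spec demands), a test program might look like this. About all that can be counted on afterwards is that each call succeeded and that the last call stored its own field; whether the values from the earlier calls are still there is exactly what is unspecified.

```c
#define _XOPEN_SOURCE 700   /* asks <time.h> to declare strptime() */
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: feeds one input string through five separate
 * strptime() calls that share a single struct tm.  Returns 1 if every
 * call succeeded and the whole input was consumed, 0 otherwise. */
static int parse_in_steps(const char *buf, struct tm *tm)
{
    static const char *const formats[] = { " %j", " %Y", " %b", " %a", " %d" };
    const char *p = buf;
    size_t i;

    memset(tm, 0, sizeof *tm);
    for (i = 0; i < sizeof formats / sizeof formats[0]; i++) {
        p = strptime(p, formats[i], tm);
        if (p == NULL)
            return 0;       /* this directive did not match the input */
    }
    return *p == '\0';
}
```

Calling parse_in_steps("209 2023 Feb Wed 30", &tm) and then printing tm_yday and tm_wday is a quick way to see whether a given implementation keeps the earlier values, recomputes them, or overwrites them.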
Also note that there is no requirement to init the tm to anything at all
before calling strptime(), it can be full of trap invoking integers in all
of its fields (and any random valid, or invalid, pointer in tm_zone).
All strptime() does (that is, is required to do) is stick values corresponding
to any conversions it encounters in the format in the matching field of
the tm struct. It cannot really do more.
What would you expect to happen if the above were instead written as
    p = strptime(buf, " %j %Y %b %a %d", &tm);
with the same input? This time we have a single call, and the same
input, so the resulting struct tm really must contain the values
that an implementation which allowed the multiple calls would have
stored.
The other fields of the struct tm (the ones that aren't mentioned
here) can be set to whatever the implementation likes, or simply
left as they were on input.
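The single-call case can be sketched the same way (parse_at_once() is again a made-up name, and the memset() is only there so the untouched members have known values in this example). Here every field checked is one that a conversion in the format names directly, so the mutually inconsistent values should all come back as given.

```c
#define _XOPEN_SOURCE 700   /* asks <time.h> to declare strptime() */
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: parses day-of-year, year, month name, weekday
 * name and day-of-month in one strptime() call.  Returns 1 if the whole
 * input was consumed, 0 otherwise. */
static int parse_at_once(const char *buf, struct tm *tm)
{
    const char *end;

    memset(tm, 0, sizeof *tm);
    end = strptime(buf, " %j %Y %b %a %d", tm);
    return end != NULL && *end == '\0';
}

/* With "209 2023 Feb Wed 30" in the POSIX/C locale one would expect:
 *   tm_yday == 208   (%j is 1-based, tm_yday counts from 0)
 *   tm_year == 123   (years since 1900)
 *   tm_mon  == 1     (months since January)
 *   tm_wday == 3     (days since Sunday)
 *   tm_mday == 30
 * even though no real date has that combination. */
```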
There's nothing in the spec that says that the result must make sense.
There's definitely no mention of it calling mktime() on the result.
That would be absurd, as mktime() requires some fields of the struct
tm to be filled in (if they're not, what happens is unspecified, or
perhaps even undefined), and, as above, the struct tm passed to strptime()
doesn't need to be init'd first, and the format doesn't need to contain
any conversions at all, meaning no fields in the struct need be set to
anything.
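An application that does want a self-consistent struct tm can of course do that step itself. The following is only a sketch of one way to go about it, under the assumptions that the input is a plain "YYYY-MM-DD" date and that local time is wanted; parse_date_normalized() is a name invented for this example. The point is that the caller, not strptime(), initialises the fields mktime() reads and then asks mktime() to derive tm_wday and tm_yday.

```c
#define _XOPEN_SOURCE 700   /* asks <time.h> to declare strptime() */
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: parses "YYYY-MM-DD" and lets mktime() normalise
 * the result.  Returns the time_t from mktime(), or (time_t)-1 if the
 * text did not parse. */
static time_t parse_date_normalized(const char *text, struct tm *tm)
{
    const char *end;

    memset(tm, 0, sizeof *tm);  /* give every field mktime() reads a value */
    end = strptime(text, "%Y-%m-%d", tm);
    if (end == NULL || *end != '\0')
        return (time_t)-1;

    tm->tm_hour = 12;           /* midday, away from DST changes at midnight */
    tm->tm_isdst = -1;          /* let mktime() work out whether DST applies */
    return mktime(tm);          /* also fills in tm_wday and tm_yday */
}
```

Given "2023-02-28" this should leave tm_wday == 2 (a Tuesday) and tm_yday == 58. If the strptime() in use accepts "2023-02-30", mktime() would be expected to normalise it to the 2nd of March.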
All strptime() was ever really intended to be was an inverse to strftime().
Given (approximately) the same format string that strftime() used, strptime()
is intended to fill in the fields of the struct tm that strftime() used
to format the data. That's why strptime() has the %g %G ... conversions
(which aren't defined to do anything specific at all to the struct tm -
explicitly) and POSIX strptime() (but not the C version) has %s which
also does nothing (though it doesn't say what should happen to the number)
as POSIX strftime() has a %s conversion which C does not.
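That inverse relationship can be illustrated with a small round trip (a sketch only; round_trip_datetime() and the particular format are choices made for this example). The same format string is handed to strftime() and then to strptime(), and only the fields that the format's conversions name are compared afterwards.

```c
#define _XOPEN_SOURCE 700   /* asks <time.h> to declare strptime() */
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: formats `in` with strftime(), parses the text
 * back into `out` with strptime() using the same format, and returns 1
 * if the six fields named by the format survived the round trip. */
static int round_trip_datetime(const struct tm *in, struct tm *out)
{
    static const char format[] = "%Y-%m-%d %H:%M:%S";
    char text[64];
    const char *end;

    if (strftime(text, sizeof text, format, in) == 0)
        return 0;               /* buffer too small for the result */

    memset(out, 0, sizeof *out);
    end = strptime(text, format, out);
    if (end == NULL || *end != '\0')
        return 0;

    return out->tm_year == in->tm_year && out->tm_mon == in->tm_mon &&
           out->tm_mday == in->tm_mday && out->tm_hour == in->tm_hour &&
           out->tm_min == in->tm_min && out->tm_sec == in->tm_sec;
}
```

Nothing here says anything about tm_wday, tm_yday or tm_isdst in the output, as the format never mentions them.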
Just like the discussion about mktime() and strftime() earlier (last year?)
this might not be what you'd like the strptime() function to do, but it
is how it is defined, which is based upon historical implementations.
kre