Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields correctly)

To: lib-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,ekamperi%auth.gr@localhost
Subject: Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields correctly)
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Sun, 14 May 2023 19:15:01 +0000 (UTC)

The following reply was made to PR lib/36528; it has been noted by GNATS.

From: Robert Elz <kre%munnari.OZ.AU@localhost>
To: David Holland <dholland-bugs%netbsd.org@localhost>
Cc: gnats-bugs%netbsd.org@localhost
Subject: Re: lib/36528 (strptime(3) doesn't fill in the 'tm' structure fields correctly)
Date: Mon, 15 May 2023 01:59:50 +0700

     Date:        Sun, 14 May 2023 17:17:40 +0000
     From:        David Holland <dholland-bugs%netbsd.org@localhost>
     Message-ID:  <ZGEXtDniLI8JsFSD%netbsd.org@localhost>

   | I don't see how that's relevant to the example in this PR. The values
   | passed govern all the outputs that are printed.

 You mean they could, but that's not how strptime() is defined to work.

 	The strptime( ) function shall convert the character string pointed
 	to by buf to values which are stored in the tm structure pointed to
 	by tm, using the format specified by format.

 [...]

 	a The day of the week, using the locale's weekday names;
 (that's tm_wday)
 	b The month, using the locale's month names;
 (tm_mon)
 	d The day of the month [01,31];
 (tm_mday)

 etc.   What's telling is this (and a few others like it)

 	g The last 2 digits of the week-based year (see below) as a decimal
 	  number [...]. The effect of this year, if any, on the tm structure
           pointed to by tm is unspecified.

 That is, given that, nothing needs to be done to the tm at all, as it
 has no field for that info (in strftime() that value is computed from
 others - in strptime() the implementation is not required to invert that
 calculation, even if it has the necessary information available).
 %G %U %V %W %z and %Z all have the same qualification (though %z and %Z
 probably need to be fixed now that tm_gmtoff and tm_zone have been added
 to struct tm).

 Note that strptime()'s format parameter isn't required to have any
 conversions in it at all - it could be used simply to match strings,
 in a somewhat odd, white-space-tolerant way.
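For example, here is a minimal sketch. It assumes a POSIX system where defining _XOPEN_SOURCE exposes strptime(), and match_literal is a hypothetical helper name, not part of any library. A format with no conversions only skips white space and matches ordinary characters. It returns a pointer just past what it consumed, or NULL on a mismatch:

```c
#define _XOPEN_SOURCE 700  /* expose strptime() on glibc and similar */
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: run strptime() with a format that contains no
 * conversion specifications, purely to match text.  Returns a pointer
 * to the first unscanned character of buf, or NULL if matching failed.
 * The tm is only scratch space: with no conversions in the format, no
 * field of it is required to be set.
 */
const char *match_literal(const char *buf, const char *fmt)
{
    struct tm scratch;

    memset(&scratch, 0, sizeof scratch);
    return strptime(buf, fmt, &scratch);
}
```

Here match_literal("   hello world", " hello") should leave you pointing at " world", while match_literal("help", "hello") fails at the 'p'.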

 	The format is composed of zero or more directives. Each directive is
 	composed of one of the following: one or more white-space bytes;
 	an ordinary character (neither '%' nor a white-space byte); or a
 	conversion specification.

 [...]

 	A conversion specification composed of white-space bytes is executed
 	by scanning input up to the first non-white-space byte (which remains
 	unscanned), or until no more characters can be scanned.

 	A conversion specification that is an ordinary character is executed
 	by scanning the next character from the buffer. If the character
 	scanned from the buffer differs from the one comprising the directive,
 	the directive fails, and the differing and subsequent characters
 	remain unscanned.

 [%n %t processing spec omitted here, not relevant]

 	Any other conversion specification is executed by scanning characters
 	until a character matching the next directive is scanned, or until no
 	more characters can be scanned. These characters, except the one
 	matching the next directive, are then compared to the locale values
 	associated with the conversion specifier. If a match is found,
 	values for the appropriate tm structure members are set to values
 	corresponding to the locale information.

 The plural on "tm structure members" is because some directives (eg: %T,
 which is defined as %H:%M:%S) cause multiple fields to be set.
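The following is a minimal sketch of that multi-field case. It assumes a POSIX system where defining _XOPEN_SOURCE exposes strptime(), and parse_hms is a hypothetical name. The single conversion %T is expected to set tm_hour, tm_min and tm_sec:

```c
#define _XOPEN_SOURCE 700  /* expose strptime() on glibc and similar */
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: parse a time of day with the single conversion
 * %T, which is defined as %H:%M:%S.  On success tm_hour, tm_min and
 * tm_sec have been set from the input; what happens to the other
 * fields is up to the implementation.  Returns 0 on success, -1 if the
 * input did not match or had trailing text.
 */
int parse_hms(const char *buf, struct tm *out)
{
    const char *end;

    memset(out, 0, sizeof *out);
    end = strptime(buf, "%T", out);
    if (end == NULL || *end != '\0')
        return -1;
    return 0;
}
```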

 That's all it says about what happens to struct tm: nothing at all about
 calculating values for other fields from what was received for the ones
 that were converted (so while %j, in combination with %Y or %C%y, might
 convey enough information to allow all of the date-related fields to be
 set, that isn't required to happen).   And then there's the text that I
 quoted previously:

 	It is unspecified whether multiple calls to strptime( ) using the
 	same tm structure will update the current contents of the structure
 	or overwrite all contents of the structure.

 That is, an implementation can, if it wants, allow you to write

 	p = strptime(buf, " %j", &tm);
 	p = strptime(p, " %Y", &tm);
 	p = strptime(p, " %b", &tm);
 	p = strptime(p, " %a", &tm);
 	p = strptime(p, " %d", &tm);

 and given buf containing "209 2023 Feb Wed 30" (assuming the POSIX/C locale)

 and might end up setting tm such that tm_yday == 208 (%j counts from 001,
 tm_yday from 0), tm_year == 123, tm_mon == 1, tm_wday == 3, and
 tm_mday == 30 ... despite there not being a 30th of Feb (in any year); and
 as Feb 28 2023 was a Tue, the 30th, if it did exist, could not be a Wed;
 and further, nothing anytime in Feb or Mar is the 209th day of anyone's year.

 Applications cannot rely upon that working, that way, but an implementation
 is permitted to make that happen.
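Because nothing requires strptime() to derive a month and day from %j, portable code that needs them has to do the arithmetic itself. The following is a minimal sketch. It assumes only that mktime() normalizes an out-of-range tm_mday, as C requires. The name date_from_year_yday and the "YYYY DDD" input layout are hypothetical, chosen for illustration:

```c
#define _XOPEN_SOURCE 700  /* expose strptime() on glibc and similar */
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: parse "YYYY DDD" (year, day of year) and fill
 * *out with a complete, consistent date.  Only tm_year and tm_yday are
 * taken from strptime(); month, day of month and weekday are computed
 * by mktime(), since strptime() is not required to compute them.
 * Returns 0 on success, -1 on failure.
 */
int date_from_year_yday(const char *buf, struct tm *out)
{
    struct tm parsed;
    const char *end;

    memset(&parsed, 0, sizeof parsed);
    end = strptime(buf, "%Y %j", &parsed);
    if (end == NULL || *end != '\0')
        return -1;

    memset(out, 0, sizeof *out);
    out->tm_year = parsed.tm_year;
    out->tm_mon = 0;                    /* January ...                        */
    out->tm_mday = parsed.tm_yday + 1;  /* ... day N (tm_yday counts from 0)  */
    out->tm_hour = 12;                  /* midday, away from DST transitions  */
    out->tm_isdst = -1;                 /* let mktime() decide                */
    if (mktime(out) == (time_t)-1)
        return -1;
    if (out->tm_year != parsed.tm_year)
        return -1;                      /* e.g. day 366 of a non-leap year    */
    return 0;
}
```

With "2023 209" this should yield Fri 28 Jul 2023 (tm_mon == 6, tm_mday == 28, tm_wday == 5).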

 Also note that there is no requirement to initialize the tm to anything at
 all before calling strptime(); it can be full of trap-invoking integers in
 all of its fields (and any random valid, or invalid, pointer in tm_zone).
 All strptime() does (that is, is required to do) is stick values corresponding
 to any conversions it encounters in the format into the matching fields of
 the tm struct.   It cannot really do more.
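In practice this means a caller should read back only the fields that correspond to conversions in its format. The following is a minimal sketch of that pattern, assuming a POSIX system. set_time_of_day is a hypothetical helper. It parses into a scratch struct tm and copies across only tm_hour and tm_min, so the caller's date fields are left alone whatever the implementation does to the others:

```c
#define _XOPEN_SOURCE 700  /* expose strptime() on glibc and similar */
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: set the time of day in *when from text like
 * "09:30", keeping its date fields.  For "%H:%M", strptime() is only
 * required to store tm_hour and tm_min, so parse into a scratch struct
 * and copy just those fields.  Returns a pointer past the parsed text,
 * or NULL on failure (in which case *when is untouched).
 */
const char *set_time_of_day(const char *buf, struct tm *when)
{
    struct tm scratch;
    const char *end;

    memset(&scratch, 0, sizeof scratch);
    end = strptime(buf, "%H:%M", &scratch);
    if (end == NULL)
        return NULL;

    when->tm_hour = scratch.tm_hour;
    when->tm_min = scratch.tm_min;
    when->tm_sec = 0;
    return end;
}
```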

 What would you expect to happen if the above were instead written as

 	p = strptime(buf, " %j %Y %b %a %d", &tm);

 with the same input?   This time we have a single call, and the same
 input, so the resulting struct tm really must contain the values that an
 implementation which allowed the multiple calls would have stored.

 The other fields of the struct tm (the ones that aren't mentioned
 here) can be set to whatever the implementation likes, or simply
 left as they were on input.

 There's nothing in the spec that says that the result must make sense.
 There's definitely no mention of it calling mktime() on the result
 (that would be absurd, as mktime() requires some fields of the struct
 tm to be filled in; if they're not, what happens is unspecified, or
 perhaps even undefined), and as above, the struct tm passed to strptime()
 doesn't need to be initialized first, and the format doesn't need to
 contain any conversions at all, meaning no fields in the struct must be
 set to anything.
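If an application wants the result to make sense, it has to check that itself. The following is a minimal sketch. tm_date_is_consistent is a hypothetical name. It normalizes a copy with mktime() and reports whether the date moved or the weekday disagrees, which would flag the "Wed 30 Feb 2023" input above:

```c
#include <time.h>

/*
 * Hypothetical helper: report whether tm_year, tm_mon, tm_mday (and
 * tm_wday, if check_wday is nonzero) in *tm name a real calendar day.
 * A copy is normalized with mktime(); if any of those fields changed,
 * the original date did not exist (e.g. Feb 30 becomes Mar 2).
 * Returns 1 if consistent, 0 if not.
 */
int tm_date_is_consistent(const struct tm *tm, int check_wday)
{
    struct tm copy = *tm;

    copy.tm_hour = 12;      /* midday, away from DST transitions */
    copy.tm_min = 0;
    copy.tm_sec = 0;
    copy.tm_isdst = -1;
    if (mktime(&copy) == (time_t)-1)
        return 0;

    if (copy.tm_year != tm->tm_year || copy.tm_mon != tm->tm_mon ||
        copy.tm_mday != tm->tm_mday)
        return 0;           /* the date was normalized: it didn't exist */
    if (check_wday && copy.tm_wday != tm->tm_wday)
        return 0;           /* weekday does not match the date */
    return 1;
}
```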

 All strptime() was ever really intended to be was an inverse to strftime().
 Given (approximately) the same format string that strftime() used, strptime()
 is intended to fill in the fields of the struct tm that strftime() used
 to format the data.  That's why strptime() has the %g %G ... conversions
 (which are explicitly not defined to do anything specific at all to the
 struct tm), and why POSIX strptime() (but not the C version) has %s, which
 also does nothing (though it doesn't say what should happen to the number):
 POSIX strftime() has a %s conversion, which C does not.
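The intended round trip is easiest to see with a format whose conversions each map onto a single struct tm field. The following is a minimal sketch, assuming a POSIX system. round_trip is a hypothetical name. It formats with strftime() and parses the text back with the same format. Only the converted fields are expected to match:

```c
#define _XOPEN_SOURCE 700  /* expose strptime() on glibc and similar */
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: format *in with strftime(), then parse the text
 * back into *out with strptime() using the same format.  Returns 0 if
 * the parse consumed the whole string, -1 otherwise.  Only the fields
 * the format converts (year, month, day, hour, minute, second) are
 * expected to come back; the rest of *out is just zeroed here.
 */
int round_trip(const struct tm *in, struct tm *out)
{
    static const char fmt[] = "%Y-%m-%d %H:%M:%S";
    char text[64];
    const char *end;

    if (strftime(text, sizeof text, fmt, in) == 0)
        return -1;
    memset(out, 0, sizeof *out);
    end = strptime(text, fmt, out);
    return (end != NULL && *end == '\0') ? 0 : -1;
}
```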

 Just like the discussion about mktime() and strftime() earlier (last year?)
 this might not be what you'd like the strptime() function to do, but it
 is how it is defined, which is based upon historical implementations.

 kre
