Hi Joseph,

Thanks for the feedback!

On Tue, Mar 18, 2025 at 05:20:19PM +0000, Joseph Myers wrote:
> On Tue, 18 Mar 2025, Alejandro Colomar wrote:
>
> > 7.24.2 Numeric conversion functions
> > New section _before_ 7.24.2.2 (The atof function).
>
> You're missing corresponding <wchar.h> functions.

As with other proposals, I prefer leaving it for a different paper.
I'm not an expert in wchar stuff.

> Maybe there should also be a reference to N3183 (discussed in Strasbourg)
> - which dealt with UB for numeric conversions in scanf rather than strto*,
> but still seems related to this proposal.

I have something in mind about it. My idea was to change the definition
of atoi(3) et al. to be in terms of strtoi(3):

	int
	atoi(const char *s)
	{
		int n, e;

		n = strtoi(s, NULL, 10, INT_MIN, INT_MAX, &e);
		errno = e ?: errno;
		return n;
	}

Which would make atoi(3) behave just like one would expect. And then
define scanf(3) %d in terms of atoi(3).

I'll add a 'Future directions' section mentioning that.

> > While all this section is new, some text is pasted verbatim from
> > 7.24.2.8. I'll write that text as if it was already existing
> > in the diff below.
> >
> > I also renamed the parameters of strtol(3):
> > nptr => s Because it's a string, not a pointer to a number.
> > endptr => endp It's shorter and just as readable (if not more).
> >
> > @@
> > +7.24.2.* The <b>strtoi</b> and <b>strtou</b> functions
> > +
> > +Synopsis
> > +1 #include <stdlib.h>
> > + intmax_t strtoi(const char *restrict s, char **restrict endp, int base,
> > + intmax_t min, intmax_t max, int *rstatus);
> > + uintmax_t strtou(const char *restrict s, char **restrict endp, int base,
> > + uintmax_t min, uintmax_t max, int *rstatus);
>
> intmax_t and uintmax_t are not declared in <stdlib.h>. Either the
> synopsis should mention <stdint.h> as well, or those types should be added
> to the ones declared by that header.

Hmmm, my bad. This function is from <inttypes.h>. I should move it.
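
A minimal sketch of the atoi(3) idea above, for anyone who wants to try
it. Since strtoi(3) is not widely available, sketch_strtoi() below is a
hypothetical stand-in built on strtoimax(3). Its name, the use of the
POSIX ECANCELED code, and the choice to let ERANGE take precedence over
ECANCELED are assumptions, not part of the proposal. It passes base
through unchecked, ignores trailing characters (the ENOTSUP case), and
assumes min <= max.

```c
#include <errno.h>
#include <inttypes.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed strtoi(); not a libc API on
 * most systems.  Assumes min <= max and a POSIX <errno.h>. */
static intmax_t
sketch_strtoi(const char *s, char **endp, int base,
    intmax_t min, intmax_t max, int *rstatus)
{
	char      *end;
	int       saved_errno = errno;
	intmax_t  n;

	errno = 0;
	n = strtoimax(s, &end, base);
	*rstatus = (errno == ERANGE) ? ERANGE : 0;
	errno = saved_errno;          /* errno is left as the caller had it */

	if (end == s)
		*rstatus = ECANCELED; /* no characters were converted */

	/* Saturate into [min, max] (assumed precedence: ERANGE wins). */
	if (n < min) {
		n = min;
		*rstatus = ERANGE;
	} else if (n > max) {
		n = max;
		*rstatus = ERANGE;
	}

	if (endp != NULL)
		*endp = end;
	return n;
}

/* atoi(3) expressed through the stand-in: errno changes only on error. */
static int
sketch_atoi(const char *s)
{
	int  n, e;

	n = (int) sketch_strtoi(s, NULL, 10, INT_MIN, INT_MAX, &e);
	if (e != 0)
		errno = e;
	return n;
}
```

With this sketch, an out-of-range input saturates at INT_MIN or INT_MAX
and reports ERANGE through errno, instead of invoking undefined behavior.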
> I'm also concerned that the names sound like int / unsigned int analogues
> of strtol, but aren't.

I don't get to choose the name. Anyway, my plans are to eradicate
strtol(3) from history, eventually. I'm not especially concerned,
because the number and types of the arguments are different enough that
mistakes are unlikely to happen; and I also don't have a better name
for it.

> > +Description
> > +2 The <b>strtoi</b> and <b>strtou</b> functions
> > convert the initial portion of
> > the string pointed to by <tt>s</tt>
> > + to <b>intmax_t</b> and <b>uintmax_t</b>,
> > respectively.
> > First,
> > they decompose the input string into three parts:
> > an initial, possibly empty, sequence of white-space characters,
> > a subject sequence resembling an integer
> > represented in some radix determined by the value of <tt>base</tt>,
> > and a final string of one or more unrecognized characters,
> > including the terminating null character of the input string.
> > + Then,
> > they attempt to convert the subject sequence to an integer.
> > + Then,
> > + they coerce the integer into the range [min, max].
> > + Finally,
> > they return the result.
> >
> > Paste p3, p4, p5, p6 from 7.24.2.8, replacing the function and
> > type names as appropriate.
>
> So the conversion is still locale-specific (p6). One thing that can be
> useful for numeric conversions, and isn't covered well by the standard at
> present, is ones that are guaranteed to be in the C locale. (That would
> require a flags argument or similar to configure the functions.)

NetBSD has strtoi_l(3), which has an extra parameter in which you can
specify the locale. That should have a dedicated paper, though, just
like the wchar variant.

<https://man.netbsd.org/strtoi_l.3>

I'll add this to 'Future directions'.

> > @@
> > +7 If the value of <tt>base</tt> is different from
> > + the values specified in the preceding paragraphs,
> > + the behavior is implementation defined.
>
> It's "implementation-defined", with a hyphen.

True.

> And for that to be useful,
> you need clear bounds on what is permitted (that is, an
> implementation-defined set of sequences is accepted, and interpreted as
> having implementation-defined numeric values).

The choices should be:

- Report an error.
- Convert in an implementation-defined manner.

> > @@
> > Returns
> > +10 The <b>strtoi</b> and <b>strtou</b> functions
> > return the converted and coerced value, if any.
> > If no conversion could be performed,
> > + zero is coerced into the range,
> > + and then returned.
> >
> > The paragraph above doesn't mention the range of representable
> > values (unlike 7.24.2.8) because that's already covered by the
> > range coercion specified in p2 above.
>
> You don't seem to define how the coercion works. Modulo? Saturation?
> Something else? ("Coerce" is not a term defined in the C standard, nor in
> ISO 2382. So it has no semantics without them being explicitly defined
> for these functions.)

I have some wording in p2, but I should improve it. It is saturation.

> What happens if min > max? You say below that there is an ERANGE error
> for this case, but don't say what the return value is when it can't be in
> the range.

I don't have much to say. To be honest, when implementing it I just
left it to chance. I do

	return MAX(min, MIN(max, n));

NetBSD has a slightly different algorithm which may or may not return
the same value. We should say it returns an unspecified value.

> > +Returns
> > +10 The <b>strtoi</b> and <b>strtou</b> functions
> > + return the converted value, if any.
> > + If no conversion is returned,
> > + these functions return the value in the range [min, max]
> > + that is closer to 0.
>
> What if both are equally close to 0?

"both" refers to min or max, but the paragraph specifies the entire
range.
Assuming that min<=max,

- if 0<min, then min is the closest value
- if min<=0<=max, then 0 is the closest value
- if max<0, then max is the closest value.

And if min>max, then it would be covered by the suggestion above of
saying it returns an unspecified value.

However, this duplication of p10 was an accident. I first wrote the
second one, then the first one, but forgot to remove the second one.
I like the wording of the first better (with some tweaks I'll do).

> > +Errors
> > +11 These functions don't set <b>errno</b>.
>
> The standard does not use the abbreviation "don't", but says "do not".

Ok.

> > + Instead, they set the object pointed to by <tt>rstatus</tt>
> > + to an error code,
> > + or to zero on success.
> > +
> > +12 -- EINVAL The value in <tt>base</tt> is not supported.
> > + -- ECANCELED The given string did not contain
> > + any characters that were converted.
> > + -- ERANGE The converted value was out of range
> > + and has been coerced,
> > + or the range was invalid (e.g., min > max).
> > + -- ENOTSUP The given string contained characters
> > + that did not get converted.
>
> Of these names, only ERANGE is actually defined in the C standard. You
> don't have any updates to <errno.h> to add the others.

Ok.

> These functions would clearly also need several examples added to the
> standard to illustrate their functionality, which are missing from this
> proposal.

Ok. I'll post r1 soon.

Have a lovely night!
Alex

-- 
<https://www.alejandro-colomar.es/>
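
For reference, the saturation rule and the closest-to-zero cases
discussed in this message can be sketched in C as below. The function
name saturate() is hypothetical; the body spells out the
MAX(min, MIN(max, n)) expression quoted earlier. It is only meaningful
for min <= max, and the min > max case is deliberately left as whatever
the expression yields.

```c
#include <stdint.h>

/* Hypothetical helper: clamp n into [min, max] by saturation.
 * Equivalent to MAX(min, MIN(max, n)); assumes min <= max. */
static intmax_t
saturate(intmax_t n, intmax_t min, intmax_t max)
{
	intmax_t  lo = (n < max) ? n : max;   /* MIN(max, n) */

	return (lo > min) ? lo : min;         /* MAX(min, lo) */
}
```

Under that assumption, saturate(0, min, max) gives the value in
[min, max] that is closest to 0, covering the three cases listed above
without treating them separately.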