Hi all,

Below is a draft of a proposal for standardization of strtoi/u(3) from
NetBSD in ISO C2y.  Please review.

I've CCed everyone who was CCed in the discussions earlier this year
about these APIs and alternate APIs.  Please add anyone who is
interested, or say if you want to be removed.

I've kept several mailing lists in CC, since some of them are private.
The <liba2i%lists.linux.dev@localhost> list is public, and its archives
can be found here: <https://lore.kernel.org/liba2i/>.

Have a lovely day!
Alex

---
Name
    alx-0008r0 - Standardize strtoi(3) and strtou(3) from NetBSD

Principles
    -  Codify existing practice to address evident deficiencies.
    -  Enable secure programming

Category
    Standardize existing libc APIs

Author
    Alejandro Colomar <alx%kernel.org@localhost>

    Cc: <liba2i%lists.linux.dev@localhost>
    Cc: <libbsd%lists.freedesktop.org@localhost>
    Cc: <sc22wg14%open-std.org@localhost>
    Cc: <tech-misc%netbsd.org@localhost>
    Cc: Bruno Haible <bruno%clisp.org@localhost>
    Cc: christos <christos%netbsd.org@localhost>
    Cc: Đoàn Trần Công Danh <congdanhqx%gmail.com@localhost>
    Cc: Paul Eggert <eggert%cs.ucla.edu@localhost>
    Cc: Eli Schwartz <eschwartz93%gmail.com@localhost>
    Cc: Guillem Jover <guillem%hadrons.org@localhost>
    Cc: Iker Pedrosa <ipedrosa%redhat.com@localhost>
    Cc: Michael Vetter <jubalh%iodoru.org@localhost>
    Cc: Robert Elz <kre%netbsd.org@localhost>
    Cc: <riastradh%NetBSD.org@localhost>
    Cc: Sam James <sam%gentoo.org@localhost>
    Cc: "Serge E. Hallyn" <serge%hallyn.com@localhost>

History
    <https://www.alejandro-colomar.es/src/alx/alx/wg14/alx-0008.git/>

    r0 (2025-03-18):
    -  Initial draft.

Description
    The strtol(3) family of functions is so damn hard to use correctly.
    Only a handful of programmers in the world really know how to use
    it correctly in all the corner cases, and even those need to be
    really careful to not make mistakes.

    Several projects have tried to develop successor APIs, of which
    the only one that is generic enough to supersede them is
    strtoi/u(3) from NetBSD.
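
    To illustrate how many steps a careful strtol(3) call needs, here
    is a minimal sketch.  parse_int() is a hypothetical helper, not
    part of this proposal or of any libc; it borrows the error codes
    listed under "Errors" in the proposed wording only to label each
    check, and it assumes a system whose <errno.h> defines ECANCELED
    and ENOTSUP (POSIX does; ISO C doesn't).

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (illustration only): parse the whole string s
 * as an int in the range [min, max].
 * Returns 0 on success, or an errno-style code on failure.
 * It does not modify errno as seen by the caller.
 */
static int
parse_int(const char *s, int base, int min, int max, int *out)
{
	int    e, saved_errno;
	long   n;
	char   *end;

	/* ISO C doesn't define strtol(3) for other bases; reject them. */
	if (base != 0 && (base < 2 || base > 36))
		return EINVAL;

	/* strtol(3) never clears errno, so we must do it ourselves. */
	saved_errno = errno;
	errno = 0;
	n = strtol(s, &end, base);
	e = errno;
	errno = saved_errno;

	if (end == s)
		return ECANCELED;   /* No characters were converted. */
	if (e == ERANGE || n < min || n > max)
		return ERANGE;      /* Out of range of long, or of [min, max]. */
	if (*end != '\0')
		return ENOTSUP;     /* Trailing unconverted characters. */

	*out = (int) n;
	return 0;
}
```

    Forgetting any one of these checks (the base, the errno reset, the
    end == s test, the narrowing from long, or the trailing characters)
    is a bug that compiles cleanly.  A call might look like
    parse_int(str, 10, 0, 100, &val), with val only valid when the
    return value is 0.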

    Other APIs include OpenBSD's strtonum(3), but that API isn't
    generic, and cannot replace every use of strtol(3).  gnulib also
    has some attempts to improve the situation, but they're also not
    suitable for standardization.

    strtoi/u(3) originally had a bug, which shows how difficult it is
    to correctly wrap strto{i,u}max(3) (from the strtol(3) family).
    That bug has been fixed, and after two years of research into
    string-to-numeric APIs, I can conclude that it is a net improvement
    over the existing APIs, and doesn't have any significant flaws.

    It is still not the ideal API in terms of type safety, and I'm
    working on a library that provides safer wrappers.  However, such
    a library would still benefit from having strtoi/u(3) in the
    standard library, by being able to wrap around it.  And user
    programs would immediately benefit from being able to replace
    strtol(3) et al. by strtoi/u(3).

    I have audited several projects which use strtol(3) et al., and
    they're full of bugs.  It's an API that we should really deprecate
    some day.

Prior art
    NetBSD provides strto{i,u}(3), which were introduced in NetBSD 7.

    libbsd ports these APIs to other POSIX systems.

    shadow-utils has its own implementation for internal use.

See also
    <https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57828>

Proposed wording
    Based on N3467.

  7.24.2  Numeric conversion functions
    New section _before_ 7.24.2.2 (The atof function).

    While all this section is new, some text is pasted verbatim from
    7.24.2.8.  I'll write that text as if it already existed in the
    diff below.  I also renamed the parameters of strtol(3):

    nptr => s
        Because it's a string, not a pointer to a number.

    endptr => endp
        It's shorter and just as readable (if not more).

    @@
    +7.24.2.*  The <b>strtoi</b> and <b>strtou</b> functions
    +
    +Synopsis
    +1  #include <stdlib.h>
    +   intmax_t strtoi(const char *restrict s, char **restrict endp, int base,
    +       intmax_t min, intmax_t max, int *rstatus);
    +   uintmax_t strtou(const char *restrict s, char **restrict endp, int base,
    +       uintmax_t min, uintmax_t max, int *rstatus);
    +
    +Description
    +2  The <b>strtoi</b> and <b>strtou</b> functions convert the initial portion of the string pointed to by <tt>s</tt>
    +   to <b>intmax_t</b> and <b>uintmax_t</b>, respectively.
        First, they decompose the input string into three parts:
        an initial, possibly empty, sequence of white-space characters,
        a subject sequence resembling an integer
        represented in some radix determined by the value of <tt>base</tt>,
        and a final string of one or more unrecognized characters,
        including the terminating null character of the input string.
    +   Then, they attempt to convert the subject sequence to an integer.
    +   Then,
    +   they coerce the integer into the range [min, max].
    +   Finally, they return the result.

    Paste p3, p4, p5, p6 from 7.24.2.8, replacing the function and type
    names as appropriate.

    @@
    +7  If the value of <tt>base</tt> is different from
    +   the values specified in the preceding paragraphs,
    +   the behavior is implementation defined.

    The above paragraph ensures that this function has no
    input-controlled UB.  strtol(s, NULL, base) with a user-controlled
    base can result in UB, and thus vulnerabilities.  It is trivial to
    report an error, so let's do it.  This function is heavy enough
    that optimizing this is not worth it.  Even POSIX does this for
    strtol(3).

    @@
     8  If the subject sequence is empty or does not have the expected form,
    +   or the value of <tt>base</tt> is not supported,
        no conversion is performed;
        the value of <tt>s</tt> is stored in the object pointed to by <tt>endp</tt>,
        provided that <tt>endp</tt> is not a null pointer.

    The above paragraph ensures that *endp can be read after a call to
    these functions.
    strtol(3) doesn't provide enough guarantees to be able to reliably
    read it, even in POSIX, and it's hard to portably write code that
    calls it and can inspect *endp after the call without UB.

    @@
     Returns
    +10 The <b>strtoi</b> and <b>strtou</b> functions return the converted and coerced value, if any.
        If no conversion could be performed,
    +   zero is coerced into the range,
    +   and then returned.

    The paragraph above doesn't mention the range of representable
    values (unlike 7.24.2.8) because that's already covered by the
    range coercion specified in p2 above.

    +Returns
    +10 The <b>strtoi</b> and <b>strtou</b> functions
    +   return the converted value, if any.
    +   If no conversion is performed,
    +   these functions return the value in the range [min, max]
    +   that is closer to 0.
    +
    +Errors
    +11 These functions don't set <b>errno</b>.
    +   Instead, they set the object pointed to by <tt>rstatus</tt>
    +   to an error code,
    +   or to zero on success.
    +
    +12 -- EINVAL     The value in <tt>base</tt> is not supported.
    +   -- ECANCELED  The given string did not contain
    +                 any characters that were converted.
    +   -- ERANGE     The converted value was out of range
    +                 and has been coerced,
    +                 or the range was invalid (e.g., min > max).
    +   -- ENOTSUP    The given string contained characters
    +                 that did not get converted.
    +
    +13 If various errors happen in the same call,
    +   the first one listed here is reported.

    The paragraph above is important to differentiate the following:

        strtoi("7z", &end, 0, 3, 7, &status);
        strtoi("42z", &end, 0, 3, 7, &status);

    Under that rule, the first call reports ENOTSUP (7 is in range, but
    "z" is left unconverted), while the second reports ERANGE (42 is
    coerced to 7), which is listed before ENOTSUP.

-- 
<https://www.alejandro-colomar.es/>