On Wed, Mar 19, 2025 at 07:48:59PM +0100, Alejandro Colomar wrote: > On Wed, Mar 19, 2025 at 04:26:11PM +0100, Alejandro Colomar wrote: > > Hi Bruno, > > > > On Wed, Mar 19, 2025 at 01:15:30AM +0100, Bruno Haible wrote: > > > Alejandro Colomar wrote: > > > > > It would be useful to show how a success test looks like, after > > > > > strtoi (s, &end, base, min, max, &status) > > > > > for each of the four frequent use-cases: > > > > > -a. expect to parse the initial portion of the string, no coercion, > > > > > -b. expect to parse the initial portion of the string, silent coercion, > > > > > -c. expect to parse the entire string, no coercion, > > > > > -d. expect to parse the entire string, silent coercion. > > > > > > > > > > AFAICS, the success tests are: > > > > > -a. status == 0 || status == ENOTSUP > > > > > > > > Correct. > > > > > > > > > -b. status == 0 || status == ENOTSUP || status == ERANGE > > > > > > > > Correct (but most likely a bug). > > > > Actually, now I remember that status can be NULL, in which case it's not > > reported. This is a case where you could check for errors with a > > simpler expression: > > > > end != str > > > > but (status == 0 || status == ENOTSUP || status == ERANGE) is still a > > reasnoable one. > > > > like you can do with strtol(3), but with portability guarantess > > regarding EINVAL, because strtoi(3bsd) always writes *endp (if nonnull). > > > > I need to update the specification to mention that status can be NULL. > > > > > > > > > > > -c. status == 0 > > > > > > > > Correct. > > > > > > > > > -d. status == 0 || (status == ERANGE && end > s && *end == '\0') > > > > > > > > You don't need end>s, because that would preclude ERANGE. > > > > > > > > status == 0 || (status == ERANGE && end == '\0') > > > > > > > > Aaand, most likely a bug. > > > > > > Cases b. and d. are not bugs. Often, the programmer knows that treating > > > a value > ULONG_MAX is equivalent to treating the value ULONG_MAX. These > > > are *normal* uses of strto[u]l[l]. Often it is the programmer's intent > > > that the values "4294967297" and "4294967295" produce the same behaviour > > > (the same error message, for example). > > > > If you want ULONG_MAX + 1 to be treated like ULONG_MAX, and both > > result in an error, then you should probably clamp at ULONG_MAX - 1, > > and consider anything above an error. > > > > > It is for these cases that your specification contains the clamping / > > > coercion behaviour. > > > > > > Now, when you look at the table of success tests: > > > > > > -a. status == 0 || status == ENOTSUP > > > -b. status == 0 || status == ENOTSUP || status == ERANGE > > > -c. status == 0 > > > -d. status == 0 || (status == ERANGE && *end == '\0') > > > > > > it is immediately clear that the status return convention is ill-designed, > > > because the returned 'status' is not the only thing a programmer has to test > > > after calling the function. > > > > > > > Cases b and d are not real, IMO. I have never seen code where that is > > > > wanted, AFAIR, and I analyzed the entire Debian and NetBSD code bases > > > > looking precisely for that usage. > > > > > > I disagree. > > > > I didn't find any occurence of 'd' in calls to strtoi(3)/strtou(3). > > I didn't analyze calls to strtol(3) et al. > > > > > Any use of strtoul that does not test errno wants overflow > > > to be mapped to ULONG_MAX, that is, is in case b. or d. > > > Just looking in gnulib and gettext, I find already 6 occurrences: > > > gnulib/lib/getaddrinfo.c:299 > > > > lib/getaddrinfo.c-297- if (!(*servname >= '0' && *servname <= '9')) > > lib/getaddrinfo.c-298- return EAI_NONAME; > > lib/getaddrinfo.c:299: port = strtoul (servname, &c, 10); > > lib/getaddrinfo.c-300- if (*c || port > 0xffff) > > lib/getaddrinfo.c-301- return EAI_NONAME; > > lib/getaddrinfo.c-302- port = htons (port); > > > > You could remove the preceding conditional if you don't want to avoid > > leading whitespace. You could merge that into the strtou(3) call, which > > would report ECANCELED for non-numeric input). Except that a negative > > number is silently converted to a positive large value. This is why I > > use a wrapper function strtou_noneg() that rejects negative numbers. > > > > You could rewrite it as: > > > > port = strtou_noneg(servname, NULL, 10, 0, UINT16_MAX, &status); > > if (status != 0) > > return EAI_NONAME; > > port = htons(port); > > > > where strtou_noneg() is: > > > > uintmax_t > > strtou_noneg(const char *s, char **restrict endp, int base, > > uintmax_t min, uintmax_t max, int *restrict status) > > { > > int st; > > > > if (status == NULL) > > status = &st; > > if (strtoi(s, endp, base, 0, 1, status) == 0 && *status == ERANGE) > > return min; > > > > return strtou(s, endp, base, min, max, status); > > } > > > > I think this is not one case where you want silent saturation. You're > > indeed doing range checks [0, UINT16_MAX]. > > > > > gnulib/lib/nproc.c:402 > > > > lib/nproc.c-383-/* Parse OMP environment variables without dependence on OMP. > > lib/nproc.c-384- Return 0 for invalid values. */ > > lib/nproc.c-385-static unsigned long int > > lib/nproc.c:386:parse_omp_threads (char const* threads) > > lib/nproc.c-387-{ > > > > ... > > > > lib/nproc.c-398- /* Convert it from positive decimal to 'unsigned long'. */ > > lib/nproc.c-399- if (c_isdigit (*threads)) > > lib/nproc.c-400- { > > lib/nproc.c-401- char *endptr = NULL; > > lib/nproc.c:402: unsigned long int value = strtoul (threads, &endptr, 10); > > lib/nproc.c-403- > > lib/nproc.c-404- if (endptr != NULL) > > lib/nproc.c-405- { > > lib/nproc.c-406- while (*endptr != '\0' && c_isspace (*endptr)) > > lib/nproc.c-407- endptr++; > > lib/nproc.c-408- if (*endptr == '\0') > > lib/nproc.c-409- return value; > > lib/nproc.c-410- /* Also accept the first value in a nesting level, > > lib/nproc.c-411- since we can't determine the nesting level from env vars. */ > > lib/nproc.c-412- else if (*endptr == ',') > > lib/nproc.c-413- return value; > > lib/nproc.c-414- } > > lib/nproc.c-415- } > > > > First of all, the endptr!=NULL test seems misplaced. The only way that > > could be true is if the base is unsupported, and 10 is necessarily > > s/true/equal/ > > > supported. You should remove the initialization '= NULL', and the > > check, since both are dead code, IIRC. That's one of the things you > > don't need to care with strtoi(3), because it _always_ sets *endp. > > > > And you could probably remove the isdigit test by calling > > strtou_noneg(). > > > > This could be something like this (fixing the bugs reported above): > > > > char *end; > > u_long value; > > > > value = strtou_noneg(threads, &end, 10, 0, ULONG_MAX, NULL); > > if (end != threads) { > > end += strspn(end, " \t\n"); > > if (streq(end, "") > > return value; > > if (strprefix(end, ",")) > > return value; > > } > > > > > > This is one case where you seem to silently ignore saturation. Why > > don't you have any diagnostic message? > > > > > gnulib/lib/omp-init.c:48 > > > > lib/omp-init.c-47- char *endptr = NULL; > > lib/omp-init.c:48: unsigned long int value = strtoul (threads, &endptr, 10); > > lib/omp-init.c-49- > > lib/omp-init.c-50- if (endptr != NULL) > > lib/omp-init.c-51- { > > lib/omp-init.c-52- while (*endptr != '\0' && c_isspace (*endptr)) > > lib/omp-init.c-53- endptr++; > > lib/omp-init.c-54- if (*endptr == '\0') > > lib/omp-init.c-55- return value; > > lib/omp-init.c-56- /* Also accept the first value in a nesting level, > > lib/omp-init.c-57- since we can't determine the nesting level from env vars. */ > > lib/omp-init.c-58- else if (*endptr == ',') > > lib/omp-init.c-59- return value; > > lib/omp-init.c-60- } > > > > This seems identical to the previous case. > > > > > gettext/gettext-tools/src/msgfmt.c:287 > > > > gettext-tools/src/msgfmt.c-286- char *endp; > > gettext-tools/src/msgfmt.c:287: size_t new_align = strtoul (optarg, &endp, 0); > > gettext-tools/src/msgfmt.c-288- > > gettext-tools/src/msgfmt.c-289- if (endp != optarg) > > gettext-tools/src/msgfmt.c-290- alignment = new_align; > > > > This code will misbehave badly on platforms where size_t is narrower > > than u_long. Consider the case where you parse a high u_long, let's say > > SIZE_MAX + 1ul. It will be converted to 1. There's a bug due to a > > missing range check (but orthogonal to saturation). > > > > You also don't reject negative numbers, which I expect to be a bug, > > connected with the one from above. > > > > This could be rewritten to (fixing the bugs reported above): > > > > char *end; > > size_t new_align; > > > > new_align = strtou_noneg(optarg, &end, 0, 0, SIZE_MAX, NULL); > > if (optarg != end) > > alignment = new_align; > > > > This seems another case where you silently saturate. Why don't you have > > a diagnostic message for invalid input? > > > > > gettext/gettext-tools/src/msgl-check.c:379 > > > > gettext-tools/src/msgl-check.c-374- while (*nplurals != '\0' && c_isspace ((unsigned char) *nplurals)) > > gettext-tools/src/msgl-check.c-375- ++nplurals; > > gettext-tools/src/msgl-check.c-376- endp = nplurals; > > gettext-tools/src/msgl-check.c-377- nplurals_value = 0; > > gettext-tools/src/msgl-check.c-378- if (*nplurals >= '0' && *nplurals <= '9') > > gettext-tools/src/msgl-check.c:379: nplurals_value = strtoul (nplurals, (char **) &endp, 10); > > gettext-tools/src/msgl-check.c-380- if (nplurals == endp) > > gettext-tools/src/msgl-check.c-381- { > > gettext-tools/src/msgl-check.c-382- const char *msg = _("invalid nplurals value"); > > gettext-tools/src/msgl-check.c-383- char *help = plural_help (nullentry); > > gettext-tools/src/msgl-check.c-384- > > gettext-tools/src/msgl-check.c-385- if (help != NULL) > > gettext-tools/src/msgl-check.c-386- { > > gettext-tools/src/msgl-check.c-387- char *msgext = xasprintf ("%s\n%s", msg, help); > > gettext-tools/src/msgl-check.c-388- xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, true, > > gettext-tools/src/msgl-check.c-389- msgext); > > gettext-tools/src/msgl-check.c-390- free (msgext); > > gettext-tools/src/msgl-check.c-391- free (help); > > gettext-tools/src/msgl-check.c-392- } > > gettext-tools/src/msgl-check.c-393- else > > gettext-tools/src/msgl-check.c-394- xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, false, > > gettext-tools/src/msgl-check.c-395- msg); > > gettext-tools/src/msgl-check.c-396- > > gettext-tools/src/msgl-check.c-397- seen_errors++; > > gettext-tools/src/msgl-check.c-398- } > > > > You could get rid of a lot of code preceding the strtoul(3) call by > > calling strtou_noneg() instead. And strspn(3) would also help. > > > > nplurals += strspn(nplurals, " \t\n"); (Actually, strtou(3) skips leading space, so we don't really need this at all.) > > nplurals_value = strtou_noneg(nplurals, (char **) &end, 10, 0, ULONG_MAX); > > if (nplurals == end) > > ... > > > > On the other hand, I also wonder why you don't diagnose invalid input. > > Why is -10 an "invalid nplurals value", but ULONG_MAX+10 is a valid > > (albeit clamped) one? > > > > All of these cases look like missing error handling, IMO. > > > > > gettext/gettext-tools/src/read-stringtable.c:561 > > > > gettext-tools/src/read-stringtable.c-553- { > > gettext-tools/src/read-stringtable.c-554- char *last_colon; > > gettext-tools/src/read-stringtable.c-555- unsigned long number; > > gettext-tools/src/read-stringtable.c-556- char *endp; > > gettext-tools/src/read-stringtable.c-557- > > gettext-tools/src/read-stringtable.c-558- if (strlen (line) >= 6 && memcmp (line, "File: ", 6) == 0 > > gettext-tools/src/read-stringtable.c-559- && (last_colon = strrchr (line + 6, ':')) != NULL > > gettext-tools/src/read-stringtable.c-560- && *(last_colon + 1) != '\0' > > gettext-tools/src/read-stringtable.c:561: && (number = strtoul (last_colon + 1, &endp, 10), *endp == '\0')) > > gettext-tools/src/read-stringtable.c-562- { > > gettext-tools/src/read-stringtable.c-563- /* A "File: <filename>:<number>" type comment. */ > > gettext-tools/src/read-stringtable.c-564- *last_colon = '\0'; > > gettext-tools/src/read-stringtable.c-565- catalog_reader_seen_comment_filepos (catr, line + 6, number); > > gettext-tools/src/read-stringtable.c-566- } > > gettext-tools/src/read-stringtable.c-567- else > > gettext-tools/src/read-stringtable.c-568- catalog_reader_seen_comment (catr, line); > > gettext-tools/src/read-stringtable.c-569- } > > > > Let me try to rewrite it for readability first. > > > > char *filepos, *last_colon; > > u_long n; > > const char *numstr, *end; > > > > filepos = strprefix(line, "File: ") ?: (char []){""}; > > last_colon = strrchr(filepos, ':') ?: (char []){":"}; > > numstr = strprefix(last_colon, ":"); > > > > n = strtoul(numstr, (char **) &end, 10); > > if (numstr != end && streq(end, "")) { > > /* A "File: <filename>:<number>" type comment. */ > > strcpy(last_colon, ""); > > catalog_reader_seen_comment_filepos(catr, filepos, n); > > > > } else { > > catalog_reader_seen_comment(catr, line); > > } > > > > You're forgetting about negative numbers? Or are you certain that they > > can't happen? How about huge values? Assuming you want compatible code > > calling strtou(): > > > > n = strtou(numstr, (char **) &end, 10, 0, ULONG_MAX, NULL); > > if (status == 0 || status == ENOTSUP || status == ERANGE) { > > ... > > } else { > > ... > > } > > > > But again, I wonder why you don't do range checks. > > > > > > > I would therefore propose to change the status value to a bit mask, so that > > > > > the error conditions "The converted value was out of range and has been > > > > > coerced" and "The given string contains characters that did not get converted" > > > > > can be both returned together, without conflicting. > > > > > > > > Because it is theoretical conditions that a real program never wants, > > > > let's not do that. > > > > > > If you don't want to do that, I can only repeat what I said in the previous > > > mail: The proposal *does not achieve the goal* of avoiding the most common > > > programmer mistakes. For a robust API, the success test should *only* involve > > > testing the returned 'status', nothing else. > > > > Let's discuss this after your responses to the above. > > > > > > > > Bruno > > > > > > > > > > > > > > > > -- > > <https://www.alejandro-colomar.es/> > > > > -- > <https://www.alejandro-colomar.es/> -- <https://www.alejandro-colomar.es/>
Attachment:
signature.asc
Description: PGP signature