tech-misc archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD
Alejandro Colomar wrote:
> > 2) nproc.c, omp_init.c
> >
> > > > gnulib/lib/nproc.c:402
> > >
> > > lib/nproc.c-383-/* Parse OMP environment variables without dependence on OMP.
> > > lib/nproc.c-384- Return 0 for invalid values. */
> > > lib/nproc.c-385-static unsigned long int
> > > lib/nproc.c:386:parse_omp_threads (char const* threads)
> > > lib/nproc.c-387-{
> > >
> > > ...
> > >
> > > lib/nproc.c-398- /* Convert it from positive decimal to 'unsigned long'. */
> > > lib/nproc.c-399- if (c_isdigit (*threads))
> > > lib/nproc.c-400- {
> > > lib/nproc.c-401- char *endptr = NULL;
> > > lib/nproc.c:402: unsigned long int value = strtoul (threads, &endptr, 10);
> > > lib/nproc.c-403-
> > > lib/nproc.c-404- if (endptr != NULL)
> > > lib/nproc.c-405- {
> > > lib/nproc.c-406- while (*endptr != '\0' && c_isspace (*endptr))
> > > lib/nproc.c-407- endptr++;
> > > lib/nproc.c-408- if (*endptr == '\0')
> > > lib/nproc.c-409- return value;
> > > lib/nproc.c-410- /* Also accept the first value in a nesting level,
> > > lib/nproc.c-411- since we can't determine the nesting level from env vars. */
> > > lib/nproc.c-412- else if (*endptr == ',')
> > > lib/nproc.c-413- return value;
> > > lib/nproc.c-414- }
> > > lib/nproc.c-415- }
> > > ...
> > > This is one case where you seem to silently ignore saturation.
> >
> > This is on purpose. A CPU never has more than 1000 processors. Therefore
> > all values > 1000 should be treated the same way, whether they are
> > <= ULONG_MAX or > ULONG_MAX. Clamping to the number of actually available
> > processors occurs later in the code.
>
> I guess you refer to the MIN() calls within num_processors(), right?
> Why do those clampings not result in diagnostics? Couldn't you
> calculate the limits before parsing the actual number, and so use them
> to perform the range checks during the strtou(3) call?
The essential code is like this:
unsigned long int omp_env_limit = ULONG_MAX;
if (query == NPROC_CURRENT_OVERRIDABLE)
{
omp_env_limit = parse_omp_threads (getenv ("OMP_THREAD_LIMIT"));
if (! omp_env_limit)
omp_env_limit = ULONG_MAX;
...
query = NPROC_CURRENT;
}
unsigned long nprocs = num_processors_ignoring_omp (query);
return MIN (nprocs, omp_env_limit);
and num_processors_ignoring_omp (query) is expensive to compute, since
it makes system calls. For this reason, we do the parsing of environment
variables first; this gives us the opportunity to optimize away the
system calls in particular circumstances.[1]
> > 3) msgfmt.c
> >
> > > > gettext/gettext-tools/src/msgfmt.c:287
> > >
> > > gettext-tools/src/msgfmt.c-286- char *endp;
> > > gettext-tools/src/msgfmt.c:287: size_t new_align = strtoul (optarg, &endp, 0);
> > > gettext-tools/src/msgfmt.c-288-
> > > gettext-tools/src/msgfmt.c-289- if (endp != optarg)
> > > gettext-tools/src/msgfmt.c-290- alignment = new_align;
> > > ...
> > > Why don't you have a diagnostic message for invalid input?
> >
> > Because this option is meant for users who know what an "alignment" is.
>
> But still, you don't need to actively ignore ERANGE, especially when it
> results in more complex code than actually checking. Once the API gives
> you the check for free, you should probably use it.
>
> In this case, I think I'd do
>
> int status;
> size_t new_align;
>
> new_align = strtou(optarg, NULL, 0, 1, 16, &status);
> if (status == 0)
> alignment = new_align;
>
> Don't you think this is more robust and just as simple?
> ...
> strtoul(3) makes it difficult for you to do the check,
> not because you actively want to not check.
Yes and no. Yes, it would be nice to have a range check at essentially
zero cost. No, clamping is not the right behaviour: an alignment of 5 or
17 should be rejected, not mapped to something valid.
> > 4) msgl-check.c
> >
> > > > gettext/gettext-tools/src/msgl-check.c:379
> > >
> > > gettext-tools/src/msgl-check.c-374- while (*nplurals != '\0' && c_isspace ((unsigned char) *nplurals))
> > > gettext-tools/src/msgl-check.c-375- ++nplurals;
> > > gettext-tools/src/msgl-check.c-376- endp = nplurals;
> > > gettext-tools/src/msgl-check.c-377- nplurals_value = 0;
> > > gettext-tools/src/msgl-check.c-378- if (*nplurals >= '0' && *nplurals <= '9')
> > > gettext-tools/src/msgl-check.c:379: nplurals_value = strtoul (nplurals, (char **) &endp, 10);
> > > gettext-tools/src/msgl-check.c-380- if (nplurals == endp)
> > > gettext-tools/src/msgl-check.c-381- {
> > > gettext-tools/src/msgl-check.c-382- const char *msg = _("invalid nplurals value");
> > > gettext-tools/src/msgl-check.c-383- char *help = plural_help (nullentry);
> > > gettext-tools/src/msgl-check.c-384-
> > > gettext-tools/src/msgl-check.c-385- if (help != NULL)
> > > gettext-tools/src/msgl-check.c-386- {
> > > gettext-tools/src/msgl-check.c-387- char *msgext = xasprintf ("%s\n%s", msg, help);
> > > gettext-tools/src/msgl-check.c-388- xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, true,
> > > gettext-tools/src/msgl-check.c-389- msgext);
> > > gettext-tools/src/msgl-check.c-390- free (msgext);
> > > gettext-tools/src/msgl-check.c-391- free (help);
> > > gettext-tools/src/msgl-check.c-392- }
> > > gettext-tools/src/msgl-check.c-393- else
> > > gettext-tools/src/msgl-check.c-394- xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, false,
> > > gettext-tools/src/msgl-check.c-395- msg);
> > > gettext-tools/src/msgl-check.c-396-
> > > gettext-tools/src/msgl-check.c-397- seen_errors++;
> > > gettext-tools/src/msgl-check.c-398- }
> >
> > > On the other hand, I also wonder why you don't diagnose invalid input.
> > > Why is -10 an "invalid nplurals value"
> >
> > "-10" designates a negative number. For the number of plural forms, that's
> > invalid.
> >
> > > but ULONG_MAX+10 is a valid (albeit clamped) one?
> >
> > Again, as mentioned above, values like 1000 and 10000000000000000 should
> > be treated the same way. Since ULONG_MAX+10 gets clamped to ULONG_MAX, that's
> > just fine.
>
> I see in line 433 a check
>
> if (min_nplurals < nplurals_value)
>
> And in line 449 a check
>
> else if (max_nplurals > nplurals_value)
>
> nplurals_value was parsed via strtoul(3) in line 379.
>
> And min_plurals and max_plurals were assigned in lines 293 and lines 298.
> Which makes me wonder: why don't you move the tests from lines 4** into
> the strtou(3) call?
>
> That is, why not call this?
>
> nplurals_value = strtou(nplurals, NULL, 10, min_nplurals, max_nplurals, &status);
>
> And then have the ERANGE handling there?
For two reasons:
- The program's logic is to parse simple things first and complicated things
afterwards. The user is likely to make a mistake in the simple things with
a lower probability than in the complicated things. When there is a
mismatch, the likely culprit is a mistake in the complicated things; the
diagnostics need to be directed at this scenario.
- It feels strange to move complicated validation logic into a library
function.
In summary, while the built-in range checks are welcome for things like
port numbers (0..65536), in this case it's better to keep the program's logic
explicit and situation-aware.
> In cases 2 and 4, I think you'd benefit from refactoring the code to use
> the limits in the strtou(3) call. Especially in case 4 where you
> already know the limits.
No. As the author of that code, I disagree.
Bruno
[1] https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=205078f891d87e0e966ad8616fdf0306437f0370
Home |
Main Index |
Thread Index |
Old Index