tech-misc archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: alx-0008 - Standardize strtoi(3) and strtou(3) from NetBSD



Alejandro Colomar wrote:
> > 2) nproc.c, omp_init.c
> > 
> > > >   gnulib/lib/nproc.c:402
> > > 
> > > lib/nproc.c-383-/* Parse OMP environment variables without dependence on OMP.
> > > lib/nproc.c-384-   Return 0 for invalid values.  */
> > > lib/nproc.c-385-static unsigned long int
> > > lib/nproc.c:386:parse_omp_threads (char const* threads)
> > > lib/nproc.c-387-{
> > > 
> > > ...
> > > 
> > > lib/nproc.c-398-  /* Convert it from positive decimal to 'unsigned long'.  */
> > > lib/nproc.c-399-  if (c_isdigit (*threads))
> > > lib/nproc.c-400-    {
> > > lib/nproc.c-401-      char *endptr = NULL;
> > > lib/nproc.c:402:      unsigned long int value = strtoul (threads, &endptr, 10);
> > > lib/nproc.c-403-
> > > lib/nproc.c-404-      if (endptr != NULL)
> > > lib/nproc.c-405-        {
> > > lib/nproc.c-406-          while (*endptr != '\0' && c_isspace (*endptr))
> > > lib/nproc.c-407-            endptr++;
> > > lib/nproc.c-408-          if (*endptr == '\0')
> > > lib/nproc.c-409-            return value;
> > > lib/nproc.c-410-          /* Also accept the first value in a nesting level,
> > > lib/nproc.c-411-             since we can't determine the nesting level from env vars.  */
> > > lib/nproc.c-412-          else if (*endptr == ',')
> > > lib/nproc.c-413-            return value;
> > > lib/nproc.c-414-        }
> > > lib/nproc.c-415-    }
> > > ...
> > > This is one case where you seem to silently ignore saturation.
> > 
> > This is on purpose. A CPU never has more than 1000 processors. Therefore
> > all values > 1000 should be treated the same way, whether they are
> > <= ULONG_MAX or > ULONG_MAX. Clamping to the number of actually available
> > processors occurs later in the code.
> 
> I guess you refer to the MIN() calls within num_processors(), right?
> Why do those clampings not result in diagnostics?  Couldn't you
> calculate the limits before parsing the actual number, and so use them
> to perform the range checks during the strtou(3) call?

The essential code is like this:

  unsigned long int omp_env_limit = ULONG_MAX;

  if (query == NPROC_CURRENT_OVERRIDABLE)
    {
      omp_env_limit = parse_omp_threads (getenv ("OMP_THREAD_LIMIT"));
      if (! omp_env_limit)
        omp_env_limit = ULONG_MAX;
      ...
      query = NPROC_CURRENT;
    }
  unsigned long nprocs = num_processors_ignoring_omp (query);
  return MIN (nprocs, omp_env_limit);

and num_processors_ignoring_omp (query) is expensive to compute, since
it makes system calls. For this reason, we do the parsing of environment
variables first; this gives us the opportunity to optimize away the
system calls in particular circumstances.[1]

> > 3) msgfmt.c
> > 
> > > >   gettext/gettext-tools/src/msgfmt.c:287
> > > 
> > > gettext-tools/src/msgfmt.c-286-          char *endp;
> > > gettext-tools/src/msgfmt.c:287:          size_t new_align = strtoul (optarg, &endp, 0);
> > > gettext-tools/src/msgfmt.c-288-
> > > gettext-tools/src/msgfmt.c-289-          if (endp != optarg)
> > > gettext-tools/src/msgfmt.c-290-            alignment = new_align;
> > > ...
> > > Why don't you have a diagnostic message for invalid input?
> > 
> > Because this option is meant for users who know what an "alignment" is.
> 
> But still, you don't need to actively ignore ERANGE, especially when it
> results in more complex code than actually checking.  Once the API gives
> you the check for free, you should probably use it.
> 
> In this case, I think I'd do
> 
> 	int     status;
> 	size_t  new_align;
> 
> 	new_align = strtou(optarg, NULL, 0, 1, 16, &status);
> 	if (status == 0)
> 		alignment = new_align;
> 
> Don't you think this is more robust and just as simple?
> ...
> strtoul(3) makes it difficult for you to do the check,
> not because you actively want to not check.

Yes and no. Yes, it would be nice to have a range check at essentially
zero cost. No, clamping is not the right behaviour: an alignment of 5 or
17 should be rejected, not mapped to something valid.

> > 4) msgl-check.c
> > 
> > > >   gettext/gettext-tools/src/msgl-check.c:379
> > > 
> > > gettext-tools/src/msgl-check.c-374-          while (*nplurals != '\0' && c_isspace ((unsigned char) *nplurals))
> > > gettext-tools/src/msgl-check.c-375-            ++nplurals;
> > > gettext-tools/src/msgl-check.c-376-          endp = nplurals;
> > > gettext-tools/src/msgl-check.c-377-          nplurals_value = 0;
> > > gettext-tools/src/msgl-check.c-378-          if (*nplurals >= '0' && *nplurals <= '9')
> > > gettext-tools/src/msgl-check.c:379:            nplurals_value = strtoul (nplurals, (char **) &endp, 10);
> > > gettext-tools/src/msgl-check.c-380-          if (nplurals == endp)
> > > gettext-tools/src/msgl-check.c-381-            {
> > > gettext-tools/src/msgl-check.c-382-              const char *msg = _("invalid nplurals value");
> > > gettext-tools/src/msgl-check.c-383-              char *help = plural_help (nullentry);
> > > gettext-tools/src/msgl-check.c-384-
> > > gettext-tools/src/msgl-check.c-385-              if (help != NULL)
> > > gettext-tools/src/msgl-check.c-386-                {
> > > gettext-tools/src/msgl-check.c-387-                  char *msgext = xasprintf ("%s\n%s", msg, help);
> > > gettext-tools/src/msgl-check.c-388-                  xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, true,
> > > gettext-tools/src/msgl-check.c-389-                               msgext);
> > > gettext-tools/src/msgl-check.c-390-                  free (msgext);
> > > gettext-tools/src/msgl-check.c-391-                  free (help);
> > > gettext-tools/src/msgl-check.c-392-                }
> > > gettext-tools/src/msgl-check.c-393-              else
> > > gettext-tools/src/msgl-check.c-394-                xeh->xerror (CAT_SEVERITY_ERROR, header, NULL, 0, 0, false,
> > > gettext-tools/src/msgl-check.c-395-                             msg);
> > > gettext-tools/src/msgl-check.c-396-
> > > gettext-tools/src/msgl-check.c-397-              seen_errors++;
> > > gettext-tools/src/msgl-check.c-398-            }
> > 
> > > On the other hand, I also wonder why you don't diagnose invalid input.
> > > Why is -10 an "invalid nplurals value"
> > 
> > "-10" designates a negative number. For the number of plural forms, that's
> > invalid.
> > 
> > > but ULONG_MAX+10 is a valid (albeit clamped) one?
> > 
> > Again, as mentioned above, values like 1000 and 10000000000000000 should
> > be treated the same way. Since ULONG_MAX+10 gets clamped to ULONG_MAX, that's
> > just fine.
> 
> I see in line 433 a check
> 
> 	if (min_nplurals < nplurals_value)
> 
> And in line 449 a check
> 
> 	else if (max_nplurals > nplurals_value)
> 
> nplurals_value was parsed via strtoul(3) in line 379.
> 
> And min_plurals and max_plurals were assigned in lines 293 and lines 298.
> Which makes me wonder: why don't you move the tests from lines 4** into
> the strtou(3) call?
> 
> That is, why not call this?
> 
> 	nplurals_value = strtou(nplurals, NULL, 10, min_nplurals, max_nplurals, &status);
> 
> And then have the ERANGE handling there?

For two reasons:

  - The program's logic is to parse simple things first and complicated things
    afterwards. The user is likely to make a mistake in the simple things with
    a lower probability than in the complicated things. When there is a
    mismatch, the likely culprit is a mistake in the complicated things; the
    diagnostics need to be directed at this scenario.

  - It feels strange to move complicated validation logic into a library
    function.

In summary, while the built-in range checks are welcome for things like
port numbers (0..65536), in this case it's better to keep the program's logic
explicit and situation-aware.

> In cases 2 and 4, I think you'd benefit from refactoring the code to use
> the limits in the strtou(3) call.  Especially in case 4 where you
> already know the limits.

No. As the author of that code, I disagree.

Bruno

[1] https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=205078f891d87e0e966ad8616fdf0306437f0370





Home | Main Index | Thread Index | Old Index