tech-userlevel archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]


    Date:        Sun, 22 Nov 2009 20:57:48 +0000 (GMT)
    From:        Iain Hibbert <>
    Message-ID:  <>

  | but I think it should actually have been "strchr() == NULL" to escape the
  | unmatched characters..

No, that wasn't the bug, the strchr(str, c) != NULL is the same as
        c == str[0] || c == str[2] || ...
and that's exactly what we want (and what you translated it into).

except for ...
  | Also, I think there would be a bug with the use of
  | strchr(), as it will match '\0' which ought to be escaped ..

Yes, the nul is a special case.    The bug was that the logic was
wrong:  If !isalnum(c) is false, which it needs to be for the next ||
term to even be evaluated, then we know isalnum(c) is true, which means
that c is a letter or digit, which means that you cannot ever find it
in a string that contains no letters or digits - so the third alternative
in the code you quoted is just a waste of space...

  | And so, I would like to apply the following patch to fix that

Personally I'd prefer a smaller change to the code, to just correct the
obvious bugs, without the unnecessary style change, that would be
something more like

        if (!isascii(c) || !(isalnum(c) || (c != '\0' &&
            strchr("$-_.+!*'(),", c) != NULL))) {

and then leave the order of the true and false cases the same as
they are now ... just to minimise churn in the code and make it
clear what was really changed - that is (aside from the 0 test)
that there is a missing set of ()'s in the code we have now.
The missing '(' goes between ! and isalnum(), and the corresponding
')' after NULL (with the other ')'s.)

Then we do %hexhex encoding if the character is not ascii (yeah, that's
going to be right for arbitrary non-ascii characters ... we better hope
that c is confined to the 0..255 range) or if it is not the case that
it is either alphanumeric or is in the magic string (is one of the magic
characters that are OK).  That is, that it is not alphanumeric and is
not in the magic string (with negative logic).  The c != 0 test is needed
because 0 is in the string, but we want to pretend it isn't (as you pointed
out) - it woudl be kind of nice to have a version of strchr() that wouldn't
match the terminating \0 just for cases like this - but we don't.

You can play with boolean algebra to rewrite that in various other forms,
but this one is closest to the original I think, and so best shows what
changed from what we have now, into what was really intended.

After that, if we (or someone) think(s) that the style is wrong, and it is
better to explicitly enumerate the character tests (c == 'a' || c == 'b' ...)
rather than using strchr(), or if we presume that way generates better
code, or ... (lots of other reasons) then the code style could be changed
in an update intended to make no difference to the result (which can be
tested for in the test suite if someone wants to do that).


Home | Main Index | Thread Index | Old Index