tech-userlevel archive


Re: shell (/bin/sh) pattern matching bugs



    Date:        Sun, 24 Jun 2018 19:09:58 +0200
    From:        Rhialto <rhialto%falu.nl@localhost>
    Message-ID:  <20180624170958.GJ8143%falu.nl@localhost>

  | Are we to assume that NetBSD's sh(1) manual page is correct?

Well, yes and no...

  | Since that clearly says that your example above should not match.

Actually, it doesn't - it just kind of slides by this case...   That is, it
makes no mention of what happens if characters inside [ ] are
quoted (partly because I don't much like the quoting solution,
and never thought the ordering method was hard to get right...)

The man page should probably be fixed to be more precise - there
are all kinds of details it omits.  But there are also limits on how much
people are willing to read!

  | Pretty much ~always, descriptions of character classes (including
  | re_format(7)) have included words to the effect of
  |
  |    To include a ``]'' in a character class, make it the first character
  |    listed (after the ``!'', if any).  To include a ``-'', make it the first
  |    or last character listed.

Yes, that's how this was originally designed in the very earliest
versions of re's and glob matching - from 5th edition or earlier
(that's as far back as I go.)

  | so the example should always have been
  |  	case - in [az-]) ...

Yes, of course, that should work, and there are tests for that kind
of case as well (and [-az] of course).    Those ones work.   So
does the simple [a\-z] case, if you accept that it is supposed to be
a match of the 3 chars listed a - and z.   This isn't anything new,
it has been like that in the NetBSD sh for a long time (probably
goes back to the original ash) and works the same way in every
other sh I can find to test.
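
For anyone who wants to check their own sh, here is a minimal sketch of
the three spellings side by side.  The helper name in_class is made up
for this example, and the results noted below are what I'd expect from
the behaviour described above, not a promise about every shell.

```shell
# in_class is a hypothetical helper, named only for this example.
# It reports which of the three spellings of "a, z or a literal -"
# match its argument.
in_class() {
    r=
    case "$1" in [az-])  r="$r trailing" ;; esac   # '-' listed last
    case "$1" in [-az])  r="$r leading"  ;; esac   # '-' listed first
    case "$1" in [a\-z]) r="$r quoted"   ;; esac   # '-' quoted, so not a range
    printf '%s\n' "$1:${r:- none}"
}

in_class -
in_class a
in_class m
```

If the quoted form really is a 3 char list, then - and a should each
report all three spellings, and m (which would only match if a\-z were
treated as the range a-z) should report none.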

  | if you want this to match. If your version matches, I'd call that a
  | long-standing bug.

That may be.  I suspect it happened because the original Bourne sh
(which had to run on non split I-D pdp-11's) handled parsing
by reading the input, and for any quoted (ascii only of course)
char, simply set the top bit.   Then it would compare against the
operator chars ('<' etc) or the pattern magic ('*' etc) and with the
top bit set, the chars were not equal, so not magic, just
normal chars.   About the last thing it did was clear the top bits
before handing off to wherever the data was to go next (it
would also ignore that bit in cases it was doing a comparison
where no magic was expected or possible.)
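
The top bit itself is not something a script can see, but the effect
is.  A small sketch (star_kind is a made-up name for this example)
showing a quoted pattern char losing its magic:

```shell
# star_kind is a hypothetical helper, named only for this example.
star_kind() {
    case "$1" in
        \*) echo literal ;;   # quoted '*': matches only the one char string "*"
        *)  echo other ;;     # unquoted '*': matches anything at all
    esac
}

star_kind '*'
star_kind abc
```

The first call should take the quoted branch, the second falls through
to the unquoted one.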

Whether the quoting was intended to affect things the way it
did, or whether that was just an accident, is immaterial now.
There's no question that it has come to be a relied upon
feature, and is not going to go away.

  | Strangely, the Ex Reference Manual
  | /usr/share/doc/reference/ref1/ex/reference.ps.gz on page 13 claims that
  | a backslash SHOULD be used within [] to escape characters,

Yes, I remember that, and never understood why.   I doubt even Bill Joy
would remember now.

  | I found a V7 system (here:
[...]
  | I think that this shows that the trailing - isn't quite managed yet;

Yes, the code that does that from the original Bourne shell was
quoted on the austin-group list (posix list) and it was obvious what
the bug was there...

  | manpage indeed doesn't include the claim above phrasing about - and ],

I suspect it was intended, though.   It also did not properly handle []xyz] to
match a ']' in a [ ] expression.   Maybe those were bugs, or maybe the
code to handle it was omitted as a space saving measure (since the
quoting method "just worked".)
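
A sketch of the two ways to get a ']' into the set, for comparison on a
current sh (bracket_class is a made-up helper name, and this says
nothing about what the V7 code did with the first form):

```shell
# bracket_class is a hypothetical helper, named only for this example.
# It reports which spelling of "], x, y or z" matches its argument.
bracket_class() {
    r=
    case "$1" in []xyz])  r="$r first"  ;; esac   # ']' listed first
    case "$1" in [\]xyz]) r="$r quoted" ;; esac   # ']' quoted
    printf '%s\n' "$1:${r:- none}"
}

bracket_class ']'
bracket_class x
bracket_class '['
```

On a shell that handles both, ']' and x should report both spellings
and '[' should report none.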

  | You would think that at the time the "- ] blurb" was added,

That's actually much older, older than Bourne shells,
it is just that the original Bourne sh had LOTS of bugs
(or perhaps this was omitted, someone could ask Steve
Bourne if discovering the answer to this is important.)

kre


