tech-userlevel archive


Re: shell (/bin/sh) pattern matching bugs



On Sun 24 Jun 2018 at 19:49:25 +0700, Robert Elz wrote:
> But the effect of that way of doing the specification is that the \
> which escapes magic characters in regular expressions does
> not work inside [ ] (and the text is explicit about that - and correct)
> which means the technique in the proposed revised posix text
> about replacing " and ' quoting with \ doesn't work at all as intended
> inside [ ] which is the case in test 97.   But that can't be right
> either, as then
> 
> 	case - in [a\-z]) ...
> 
> would not match, and whether you believe it should or not,
> matching there is what all shells have done forever (that
> is, the quoted - is a literal minus/hyphen/dash (whatever you
> prefer to call it) and not the range indicator, where in a
> regular expression that would be an 'a' and a range with
> all chars from '\' to 'z').

Are we to assume that NetBSD's sh(1) manual page is correct? It clearly
says that your example above should not match, both in my 7.0 version
and in the -current version (as seen on
https://man-k.org/man/NetBSD-current/1/sh?r=1&q=sh#Shell_Patterns).
Descriptions of character classes (including re_format(7)) have pretty
much always included words to the effect of

     To include a ``]'' in a character class, make it the first character
     listed (after the ``!'', if any).  To include a ``-'', make it the first
     or last character listed.

so the example should always have been

 	case - in [az-]) ...

if you want this to match. If your version matches, I'd call that a
long-standing bug. I no longer have my printed Ultrix manuals, but I'd
be surprised if they were different.
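
For anyone who wants to check the placement rule on their own shell, a
minimal sketch follows. The function names placement_cases and
escaped_case are made up for this message; they are not part of any
shell. Each case statement prints M for a match and X otherwise. On a
shell that follows the manual's wording I would expect placement_cases
to print M, M, X, M. escaped_case is the disputed backslash form, so no
expected result is given for it here.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for illustration).
# Each case statement prints M on a match, X otherwise.

placement_cases() {
    # '-' listed last: literal according to the manual's rule
    case - in [az-]) echo M;; *) echo X;; esac
    # 'z' with a trailing '-': still an ordinary member of the class
    case z in [az-]) echo M;; *) echo X;; esac
    # '-' between 'a' and 'z': range indicator, so '-' itself is not in the class
    case - in [a-z]) echo M;; *) echo X;; esac
    # '-' listed first: literal according to the manual's rule
    case - in [-az]) echo M;; *) echo X;; esac
}

escaped_case() {
    # The form under discussion: a backslash-quoted '-' inside [ ]
    case - in [a\-z]) echo M;; *) echo X;; esac
}

placement_cases
escaped_case
```

Comparing the last line of output against the first four shows whether
a given shell treats the quoted - as a literal or as a range indicator.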

Strangely, the Ex Reference Manual
/usr/share/doc/reference/ref1/ex/reference.ps.gz on page 13 claims that
a backslash SHOULD be used within [] to escape characters, while the
Vi/Ex Reference Manual /usr/share/doc/reference/ref1/vi/vi.ps.gz on page
10 refers to re_format(7). I suppose the former refers to the old
encumbered BSD ex(1) while the latter says it refers to nvi(1).

I found a V7 system (here:
http://simh.trailing-edge.com/software.html) and a 2.11BSD system (I
don't remember where), and they did this (at least the V7 did; I didn't
try all cases on 2.11BSD):

# case - in [az-]) echo M;; *) echo X;; esac
X
# case a in [az-]) echo M;; *) echo X;; esac
M
# case z in [az-]) echo M;; *) echo X;; esac
X
# case z in [az]) echo M;; *) echo X;; esac
M
# case - in [a-z]) echo M;; *) echo X;; esac
X
# case - in [a\-z]) echo M;; *) echo X;; esac
M

I think that this shows that the trailing - isn't quite managed yet:
with [az-], neither - nor z matches. The manpage indeed doesn't include
the phrasing quoted above about - and ]; the full description is just

     [...]
          Matches any one of the characters enclosed.  A pair of
          characters separated by - matches any character lexi-
          cally between the pair.

You would think that the "- ] blurb" was added at the point where the
code had changed and the claim had become true, but apparently not.
Still, it would seem to indicate that it was meant to be true.

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- Wayland: Those who don't understand X
\X/ rhialto/at/falu.nl      -- are condemned to reinvent it. Poorly.



