tech-userlevel archive


Re: shell (/bin/sh) pattern matching bugs



[Readers, please refer to Robert Elz's full, enlightening answer; I have
trimmed it here.]
On Sun, Jun 24, 2018 at 07:49:25PM +0700, Robert Elz wrote:
>   | - [Suppression of the double quotes?
> 
> This is, of course, the heart of the matter...
> 
> In POSIX, quote removal is explicitly not done on case
> patterns. That is, the expansions that are done are listed,
> and quote removal is not one of them.
> 
> So...
> 
>   | But this doesn't change anything in
>   | the bracket expression];
> 
> It would, as, assuming the current literal text, an input string
> which was a double quote (as in '"' or \") would match, as the
> double quote character would appear in the [ ] expression
> in the pattern.
> 
> Of course that is clearly absurd, and a bug report on the POSIX
> text was submitted a while ago to include quote removal in the
> list of operations to perform on case patterns.
> 
> Unfortunately, it isn't that simple, as just doing quote
> removal on patterns would cause
> 
> 	case x in ("*") echo match;; esac
> 
> to match as the quote removal would leave the
> pattern being just an asterisk, which matches anything,
> which is not what is supposed to happen.
> 
> So the current proposed new text (which had been
> accepted, but now is being discussed again, and will
> be changed) also specified that along with quote removal,
> any "pattern magic" characters in the quoted part of the
> pattern would be \ escaped so they remained literal,
> so "quote removal" of the "*" would produce \* not *
> and so the pattern matching would look for a literal
> asterisk rather than anything - which is what is wanted.
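
As a small illustration of the behavior described above (a quoted
asterisk in a case pattern stays literal), here is a minimal sketch. The
function name star_case is made up for this example and does not come
from the thread.

```shell
# Hypothetical helper: report whether $1 matches the quoted pattern "*".
star_case() {
    case $1 in
        ("*") echo match ;;        # quoted, so only a literal asterisk matches
        (*)   echo "no match" ;;   # unquoted, so this catches everything else
    esac
}

star_case x      # prints "no match"
star_case '*'    # prints "match"
```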

Thanks for the explanations!

FWIW, as a POSIX shell user, I would expect something more intuitive
than what is proposed (if I understand correctly):

a) In all contexts, including case patterns, the substitutions,
including quote removal, are done;

b) _After that_, the patterns are interpreted according to their own
rules; in particular, if escaped double quotes are still there, what
they enclose is taken as a string of literals.

That is:

var="[:alpha:]"

(["$var"]) would lead after a) to ([[:alpha:]]) and then '[' would not
match

while

([\"$var\"]) would lead after a) to (["[:alpha:]"]) and then,
'"[:alpha:]"' being interpreted as a string of literals, '[' would
match.
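
For reference, here is a minimal sketch of the first outcome only, with
the bracket expression written out literally as ([[:alpha:]]), i.e. the
pattern that step a) would produce. It does not show how any particular
shell treats (["$var"]) today, which is the point under discussion. The
function name alpha_case is made up for this example.

```shell
# Hypothetical helper: match $1 against the literal pattern [[:alpha:]].
alpha_case() {
    case $1 in
        ([[:alpha:]]) echo match ;;        # a single alphabetic character
        (*)           echo "no match" ;;
    esac
}

alpha_case a      # prints "match"
alpha_case '['    # prints "no match": '[' is not alphabetic
```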

I think that POSIX shell users like me are used to the escaping dance
when they feed sed(1) a (non-shell) regular expression from the shell,
so it seems to me that this would be reasonably backward compatible
and the least surprising behavior.

Just my 2 cents.

Best regards.

PS: I don't know whether you have already modified the sh(1) man page
(I'm on 7.1.1, not on current), but I think that the case grammar should
say that the "(pattern)" form is valid, the first '(' being optional.
Since all the examples, including those in the man page, use "pattern)",
the "(pattern)" form can be surprising, "(...)" being used in some
shells for lists or arrays.
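
A minimal sketch of the two equivalent spellings, with and without the
optional leading '(' (the function names are made up for this example):

```shell
# Hypothetical helpers: the same pattern, with and without the leading '('.
with_paren() {
    case $1 in
        (a*) echo yes ;;
        (*)  echo no ;;
    esac
}

without_paren() {
    case $1 in
        a*) echo yes ;;
        *)  echo no ;;
    esac
}

with_paren abc       # prints "yes"
without_paren abc    # prints "yes"
```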
-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                     http://www.kergis.com/
                       http://www.sbfa.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

