tech-userlevel archive


Re: shell (/bin/sh) pattern matching bugs

[Readers: please refer to Robert Elz's full, enlightening answer; I have
trimmed it below.]
On Sun, Jun 24, 2018 at 07:49:25PM +0700, Robert Elz wrote:
>   | - [Suppression of the double quotes?
> This is, of course, the heart of the matter...
> In POSIX, quote removal is explicitly not done on case
> patterns. That is, the expansions that are done are listed,
> and quote removal is not one of them.
> So...
>   | But this doesn't change anything in
>   | the bracket expression];
> It would, as, assuming the current literal text, an input string
> which was a double quote (as in '"' or \") would match, as the
> double quote character would appear in the [ ] expression
> in the pattern.
> Of course that is clearly absurd, and a bug report on the posix
> text was submitted a while ago to include quote removal in the
> list of operations to perform on case patterns.
> Unfortunately, it isn't that simple, as just doing quote
> removal on patterns would cause
> 	case x in ("*") echo match;; esac
> to match as the quote removal would leave the
> pattern being just an asterisk, which matches anything,
> which is not what is supposed to happen.
> So the current proposed new text (which had been
> accepted, but now is being discussed again, and will
> be changed) also specified that along with quote removal,
> any "pattern magic" characters in the quoted part of the
> pattern would be \ escaped so they remained literal,
> so "quote removal" of the "*" would produce \* not *
> and so the pattern matching would look for a literal
> asterisk rather than anything - which is what is wanted.
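A minimal sketch of the behaviour described in the quoted text, as I
understand shells to implement it today: a quoted asterisk in a case
pattern matches only a literal asterisk, while an unquoted one matches
anything. The function name "classify" is made up for the illustration.

```sh
# classify WORD: report whether WORD matched the quoted or unquoted asterisk.
classify() {
    case $1 in
        ("*") echo "literal asterisk" ;;   # quoted: matches only the string *
        (*)   echo "anything else" ;;      # unquoted: matches any string
    esac
}

classify x      # prints "anything else"
classify '*'    # prints "literal asterisk"
```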

Thanks for the explanations!

FWIW, as a POSIX shell user, I would expect something more intuitive
than what is proposed (if I understand correctly):

a) In all contexts, including case patterns, the substitutions,
including quote removal, are done;

b) _After that_, the patterns are interpreted according to their own
rules; any escaped double quotes that are still there delimit a string
of literals.
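
For comparison with a) and b), here is a small sketch of what shells do
today when the pattern comes from a variable: the quoting survives into
the matching step, so a quoted expansion stays literal. The variable
value and the function names are invented for the illustration; this
shows existing behaviour, not the proposal. If quote removal alone were
done first, as in a), the quoted form would presumably behave like the
unquoted one.

```sh
var='*'     # hypothetical value holding a pattern character

# Quoted expansion: the asterisk coming from $var is matched literally.
quoted() {
    case $1 in
        ("$var") echo "match" ;;
        (*)      echo "no match" ;;
    esac
}

# Unquoted expansion: the asterisk coming from $var acts as a pattern.
unquoted() {
    case $1 in
        ($var) echo "match" ;;
        (*)    echo "no match" ;;
    esac
}

quoted x        # prints "no match"
quoted '*'      # prints "match"
unquoted x      # prints "match"
```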

That is:


(["$var"]) would lead after a) to ([[:alpha:]]) and then '[' would not


([\"$var\"]) would lead after a) to (["[:alpha:]"]) and then 
'"[:alpha:]"' being interpreted as a string of literals, '[' would 

I think that POSIX shell users like me are used to the escaping dance
when they feed sed(1) from a shell with a (non-shell) regular expression,
so it seems to me that this would be reasonably backward compatible
and be the least surprising case.
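
To make the "escaping dance" concrete, here is a small sketch with
sed(1). The sample text and the helper name "sample" are assumptions
chosen for the illustration; the point is only that the regular
expression has its own escaping layer on top of the shell's.

```sh
# Hypothetical sample input: one line with the literal text, one without.
sample() {
    printf '%s\n' 'a.b*c' 'aXbbc'
}

# Single quotes: the shell passes the backslashes to sed untouched.
sample | sed -n '/a\.b\*c/p'        # prints a.b*c

# Double quotes: each backslash meant for sed is doubled for the shell.
sample | sed -n "/a\\.b\\*c/p"      # prints a.b*c

# No escaping: . and * are regular expression operators again.
sample | sed -n '/a.b*c/p'          # prints aXbbc only
```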

Just my 2 cents.

Best regards.

PS: I don't know whether you have already modified the sh(1) man page
(I'm on 7.1.1, not on current), but I think that the case grammar should
say that the (pattern) form is valid, the leading '(' being optional.
Since all the examples, and the man page, always show "pattern)", the
(pattern) form can be surprising, "(...)" being used in some shells for
lists or arrays.
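
A short sketch of the two spellings, which POSIX treats as equivalent;
the function names are made up for the illustration.

```sh
# Case items written with the optional leading parenthesis.
with_paren() {
    case $1 in
        (a|b) echo "matched" ;;
        (*)   echo "other" ;;
    esac
}

# The same case items written in the traditional "pattern)" form.
without_paren() {
    case $1 in
        a|b) echo "matched" ;;
        *)   echo "other" ;;
    esac
}

with_paren a        # prints "matched"
without_paren a     # prints "matched"
with_paren z        # prints "other"
```
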
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C
