tech-userlevel archive


Re: shell (/bin/sh) pattern matching bugs



Well, discard my suggestion: you have already answered. The problem
is that "*" is already used to mean a literal '*', so my point
of view is incompatible with the existing standard.
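
For the record, a minimal sketch of that existing behaviour, assuming a
POSIX-like sh (the helper name classify is made up for the example): a
quoted "*" in a case pattern matches only a literal asterisk, while an
unquoted * matches any string.

```shell
# classify is a hypothetical helper used only for this illustration.
classify() {
    case $1 in
        ("*") echo "literal asterisk" ;;   # quoted: matches only the string *
        (*)   echo "something else" ;;     # unquoted: matches any string
    esac
}

classify '*'
classify x
```

The first call should take the ("*") branch and the second the (*)
branch, which is why plain quote removal on the pattern would change
the meaning of existing scripts.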

It's a mess... Wouldn't it be simpler for POSIX to leave case...esac
as is and introduce an ecase...esac[e] (à la grep, egrep) with something
that makes more sense for corner cases?


On Sun, Jun 24, 2018 at 03:44:00PM +0200, tlaronde%polynum.com@localhost wrote:
> [For readers: please refer to Robert Elz's whole enlightening answer;
> I have edited it.]
> On Sun, Jun 24, 2018 at 07:49:25PM +0700, Robert Elz wrote:
> >   | - [Suppression of the double quotes?
> > 
> > This is, of course, the heart of the matter...
> > 
> > In POSIX, quote removal is explicitly not done on case
> > patterns. That is, the expansions that are done are listed,
> > and quote removal is not one of them.
> > 
> > So...
> > 
> >   | But this doesn't change anything in
> >   | the bracket expression];
> > 
> > It would, as, assuming the current literal text, an input string
> > which was a double quote (as in '"' or \") would match, as the
> > double quote character would appear in the [ ] expression
> > in the pattern.
> > 
> > Of course that is clearly absurd, and a bug report on the POSIX
> > text was submitted a while ago to include quote removal in the
> > list of operations to perform on case patterns.
> > 
> > Unfortunately, it isn't that simple, as just doing quote
> > removal on patterns would cause
> > 
> > 	case x in ("*") echo match;; esac
> > 
> > to match as the quote removal would leave the
> > pattern being just an asterisk, which matches anything,
> > which is not what is supposed to happen.
> > 
> > So the current proposed new text (which had been
> > accepted, but now is being discussed again, and will
> > be changed) also specified that along with quote removal,
> > any "pattern magic" characters in the quoted part of the
> > pattern would be \ escaped so they remained literal,
> > so "quote removal" of the "*" would produce \* not *
> > and so the pattern matching would look for a literal
> > asterisk rather than anything - which is what is wanted.
> 
> Thanks for the explanations!
> 
> FWIW, as a POSIX shell user, I would expect something more intuitive
> than what is proposed (if I understand correctly):
> 
> a) In all contexts, including case patterns, substitutions, including
> quote removal, are done;
> 
> b) _After that_, the patterns are interpreted according to their own
> rules, including, if escaped double quotes are still there, treating
> what they enclose as a string of literals.
> 
> That is:
> 
> var="[:alpha:]"
> 
> (["$var"]) would lead after a) to ([[:alpha:]]) and then '[' would not
> match
> 
> while
> 
> ([\"$var\"]) would lead after a) to (["[:alpha:]"]) and then
> '"[:alpha:]"' being interpreted as a string of literals, '[' would
> match.
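
For comparison, a small sketch of what can be checked in a current sh,
assuming the shell supports POSIX character classes in bracket
expressions (the function name in_class is made up for the example).
With $var left unquoted the pattern becomes [[:alpha:]], so a letter
matches and '[' does not. What the quoted form ["$var"] does is exactly
the point under discussion and may differ between shells, so no result
is claimed for it here.

```shell
var="[:alpha:]"

# in_class is a hypothetical helper: it reports whether $1 matches
# [$var], i.e. [[:alpha:]] once $var has been expanded unquoted.
in_class() {
    case $1 in
        ([$var]) echo "match" ;;
        (*)      echo "no match" ;;
    esac
}

in_class a
in_class '['
```
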
> 
> I think that POSIX shell users like me are used to the escaping dance
> when they feed sed(1) in a shell with a (not shell) regular expression,
> so it seems to me that this should be reasonably backward compatible
> and be the least surprising case.
> 
> Just my 2 cents.
> 
> Best regards.
> 
> PS: I don't know if you have already modified the sh(1) man page (I'm on
> 7.1.1, not on current), but I think that the case grammar should say that
> the (pattern) form is valid, the leading '(' being optional. Since
> all the examples, and the man page, always show "pattern)", the
> (pattern) form can be surprising, "(...)" being used in some
> shells for lists or arrays.

-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                     http://www.kergis.com/
                       http://www.sbfa.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

