tech-userlevel archive


Re: sed(1) / BRE bug?



Hello Robert,

On Mon, Oct 12, 2020 at 12:06:57AM +0700, Robert Elz wrote:
>     Date:        Sun, 11 Oct 2020 11:45:12 +0200
>     From:        tlaronde%polynum.com@localhost
>     Message-ID:  <20201011094512.GA356%polynum.com@localhost>
> 
> 
>   | The problem? the leading '$' is not escaped (I was trying to get the var
>   | name from a Makefile)...
>   |
>   | Is this a bug or is this behavior undefined or even required by
>   | POSIX?
> 
> Not a bug, and (kind of) required, kind of in that a \ somewhere it
> is not required produces undefined results (XBD 9.3.2)
> 
> 	The interpretation of an ordinary character preceded by an
> 	unescaped <backslash> ('\\') is undefined, except for:
> 
> The exceptions have nothing to do with '$'.
> 
> "Ordinary character" is defined in the previous sentence, same section
> 
> 	An ordinary character is a BRE that matches itself: any character
> 	in the supported character set, except for the BRE special
> 	characters listed in Section 9.3.3.
> 
> 9.3.3 does include '$' but:
> 
> 	$ The <dollar-sign> shall be special when used as an anchor.
> 
> So, '$' is an ordinary character, except when it is an anchor.
> 
> Anchors are defined in XBD 9.3.8:
> 
> 	A BRE can be limited to matching expressions that begin or end
> 	a string; this is called ``anchoring''. The <circumflex> and
> 	<dollar-sign> special characters shall be considered BRE
> 	anchors in the following contexts:
> 
> (skip '^' for this message)
> 
> 	2. A <dollar-sign> ('$') shall be an anchor when used as the last
> 	   character of an entire BRE.
> 
> Your (first) '$' was not the last character of the BRE, so is not an anchor,
> and hence is not special for this reason.  The section continues:
> 
> 	   The implementation may treat a <dollar-sign> as an anchor when
> 	   used as the last character of a subexpression.
> 
> That one is optional for the implementation so you could not rely upon
> it working, but here your first '$' is not at the end of a subexpression,
> so it wouldn't qualify anyway.   [This option is actually very ugly, as
> when you want to use a '$' at the end of a subexpression to match itself,
> rather than be an anchor, you must escape it with '\' if the implementation
> would treat it as an anchor, but not escape it if it wouldn't.]
> 
> The rest of the paragraph just explains how matching by a '$' that is
> an anchor works.   Yours isn't, so is just an ordinary character, and
> so matches itself, and would produce undefined results if escaped (the
> undefined result could be for it to simply match itself, making \$ always
> mean to match a literal '$' but a sed (or anything else using BREs) script
> should not rely upon that).
> 
> This behaviour ('^' is special only when it is the very first character
> of the RE, and '$' is special only when it is the absolute last) is traditional
> RE behaviour going back to the very earliest unix RE's (as in "ed").
> 
> For ERE's the rules are slightly different, but for anchors, I think only in
> that they always work in subexpressions, it isn't an implementation option.
> So, even in an ERE your first '$' should not be escaped (and certainly does
> not require to be).
> 

Thank you for the information! (I would not have been able to weave my
way through the standard to find this answer.)

I will have to verify on every system that the unescaped leading dollar
does not cause something nasty (it is used for installation on every
system, where I try to rely only on the minimum POSIX.2 requirements).
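
A minimal sketch of such a per-system check, under stated assumptions:
the helper name extract_var, the sample input and the exact pattern are
made up for illustration and are not the expression from my installation
script. It only probes whether the local sed treats an unescaped leading
'$' in a BRE as a literal dollar sign.

```sh
#!/bin/sh
# Sketch only: extract_var is a hypothetical helper, not from this thread.
# It prints the variable name from a Makefile-style reference like $(CC).
extract_var() {
    # BRE: the leading '$' is not the last character of the pattern, so it
    # is an ordinary character and is left unescaped (XBD 9.3.8).
    printf '%s\n' "$1" | sed -n 's/$(\([A-Za-z_][A-Za-z0-9_]*\)).*/\1/p'
}

# Probe the local sed with a sample input chosen for illustration.
if [ "$(extract_var '$(CC) -O2')" = "CC" ]; then
    echo "sed: unescaped leading '\$' matches a literal dollar sign"
else
    echo "sed: unexpected result for unescaped leading '\$'" >&2
fi
```

The sed script is single-quoted so that the shell does not try to expand
$( itself. With -n and the p flag, nothing is printed when the pattern
does not match, so an empty result signals a sed that behaves differently.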

Best regards,
-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                     http://www.kergis.com/
                       http://www.sbfa.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

