tech-userlevel archive


Re: sed(1) / BRE bug?



    Date:        Sun, 11 Oct 2020 11:45:12 +0200
    From:        tlaronde%polynum.com@localhost
    Message-ID:  <20201011094512.GA356%polynum.com@localhost>


  | The problem? the leading '$' is not escaped (I was trying to get the var
  | name from a Makefile)...
  |
  | Is this a bug or is this behavior undefined or even required by
  | POSIX?

Not a bug, and (kind of) required: "kind of" in that a '\' somewhere it
is not required produces undefined results (XBD 9.3.2):

	The interpretation of an ordinary character preceded by an
	unescaped <backslash> ('\\') is undefined, except for:

The exceptions have nothing to do with '$'.

"Ordinary character" is defined in the previous sentence of the same section:

	An ordinary character is a BRE that matches itself: any character
	in the supported character set, except for the BRE special
	characters listed in Section 9.3.3.

9.3.3 does include '$' but:

	$ The <dollar-sign> shall be special when used as an anchor.

So, '$' is an ordinary character, except when it is an anchor.

Anchors are defined in XBD 9.3.8:

	A BRE can be limited to matching expressions that begin or end
	a string; this is called ``anchoring''. The <circumflex> and
	<dollar-sign> special characters shall be considered BRE
	anchors in the following contexts:

(skip '^' for this message)

	2. A <dollar-sign> ('$') shall be an anchor when used as the last
	   character of an entire BRE.

Your (first) '$' was not the last character of the BRE, so is not an anchor,
and hence is not special for this reason.  The section continues:

	   The implementation may treat a <dollar-sign> as an anchor when
	   used as the last character of a subexpression.

That one is optional for the implementation so you could not rely upon
it working, but here your first '$' is not at the end of a subexpression,
so it wouldn't qualify anyway.   [This option is actually very ugly, as
when you want to use a '$' at the end of a subexpression to match itself,
rather than be an anchor, you must escape it with '\' if the implementation
would treat it as an anchor, but not escape it if it wouldn't.]
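
One way around that ugliness, as a sketch rather than anything the
standard spells out as a recipe: inside a bracket expression '$' is never
an anchor, so '[$]' should match a literal dollar sign either way. The
sample input and the variable name "out" are made up for illustration.

```shell
# '[$]' is a bracket expression containing only '$'; it matches a literal
# dollar sign whether or not the implementation anchors a '$' that ends a
# subexpression, and it avoids the undefined '\$'.
out=$(printf '%s\n' 'price$tag' | sed 's/\(price[$]\)tag/\1 label/')
echo "$out"
```

This trades a little readability for not having to know which choice the
implementation made.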

The rest of the paragraph just explains how matching by a '$' that is
an anchor works.   Yours isn't, so it is just an ordinary character: it
matches itself, and would produce undefined results if escaped.  (The
undefined result could be for it to simply match itself, making \$ always
match a literal '$', but a sed script, or anything else using BREs,
should not rely upon that.)
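
A small sketch of the difference, using a made-up Makefile line (the
variable names "line", "script", "var" and "tail" are only for
illustration, not from the original report):

```shell
# A '$' that is not the last character of the BRE is ordinary: it matches
# a literal dollar sign, and no backslash is needed (or wanted).
line='CC = $(CROSS)gcc'
script='s/.*$(\([A-Z_]*\)).*/\1/p'
var=$(printf '%s\n' "$line" | sed -n "$script")
echo "$var"

# A '$' that is the last character of the entire BRE is an anchor: here it
# matches the empty string at the end of the line.
tail=$(printf '%s\n' 'foo' | sed 's/$/!/')
echo "$tail"
```

With a conforming sed the first command should print CROSS and the second
foo!.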

This behaviour ('^' is special only when it is the very first character
of the RE, and '$' is special only when it is the absolute last) is traditional
RE behaviour going back to the very earliest unix RE's (as in "ed").
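
The '^' half of that can be checked the same way; again only a sketch,
with made-up input and variable names ("head", "mid"):

```shell
# '^' as the first character of the BRE anchors to the start of the line.
head=$(printf '%s\n' 'a^b' | sed 's/^/>/')
# '^' anywhere else in the BRE is an ordinary character and matches itself.
mid=$(printf '%s\n' 'a^b' | sed 's/a^b/X/')
echo "$head"
echo "$mid"
```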

For EREs the rules are slightly different but, for anchors, I think only in
that they always work in subexpressions; it isn't an implementation option
there.  So, even in an ERE your first '$' should not be escaped (and certainly
does not need to be).

kre


