Re: bin/39002: harmful AWK extension: non-portable escaped character
On 26-Jun-08, at 9:39 AM, David Holland wrote:
> On Tue, Jun 24, 2008 at 11:31:40AM -0700, James Chacon wrote:
>> i.e. there's nothing stating that
>> \x -> \x
>> \x -> x
>> inside a string. It's not clearly defined there at all.
> ...which is why it ought to generate a warning.
No, I don't think so.
As others have shown, the standard (POSIX) does not define the
behaviour of backslashes in a string constant in AWK.
However, as I've shown, the history of AWK in the context of UNIX not
only clearly defines the purpose and meaning of backslashes in a
string constant (and separately in regular expressions), but also
makes plainly evident the rationale for why these things have always
worked as they do in all but one(?) "rogue"(*) implementation.
Adding a warning, especially in the way that was proposed (IIUC), will
potentially make many valid scripts, including existing scripts, spew
warnings.
Perhaps if the warning were made significantly more intelligent then
its warnings might prove to be useful, but only in that case. I would
strongly suggest that warnings MUST NOT be given for properly escaped
regular expressions which are expressed as string constants.
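To illustrate the distinction, here is a minimal sketch, assuming a
POSIX shell and some awk on the PATH. The helper name matches_dot is
made up for this example. It uses a properly escaped regular expression
written as a string constant, which is the kind of script a warning
should leave alone:

```sh
#!/bin/sh
# "\\." in an AWK string constant is the two characters \. and, used as
# a dynamic regular expression, that matches one literal dot.
matches_dot() {
    printf '%s\n' "$1" | awk '$0 ~ "\\." { print "match" }'
}

matches_dot 'a.b'    # input contains a literal dot
matches_dot 'ab'     # input contains no dot

# The disputed case is a single backslash before a character that is not
# a recognised escape. POSIX leaves it undefined, so the result may be
# \. or a bare dot (possibly with a warning), depending on the awk:
#   awk 'BEGIN { s = "a\.b"; print s }'
```

The first call should print "match" and the second should print
nothing. The commented-out command at the end is left for the reader to
try against whichever implementations are at hand, since its output is
exactly what is in dispute.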
(*) If you examine the history of the implementation of GNU AWK I'm
sure you'll still be able to find ample evidence that at least until
POSIX came along the gawk implementers were very much intending to go
their own way with many aspects of the language they were
implementing. My memory of events is that gawk implementers were only
mildly concerned with forward portability of existing AWK scripts and
that they were not at all concerned about portability of scripts
initially written for gawk. The gawk language was clearly intended to
be an extension of, and progression away from, both the original V7
AWK and "new AWK". I.e. the fact that the standard caters to
some degree to the variance of gawk should not get in the way of, or
influence the behaviour of, a true AWK which has firm roots in the
UNIX traditions; especially since the standard gives full freedom for
any AWK to maintain its own heritage. Perhaps what would be more
fruitful would be for someone to propose and implement a fix for gawk
to prevent it from recognizing C special character escapes in regular
expressions and for it to treat string constants as pure C-like
strings and thus hopefully eventually eliminate this glaring
difference between gawk and most/all other AWK implementations.
Greg A. Woods; Planix, Inc.