tech-userlevel archive


Re: bin/39002: harmful AWK extension: non-portable escaped character




On 26-Jun-08, at 9:39 AM, David Holland wrote:

> On Tue, Jun 24, 2008 at 11:31:40AM -0700, James Chacon wrote:
>> i.e. there's nothing stating that
>>
>> \x -> \x
>>
>> or
>>
>> \x -> x
>>
>> inside a string. It's not clearly defined there at all.
>
> ...which is why it ought to generate a warning.

No, I don't think so.

As others have shown, the standard (POSIX) does not define the behaviour of backslashes in a string constant in AWK.
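To make the two readings quoted above concrete, here is a minimal sketch in Python (not AWK, since the real output depends on which AWK you run). The `unescape` function, its `keep_unknown_backslash` flag, and the small `KNOWN_ESCAPES` table are all hypothetical names invented for illustration; they are not taken from any AWK implementation or from the standard.

```python
# Hypothetical model of how a lexer might process an AWK string constant.
# KNOWN_ESCAPES is a deliberately small, assumed table, not a full one.
KNOWN_ESCAPES = {'"': '"', '\\': '\\', '/': '/', 'n': '\n', 't': '\t'}


def unescape(source, keep_unknown_backslash):
    """Process backslash escapes in the text of a string constant.

    For an escape that is not in KNOWN_ESCAPES, either keep the
    backslash (\\x -> \\x) or drop it (\\x -> x), depending on the flag.
    """
    out = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == '\\' and i + 1 < len(source):
            nxt = source[i + 1]
            if nxt in KNOWN_ESCAPES:
                out.append(KNOWN_ESCAPES[nxt])
            elif keep_unknown_backslash:
                out.append('\\' + nxt)   # reading 1: \x -> \x
            else:
                out.append(nxt)          # reading 2: \x -> x
            i += 2
        else:
            out.append(ch)
            i += 1
    return ''.join(out)


# The same source text under the two readings.
kept = unescape(r'a\.b', keep_unknown_backslash=True)
dropped = unescape(r'a\.b', keep_unknown_backslash=False)
```

If the result is later used as a dynamic regular expression, `kept` still matches only a literal dot between `a` and `b`, while `dropped` matches any character there. That is why the choice matters for scripts that write regular expressions as string constants.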

However, as I've shown, the history of AWK in the context of UNIX not only clearly defines the purpose and meaning of backslashes in a string constant (and, separately, in regular expressions); it also makes the rationale plainly evident for why these things have always worked the way they do in all but one(?) "rogue"(*) implementation.

Adding a warning, especially in the way that was proposed (IIUC), would potentially make many valid scripts, including existing scripts, spew unnecessary warnings.

Perhaps if the warning were made significantly more intelligent it might prove to be useful, but only in that case. I would strongly suggest that warnings MUST NOT be given for properly escaped regular expressions which are expressed as string constants.
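As a rough illustration of what a "more intelligent" warning could look like, here is a sketch in Python. Everything in it is an assumption made for the example: the function name `unknown_escapes_to_warn`, the `KNOWN_ESCAPES` set (roughly the escapes POSIX lists for AWK strings), and the `REGEX_METACHARS` set. It only shows the idea of staying silent when a backslash protects a regular expression metacharacter, and is not a proposal for any particular implementation.

```python
# Assumed sets for illustration only.
# Characters that may follow a backslash in a recognized escape
# (octal escapes are approximated by their leading digit).
KNOWN_ESCAPES = set('"\\/abfnrtv01234567')
# Characters whose escaping is meaningful once the string is used as a regex.
REGEX_METACHARS = set('.[]()*+?{}|^$')


def unknown_escapes_to_warn(source):
    """Return the escapes in a string constant that deserve a warning.

    An escape is reported only if it is neither a recognized escape nor
    a backslash-protected regular expression metacharacter.
    """
    warnings = []
    i = 0
    while i < len(source) - 1:
        if source[i] == '\\':
            nxt = source[i + 1]
            if nxt not in KNOWN_ESCAPES and nxt not in REGEX_METACHARS:
                warnings.append('\\' + nxt)
            i += 2  # skip the backslash and the character it escapes
        else:
            i += 1
    return warnings


# A properly escaped regex written as a string constant: no warning.
quiet = unknown_escapes_to_warn(r'^[0-9]+\.[0-9]+$')
# An escape with no plausible meaning: reported.
noisy = unknown_escapes_to_warn(r'foo\qbar')
```

Under these assumptions an existing script that writes `"\."` to match a literal dot stays silent, while a stray `"\q"` is still flagged.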

(*) If you examine the history of the implementation of GNU AWK, I'm sure you'll still be able to find ample evidence that, at least until POSIX came along, the gawk implementers very much intended to go their own way with many aspects of the language they were implementing. My memory of events is that the gawk implementers were only mildly concerned with forward portability of existing AWK scripts, and that they were not at all concerned about the portability of scripts initially written for gawk. The gawk language was clearly intended to be an extension of, and a progression away from, both the original V7 AWK and "new AWK".

I.e. the fact that the standard caters to some degree to the variance of gawk should not get in the way of, or influence the behaviour of, a true AWK which has firm roots in the UNIX traditions, especially since the standard gives full freedom for any AWK to maintain its own heritage.

Perhaps it would be more fruitful for someone to propose and implement a fix for gawk that prevents it from recognizing C special character escapes in regular expressions and makes it treat string constants as pure C-like strings, and thus hopefully eventually eliminate this glaring difference between gawk and most (if not all) other AWK implementations.

--
                                        Greg A. Woods; Planix, Inc.
                                        <woods%planix.ca@localhost>


