tech-userlevel archive


Re: bin/39002: harmful AWK extension: non-portable escaped character

On Thu, Jun 26, 2008 at 10:52:57AM -0400, Greg A. Woods; Planix, Inc. wrote:
>>> It's not clearly defined there at all.
>> ...which is why it ought to generate a warning.
> No, I don't think so.
> As others have shown the standard (POSIX) does not define the behaviour of 
> backslashes in a string constant in AWK.
> However as I've shown the history of AWK in the context of UNIX not only 
> clearly defines the purpose and meaning of backslashes in a string constant 
> (and separately in regular expressions), but a rationale is also plainly 
> evident for the way these things have always worked the way they do in all 
> but one(?) "rogue"(*) implementation.

I'm not clear on what rationale you're thinking of. If someone writes
the string constant "^.*\.txt$", it's evident upon inspection by a
human that they intended the \. to escape the regexp metacharacter,
that is, they meant to write "^.*\\.txt$".

This is doubtless why mawk does what it reportedly does, but as you
note it's not what all the other implementations do.
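The two levels of escaping involved here can be modeled outside AWK. The Python sketch below is an illustration only. The function name `awk_string_value` and its `keep_unknown` switch are hypothetical. The two settings stand for the two behaviors at issue in this thread: keeping the backslash before an unrecognized character, or dropping it. The sketch makes no claim about which implementation does which. Only a small, assumed subset of C-style escapes is handled, and Python's `re` stands in for AWK's ERE matcher, which agrees with it for this simple pattern.

```python
import re

# Assumed subset of escapes with a defined meaning in an AWK string constant.
KNOWN_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}

def awk_string_value(body: str, keep_unknown: bool) -> str:
    """Model the lexer turning the body of a string constant into its value.

    keep_unknown=True keeps the backslash before an unrecognized character
    (so \\. stays \\.); keep_unknown=False drops it (so \\. becomes .).
    """
    out = []
    i = 0
    while i < len(body):
        c = body[i]
        if c == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in KNOWN_ESCAPES:
                out.append(KNOWN_ESCAPES[nxt])
            elif keep_unknown:
                out.append("\\" + nxt)
            else:
                out.append(nxt)
            i += 2
        else:
            # Ordinary character, or a lone trailing backslash kept as-is.
            out.append(c)
            i += 1
    return "".join(out)

# The constant as written in the script: "^.*\.txt$"
as_written = r"^.*\.txt$"
# The portable spelling: "^.*\\.txt$"
portable = r"^.*\\.txt$"

for keep in (True, False):
    # keep=True: the regexp still contains \. so "footxt" does not match.
    # keep=False: the regexp becomes ^.*.txt$ and "footxt" matches.
    print(keep, bool(re.search(awk_string_value(as_written, keep), "footxt")))
    # The portable spelling yields ^.*\.txt$ under either policy.
    print(keep, awk_string_value(portable, keep))
```

Under either policy, doubling the backslash gives the regexp the author meant; only the undoubled form depends on the implementation.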

What I don't understand is why you think it's desirable to assume the
opposite meaning, which is clearly not what anyone intends or wants.

> Adding a warning, especially in the way that was proposed (IIUC), will 
> potentially make many valid scripts, including existing scripts, spew 
> unnecessary warnings.

"Valid" in the sense that demons flying out of your nose is valid. The
behavior is undefined. Warning about undefined behavior is a good idea.

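Such a warning need not fire on properly escaped patterns like "^.*\\.txt$". A checker only has to flag a backslash followed by a character that has no defined meaning in a string constant. The Python sketch below illustrates this. The function names, the warning text, and the exact set of defined escapes (C-style escapes, \" and \/, and octal \ddd) are assumptions made for illustration, not any implementation's actual lint pass.

```python
import sys

# Assumed set of characters with a defined meaning after a backslash in an
# AWK string constant: \" \/ \\ \a \b \f \n \r \t \v and octal \ddd.
DEFINED_AFTER_BACKSLASH = set('"/\\abfnrtv01234567')

def undefined_escapes(body: str) -> list[tuple[int, str]]:
    """Return (offset, char) for each backslash followed by a character
    outside DEFINED_AFTER_BACKSLASH in the body of a string constant."""
    found = []
    i = 0
    while i < len(body) - 1:
        if body[i] == "\\":
            if body[i + 1] not in DEFINED_AFTER_BACKSLASH:
                found.append((i, body[i + 1]))
            i += 2  # consume the escaped character, so \\ counts as one escape
        else:
            i += 1
    return found

def lint_string_constant(body: str) -> None:
    # Hypothetical warning format, written to stderr.
    for offset, ch in undefined_escapes(body):
        print(f'warning: "\\{ch}" at offset {offset} has undefined meaning',
              file=sys.stderr)

lint_string_constant(r"^.*\.txt$")   # warns about the undefined \.
lint_string_constant(r"^.*\\.txt$")  # properly escaped: no warning
```

Because an escaped backslash is consumed as a unit, the doubled form never reaches the check, so correctly written scripts stay quiet.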
> Perhaps if the warning were made significantly more intelligent then its 
> warnings might prove to be useful, but only in that case.  I would strongly 
> suggest that warnings MUST NOT be given for properly escaped regular 
> expressions which are expressed as string constants.

This paragraph does not make any sense.

> (*)
> [ranting about gawk deleted]
> Perhaps what would be more fruitful would be 
> for someone to propose and implement a fix for gawk to prevent it from 
> recognizing C special character escapes in regular expressions and for it 
> to treat string constants as pure C-like strings and thus hopefully 
> eventually eliminate this glaring difference between gawk and most/all 
> other AWK implementations.

This does not make any sense either.

David A. Holland
