tech-userlevel archive

Re: bin/39002: harmful AWK extension: non-portable escaped character

On Sun, Jul 13, 2008 at 03:26:52PM -0400, Greg A. Woods; Planix, Inc. wrote:
> Let us begin again.
> [...]

That makes perfect sense, most of it. Where I lose you is the
transition from this:

> Therefore as I said the rationale should be obvious.  Within the context 
> and heritage of AWK's primary environment it is natural and expected that 
> AWK strings have the same syntax as C strings and that AWK REs have the 
> same syntax as in grep/sed/ed et al.  This is because expressing REs using 
> the string constant syntax is tedious and contrary to the syntax used in 
> those other RE-rich tools such as ed, sed, grep, etc.  Since it is on 
> occasion useful to manipulate REs within a program and to use the results 
> in RE expressions and for RE operands it is necessary to transform the RE 
> syntax into the syntax of a string.  Programmers making such use of strings 
> as REs should be very aware of this necessity.

in which you grant that the awk programmer must transform /^.*\.c$/
to "^.*\\.c$"...

to this:

> So for this implementation which defines the undefined behaviours w.r.t. 
> escape sequences in the standard, there are no simple cases where anything 
> dumber than a human can, at the immediate parser step, give a valid warning 
> about possible misuse of an escape sequence in a string constant.

As far as I can tell what you're claiming is that someone might
intentionally write "foo\.c" in a string constant, intending to get
the value "foo.c", instead of writing it in the normal way without the
backslash.

If that's what you mean, then I'd have to say that it's a load of
dingo's kidneys. Nobody is going to do that, and if once in a while
they do, nonetheless 99.999% of the time that \. will be a mistake.
This is what warnings are about. (And why it's a warning, not an
error.)
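
For concreteness, here is a minimal sketch of the transformation under
discussion. The input names are made up for illustration. The "foo\.c"
case is deliberately left out, since what it yields is exactly the
behaviour the standard leaves undefined and it may differ between awk
implementations.

```shell
# Hypothetical input names, chosen only for illustration.
names='foo.c
fooxc
bar.h'

# RE constant: \. matches a literal dot.
re_const=$(printf '%s\n' "$names" | awk '$0 ~ /^.*\.c$/')

# The same RE held in a string: the backslash must itself be escaped,
# because the string parser consumes one level of escaping before the
# value is ever used as an RE.
re_string=$(printf '%s\n' "$names" | awk 'BEGIN { re = "^.*\\.c$" } $0 ~ re')

echo "$re_const"
echo "$re_string"
```

Both forms should select only foo.c. The single backslash in a string
constant is the case a warning would be aimed at.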

> Finally note that should an AWK implementation choose to try to go beyond 
> use of POSIX Extended Regular Expressions for its REs then there will be 
> further clashes with the use of C escape sequences in the current RE 
> syntax.  I.e. for example with full Perl-compatible REs, eg. as implemented 
> by the PCRE package, the '\b' escape sequence changes from matching a 
> backspace character to matching a word boundary.

This is irrelevant; awk is not perl.

>> What I don't understand is why you think it's desirable to assume the
>> opposite meaning, which is clearly not what anyone intends or wants.
> The intention of the programmer cannot be determined by a simple parser.  
> Analysis in depth of how each string constant is eventually used would be 
> necessary to intuit the programmer's intent.  Without doing such analysis 
> it is impossible to automatically provide helpful warnings to the 
> programmer.


> The only enhancement I can envision that could "fix" this issue for some 
> people would be to allow string variables to be assigned from RE constants, 
> e.g.:
>       str = /^.*\.txt$/;
> and thus the conversion of RE escape sequences to string escape sequences 
> could be done internally and silently.

No, because that expression already has a meaning in awk. (Try it.)
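
A quick way to try it, as a sketch with made-up input lines: outside
the right-hand side of ~ or !~, an RE constant used as an expression
is a match against $0, so the assignment stores the result of the
match rather than the RE text.

```shell
# Two hypothetical input records; str receives the match result for each.
out=$(printf '%s\n' 'notes.txt' 'notes.c' |
    awk '{ str = /^.*\.txt$/; print str }')

echo "$out"
```

This should print 1 for the first record and 0 for the second, not the
pattern itself.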

David A. Holland
