tech-userlevel archive


Re: bin/39002: harmful AWK extension: non-portable escaped character



On Sun, Jul 13, 2008 at 03:26:52PM -0400, Greg A. Woods; Planix, Inc. wrote:
> Let us begin again.
>
> [...]

Most of that makes perfect sense. Where you lose me is the
transition from this:

> Therefore as I said the rationale should be obvious.  Within the context 
> and heritage of AWK's primary environment it is natural and expected that 
> AWK strings have the same syntax as C strings and that AWK REs have the 
> same syntax as in grep/sed/ed et al.  This is because expressing REs using 
> the string constant syntax is tedious and contrary to the syntax used in 
> those other RE-rich tools such as ed, sed, grep, etc.  Since it is on 
> occasion useful to manipulate REs within a program and to use the results 
> in RE expressions and for RE operands it is necessary to transform the RE 
> syntax into the syntax of a string.  Programmers making such use of strings 
> as REs should be very aware of this necessity.

...in which you grant that the awk programmer must transform /^.*\.c$/
to "^.*\\.c$"...

to this:

> So for this implementation which defines the undefined behaviours w.r.t. 
> escape sequences in the standard, there are no simple cases where anything 
> dumber than a human can, at the immediate parser step, give a valid warning 
> about possible misuse of an escape sequence in a string constant.

As far as I can tell what you're claiming is that someone might
intentionally write "foo\.c" in a string constant, intending to get
the value "foo.c", instead of writing it in the normal way without the
backslash.

If that's what you mean, then I'd have to say that it's a load of
dingo's kidneys. Nobody is going to do that, and if once in a while
they do, nonetheless 99.999% of the time that \. will be a mistake.
This is what warnings are about. (And why it's a warning, not an
error.)
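
As a minimal sketch of the transformation discussed above (the helper
names matches_c and show_escape are hypothetical, not from the thread):
the first helper uses the string form "^.*\\.c$" as a dynamic RE, which
should behave like the RE constant /^.*\.c$/ in any POSIX awk. The
second prints the string constant "foo\.c"; POSIX leaves that escape
undefined, so the output, and whether a warning appears, depends on the
implementation, and no expected output is claimed for it here.

```shell
# Hypothetical helper: does the argument end in ".c"?
# The string "^.*\\.c$" reaches the RE engine as ^.*\.c$ because awk's
# string lexer collapses "\\" into a single backslash.
matches_c() {
    printf '%s\n' "$1" |
        awk '{ if ($0 ~ "^.*\\.c$") print "match"; else print "no match" }'
}

# Hypothetical helper: print the value of the string constant "foo\.c".
# The "\." escape is undefined by POSIX, so the result varies by awk.
show_escape() {
    awk 'BEGIN { print "foo\.c" }'
}
```

Running `matches_c foo.c` and `matches_c fooxc` shows that the doubled
backslash really does produce a literal-dot match; running `show_escape`
under different awks shows the disagreement a warning would flag.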

> Finally note that should an AWK implementation choose to try to go beyond 
> use of POSIX Extended Regular Expressions for its REs then there will be 
> further clashes with the use of C escape sequences in the current RE 
> syntax.  I.e. for example with full Perl-compatible REs, eg. as implemented 
> by the PCRE package, the '\b' escape sequence changes from matching a 
> backspace character to matching a word boundary.

This is irrelevant; awk is not perl.

>> What I don't understand is why you think it's desirable to assume the
>> opposite meaning, which is clearly not what anyone intends or wants.
>
> The intention of the programmer cannot be determined by a simple parser.  
> Analysis in depth of how each string constant is eventually used would be 
> necessary to intuit the programmer's intent.  Without doing such analysis 
> it is impossible to automatically provide helpful warnings to the 
> programmer.

Baloney.

> The only enhancement I can envision that could "fix" this issue for some 
> people would be to allow string variables to be assigned from RE constants, 
> e.g.:
>
>       str = /^.*\.txt$/;
>
> and thus the conversion of RE escape sequences to string escape sequences 
> could be done internally and silently.

No, because that expression already has a meaning in awk. (Try it.)
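
For anyone who wants to try it, here is a small sketch (the helper name
assign_re is hypothetical). In awk a bare RE constant used as an
expression is shorthand for a match against $0, so the assignment stores
the result of the match, 1 or 0, rather than the text of the RE.

```shell
# Hypothetical helper: assign an RE constant to a variable and print it.
# In awk, /re/ in expression context means ($0 ~ /re/), so str receives
# 1 if the input line matches and 0 otherwise.
assign_re() {
    printf '%s\n' "$1" | awk '{ str = /^.*\.txt$/; print str }'
}
```

With this, `assign_re notes.txt` prints 1 and `assign_re notes.c`
prints 0; in neither case does str hold the pattern itself.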

-- 
David A. Holland
dholland%netbsd.org@localhost

