Re: bin/39002: harmful AWK extension: non-portable escaped character
On Sun, Jul 13, 2008 at 03:26:52PM -0400, Greg A. Woods; Planix, Inc. wrote:
> Let us begin again.
That makes perfect sense, most of it. Where I lose you is the
transition from this:
> Therefore as I said the rationale should be obvious. Within the context
> and heritage of AWK's primary environment it is natural and expected that
> AWK strings have the same syntax as C strings and that AWK REs have the
> same syntax as in grep/sed/ed et al. This is because expressing REs using
> the string constant syntax is tedious and contrary to the syntax used in
> those other RE-rich tools such as ed, sed, grep, etc. Since it is on
> occasion useful to manipulate REs within a program and to use the results
> in RE expressions and for RE operands it is necessary to transform the RE
> syntax into the syntax of a string. Programmers making such use of strings
> as REs should be very aware of this necessity.
...in which you grant that the awk programmer must transform /^.*\.c$/
> So for this implementation which defines the undefined behaviours w.r.t.
> escape sequences in the standard, there are no simple cases where anything
> dumber than a human can, at the immediate parser step, give a valid warning
> about possible misuse of an escape sequence in a string constant.
As far as I can tell what you're claiming is that someone might
intentionally write "foo\.c" in a string constant, intending to get
the value "foo.c", instead of writing it in the normal way without the
backslash.
If that's what you mean, then I'd have to say that it's a load of
dingo's kidneys. Nobody is going to do that, and if once in a while
they do, nonetheless 99.999% of the time that \. will be a mistake.
This is what warnings are about. (And why it's a warning, not an
error.)
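To make the distinction concrete, here is a minimal sketch, assuming a POSIX shell and any POSIX awk. The helper name `match_escaped_dot` and the sample inputs are invented for illustration. It uses the doubled-backslash form, which is the portable way to put an escaped dot into a string that will later be used as an RE; the single-backslash form "foo\.c" is left out on purpose, since its result depends on the implementation.

```sh
# Hypothetical helper: reads lines on stdin and reports whether each one
# matches a dynamic RE built from a string constant.
match_escaped_dot() {
    awk '{
        # The string constant "foo\\.c" has the value foo\.c, so when it
        # is used as an RE the dot is a literal dot, as in /foo\.c/.
        re = "foo\\.c"
        result = ($0 ~ re) ? "match" : "no match"
        print $0 ": " result
    }'
}

printf 'foo.c\nfooxc\n' | match_escaped_dot
# prints:
#   foo.c: match
#   fooxc: no match
```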
> Finally note that should an AWK implementation choose to try to go beyond
> use of POSIX Extended Regular Expressions for its REs then there will be
> further clashes with the use of C escape sequences in the current RE
> syntax. I.e. for example with full Perl-compatible REs, eg. as implemented
> by the PCRE package, the '\b' escape sequence changes from matching a
> backspace character to matching a word boundary.
This is irrelevant; awk is not perl.
>> What I don't understand is why you think it's desirable to assume the
>> opposite meaning, which is clearly not what anyone intends or wants.
> The intention of the programmer cannot be determined by a simple parser.
> Analysis in depth of how each string constant is eventually used would be
> necessary to intuit the programmer's intent. Without doing such analysis
> it is impossible to automatically provide helpful warnings to the
> programmer.
> The only enhancement I can envision that could "fix" this issue for some
> people would be to allow string variables to be assigned from RE constants,
> str = /^.*\.txt$/;
> and thus the conversion of RE escape sequences to string escape sequences
> could be done internally and silently.
No, because that expression already has a meaning in awk. (Try it.)
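A minimal sketch of that experiment, assuming a POSIX shell and any POSIX awk; the helper name `re_as_expr` and the sample file names are made up for illustration. An RE constant used as an ordinary expression is evaluated as `$0 ~ /re/`, so the assignment stores 1 or 0 rather than the text of the RE.

```sh
# Hypothetical helper: assigns an RE constant to a variable for every
# input line and prints what the variable actually holds.
re_as_expr() {
    awk '{
        # Equivalent to: str = ($0 ~ /^.*\.txt$/)
        str = /^.*\.txt$/
        print $0, str
    }'
}

printf 'notes.txt\nnotes.c\n' | re_as_expr
# prints:
#   notes.txt 1
#   notes.c 0
```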
David A. Holland