current-users: Re: AWK vs. gawk.

Subject: Re: AWK vs. gawk.
To: None <netbsd-help@netbsd.org, current-users@netbsd.org>
From: Richard Rauch <rkr@olib.org>
List: current-users
Date: 05/12/2004 05:18:55

On Tue, May 11, 2004 at 05:30:50AM -0500, Richard Rauch wrote:
> Right around the time that NetBSD -current switched from using gawk to
> using nawk (I think) as the system AWK, I had written a smallish
> AWK script to parse Doxygen documentation and spit out man pages.
> (Doxygen can generate *roff, but it's not really usable for man
 [...]
> I am particularly bothered by a missing feature from the NetBSD
> AWK, in the regular expressions: I need to anchor some of my
> matches to beginnings or ends of strings.  For example, one of
> my gensub calls is:
> 
>   ret = gensub ("^[ ]*\\([ ]*", "", "g", ret );

Okay, the above is perfectly correct with gawk and with NetBSD's
awk, it seems.

The problem is that those weren't the lines giving me problems.
I didn't test carefully with a ~2.0 system.  Or rather, the
problem lines weren't *exactly* like the above.

The problem appears when the third argument is not "g", but is
rather 1 or "1".  (Or any other integer.)

With gawk, for any integer n:

  ret = gensub (<r.e.>, <replace>, n, src);

...you get the nth occurrence replaced, where n=1 is the
leftmost occurrence.

With NetBSD's AWK, you get the (n+1)th, so n=0 is the leftmost
occurrence.

Concretely:

  echo "helloello" | awk '{print gensub ("ello", "i", 1, $0);}'

...prints "hiello" on gawk, and "helloi" on NetBSD's pre-2.0
-current (and presumably 2.0) awk.
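
The one-liner above can be wrapped into a small probe that reports
which numbering the installed awk uses.  This is only a sketch:
classify_gensub is a made-up helper name, and the mapping it applies
is just the two results reported in this message.  An awk with no
gensub() at all ends up in the "unknown" case.

```shell
#!/bin/sh
# classify_gensub is a hypothetical helper, not part of any awk.
# It maps the output of the "helloello" one-liner to the numbering
# that output implies.
classify_gensub() {
    case "$1" in
        hiello) echo "1-based (gawk behaviour)" ;;
        helloi) echo "0-based (behaviour reported here for NetBSD awk)" ;;
        *)      echo "unknown (no gensub, or some other result)" ;;
    esac
}

# Probe whatever "awk" comes first in PATH.  Errors from an awk that
# lacks gensub() are discarded, leaving an empty result.
result=$(echo "helloello" | awk '{print gensub("ello", "i", 1, $0)}' 2>/dev/null) || true
classify_gensub "$result"
```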

Since I understand gensub() to be a GNUism, and cannot find any
outstanding bug report on this, I'm going to file a bug report
after I send this message.  I'm CC'ing to current-users since
I assume that it will be of particular interest there.  (Although
I'm running -current, it's a pre-2.0 -current, which is why I
originally posted to this list.)
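
Until the two implementations agree, a script can avoid gensub()'s
numeric argument entirely.  The following is a minimal sketch, not a
drop-in replacement: replace_nth is a hypothetical name, it uses only
match(), substr(), RSTART and RLENGTH, and it treats the replacement
as literal text (no "&" or "\\1" handling).  Because each search
restarts on the remainder of the string, a regular expression anchored
with ^ will not behave as it does in gensub().  Values passed with -v
also have backslash escapes interpreted by awk.

```shell
#!/bin/sh
# replace_nth REGEX REPLACEMENT N
# Reads lines on stdin and replaces the Nth (1-based) match of REGEX
# on each line with the literal REPLACEMENT.  Hypothetical helper.
replace_nth() {
    awk -v re="$1" -v rep="$2" -v n="$3" '
    function replace_nth(s, re, rep, n,    out, i) {
        out = ""
        # Walk the matches left to right; stop on an empty match so
        # the loop cannot spin forever.
        for (i = 1; match(s, re) && RLENGTH > 0; i++) {
            if (i == n)
                return out substr(s, 1, RSTART - 1) rep substr(s, RSTART + RLENGTH)
            # Not the one we want: keep it and search the remainder.
            out = out substr(s, 1, RSTART + RLENGTH - 1)
            s = substr(s, RSTART + RLENGTH)
        }
        # Fewer than n matches: return the line unchanged.
        return out s
    }
    { print replace_nth($0, re, rep, n) }'
}

echo "helloello" | replace_nth "ello" "i" 1   # prints "hiello"
echo "helloello" | replace_nth "ello" "i" 2   # prints "helloi"
```

Here n=1 always means the leftmost match, whichever awk runs it.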

-- 
  "I probably don't know what I'm talking about."  http://www.olib.org/~rkr/