Subject: bug in gawk/gsub() (not present in nawk)
From: Jose Nazario
Date: 06/05/2003 00:14:19
while playing with some data-massaging tools, i had to migrate from an
openbsd/nawk system to a netbsd/gawk system. i found the following
behavior, which seems to be a bug.

the following gsub() pattern has a strange effect under gawk which is not
visible in nawk (at least as compiled on openbsd). the intention is to
take a string like "This Is a Title: My Title?" and turn it into a
normalized string: "ThisIsaTitleMyTitle". to do this, i wrote the
following gross gsub line in an awk script:

	gsub(/[\ \"-\/\\:;\[\]\@\?\.\,\$]/, "", $2)
	print $2

in gawk, as found in netbsd-macppc/1.5.2, this drops the first letter
of every word. the resulting string is "hissitleyitle", while nawk
as built on openbsd-3.3 gets it right.
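
for comparison, a minimal sketch of the same normalization written as a
negated class, which avoids escapes inside the brackets altogether.
assumptions, none of which come from the script above: the helper name
normalize_title is made up, it works on the whole input line rather than
on $2, and it keeps only ascii letters and digits (so it also strips
characters the original pattern did not list). as far as i can tell, the
\"-\/ in the original pattern reads as a range from " to / rather than as
three separate characters, which the negated class also sidesteps.

```shell
# hypothetical helper: strip everything except ascii letters and digits.
# LC_ALL=C keeps the A-Z / a-z ranges from depending on the locale.
normalize_title() {
    printf '%s\n' "$1" | LC_ALL=C awk '{ gsub(/[^A-Za-z0-9]/, ""); print }'
}

normalize_title 'This Is a Title: My Title?'
```

this is a workaround sketch only and says nothing about why the original
pattern misbehaves under gawk 3.0.3.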

any insights? the inconsistency with this relatively naive pattern seems a
bit odd. (i wound up installing nawk built from openbsd sources.)

thanks. sorry i didn't send a better bug report, netbsd folks, i'm not
much of a netbsd user, and i don't have send-pr set up. yes, this is a
slightly older version of netbsd and gawk:

$ uname -a
NetBSD entropy 1.5.2 NetBSD 1.5.2 (GENERIC) #0: Sun Feb 10 02:00:04 EST 2002     jose@entropy:/usr/src/sys/arch/macppc/compile/GENERIC macppc
$ awk --version
GNU Awk 3.0.3
Copyright (C) 1989, 1991-1997 Free Software Foundation.


jose nazario, ph.d.