Subject: Re: bug in gawk/gsub() (not present in nawk)
To: Jose Nazario <jose@monkey.org>
From: Geoff Wing <gcw@pobox.com>
List: netbsd-bugs
Date: 06/05/2003 15:21:13
Jose Nazario <jose@monkey.org> typed:
:the following gsub() pattern has a strange effect under gawk which is not
:visible in nawk (at least as compiled on openbsd). the intention is to
:take a string like "This Is a Title: My Title?" and turn it into a
:normalized string: "ThisIsaTitleMyTitle". to do this, i wrote the
:following gross gsub line in an awk script:
:
:	gsub(/[\ \"-\/\\:;\[\]\@\?\.\,\$]/, "", $2)
:	print $2

Yes, it seems to be a bug in that version of gawk.  Newer gawk (3.1.2)
doesn't seem to display it.

% echo "This Is a Title: My Title?" | gawk '{gsub(/[ \"-\/\\:;\[\]@?\.,\$]/, "", $0); print $0}'
ThisIsaTitleMyTitle

:any insights? the inconsistency with this relatively naive pattern seems a
:bit odd. (i wound up installing nawk built from openbsd sources.)

Insights?  Use a negated list (or a negated character class) that names
the characters you want to keep, instead of enumerating every character
you want to strip, e.g.
	gsub(/[^a-zA-Z0-9]/, "", $2)
or
	gsub(/[^[:alnum:]]/, "", $2)
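
As a minimal sketch of the two alternatives above, run against the sample
title from the original report.  It assumes a POSIX shell and some awk on
the PATH; the variable names (title, by_range, by_class) are only for
illustration.  LC_ALL=C is set so the a-z style ranges mean plain ASCII,
and the [:alnum:] form assumes an awk that supports POSIX character
classes, which some older implementations may not.

```shell
# Sample input taken from the original report.
title='This Is a Title: My Title?'

# Negated range list: delete every character that is not an ASCII
# letter or digit.
by_range=$(printf '%s\n' "$title" |
    LC_ALL=C awk '{ gsub(/[^a-zA-Z0-9]/, ""); print }')

# Negated POSIX character class: same idea, without spelling out ranges.
# Assumes the awk in use understands [:alnum:].
by_class=$(printf '%s\n' "$title" |
    LC_ALL=C awk '{ gsub(/[^[:alnum:]]/, ""); print }')

printf 'range list: %s\n' "$by_range"
printf 'char class: %s\n' "$by_class"
```

Either form avoids escaping punctuation inside the brackets, so there is
no accidental range such as the "-/ one in the original pattern.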

Regards,
-- 
Geoff Wing : <gcw@pobox.com>
Rxvt Stuff : <gcw@rxvt.org>
Zsh Stuff  : <gcw@zsh.org>