NetBSD-Bugs archive
Re: bin/42463: Bizarre behavior in awk with invalid numeric constants
The following reply was made to PR bin/42463; it has been noted by GNATS.
From: "Greg A. Woods" <woods%planix.ca@localhost>
To: NetBSD GNATS <gnats-bugs%NetBSD.org@localhost>
Cc:
Subject: Re: bin/42463: Bizarre behavior in awk with invalid numeric constants
Date: Sat, 19 Dec 2009 00:01:11 -0500
At Wed, 16 Dec 2009 21:10:00 +0000 (UTC), dholland%eecs.harvard.edu@localhost
wrote:
Subject: bin/42463: Bizarre behavior in awk with invalid numeric constants
>
> This is not so surprising, although one would expect it to generate a
> syntax error (recall that awk doesn't handle hex integer constants...)
Well, a syntax error would not really be correct so far as I can tell,
maybe not even for constants in the program text.
As you know, an awk scalar variable has both a string and a number value
at the same time, and expressions take on string or numeric values as
appropriate.
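A minimal sketch of that dual nature, assuming any POSIX-style awk is on
the PATH (the function name dual_value is purely illustrative and does
not come from the PR):

```shell
# Hypothetical helper: shows one variable used as a string and as a number.
dual_value() {
    # v is assigned a string constant. "%s" with v prints the string as-is,
    # while v + 0 forces a numeric conversion (3), which prints as an integer.
    awk 'BEGIN { v = "3.0"; printf("%s %s\n", v, v + 0) }'
}

dual_value
```

This prints "3.0 3": the same variable yields its string form untouched
and its numeric form once arithmetic is applied.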
As far as I can find, very little is said in the awk book or the awk(1)
manual about numeric constants. Both do say, though, that string
constants are quoted with double quote characters (and can contain
C-like character escapes), and that regular expression constants are
quoted with slash characters.
The awk book does say (Appendix A, p. 192): "The numeric value of an
arbitrary string is the numeric value of its numeric prefix."
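The numeric-prefix rule quoted above can be tried directly. This is only
a sketch, assuming a POSIX-style awk on the PATH; the helper name
numeric_prefix is hypothetical. Hex-looking strings are left out on
purpose, since implementations differ on them.

```shell
# Hypothetical helper: print the numeric value awk assigns to a string.
numeric_prefix() {
    # -v passes the argument in as s; adding 0 forces a numeric conversion.
    awk -v s="$1" 'BEGIN { print s + 0 }'
}

numeric_prefix 12abc    # leading "12" is the numeric prefix
numeric_prefix abc      # no numeric prefix at all
```

The first call prints 12 and the second prints 0.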
The mawk(1) manual says this about numeric constants:
Numeric constants can be integer like -2, decimal like 1.08,
or in scientific notation like -1.1e4 or .28E-3.
So, in all your examples the numeric value of the unnamed constant you
give as "0xblegh" should probably be zero, IIUC, at least for awk and
mawk.
However, as you've shown, things don't seem to actually work the way
_I_ would expect when it comes to expressions containing un-quoted
non-numeric constants with numeric prefixes.
Interestingly, to me at least, awk and mawk behave in exactly the same
bizarre way:
$ awk 'BEGIN{v = 0xblegh - 5; printf("%s\n", v) }'
0-5
$ mawk 'BEGIN{v = 0xblegh - 5; printf("%s\n", v) }'
0-5
$ gawk 'BEGIN{v = 0xblegh - 5; printf("%s\n", v) }'
11-5
Those examples really do floor me. What an amazing side effect, and
identically in two different implementations! I can only guess without
looking at the code that mawk tries very hard to mimic awk's behaviour
here. Gawk is almost being even more bizarre, but at least it might be
getting the interpretation of the first constant correct, for some
meaning of correct as per its own documentation.
Given the following as well, it looks as if the parser is sticking the
numeric value of the first term into the number part of the variable,
and then sticking the numeric part of the second "term" into the string
part of the variable, but as if it had parsed a signed number, not an
operator followed by a second value:
$ awk 'BEGIN{v = 0xblegh + 9; printf("%s\n", v) }'
09
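One possible reading of these outputs, offered as an assumption and not
verified against the awk or mawk sources: the lexer ends the numeric
constant at "0", takes "xblegh" as an uninitialized variable, and the
juxtaposition becomes string concatenation. Binary minus and plus bind
tighter than concatenation in awk, so the right-hand side would be
evaluated first. The sketch below writes that reading out explicitly
with a space between the two tokens; the function name is illustrative
only.

```shell
# Hypothetical helper: the same expression with the assumed token split
# made explicit.
concat_reading() {
    # Parsed as: 0 (xblegh - 5). xblegh is unset, so xblegh - 5 is -5,
    # and concatenating "0" with "-5" gives the string "0-5".
    awk 'BEGIN { v = 0 xblegh - 5; printf("%s\n", v) }'
}

concat_reading
```

This prints "0-5", matching the awk and mawk output. If the reading is
right, gawk's "11-5" would follow the same way once it accepts 0xb as
hex 11 and treats "legh" as the variable.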
Gawk does just print the correct result if the hex number is indeed a
proper hex number, and if the variable is printed as a string:
$ gawk 'BEGIN{v = 0x11 - 5; printf "%s\n", v }'
12
but I guess that's not really a surprise of any kind.
To avoid any possible code parser issues we can feed the value in as
input, and indeed that does then seem to have a better result, though
still not entirely an expected result since, IIUC, neither awk nor mawk
should interpret hex for input values, but apparently they do:
23:07 [602] $ echo "0xblegh" | awk '{v = $1} END{printf("%s\n", v)}'
0xblegh
23:07 [603] $ echo "0xblegh" | awk '{v = $1} END{printf("%s\n", v + 0)}'
11
23:07 [604] $ echo "0xblegh" | mawk '{v = $1} END{printf("%s\n", v + 0)}'
11
23:07 [605] $ echo "0xblegh" | gawk '{v = $1} END{printf("%s\n", v + 0)}'
0
23:07 [606] $ echo "0xblegh" | gawk '{v = $1} END{printf("%d\n", v + 0)}'
0
23:08 [607] $ echo "0xblegh" | mawk '{v = $1} END{printf("%d\n", v + 0)}'
11
23:08 [608] $ echo "0xblegh" | awk '{v = $1} END{printf("%d\n", v + 0)}'
11
23:08 [609] $ echo "0xblegh" | mawk '{v = $1} END{printf("%d\n", v "")}'
11
23:09 [610] $ echo "0xblegh" | awk '{v = $1} END{printf("%d\n", v "")}'
11
23:09 [611] $ echo "0xblegh" | gawk '{v = $1} END{printf("%d\n", v "")}'
0
23:09 [612] $ echo "0xblegh" | gawk '{v = $1} END{printf("%s\n", v "")}'
0xblegh
23:09 [613] $ echo "0xblegh" | mawk '{v = $1} END{printf("%s\n", v "")}'
0xblegh
23:09 [614] $ echo "0xblegh" | awk '{v = $1} END{printf("%s\n", v "")}'
0xblegh
-- 
Greg A. Woods
Planix, Inc.
<woods%planix.com@localhost> +1 416 218 0099
http://www.planix.com/