NetBSD-Bugs archive


Re: bin/42463: Bizarre behavior in awk with invalid numeric constants



The following reply was made to PR bin/42463; it has been noted by GNATS.

From: "Greg A. Woods" <woods%planix.ca@localhost>
To: NetBSD GNATS <gnats-bugs%NetBSD.org@localhost>
Cc: 
Subject: Re: bin/42463: Bizarre behavior in awk with invalid numeric constants
Date: Sat, 19 Dec 2009 00:01:11 -0500

 At Wed, 16 Dec 2009 21:10:00 +0000 (UTC), dholland%eecs.harvard.edu@localhost wrote:
 Subject: bin/42463: Bizarre behavior in awk with invalid numeric constants
 >
 > This is not so surprising, although one would expect it to generate a
 > syntax error (recall that awk doesn't handle hex integer constants...)
 
 Well, a syntax error would not really be correct so far as I can tell,
 maybe not even for constants in the program text.
 
 As you know, an awk scalar variable has both a string and a number value
 at the same time; and expressions take on string or numeric values as
 appropriate.
 
 As far as I can find, very little is said in the awk book or the awk(1)
 manual about numerical constants.  Both do say though that string
 constants are quoted with double quote characters (and can contain
 C-like character escapes), and regular expression constants are quoted
 with slash characters.
 
 The awk book does say (Appendix A, p. 192): "The numeric value of an
 arbitrary string is the numeric value of its numeric prefix."
 
 The mawk(1) manual says this about numeric constants:
 
        Numeric constants can be integer like -2, decimal like 1.08,
        or in scientific notation like -1.1e4 or .28E-3.
 
 So, in all your examples the numeric value of the unnamed constant you
 give as "0xblegh" should probably be zero, IIUC, at least for awk and
 mawk.
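 
 As a minimal sketch of that numeric-prefix rule (the numeric_value
 helper is a hypothetical name introduced here, and the inputs
 deliberately avoid a 0x prefix, since implementations disagree there),
 strings can be fed in as input and coerced with "+ 0":

```shell
# numeric_value is a hypothetical helper: it feeds one string to awk as an
# input record and prints that record's numeric value.
numeric_value() {
    printf '%s\n' "$1" | awk '{ print $0 + 0 }'
}

numeric_value "3abc"        # numeric prefix is 3
numeric_value "-1.1e4xyz"   # numeric prefix is -1.1e4
numeric_value "blegh"       # no numeric prefix at all
```

 With any awk I know of this should print 3, -11000 and 0, which is why
 I would expect zero for a string with no usable numeric prefix.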
 
 However, as you've shown, things don't actually seem to work the way
 _I_ would expect when it comes to expressions containing un-quoted
 non-numeric constants with numeric prefixes.
 
 Interestingly to me, awk and mawk behave in exactly the same bizarre way:
 
 $ awk 'BEGIN{v = 0xblegh - 5; printf("%s\n", v) }'
 0-5
 $ mawk 'BEGIN{v = 0xblegh - 5; printf("%s\n", v) }'
 0-5
 $ gawk 'BEGIN{v = 0xblegh - 5; printf("%s\n", v) }'
 11-5
 
 Those examples really do floor me.  What an amazing side effect, and
 identical in two different implementations!  Without looking at the
 code, I can only guess that mawk tries very hard to mimic awk's
 behaviour here.  Gawk is arguably even more bizarre, but at least it
 might be getting the interpretation of the first constant correct, for
 some meaning of correct as per its own documentation.
 
 Given the following as well, it looks as if the parser is sticking the
 numeric value of the first term into the number part of the variable,
 and then sticking the second "term" into the string part of the
 variable, just as if it had parsed a number rather than an operator
 and a second value:
 
 $ awk 'BEGIN{v = 0xblegh + 9; printf("%s\n", v) }'  
 09
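 
 One possible reading of these results, offered here as an assumption
 and not checked against any implementation's source: the lexer may end
 the number at "0" and treat "xblegh" as the name of an uninitialized
 variable, which would make the expression a string concatenation of 0
 with (xblegh - 5).  Writing that form out with explicit spacing gives
 the same output:

```shell
# Assumption: "0xblegh" is tokenized as the number 0 followed by the
# identifier xblegh (an uninitialized variable, numerically 0).
# Additive operators bind tighter than concatenation, so each expression
# below is 0 concatenated with the result of the arithmetic.
minus=$(awk 'BEGIN { v = 0 xblegh - 5; printf("%s\n", v) }')
plus=$(awk 'BEGIN { v = 0 xblegh + 9; printf("%s\n", v) }')
echo "$minus"   # 0-5
echo "$plus"    # 09
```

 Under the same assumption, gawk would be reading "0xb" as the hex
 constant 11 and "legh" as the variable name, which would account for
 its "11-5".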
 
 
 Gawk does just print the correct result if the hex number is indeed a
 proper hex number, and if the variable is printed as a string:
 
 $ gawk 'BEGIN{v = 0x11 - 5; printf "%s\n", v }'
 12
 
 but I guess that's not really a surprise of any kind.
 
 
 To avoid any possible parser issues in the program text, we can feed the
 value in as input instead.  That does seem to give a better result,
 though still not an entirely expected one since, IIUC, neither awk nor
 mawk should interpret hex in input values, yet apparently they do:
 
 23:07 [602] $ echo "0xblegh" | awk '{v = $1} END{printf("%s\n", v)}'
 0xblegh
 23:07 [603] $ echo "0xblegh" | awk '{v = $1} END{printf("%s\n", v + 0)}'
 11
 23:07 [604] $ echo "0xblegh" | mawk '{v = $1} END{printf("%s\n", v + 0)}'
 11
 23:07 [605] $ echo "0xblegh" | gawk '{v = $1} END{printf("%s\n", v + 0)}'
 0
 23:07 [606] $ echo "0xblegh" | gawk '{v = $1} END{printf("%d\n", v + 0)}'
 0
 23:08 [607] $ echo "0xblegh" | mawk '{v = $1} END{printf("%d\n", v + 0)}'
 11
 23:08 [608] $ echo "0xblegh" | awk '{v = $1} END{printf("%d\n", v + 0)}'
 11
 23:08 [609] $ echo "0xblegh" | mawk '{v = $1} END{printf("%d\n", v "")}'
 11
 23:09 [610] $ echo "0xblegh" | awk '{v = $1} END{printf("%d\n", v "")}'
 11
 23:09 [611] $ echo "0xblegh" | gawk '{v = $1} END{printf("%d\n", v "")}'
 0
 23:09 [612] $ echo "0xblegh" | gawk '{v = $1} END{printf("%s\n", v "")}'
 0xblegh
 23:09 [613] $ echo "0xblegh" | mawk '{v = $1} END{printf("%s\n", v "")}'
 0xblegh
 23:09 [614] $ echo "0xblegh" | awk '{v = $1} END{printf("%s\n", v "")}'
 0xblegh
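 
 To see which way a particular awk goes on hex input, a small probe
 along these lines may help.  It is only a sketch: the hex_input
 variable name is made up here, "0x11" is used so that the whole string
 is valid hex, and the answer depends on the awk binary and the C
 library in use.

```shell
# Reports whether this awk converts the input string "0x11" to 17.
# The result varies by implementation, so no particular answer is assumed.
hex_input=$(echo "0x11" | awk '{ r = ($1 + 0 == 17) ? "yes" : "no"; print r }')
echo "hex input accepted: $hex_input"
```

 Substituting mawk or gawk for awk in the pipeline gives the
 corresponding answer for those implementations.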
 
 
 
 -- 
                                                Greg A. Woods
                                                Planix, Inc.
 
 <woods%planix.com@localhost>       +1 416 218 0099        http://www.planix.com/
 

