NetBSD-Bugs archive


Re: bin/47840: awk string comparison of integer constant



The following reply was made to PR bin/47840; it has been noted by GNATS.

From: Valery Ushakov <uwe%stderr.spb.ru@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: bin/47840: awk string comparison of integer constant
Date: Tue, 21 May 2013 03:02:14 +0400

 On Mon, May 20, 2013 at 05:30:01 +0000, dholland%eecs.harvard.edu@localhost wrote:
 
 > Observe the following curious behavior:
 > 
 > macaran% jot 15 1 | awk '{ a[$1] = ($1 < 10); } END { for (k in a) { print k, a[k], (k < 10); }}'
 > 2 1 0
 [...]
 >
 > Note that k < 10 is evaluated as a string comparison.
 > 
 > Is this required by some standard? gawk does the same thing, but it
 > definitely violates the POLA.
 
 Hmm, it does, indeed, but read the already mentioned
 
 http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
 
 more closely and pay attention to the definition of "numeric string".
 
   Expressions in awk
 
   [...]
 
   A string value shall be considered a NUMERIC STRING if it comes from
   one of the following:
 
     1. Field variables
     2. Input from the getline() function
     3. FILENAME
     4. ARGV array elements
     5. ENVIRON array elements
     6. Array elements created by the split() function
     7. A command line variable assignment
     8. Variable assignment from another numeric string variable
 
     ...  Whether or not a string is a numeric string shall be relevant
     only in contexts where that term is used in this section.
 
   [...]
 
   Comparisons (with the '<', "<=", "!=", "==", '>', and ">="
   operators) shall be made numerically if both operands are numeric,
   if one is numeric and the other has A STRING VALUE THAT IS A NUMERIC
   STRING, or if one is numeric and the other has the uninitialized
   value.  Otherwise, operands shall be converted to strings as
   required and a string comparison shall be made using the
   locale-specific collation sequence.
 
 So for (k in a) gives you k that is a string, but not a numeric
 string(!), and so the comparison is done on strings.
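
 A minimal sketch of the usual workaround, assuming a POSIX-conforming
 awk: adding 0 to the loop index makes the operand numeric, so the
 comparison is done numerically. The sample keys (1, 2, 12), the shell
 variable out, and the use of sort -n and LC_ALL=C are illustrative
 choices, not part of the original report.

```shell
# Keys 1, 2 and 12 become array indices; inside the END loop k is a
# plain string, so (k < 10) compares strings while (k + 0 < 10)
# compares numbers.
out=$(printf '1\n2\n12\n' |
  LC_ALL=C awk '{ a[$1] = 1 }
    END { for (k in a) print k, (k < 10), (k + 0 < 10) }' |
  sort -n)   # for (k in a) has no defined order, so sort for readability
printf '%s\n' "$out"
```

 Under those assumptions the line for key 2 should read "2 0 1": as a
 string "2" sorts after "10", but numerically 2 is less than 10.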
 
   RATIONALE
 
   [...]
 
     The description for comparisons in the ISO POSIX-2:1993 standard
     did not properly describe historical practice because of the way
     numeric strings are compared as numbers.  The current rules cause
     the following code:
 
     if (0 == "000")
         print "strange, but true"
     else
         print "not true"
 
     to do a numeric comparison, causing the if to succeed. It should
     be intuitively obvious that this is incorrect behavior, and
     indeed, no historical implementation of awk actually behaves this
     way.
 
     To fix this problem, the definition of numeric string was enhanced
     to include only those values obtained from specific circumstances
     (mostly external sources) where it is not possible to determine
     unambiguously whether the value is intended to be a string or a
     numeric.
 
     Variables that are assigned to a numeric string shall also be
     treated as a numeric string.  (For example, the notion of a
     numeric string can be propagated across assignments.)
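
 The propagation rule can be illustrated with a small sketch, again
 assuming a POSIX-conforming awk; the variable names x, y and out are
 made up for the example. x is assigned from a field and so stays a
 numeric string, while y is assigned from a string constant and does
 not.

```shell
# x inherits "numeric string" status from $1; y is an ordinary string.
out=$(echo 2 | LC_ALL=C awk '{ x = $1; y = "2"; print (x < 10), (y < 10) }')
printf '%s\n' "$out"
```

 Here (x < 10) should be compared numerically and (y < 10) as strings,
 giving "1 0".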
 
 -uwe
 


