NetBSD-Bugs archive


bin/39002: harmful AWK extension: non-portable escaped character



>Number:         39002
>Category:       bin
>Synopsis:       harmful AWK extension: non-portable escaped character
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Fri Jun 20 20:10:00 +0000 2008
>Originator:     cheusov%tut.by@localhost
>Release:        NetBSD 4.0_STABLE
>Organization:
>Environment:
System: NetBSD chen.chizhovka.net 4.0_STABLE NetBSD 4.0_STABLE (GENERIC) #3: 
Wed Apr 23 00:58:08 EEST 2008 
cheusov%chen.chizhovka.net@localhost:/srv/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
Portability is one of the main goals of the NetBSD project.
NetBSD itself is portable to a huge number of hardware platforms.
It is declared that the NetBSD base system can be cross-compiled on other systems.
The pkgsrc project is portable to many operating systems, etc.
All this sounds amazing, but sometimes reality is different.

http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html:

   Lexical Conventions

      The token STRING shall represent a string constant. A string
      constant shall begin with the character '"'. Within a string
      constant, a backslash character shall be considered to begin an
      escape sequence as specified in the table in the Base
      Definitions volume of IEEE Std 1003.1-2001, Chapter 5, File
      Format Notation ( '\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v'
      ).
      ...

The problem with NetBSD awk is that it accepts extra escape
sequences: it treats \<other_char> as plain <other_char>.

Example:
      0 ~>/usr/bin/awk 'BEGIN {print "\."}'
      .
      0 ~>/usr/bin/awk 'BEGIN {print "\$"}' 
      $
      0 ~>/usr/bin/awk 'BEGIN {print "\z"}' 
      z
      0 ~>
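
For comparison, here is a minimal sketch of spellings that avoid the
extension. It assumes only that some POSIX-style awk is on PATH; the
variable names `literal` and `replaced` are illustrative and not taken
from any NetBSD script.

```shell
# Literal characters need no backslash inside an awk string constant.
literal=$(awk 'BEGIN { print "." "$" "z" }')
echo "$literal"

# A backslash that must reach a dynamic regex is doubled in the string,
# so the regex engine sees \. and matches a literal dot.
replaced=$(awk 'BEGIN { s = "a.b"; sub("\\.", "-", s); print s }')
echo "$replaced"
```

This should print `.$z` and `a-b` regardless of how the awk in use
handles unknown escape sequences, since neither command relies on one.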

I know of at least two problems in NetBSD code caused by this extension.
     
     kern/38766: makesyscalls.sh breaks build if mawk is used

       Here the kernel build failed under Linux with mawk in use, because
       mawk treats \$ as \$ (not as $).

     pkg/33410: pkgsrc problem with posix awk

       Here, again, pkgsrc passed `\.', `\$' and `\/' to the awk interpreter,
       so pkgsrc might fail with mawk or other awk implementations.
       Note: in those days pkgsrc used the native version of awk.

What others do:
mawk:               treats \<other_char> as \<other_char>
gawk:               prints warning message and treats as plain <other_char>
HP-UX /usr/bin/awk: treats \<other_char> as \<other_char>

     0 ~>/usr/bin/awk 'BEGIN {print "\$"}' 
     $
     0 ~>/usr/pkg/bin/mawk 'BEGIN {print "\$"}'
     \$
     0 ~>/usr/pkg/bin/gawk 'BEGIN {print "\$"}' 
     gawk: warning: escape sequence `\$' treated as plain `$'
     $
     0 ~>
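
The comparison above can be turned into a small probe. This is only a
sketch: `probe_awk` is a hypothetical helper name and `AWK` a
hypothetical override variable, and the probe distinguishes only the two
behaviours listed in this report. Warnings on stderr, such as the one
gawk prints, are discarded.

```shell
# Classify an awk by what it prints for the non-portable string "\$".
probe_awk() {
    out=$("${1:-awk}" 'BEGIN { print "\$" }' 2>/dev/null)
    case $out in
        '$')  echo "drops the backslash" ;;   # NetBSD awk, gawk
        '\$') echo "keeps the backslash" ;;   # mawk, HP-UX awk
        *)    echo "unexpected output: $out" ;;
    esac
}

# AWK is an assumed override; plain `awk` is probed by default.
probe_awk "${AWK:-awk}"
```

Running it as `AWK=/usr/pkg/bin/mawk sh probe.sh` (the script name is
illustrative) would probe a specific implementation instead of the
default one.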

I think gawk does the right thing here, and I'd like to see the same in NetBSD.
Even better would be to exit with an error in this case ;) I personally vote for this.
Then NetBSD code will be even more portable,
and programs developed under NetBSD will have better portability.

>Fix:
Index: lex.c
===================================================================
RCS file: /pub/NetBSD-CVS/src/dist/nawk/lex.c,v
retrieving revision 1.7.4.1
diff -u -r1.7.4.1 lex.c
--- lex.c       3 Feb 2008 00:23:16 -0000       1.7.4.1
+++ lex.c       20 Jun 2008 20:04:46 -0000
@@ -431,7 +431,8 @@
                                break;
                            }
 
-                       default: 
+                       default:
+                               WARNING ("warning: escape sequence \\`%c' treated as plain `%c'", c, c);
                                *bp++ = c;
                                break;
                        }


