NetBSD-Bugs archive


Re: bin/38108: single regexp implementation for NetBSD base system



The following reply was made to PR bin/38108; it has been noted by GNATS.

From: Aleksey Cheusov <cheusov%tut.by@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: bin/38108: single regexp implementation for NetBSD base system
Date: Wed, 27 Feb 2008 00:33:30 +0200

  >> It would be nice to have AWK/SED and GREP use the same regexp engine
  >> from NetBSD libc. Or at least to be able to build AWK with an external
  >> regexp engine that supports UTF-8. The same goes for usr.bin/grep and sed.
 
 >  Ideally I'd like to be able to replace lib/libc/regex with Henry  
 >  Spencer's latest regex implementation as it is found, for example, in  
 >  the TCL sources; and then of course have this new implementation be  
 >  used universally in all the common RE-capable tools on the system.
 I don't know exactly what is included in TCL,
 but I've packaged a patched version of Henry Spencer's regexp engine
 from http://arglist.com/regex/.
 
 See the devel/librxspencer package.
 
  >> SUN did this for their AWK years ago, see wip/heirloom-awk.
 >  
 >  NetBSD uses the one true version of AWK from its author and current  
 >  maintainer.  See the doc/3RDPARTY entry for "nawk".
 I know this. Note that wip/heirloom-awk, open-sourced
 by Caldera and SUN years ago, is based on the same original
 nawk sources. Many years ago they separated the regexp engine
 into a library, and awk, grep and sed all use it.
 See the wip/heirloom-grep, wip/heirloom-sed and wip/libuxre packages.
 libuxre is POSIX-compatible and UTF-8 aware.
 
 >  Beware though that AWK as a language definition includes much, if not  
 >  all, of the RE syntax and semantics too and so arbitrarily switching  
 >  to a different RE implementation in the AWK interpreter is not  
 >  necessarily a good thing.
 According to SUS, AWK's regexps should conform to ERE, which is also
 defined in SUS. There are some exceptions, and the wrapper function I
 provided does everything needed. Additional checks are welcome, of course.
 
 >  It would, for example, lead to the possibility of many common
 >  portable scripts, including those used on NetBSD through pkgsrc, to
 >  fail in strange and mysterious ways.
 I don't think so, provided that these scripts are really "portable".
 A good example is curly braces, which, according to SUS, are special
 characters ( {n,m} notation ) and MUST NOT be used as ordinary
 characters in really portable scripts.
 Anyway, the _unconditional_ inclusion of nawk in the pkgsrc bootstrap
 was a mistake IMHO ;) I guess Solaris's native /usr/xpg4/bin/awk is
 good enough. Also see /usr/pkg/heirloom/bin/posix2001/nawk
 from the wip/heirloom-awk package, which supports the _standard_ {n,m}
 notation while NetBSD's nawk doesn't.
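 
 To make the point about braces concrete, here is a minimal sketch of
 a check a script author could run over an awk regexp. It is only an
 illustration: the function name find_bare_braces is hypothetical, and
 it flags every unescaped brace outside a bracket expression, whether
 it is a valid {n,m} interval (which some awks do not support) or a
 brace meant as an ordinary character. It does not handle [:class:]
 elements inside bracket expressions.

```python
def find_bare_braces(ere: str) -> list:
    """Return the indexes of '{' or '}' that are neither
    backslash-escaped nor inside a bracket expression."""
    hits = []
    i = 0
    in_bracket = False
    while i < len(ere):
        ch = ere[i]
        if in_bracket:
            # Everything up to the closing ']' is literal here.
            if ch == "]":
                in_bracket = False
            i += 1
        elif ch == "\\":
            # Skip the backslash and the character it escapes.
            i += 2
        elif ch == "[":
            in_bracket = True
            i += 1
            # A leading '^' and a ']' right after it are part of the set.
            if i < len(ere) and ere[i] == "^":
                i += 1
            if i < len(ere) and ere[i] == "]":
                i += 1
        else:
            if ch in "{}":
                hits.append(i)
            i += 1
    return hits


if __name__ == "__main__":
    for pattern in ["a{2,3}", r"a\{b\}", "[{}]x"]:
        print(repr(pattern), find_bare_braces(pattern))
```

 An empty result means the regexp does not rely on how a given awk
 treats braces, which is the portable case described above.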
 
 Also note that the wrapper function I provided keeps \x as plain x if x
 is not n, r, t etc., i.e. it keeps awk's regexps (backslashed characters)
 backward compatible with nawk and many other awk implementations.
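 
 The backslash rule can be sketched roughly as follows. This is not
 the wrapper from the PR, only an illustration of the idea under
 stated assumptions: the name normalize_awk_regex is hypothetical, the
 set of control escapes follows the usual awk escape list, and
 regexp metacharacters are assumed to stay escaped. Octal escapes and
 backslashes inside bracket expressions are not handled.

```python
# Assumption: escapes that map to control characters (usual awk list).
CONTROL_ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t", "f": "\f",
    "v": "\v", "a": "\a", "b": "\b",
}

# Assumption: ERE metacharacters that must remain backslash-escaped.
ERE_SPECIALS = set(".[]()*+?{}|^$\\")


def normalize_awk_regex(pattern: str) -> str:
    """Rewrite backslash escapes so that an unknown \\x becomes plain x."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        # Ordinary character, or a lone backslash at the very end.
        if ch != "\\" or i + 1 == len(pattern):
            out.append(ch)
            i += 1
            continue
        nxt = pattern[i + 1]
        if nxt in CONTROL_ESCAPES:
            out.append(CONTROL_ESCAPES[nxt])   # \n, \t, ... become the control character
        elif nxt in ERE_SPECIALS:
            out.append("\\" + nxt)             # \. stays \. for the ERE engine
        else:
            out.append(nxt)                    # \q becomes q, \/ becomes /
        i += 2
    return "".join(out)


if __name__ == "__main__":
    for pattern in [r"a\qb", r"a\.b", r"\/usr"]:
        print(repr(pattern), "->", repr(normalize_awk_regex(pattern)))
```

 Dropping the backslash before handing the pattern to the ERE engine
 avoids depending on how that engine treats a backslash followed by an
 ordinary character.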
 
 >  Sometimes it really is good to have a given tool provide its own
 >  standardized version of a feature.
 oawk is already dead. It is 2008, and POSIX has existed for many years...
 
 -- 
 Best regards, Aleksey Cheusov.
 

