NetBSD-Bugs archive

Re: bin/38108: single regexp implementation for NetBSD base system



 >> It would be nice to have AWK/SED and GREP use the same regexp engine
 >> from NetBSD libc. Or at least have AWK buildable with an external
 >> regexp engine that supports UTF-8. The same for usr.bin/grep and sed.

>  Ideally I'd like to be able to replace lib/libc/regex with Henry  
>  Spencer's latest regex implementation as it is found, for example, in  
>  the TCL sources; and then of course have this new implementation be  
>  used universally in all the common RE-capable tools on the system.
I don't know exactly what is included in TCL.
But I've packaged a patched version of Henry Spencer's regexp engine
from http://arglist.com/regex/.

See devel/librxspencer package.

 >> SUN did this for their AWK years ago, see wip/heirloom-awk.
>  
>  NetBSD uses the one true version of AWK from its author and current  
>  maintainer.  See the doc/3RDPARTY entry for "nawk".
I know this. Note that wip/heirloom-awk, open sourced
by Caldera and SUN years ago, was based on the same original
nawk sources. Many years ago they separated the regexp engine
into a library. awk, grep and sed all use it.
See the wip/heirloom-grep, wip/heirloom-sed and wip/libuxre packages.
libuxre is POSIX compatible and UTF-8 aware.

>  Beware though that AWK as a language definition includes much, if not  
>  all, of the RE syntax and semantics too and so arbitrarily switching  
>  to a different RE implementation in the AWK interpreter is not  
>  necessarily a good thing.
According to SUS, AWK's regexps should conform to ERE, which is also
defined in SUS. There are some exceptions, and the wrapper function I
provided handles all of them. Additional checks are welcome, of course.

>  It would, for example, lead to the possibility of many common
>  portable scripts, including those used on NetBSD through pkgsrc, to
>  fail in strange and mysterious ways.
I don't think so, provided that these scripts are really "portable".
A good example is curly braces, which, according to SUS, are special
characters ( {n,m} notation ) and MUST NOT be used as ordinary
characters in really portable scripts.
Anyway, _unconditional_ inclusion of nawk in the pkgsrc bootstrap
was a mistake IMHO ;) I guess Solaris's native /usr/xpg4/bin/awk is
good enough. Also see /usr/pkg/heirloom/bin/posix2001/nawk
from the wip/heirloom-awk package, which supports the _standard_ {n,m}
notation while NetBSD's nawk doesn't.
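
To illustrate the {n,m} point, here is a minimal sketch in C, assuming
a POSIX regcomp(3) called with REG_EXTENDED. ere_match() is a made-up
helper for this example only; it is not taken from nawk, libuxre or
the heirloom tools.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * ere_match() is a hypothetical helper, for illustration only.
 * It compiles `pattern` as a POSIX ERE and tries it against `text`.
 * Returns 1 on a match, 0 otherwise, -1 if the pattern does not compile.
 */
int
ere_match(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);

	/* REG_EXTENDED selects ERE syntax, where { starts an interval. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

With such an engine, ere_match("^a{2,3}$", "aaa") is expected to
return 1 and ere_match("^a{2,3}$", "a") to return 0. A script that
needs a literal brace can stay portable by writing it inside a bracket
expression, e.g. "x[{]1[}]".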

Also note that the wrapper function I provided keeps \x as plain x if x
is not n, r, t etc., i.e. it keeps awk's regexps (backslashed characters)
backward compatible with nawk and many other awk implementations.
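
The backslash handling could look roughly like the sketch below. This
is NOT the wrapper from the PR, only a guess at its shape:
awk_re_normalize() and its signature are invented here, the escape
list is cut down to \n, \t and \r, and backslashes inside bracket
expressions are not treated specially. It assumes the result is then
passed to an ERE engine, so a backslash before an ERE special
character is kept (the character stays literal) and a backslash before
any other character is simply dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * awk_re_normalize() is a hypothetical helper, for illustration only.
 * Copies the awk regexp `src` into `dst` (capacity `dstsz`), rewriting
 * backslash sequences. Returns 0 on success, -1 if `dst` is too small.
 */
int
awk_re_normalize(const char *src, char *dst, size_t dstsz)
{
	size_t o = 0;

	assert(src != NULL && dst != NULL);
	if (dstsz == 0)
		return -1;

	for (; *src != '\0'; src++) {
		char out[2];
		size_t n = 0;

		if (*src != '\\' || src[1] == '\0') {
			/* Ordinary character (or a trailing backslash). */
			out[n++] = *src;
		} else {
			char c = *++src;

			switch (c) {
			case 'n': out[n++] = '\n'; break;
			case 't': out[n++] = '\t'; break;
			case 'r': out[n++] = '\r'; break;
			default:
				/* Keep the backslash only before ERE specials. */
				if (strchr(".[\\()*+?{|^$", c) != NULL)
					out[n++] = '\\';
				out[n++] = c;
				break;
			}
		}

		/* Always leave room for the terminating NUL. */
		if (o + n >= dstsz)
			return -1;
		memcpy(dst + o, out, n);
		o += n;
	}
	dst[o] = '\0';
	return 0;
}
```

Under these assumptions "a\yb" becomes "ayb", while "a\.b" is left
alone so that the dot stays literal.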

>  Sometimes it really is good to have a given tool provide its own
>  standardized version of a feature.
oawk is already dead. It is 2008. POSIX has existed for so many years...

-- 
Best regards, Aleksey Cheusov.
