NetBSD-Bugs archive
Re: bin/38108: single regexp implementation for NetBSD base system
The following reply was made to PR bin/38108; it has been noted by GNATS.
From: Aleksey Cheusov <cheusov%tut.by@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: bin/38108: single regexp implementation for NetBSD base system
Date: Wed, 27 Feb 2008 00:33:30 +0200
>> It would be nice to have AWK, SED and GREP use the same regexp engine
>> from NetBSD libc. Or at least to have AWK buildable with an external
>> regexp engine that supports UTF-8. The same goes for usr.bin/grep and sed.
> Ideally I'd like to be able to replace lib/libc/regex with Henry
> Spencer's latest regex implementation as it is found, for example, in
> the TCL sources; and then of course have this new implementation be
> used universally in all the common RE-capable tools on the system.
I don't know what exactly is included in TCL.
But I've packaged a patched version of Henry Spencer's regexp engine
from http://arglist.com/regex/.
See the devel/librxspencer package.
>> SUN did this for their AWK years ago, see wip/heirloom-awk.
>
> NetBSD uses the one true version of AWK from its author and current
> maintainer. See the doc/3RDPARTY entry for "nawk".
I know this. Note that wip/heirloom-awk, open sourced
by Caldera and SUN years ago, is based on the same original
nawk sources. Many years ago they separated the regexp engine
into a library, and awk, grep and sed all use it.
See the wip/heirloom-grep, wip/heirloom-sed and wip/libuxre packages.
libuxre is POSIX compatible and UTF-8 aware.
> Beware though that AWK as a language definition includes much, if not
> all, of the RE syntax and semantics too and so arbitrarily switching
> to a different RE implementation in the AWK interpreter is not
> necessarily a good thing.
According to SUS, AWK's regexps should conform to ERE, which is also
defined in SUS. There are some exceptions, and the wrapper function
I provided handles them. Additional checks are welcome, of course.
> It would, for example, lead to the possibility of many common
> portable scripts, including those used on NetBSD through pkgsrc, to
> fail in strange and mysterious ways.
I don't think so, provided that these scripts are really "portable".
A good example is curly braces which, according to SUS, are special
characters ( {n,m} notation ) and MUST NOT be used as ordinary
characters in really portable scripts.
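
To illustrate the point about braces, here is a minimal sketch using the
POSIX regcomp()/regexec() interface with REG_EXTENDED. The helper name
ere_matches is hypothetical and is not part of the PR; it only shows that
a conforming ERE engine treats {n,m} as an interval, so a script that
wants a literal brace has to escape it.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the PR): returns 1 if the ERE `pattern`
 * matches somewhere in `text`, 0 if it does not, and -1 if the pattern
 * fails to compile.
 */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_EXTENDED selects ERE syntax; REG_NOSUB skips submatch tracking. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

With an engine that implements intervals, "^a{2,3}$" should match "aa"
but not the literal text "a{2,3}"; matching that text needs the escaped
form "^a\{2,3\}$".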
Anyway, the _unconditional_ inclusion of nawk in the pkgsrc bootstrap
was a mistake IMHO ;) I guess Solaris's native /usr/xpg4/bin/awk is
good enough. Also see /usr/pkg/heirloom/bin/posix2001/nawk
from the wip/heirloom-awk package, which supports the _standard_ {n,m}
while NetBSD's nawk doesn't.
Also note that the wrapper function I provided keeps \x as plain x if x
is not n, r, t etc., i.e. it keeps awk's regexps (backslashed characters)
backward compatible with nawk and many other awk implementations.
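
The escape handling described above could be sketched roughly as follows.
This is not the wrapper attached to the PR: the function name
awk_re_unescape, the list of characters that stay escaped, and the
restriction to \n, \r and \t are all assumptions made for illustration,
and bracket expressions are not treated specially.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch (not the PR's wrapper): copy the awk regexp `src`
 * into `dst` (capacity `dstsize`), rewriting backslash escapes so the
 * result can be handed to an ERE engine.
 * Returns 0 on success, -1 if `dst` is too small.
 */
int awk_re_unescape(const char *src, char *dst, size_t dstsize)
{
    /* Assumed list: characters whose escaped form must be preserved. */
    static const char ere_special[] = ".[]()*+?{}|^$\\/";
    size_t o = 0;

    assert(src != NULL && dst != NULL);
    if (dstsize == 0)
        return -1;

    for (size_t i = 0; src[i] != '\0'; i++) {
        char out[2];
        size_t n = 1;

        if (src[i] != '\\' || src[i + 1] == '\0') {
            out[0] = src[i];            /* ordinary char or trailing '\' */
        } else {
            char x = src[++i];          /* character after the backslash */
            switch (x) {
            case 'n': out[0] = '\n'; break;
            case 'r': out[0] = '\r'; break;
            case 't': out[0] = '\t'; break;
            default:
                if (strchr(ere_special, x) != NULL) {
                    out[0] = '\\';      /* keep "\." and friends as is */
                    out[1] = x;
                    n = 2;
                } else {
                    out[0] = x;         /* "\x" becomes plain "x" */
                }
                break;
            }
        }

        if (o + n >= dstsize)           /* leave room for the final NUL */
            return -1;
        memcpy(dst + o, out, n);
        o += n;
    }

    dst[o] = '\0';
    return 0;
}
```

Under these assumptions the awk regexp a\qb is rewritten to aqb, while
a\.b is passed through unchanged so the dot stays literal.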
> Sometimes it really is good to have a given tool provide its own
> standardized version of a feature.
oawk is already dead. It is 2008. POSIX has existed for so many years...
--
Best regards, Aleksey Cheusov.