tech-userlevel archive


Re: TRE regex



On 6 June 2016 at 18:35, James K. Lowden <jklowden%schemamania.org@localhost> wrote:
> Back in 2009, Matthias-Christian Ott ported Ville Laurikari's regex
> library, apparently with the intention of replacing the one in base,
> originally from Henry Spencer.
>
>         http://mail-index.netbsd.org/tech-userlevel/2009/08/03/msg002477.html
>
> What happened?  Other than the announcement, I find no discussion about
> it.  I see it's in pkgsrc, all well and good, but why was the project
> undertaken and the work not brought into base?

It was brought into base. The USE_LIBTRE definition causes things to
happen in libc if it's defined.

However, tre itself makes no accommodation for basic regexps, and basic
regexps are the default in sed. Strange things happen when you attempt
to compile a basic regexp with an implementation expecting an extended
regexp, to the point where build.sh would not complete.

I do have a partial fix for that - take a look at the recently-added
regextend(3) in othersrc/external/bsd -
but until I've finished bringing that into libc, tre-based regexps
will have to wait.
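
To illustrate the basic versus extended mismatch, here is a minimal
sketch using only the standard POSIX regcomp(3)/regexec(3) interface.
It says nothing about tre's internals or about regextend(3); the
helper name re_matches is made up for this example.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if `text` matches `pattern`, 0 if it
 * does not, and -1 if regcomp() rejects the pattern.  Pass 0 in
 * `cflags` for a basic regexp or REG_EXTENDED for an extended one.
 */
int
re_matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: we only care whether there is a match, not where. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this, re_matches("a\\{3\\}", "aaa", 0) and
re_matches("a{3}", "aaa", REG_EXTENDED) should both report a match,
while re_matches("a{3}", "aaa", 0) should not: in a basic regexp the
unescaped braces are ordinary characters. Feeding sed's basic regexps
to an extended-only compiler changes their meaning in the same way.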

There are other implementation-dependent issues, too, not necessarily
specific to tre - the ability to backtrack, to match a specific number
of repetitions, wide character support, efficient searching, etc.
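
Two of these can be probed from C with the standard regcomp(3)
interface. The sketch below assumes that "backtrack" covers
back-references such as \1, which is my reading rather than anything
stated above, and the helper name bre_matches is invented for the
example.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if the basic regexp `pattern` matches
 * somewhere in `text`, 0 if it does not, and -1 if regcomp() rejects
 * the pattern (for example, because a feature is unsupported).
 */
int
bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* cflags without REG_EXTENDED selects basic regexp syntax. */
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

On an implementation that supports both features,
bre_matches("^\\(ab*\\)\\1$", "abbabb") should match and the same
pattern against "abbab" should not, and
bre_matches("^a\\{2,3\\}$", "aa") should match while "aaaa" should
not. A result of -1 means the pattern was rejected outright.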

> In case you are feeling complacent about NetBSD's regex, the awk
> documentation relies on it, and falls short.  Awk claims to implement
> regex per egrep(1) -- providing no further description -- but that's
> just docurot:
>
>         $ echo aaa | egrep 'a{3}' | wc -l
>                1
>         $ echo aaa | awk '/a{3}/' | wc -l
>                0
>
> As far as I know, we have 3 regex definitions in base: GNU grep, NetBSD
> sed (with regex(3), defined by re_format(7)), and NetBSD awk.  It would
> be an improvement IMO to use one implementation for all utilities in
> base, to make them internally consistent and dependable (and
> reproducible), even at the expense of compatibility with GNU's
> implementations.

The awk documentation describes Bell Labs egrep, for fairly obvious reasons.
The egrep in NetBSD is from GNU grep.

However, overall, you should look at Russ Cox's tutorials on regular
expressions at

https://swtch.com/~rsc/regexp/

Highly recommended.

Regards,
Alistair

