Subject: Re: the state of regex(3)
To: NetBSD Userlevel Technical Discussion List <tech-userlevel@NetBSD.ORG>
From: Alistair Crooks <agc@pkgsrc.org>
List: tech-userlevel
Date: 09/28/2004 17:38:21
On Fri, Jan 02, 2004 at 05:48:18PM -0500, Greg A. Woods wrote:
> I had forgotten that I had a basic bit of test harness for doing simple
> regex testing and benchmarking with the egrep implementation by James
> Howard and Dag-Erling Smørgrav (which is just a wrapper around any POSIX
> regex library).  Remembering this prompted me to fetch and compile the
> latest versions of the various libraries mentioned so far and give them
> each a test run.
> [...] 
> For the rest here are some timing results from the following silly test
> I use to find obvious viruses in e-mail, as run across about 64MB of
> accumulated virus e-mail.  So far PCRE is the clear winner by a country
> mile and TRE is way ahead of the rest of the pack.  TRE will probably
> also improve quite a bit more before there's a 1.x release of it.  TRE
> has become very much more interesting in the latest release too -- it
> now has true support for approximate pattern matching using real EREs
> (i.e. in a manner vastly superior to the old agrep).

With thanks to Greg for his benchmarking results, which I've deleted
here but which are still in the archive.

Thomas Klausner has just updated the PCRE package to 5.0. It's
interesting to note that this update says:

	Log Message:
	Update to 5.0:

	Release 5.0 13-Sep-04
	---------------------

	The licence under which PCRE is released has been changed to the more
	conventional "BSD" licence.

	In the code, some bugs have been fixed, and there are also some major changes
	in this release (which is why I've increased the number to 5.0). Some changes
	are internal rearrangements, and some provide a number of new facilities.

Assuming that the internal rearrangements have not clobbered the performance
in any way, is there any reason to stay with the old regex(3) implementation?
Shouldn't we just move to PCRE?

Regards,
Alistair