Subject: Re: the state of regex(3) (was: Policy questions)
To: NetBSD Userlevel Technical Discussion List <tech-userlevel@NetBSD.ORG>
From: Greg A. Woods <firstname.lastname@example.org>
Date: 01/01/2004 03:05:10
[ On Monday, December 29, 2003 at 10:49:08 (-0500), Perry E. Metzger wrote: ]
> Subject: Re: Policy questions
> "Bruce J.A. Nourish" <email@example.com> writes:
> > * The man page to regex(3), circa 1994, claims that regex(3) is "an
> > alpha release with known defects." As it has been rigorously tested
> > for almost 10 years, could we remove that comment?
> Well, it is certainly no longer an alpha release. :)
> Are there still known defects?
It's deathly slow. (compared to many (all? :-) other implementations)
According to Henry speeding it up requires a complete rewrite, one he
was apparently working on for some time, but hasn't come close to
I've been hunting around off and on for a couple of years now for a
POSIX compatible ERE library to replace NetBSD's regex with, but
unfortunately I haven't found many that are _not_ currently covered by
the GNU copyright licence, and not even many that are ready with POSIX
There is Tatu Ylonen's 1991 release of his regexpr library. It's
covered by a reasonable copyright, but I don't think it's as stable as
Henry's code despite being quite mature. It implements Emacs REs, as
well as traditional "egrep" EREs, and if I remember correclty it is
quite a bit faster than Henry's code. I think it would still need quite
a bit of work to be made POSIX compatible, but it might be a good
Ozan's public domain regex library is quite fast, but not nearly POSIX
compatible in any way shape or form. I've talked to him about adding a
POSIX API, and possibly about fleshing out POSIX ERE support too, but he
hasn't found the time, nor have I.
BWK's AWK has a rather fast and somewhat extended RE implementation
too, and it's under a reasonable copyright as well, but it too would
need some work to support a POSIX compatible API (and it's REs are not
quite POSIX compatible either).
Another very fast and stable ERE implementation can be found in mawk.
It would would also need quite a bit of hacking to make it more POSIX
compatible. I've been meaning to ask Michael Brennan if he would
consider relicensing it under a more BSD-like copyright.
PCRE is very fast and stable too (as well as being very well
documented), but although it includes a POSIX API it only implements
Perl-compatible EREs, not true POSIX EREs. I seriously doubt Philip
would consider relicensing it under a more BSD-like copyright though,
but I suppose it wouldn't hurt to ask.
Brenton Hoff's Grouse Grep is also amazingly fast (often faster than GNU
Grep, and certainly _much_ smaller!), but I think it would require some
considerable work to extract a POSIX compatible API regex library from
it, and apparently the new version is now GPL'ed too.
I should also mention Andy Key's Extended Regular Expression Handler.
It's fullly public domain and supports a good portion of POSIX ERE
syntax. However it doesn't have a POSIX API. Unfortunately I've not
had a chance to really benchmark it yet, though it's apparently
reasonably fast -- just not quite as fast as GNU Regex (which means it
should still be a lot faster than Henry's regex :-). It should be
easier to add a POSIX API wrapper to it than it would be to any of the
Greg A. Woods
+1 416 218-0098 VE3TCP RoboHack <firstname.lastname@example.org>
Planix, Inc. <email@example.com> Secrets of the Weird <firstname.lastname@example.org>