Subject: Re: the state of regex(3) (was: Policy questions)
To: NetBSD Userlevel Technical Discussion List <tech-userlevel@NetBSD.ORG>
From: Greg A. Woods <>
List: tech-userlevel
Date: 01/01/2004 03:05:10
[ On Monday, December 29, 2003 at 10:49:08 (-0500), Perry E. Metzger wrote: ]
> Subject: Re: Policy questions
> "Bruce J.A. Nourish" <> writes:
> >  * The man page to regex(3), circa 1994, claims that regex(3) is "an
> >    alpha release with known defects." As it has been rigorously tested
> >    for almost 10 years, could we remove that comment?
> Well, it is certainly no longer an alpha release. :)
> Are there still known defects?

It's deathly slow.  (compared to many (all? :-) other implementations)

According to Henry speeding it up requires a complete rewrite, one he
was apparently working on for some time, but hasn't come close to
finishing yet.

I've been hunting around off and on for a couple of years now for a
POSIX compatible ERE library to replace NetBSD's regex with, but
unfortunately I haven't found many that are _not_ currently covered by
the GNU copyright licence, and not even many that are ready with POSIX

There is Tatu Ylonen's 1991 release of his regexpr library.  It's
covered by a reasonable copyright, but I don't think it's as stable as
Henry's code despite being quite mature.  It implements Emacs REs, as
well as traditional "egrep" EREs, and if I remember correclty it is
quite a bit faster than Henry's code.  I think it would still need quite
a bit of work to be made POSIX compatible, but it might be a good
starting place.

Ozan's public domain regex library is quite fast, but not nearly POSIX
compatible in any way shape or form.  I've talked to him about adding a
POSIX API, and possibly about fleshing out POSIX ERE support too, but he
hasn't found the time, nor have I.

BWK's AWK has a rather fast and somewhat extended RE implementation
too, and it's under a reasonable copyright as well, but it too would
need some work to support a POSIX compatible API (and it's REs are not
quite POSIX compatible either).

Another very fast and stable ERE implementation can be found in mawk.
It would would also need quite a bit of hacking to make it more POSIX
compatible.  I've been meaning to ask Michael Brennan if he would
consider relicensing it under a more BSD-like copyright.

PCRE is very fast and stable too (as well as being very well
documented), but although it includes a POSIX API it only implements
Perl-compatible EREs, not true POSIX EREs.  I seriously doubt Philip
would consider relicensing it under a more BSD-like copyright though,
but I suppose it wouldn't hurt to ask.

Brenton Hoff's Grouse Grep is also amazingly fast (often faster than GNU
Grep, and certainly _much_ smaller!), but I think it would require some
considerable work to extract a POSIX compatible API regex library from
it, and apparently the new version is now GPL'ed too.

I should also mention Andy Key's Extended Regular Expression Handler.
It's fullly public domain and supports a good portion of POSIX ERE
syntax.  However it doesn't have a POSIX API.  Unfortunately I've not
had a chance to really benchmark it yet, though it's apparently
reasonably fast -- just not quite as fast as GNU Regex (which means it
should still be a lot faster than Henry's regex :-).  It should be
easier to add a POSIX API wrapper to it than it would be to any of the

						Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <>
Planix, Inc. <>          Secrets of the Weird <>