Subject: Bug in regex library?
To: None <current-users@NetBSD.ORG>
From: Marc Baudoin <Marc.Baudoin@hsc.fr>
List: current-users
Date: 10/29/1996 16:33:08
Hi,

I wonder if there's a bug in the handling of the rm_eo member of the
regmatch_t structure in the regex library.  Here's an example program:

---------------------------------------------------------------------------
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

int
main ( int argc , char **argv )
{
   char         line[BUFSIZ] ;
   regex_t      preg ;
   regmatch_t   pmatch[10] ;
   int          i ;

   printf ( "\nTest regex\n\nEntrez une chaîne : " ) ;
   fgets ( line , sizeof ( line ) , stdin ) ;
   printf ( "regcomp %d\n" , regcomp ( &preg , "[a-z]+@([a-z]+)" ,
                                       REG_EXTENDED ) ) ;
   printf ( "re_nsub %lu\n" , ( unsigned long ) preg.re_nsub ) ;
   i = regexec ( &preg , line , 10 , pmatch , 0 ) ;
   printf ( "regexec %d\n" , i ) ;
   if ( i != 0 )
   {
      char buf[BUFSIZ] ;

      regerror ( i , &preg , buf , sizeof ( buf ) ) ;
      printf ( "#%s#\n" , buf ) ;
   }
   regfree ( &preg ) ;
   printf ( "%ld %ld\n" , pmatch[0].rm_so , pmatch[0].rm_eo ) ;
   printf ( "%ld %ld\n" , pmatch[1].rm_so , pmatch[1].rm_eo ) ;
   printf ( "%ld %ld\n" , pmatch[2].rm_so , pmatch[2].rm_eo ) ;

   exit ( EXIT_SUCCESS ) ;
}
---------------------------------------------------------------------------

It matches a string against the [a-z]+@([a-z]+) regular expression and
remembers what's after the @.  Here is its output on a NetBSD 1.2 box:

Test regex

Entrez une chaîne : 12roro@tata34
regcomp 0
re_nsub 1
regexec 0
2 0
7 0
-1 -1

But it should be 2 11 and 7 11 instead of 2 0 and 7 0.  I get the same
wrong result on a FreeBSD 2.1.5 box, but it works correctly on Solaris 2.5
and HP-UX 9.05, and I just can't believe it!

It works OK if I use the GNU regex 0.12 library.
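
One thing that may be worth ruling out is a mismatch between the %ld
conversions and the real width of regoff_t.  If regoff_t were wider than
long on the BSD boxes (an assumption, not something verified here), each
rm_so would use up both %ld slots of its printf call, which would fit the
"2 0", "7 0" and "-1 -1" lines above.  Here is a minimal sketch that casts
the offsets explicitly so the arguments always agree with the format.  The
helper name format_offsets is made up for this illustration and is not part
of the regex library:

```c
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper: match line against the same pattern as the program
 * above and write the offsets of sub-match idx into out as "so eo".
 * Returns 0 on success, the regcomp/regexec error code on failure,
 * or -1 if idx is out of range.
 */
int
format_offsets ( const char *line , size_t idx , char *out , size_t outlen )
{
   regex_t      preg ;
   regmatch_t   pmatch[10] ;
   int          rc ;

   if ( idx >= 10 )
      return -1 ;

   rc = regcomp ( &preg , "[a-z]+@([a-z]+)" , REG_EXTENDED ) ;
   if ( rc != 0 )
      return rc ;

   rc = regexec ( &preg , line , 10 , pmatch , 0 ) ;
   regfree ( &preg ) ;
   if ( rc != 0 )
      return rc ;

   /* The casts make each argument a long, matching %ld regardless of
      how regoff_t is defined on the platform. */
   snprintf ( out , outlen , "%ld %ld" ,
              ( long ) pmatch[idx].rm_so ,
              ( long ) pmatch[idx].rm_eo ) ;
   return 0 ;
}
```

Calling format_offsets ( "12roro@tata34\n" , 0 , buf , sizeof ( buf ) )
and then the same with index 1 gives the two pairs to compare against the
expected 2 11 and 7 11.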

Any idea?  Is it a known bug?  Did I forget some option?

-- 
Marc Baudoin   -=-   <Marc.Baudoin@hsc.fr>
Hervé Schauer Consultants