Subject: Re: Bug in regex library?
To: None <Marc.Baudoin@hsc.fr>
From: der Mouse <mouse@Holo.Rodents.Montreal.QC.CA>
List: current-users
Date: 10/29/1996 11:37:56
> I wonder if there's a bug in the handling of the rm_eo member of the
> regmatch_t structure in the regex library.

Yes, but it's your bug, not NetBSD's. :-)

> Here's an example program: 

> #include <regex.h>
[...]
>    printf ( "%ld %ld\n" , pmatch[0].rm_so , pmatch[0].rm_eo ) ;
>    printf ( "%ld %ld\n" , pmatch[1].rm_so , pmatch[1].rm_eo ) ;
>    printf ( "%ld %ld\n" , pmatch[2].rm_so , pmatch[2].rm_eo ) ;

> 2 0
> 7 0
> -1 -1

> But it should be 2 11 and 7 11 instead of 2 0 and 7 0.

The problem is that the rm_so and rm_eo members are not long ints, but
you're printing them as if they were - with %ld.  Compile with -Wformat
and you'll see; I get

x.c: In function `main':
"x.c", line 28: warning: long int format, different type arg (arg 2)
"x.c", line 28: warning: long int format, different type arg (arg 3)
"x.c", line 29: warning: long int format, different type arg (arg 2)
"x.c", line 29: warning: long int format, different type arg (arg 3)
"x.c", line 30: warning: long int format, different type arg (arg 2)
"x.c", line 30: warning: long int format, different type arg (arg 3)

And when I make those three lines read

   printf ( "%ld %ld\n" , (long int)pmatch[0].rm_so , (long int)pmatch[0].rm_eo ) ;
   printf ( "%ld %ld\n" , (long int)pmatch[1].rm_so , (long int)pmatch[1].rm_eo ) ;
   printf ( "%ld %ld\n" , (long int)pmatch[2].rm_so , (long int)pmatch[2].rm_eo ) ;

then not only does -Wformat shut up, but I also get the expected values
printed out.
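
For reference, here is a minimal sketch of the same cast-before-printing
approach in a reusable form.  The helper names format_match and
match_offsets are hypothetical and not part of the original program, and
the pattern and text in the usage note are made up for illustration.  The
cast assumes the offsets fit in a long, which holds for any string short
enough to sit in memory on a machine where long is as wide as a pointer.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the original program): format the
 * offsets of one match into buf, casting regoff_t to long so the
 * arguments agree with %ld whatever regoff_t's underlying type is.
 */
static int
format_match(char *buf, size_t len, const regmatch_t *m)
{
	return snprintf(buf, len, "%ld %ld", (long)m->rm_so, (long)m->rm_eo);
}

/*
 * Hypothetical driver: compile pattern as an extended RE, match it
 * against text, and format submatch idx (must be < 3) into buf.
 * Returns 0 on success, -1 on a compile failure or when nothing matches.
 */
static int
match_offsets(const char *pattern, const char *text, size_t idx,
    char *buf, size_t len)
{
	regex_t re;
	regmatch_t pmatch[3];
	int rc;

	if (idx >= 3 || regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	rc = regexec(&re, text, 3, pmatch, 0);
	regfree(&re);
	if (rc != 0)
		return -1;
	format_match(buf, len, &pmatch[idx]);
	return 0;
}
```

Calling match_offsets("b(c+)d", "abcccd", 1, buf, sizeof buf) should
leave "2 5" in buf, and asking for index 2 should give "-1 -1", since
that pattern has only one parenthesized subexpression.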

rm_so and rm_eo are of type regoff_t.  The reason you're getting the
unexpected behavior on NetBSD (and presumably on FreeBSD too) is that
there, regoff_t is a typedef for off_t - which is a 64-bit type, larger
than long int.  And since you're on a 32-bit machine, the printf is
printing out the two halves of the rm_so value and never getting around
to the rm_eo value.  (Indeed, because you get "2 0", I can tell you're
on a little-endian machine.  On the SPARC, which is big-endian, I got
"0 2" instead.)
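
The "two halves" effect can be imitated without printf at all.  The
sketch below copies a 64-bit value into two 32-bit words in memory order,
which is roughly what a 32-bit printf ends up reading when it is handed a
64-bit argument but told to expect two long ints.  It assumes the
arguments are laid out contiguously in memory order, as on a typical
32-bit stack-based calling convention; split_halves and is_little_endian
are hypothetical names used only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/*
 * Copy the object representation of a 64-bit value into two 32-bit
 * words, lowest address first.
 */
static void
split_halves(int64_t value, uint32_t halves[2])
{
	memcpy(halves, &value, sizeof(value));
}

/* Return 1 if the least significant byte is stored first, else 0. */
static int
is_little_endian(void)
{
	uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 1;
}
```

With a value of 2, halves[0] and halves[1] come out as 2 and 0 on a
little-endian machine and as 0 and 2 on a big-endian one, matching the
"2 0" and "0 2" outputs described above.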

> [B]ut it works right OK on Solaris 2.5 and HP-UX 9.05 and I can't
> just believe it!

They must have a regoff_t which is printf-compatible with long int, or
at least must use a type for rm_so and rm_eo that is.

> It works OK if I use the GNU regex 0.12 library.

Ditto.

> Any idea?  Is it a known bug?  Did I forget some option?

You made an invalid assumption about regoff_t - namely, that it's
printf-compatible with "long int" - and got burned by it.  That's all.

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     01 EE 31 F6 BB 0C 34 36  00 F3 7C 5A C1 A0 67 1D