Subject: Re: Bug in regex library?
To: None <Marc.Baudoin@hsc.fr>
From: der Mouse <mouse@Holo.Rodents.Montreal.QC.CA>
List: current-users
Date: 10/29/1996 11:37:56
> I wonder if there's a bug in the handling of the rm_eo member of the
> regmatch_t structure in the regex library.
Yes, but it's your bug, not NetBSD's. :-)
> Here's an example program:
> #include <regex.h>
[...]
> printf ( "%ld %ld\n" , pmatch[0].rm_so , pmatch[0].rm_eo ) ;
> printf ( "%ld %ld\n" , pmatch[1].rm_so , pmatch[1].rm_eo ) ;
> printf ( "%ld %ld\n" , pmatch[2].rm_so , pmatch[2].rm_eo ) ;
> 2 0
> 7 0
> -1 -1
> But it should be 2 11 and 7 11 instead of 2 0 and 7 0.
The problem is that the rm_so and rm_eo members are not long ints, but
you're printing them as if they were - with %ld. Compile with -Wformat
and you'll see; I get
x.c: In function `main':
"x.c", line 28: warning: long int format, different type arg (arg 2)
"x.c", line 28: warning: long int format, different type arg (arg 3)
"x.c", line 29: warning: long int format, different type arg (arg 2)
"x.c", line 29: warning: long int format, different type arg (arg 3)
"x.c", line 30: warning: long int format, different type arg (arg 2)
"x.c", line 30: warning: long int format, different type arg (arg 3)
And when I make those three lines read
printf ( "%ld %ld\n" , (long int)pmatch[0].rm_so , (long int)pmatch[0].rm_eo ) ;
printf ( "%ld %ld\n" , (long int)pmatch[1].rm_so , (long int)pmatch[1].rm_eo ) ;
printf ( "%ld %ld\n" , (long int)pmatch[2].rm_so , (long int)pmatch[2].rm_eo ) ;
then not only does -Wformat shut up, but I also get the expected values
printed out.
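For reference, here is a minimal self-contained sketch of the same cast
fix. The helper names (format_match, demo_offsets) are made up for
illustration and are not taken from the original program, and snprintf
is assumed to be available.

```c
#include <regex.h>
#include <stdio.h>

/* Format one match as "start end".  The casts make the arguments
   really be long ints, so they agree with the %ld conversions. */
static int format_match(char *buf, size_t len, const regmatch_t *m)
{
    return snprintf(buf, len, "%ld %ld", (long)m->rm_so, (long)m->rm_eo);
}

/* Hypothetical helper: compile pattern, match it against text, and
   write the offsets of the whole match and of group 1 into the two
   buffers (each of size len).  Returns 0 on a match, -1 otherwise. */
static int demo_offsets(const char *pattern, const char *text,
                        char *whole, char *group1, size_t len)
{
    regex_t re;
    regmatch_t pmatch[2];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 2, pmatch, 0);
    if (rc == 0) {
        format_match(whole, len, &pmatch[0]);
        format_match(group1, len, &pmatch[1]);
    }
    regfree(&re);
    return rc == 0 ? 0 : -1;
}
```

With the (made-up) pattern "b(c+)d" and input "aabcccdxx", this should
give "2 7" for the whole match and "3 6" for the group, regardless of
how wide regoff_t is.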
rm_so and rm_eo are of type regoff_t. The reason you're getting the
unexpected behavior on NetBSD (and presumably on FreeBSD too) is that
there, regoff_t is a typedef for off_t - which is a 64-bit type, larger
than long int. And since you're on a 32-bit machine, the printf is
printing out the two halves of the rm_so value and never getting around
to the rm_eo value. (Indeed, because you get "2 0", I can tell you're
on a little-endian machine. On the SPARC, which is big-endian, I got
"0 2" instead.)
> [B]ut it works right OK on Solaris 2.5 and HP-UX 9.05 and I can't
> just believe it!
They must have a regoff_t which is printf-compatible with long int, or
at least must use a type for rm_so and rm_eo that is.
> It works OK if I use the GNU regex 0.12 library.
Ditto.
> Any idea? Is it a known bug? Did I forget some option?
You made an invalid assumption about regoff_t - namely, that it's
printf-compatible with "long int" - and got burned by it. That's all.
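One caveat about the cast: where regoff_t is wider than long int, a cast
to long int would truncate very large offsets. A sketch that avoids
assuming any particular width, given a compiler and library that provide
intmax_t and the %jd conversion (this is an assumption about the
environment, and format_offset is a made-up name):

```c
#include <inttypes.h>
#include <regex.h>
#include <stdio.h>

/* Format a regoff_t by widening it to intmax_t, the widest signed
   integer type, and printing it with the matching %jd conversion. */
static int format_offset(char *buf, size_t len, regoff_t off)
{
    return snprintf(buf, len, "%jd", (intmax_t)off);
}
```

The point is the same as before: the conversion in the format string and
the type of the argument actually passed have to agree, and an explicit
cast is the way to guarantee that when the underlying type is a typedef.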
der Mouse
mouse@rodents.montreal.qc.ca
01 EE 31 F6 BB 0C 34 36 00 F3 7C 5A C1 A0 67 1D