Subject: Re: regexec(3) is broken or I am
To: None <netbsd-help@netbsd.org>
From: Christos Zoulas <christos@astron.com>
List: netbsd-help
Date: 11/29/2005 01:00:17
In article <20051128234142.GA5519@oak.schemamania.org>,
 <jklowden@schemamania.org> wrote:
>I thought I understood regexec(3), but either I don't or it doesn't.
>Would any of the assembled gurus like to debug my code?  Please? 
>
>AIUI, regcomp(3) returns the number of parenthesized subexpressions
>in regex_t::re_nsub.  Then regexec fills the regmatch_t array with
>offset pairs.  The first array element describes the overall match,
>and the subsequent elements describe the subexpressions, in order
>of appearance by left parenthesis.
>
>I find the following anomalies:
>
>1.  rm_so and rm_eo are zero-based if no parentheses are present,
>and 1-based otherwise.
>
>2.  Standard character class names aren't recognized, even with
>REG_EXTENDED.
>
>3.  Offsets are often wrong. 
>
>4.  Multiple parenthesized subexpressions don't match.  
>
>5.  If the RE matches more than once, the last match should be 
>returned, but isn't. 
>
>I can't believe the library is that broken.  I didn't find a bug
>report.  Google turned up several uses of regexec in /usr/src,
>but none (that I noticed) that use multiple subexpressions.  
>I'm sure it's my test program, but for the life of me I don't see
>what's wrong.  The attached simple program and output illustrate
>the problems.
>
>I am using 2.0_BETA:
>$ uname -a |sed 's/autobuild.*$//'
>NetBSD hello.acml.com 2.0_BETA NetBSD 2.0_BETA (GENERIC) #0: Thu Sep 23
>02:37:20 UTC 2004  
>
>Am I writing the wrong program, or misunderstanding the documentation?  
>
>Many thanks for your kind attention.
>

Can you send your input file too?

christos