Subject: CVS commit: pkgsrc/devel/pcre
To: None <pkgsrc-changes@NetBSD.org>
From: Julio M. Merino Vidal <jmmv@netbsd.org>
List: pkgsrc-changes
Date: 12/12/2003 22:33:36
Module Name:	pkgsrc
Committed By:	jmmv
Date:		Fri Dec 12 22:33:36 UTC 2003

Modified Files:
	pkgsrc/devel/pcre: Makefile distinfo
	pkgsrc/devel/pcre/patches: patch-aa

Log Message:
Update to 4.5:

 1. There has been some re-arrangement of the code for the match() function so
    that it can be compiled in a version that does not call itself recursively.
    Instead, it keeps those local variables that need separate instances for
    each "recursion" in a frame on the heap, and gets/frees frames whenever it
    needs to "recurse". Keeping track of where control must go is done by means
    of setjmp/longjmp. The whole thing is implemented by a set of macros that
    hide most of the details from the main code, and operates only if
    NO_RECURSE is defined while compiling pcre.c. If PCRE is built using the
    "configure" mechanism, "--disable-stack-for-recursion" turns on this way of
    operating.

    To make it easier for callers to provide specially tailored get/free
    functions for this usage, two new functions, pcre_stack_malloc and
    pcre_stack_free, are used. They are always called in strict stacking order,
    and the size of the block requested is always the same.

    The PCRE_CONFIG_STACKRECURSE info parameter can be used to find out whether
    PCRE has been compiled to use the stack or the heap for recursion. The
    -C option of pcretest uses this to show which version is compiled.

    A new data escape, \S, is added to pcretest; it causes the amounts of store
    obtained and freed by both kinds of malloc/free at match time to be added
    to the output.

 2. Changed the locale test to use "fr_FR" instead of "fr" because that's
    what's available on my current Linux desktop machine.

 3. When matching a UTF-8 string, the test for a valid string at the start has
    been extended. If start_offset is not zero, PCRE now checks that it points
    to a byte that is the start of a UTF-8 character. If not, it returns
    PCRE_ERROR_BADUTF8_OFFSET (-11). Note: the whole string is still checked;
    this is necessary because there may be backward assertions in the pattern.
    When matching the same subject several times, it may save resources to use
    PCRE_NO_UTF8_CHECK on all but the first call if the string is long.

 4. The code for checking the validity of UTF-8 strings has been tightened so
    that it rejects (a) strings containing 0xfe or 0xff bytes and (b) strings
    containing "overlong sequences".

 5. Fixed a bug (appearing twice) that I could not find any way of exploiting!
    I had written "if ((digitab[*p++] && chtab_digit) == 0)" where the "&&"
    should have been "&", but it just so happened that all the cases this let
    through by mistake were picked up later in the function.

 6. I had used a variable called "isblank" - this is a C99 function, causing
    some compilers to warn. To avoid this, I renamed it to "blankclass".

 7. Cosmetic: (a) only output another newline at the end of pcretest if it is
    prompting; (b) run "./pcretest /dev/null" at the start of the test script
    so the version is shown; (c) stop "make test" echoing "./RunTest".

 8. Added patches from David Burgess to enable PCRE to run on EBCDIC systems.

 9. The prototype for memmove() for systems that don't have it was using
    size_t, but the inclusion of the header that defines size_t was later. I've
    moved the #includes for the C headers earlier to avoid this.

10. Added some adjustments to the code to make it easier to compile on certain
    special systems:

      (a) Some "const" qualifiers were missing.
      (b) Added the macro EXPORT before all exported functions; by default this
          is defined to be empty.
      (c) Changed the dftables auxiliary program (that builds chartables.c) so
          that it reads its output file name as an argument instead of writing
          to the standard output and assuming this can be redirected.

11. In UTF-8 mode, if a recursive reference (e.g. (?1)) followed a character
    class containing characters with values greater than 255, PCRE compilation
    went into a loop.

12. A recursive reference to a subpattern that was within another subpattern
    that had a minimum quantifier of zero caused PCRE to crash. For example,
    (x(y(?2))z)? provoked this bug with a subject that got as far as the
    recursion. If the recursively-called subpattern itself had a zero repeat,
    that was OK.

13. In pcretest, the buffer for reading a data line was set at 30K, but the
    buffer into which it was copied (for escape processing) was still set at
    1024, so long lines caused crashes.

14. A pattern such as /[ab]{1,3}+/ failed to compile, giving the error
    "internal error: code overflow...". This applied to any character class
    that was followed by a possessive quantifier.

15. Modified the Makefile to add libpcre.la as a prerequisite for
    libpcreposix.la because I was told this is needed for a parallel build to
    work.

16. If a pattern that contained .* following optional items at the start was
    studied, the wrong optimizing data was generated, leading to matching
    errors. For example, studying /[ab]*.*c/ concluded, erroneously, that any
    matching string must start with a or b or c. The correct conclusion for
    this pattern is that a match can start with any character.
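
Item 1 above notes that the frame get/free functions are always called in
strict stacking order with a constant block size. A caller could exploit
those two properties with a very simple pool, roughly as sketched below.
This is an illustration only, not code from PCRE: the names frame_pool_get,
frame_pool_free, frame_pool_depth, POOL_SLOTS and POOL_SLOT_SIZE are
hypothetical, and the slot size and depth limits are arbitrary assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS     64   /* hypothetical limit on "recursion" depth */
#define POOL_SLOT_SIZE 512  /* hypothetical upper bound on the frame size */

/* The union gives each slot an alignment suitable for typical frame data. */
typedef union {
    unsigned char bytes[POOL_SLOT_SIZE];
    long double   align_ld;
    void         *align_p;
    long          align_l;
} pool_slot;

static pool_slot pool[POOL_SLOTS];
static size_t    pool_top = 0;  /* number of slots currently handed out */

/* Hand out the next slot, or NULL if the request cannot be satisfied. */
void *frame_pool_get(size_t size)
{
    if (size > POOL_SLOT_SIZE || pool_top == POOL_SLOTS)
        return NULL;            /* frame too large, or pool exhausted */
    return pool[pool_top++].bytes;
}

/* Release a slot. Because callers free in strict stacking order, the
   block being freed must be the one most recently handed out. */
void frame_pool_free(void *block)
{
    assert(pool_top > 0 && block == (void *)pool[pool_top - 1].bytes);
    pool_top--;
}

/* Current nesting depth, useful for diagnostics. */
size_t frame_pool_depth(void)
{
    return pool_top;
}
```

Assuming pcre_stack_malloc and pcre_stack_free can be replaced by the caller
in the same way as pcre_malloc and pcre_free, functions with these
signatures could be installed before matching. How a NULL return is handled
by the matcher is outside the scope of this sketch.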

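Items 3 and 4 above describe two UTF-8 checks: the whole subject must be
well formed (no 0xfe or 0xff bytes, no overlong sequences), and a non-zero
start offset must point at the first byte of a character. The following is
a minimal sketch of that kind of check, not PCRE's actual code. All names
(utf8_is_char_start, utf8_is_valid, check_subject, CHECK_*) are
hypothetical; only the value -11 is taken from item 3. As a simplifying
assumption it accepts sequences of 1 to 4 bytes only and does not check
for surrogates or an upper code point limit.

```c
#include <assert.h>
#include <stddef.h>

enum {
    CHECK_OK         = 0,
    CHECK_BAD_STRING = -1,   /* arbitrary value chosen for this sketch */
    CHECK_BAD_OFFSET = -11   /* value quoted in item 3 */
};

/* Nonzero if b can begin a character, i.e. it is not a continuation
   byte of the form 10xxxxxx. */
int utf8_is_char_start(unsigned char b)
{
    return (b & 0xc0) != 0x80;
}

/* Returns 1 if s[0..len-1] is well formed under the assumptions above. */
int utf8_is_valid(const unsigned char *s, size_t len)
{
    /* Smallest code point that genuinely needs 1, 2, 3 or 4 bytes. */
    static const unsigned long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t n, k;

        if (c < 0x80)                { n = 1; cp = c; }
        else if ((c & 0xe0) == 0xc0) { n = 2; cp = c & 0x1f; }
        else if ((c & 0xf0) == 0xe0) { n = 3; cp = c & 0x0f; }
        else if ((c & 0xf8) == 0xf0) { n = 4; cp = c & 0x07; }
        else return 0;  /* stray continuation byte, or 0xf8..0xff */

        if (n > len - i) return 0;               /* truncated sequence */
        for (k = 1; k < n; k++) {
            if ((s[i + k] & 0xc0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3f);
        }
        if (cp < min_cp[n]) return 0;            /* overlong sequence */
        i += n;
    }
    return 1;
}

/* Check the whole subject first, then the start offset. */
int check_subject(const unsigned char *s, size_t len, size_t start_offset)
{
    assert(start_offset <= len);
    if (!utf8_is_valid(s, len))
        return CHECK_BAD_STRING;
    if (start_offset < len && !utf8_is_char_start(s[start_offset]))
        return CHECK_BAD_OFFSET;
    return CHECK_OK;
}
```

For the three-byte subject "a\xc3\xa9", offsets 0 and 1 are accepted,
while offset 2 lands on a continuation byte and is rejected. The two-byte
sequence "\xc0\x80" is rejected as overlong because it encodes a value that
fits in a single byte.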

To generate a diff of this commit:
cvs rdiff -r1.15 -r1.16 pkgsrc/devel/pcre/Makefile
cvs rdiff -r1.9 -r1.10 pkgsrc/devel/pcre/distinfo
cvs rdiff -r1.4 -r1.5 pkgsrc/devel/pcre/patches/patch-aa

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.