Subject: CVS commit: pkgsrc/textproc/icu
To: None <pkgsrc-changes@netbsd.org>
From: Lubomir Sedlacik <salo@netbsd.org>
List: pkgsrc-changes
Date: 03/22/2003 01:44:09
Module Name:	pkgsrc
Committed By:	salo
Date:		Fri Mar 21 23:44:09 UTC 2003

Modified Files:
	pkgsrc/textproc/icu: DESCR Makefile PLIST buildlink2.mk distinfo
	pkgsrc/textproc/icu/patches: patch-aa patch-ab patch-ac patch-ad
Removed Files:
	pkgsrc/textproc/icu/patches: patch-ae patch-af

Log Message:
Update to version 2.4.

Based on PR pkg/20825 by Hiramatsu Yoshifumi, with modifications by me.

- follow PKG_SYSCONFDIR

List of major changes for this release:

  * Regular Expressions Phase 1
    ICU 2.4 introduces a Regular Expression C++ API that is modeled after
    the JDK 1.4 API. ICU 2.4's Regular Expression API supports Unicode
    level 1 regular expressions (see Unicode Regular Expression
    Guidelines), but not all pattern metacharacters and features are
    supported yet. Regular expressions leverage all of the UnicodeSet
    support, including all Unicode 3.2 property names and property value
    names. Future ICU releases will complete the pattern support, add
    support for higher Unicode regex levels, and improve performance. For
    more details see the API References and the User Guide.
  * Modularized ICU library building
    ICU 2.4 provides build-time switches to prune parts of the library
    code, for smaller custom distributions. For details see the readme
    file.
  * Character set alias management support
    Additional APIs map alias+standard to a unique charset name (e.g.,
    "Shift-JIS"+"IANA"->"ibm-943_P14A-2000") and enumerate all charset
    names in the alias table, not just the installed ones. See
    convrtrs.txt and ucnv.h.
    These APIs allow programmers to avoid data corruption problems when
    different platforms use the same names for different character
    conversion mappings.
  * EBCDIC-z/OS converter option
    The EBCDIC converter now handles swapped LF/NL mappings
    algorithmically instead of with modified .ucm/.cnv conversion table
    files. This makes this behavior available for all supported EBCDIC
    conversions without adding to the data package size. See "swaplfnl" in
    convrtrs.txt.
  * Additional converter
    A new converter implementation has been added for the encoding of IMAP
    mailbox names. See RFC 2060, section 5.1.3 (Mailbox International
    Naming Convention), and "IMAP-mailbox-name" in convrtrs.txt.
  * Customizable break iteration
    ICU 2.4 allows registration of a BreakIterator with a locale ID. This
    allows applications to provide more sophisticated word/sentence break
    engines and use them seamlessly with the ICU APIs. In future releases,
    this registration mechanism will be extended to all relevant ICU
    services. If you are interested in ICU customization, please try out
    this feature.
  * Collation performance
    ICU 2.4 collation was improved in several areas, with an emphasis on
    performance:
       * Latin-1: Improved performance of u_strcoll().
       * Russian/Cyrillic: Improved performance by tailoring collation for
         Cyrillic-script languages, removing UCA contractions that are not
         used for modern Russian (this uses the [suppressContractions]
         tailoring option).
       * Korean: Improved performance by resolving collation elements for
         modern Hangul syllables at build time (this uses the [optimize]
         tailoring option).
       * Japanese: The default strength for Japanese was reduced from
         quaternary to tertiary as in all other locales.
  * UnicodeSet performance
    UnicodeSet performance is significantly improved, especially for
    add(codePoint) and contains(codePoint).
  * Unicode property aliases
    ICU 2.4 introduces APIs for mapping between all appropriate Unicode
    property aliases and property value aliases and ICU property
    enumeration constants. See u_getPropertyName() etc. in uchar.h.
  * Unicode string functions
       * There are new C functions for searching for last occurrences of
         characters and partial strings. See u_strrstr(), u_strrchr32()
         etc.
       * New C/C++/Java functions efficiently check whether a string
         contains more than a given number of code points. See
         hasMoreChar32Than().
       * Copying UnicodeStrings via the standard assignment operator and
         copy constructor no longer preserves read-only aliasing
         because this can sometimes have unexpected and dangerous effects.
         A new fastCopyFrom() member function provides the old copy
         semantics. See Jitterbug 1794 for more details.
  * UTF macros simplified
    The low-level C macros for handling code points in 8-bit and 16-bit
    Unicode strings have been replaced by a simpler, more consistent set
    with more concise names. For details see utf_old.h and utf.h.
    Similarly, ICU 2.4 defines UChar32 consistently (now always as
    int32_t) and adds a U_SENTINEL non-code point value for new APIs.
  * Performance tests
    ICU 2.4 has a new performance test framework and additional
    performance tests using this framework. This is not currently
    documented, but it is available as part of the source distribution at
    source/test/perf/.
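
For readers unfamiliar with the IMAP mailbox name encoding mentioned in
the list above, here is a minimal Python sketch of the encoding direction
as described in RFC 2060, section 5.1.3 (a modified UTF-7). It is not
ICU's converter and does not use any ICU API; the function name
encode_mailbox_name is hypothetical and exists only for illustration.
Printable ASCII passes through unchanged, "&" becomes "&-", and every
other run of characters is written as "&" + modified base64 of its
UTF-16BE bytes + "-", where modified base64 uses "," instead of "/" and
omits "=" padding.

```python
import base64


def encode_mailbox_name(name: str) -> str:
    """Hypothetical helper: encode a mailbox name per RFC 2060, 5.1.3."""
    out = []
    pending = []  # current run of characters that need base64 encoding

    def flush() -> None:
        # Emit the pending run as "&<modified base64 of UTF-16BE>-".
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            # Printable ASCII represents itself, except "&" -> "&-".
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


if __name__ == "__main__":
    print(encode_mailbox_name("~peter/mail/日本語/台北"))
```

The sample name mixes English, Japanese, and Chinese text in the style of
the example in that RFC section. Decoding, and rejection of malformed
input, are left out of this sketch.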


To generate a diff of this commit:
cvs rdiff -r1.1 -r1.2 pkgsrc/textproc/icu/DESCR
cvs rdiff -r1.11 -r1.12 pkgsrc/textproc/icu/Makefile
cvs rdiff -r1.2 -r1.3 pkgsrc/textproc/icu/PLIST \
    pkgsrc/textproc/icu/buildlink2.mk pkgsrc/textproc/icu/distinfo
cvs rdiff -r1.3 -r1.4 pkgsrc/textproc/icu/patches/patch-aa \
    pkgsrc/textproc/icu/patches/patch-ab pkgsrc/textproc/icu/patches/patch-ac \
    pkgsrc/textproc/icu/patches/patch-ad
cvs rdiff -r1.3 -r0 pkgsrc/textproc/icu/patches/patch-ae
cvs rdiff -r1.4 -r0 pkgsrc/textproc/icu/patches/patch-af

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.