pkgsrc-Changes archive


CVS commit: pkgsrc/textproc/py-snowballstemmer



Module Name:    pkgsrc
Committed By:   adam
Date:           Mon May 25 06:30:05 UTC 2026

Modified Files:
        pkgsrc/textproc/py-snowballstemmer: Makefile PLIST distinfo

Log Message:
py-snowballstemmer: updated to 3.1.0

Snowball 3.1.0 (2026-05-22)

Compiler changes

* Bug fixes:

  + Fix segmentation fault if -syntax is used on a program with no code.

  + Fix segmentation fault on some assignment syntax errors.

  + Fix bug introduced in v3.0.0 with conversion of `among` starter.  If there
    were any commands after the among in the same command list then the among
    itself would get lost.  Not triggered by any current algorithms.

  + Clear name field when removing dead assignments.  This is visible in the
    syntax tree shown when command line option -syntax is used, but probably
    doesn't affect anything otherwise.

* Compiler command-line options:

  + Using `-` for the Snowball source file is now interpreted as stdin.

  + Improve comments generated by `-comments` to show more details of the
    corresponding Snowball code (e.g. variable names, arithmetic expressions,
    and literal strings).

  + Add `-coverage` option which enables a code coverage feature.  So far this
    tracks which `among` strings and functions are exercised, and which
    grouping characters are exercised.

  + Support `-eprefix` for all target languages.  This is easy to do and
    provides a way to deal with externals which collide with keywords in the
    target language.  Our build system now uses `-eprefix _` for Python to make
    the `stem` external non-public (it is called by BaseStemmer method
    `stemWord()`) and we no longer hard-code prefixing Python externals with
    `_`.

  + Describe more options in `--help` output.

  + Sort target language options in `--help` output.

  + The `-o` option is now optional.  If not specified we now write output(s)
    to the same filename as the first source, but with a different extension
    (e.g. path/to/english.sbl -> path/to/english.c and path/to/english.h).

  + The `-o` option can now optionally include an extension so you can now
    write `-c++ -o path/to/foo.cxx` instead of `-c++ -o path/to/foo`, which can
    be more convenient (e.g. in `make` rules) and also provides an easy way to
    specify an alternative extension (for example, `.cxx`, `.cc` and `.cpp` are
    all extensions commonly used for C++ source code).

  + Reject `-vprefix` option for target languages which don't support it (it is
    currently only implemented for C/C++).
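
The `-o` defaulting rules above can be sketched in Python as follows. This is a minimal illustration, not the compiler's actual code. The function name `output_paths` is hypothetical, and the sketch rests on three assumptions the changelog does not spell out: any suffix on the `-o` value counts as an extension, a companion header reuses the same base name, and `.cc` is only a placeholder default for C++. Only the C `.c`/`.h` pair comes from the changelog itself.

```python
import os.path
from typing import Optional, Tuple


def output_paths(
    first_source: str,
    default_ext: str,
    header_ext: Optional[str] = None,
    o: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Work out (source_output, header_output) for one compiler run.

    first_source: first Snowball source file on the command line.
    default_ext:  extension used when none is given (e.g. ".c" for C).
    header_ext:   extension of a companion header, or None if the target
                  language has no header file.
    o:            value of -o, or None if the option was omitted.
    """
    if o is None:
        # No -o: reuse the first source's path, swapping its extension.
        base, ext = os.path.splitext(first_source)[0], default_ext
    else:
        # -o may or may not carry an extension (assumption: any suffix
        # counts); keep it if present, otherwise use the default.
        base, given = os.path.splitext(o)
        ext = given or default_ext
    header = base + header_ext if header_ext is not None else None
    return base + ext, header
```

With the C defaults, `output_paths("path/to/english.sbl", ".c", ".h")` returns `("path/to/english.c", "path/to/english.h")`, matching the example given above, and `output_paths("english.sbl", ".cc", ".h", o="path/to/foo.cxx")` keeps the explicit `.cxx` extension.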

* Diagnostics:

  + Clean up and improve error reporting.

  + Improve line numbers reported for some errors and warnings by using the
    line number of an appropriate token rather than the current line number
    of the tokeniser (which is often the line after the command being warned
    about).

  + Improve recovery after various errors, trying to resynchronise based on
    what's more likely, and eliminating some additional irrelevant errors
    (including reporting the exact same error twice in some situations).

  + Emit warnings for uses of legacy Snowball language features.

  + The Snowball manual describes `integers (x)` as a declaration of `x` so we
    now warn:

      integer 'x' declared but not used

    rather than:

      integer 'x' defined but not used

  + 3.0.0 added a warning if the body of a `repeat` or `atleast` loop always
    signals `t` (meaning it will loop forever which is very undesirable for a
    stemming algorithm) or always signals `f` (meaning it will never loop,
    which seems unlikely to be what was intended).  This warning was added to
    the C generator, but has been moved to generic code so it is now issued
    regardless of the current target language.

  + Improve the wording of the warning if the body of a `repeat` or `atleast`
    loop always signals `t` to explicitly say this means the loop is infinite.

  + Improve warning message for unreachable code after `not`.

  + Fix `$x = x + 1` clearing the initialised status of `x` (rather than just
    not setting it), which could lead to bogus warnings that `x` is never
    initialised.

  + The compiler no longer exits immediately after reporting a division by zero
    error in the Snowball code.

  + We now report a division by zero error for `$x /= 0` (this was meant to be
    already implemented but wasn't working due to a code typo).

  + More consistent wording of "is a no-op" warnings.

  + Warn that `insert ''` and `attach ''` are no-ops (and don't generate code
    for them).

  + Warn if a string used to define a grouping repeats characters.  There's no
    reason to do this, so it seems likely to be a typo.

  + Avoid sometimes reporting "-1 blocks unfreed".
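
To make the `repeat`/`atleast` warning above more concrete, here is a small Python model of "possible signals" analysis over a toy subset of Snowball commands. The `Cmd` class, the op names and the warning strings are hypothetical. The signal rules follow the Snowball manual's descriptions of `true`, `false`, `not`, `try`, `or`, `and` and `repeat`, and a string literal is assumed to be able to signal either `t` or `f`. Each node caches its result, so a node that is shared or reached again is only analysed once.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional

T, F = "t", "f"


@dataclass(eq=False)  # compare nodes by identity, not by value
class Cmd:
    op: str  # "true", "false", "literal", "not", "try", "or", "and", "repeat"
    args: List["Cmd"] = field(default_factory=list)
    signals: Optional[FrozenSet[str]] = None  # filled in once by analyse()


def analyse(cmd: Cmd, warnings: List[str]) -> FrozenSet[str]:
    """Return the set of signals ("t"/"f") that `cmd` can produce."""
    if cmd.signals is not None:  # already computed: reuse it
        return cmd.signals
    sub = [analyse(a, warnings) for a in cmd.args]
    op = cmd.op
    if op == "true" or op == "try":  # try C always signals t
        result = frozenset({T})
    elif op == "false":
        result = frozenset({F})
    elif op == "literal":  # a string test may match or not
        result = frozenset({T, F})
    elif op == "not":  # swap t and f
        result = frozenset(F if s == T else T for s in sub[0])
    elif op == "or":  # C2 only runs if C1 can signal f
        first, second = sub
        result = (first & {T}) | (second if F in first else frozenset())
    elif op == "and":  # C2 only runs if C1 can signal t
        first, second = sub
        result = (first & {F}) | (second if T in first else frozenset())
    elif op == "repeat":  # loops while the body signals t, then signals t
        if sub[0] == {T}:
            warnings.append("repeat body always signals t: loop is infinite")
        elif sub[0] == {F}:
            warnings.append("repeat body always signals f: loop never repeats")
        result = frozenset({T})
    else:
        raise ValueError(f"unknown command {op!r}")
    cmd.signals = result
    return result
```

In this model, `repeat try 'x'` (written as `Cmd("repeat", [Cmd("try", [Cmd("literal")])])`) triggers the infinite-loop warning, because `try` always signals `t`.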

* Optimisations:

  + Speed up processing larger Snowball programs by growing large string
    buffers exponentially to avoid a huge number of reallocations.  For
    example, this reduced the time to compile serbian.sbl to C by about 80%!

  + Optimise reading of input file when it is seekable (which it is in typical
    usage).  Non-seekable input files are still supported.

  + Optimise writing integers when generating code.  72% of integers we write
    are 0 to 9 and these are now written as a character.  Other values are now
    handled without a temporary buffer, avoiding a copy.  This reduced the time
    to compile serbian.sbl to C by about 8%, for example.

  + Optimise comparing among actions to find and merge equivalent actions.
    The comparison function used for this was carefully returning a full order,
    but actually we only need to know if the actions are equivalent or not
    which can be tested more efficiently.  For example, this reduced the time
    to compile serbian.sbl to C by about 2%.

  + We now precompute the possible signals from each command which means this
    is now done exactly once per command, whereas previously we could end up
    doing it many times for some commands in some cases.  The only functional
    change should be that we no longer make a pessimistic assumption if the
    function call depth reaches 100.  This is cleaner but is unlikely to make a
    difference for any real-world Snowball programs.

  + Handle possible_signals for string-$ which just passes on signals from its
    subcommand.  This doesn't affect code generation for any algorithms we
    currently ship.

  + We now only generate function bodies to a temporary buffer for target
    languages where we need to.  This makes the code a bit clearer and reduces
    the amount of copying of data so will make the Snowball compiler a little
    faster.  This change produces identical output for all current algorithms.

  + Tokenisation now decodes symbol tokens using switch statements.  We don't
    know the length of these tokens in advance, so the old approach of binary
    chop on a sorted list required searching the list multiple times with
    different possible lengths.  Alphabetical tokens are still decoded by
    binary chop.
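
The first optimisation above relies on the standard argument for geometric buffer growth. The sketch below is a toy model, not Snowball's allocator: the function names are hypothetical and the 16-byte starting size and 16-byte increment are arbitrary. It counts how many reallocations it takes to grow a buffer to 1,000,000 bytes when appending one byte at a time.

```python
from typing import Callable


def count_resizes(final_size: int, initial: int, grow: Callable[[int], int]) -> int:
    """Number of reallocations needed to grow a buffer from `initial`
    capacity until it can hold `final_size` bytes, appending one byte
    at a time."""
    capacity, resizes = initial, 0
    while capacity < final_size:
        new_capacity = grow(capacity)
        if new_capacity <= capacity:
            raise ValueError("growth function must increase the capacity")
        capacity = new_capacity
        resizes += 1
    return resizes


def linear(cap: int) -> int:
    return cap + 16  # fixed increment (illustrative value)


def doubling(cap: int) -> int:
    return cap * 2  # exponential growth


if __name__ == "__main__":
    for name, grow in (("linear +16", linear), ("doubling", doubling)):
        print(name, count_resizes(1_000_000, 16, grow))
```

Run as a script, it reports 62499 resizes for the fixed increment and 16 for doubling. Since each resize may copy the whole buffer, fixed-increment growth can do quadratic total copying while doubling keeps it linear, which is consistent with (though not a measurement of) the speed-up reported for serbian.sbl.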

* Code quality:

  + Remove unused routines and groupings from the program during the analysis
    phase, which avoids each generator having to have duplicate code to skip
    them.

  + Fix small memory leak if all uses of a name are eliminated.

  + Always use `snprintf()` instead of `sprintf()`.  If the buffer passed was
    too small we now emit an error rather than quietly using truncated output.

  + Fix GCC -Wcast-qual warnings in compiler and enable this warning by
    default.

  + Switch to using the standard C `bool` type in the code of the compiler.
    (The generated code still aims to require only C90.)

* Other changes:

  + Provide a simpler way to build a cut-down Snowball compiler.  The
    motivation here was to have a way to more quickly build a smaller Snowball
    compiler which only targets C.  Rather than have a DISABLE_xxx macro for
    each language, just check if TARGET_C_ONLY is defined, and only turn off
    the code to actually call the other generators which greatly reduces the
    amount of conditionalisation required.


To generate a diff of this commit:
cvs rdiff -u -r1.9 -r1.10 pkgsrc/textproc/py-snowballstemmer/Makefile \
    pkgsrc/textproc/py-snowballstemmer/distinfo
cvs rdiff -u -r1.7 -r1.8 pkgsrc/textproc/py-snowballstemmer/PLIST

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: pkgsrc/textproc/py-snowballstemmer/Makefile
diff -u pkgsrc/textproc/py-snowballstemmer/Makefile:1.9 pkgsrc/textproc/py-snowballstemmer/Makefile:1.10
--- pkgsrc/textproc/py-snowballstemmer/Makefile:1.9     Sun May 11 10:33:49 2025
+++ pkgsrc/textproc/py-snowballstemmer/Makefile Mon May 25 06:30:05 2026
@@ -1,6 +1,6 @@
-# $NetBSD: Makefile,v 1.9 2025/05/11 10:33:49 wiz Exp $
+# $NetBSD: Makefile,v 1.10 2026/05/25 06:30:05 adam Exp $
 
-DISTNAME=      snowballstemmer-3.0.1
+DISTNAME=      snowballstemmer-3.1.0
 PKGNAME=       ${PYPKGPREFIX}-${DISTNAME}
 CATEGORIES=    textproc python
 MASTER_SITES=  ${MASTER_SITE_PYPI:=s/snowballstemmer/}
Index: pkgsrc/textproc/py-snowballstemmer/distinfo
diff -u pkgsrc/textproc/py-snowballstemmer/distinfo:1.9 pkgsrc/textproc/py-snowballstemmer/distinfo:1.10
--- pkgsrc/textproc/py-snowballstemmer/distinfo:1.9     Sun May 11 10:33:49 2025
+++ pkgsrc/textproc/py-snowballstemmer/distinfo Mon May 25 06:30:05 2026
@@ -1,5 +1,5 @@
-$NetBSD: distinfo,v 1.9 2025/05/11 10:33:49 wiz Exp $
+$NetBSD: distinfo,v 1.10 2026/05/25 06:30:05 adam Exp $
 
-BLAKE2s (snowballstemmer-3.0.1.tar.gz) = ff1118cc3bc93e279d7a3233d1b27ddfc0f10a462ba959e4db7d66883c815366
-SHA512 (snowballstemmer-3.0.1.tar.gz) = a9590da2b0be4b93a7500b337a63cf2039ff01a6da309ddb9462961c309b4763d4dfc925965a62376a9f3b41a05bb634d6472f1e2ee07e53b38f8542e7eada82
-Size (snowballstemmer-3.0.1.tar.gz) = 105575 bytes
+BLAKE2s (snowballstemmer-3.1.0.tar.gz) = ff599bde738bd536e05a10e3beaa2c6f4a43d7781a5fdf344208680d971ab3b8
+SHA512 (snowballstemmer-3.1.0.tar.gz) = 02d3022c76c3e6da37c599b9a58855e538ce5bddf0533c4b32ffeb44e426cbf998f20746c9563a5e05956c3118ce985a5129573f4342bdc94ed2dc1d1d62214d
+Size (snowballstemmer-3.1.0.tar.gz) = 122523 bytes

Index: pkgsrc/textproc/py-snowballstemmer/PLIST
diff -u pkgsrc/textproc/py-snowballstemmer/PLIST:1.7 pkgsrc/textproc/py-snowballstemmer/PLIST:1.8
--- pkgsrc/textproc/py-snowballstemmer/PLIST:1.7        Sun May 11 10:33:49 2025
+++ pkgsrc/textproc/py-snowballstemmer/PLIST    Mon May 25 06:30:05 2026
@@ -1,4 +1,4 @@
-@comment $NetBSD: PLIST,v 1.7 2025/05/11 10:33:49 wiz Exp $
+@comment $NetBSD: PLIST,v 1.8 2026/05/25 06:30:05 adam Exp $
 ${PYSITELIB}/${WHEEL_INFODIR}/METADATA
 ${PYSITELIB}/${WHEEL_INFODIR}/RECORD
 ${PYSITELIB}/${WHEEL_INFODIR}/WHEEL
@@ -25,6 +25,9 @@ ${PYSITELIB}/snowballstemmer/basque_stem
 ${PYSITELIB}/snowballstemmer/catalan_stemmer.py
 ${PYSITELIB}/snowballstemmer/catalan_stemmer.pyc
 ${PYSITELIB}/snowballstemmer/catalan_stemmer.pyo
+${PYSITELIB}/snowballstemmer/czech_stemmer.py
+${PYSITELIB}/snowballstemmer/czech_stemmer.pyc
+${PYSITELIB}/snowballstemmer/czech_stemmer.pyo
 ${PYSITELIB}/snowballstemmer/danish_stemmer.py
 ${PYSITELIB}/snowballstemmer/danish_stemmer.pyc
 ${PYSITELIB}/snowballstemmer/danish_stemmer.pyo
@@ -79,6 +82,12 @@ ${PYSITELIB}/snowballstemmer/nepali_stem
 ${PYSITELIB}/snowballstemmer/norwegian_stemmer.py
 ${PYSITELIB}/snowballstemmer/norwegian_stemmer.pyc
 ${PYSITELIB}/snowballstemmer/norwegian_stemmer.pyo
+${PYSITELIB}/snowballstemmer/persian_stemmer.py
+${PYSITELIB}/snowballstemmer/persian_stemmer.pyc
+${PYSITELIB}/snowballstemmer/persian_stemmer.pyo
+${PYSITELIB}/snowballstemmer/polish_stemmer.py
+${PYSITELIB}/snowballstemmer/polish_stemmer.pyc
+${PYSITELIB}/snowballstemmer/polish_stemmer.pyo
 ${PYSITELIB}/snowballstemmer/porter_stemmer.py
 ${PYSITELIB}/snowballstemmer/porter_stemmer.pyc
 ${PYSITELIB}/snowballstemmer/porter_stemmer.pyo
@@ -94,6 +103,9 @@ ${PYSITELIB}/snowballstemmer/russian_ste
 ${PYSITELIB}/snowballstemmer/serbian_stemmer.py
 ${PYSITELIB}/snowballstemmer/serbian_stemmer.pyc
 ${PYSITELIB}/snowballstemmer/serbian_stemmer.pyo
+${PYSITELIB}/snowballstemmer/sesotho_stemmer.py
+${PYSITELIB}/snowballstemmer/sesotho_stemmer.pyc
+${PYSITELIB}/snowballstemmer/sesotho_stemmer.pyo
 ${PYSITELIB}/snowballstemmer/spanish_stemmer.py
 ${PYSITELIB}/snowballstemmer/spanish_stemmer.pyc
 ${PYSITELIB}/snowballstemmer/spanish_stemmer.pyo


