pkgsrc-Changes archive
CVS commit: pkgsrc/textproc/py-snowballstemmer
Module Name: pkgsrc
Committed By: adam
Date: Mon May 25 06:30:05 UTC 2026
Modified Files:
pkgsrc/textproc/py-snowballstemmer: Makefile PLIST distinfo
Log Message:
py-snowballstemmer: updated to 3.1.0
Snowball 3.1.0 (2026-05-22)
Compiler changes
* Bug fixes:
+ Fix segmentation fault if -syntax is used on a program with no code.
+ Fix segmentation fault on some assignment syntax errors.
+ Fix bug introduced in v3.0.0 with conversion of `among` starter. If there
were any commands after the among in the same command list then the among
itself would get lost. Not triggered by any current algorithms.
+ Clear name field when removing dead assignments. This is visible in the
syntax tree shown when command line option -syntax is used, but probably
doesn't affect anything otherwise.
* Compiler command-line options:
+ Using `-` for the Snowball source file is now interpreted as stdin.
+ Improve comments generated by `-comments` to show more details of the
corresponding Snowball code (e.g. variable names, arithmetic expressions,
and literal strings).
+ Add `-coverage` option which enables a code coverage feature. So far this
tracks which among strings and functions are exercised, and which grouping
characters are exercised.
+ Support `-eprefix` for all target languages. This is easy to do and
provides a way to deal with externals which collide with keywords in the
target language. Our build system now uses `-eprefix _` for Python to make
the `stem` external non-public (it is called by BaseStemmer method
`stemWord()`) and we no longer hard-code prefixing Python externals with
`_`.
+ Describe more options in `--help` output.
+ Sort target language options in `--help` output.
+ The `-o` option is now optional. If not specified we now write output(s)
to the same filename as the first source, but with a different extension
(e.g. path/to/english.sbl -> path/to/english.c and path/to/english.h).
+ The `-o` option can now optionally include an extension, so you can write
`-c++ -o path/to/foo.cxx` instead of `-c++ -o path/to/foo`. This can be
more convenient (e.g. in `make` rules) and also provides an easy way to
specify an alternative extension (for example, `.cxx`, `.cc` and `.cpp` are
all extensions commonly used for C++ source code).
+ Reject `-vprefix` option for target languages which don't support it (it is
currently only implemented for C/C++).
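As a rough illustration of the new `-o` handling, here is a minimal Python sketch of how output filenames could be derived. The helper `output_paths` and its parameters are hypothetical and not part of Snowball, the `.cc` default for C++ used below is an assumption, and the real compiler (written in C) may differ in details such as header naming.

```python
import os
from typing import List, Optional


def output_paths(source: str, o_option: Optional[str], default_ext: str,
                 header_ext: Optional[str] = None) -> List[str]:
    """Hypothetical sketch of output filename derivation.

    source      -- first Snowball source file, e.g. "path/to/english.sbl"
    o_option    -- value passed to -o, or None if -o was not given
    default_ext -- extension the target language uses when none is given
    header_ext  -- extension of a companion header, if the target emits one
    """
    if o_option is None:
        # No -o: reuse the first source's path, dropping its extension.
        base, ext = os.path.splitext(source)[0], default_ext
    else:
        base, given_ext = os.path.splitext(o_option)
        # An extension included in -o overrides the language default.
        ext = given_ext or default_ext
    paths = [base + ext]
    if header_ext is not None:
        paths.append(base + header_ext)
    return paths


print(output_paths("path/to/english.sbl", None, ".c", ".h"))
# ['path/to/english.c', 'path/to/english.h']
print(output_paths("path/to/foo.sbl", "path/to/foo.cxx", ".cc", ".h"))  # ".cc" assumed
# ['path/to/foo.cxx', 'path/to/foo.h']
```

Splitting off the extension with `os.path.splitext` means `-o path/to/foo` and `-o path/to/foo.cxx` share one code path: an empty extension simply falls back to the language default.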
* Diagnostics:
+ Clean up and improve error reporting.
+ Improve line numbers reported for some errors and warnings by using the
line number of an appropriate token rather than the current line number
of the tokeniser (which is often the line after the command being warned
about).
+ Improve recovery after various errors, trying to resynchronise based on
what's more likely, and eliminating some additional irrelevant errors
(including reporting the exact same error twice in some situations).
+ Emit warnings for uses of legacy Snowball language features.
+ The Snowball manual describes `integers (x)` as a declaration of `x`, so we
now warn:
    integer 'x' declared but not used
rather than:
    integer 'x' defined but not used
+ 3.0.0 added a warning if the body of a `repeat` or `atleast` loop always
signals `t` (meaning it will loop forever, which is very undesirable for a
stemming algorithm) or always signals `f` (meaning it will never loop,
which seems unlikely to be what was intended). This warning was added to
the C generator, but has been moved to generic code so it is now issued
regardless of the current target language.
+ Improve the wording of the warning if the body of a `repeat` or `atleast`
loop always signals `t` to explicitly say this means the loop is infinite.
+ Improve warning message for unreachable code after `not`.
+ `$x = x + 1` cleared the initialised status of x (rather than just not
setting it) which could lead to bogus warnings that `x` is never
initialised.
+ The compiler no longer exits immediately after reporting a division by zero
error in the Snowball code.
+ We now report a division by zero error for `$x /= 0` (this was meant to be
already implemented but wasn't working due to a code typo).
+ More consistent wording of "is a no-op" warnings.
+ Warn that `insert ''` and `attach ''` are no-ops (and don't generate code
for them).
+ Warn if a string used to define a grouping repeats characters. There's no
reason to do this, so it seems likely to be a typo.
+ Avoid sometimes reporting "-1 blocks unfreed".
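The new check for repeated characters in a grouping string amounts to spotting duplicates in a string. Below is a minimal Python sketch of that idea; the function name, the example grouping names and the warning wording are hypothetical and do not reproduce the compiler's actual code or messages.

```python
def repeated_grouping_chars(chars: str) -> list:
    """Return each character that occurs more than once in `chars`,
    in the order in which its first repeat is seen."""
    seen = set()
    repeats = []
    for ch in chars:
        if ch in seen and ch not in repeats:
            repeats.append(ch)
        seen.add(ch)
    return repeats


# Hypothetical grouping definitions: name -> defining string.
groupings = {"v": "aeiouy", "v_typo": "aeiouya"}
for name, chars in groupings.items():
    dups = repeated_grouping_chars(chars)
    if dups:
        # Illustrative wording only, not the compiler's actual message.
        print(f"warning: grouping '{name}' repeats {''.join(dups)!r}")
# prints: warning: grouping 'v_typo' repeats 'a'
```

Since a grouping is a set of characters, a repeat never changes its meaning, which is why it is flagged as a probable typo rather than an error.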
* Optimisations:
+ Speed up processing larger Snowball programs by growing large string
buffers exponentially to avoid a huge number of reallocations. For
example, this reduced the time to compile serbian.sbl to C by about 80%!
+ Optimise reading of input file when it is seekable (which it is in typical
usage). Non-seekable input files are still supported.
+ Optimise writing integers when generating code. 72% of integers we write
are 0 to 9 and these are now written as a single character. Other values are now
handled without a temporary buffer, avoiding a copy. This reduced the time
to compile serbian.sbl to C by about 8%, for example.
+ Optimise comparing among actions to find and merge equivalent actions.
The comparison function used for this was carefully returning a total order,
but actually we only need to know if the actions are equivalent or not
which can be tested more efficiently. For example, this reduced the time
to compile serbian.sbl to C by about 2%.
+ We now precompute the possible signals from each command which means this
is now done exactly once per command, whereas previously we could end up
doing it many times for some commands in some cases. The only functional
change should be that we no longer make a pessimistic assumption if the function
call depth reaches 100. This is cleaner but is unlikely to make a
difference for any real-world Snowball programs.
+ Handle possible_signals for string-$ which just passes on signals from its
subcommand. This doesn't affect code generation for any algorithms we
currently ship.
+ We now only generate function bodies to a temporary buffer for target
languages where we need to. This makes the code a bit clearer and reduces
the amount of data copying, so it should make the Snowball compiler a little
faster. This change produces identical output for all current algorithms.
+ Tokenisation now decodes symbol tokens using switch statements. We don't
know the length of these tokens in advance, so the old approach of binary
chop on a sorted list required searching the list multiple times with
different possible lengths. Alphabetical tokens are still decoded by
binary chop.
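To see why growing string buffers exponentially cuts the number of reallocations so sharply, here is a small Python simulation. The function names, the 16-byte append size, the 64-byte fixed slack and the doubling factor are all assumptions chosen for illustration; the changelog does not state Snowball's actual growth policy.

```python
def count_reallocations(total_bytes: int, chunk: int, grow) -> int:
    """Simulate appending `chunk` bytes at a time to a buffer until
    `total_bytes` have been written, and count how many times the
    buffer's capacity must be increased (i.e. how many reallocations)."""
    capacity = size = reallocs = 0
    while size < total_bytes:
        size += chunk
        if size > capacity:
            capacity = grow(capacity, size)
            reallocs += 1
    return reallocs


def grow_by_fixed_slack(capacity: int, needed: int) -> int:
    # Allocate just what is needed plus a small fixed amount of slack.
    return needed + 64


def grow_by_doubling(capacity: int, needed: int) -> int:
    # Exponential growth: at least double the current capacity.
    return max(needed, 2 * capacity)


print(count_reallocations(1_000_000, 16, grow_by_fixed_slack))  # 12500
print(count_reallocations(1_000_000, 16, grow_by_doubling))     # 17
```

With a fixed slack the number of reallocations grows linearly with the output size, whereas doubling makes it logarithmic. In C each `realloc()` may also copy the whole buffer, so the linear policy can cost time quadratic in the buffer size.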
* Code quality:
+ Remove unused routines and groupings from the program during the analysis
phase, which avoids each generator having to have duplicate code to skip
them.
+ Fix small memory leak if all uses of a name are eliminated.
+ Always use `snprintf()` instead of `sprintf()`. If the buffer passed was
too small we now emit an error rather than quietly using truncated output.
+ Fix GCC -Wcast-qual warnings in compiler and enable this warning by
default.
+ Switch to using the standard C `bool` type in the code of the compiler.
(The generated code still aims to require only C90.)
* Other changes:
+ Provide a simpler way to build a cut-down Snowball compiler. The
motivation here was to have a way to more quickly build a smaller Snowball
compiler which only targets C. Rather than having a DISABLE_xxx macro for
each language, we just check whether TARGET_C_ONLY is defined and only turn
off the code that actually calls the other generators, which greatly
reduces the amount of conditionalisation required.
To generate a diff of this commit:
cvs rdiff -u -r1.9 -r1.10 pkgsrc/textproc/py-snowballstemmer/Makefile \
pkgsrc/textproc/py-snowballstemmer/distinfo
cvs rdiff -u -r1.7 -r1.8 pkgsrc/textproc/py-snowballstemmer/PLIST
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Modified files:
Index: pkgsrc/textproc/py-snowballstemmer/Makefile
diff -u pkgsrc/textproc/py-snowballstemmer/Makefile:1.9 pkgsrc/textproc/py-snowballstemmer/Makefile:1.10
--- pkgsrc/textproc/py-snowballstemmer/Makefile:1.9 Sun May 11 10:33:49 2025
+++ pkgsrc/textproc/py-snowballstemmer/Makefile Mon May 25 06:30:05 2026
@@ -1,6 +1,6 @@
-# $NetBSD: Makefile,v 1.9 2025/05/11 10:33:49 wiz Exp $
+# $NetBSD: Makefile,v 1.10 2026/05/25 06:30:05 adam Exp $
-DISTNAME= snowballstemmer-3.0.1
+DISTNAME= snowballstemmer-3.1.0
PKGNAME= ${PYPKGPREFIX}-${DISTNAME}
CATEGORIES= textproc python
MASTER_SITES= ${MASTER_SITE_PYPI:=s/snowballstemmer/}
Index: pkgsrc/textproc/py-snowballstemmer/distinfo
diff -u pkgsrc/textproc/py-snowballstemmer/distinfo:1.9 pkgsrc/textproc/py-snowballstemmer/distinfo:1.10
--- pkgsrc/textproc/py-snowballstemmer/distinfo:1.9 Sun May 11 10:33:49 2025
+++ pkgsrc/textproc/py-snowballstemmer/distinfo Mon May 25 06:30:05 2026
@@ -1,5 +1,5 @@
-$NetBSD: distinfo,v 1.9 2025/05/11 10:33:49 wiz Exp $
+$NetBSD: distinfo,v 1.10 2026/05/25 06:30:05 adam Exp $
-BLAKE2s (snowballstemmer-3.0.1.tar.gz) = ff1118cc3bc93e279d7a3233d1b27ddfc0f10a462ba959e4db7d66883c815366
-SHA512 (snowballstemmer-3.0.1.tar.gz) = a9590da2b0be4b93a7500b337a63cf2039ff01a6da309ddb9462961c309b4763d4dfc925965a62376a9f3b41a05bb634d6472f1e2ee07e53b38f8542e7eada82
-Size (snowballstemmer-3.0.1.tar.gz) = 105575 bytes
+BLAKE2s (snowballstemmer-3.1.0.tar.gz) = ff599bde738bd536e05a10e3beaa2c6f4a43d7781a5fdf344208680d971ab3b8
+SHA512 (snowballstemmer-3.1.0.tar.gz) = 02d3022c76c3e6da37c599b9a58855e538ce5bddf0533c4b32ffeb44e426cbf998f20746c9563a5e05956c3118ce985a5129573f4342bdc94ed2dc1d1d62214d
+Size (snowballstemmer-3.1.0.tar.gz) = 122523 bytes
Index: pkgsrc/textproc/py-snowballstemmer/PLIST
diff -u pkgsrc/textproc/py-snowballstemmer/PLIST:1.7 pkgsrc/textproc/py-snowballstemmer/PLIST:1.8
--- pkgsrc/textproc/py-snowballstemmer/PLIST:1.7 Sun May 11 10:33:49 2025
+++ pkgsrc/textproc/py-snowballstemmer/PLIST Mon May 25 06:30:05 2026
@@ -1,4 +1,4 @@
-@comment $NetBSD: PLIST,v 1.7 2025/05/11 10:33:49 wiz Exp $
+@comment $NetBSD: PLIST,v 1.8 2026/05/25 06:30:05 adam Exp $
${PYSITELIB}/${WHEEL_INFODIR}/METADATA
${PYSITELIB}/${WHEEL_INFODIR}/RECORD
${PYSITELIB}/${WHEEL_INFODIR}/WHEEL
@@ -25,6 +25,9 @@ ${PYSITELIB}/snowballstemmer/basque_stem
${PYSITELIB}/snowballstemmer/catalan_stemmer.py
${PYSITELIB}/snowballstemmer/catalan_stemmer.pyc
${PYSITELIB}/snowballstemmer/catalan_stemmer.pyo
+${PYSITELIB}/snowballstemmer/czech_stemmer.py
+${PYSITELIB}/snowballstemmer/czech_stemmer.pyc
+${PYSITELIB}/snowballstemmer/czech_stemmer.pyo
${PYSITELIB}/snowballstemmer/danish_stemmer.py
${PYSITELIB}/snowballstemmer/danish_stemmer.pyc
${PYSITELIB}/snowballstemmer/danish_stemmer.pyo
@@ -79,6 +82,12 @@ ${PYSITELIB}/snowballstemmer/nepali_stem
${PYSITELIB}/snowballstemmer/norwegian_stemmer.py
${PYSITELIB}/snowballstemmer/norwegian_stemmer.pyc
${PYSITELIB}/snowballstemmer/norwegian_stemmer.pyo
+${PYSITELIB}/snowballstemmer/persian_stemmer.py
+${PYSITELIB}/snowballstemmer/persian_stemmer.pyc
+${PYSITELIB}/snowballstemmer/persian_stemmer.pyo
+${PYSITELIB}/snowballstemmer/polish_stemmer.py
+${PYSITELIB}/snowballstemmer/polish_stemmer.pyc
+${PYSITELIB}/snowballstemmer/polish_stemmer.pyo
${PYSITELIB}/snowballstemmer/porter_stemmer.py
${PYSITELIB}/snowballstemmer/porter_stemmer.pyc
${PYSITELIB}/snowballstemmer/porter_stemmer.pyo
@@ -94,6 +103,9 @@ ${PYSITELIB}/snowballstemmer/russian_ste
${PYSITELIB}/snowballstemmer/serbian_stemmer.py
${PYSITELIB}/snowballstemmer/serbian_stemmer.pyc
${PYSITELIB}/snowballstemmer/serbian_stemmer.pyo
+${PYSITELIB}/snowballstemmer/sesotho_stemmer.py
+${PYSITELIB}/snowballstemmer/sesotho_stemmer.pyc
+${PYSITELIB}/snowballstemmer/sesotho_stemmer.pyo
${PYSITELIB}/snowballstemmer/spanish_stemmer.py
${PYSITELIB}/snowballstemmer/spanish_stemmer.pyc
${PYSITELIB}/snowballstemmer/spanish_stemmer.pyo