pkgsrc-Changes-HG archive


[pkgsrc/trunk]: pkgsrc/textproc/xapian Update to 1.4.4. From the changelog:



details:   https://anonhg.NetBSD.org/pkgsrc/rev/b537d8ed1079
branches:  trunk
changeset: 365050:b537d8ed1079
user:      schmonz <schmonz%pkgsrc.org@localhost>
date:      Sun Jul 09 22:27:43 2017 +0000

description:
Update to 1.4.4. From the changelog:

API:

* Database::check():

  + Fix checking a single table - changes in 1.4.2 broke such checks unless you
    specified the table without any extension.

  + Errors from failing to find the file specified are now thrown as
    DatabaseOpeningError (was DatabaseError, of which DatabaseOpeningError is
    a subclass so existing code should continue to work).  Also improved the
    error message when the file doesn't exist.

* Drop OP_SCALE_WEIGHT over OP_VALUE_RANGE, OP_VALUE_GE and OP_VALUE_LE in the
  Query constructor.  These operators always return weight 0 so OP_SCALE_WEIGHT
  over them has no effect.  Eliminating it at query construction time is cheap
  (we only need to check the type of the subquery), eliminates the confusing
  "0 * " from the query description, and means the OP_SCALE_WEIGHT Query object
  can be released sooner.  Inspired by Shivanshu Chauhan asking about the query
  description on IRC.

* Drop OP_SCALE_WEIGHT on the right side of OP_AND_NOT in the Query
  constructor.  OP_AND_NOT takes no weight from the right so OP_SCALE_WEIGHT
  has no effect there.  Eliminating it at query construction time is cheap
  (just need to check the subquery's type), eliminates the confusing "0 * "
  from the query description, and means the OP_SCALE_WEIGHT object can be
  released sooner.

* MSet::snippet(): Favour candidate snippets which contain more of a diversity
  of matching terms by discounting the relevance of repeated terms using an
  exponential decay.  A snippet which contains more terms from the query is
  likely to be better than one which contains the same term or terms multiple
  times, but a repeated term is still interesting, just less with each
  additional appearance.  Diversity issue highlighted by Robert Stepanek's
  patch in https://github.com/xapian/xapian/pull/117 - testcases taken from his
  patch.

* MSet::snippet(): New flag SNIPPET_EMPTY_WITHOUT_MATCH to get an empty snippet
  if there are no matches in the text passed in.  Implemented by Robert
  Stepanek.

* Round MSet::get_matches_estimated() to an appropriate number of significant
  figures.  The algorithm used looks at the lower and upper bound and where the
  estimate sits between them, and then picks an appropriate number of
  significant figures.  Thanks to Sébastien Le Callonnec for help sorting out a
  portability issue on OS X.

* Add Database::locked() method - where possible this non-invasively checks if
  the database is currently open for writing, which can be useful for
  dashboards and other status reporting tools.
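
To illustrate the MSet::snippet() diversity change listed above, here is a
minimal sketch of discounting repeated terms with an exponential decay.  It is
not Xapian's implementation: the function name score_snippet, the per-term
relevance values and the decay factor of 0.5 are all assumptions made for the
example.

```python
from collections import Counter

# Hypothetical decay factor: each repeat of a term counts half as much
# as the previous appearance.
DECAY = 0.5


def score_snippet(words, relevance, decay=DECAY):
    """Score a candidate snippet, discounting repeated query terms.

    words: the tokens in the candidate snippet.
    relevance: mapping of query term -> base relevance (illustrative values).
    """
    seen = Counter()
    score = 0.0
    for word in words:
        if word in relevance:
            # The n-th appearance of a term adds relevance * decay**(n-1).
            score += relevance[word] * decay ** seen[word]
            seen[word] += 1
    return score


relevance = {"xapian": 1.0, "snippet": 1.0}
repeated = ["xapian", "is", "xapian", "and", "xapian"]
diverse = ["xapian", "builds", "a", "snippet", "here"]

# One term three times: 1 + 0.5 + 0.25 = 1.75
# Two distinct terms:   1 + 1          = 2.0
print(score_snippet(repeated, relevance), score_snippet(diverse, relevance))
```

With these made-up numbers the snippet containing two different query terms
outscores the one repeating a single term three times, while each repeat still
adds something.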

testsuite:

* Add more tests of Database::check().  Fixes #238, reported by Richard
  Boulton.

* Make apitest testcase nosuchdb1 fail if we manage to open the DB.

* Skip testcases which throw NetworkError with errno value ECHILD - this
  indicates system resource starvation rather than a Xapian bug.  Such failures
  are seen on Debian buildds from time to time, see:
  https://bugs.debian.org/681941

* Use terms that exist in the database for most snippet tests.  It's good to
  test that snippet highlighting works for terms that aren't in the database,
  but it's not good for all our snippet tests to feature such terms - it's
  not the common usage.

matcher:

* Fix incorrect results due to uninitialised memory.  The array holding max
  weight values in MultiAndPostList is never initialised if the operator is
  unweighted, but the values are still used to calculate the max weight to pass
  to subqueries, leading to incorrect results.  This can be observed with an OR
  under an unweighted AND (e.g. OR under AND on the right side of AND_NOT).
  The fix applied is to simply default initialise this array, which should lead
  to a max weight of 0.0 being passed on to subqueries.  Bug reported in
  notmuch by Kirill A. Shutemov, and forwarded by David Bremner.

* Improve value range upper bound and estimated matches.  The value slot
  frequency provides a tighter upper bound than Database::get_doccount().
  The estimate is now calculated by working out the proportion of possible
  values between the slot lower and upper bounds which the range covers
  (assuming a uniform distribution).  This seems to work fairly well in
  practice, and is certainly better than the crude estimate we were using:
  Database::get_doccount() / 2

* Handle arbitrary combinations of OP_OR under OP_NEAR/OP_PHRASE, partly
  addressing #508.  Thanks to Jean-Francois Dockes for motivation and testing.

* Only convert OP_PHRASE to OP_AND if full DB has no positions.  Until now the
  conversion was done independently for each sub-database, but being consistent
  with the results from a database containing all the same documents seems more
  useful.

* Avoid double get_wdf() call for first subquery of OP_NEAR and OP_PHRASE,
  which will speed them up by a small amount.
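
The value range estimate described above can be sketched roughly as follows.
This is only an illustration under simplifying assumptions: values are treated
as plain numbers (Xapian stores slot values as strings), and the function name
and parameters are hypothetical rather than taken from the Xapian source.

```python
def estimate_range_matches(range_lower, range_upper,
                           slot_lower, slot_upper, slot_freq):
    """Estimate matches for a value range, assuming a uniform distribution.

    slot_lower/slot_upper: bounds on the values stored in the slot.
    slot_freq: number of documents with a value in the slot; this is the
               upper bound on how many documents the range can match.
    """
    # Clip the requested range to the values which can actually occur.
    lo = max(range_lower, slot_lower)
    hi = min(range_upper, slot_upper)
    if lo > hi:
        # The range lies entirely outside the stored values.
        return 0
    if slot_upper == slot_lower:
        # Every stored value is identical and lies inside the range.
        return slot_freq
    # Proportion of the possible values which the range covers.
    proportion = (hi - lo) / (slot_upper - slot_lower)
    return round(slot_freq * proportion)


# 400 documents have a value in the slot, with values between 0 and 100.
print(estimate_range_matches(25, 75, 0, 100, 400))
```

For comparison, if that database held 1000 documents in total, the older crude
estimate of Database::get_doccount() / 2 would give 500 regardless of the range
asked for.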

documentation:

* Correct "Query::feature_flag" -> "QueryParser::feature_flag".  Fixes #747,
  reported by James Aylett.

* Rename set_metadata() `value` parameter to `metadata`.  This change is
  particularly motivated by making it easier to map this case specially in SWIG
  bindings, but the new name is also clearer and better documents its purpose.

* Rename value range parameters.  The new names (`range_limit` instead of
  `limit`, `range_lower` instead of `begin` and `range_upper` instead of `end`)
  are particularly motivated by making it easier to map them specially in SWIG
  bindings, but they're also clearer names which better document their
  purposes.

* Change "(key, tag)" to "(key, value)" in user metadata docs.  The user
  metadata is essentially what's often called a "key-value store" so users
  are likely to be familiar with that terminology.

* Consistently name parameter of Weight::unserialise() overridden forms.
  In xapian/weight.h it was almost always named `serialised`, but LMWeight
  named it `s` and CoordWeight omitted the name.

* Fix various minor documentation comment typos.

* INSTALL: Update section about -Bsymbolic-functions which is not a new
  GNU ld feature at this point.

tools:

* xapian-delve: Uses new Database::locked() method to report if the database
  is currently locked.

portability:

* Fix configure probe for __builtin_exp10() to work around bug on mingw - there
  GCC generates a call to exp10() for __builtin_exp10() but there is no exp10()
  function in the C library, so we get a link failure.  Use a full link test
  instead to avoid this issue.  Reported by Mario Emmenlauer on xapian-devel.

* Fix configure probe for log2() which was failing on at least some platforms
  due to ambiguity between overloaded forms of log2().  Make the probe
  explicitly check for log2(double) to avoid this problem.

* Work around the unhelpful semantics of AI_ADDRCONFIG on platforms which follow
  the old RFC instead of POSIX (such as Linux) - if only loopback networking is
  configured, localhost won't resolve by name or IP address, which causes
  testsuites using the remote backend over localhost to fail in auto-build
  environments which deliberately disable networking during builds.  The
  workaround implemented is to check if the hostname is "::1", "127.0.0.1" or
  "localhost" and disable AI_ADDRCONFIG for these.  This doesn't catch all
  possible ways to specify localhost, but should catch all the ways these might
  be specified in a testsuite.  Fixes https://bugs.debian.org/853107, reported
  by Daniel Schepler and the root cause uncovered by James Clarke.

* Fix build failure cross-compiling for android due to not pulling in header
  for errno.

* Fix compiler warnings.
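
The AI_ADDRCONFIG workaround described above amounts to a small check on the
hostname before resolving it.  The sketch below shows the idea using Python's
socket module; it is not the Xapian code, and the helper names resolver_flags
and resolve are hypothetical.

```python
import socket

# AI_ADDRCONFIG may be missing on some platforms; treat it as 0 there.
AI_ADDRCONFIG = getattr(socket, "AI_ADDRCONFIG", 0)

# The three spellings of localhost named in the changelog entry.
LOCALHOST_NAMES = {"::1", "127.0.0.1", "localhost"}


def resolver_flags(hostname):
    """Return getaddrinfo() flags, omitting AI_ADDRCONFIG for localhost."""
    if hostname in LOCALHOST_NAMES:
        return 0
    return AI_ADDRCONFIG


def resolve(hostname, port):
    """Resolve hostname:port for a TCP connection using the flags above."""
    return socket.getaddrinfo(
        hostname, port, type=socket.SOCK_STREAM, flags=resolver_flags(hostname)
    )
```

As the changelog notes, this does not catch every way of spelling localhost
(for example "127.0.0.2" or a local alias), only the forms likely to appear in
a testsuite.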

debug code:

* Adjust assertion in InMemoryPostList.  Calling skip_to() is fine when the
  postlist hasn't been started yet (but the assertion was failing for a term
  not in the database).  Latent bug, triggered by testcases complexphrase1 and
  complexnear1 as updated for addition of support for OP_OR subqueries of
  OP_PHRASE/OP_NEAR.

diffstat:

 textproc/xapian/Makefile |   4 ++--
 textproc/xapian/distinfo |  10 +++++-----
 2 files changed, 7 insertions(+), 7 deletions(-)

diffs (29 lines):

diff -r 71598d36085f -r b537d8ed1079 textproc/xapian/Makefile
--- a/textproc/xapian/Makefile  Sun Jul 09 22:12:19 2017 +0000
+++ b/textproc/xapian/Makefile  Sun Jul 09 22:27:43 2017 +0000
@@ -1,7 +1,7 @@
-# $NetBSD: Makefile,v 1.29 2017/05/08 12:02:16 schmonz Exp $
+# $NetBSD: Makefile,v 1.30 2017/07/09 22:27:43 schmonz Exp $
 
 DISTNAME=              xapian-core-${VERSION}
-VERSION=               1.4.2
+VERSION=               1.4.4
 PKGNAME=               xapian-${VERSION}
 CATEGORIES=            textproc
 MASTER_SITES=          http://oligarchy.co.uk/xapian/${VERSION}/
diff -r 71598d36085f -r b537d8ed1079 textproc/xapian/distinfo
--- a/textproc/xapian/distinfo  Sun Jul 09 22:12:19 2017 +0000
+++ b/textproc/xapian/distinfo  Sun Jul 09 22:27:43 2017 +0000
@@ -1,7 +1,7 @@
-$NetBSD: distinfo,v 1.24 2017/01/01 10:40:49 schmonz Exp $
+$NetBSD: distinfo,v 1.25 2017/07/09 22:27:43 schmonz Exp $
 
-SHA1 (xapian-core-1.4.2.tar.xz) = fe42396875c72136758ab97c1344761cbafc3b1a
-RMD160 (xapian-core-1.4.2.tar.xz) = 64d2630e5808bdebb77e9bee6e03439407e85c90
-SHA512 (xapian-core-1.4.2.tar.xz) = 2ea189068837c295b9c2065f06bdf5c4078114c0a07d5ea94f396baab806c038e0e8e8ae6b7702322255b2bc8a84025c0c03d20b87dd3de7c6854666b1c753a3
-Size (xapian-core-1.4.2.tar.xz) = 2799492 bytes
+SHA1 (xapian-core-1.4.4.tar.xz) = 6b8bf7eea3059dab8d5dd254c3ae0cf895bc4910
+RMD160 (xapian-core-1.4.4.tar.xz) = 19535bc7ca5c175b7ee1c4898e9e9e796e45dcb0
+SHA512 (xapian-core-1.4.4.tar.xz) = dc88bab1d82c68b29d51c2113319ddb5d16840f3544b9d5fcc7a3671f97d58f16ddff58b865ad3521ea778cbaacf73fe7346bb514a1275f1f739283a4128d001
+Size (xapian-core-1.4.4.tar.xz) = 2807952 bytes
 SHA1 (patch-common_safesyssocket.h) = 032d441853914d510bc285bb682a98c4ee264d52


