pkgsrc-WIP-changes archive


py-dask: Update to 2024.1.1



Module Name:	pkgsrc-wip
Committed By:	Matthew Danielson <matthewd%fastmail.us@localhost>
Pushed By:	matthewd
Date:		Sat Feb 3 12:28:20 2024 -0800
Changeset:	7cf833298f099ab24422ee6b0c9f56752b2e2c65

Modified Files:
	py-dask/Makefile
	py-dask/PLIST
	py-dask/distinfo

Log Message:
py-dask: Update to 2024.1.1

2024.1.1
Released on January 26, 2024
Highlights
Pandas 2.2 and Scipy 1.12 support
This release contains compatibility updates for the latest pandas and scipy releases.
See GH#10834, GH#10849, GH#10845, and GH#8474 from crusaderky for details.
Deprecations
    Deprecate convert_dtype in apply (GH#10827) Miles
    Deprecate axis in DataFrame.rolling (GH#10803) Miles
    Deprecate out= and dtype= parameter in most DataFrame methods (GH#10800) crusaderky
    Deprecate axis in groupby cumulative transformers (GH#10796) Miles
    Rename shuffle to shuffle_method in remaining methods (GH#10797) Miles

2024.1.0
Released on January 12, 2024
Highlights
Partial rechunks within P2P
P2P rechunking now utilizes the relationships between input and output chunks. For situations that do not require all-to-all data transfer, this may significantly reduce the runtime and memory/disk footprint. It also enables task culling.
See GH#8330 from Hendrik Makait for details.
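
The idea of using the relationship between input and output chunks can be sketched in one dimension with the standard library alone. This is only an illustration, not the P2P implementation; the helper names chunk_bounds and overlapping_inputs are hypothetical. When each output chunk overlaps only a few input chunks, no all-to-all transfer is needed.

```python
from itertools import accumulate


def chunk_bounds(chunks):
    """Turn chunk sizes such as (4, 4, 4) into [(0, 4), (4, 8), (8, 12)]."""
    stops = list(accumulate(chunks))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


def overlapping_inputs(old_chunks, new_chunks):
    """Map each output chunk index to the input chunk indices it overlaps."""
    old = chunk_bounds(old_chunks)
    new = chunk_bounds(new_chunks)
    return {
        j: [i for i, (a, b) in enumerate(old) if a < stop and start < b]
        for j, (start, stop) in enumerate(new)
    }


# Rechunking (4, 4, 4, 4) -> (8, 8): each output chunk needs only two inputs.
mapping = overlapping_inputs((4, 4, 4, 4), (8, 8))
print(mapping)  # {0: [0, 1], 1: [2, 3]}
```

Here the two output chunks depend on disjoint sets of inputs, so they can be handled as independent partial rechunks.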
Fastparquet engine deprecated
The fastparquet Parquet engine has been deprecated. Users should migrate to the pyarrow engine by installing PyArrow and removing engine="fastparquet" in read_parquet or to_parquet calls.
See GH#10743 from crusaderky for details.
Improved serialization for arbitrary data
This release improves serialization robustness for arbitrary data. Previously, serialization could fail in some cases for data that is not msgpack-serializable. In those cases we now fall back to using pickle.
See GH#8447 from Hendrik Makait for details.
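
The fallback pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the distributed code: json from the standard library stands in for msgpack, and the function names dumps_with_fallback and loads_with_fallback are hypothetical.

```python
import json
import pickle


def dumps_with_fallback(obj):
    """Return (serializer_name, payload); try JSON first, then pickle."""
    try:
        # json is a stand-in for msgpack in this sketch.
        return "json", json.dumps(obj).encode("utf-8")
    except (TypeError, ValueError):
        # Data the first serializer cannot handle goes through pickle.
        return "pickle", pickle.dumps(obj)


def loads_with_fallback(name, payload):
    """Invert dumps_with_fallback using the recorded serializer name."""
    if name == "json":
        return json.loads(payload.decode("utf-8"))
    return pickle.loads(payload)


print(dumps_with_fallback({"a": 1})[0])    # json
print(dumps_with_fallback({1, 2, 3})[0])   # pickle
```

Recording which serializer was used alongside the payload lets the receiving side pick the matching deserializer.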
Additional deprecations
    Deprecate shuffle keyword in favour of shuffle_method for DataFrame methods (GH#10738) Hendrik Makait
    Deprecate automatic argument inference in repartition (GH#10691) Patrick Hoefler
    Deprecate compute parameter in set_index (GH#10784) Miles
    Deprecate inplace in eval (GH#10785) Miles
    Deprecate Series.view (GH#10754) Miles
    Deprecate npartitions="auto" for set_index & sort_values (GH#10750) Miles
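
For readers unfamiliar with keyword renames such as shuffle to shuffle_method, a deprecation shim commonly looks like the sketch below. The function sort_rows is hypothetical and the use of FutureWarning is an assumption; this is not how dask itself implements the deprecation.

```python
import warnings


def sort_rows(rows, shuffle_method=None, shuffle=None):
    """Hypothetical function that still accepts the old `shuffle` keyword."""
    if shuffle is not None:
        warnings.warn(
            "the 'shuffle' keyword is deprecated, use 'shuffle_method' instead",
            FutureWarning,
            stacklevel=2,
        )
        # Honour the old keyword only if the new one was not given.
        if shuffle_method is None:
            shuffle_method = shuffle
    return sorted(rows), shuffle_method or "default"


# New-style call: no warning is emitted.
print(sort_rows([2, 1], shuffle_method="p2p"))
```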

2023.12.1
Released on December 15, 2023
Highlights
Logical Query Planning now available for Dask DataFrames
Dask DataFrames are now much more performant by using a logical query planner. This feature is currently off by default, but can be turned on with:
dask.config.set({"dataframe.query-planning": True})
You also need to have dask-expr installed:
pip install dask-expr
We’ve seen promising performance improvements so far; see this blog post and these regularly updated benchmarks for more information. A more detailed explanation of how the query optimizer works can be found in this blog post.
This feature is still under active development and the API isn’t stable yet, so breaking changes can occur. We expect to make the query optimizer the default early next year.
See GH#10634 from Patrick Hoefler for details.
Dtype inference in read_parquet
read_parquet will now infer the Arrow types pa.date32(), pa.date64() and pa.decimal() as an ArrowDtype in pandas. These dtypes are backed by the original Arrow array, and thus avoid the conversion to NumPy object. Additionally, read_parquet will no longer infer nested and binary types as strings; they will be stored in NumPy object arrays instead.
See GH#10698 and GH#10705 from Patrick Hoefler for details.
Scheduling improvements to reduce memory usage
This release includes a major rewrite of a core part of our scheduling logic. It includes a new approach to the topological sorting algorithm in dask.order, which determines the order in which tasks are run. Improper ordering is known to be a major contributor to excessive cluster memory pressure.
Updates in this release fix a couple of performance regressions that were introduced in release 2023.10.0 (see GH#10535). Generally, computations should now be much more eager to release data that is no longer required in memory.
See GH#10660, GH#10697 from Florian Jetter for details.
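
Why task ordering matters for memory can be shown with a toy model. This is not the dask.order algorithm; the function peak_memory, the task names, and the sizes are all made up for illustration. Both orders below are valid topological orders of the same graph, yet they differ in peak memory because one releases large intermediate results sooner.

```python
def peak_memory(order, deps, size):
    """Peak total size held when tasks run one at a time in `order`.

    A result is released once every task that consumes it has run.
    Results with no consumers are final outputs and stay in memory.
    """
    consumers_left = {task: 0 for task in deps}
    for inputs in deps.values():
        for dep in inputs:
            consumers_left[dep] += 1
    held = set()
    current = peak = 0
    for task in order:
        if not all(dep in held for dep in deps[task]):
            raise ValueError(f"{task} scheduled before its inputs")
        held.add(task)
        current += size[task]
        peak = max(peak, current)
        for dep in deps[task]:
            consumers_left[dep] -= 1
            if consumers_left[dep] == 0:
                held.remove(dep)
                current -= size[dep]
    return peak


# Three independent load -> sum chains; loads are large, sums are small.
deps = {"load0": [], "load1": [], "load2": [],
        "sum0": ["load0"], "sum1": ["load1"], "sum2": ["load2"]}
size = {"load0": 10, "load1": 10, "load2": 10, "sum0": 1, "sum1": 1, "sum2": 1}

breadth_first = ["load0", "load1", "load2", "sum0", "sum1", "sum2"]
depth_first = ["load0", "sum0", "load1", "sum1", "load2", "sum2"]

print(peak_memory(breadth_first, deps, size))  # 31
print(peak_memory(depth_first, deps, size))    # 13
```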
Improved P2P-based merging robustness and performance
This release contains several updates that fix a possible deadlock introduced in 2023.9.2 and improve the robustness of P2P-based merging when the cluster is dynamically scaling up.
See GH#8415, GH#8416, and GH#8414 from Hendrik Makait for details.
Removed disabling pickle option
The distributed.scheduler.pickle configuration option is no longer supported. As of the 2023.4.0 release, pickle is used to transmit task graphs, so it can no longer be disabled. We now raise an informative error when distributed.scheduler.pickle is set to False.
See GH#8401 from Florian Jetter for details.

2023.12.0
Released on December 1, 2023
Highlights
PipInstall restart and environment variables
The distributed.PipInstall plugin now has more robust restart logic and also supports environment variables.
The following example shows how to use the distributed.PipInstall plugin with a TOKEN environment variable to securely install a package from a private repository:
from dask.distributed import PipInstall
plugin = PipInstall(packages=["private_package@git+https://${TOKEN}@github.com/dask/private_package.git"])
client.register_plugin(plugin)
See GH#8374, GH#8357, and GH#8343 from Hendrik Makait for details.
Bokeh 3.3.0 compatibility
This release contains compatibility updates for using bokeh>=3.3.0 with proxied Dask dashboards. Previously the contents of dashboard plots wouldn’t be displayed.
See GH#8347 and GH#8381 from Jacob Tomlinson for details.

To see a diff of this commit:
https://wip.pkgsrc.org/cgi-bin/gitweb.cgi?p=pkgsrc-wip.git;a=commitdiff;h=7cf833298f099ab24422ee6b0c9f56752b2e2c65

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

diffstat:
 py-dask/Makefile | 10 +++++-----
 py-dask/PLIST    |  9 +++++++++
 py-dask/distinfo |  6 +++---
 3 files changed, 17 insertions(+), 8 deletions(-)

diffs:
diff --git a/py-dask/Makefile b/py-dask/Makefile
index a83d3ba203..3de635df93 100644
--- a/py-dask/Makefile
+++ b/py-dask/Makefile
@@ -1,6 +1,6 @@
 # $NetBSD$
 
-GITHUB_TAG=	2023.11.0
+GITHUB_TAG=	2024.1.1
 DISTNAME=	dask-${GITHUB_TAG}
 PKGNAME=	${PYPKGPREFIX}-${DISTNAME}
 GITHUB_PROJECT=	dask
@@ -18,8 +18,8 @@ LICENSE=	modified-bsd
 
 PYTHON_VERSIONS_INCOMPATIBLE=	27 38
 
-TOOL_DEPENDS+=       ${PYPKGPREFIX}-wheel>=0:../../devel/py-wheel
-TOOL_DEPENDS+=       ${PYPKGPREFIX}-versioneer>=0.28:../../devel/py-versioneer
+TOOL_DEPENDS+=	${PYPKGPREFIX}-wheel>=0:../../devel/py-wheel
+TOOL_DEPENDS+=	${PYPKGPREFIX}-versioneer>=0.28:../../devel/py-versioneer
 
 DEPENDS+=	${PYPKGPREFIX}-cloudpickle>=1.5.0:../../wip/py-cloudpickle
 DEPENDS+=	${PYPKGPREFIX}-click>=8.1.3:../../devel/py-click
@@ -39,8 +39,8 @@ TEST_DEPENDS+=	${PYPKGPREFIX}-test-[0-9]*:../../devel/py-test
 TEST_DEPENDS+=	${PYPKGPREFIX}-test-cov-[0-9]*:../../devel/py-test-cov
 TEST_DEPENDS+=	${PYPKGPREFIX}-test-rerunfailures-[0-9]*:../../devel/py-test-rerunfailures
 TEST_DEPENDS+=	${PYPKGPREFIX}-test-xdist-[0-9]*:../../devel/py-test-xdist
-TEST_DEPENDS+=	${PYPKGPREFIX}-multipledispatch>=0.6.0*:../../devel/py-multipledispatch
-TEST_DEPENDS+=	${PYPKGPREFIX}-importlib-metadata>=6.6.0*:../../devel/py-importlib-metadata
+TEST_DEPENDS+=	${PYPKGPREFIX}-multipledispatch>=0.6.0:../../devel/py-multipledispatch
+TEST_DEPENDS+=	${PYPKGPREFIX}-importlib-metadata>=6.6.0:../../devel/py-importlib-metadata
 TEST_DEPENDS+=	${PYPKGPREFIX}-pre-commit-[0-9]*:../../wip/py-pre-commit
 # TEST_DEPENDS+=	${PYPKGPREFIX}-s3fs>=2022*:../../wip/py-s3fs
 # TEST_DEPENDS+=	${PYPKGPREFIX}-sparse>=0.11.2:../../wip/py-sparse
diff --git a/py-dask/PLIST b/py-dask/PLIST
index df5514192f..26f858ce9c 100644
--- a/py-dask/PLIST
+++ b/py-dask/PLIST
@@ -329,6 +329,9 @@ ${PYSITELIB}/dask/compatibility.pyo
 ${PYSITELIB}/dask/config.py
 ${PYSITELIB}/dask/config.pyc
 ${PYSITELIB}/dask/config.pyo
+${PYSITELIB}/dask/conftest.py
+${PYSITELIB}/dask/conftest.pyc
+${PYSITELIB}/dask/conftest.pyo
 ${PYSITELIB}/dask/context.py
 ${PYSITELIB}/dask/context.pyc
 ${PYSITELIB}/dask/context.pyo
@@ -352,6 +355,9 @@ ${PYSITELIB}/dask/dataframe/_pyarrow.pyo
 ${PYSITELIB}/dask/dataframe/_pyarrow_compat.py
 ${PYSITELIB}/dask/dataframe/_pyarrow_compat.pyc
 ${PYSITELIB}/dask/dataframe/_pyarrow_compat.pyo
+${PYSITELIB}/dask/dataframe/_testing.py
+${PYSITELIB}/dask/dataframe/_testing.pyc
+${PYSITELIB}/dask/dataframe/_testing.pyo
 ${PYSITELIB}/dask/dataframe/accessor.py
 ${PYSITELIB}/dask/dataframe/accessor.pyc
 ${PYSITELIB}/dask/dataframe/accessor.pyo
@@ -728,6 +734,9 @@ ${PYSITELIB}/dask/tests/test_system.pyo
 ${PYSITELIB}/dask/tests/test_threaded.py
 ${PYSITELIB}/dask/tests/test_threaded.pyc
 ${PYSITELIB}/dask/tests/test_threaded.pyo
+${PYSITELIB}/dask/tests/test_tokenize.py
+${PYSITELIB}/dask/tests/test_tokenize.pyc
+${PYSITELIB}/dask/tests/test_tokenize.pyo
 ${PYSITELIB}/dask/tests/test_traceback.py
 ${PYSITELIB}/dask/tests/test_traceback.pyc
 ${PYSITELIB}/dask/tests/test_traceback.pyo
diff --git a/py-dask/distinfo b/py-dask/distinfo
index 6ed39b5d68..410546e06d 100644
--- a/py-dask/distinfo
+++ b/py-dask/distinfo
@@ -1,5 +1,5 @@
 $NetBSD$
 
-BLAKE2s (dask-2023.11.0.tar.gz) = eb9a6b8402709e76c1f2ca64b0afa676aa64fefe15ab4b81752894b8d78a3c1f
-SHA512 (dask-2023.11.0.tar.gz) = 1ebac9c9fb158682dc5063710fd11ccbe0f584cea26afad4b3fe01001f3f7d6888ddbb7653cfdaf2da4ca7acb2b88bc7b1d8b4055790e7036b419ae995346e8f
-Size (dask-2023.11.0.tar.gz) = 8559592 bytes
+BLAKE2s (dask-2024.1.1.tar.gz) = c618e01e1f1788e8c3daf71534dd78143b9f906dd50292b30b1000f0044ca0c6
+SHA512 (dask-2024.1.1.tar.gz) = a5e424333c5d19f67d73c2b036544ef03122a99c2eb6a52019929f1e7b87297c776cbea713062372cf1685ef3b79d47734d6d0acd2c054ffadcbb3d96fb6deeb
+Size (dask-2024.1.1.tar.gz) = 9328425 bytes

