pkgsrc-WIP-changes archive
py-dask: Update to 2024.11.0
Module Name: pkgsrc-wip
Committed By: Matthew Danielson <matthewd%fastmail.us@localhost>
Pushed By: matthewd
Date: Sat Nov 9 06:14:38 2024 -0800
Changeset: d26ed8d3d1c9bc9d14c933353a4cd9480cde4e9e
Modified Files:
py-dask/Makefile
py-dask/PLIST
py-dask/distinfo
Log Message:
py-dask: Update to 2024.11.0
2024.11.0
Highlights
Legacy Dask DataFrame Deprecated
This release deprecates the legacy Dask DataFrame implementation. The old implementation will be removed completely in a future release. Users are encouraged to switch to the new implementation now and to report any issues they encounter.
Users are also encouraged to check that they import functions only from dask.dataframe and not from any of its submodules.
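The import check suggested above can be partly automated. The following is a minimal sketch using only Python's standard `ast` module. The helper name `submodule_imports` is hypothetical, and it only flags dotted module paths such as `from dask.dataframe.core import ...`. A form like `from dask.dataframe import core` cannot be told apart from a function import without runtime inspection, so it is not reported.

```python
import ast


def submodule_imports(source: str, package: str = "dask.dataframe") -> list[tuple[int, str]]:
    """Return (line number, module) pairs for imports that reach into
    submodules of `package` instead of importing from `package` itself.

    Hypothetical helper: it only inspects dotted module paths, so
    `from dask.dataframe import core` is NOT flagged.
    """
    prefix = package + "."
    hits = []
    for node in ast.walk(ast.parse(source)):
        # `from dask.dataframe.core import X` (absolute imports only)
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module.startswith(prefix):
                hits.append((node.lineno, node.module))
        # `import dask.dataframe.io.csv`
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.startswith(prefix):
                    hits.append((node.lineno, alias.name))
    # ast.walk is breadth-first, so sort to report hits in source order
    return sorted(hits)


code = "import dask.dataframe as dd\nfrom dask.dataframe.core import DataFrame\n"
print(submodule_imports(code))  # [(2, 'dask.dataframe.core')]
```

Because the code is only parsed, not executed, the check runs without Dask installed.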
New quantile methods for Dask Array API
Dask Array added new quantile and nanquantile methods. Previously, Dask dispatched to the NumPy implementation, which held the GIL for long stretches. This caused large slowdowns on workers with more than one thread and could lead to runtimes of over 200s per chunk.
The new quantile implementation avoids many of these problems and reduces runtime to around 1s per chunk, independent of the number of threads.
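For reference, the quantity being computed can be sketched in pure Python. This is not Dask's implementation: it assumes NumPy's default `linear` interpolation method, and it assumes the reduced axis is never split across chunks, which is what lets each chunk be processed on its own. The helper names are hypothetical.

```python
import math


def quantile_linear(values, q):
    """Quantile with linear interpolation between the two nearest ranks
    (the same definition as NumPy's default 'linear' method).
    NaN handling is left to nanquantile_linear below."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    xs = sorted(values)
    if not xs:
        raise ValueError("empty input")
    pos = q * (len(xs) - 1)          # fractional rank
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


def nanquantile_linear(values, q):
    """Like quantile_linear, but ignores NaN values."""
    return quantile_linear([v for v in values if not math.isnan(v)], q)


def quantile_per_chunk(chunks, q):
    """Reduce every row of every chunk independently.

    Assumes each chunk is a list of rows and that a row (the reduced
    axis) is never split across chunks.
    """
    return [[quantile_linear(row, q) for row in chunk] for chunk in chunks]
```

Since each chunk needs nothing from its neighbours, the per-chunk work can run in parallel without any communication.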
Consistent chunksize in Xarray rolling-construct
Using Xarray's rolling(...).construct(...) with Dask Arrays led to very large chunksizes that rarely fit into memory on a single worker.
The underlying operation is a view on the smaller NumPy array, but anything that triggers a copy of the data leads to very large memory usage.
import xarray as xr
import dask.array as da
arr = xr.DataArray(
da.ones((93504, 721, 1440), chunks=("auto", -1, -1)),
dims=["time", "lat", "longitude"],
) # Initial chunks are ~128 MiB
arr.rolling(time=30).construct("window_dim")
Previously
Individual chunks are exploding to 10 GiB, likely causing out of memory errors.
Now
Dask now automatically splits individual chunks so that the resulting chunks keep the original chunksize, within a small tolerance.
Individual chunks are now roughly the same size
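The idea behind the split can be sketched with a little arithmetic: construct adds a window dimension of length `window`, so a chunk holding `c` time steps grows roughly `window`-fold. The sketch below assumes a tuple of chunk lengths along the rolled axis and a simple "divide by the window length" rule; the function name and the rule are illustrative, not Dask's actual algorithm.

```python
def split_chunks_for_window(chunks: tuple[int, ...], window: int) -> tuple[int, ...]:
    """Split each chunk along the rolled axis so that, after a window
    dimension of length `window` is added, a new chunk holds about as many
    elements as the original chunk did (never fewer than 1 step per chunk)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    new_chunks = []
    for c in chunks:
        target = max(1, c // window)      # desired steps per new chunk
        pieces = -(-c // target)          # ceil(c / target)
        base, extra = divmod(c, pieces)   # spread c as evenly as possible
        new_chunks.extend([base + 1] * extra + [base] * (pieces - extra))
    return tuple(new_chunks)


print(split_chunks_for_window((16, 10), 4))  # (4, 4, 4, 4, 2, 2, 2, 2, 2)
```

When the window is longer than a chunk, this rule cannot go below one step per chunk, so those chunks still grow somewhat.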
Improved efficiency of map overlap
map_overlap now creates smaller, more efficient task graphs.
The previous version injected many unnecessary tasks, inflating the task count to 2-10x what was actually needed and putting a lot of stress on the scheduler.
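As a reminder of what map_overlap computes (independently of how the graph is built), here is a pure-Python 1-D sketch of its overlap, map, trim pattern. Each chunk borrows `depth` elements from its neighbours, the function runs on the extended block, and the borrowed halo is trimmed off. It assumes no padding at the outer array edges (similar in spirit to Dask's boundary="none"); the helper names are hypothetical.

```python
from typing import Callable, Sequence


def map_overlap_1d(
    data: Sequence[float],
    chunk_sizes: Sequence[int],
    depth: int,
    func: Callable[[list], list],
) -> list:
    """Apply `func` chunk by chunk, giving each chunk a halo of `depth`
    neighbouring elements on each side, then trimming the halo away.
    `func` must return a list the same length as its input."""
    if sum(chunk_sizes) != len(data):
        raise ValueError("chunk sizes must cover the data exactly")
    out: list = []
    start = 0
    for size in chunk_sizes:
        stop = start + size
        lo = max(0, start - depth)         # left halo, clipped at the array edge
        hi = min(len(data), stop + depth)  # right halo, clipped at the array edge
        result = func(list(data[lo:hi]))
        offset = start - lo                # where this chunk begins in the block
        out.extend(result[offset:offset + size])  # trim the halo
        start = stop
    return out


def moving_sum(block: list, radius: int = 1) -> list:
    """Centered moving sum; windows are truncated at the block's ends."""
    return [sum(block[max(0, i - radius): i + radius + 1]) for i in range(len(block))]


data = list(range(10))
chunked = map_overlap_1d(data, (3, 3, 4), depth=1, func=moving_sum)
assert chunked == moving_sum(data)  # same result as the unchunked computation
```

In a task graph, the halo slicing and concatenation are what add tasks beyond one per chunk; the release note above says the new version injects far fewer of them.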
Consistent chunksizes for Einstein summation
Einstein summation historically led to very large chunksizes when applied to more than one Dask Array. This behavior was inherited from NumPy but led to out-of-memory errors on workers:
import dask.array as da
arr = da.random.random((1024, 64, 64, 64, 64), chunks=(256, 16, 16, 16, 16)) # Initial chunks are 128 MiB
result = da.einsum("aijkl,amnop->ijklmnop", arr, arr)
Previously
Individual chunks are exploding to 32 GiB, very likely causing out of memory errors.
Now
The operation keeps individual chunksizes the same.
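The 128 MiB and 32 GiB figures above follow directly from the chunk shapes. A small sketch of the arithmetic, assuming float64 elements (8 bytes each) and assuming that, under the old behavior, each of the eight output indices ijklmnop kept the input chunk length of 16 while the summed index a drops out:

```python
import math


def chunk_nbytes(chunk_shape: tuple[int, ...], itemsize: int = 8) -> int:
    """Bytes in one chunk; itemsize=8 assumes float64, which is what
    da.random.random produces."""
    return math.prod(chunk_shape) * itemsize


MiB, GiB = 2**20, 2**30
input_chunk = (256, 16, 16, 16, 16)  # chunks of `arr` in the example
output_chunk = (16,) * 8             # i, j, k, l, m, n, o, p each keep length 16

print(chunk_nbytes(input_chunk) / MiB)   # 128.0
print(chunk_nbytes(output_chunk) / GiB)  # 32.0
```

The blow-up comes from the output having eight chunked dimensions against the input's five, so the element count per chunk multiplies even though no single dimension grows.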
2024.10.0
Notable Changes
Zarr-Python 3 compatibility (dask#11388)
Avoid exponentially increasing taskgraph in overlap (dask#11423)
Ensure numba tokenization does not use slow pickle path (dask#11419)
2024.9.1
Highlights
Improved adaptive scaling resilience
Adaptive scaling clusters now recover from spurious errors during scaling.
See distributed#8871 by Hendrik Makait for more details.
To see a diff of this commit:
https://wip.pkgsrc.org/cgi-bin/gitweb.cgi?p=pkgsrc-wip.git;a=commitdiff;h=d26ed8d3d1c9bc9d14c933353a4cd9480cde4e9e
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
diffstat:
py-dask/Makefile | 4 ++--
py-dask/PLIST | 3 +++
py-dask/distinfo | 6 +++---
3 files changed, 8 insertions(+), 5 deletions(-)
diffs:
diff --git a/py-dask/Makefile b/py-dask/Makefile
index 41d6d7121f..ff4a7dfbda 100644
--- a/py-dask/Makefile
+++ b/py-dask/Makefile
@@ -1,6 +1,6 @@
# $NetBSD$
-DISTNAME= dask-2024.9.0
+DISTNAME= dask-2024.11.0
PKGNAME= ${PYPKGPREFIX}-${DISTNAME}
CATEGORIES= math python
GITHUB_PROJECT= dask
@@ -41,7 +41,7 @@ DEPENDS+= ${PYPKGPREFIX}-apache-arrow>=14.0.1:../../wip/py-apache-arrow
DEPENDS+= ${PYPKGPREFIX}-bokeh>=2.4.2:../../wip/py-bokeh
DEPENDS+= ${PYPKGPREFIX}-cityhash-[0-9]*:../../wip/py-cityhash
DEPENDS+= ${PYPKGPREFIX}-cloudpickle>=1.5.0:../../converters/py-cloudpickle
-#DEPENDS+= ${PYPKGPREFIX}-dask_expr>=1.1.1:../../wip/py-dask_expr
+DEPENDS+= ${PYPKGPREFIX}-dask_expr>=1.1.17:../../wip/py-dask_expr
DEPENDS+= ${PYPKGPREFIX}-distributed>=${GITHUB_TAG}:../../wip/py-distributed
DEPENDS+= ${PYPKGPREFIX}-fastavro>=1.1.0:../../wip/py-fastavro
DEPENDS+= ${PYPKGPREFIX}-partd>=1.2.0:../../wip/py-partd
diff --git a/py-dask/PLIST b/py-dask/PLIST
index ef38e977fd..209d1d57b0 100644
--- a/py-dask/PLIST
+++ b/py-dask/PLIST
@@ -185,6 +185,9 @@ ${PYSITELIB}/dask/array/tests/test_image.pyo
${PYSITELIB}/dask/array/tests/test_linalg.py
${PYSITELIB}/dask/array/tests/test_linalg.pyc
${PYSITELIB}/dask/array/tests/test_linalg.pyo
+${PYSITELIB}/dask/array/tests/test_map_blocks.py
+${PYSITELIB}/dask/array/tests/test_map_blocks.pyc
+${PYSITELIB}/dask/array/tests/test_map_blocks.pyo
${PYSITELIB}/dask/array/tests/test_masked.py
${PYSITELIB}/dask/array/tests/test_masked.pyc
${PYSITELIB}/dask/array/tests/test_masked.pyo
diff --git a/py-dask/distinfo b/py-dask/distinfo
index 6029a661bd..f090af0d55 100644
--- a/py-dask/distinfo
+++ b/py-dask/distinfo
@@ -1,6 +1,6 @@
$NetBSD$
-BLAKE2s (dask-2024.9.0.tar.gz) = c4fe739efc25f8862f0da3b5ba8fbc648540e5beedd7aa8d3ae8ebb8baf99a07
-SHA512 (dask-2024.9.0.tar.gz) = 9ba9035c538dab138db992caecfaa1e4e8d8b9f0ef8b83a89dcc1f8431b521f64ff3367259cde07ffc84195fe1828a093ed80752ee83fcbc26817cf7e05b61dc
-Size (dask-2024.9.0.tar.gz) = 10160938 bytes
+BLAKE2s (dask-2024.11.0.tar.gz) = ee74cf41d1e6d7b5c7cd5a4d6c8f054ed1a589f091f6cabc5e65337677a00101
+SHA512 (dask-2024.11.0.tar.gz) = 99de28b39fe70eb46b82b64f4fde693546d5a721eca6faf5ed23bb0ab5a071c007cf5bebcd0a8a74498e7fe5d8e614fa86e4bea5f1ea816a04e10db3609a5ec2
+Size (dask-2024.11.0.tar.gz) = 10693572 bytes
SHA1 (patch-pyproject.toml) = bae684c99d9ae6d0e83c6ac58eefccd3b8d2638d
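pkgsrc verifies distfiles against these distinfo entries itself (for example via `make checksum`). For checking a fetched tarball by hand, a minimal standard-library sketch might look like the following. The helper names are hypothetical, and it deliberately covers only the distfile fields (BLAKE2s, SHA512, Size), since pkgsrc checksums patch files differently.

```python
import hashlib
import re
from pathlib import Path

# Matches e.g. "SHA512 (dask-2024.11.0.tar.gz) = 99de..." and
# "Size (dask-2024.11.0.tar.gz) = 10693572 bytes" (the trailing " bytes" is ignored).
LINE_RE = re.compile(r"^(\w+) \((.+?)\) = (\S+)")


def parse_distinfo(text: str) -> dict[str, dict[str, str]]:
    """Map each file name to {algorithm: expected value}."""
    entries: dict[str, dict[str, str]] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            alg, fname, value = m.groups()
            entries.setdefault(fname, {})[alg] = value
    return entries


def digests_for(data: bytes) -> dict[str, str]:
    """Compute the distfile fields that appear in this distinfo."""
    return {
        "BLAKE2s": hashlib.blake2s(data).hexdigest(),  # 256-bit digest, 64 hex chars
        "SHA512": hashlib.sha512(data).hexdigest(),
        "Size": str(len(data)),
    }


def check_distfile(path: str, expected: dict[str, str]) -> dict[str, bool]:
    """Compare a downloaded file with its distinfo entry (reads it fully into memory)."""
    actual = digests_for(Path(path).read_bytes())
    return {alg: actual[alg] == value for alg, value in expected.items() if alg in actual}
```

For example, check_distfile("dask-2024.11.0.tar.gz", parse_distinfo(text)["dask-2024.11.0.tar.gz"]) should map every field to True for an intact download.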