pkgsrc-WIP-changes archive

py-fastparquet: Update to 2022.12.0



Module Name:	pkgsrc-wip
Committed By:	Matthew Danielson <matthewd%fastmail.us@localhost>
Pushed By:	matthewd
Date:		Sun Dec 11 12:40:12 2022 -0800
Changeset:	a18dd94c4b65e2c4dd922d2733dea796f051850e

Modified Files:
	py-fastparquet/Makefile
	py-fastparquet/PLIST
	py-fastparquet/distinfo
Added Files:
	py-fastparquet/TODO
	py-fastparquet/files/PKG-INFO
	py-fastparquet/patches/patch-setup.py

Log Message:
py-fastparquet: Update to 2022.12.0

2022.12.0
    Add py3.11 wheel builds
    Check all int32 values before passing to thrift writer
    Fix type of num_rows to i64 for big single file
2022.11.0
    Switch to calver
    Speed up loading of nullable types
    Allow schema evolution by addition of columns
    Allow specifying dtypes of output
    Update to scm versioning
    Update CI and use mamba
    Fixes to row filter, statistics and tests
    Support pathlib.Paths
    JSON encoder options
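
The 2022.12.0 entries about int32 checks and the i64 num_rows field both
concern integer width. The sketch below shows the general idea of a range
check before a value is written to a 32-bit field. It is an illustration
only, not fastparquet's code; the helper name check_int32 and the field
names passed to it are hypothetical.

```python
# Hypothetical illustration; not taken from fastparquet's source.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def check_int32(name, value):
    """Return value unchanged if it fits in a signed 32-bit integer, else raise."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{name}={value} does not fit in an int32 field")
    return value


# A small value passes through untouched.
check_int32("page_size", 1_000_000)

# A row count above 2**31 - 1 is the kind of value that needs an i64 field.
try:
    check_int32("num_rows", 3_000_000_000)
except OverflowError as exc:
    print(exc)
```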

To see a diff of this commit:
https://wip.pkgsrc.org/cgi-bin/gitweb.cgi?p=pkgsrc-wip.git;a=commitdiff;h=a18dd94c4b65e2c4dd922d2733dea796f051850e

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

diffstat:
 py-fastparquet/Makefile               |  19 ++++-
 py-fastparquet/PLIST                  |  31 +++-----
 py-fastparquet/TODO                   |   5 ++
 py-fastparquet/distinfo               |   7 +-
 py-fastparquet/files/PKG-INFO         | 134 ++++++++++++++++++++++++++++++++++
 py-fastparquet/patches/patch-setup.py |  12 +++
 6 files changed, 182 insertions(+), 26 deletions(-)
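
Because files/PKG-INFO is copied in by hand rather than shipped in the
GitHub tarball, its Version header can drift from GITHUB_TAG on a later
update. A minimal sketch of a consistency check follows, using only the
Python standard library. The function names are hypothetical and the check
is not part of the package's build.

```python
from email.parser import Parser


def pkg_info_version(text):
    """Return the Version header from PKG-INFO text (RFC 822 style headers)."""
    return Parser().parsestr(text, headersonly=True)["Version"]


def matches_tag(pkg_info_text, github_tag):
    """True when the bundled PKG-INFO describes the same release as the tag."""
    return pkg_info_version(pkg_info_text) == github_tag


# Header excerpt taken from files/PKG-INFO in this commit.
excerpt = (
    "Metadata-Version: 2.1\n"
    "Name: fastparquet\n"
    "Version: 2022.12.0\n"
    "\n"
    "fastparquet\n"
)
print(matches_tag(excerpt, "2022.12.0"))
```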

diffs:
diff --git a/py-fastparquet/Makefile b/py-fastparquet/Makefile
index b45a720251..6aa5133802 100644
--- a/py-fastparquet/Makefile
+++ b/py-fastparquet/Makefile
@@ -2,19 +2,23 @@
 
 # Prefer pulling from github instead of pypi
 # so that we can run tests
-DISTNAME=	fastparquet-0.8.3
+GITHUB_TAG=	2022.12.0
+GITHUB_PROJECT=fastparquet
+DISTNAME=	fastparquet-${GITHUB_TAG}
 PKGNAME=	${PYPKGPREFIX}-${DISTNAME}
-GITHUB_PROJECT=	fastparquet
+
 CATEGORIES=	devel
 MASTER_SITES=	${MASTER_SITE_GITHUB:=dask/}
 
+
 MAINTAINER=	matthewd%fastmail.us@localhost
 HOMEPAGE=	https://github.com/dask/fastparquet
 COMMENT=	Python implementation of the parquet format,
 LICENSE=	apache-2.0
 
 USE_LANGUAGES=	c c++
-
+BUILD_DEPENDS+=       ${PYPKGPREFIX}-wheel>=0:../../devel/py-wheel
+BUILD_DEPENDS+=       ${PYPKGPREFIX}-packaging>=0:../../devel/py-packaging
 DEPENDS+=	${PYPKGPREFIX}-numpy>=1.18:../../math/py-numpy
 DEPENDS+=	${PYPKGPREFIX}-pandas>=1.1.0:../../math/py-pandas
 DEPENDS+=	${PYPKGPREFIX}-cramjam>=2.3.0:../../wip/py-cramjam
@@ -23,8 +27,15 @@ DEPENDS+=	${PYPKGPREFIX}-fsspec>=2022.3.0:../../wip/py-fsspec
 TEST_DEPENDS+=	${PYPKGPREFIX}-test-[0-9]*:../../devel/py-test
 TEST_DEPENDS+=	${PYPKGPREFIX}-test-runner-[0-9]*:../../devel/py-test-runner
 
+post-extract:
+	${CP} ${FILESDIR}/PKG-INFO ${WRKSRC}
+
+# The setup.py has integration with pytest-runner, but throws odd errors
+do-test:
+	cd ${WRKSRC} && ${SETENV} ${TEST_ENV} ${PYTHONBIN} setup.py build_ext -i
+	cd ${WRKSRC} && ${SETENV} ${TEST_ENV} ${PYTHONBIN} -m pytest
 
-.include "../../lang/python/egg.mk"
+.include "../../lang/python/wheel.mk"
 .include "../../math/py-numpy/buildlink3.mk"
 .include "../../devel/py-cython/buildlink3.mk"
 .include "../../mk/bsd.pkg.mk"
diff --git a/py-fastparquet/PLIST b/py-fastparquet/PLIST
index 643e6f8f13..8bb6a7702f 100644
--- a/py-fastparquet/PLIST
+++ b/py-fastparquet/PLIST
@@ -1,54 +1,47 @@
 @comment $NetBSD$
-${PYSITELIB}/${EGG_INFODIR}/PKG-INFO
-${PYSITELIB}/${EGG_INFODIR}/SOURCES.txt
-${PYSITELIB}/${EGG_INFODIR}/dependency_links.txt
-${PYSITELIB}/${EGG_INFODIR}/requires.txt
-${PYSITELIB}/${EGG_INFODIR}/top_level.txt
+${PYSITELIB}/${WHEEL_INFODIR}/INSTALLER
+${PYSITELIB}/${WHEEL_INFODIR}/LICENSE
+${PYSITELIB}/${WHEEL_INFODIR}/METADATA
+${PYSITELIB}/${WHEEL_INFODIR}/RECORD
+${PYSITELIB}/${WHEEL_INFODIR}/REQUESTED
+${PYSITELIB}/${WHEEL_INFODIR}/WHEEL
+${PYSITELIB}/${WHEEL_INFODIR}/direct_url.json
+${PYSITELIB}/${WHEEL_INFODIR}/top_level.txt
 ${PYSITELIB}/fastparquet/__init__.py
 ${PYSITELIB}/fastparquet/__init__.pyc
-${PYSITELIB}/fastparquet/__init__.pyo
+${PYSITELIB}/fastparquet/_version.py
+${PYSITELIB}/fastparquet/_version.pyc
 ${PYSITELIB}/fastparquet/api.py
 ${PYSITELIB}/fastparquet/api.pyc
-${PYSITELIB}/fastparquet/api.pyo
 ${PYSITELIB}/fastparquet/cencoding.c
 ${PYSITELIB}/fastparquet/cencoding.pyx
 ${PYSITELIB}/fastparquet/cencoding.so
 ${PYSITELIB}/fastparquet/compression.py
 ${PYSITELIB}/fastparquet/compression.pyc
-${PYSITELIB}/fastparquet/compression.pyo
 ${PYSITELIB}/fastparquet/converted_types.py
 ${PYSITELIB}/fastparquet/converted_types.pyc
-${PYSITELIB}/fastparquet/converted_types.pyo
 ${PYSITELIB}/fastparquet/core.py
 ${PYSITELIB}/fastparquet/core.pyc
-${PYSITELIB}/fastparquet/core.pyo
 ${PYSITELIB}/fastparquet/dataframe.py
 ${PYSITELIB}/fastparquet/dataframe.pyc
-${PYSITELIB}/fastparquet/dataframe.pyo
 ${PYSITELIB}/fastparquet/encoding.py
 ${PYSITELIB}/fastparquet/encoding.pyc
-${PYSITELIB}/fastparquet/encoding.pyo
+${PYSITELIB}/fastparquet/json.py
+${PYSITELIB}/fastparquet/json.pyc
 ${PYSITELIB}/fastparquet/parquet_thrift/__init__.py
 ${PYSITELIB}/fastparquet/parquet_thrift/__init__.pyc
-${PYSITELIB}/fastparquet/parquet_thrift/__init__.pyo
 ${PYSITELIB}/fastparquet/parquet_thrift/parquet/__init__.py
 ${PYSITELIB}/fastparquet/parquet_thrift/parquet/__init__.pyc
-${PYSITELIB}/fastparquet/parquet_thrift/parquet/__init__.pyo
 ${PYSITELIB}/fastparquet/parquet_thrift/parquet/ttypes.py
 ${PYSITELIB}/fastparquet/parquet_thrift/parquet/ttypes.pyc
-${PYSITELIB}/fastparquet/parquet_thrift/parquet/ttypes.pyo
 ${PYSITELIB}/fastparquet/schema.py
 ${PYSITELIB}/fastparquet/schema.pyc
-${PYSITELIB}/fastparquet/schema.pyo
 ${PYSITELIB}/fastparquet/speedups.c
 ${PYSITELIB}/fastparquet/speedups.pyx
 ${PYSITELIB}/fastparquet/speedups.so
 ${PYSITELIB}/fastparquet/thrift_structures.py
 ${PYSITELIB}/fastparquet/thrift_structures.pyc
-${PYSITELIB}/fastparquet/thrift_structures.pyo
 ${PYSITELIB}/fastparquet/util.py
 ${PYSITELIB}/fastparquet/util.pyc
-${PYSITELIB}/fastparquet/util.pyo
 ${PYSITELIB}/fastparquet/writer.py
 ${PYSITELIB}/fastparquet/writer.pyc
-${PYSITELIB}/fastparquet/writer.pyo
diff --git a/py-fastparquet/TODO b/py-fastparquet/TODO
new file mode 100644
index 0000000000..2a9962c83a
--- /dev/null
+++ b/py-fastparquet/TODO
@@ -0,0 +1,5 @@
+The pypi tarfile does not have the tests in the package.
+The github tarfile does not have the PKG-INFO file, so builds fail.
+So in the meantime, do this dance of pulling PKG-INFO from pypi, and
+using github, as it is worthwhile to have tests
+
diff --git a/py-fastparquet/distinfo b/py-fastparquet/distinfo
index 221c7be8b4..b0aba8a4ae 100644
--- a/py-fastparquet/distinfo
+++ b/py-fastparquet/distinfo
@@ -1,5 +1,6 @@
 $NetBSD$
 
-BLAKE2s (fastparquet-0.8.3.tar.gz) = 9d726c9b83804ec73fe2fadd960ab8d2f2d1d8a3e709f2d13ed362ec0b3ee368
-SHA512 (fastparquet-0.8.3.tar.gz) = 11f218889eaba686a2154bbfc826b8bc9274bc1468dba3426394e9d50d33aa434ee123bc489bdf963086cf2b31ac31f532d91b0ea8b4af0c3ed9a8cfddba7841
-Size (fastparquet-0.8.3.tar.gz) = 29210251 bytes
+BLAKE2s (fastparquet-2022.12.0.tar.gz) = 66035abccc61bfb5721204c3a367fa37342e2a28a80022a603e054bed5d32777
+SHA512 (fastparquet-2022.12.0.tar.gz) = 8f19d3d3201607ce8a396378828fc7679aca0e8f82de3ebfd2935ebb9631ebb87c55785955e34490b0f8910f669e508c107fd1b00981006654f9d65e12fa0274
+Size (fastparquet-2022.12.0.tar.gz) = 28903475 bytes
+SHA1 (patch-setup.py) = 85710df002f01fae09e08d543371e532699fea61
diff --git a/py-fastparquet/files/PKG-INFO b/py-fastparquet/files/PKG-INFO
new file mode 100644
index 0000000000..ede3c9eec9
--- /dev/null
+++ b/py-fastparquet/files/PKG-INFO
@@ -0,0 +1,134 @@
+Metadata-Version: 2.1
+Name: fastparquet
+Version: 2022.12.0
+Summary: Python support for Parquet file format
+Home-page: https://github.com/dask/fastparquet/
+Author: Martin Durant
+Author-email: mdurant%anaconda.com@localhost
+License: Apache License 2.0
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: System Administrators
+Classifier: License :: OSI Approved :: Apache Software License
+Classifier: Programming Language :: Python
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: Implementation :: CPython
+Requires-Python: >=3.8
+Provides-Extra: lzo
+License-File: LICENSE
+
+fastparquet
+===========
+
+.. image:: https://github.com/dask/fastparquet/actions/workflows/main.yaml/badge.svg
+    :target: https://github.com/dask/fastparquet/actions/workflows/main.yaml
+
+.. image:: https://readthedocs.org/projects/fastparquet/badge/?version=latest
+    :target: https://fastparquet.readthedocs.io/en/latest/
+
+fastparquet is a python implementation of the `parquet
+format <https://github.com/apache/parquet-format>`_, aiming integrate
+into python-based big data work-flows. It is used implicitly by
+the projects Dask, Pandas and intake-parquet.
+
+We offer a high degree of support for the features of the parquet format, and
+very competitive performance, in a small install size and codebase.
+
+Details of this project, how to use it and comparisons to other work can be found in the documentation_.
+
+.. _documentation: https://fastparquet.readthedocs.io
+
+Requirements
+------------
+
+(all development is against recent versions in the default anaconda channels
+and/or conda-forge)
+
+Required:
+
+- numpy
+- pandas
+- cython >= 0.29.23 (if building from pyx files)
+- cramjam
+- fsspec
+
+Supported compression algorithms:
+
+- Available by default:
+
+  - gzip
+  - snappy
+  - brotli
+  - lz4
+  - zstandard
+
+- Optionally supported
+
+  - `lzo <https://github.com/jd-boyd/python-lzo>`_
+
+
+Installation
+------------
+
+Install using conda, to get the latest compiled version::
+
+   conda install -c conda-forge fastparquet
+
+or install from PyPI::
+
+   pip install fastparquet
+
+You may wish to install numpy first, to help pip's resolver.
+This may install an appropriate wheel, or compile from source. For the latter,
+you will need a suitable C compiler toolchain on your system.
+
+You can also install latest version from github::
+
+   pip install git+https://github.com/dask/fastparquet
+
+in which case you should also have ``cython`` to be able to rebuild the C files.
+
+Usage
+-----
+
+Please refer to the documentation_.
+
+*Reading*
+
+.. code-block:: python
+
+    from fastparquet import ParquetFile
+    pf = ParquetFile('myfile.parq')
+    df = pf.to_pandas()
+    df2 = pf.to_pandas(['col1', 'col2'], categories=['col1'])
+
+You may specify which columns to load, which of those to keep as categoricals
+(if the data uses dictionary encoding). The file-path can be a single file,
+a metadata file pointing to other data files, or a directory (tree) containing
+data files. The latter is what is typically output by hive/spark.
+
+*Writing*
+
+.. code-block:: python
+
+    from fastparquet import write
+    write('outfile.parq', df)
+    write('outfile2.parq', df, row_group_offsets=[0, 10000, 20000],
+          compression='GZIP', file_scheme='hive')
+
+The default is to produce a single output file with a single row-group
+(i.e., logical segment) and no compression. At the moment, only simple
+data-types and plain encoding are supported, so expect performance to be
+similar to *numpy.savez*.
+
+History
+-------
+
+This project forked in October 2016 from `parquet-python`_, which was not designed
+for vectorised loading of big data or parallel access.
+
+.. _parquet-python: https://github.com/jcrobak/parquet-python
+
diff --git a/py-fastparquet/patches/patch-setup.py b/py-fastparquet/patches/patch-setup.py
new file mode 100644
index 0000000000..f422cef067
--- /dev/null
+++ b/py-fastparquet/patches/patch-setup.py
@@ -0,0 +1,12 @@
+$NetBSD$
+No need to call git status
+--- setup.py.orig	2022-12-05 18:49:37.000000000 +0000
++++ setup.py
+@@ -44,7 +44,6 @@ else:
+     extra = {'ext_modules': cythonize(modules, language_level=3)}
+
+ install_requires = open('requirements.txt').read().strip().split('\n')
+-subprocess.call(["git", "status"], stdout=sys.stdout, stderr=sys.stderr)
+
+ setup(
+     name='fastparquet',
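
The distinfo hunk above records a BLAKE2s digest, a SHA512 digest and a
byte size for the new distfile. As a rough sketch, the same three values
can be recomputed with Python's hashlib, assuming the recorded digests are
the plain, unkeyed hashes of the file's bytes. The helper names
parse_distinfo and verify are hypothetical, and the demo data is generated
on the fly rather than taken from the real tarball.

```python
import hashlib
import re

# Matches lines such as "SHA512 (name.tar.gz) = <hex>"
# or "Size (name.tar.gz) = N bytes".
DISTINFO_LINE = re.compile(r"^(\w+) \((.+)\) = (\S+)")


def parse_distinfo(text):
    """Map each file name to its {algorithm: value} entries."""
    entries = {}
    for line in text.splitlines():
        match = DISTINFO_LINE.match(line)
        if match:
            algo, name, value = match.groups()
            entries.setdefault(name, {})[algo] = value
    return entries


def verify(name, data, entries):
    """Check BLAKE2s, SHA512 and Size for in-memory file contents."""
    expected = entries[name]
    return (
        hashlib.blake2s(data).hexdigest() == expected["BLAKE2s"]
        and hashlib.sha512(data).hexdigest() == expected["SHA512"]
        and len(data) == int(expected["Size"])
    )


# Self-contained demo: build a distinfo for sample bytes, then verify them.
sample = b"example distfile contents"
text = "\n".join([
    "$NetBSD$",
    "",
    f"BLAKE2s (demo.tar.gz) = {hashlib.blake2s(sample).hexdigest()}",
    f"SHA512 (demo.tar.gz) = {hashlib.sha512(sample).hexdigest()}",
    f"Size (demo.tar.gz) = {len(sample)} bytes",
])
print(verify("demo.tar.gz", sample, parse_distinfo(text)))
```

For the real distfile, the bytes would come from reading
fastparquet-2022.12.0.tar.gz in binary mode and the text from the distinfo
file itself.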

