pkgsrc-WIP-changes archive

apache-arrow: update to 16.0.0



Module Name:	pkgsrc-wip
Committed By:	Matthew Danielson <matthewd%fastmail.us@localhost>
Pushed By:	matthewd
Date:		Thu Apr 4 08:19:41 2024 -0700
Changeset:	a8d2b132a38865edd0723daa03ec6ba5752d4a1e

Modified Files:
	apache-arrow/PLIST
	apache-arrow/distinfo
	apache-arrow/version.mk

Log Message:
apache-arrow: update to 16.0.0

Apache Arrow 16.0.0 (2024-04-20 07:00:00)
Bug Fixes

    GH-20379 - [Java] Dataset Failed to update reservation while freeing bytes (#40101)
    GH-35081 - [Python] construct pandas.DataFrame with public API in to_pandas (#40897)
    GH-35369 - [Docs] Add a missing space after ref:IPC format <format-ipc> (#38276)
    GH-35718 - [Go][Parquet] Fix for null-only encoding panic (#39497)
    GH-36026 - [C++][ORC] Catch all ORC exceptions to avoid crash (#40697)
    GH-36026 - [Python] Fix ORC test segfault in the python wheel windows test (#40609)
    GH-37164 - [Python] Attach Python stacktrace to errors in ConvertPyError (#39380)
    GH-37841 - [Java] Dictionary decoding not using the compression factory from the ArrowReader (#38371)
    GH-37989 - [Python] Plug reference leaks when creating Arrow array from Python list of dicts (#40412)
    GH-38768 - [Python] Empty slicing an array backwards beyond the start is now empty (#40682)
    GH-38768 - [Python] Slicing an array backwards beyond the start now includes first item. (#39240)
    GH-38794 - [C++][S3] Handle conventional content-type for directories (#40147)
    GH-38821 - [C++] Strengthen handling of duplicate slashes in S3, GCS (#40371)
    GH-38828 - [R] Ensure that streams can be written to socket connections (#38897)
    GH-38833 - [C++] Avoid hash_mean overflow (#39349)
    GH-38923 - [GLib] Fix spelling (#38924)
    GH-38962 - [C++] Fix spelling (array) (#38963)
    GH-39291 - [Docs] Remove the “Show source” links from doc pages (#40167)
    GH-39309 - [Go][Parquet] handle nil bitWriter for DeltaBinaryPacked (#39347)
    GH-39310 - [CI][Java][Docs] Failed by new module-info-compiler Maven plugin
    GH-39416 - [GLib][Docs] Fixed Broken Link in README Content (#39896)
    GH-39424 - [CI][R] test-r-rhub-debian-gcc-devel-lto-latest fails not being able to install Arrow
    GH-39440 - [Python] Calling pyarrow.dataset.ParquetFileFormat.make_write_options as a class method results in a segfault (#40976)
    GH-39444 - [Python] Fix parquet import in encryption test (#40505)
    GH-39444 - [C++][Parquet] Fix crash in Modular Encryption (#39623)
    GH-39456 - [Go][Parquet] Arrow DATE64 Type Coerced to Parquet DATE Logical Type (#39460)
    GH-39466 - [Go][Parquet] Align Arrow and Parquet Timestamp Instant/Local Semantics (#39467)
    GH-39519 - [Swift] Fix null count when using reader (#39520)
    GH-39523 - [R] Don’t override explicitly set NOT_CRAN=false when on dev version (#39524)
    GH-39558 - [Java] Add SQL_ALL_TABLES_ARE_SELECTABLE, SQL_NULL_ORDERING and SQL_MAX_COLUMNS_IN_TABLE support to SqlInfoBuilder (#39561)
    GH-39579 - [Python] fix raising ValueError on _ensure_partitioning (#39593)
    GH-39683 - [Release] Use temporary directory with TEST_BINARY=1 (#39684)
    GH-39706 - [Archery] Fix benchmark diff subcommand (#39733)
    GH-39738 - [R] Support build against the last three released versions of Arrow (#39739)
    GH-39765 - [C++][Dataset] Fix failures in dataset-scanner-benchmark (#39794)
    GH-39769 - [C++][Device] Fix Importing nested and string types for DeviceArray (#39770)
    GH-39782 - [C++] Use correct (non-CPU) address of buffer in ExportDeviceArray (#39783)
    GH-39788 - [Python] Validate max_chunksize in Table.to_batches (#39796)
    GH-39841 - [GLib] Add support for GLib 2.56 again (#39842)
    GH-39857 - [C++] Improve error message for “chunker out of sync” condition (#39892)
    GH-39870 - [Go] Include buffered pages in TotalBytesWritten (#40105)
    GH-39874 - [CI][C++][Windows] Use pre-installed OpenSSL (#39882)
    GH-39883 - [CI][R][Windows] Use ci/scripts/install_minio.sh with Git bash (#39929)
    GH-39909 - [Java][CI] Update reference to Float16 testing file reference on Testing submodule (#39911)
    GH-39921 - [Go][Parquet] ColumnWriter not reset TotalCompressedBytes after Flush (#39922)
    GH-39925 - [Go][Parquet] Fix re-slicing in maybeReplaceValidity function (#39926)
    GH-39935 - [GLib][Docs] Use GI-DocGen instead of GTK-Doc (#40427)
    GH-39955 - [C++] Use make -j1 to install bundled bzip2 (#39956)
    GH-39965 - [C++] DatasetWriter avoid creating zero-sized batch when max_rows_per_file enabled (#39995)
    GH-39973 - [C++][CI] Disable debug memory pool for ASAN and Valgrind (#39975)
    GH-39992 - [CI][Docs][Java] ubuntu-docs uses Maven version in .env (#39993)
    GH-39996 - [Archery] Fix Crossbow build on a PR from a fork’s main branch (#40002)
    GH-39996 - [Archery] Fix Crossbow build on a PR from a fork’s main branch (#39997)
    GH-40038 - [Java] Export non empty offset buffer for variable-size layout through C Data Interface (#40043)
    GH-40039 - [Java][FlightRPC] Improve performance by removing unnecessary memory copies (#40042)
    GH-40040 - [C++][Gandiva] Make Gandiva’s default cache size to be 5000 for object code cache (#40041)
    GH-40052 - [C++][FS][Azure] Fix CreateDir and DeleteDir trailing slash issues on hierarchical namespace accounts (#40054)
    GH-40085 - [C++][FS][Azure] Validate containers in AzureFileSystem::Impl::MovePaths() (#40086)
    GH-40089 - [Go] Concurrent Recordset for receiving huge recordset (#40090)
    GH-40097 - [Go][FlightRPC] Enable disabling TLS (#40098)
    GH-40126 - [C++] Decimal types with different precisions and scales bind failed in resolve type when call arithmetic function (#40223)
    GH-40145 - [C++][Docs] Correct the console emitter link (#40146)
    GH-40153 - [C++][Python] Fix test_gdb failures on 32-bit (#40293)
    GH-40153 - [Python] Make Tensor.__getbuffer__ work on 32-bit platforms (#40294)
    GH-40153 - [Python] Avoid using np.take in Array.to_numpy() (#40295)
    GH-40153 - [Python][C++] Fix large file handling on 32-bit Python build (#40176)
    GH-40153 - [Python] Update size assumptions for 32-bit platforms (#40165)
    GH-40153 - [Python] Fix OverflowError in foreign_buffer on 32-bit platforms (#40158)
    GH-40171 - [Python] Add Type_FIXED_SIZE_LIST to _NESTED_TYPES set (#40172)
    GH-40181 - [C++] Support glog 0.7 build (#40230)
    GH-40183 - [C++] Fix cast function bind failed after add an alias name through AddAlias (#40200)
    GH-40199 - [R] dbplyr 2.5.0 forward compatibility (#40197)
    GH-40207 - [C++] TakeCC: Concatenate only once and delegate to TakeAA instead of TakeCA (#40206)
    GH-40227 - [R] ensure executable files in create_package_with_all_dependencies (#40232)
    GH-40233 - [C++] Fix an abort on asof_join_benchmark run for lost an arg (#40234)
    GH-40249 - [Java] Fix NPE in ArrowDatabaseMetadata (#40988)
    GH-40266 - [Python] Mark ListView as a nested type (#40265)
    GH-40268 - [Archery] Bump the version of pygit2, adapt to API changes (#40269)
    GH-40276 - [C++] Fix a simple buffer-overflow case in decimal_benchmark (#40277)
    GH-40279 - [C++] Reduce S3Client initialization time (#40299)
    GH-40306 - [C++] Fix a wrong total_bytes to generate StringType’s test data in vector_hash_benchmark (#40307)
    GH-40308 - [C++][Gandiva] Add support for compute module’s decimal promotion rules (#40434)
    GH-40316 - [Python] only allocate the ScalarMemoTable when used (#40565)
    GH-40327 - [C++][Parquet] Add missing config.h include in key_management_test.cc (#40330)
    GH-40331 - [C++][CMake] Add missing glog::glog dependency to arrow_util (#40332)
    GH-40334 - [C++][Gandiva] Add missing OpenSSL dependency to encrypt_utils_test.cc (#40338)
    GH-40366 - [C++] Remove const qualifier from Buffer::mutable_span_as (#40367)
    GH-40375 - [Python] Error compiling Cython files on Windows during release verification
    GH-40395 - [C++] Avoid simplifying expressions which call impure functions (#40396)
    GH-40398 - [C++] Expose protobuf dependency if opentelemetry or ORC are enabled (#40399)
    GH-40422 - [C++][FlightRPC] Add missing expiration_time arguments (#40425)
    GH-40431 - [C++] Move key_hash/key_map/light_array related files to internal for prevent using by users (#40484)
    GH-40432 - [C++] Add missing Threads::Threads dependency to arrow_static (#40433)
    GH-40439 - [Python] Fix flake8 failures in python/benchmarks/parquet.py (#40440)
    GH-40443 - [Python] Suppress python/examples/minimal_build/Dockerfile.* warnings (#40444)
    GH-40445 - [C++] Fix static build on Windows (#40446)
    GH-40500 - [C++] Ensure using bundled FlatBuffers (#40519)
    GH-40535 - [Docs][R] Set RETICULATE_PYTHON_ENV in order to find pyarrow (#40571)
    GH-40558 - [C++][CI] Fix TSAN and ASAN/UBSAN crashes (#40559)
    GH-40562 - [C++] Repair FileSystem merge error (#40564)
    GH-40566 - [C++] Fix 3.12 Python support (#40322)
    GH-40568 - [Java] Test failure in Dataset regarding TestAllTypes (#40662)
    GH-40591 - [R] Add extra CSS for navbar on pkgdown website (#40610)
    GH-40602 - [C++] Move mold linker flags to variables (#40603)
    GH-40615 - [Packaging][deb] Move libprotobuf-dev dependency to libarrow-dev from libarrow-flight-dev (#40617)
    GH-40616 - [Docs][GLib] Ensure overwriting placeholder front pages (#40618)
    GH-40619 - [Java] JDBC Adapter Build Issue (#40656)
    GH-40623 - [Python][Docs] Add workaround for autosummary (#40739)
    GH-40634 - [C#] ArrowStreamReader should not be null (#40765)
    GH-40642 - [Python] BUG: Empty slicing an array backwards beyond the start should be empty
    GH-40652 - [C++] Enlarge dest buffer according to dest offset for CopyBitmap benchmark (#40769)
    GH-40668 - [Ruby][CI] Require GLib 2.58 or later for timezone (#40669)
    GH-40672 - [Go][Parquet] Add proper build tags for min_max (#40676)
    GH-40674 - [GLib] Don’t assume gint64 and int64_t use the same type (#40736)
    GH-40693 - [Go] Fix Decimal type precision loss on GetOneForMarshal (#40694)
    GH-40700 - [Go][CI] test-debian-12-go-1.21 fails with `go: updates to go.mod needed`
    GH-40702 - [R] Avoid undocumented dbplyr internals in duckdb tests (#40710)
    GH-40703 - [CI][Packaging] Homebrew can’t install Python 3.12 on GHA runners (#40704)
    GH-40706 - [CI][Python] Activate ARROW_PYTHON_VENV if defined in sdist-test job (#40707)
    GH-40716 - [Java][Integration] Fix test_package_java in verification scripts (#40724)
    GH-40718 - [JS] Fix set visitor in vectors for js dates (#40725)
    GH-40719 - [Go] Make arrow.Null non-null for arrow.TypeEqual to work properly with new(arrow.NullType) (#40802)
    GH-40727 - [C++][Gandiva] ‘ilike’ function does not work (#40728)
    GH-40751 - [C++] Fix protobuf package name setting for builds with substrait (#40753)
    GH-40773 - [Java] add DENSEUNION case to StructWriters, resolves #40773 (#40809)
    GH-40775 - [Benchmarking][Java] Fix conbench timeout (#40786)
    GH-40788 - [C#] Override Accept in MapArray (#40789)
    GH-40790 - [C#] Account for offset and length when getting fields of a StructArray (#40805)
    GH-40792 - [C#] Fix slicing a previously sliced array (#40793)
    GH-40847 - [Go] update readme (#40877)
    GH-40851 - [JS] Fix nullcount and make vectors created from typed arrays not nullable (#40852)
    GH-40855 - [C++][ORC] Fix std::filesystem related link error with ORC 2.0.0 or later (#41023)
    GH-40858 - [R] Remove dangling commas from codegen.R (#40859)
    GH-40863 - [C++] Fix TSAN link error for module library (#40864)
    GH-40870 - [C#] Update CompareValidityBuffer() to pass when unspecified final bits are not identical (#40873)
    GH-40878 - [JAVA] Fix flight-sql-jdbc-driver shading issues (#40879)
    GH-40891 - [JS] Store Dates as TimestampMillisecond (#40892)
    GH-40893 - [Java][FlightRPC] Support IntervalMonthDayNanoVector in FlightSQL JDBC Driver (#40894)
    GH-40896 - [Java] Remove runtime dependencies on Eclipse, logback (#40904)
    GH-40898 - [C#] Do not import length-zero buffers from C Data Interface Arrays (#41054)
    GH-40900 - [Go] Fix Mallocator Weirdness (#40902)
    GH-40907 - [Java][FlightSQL] Shade slf4j-api in JDBC driver (#40908)
    GH-40952 - [Java][FlightSQL] Clean up flight-sql-jdbc-driver dependencies (#40953)
    GH-40954 - [CI] Fix use of obsolete docker-compose command on Github Actions (#40949)
    GH-40961 - [GLib] Suppress warnings for Vala examples on macOS (#40962)
    GH-40974 - [CI][Python] CI failures on Python builds due to pytest_cython (#40975)
    GH-40991 - [R] Prefer r-universe, add a startup message (#41019)
    GH-40999 - [Java] Fix AIOOBE trying to splitAndTransfer DUV within nullable struct (#41000)
    GH-41004 - [C++][FS][Azure] Don’t run TestGetFileInfoGenerator() with Valgrind (#41163)
    GH-41005 - [CI] HDFS and skyhook tests require docker compose usage because they require multiple containers (#41027)
    GH-41007 - [CI][Archery] Correctly interpolate environment variables from docker compose when using docker cli on archery docker (#41026)
    GH-41015 - [JS][Benchmarking] allow JS benchmarks to run more portably (#41031)
    GH-41016 - [C++] Fix null count check in BooleanArray.true_count() (#41070)
    GH-41024 - [C++] IO: fixing compiling in gcc 7.5.0 (#41025)
    GH-41032 - [C++][Parquet] Bugfixes and more tests in boolean arrow decoding (#41037)
    GH-41039 - [Python] ListView pandas tests should use np.nan instead of None (#41040)
    GH-41044 - [C++] formatting.h: Make sure space is allocated for the ‘Z’ when formatting timestamps (#41045)
    GH-41061 - [C++] Ignore ARROW_USE_MOLD/ARROW_USE_LLD with clang < 12 (#41062)
    GH-41088 - [CI][Crossbow] Fix GitHub Actions workflow syntax error (#41091)
    GH-41119 - [Archery][Packaging][CI] Avoid using --progress flag on Docker on Windows on archery (#41120)
    GH-41121 - [C++] Fix: left anti join filter empty rows. (#41122)
    GH-41124 - [CI][C++] Don’t use CMake 3.29.1 with vcpkg (#41151)
    GH-41127 - [CI] Use GitHub Actions instead of Azure Pipelines for docker-tests (#41153)
    GH-41145 - [R][CI] test-r-dev-duckdb fails installing duckdb (#41152)
    GH-41147 - [CI][C++] Use newer LLVM on Ubuntu 24.04 (#41150)
    GH-41154 - [C++] Fix Valgrind error in string-to-float16 conversion (#41155)
    GH-41167 - [CI][Release][GLib][Conda] Pin gobject-introspection to 1.78.1 (#41181)
    GH-41169 - [CI][Release] Specify --build-config explicitly on Windows (#41178)
    GH-41176 - [C++] Stop defining ARROW_TEST_MEMCHECK in config.h.cmake (#41177)
    GH-41201 - [C++] Fix mistake in integration test. Explicitly cast std::string to avoid compiler interpreting char* -> bool (#41202)

New Features and Improvements

    GH-18014 - [C++] Filesystem implementation for Azure Blob Storage
    GH-20127 - [Python][CI] Remove legacy hdfs tests from hdfs and hypothesis setup (#40363)
    GH-20127 - [Python] Remove deprecated pyarrow.filesystem legacy implementations (#39825)
    GH-20213 - [C++] Implement cast to/from halffloat (#40067)
    GH-20339 - [C++] Add residual filter support to swiss join (#39487)
    GH-23221 - [C++] Add support for building with Emscripten (#37821)
    GH-24826 - [Java] Add DUV.setOffset method (#40985)
    GH-24834 - [C#] Support writing compressed IPC data (#39871)
    GH-30915 - [C++][Python] Add missing methods to RecordBatch (#39506)
    GH-31545 - [GLib] Enable clang-format (#40451)
    GH-31735 - [Docs][Release] Move release verification guide to developers documentation (#39960)
    GH-33499 - [Python][CI] Support ORC in Windows wheels
    GH-34235 - [Python] Correct test marker for join_asof tests (#40666)
    GH-34235 - [Python] Add join_asof binding (#34234)
    GH-34865 - [C++][Java][Flight RPC] Add Session management messages (#34817)
    GH-35875 - [R] Update Readme (#40148)
    GH-35941 - [Dev][MATLAB] Add clang-format configuration to pre-commit (#40588)
    GH-36656 - [Dev] Validate in merge script if issue has an assigned milestone already (#40771)
    GH-37286 - [Java] Start adding nullability/nullness annotations (#37723)
    GH-37328 - [Python] Add a function to download and extract timezone database on Windows (#38179)
    GH-37381 - [Python][CI][Packaging] Enable ORC on Windows Appveyor CI and Windows wheels for pyarrow
    GH-37484 - [Python] Add a FixedSizeTensorScalar class (#37533)
    GH-37931 - [Python][CI][Dev][Python] Release and merge script errors (#37819)” (#40150)
    GH-38010 - [Python] Construct pyarrow.Field and ChunkedArray through Arrow PyCapsule Protocol (#40818)
    GH-38309 - [C++] build filesystems as separate modules (#39067)
    GH-38560 - [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations using xsimd (#40335)
    GH-38573 - [Java][FlightRPC] Try all locations in JDBC driver (#40104)
    GH-38659 - [CI][MATLAB][Packaging] Add MATLAB packaging task to crossbow tasks.yml (#38660)
    GH-38663 - [C++] Add support for service-specific endpoint for S3 using AWS_ENDPOINT_URL_S3 (#39160)
    GH-38703 - [C++][FS][Azure] Implement DeleteFile() (#39840)
    GH-38704 - [C++] Implement Azure FileSystem Move() via Azure DataLake Storage Gen 2 API (#39904)
    GH-38717 - [C++] Add ImportChunkedArray and ExportChunkedArray to/from ArrowArrayStream (#39455)
    GH-38916 - [R] Simplify dataset and table print output (#38917)
    GH-38988 - [Go] Expose dictionary size from DictionaryBuilder (#39521)
    GH-38998 - [Java] Build memory-core and memory-unsafe as JPMS modules (#39011)
    GH-39001 - [Java] Modularize remaining modules (#39221)
    GH-39057 - [CI][C++][Go] Don’t run jobs that use a self-hosted GitHub Actions Runner on fork (#39903)
    GH-39069 - [C++][FS][Azure] Use the generic filesystem tests (#40567)
    GH-39147 - [R] Add Bootstrap.r (#39148)
    GH-39231 - [C++][Compute] Add binary_slice kernel for fixed size binary (#39245)
    GH-39233 - [Compute] Add some duration kernels (#39358)
    GH-39270 - [C++] Avoid creating memory manager instance for every buffer view/copy (#39271)
    GH-39277 - [Python] Fix missing byte_width attribute on DataType class (#39592)
    GH-39330 - [Java][CI] Fix or suppress spurious errorprone warnings (#39529)
    GH-39336 - [C++][Parquet] Minor: Style enhancement for parquet::FileMetaData (#39337)
    GH-39352 - [FS][Azure] Enable azure in builds (#39971)
    GH-39377 - [C++] IO: Reuse same buffer in CompressedInputStream (#39807)
    GH-39385 - [C++] Use more permissible return code for rename (#39481)
    GH-39398 - [C++][Parquet] Use std::count in ColumnReader ReadLevels (#39397)
    GH-39427 - [GLib] Update script and documentation (#39428)
    GH-39463 - [C++] Support cast kernel from large string, (large) binary to dictionary (#40017)
    GH-39532 - [Python] Compatibility with NumPy 2.0
    GH-39549 - [C++] Pass -jN to make in external projects (#39550)
    GH-39552 - [Go] inclusion of option to use replacer when creating csv strings with go library (#39576)
    GH-39555 - [Packaging][Python] Enable building pyarrow against numpy 2.0 (#39557)
    GH-39560 - [C++][Parquet] Add integration test for BYTE_STREAM_SPLIT (#39570)
    GH-39574 - [Go] Enable PollFlightInfo in Flight RPC (#39575)
    GH-39621 - [CI][Packaging] Update vcpkg to 2023.11.20 release (#39622)
    GH-39651 - [Python] Basic pyarrow bindings for Binary/StringView classes (#39652)
    GH-39654 - [Java] Upgrade to Netty 4.1.105.Final (#39655)
    GH-39663 - [C++] Ensure top-level benchmarks present informative metrics (#40091)
    GH-39666 - [C++] Ensure CSV and JSON benchmarks present a bytes/s or items/s metric (#39764)
    GH-39667 - [C++] Ensure dataset benchmarks present a bytes/s or items/s metric (#39766)
    GH-39669 - [C++][Gandiva] Ensure Gandiva benchmarks present a bytes/s or items/s metric (#40435)
    GH-39680 - [Java] enable half float support on Java module (#39681)
    GH-39697 - [R] Source build should check if offline (#39699)
    GH-39702 - [GLib] Add support for time zone in GArrowTimestampDataType (#39717)
    GH-39704 - [C++][Parquet] Benchmark levels decoding (#39705)
    GH-39707 - [Java] Enable local build cache for Maven/Java build (#39708)
    GH-39718 - [C++][FS][Azure] Remove StatusFromErrorResponse as it’s not necessary (#39719)
    GH-39720 - [Swift] Switch reader to use arrow field instead of proto for building arrays (#39721)
    GH-39734 - [Java] Bump org.codehaus.mojo:exec-maven-plugin from 1.6.0 to 3.1.1 (#39696)
    GH-39747 - [C++][Parquet] Make BYTE_STREAM_SPLIT routines type-agnostic (#39748)
    GH-39752 - [Java] Remove Static imports for Utf8 Usage (#40683)
    GH-39761 - [Docs] Link to Go documentation references outdated documentation from 2018 (#39750)
    GH-39771 - [C++][Device] Generic CopyBatchTo/CopyArrayTo memory types (#39772)
    GH-39774 - [Go] Add public access to PreparedStatement handle (#39775)
    GH-39779 - [Python] Expose force_virtual_addressing in PyArrow (#39819)
    GH-39780 - [Python][Parquet] Support hashing for FileMetaData and ParquetSchema (#39781)
    GH-39812 - [Python] Add bindings for ListView and LargeListView (#39813)
    GH-39815 - [C++] Document and micro-optimize ChunkResolver::Resolve() (#39817)
    GH-39823 - [C++] Allow building cpp/src/arrow/*/.cc without waiting bundled libraries (#39824)
    GH-39837 - [Go][Flight] Allow cloning existing cookies in middleware (#39838)
    GH-39843 - [C++][Parquet] Parquet binary length overflow exception should contain the length of binary (#39844)
    GH-39845 - [C++][Parquet] Minor: avoid creating a new Reader object in Decoder::SetData (#39847)
    GH-39848 - [Python][Packaging] Build pyarrow wheels with numpy RC instead of nightly (#41097)
    GH-39852 - [Python] Support creating Binary/StringView arrays from python objects (#39853)
    GH-39855 - [Python] ListView support for pa.array() (#40160)
    GH-39859 - [R] Remove macOS from the allow list (#39861)
    GH-39863 - [C++] Thirdparty: Bump google benchmark to 1.8.3 (#39878)
    GH-39864 - [C++] DataType::ToString support optionally show metadata (#39888)
    GH-39872 - [Packaging][Ubuntu] Add support for Ubuntu 24.04 Noble Numbat (#39887)
    GH-39885 - [CI][MATLAB] Bump matlab-actions/setup-matlab and matlab-actions/run-tests from v1 to v2 (#39886)
    GH-39900 - [Java][CI] To upload Maven and Memory Netty Buffer Patch into Apache Nightly repository (#39901)
    GH-39910 - [Go] Add func to load prepared statement from ActionCreatePreparedStatementResult (#39913)
    GH-39928 - [C++][Gandiva] Accept LLVM 18 (#39934)
    GH-39930 - [C++] Use Requires instead of Libs for system RE2 in arrow.pc (#39932)
    GH-39946 - [Java] Bump com.puppycrawl.tools:checkstyle from 8.19 to 8.29 (#39694)
    GH-39958 - [Python][CI] Remove upper pin on pytest (#40487)
    GH-39962 - [C++] Small CSV reader refactoring (#39963)
    GH-39968 - [Python][FS][Azure] Minimal Python bindings for AzureFileSystem (#40021)
    GH-39978 - [C++][Parquet] Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64 (#40094)
    GH-39979 - [Python] Low-level bindings for exporting/importing the C Device Interface (#39980)
    GH-39984 - [Python] Add ChunkedArray import/export to/from C (#39985)
    GH-39987 - [R] Make it possible to use a rtools libarrow on windows (#39986)
    GH-40011 - [CI] Update Fedora to 39 from 38 (#40012)
    GH-40023 - [Python] Use Cast() instead of CastTo (#40116)
    GH-40026 - [C++][FS][Azure] Add support for reading user defined metadata (#40671)
    GH-40028 - [C++][FS][Azure] Add AzureFileSystem support to FileSystemFromUri() (#40325)
    GH-40029 - [Packaging][Ubuntu] Drop support for Ubuntu 23.10 Mantic Minotaur (#40030)
    GH-40037 - [C++][FS][Azure] Make attempted reads and writes against directories fail fast (#40119)
    GH-40055 - [Java][Docs] Simplify use of Filter and Expression into Dataset Substrait (#40056)
    GH-40059 - [C++][Python] Basic conversion of RecordBatch to Arrow Tensor (#40064)
    GH-40060 - [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for different data types (#40359)
    GH-40061 - [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add option to cast NULL to NaN (#40803)
    GH-40066 - [Python] Support requested_schema in __arrow_c_stream__() (#40070)
    GH-40074 - [C++][FS][Azure] Implement DeleteFile() for flat-namespace storage accounts (#40075)
    GH-40077 - [CI] Use GitHub hosted M1 macOS runner (#40437)
    GH-40079 - [CI][Packaging] Enable Azure in more tests and builds (#40080)
    GH-40082 - [CI][C++] Add a job on ARM64 macOS (#40456)
    GH-40092 - [Python] Support Binary/StringView conversion to numpy/pandas (#40093)
    GH-40095 - [C++][Parquet] Remove AVX512 variants of BYTE_STREAM_SPLIT encoding (#40127)
    GH-40113 - [Go][Parquet] New RegisterCodec function (#40114)
    GH-40133 - [C++][Parquet][Tools] Print FIXED_LEN_BYTE_ARRAY length (#40132)
    GH-40142 - [Python] Allow FileInfo instances to be passed to dataset init (#40143)
    GH-40151 - [C++] Make S3 narrative test more flexible (#40144)
    GH-40152 - [C++] Remove redundant invocation of BatchesFromTable (#40173)
    GH-40155 - [Go][FlightRPC][FlightSQL] Implement Session Management (#40284)
    GH-40159 - [Python][CI] Add 32-bit Debian build on Crossbow (#40164)
    GH-40190 - [R][Docs] Update NEWS.md with build system changes (#40191)
    GH-40205 - [Python] ListView arrow-to-pandas conversion (#40482)
    GH-40209 - [C++][CMake] Use “RapidJSON” CMake target for RapidJSON (#40210)
    GH-40212 - [R][CI] Add a C++ with gcc 14 build (#40244)
    GH-40221 - [C++][CMake] Use arrow/util/config.h.cmake instead of add_definitions() (#40222)
    GH-40224 - [C++] Fix: improve the backpressure handling in the dataset writer (#40722)
    GH-40228 - [C++][CMake] Improve description why we need to initialize AWS C++ SDK in arrow-s3fs-test (#40229)
    GH-40236 - [Python][CI] Disable generating C lines in Cython tracebacks (#40225)
    GH-40261 - [Go] Don’t export array functions with unexposed return types (#40272)
    GH-40273 - [Python] Support construction of Run-End Encoded arrays in pa.array(..) (#40341)
    GH-40274 - [C++] Add support for system glog 0.7 (#40275)
    GH-40280 - [C++] Specialize ResolvedChunk::Value on value-specific types instead of entire class (#40281)
    GH-40291 - [Python] Accept dict in pyarrow.record_batch() function (#40292)
    GH-40318 - [C++][Docs] Add documentation of array factories (#40373)
    GH-40323 - [R][CI] Use rocker/r-ver instead of library/r-base (#40321)
    GH-40328 - [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed (#40329)
    GH-40333 - [Docs] Improve env var docs for ARROW_USER_SIMD_LEVEL (#40374)
    GH-40345 - [FlightRPC][C++][Java][Go] Add URI scheme to reuse connection (#40084)
    GH-40357 - [C++] Add benchmark for ToTensor conversions (#40358)
    GH-40370 - [C++] Define ARROW_FORCE_INLINE for non-MSVC builds (#40372)
    GH-40376 - [Python] Update for NumPy 2.0 ABI change in PyArray_Descr->elsize (#40418)
    GH-40377 - [Python][CI] Fix install of nightly dask in integration tests (#40378)
    GH-40379 - [Python] Fix byte_width for binary(0) + fix hypothesis tests (#40381)
    GH-40394 - [C++] Add support for mold (#40397)
    GH-40400 - [C++] Add support for LLD (#40927)
    GH-40402 - [GLib] Add missing compute function options classes (#40403)
    GH-40405 - [C++] Produce better error message when Move is attempted on flat-namespace accounts (#40406)
    GH-40428 - [Python][CI] Fix dataset partition filter tests with pandas nightly (#40429)
    GH-40438 - [GLib] Add GArrowTimestampParser (#40457)
    GH-40441 - [GLib][Docs] Use Sphinx for Apache Arrow GLib front page (#40442)
    GH-40448 - [CI][Dev] Run pre-commit (#40449)
    GH-40454 - [CI][Debian] Update Debian to 12 from 11 (#40455)
    GH-40495 - [GLib] Use G_DECLARE_DERIVABLE_TYPE() (#40497)
    GH-40498 - [GLib] Remove arrow-glib/gobject-type.h (#40499)
    GH-40507 - [C++][ORC] Upgrade ORC to 2.0.0 (#40508)
    GH-40515 - [Java] Bump org.apache.maven dependencies from 3.3.9 to 3.8.7 (#40514)
    GH-40522 - [Dev][Go] Add Dependabot configuration for Go (#40523)
    GH-40536 - [CI] : Migrate remaining jobs away from self-hosted mac runners. (#40537)
    GH-40540 - [CI][C++] Don’t install FlatBuffers (#40541)
    GH-40542 - [Dev][CI] Run pre-commit to all files (#40543)
    GH-40544 - [Dev] Add cmake-format configuration to pre-commit (#40545)
    GH-40549 - [Java] Revert “bump org.apache.maven.plugins:maven-shade-plugin from 3.2.4 to 3.5.2 in /java (#40462)” (#41006)
    GH-40551 - [Release][Docs] Improve documentation for patch Release process (#40552)
    GH-40553 - [C#] Avoid logger instantiations per request (#40554)
    GH-40573 - [GLib][Ruby][CSV] Add support for customizing timestamp parsers (#40590)
    GH-40575 - [Docs][Python] Added JsonFileFormat to docs (#40585)
    GH-40577 - [C++] Ensure pkg-config flags include -ldl for static builds (#40578)
    GH-40586 - [Dev][C++][Python][R] Use pre-commit for clang-format (#40587)
    GH-40607 - [C++] Rename Function::is_impure() to is_pure() (#40608)
    GH-40621 - [C++] Add missing util/config.h in arrow/io/compressed_test.cc (#40625)
    GH-40630 - [Go][Parquet] Enable writing of Parquet footer without closing file (#40654)
    GH-40659 - [Python][C++] Support conversion of pyarrow.RunEndEncodedArray to numpy/pandas (#40661)
    GH-40680 - [Java] Test JDK 22 in CI (#41038)
    GH-40684 - [Java][Docs] JNI module debugging with IntelliJ (#40685)
    GH-40689 - [Docs] Add nanoarrow to implementation status page (#41052)
    GH-40690 - [C#][FlightRPC] Add do_exchange csharp implementation (#40691)
    GH-40695 - [C++] Expand Substrait type support (#40696)
    GH-40698 - [C++] Create registry for Devices to map DeviceType to MemoryManager in C Device Data import (#40699)
    GH-40720 - [Python] Simplify and improve perf of creation of the column names in Table.to_pandas (#40721)
    GH-40731 - [C++][Parquet] Minor enhancement code of encryption (#40732)
    GH-40733 - [Go] Require Go 1.21 or later (#40848)
    GH-40745 - [Java][FlightRPC] Support configuring backpressure threshold (#41051)
    GH-40767 - [C++][Parquet] Simplify PageWriter and ColumnWriter creation (#40768)
    GH-40783 - [C++] Re-order loads and stores in MemoryPoolStats update (#40647)
    GH-40784 - [JS] Use bigIntToNumber (#40785)
    GH-40791 - [Dev][CI] Use the official hadolint configuration (#40794)
    GH-40796 - [Java] set lastSet in ListVector.setNull to avoid O(n²) in ListVectors with lots of nulls (#40810)
    GH-40799 - [Doc][Format] Implementation status page should list canonical extension types (#41053)
    GH-40801 - [Docs] Clarify device identifier documentation in the Arrow C Device data interface (#41101)
    GH-40806 - [C++] Revert changes from PR #40857 (#40980)
    GH-40806 - [C++] Correctly report asimd/neon in GetRuntimeInfo (#40857)
    GH-40814 - [C++] Thirdparty: bump zstd to 1.5.6 (#40837)
    GH-40833 - [Docs][Release] Make explicit in the documentation that verifying binaries is not required in order to cast a vote (#40834)
    GH-40841 - [Docs][C++][Python] Add initial documentation for RecordBatch::Tensor conversion (#40842)
    GH-40843 - [Java] Cleanup protobuf-maven-plugin usage (#40844)
    GH-40866 - [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for row-major (#40867)
    GH-40872 - [C++][Parquet] Encoding: Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder (#40876)
    GH-40882 - [C++] Suppress shorten-64-to-32 warnings in CUDA/Skyhook codes (#40883)
    GH-40888 - [Go][FlightRPC] support conversion from array.Duration in FlightSQL driver (#40889)
    GH-40983 - [C++] Fix unused function build error (#40984)
    GH-40994 - [C++][Parquet] RleBooleanDecoder supports DecodeArrow with nulls (#40995)
    GH-41034 - [C++][FS][Azure] Adjust DeleteDir/DeleteDirContents/GetFileInfoSelector behaviors against Azure for generic filesystem tests (#41068)
    GH-41043 - [CI][Python] check message in test_make_write_options_error for Cython 2 (#41059)
    GH-41047 - [C#] Address performance issue of reading from StringArray (#41048)
    GH-41098 - [Python] Add copy keyword in Array.array for numpy 2.0+ compatibility (#41071)
    GH-41100 - [Python][Packaging] PyArrow wheel building is failing because of disabled vcpkg install of liblzma
    GH-41227 - [CI][Release][GLib][Conda] Unpin gobject-introspection (#41228)
    PARQUET-2423 - [C++][Parquet] Avoid allocating buffer object in RecordReader’s SkipRecords (#39818)

Apache Arrow 15.0.2 (2024-03-18 07:00:00)
Bug Fixes

    GH-39582 - [C++][Acero] Increase size of Acero TempStack (#40007)
    GH-39919 - [C++][Dataset] Add missing Protobuf static link dependency (#40015)
    GH-39943 - [CI][Python] Update manylinux images to avoid GPG problems downloading packages (#39944)
    GH-40068 - [C++] Possible data race when reading metadata of a parquet file (#40111)
    GH-40252 - [C++] Make span SFINAE standards-conforming to enable compilation with nvcc (#40253)
    GH-40386 - [Python] Fix except clauses (#40387)
    GH-40485 - [Python][CI] Skip failing test_dateutil_tzinfo_to_string (#40486)

New Features and Improvements

    GH-40248 - [R] fallback to the correct libtool when we find a GNU one (#40259)

Apache Arrow 15.0.1 (2024-03-07 08:00:00)
Bug Fixes

    GH-38655 - [C++] “iso_calendar” kernel returns incorrect results for array length > 32 (#39360)
    GH-39313 - [Python] Fix race condition in _pandas_api#_check_import (#39314)
    GH-39332 - [C++] Explicit error in ExecBatchBuilder when appending var length data exceeds offset limit (int32 max) (#39383)
    GH-39525 - [C++][Parquet] Pass memory pool to decoders (#39526)
    GH-39527 - [C++][Parquet] Validate page sizes before truncating to int32 (#39528)
    GH-39577 - [C++] Fix tail-word access cross buffer boundary in CompareBinaryColumnToRow (#39606)
    GH-39582 - [C++][Acero] Random hangs when joining tables with ExecutePlan
    GH-39583 - [C++] Fix the issue of ExecBatchBuilder when appending consecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (#39585)
    GH-39599 - [Python] Avoid leaking references to Numpy dtypes (#39636)
    GH-39640 - [Docs] Pin pydata-sphinx-theme to 0.14.* (#39758)
    GH-39640 - [Docs] Pin pydata-sphinx-theme to 0.14.1 (#39658)
    GH-39656 - [Release] Update platform tags for macOS wheels to macosx_10_15 (#39657)
    GH-39672 - [Go] Time to Date32/Date64 conversion issues for non-UTC timezones (#39674)
    GH-39690 - [C++][FlightRPC] Fix nullptr dereference in PollInfo (#39711)
    GH-39732 - [Python][CI] Fix test failures with latest/nightly pandas (#39760)
    GH-39737 - [Release][Docs] Update post release documentation task (#39762)
    GH-39740 - [C++] Fix filter and take kernel for month_day_nano intervals (#39795)
    GH-39778 - [C++] Fix tail-byte access cross buffer boundary in key hash avx2 (#39800)
    GH-39803 - [C++][Acero] Fix AsOfJoin with differently ordered schemas than the output (#39804)
    GH-39860 - [C++] Expression ExecuteScalarExpression execute empty args function with a wrong result (#39908)
    GH-39865 - [C++] Strip extension metadata when importing a registered extension (#39866)
    GH-39897 - [C++] arrow::fs::FileSystemFromUri() not thread-safe with s3 URIs
    GH-39916 - [C#] Restore support for .NET 4.6.2 (#40008)
    GH-39933 - [R] Fix pointer conversion to Python for latest reticulate (#39969)
    GH-39942 - [Python] Make capsule name check more lenient (#39977)
    GH-39976 - [C++] Fix out-of-line data size calculation in BinaryViewBuilder::AppendArraySlice (#39994)
    GH-40004 - [Python][FlightRPC] Release GIL in GeneratorStream (#40005)
    GH-40068 - [C++] Possible data race when reading metadata of a parquet file
    GH-40112 - [CI][Python] Ensure CPython is selected, not PyPy (#40131)
    GH-40174 - [C++][CI][Parquet] Fixing parquet column_writer_test building (#40175)
    GH-40386 - [Python] Python build was broken by Cython 3.0.9

New Features and Improvements

    GH-39504 - [Docs] Update footer in main sphinx docs with correct attribution (#39505)
    GH-39673 - [C++] PollFlightInfo does not follow rule of 5
    GH-39849 - [Python] Remove the use of pytest-lazy-fixture (#39850)
    GH-39876 - [C++] Thirdparty: Bump zlib to 1.3.1 (#39877)
    GH-39880 - [Python][CI] Pin moto<5 for dask integration tests (#39881)
    GH-39999 - [Python] Fix tests for pandas with CoW / nightly integration tests (#40000)
    GH-40009 - [C++] Add missing “#include " (#40010)
    GH-40248 - [R] Support gnu libtool?

To see a diff of this commit:
https://wip.pkgsrc.org/cgi-bin/gitweb.cgi?p=pkgsrc-wip.git;a=commitdiff;h=a8d2b132a38865edd0723daa03ec6ba5752d4a1e

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

diffstat:
 apache-arrow/PLIST      | 26 ++++++++++++--------------
 apache-arrow/distinfo   |  6 +++---
 apache-arrow/version.mk |  2 +-
 3 files changed, 16 insertions(+), 18 deletions(-)

diffs:
diff --git a/apache-arrow/PLIST b/apache-arrow/PLIST
index 7484b9f584..cbf828f6a7 100644
--- a/apache-arrow/PLIST
+++ b/apache-arrow/PLIST
@@ -75,9 +75,6 @@ include/arrow/compute/expression.h
 include/arrow/compute/function.h
 include/arrow/compute/function_options.h
 include/arrow/compute/kernel.h
-include/arrow/compute/key_hash.h
-include/arrow/compute/key_map.h
-include/arrow/compute/light_array.h
 include/arrow/compute/ordering.h
 include/arrow/compute/registry.h
 include/arrow/compute/row/grouper.h
@@ -121,6 +118,7 @@ include/arrow/extension_type.h
 include/arrow/filesystem/api.h
 include/arrow/filesystem/azurefs.h
 include/arrow/filesystem/filesystem.h
+include/arrow/filesystem/filesystem_library.h
 include/arrow/filesystem/gcsfs.h
 include/arrow/filesystem/hdfs.h
 include/arrow/filesystem/localfs.h
@@ -399,8 +397,8 @@ lib/cmake/Arrow/ArrowOptions.cmake
 lib/cmake/Arrow/ArrowTargets-release.cmake
 lib/cmake/Arrow/ArrowTargets.cmake
 lib/cmake/Arrow/FindBrotliAlt.cmake
-lib/cmake/Arrow/FindGLOG.cmake
 lib/cmake/Arrow/FindSnappyAlt.cmake
+lib/cmake/Arrow/FindglogAlt.cmake
 lib/cmake/Arrow/Findlz4Alt.cmake
 lib/cmake/Arrow/Findre2Alt.cmake
 lib/cmake/Arrow/Findutf8proc.cmake
@@ -421,21 +419,21 @@ lib/cmake/Parquet/ParquetTargets-release.cmake
 lib/cmake/Parquet/ParquetTargets.cmake
 lib/libarrow.a
 lib/libarrow.so
-lib/libarrow.so.1500
-lib/libarrow.so.1500.0.0
+lib/libarrow.so.1600
+lib/libarrow.so.1600.0.0
 lib/libarrow_acero.a
 lib/libarrow_acero.so
-lib/libarrow_acero.so.1500
-lib/libarrow_acero.so.1500.0.0
+lib/libarrow_acero.so.1600
+lib/libarrow_acero.so.1600.0.0
 lib/libarrow_bundled_dependencies.a
 lib/libarrow_dataset.a
 lib/libarrow_dataset.so
-lib/libarrow_dataset.so.1500
-lib/libarrow_dataset.so.1500.0.0
+lib/libarrow_dataset.so.1600
+lib/libarrow_dataset.so.1600.0.0
 lib/libparquet.a
 lib/libparquet.so
-lib/libparquet.so.1500
-lib/libparquet.so.1500.0.0
+lib/libparquet.so.1600
+lib/libparquet.so.1600.0.0
 lib/pkgconfig/arrow-acero.pc
 lib/pkgconfig/arrow-compute.pc
 lib/pkgconfig/arrow-csv.pc
@@ -445,8 +443,8 @@ lib/pkgconfig/arrow-json.pc
 lib/pkgconfig/arrow.pc
 lib/pkgconfig/parquet.pc
 share/arrow/gdb/gdb_arrow.py
-share/arrow/gdb/libarrow.so.1500.0.0-gdb.py
+share/arrow/gdb/libarrow.so.1600.0.0-gdb.py
 share/doc/arrow/LICENSE.txt
 share/doc/arrow/NOTICE.txt
 share/doc/arrow/README.md
-@pkgdir share/gdb/auto-load/home/matthew/pkgsrc/install.20231221/lib
+@pkgdir share/gdb/auto-load/home/matthew/pkgsrc/install.20240420/lib
diff --git a/apache-arrow/distinfo b/apache-arrow/distinfo
index a82ecb704b..258243a687 100644
--- a/apache-arrow/distinfo
+++ b/apache-arrow/distinfo
@@ -3,9 +3,9 @@ $NetBSD$
 BLAKE2s (9.0.1.tar.gz) = a785e1ad5fd5df76c95e7cf9a6eadeb86ffbc46ea4342f49f19381434bd0f78c
 SHA512 (9.0.1.tar.gz) = ed56287f608ccdf5bc5d5fc2918e313e7c4cecdd9ef2c9993a72ea900d9ff662c57ac5326c7a809eb11505c6f39d4599f3f161b97b6e03c65783b824b8d700d2
 Size (9.0.1.tar.gz) = 215065 bytes
-BLAKE2s (apache-arrow-15.0.0.tar.gz) = 04d54ce9da23d76b9cfc650e0c39af3b85340c9092368b08587c99c92b9c7eff
-SHA512 (apache-arrow-15.0.0.tar.gz) = d5dccaa0907b0e6f2a460e32ae75091942dcb70b51db4aefe2767ee8d99882694607b723a9c06898dda3938d8eb498258d7f9aad11054665b6ea9c2fbaeafa74
-Size (apache-arrow-15.0.0.tar.gz) = 21491996 bytes
+BLAKE2s (apache-arrow-16.0.0.tar.gz) = 103ca1044caec5c76cd460e047cdb620f954bb579a88522eeda745f74a1a3d23
+SHA512 (apache-arrow-16.0.0.tar.gz) = 773f4f3eef603032c8ba0cfdc023bfd2a24bb5e41c82da354a22d7854ab153294ede1f4782cc32b27451cf1b58303f105bac61ceeb3568faea747b93e21d79e4
+Size (apache-arrow-16.0.0.tar.gz) = 21695067 bytes
 BLAKE2s (jemalloc-5.3.0.tar.bz2) = 285e6145b9d3b575b1ec5cfdae8af40b461149085f001839d64685c0d56e2689
 SHA512 (jemalloc-5.3.0.tar.bz2) = 22907bb052096e2caffb6e4e23548aecc5cc9283dce476896a2b1127eee64170e3562fa2e7db9571298814a7a2c7df6e8d1fbe152bd3f3b0c1abec22a2de34b1
 Size (jemalloc-5.3.0.tar.bz2) = 736023 bytes
diff --git a/apache-arrow/version.mk b/apache-arrow/version.mk
index 0baf4349fd..2a50723d81 100644
--- a/apache-arrow/version.mk
+++ b/apache-arrow/version.mk
@@ -1,2 +1,2 @@
 # $NetBSD$
-APACHE_ARROW_VERSION=	15.0.0
+APACHE_ARROW_VERSION=	16.0.0
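
The distinfo hunk above swaps the BLAKE2s, SHA512 and Size entries for the
new distfile. As a rough illustration of what those three lines assert, here
is a minimal Python sketch that parses distinfo-style lines and checks a
local file against them. The helper names parse_distinfo and verify are
hypothetical and not part of pkgsrc; pkgsrc's own tooling normally performs
this check. The sketch assumes it is given the distinfo file itself, not the
diff with its +/- prefixes.

```python
import hashlib
import re
from pathlib import Path

# Matches lines such as "SHA512 (name.tar.gz) = <hex>" or
# "Size (name.tar.gz) = 123 bytes"; the "$NetBSD$" header does not match.
LINE_RE = re.compile(r"^(\w+) \((.+?)\) = (.+)$")

# hashlib names for the digest algorithms that appear in the distinfo above.
HASHES = {"BLAKE2s": "blake2s", "SHA512": "sha512"}


def parse_distinfo(text):
    """Return {filename: {field: value}} for every checksum or size line."""
    entries = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            field, name, value = match.groups()
            entries.setdefault(name, {})[field] = value
    return entries


def verify(path, expected):
    """Return the list of distinfo fields that do not match the file at path."""
    data = Path(path).read_bytes()  # whole file in memory; fine for a sketch
    mismatched = []
    for field, value in expected.items():
        if field == "Size":
            actual = f"{len(data)} bytes"
        elif field in HASHES:
            actual = hashlib.new(HASHES[field], data).hexdigest()
        else:
            continue  # unknown field: skipped rather than guessed at
        if actual != value:
            mismatched.append(field)
    return mismatched
```

With a downloaded tarball at hand, something like
verify("apache-arrow-16.0.0.tar.gz", parse_distinfo(text)["apache-arrow-16.0.0.tar.gz"])
would return an empty list when all three fields agree.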


