pkgsrc-WIP-changes archive
apache-arrow: Update to 17.0.0
Module Name: pkgsrc-wip
Committed By: Matthew Danielson <matthewd%fastmail.us@localhost>
Pushed By: matthewd
Date: Mon Sep 16 05:08:27 2024 -0700
Changeset: 0bcd1546ad9798e40b42ffb3cd5c535f98108fb0
Modified Files:
apache-arrow/Makefile
apache-arrow/PLIST
apache-arrow/distinfo
apache-arrow/version.mk
Log Message:
apache-arrow: Update to 17.0.0
Also, enable more features by default
Changelog
Apache Arrow 17.0.0 (2024-07-16 07:00:00+00:00)
Bug Fixes
GH-15053 - [C++] Add option to string ‘center’ kernel to control left/right alignment on odd number of padding (#41449)
GH-30866 - [Java] fix SplitAndTransfer throws for (0,0) if vector empty (#41066)
GH-34484 - [Substrait] add an option to disable augmented fields (#41583)
GH-37669 - [C++][Python] Fix casting to extension type with fixed size list storage type (#42219)
GH-38553 - [C++] Replace null_count with MayHaveNulls in ListArrayFromArray and MapArray (#41957)
GH-38575 - [Python] Include metadata when creating pa.schema from PyCapsule (#41538)
GH-38770 - [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray (#40971)
GH-39129 - [Python] pa.array: add check for byte-swapped numpy arrays inside python objects (#41549)
GH-39489 - [C++][Parquet] Timestamp conversion from Parquet to Arrow does not follow compatibility guidelines for convertedType
GH-39645 - [Python] Fix read_table for encrypted parquet (#39438)
GH-40270 - [C++] Use LargeStringArray for casting when writing tables to CSV (#40271)
GH-40560 - [Python] RunEndEncodedArray.from_arrays: bugfix for Array arguments (#40560) (#41093)
GH-40750 - [C++][Python] Map child Array constructed from keys and items shouldn’t have offset (#40871)
GH-40913 - [C++] Fix compile warning with ‘implicitly-defined constructor does not initialize’ in encoding_benchmark (#41060)
GH-40997 - [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 (#40998)
GH-41112 - [C++] Clean up unused parameter warnings (#41111)
GH-41149 - [C++][Acero] Fix asof join race (#41614)
GH-41164 - [C#] Fix concatenation of sliced arrays (#41245)
GH-41190 - [C++] support for single threaded joins (#41125)
GH-41192 - [C++] Fix hashjoin benchmark failed at make utf8’s random batches (#41195)
GH-41198 - [C#] Fix concatenation of union arrays (#41226)
GH-41199 - [C#] Fix accessing values of a sliced decimal array (#41200)
GH-41258 - [C#][Integration] Fix comparison of sliced validity buffers with non-zero offsets (#41259)
GH-41263 - [C#][Integration] Ensure offset is considered in all branches of the bitmap comparison (#41264)
GH-41282 - [Dev] Always prompt next major version on merge script if it exists (#41305)
GH-41306 - [C++] Check to avoid copying when NullBitmapBuffer is Null (#41452)
GH-41317 - [C++] Fix crash on invalid Parquet file (#41366)
GH-41319 - [Python] `test_numpy_array_protocol` test failures with numpy 2.0.0rc1
GH-41321 - [C++][Parquet] More strict Parquet level checking (#41346)
GH-41329 - [C++][Gandiva] Fix gandiva cache size env var (#41330)
GH-41340 - [C++][CMake][Windows] Remove needless .dll suffix from link libraries (#41341)
GH-41343 - [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345)
GH-41356 - [Release][Docs] Update post release documentation task to remove the warnings banner for stable version (#41377)
GH-41367 - [C++] Replace [[maybe_unused]] with Arrow macro (#41359)
GH-41371 - [CI][Release] Use the latest Ruby on macOS (#41379)
GH-41390 - [CI] Use setup-python GitHub action on csharp macOS job (#41392)
GH-41397 - [C#] Downgrade macOS test runner to avoid infrastructure bug (#41934)
GH-41418 - [C++] Add [Large]ListView and Map nested types for scalar_if_else’s kernel functions (#41419)
GH-41426 - [R][CI] Install CRAN style openssl on gh runners. (#41629)
GH-41433 - [C++][Gandiva] Fix ascii_utf8 function to return same result on x86 and Arm (#41434)
GH-41464 - [Python] Fix StructArray.sort() for by=None (#41495)
GH-41467 - [CI][Release] Don’t push conda-verify-rc image (#41468)
GH-41470 - [C++] Reuse deduplication logic for direct registration (#41466)
GH-41471 - [Java] Fix performance uber-jar (#41473)
GH-41475 - [Python] Build with Python 3.13 (#42034)
GH-41478 - [C++] Clean up more redundant move warnings (#41487)
GH-41491 - [Python] remove special methods related to buffers in python <2.6 (#41492)
GH-41502 - [Python] Fix reading column index with decimal values (#41503)
GH-41529 - [C++][Compute] Remove redundant logic for ArrayData as ExecResults in ExecScalarCaseWhen (#41380)
GH-41534 - [Go] Fix mem leak importing 0 length C Array (#41535)
GH-41541 - [Go][Parquet] More fixes for writer performance regression (#42003)
GH-41541 - [Go][Parquet] Fix writer performance regression (#41638)
GH-41571 - [Java] Revert GH-41307 (#41309) (#41628)
GH-41573 - [Java] VectorSchemaRoot uses inefficient stream to copy fieldVectors (#41574)
GH-41581 - [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582)
GH-41587 - [Docs][Python] Remove duplicate contents (#41588)
GH-41602 - [C#] Resolve build warnings (#41645)
GH-41617 - [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622)
GH-41630 - [Benchmarking] Fix out-of-source build in benchmarks (#41631)
GH-41648 - [Java] Memory Leak about splitAndTransfer (#41898)
GH-41660 - [CI][Java] Restore devtoolset related GANDIVA_CXX_FLAGS (#41661)
GH-41679 - [Release][Packaging][deb] Update package name in 01-preparesh too (#41859)
GH-41684 - [C++][Python] Add optional null_bitmap to MapArray::FromArrays (#41757)
GH-41686 - [Java] Nullability of struct child vectors not preserved in TransferPair (#41785)
GH-41688 - [Dev] Include all relevant CMakeLists.txt files in cmake-format precommit hook (#41689)
GH-41697 - [Go][Parquet] Release BufferWriter when BufferedPageWriter is closed (#41698)
GH-41699 - [Python][Parquet] Implement to_dict method on SortingColumn (#41704)
GH-41711 - [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712)
GH-41717 - [Java][Vector] fix issue with ByteBuffer rewind in MessageSerializer (#41718)
GH-41720 - [C++][Acero] Remove an useless parameter for QueryContext::Init called in hash_join_benchmark (#41716)
GH-41725 - [Python] CMake: ignore Parquet encryption option if Parquet itself is not enabled (fix Java integration build) (#41776)
GH-41735 - [CI][Archery] Update archery to be compatible with pygit2 1.15 API change (#41739)
GH-41738 - [C++] Fix the issue that temp vector stack may be under sized (#41746)
GH-41741 - [C++] Check that extension metadata key is present before attempting to delete it (#41763)
GH-41758 - [Python] Disallow direct pa.RecordBatchReader() construction to avoid segfaults (#41773)
GH-41771 - [C++] Iterator releases its resource immediately when it reads all values (#41824)
GH-41780 - [C++][Flight][Benchmark] Ensure waiting server ready (#41793)
GH-41784 - [Packaging][RPM] Use SO version for -libs package name (#41838)
GH-41787 - Update fmpp-maven-plugin output directory (#41788)
GH-41791 - [CI][Conda] Update azure.linux.yml task, replace CondaEnvironment@1 with Bash@3 (#41883)
GH-41813 - [C++] Fix avx2 gather offset larger than 2GB in CompareColumnsToRows (#42188)
GH-41829 - [R] Update relative URLs in README to absolute paths to prevent CRAN check failures (#41830)
GH-41836 - [Java] Fix an undefined symbol error when ARROW_S3=OFF (#41837)
GH-41862 - [C++][S3] Fix potential deadlock when closing output stream (#41876)
GH-41884 - [Python] Fix RecordBatchReader.cast to support casting to equal schema for all types (#42098)
GH-41902 - [Java] Variadic Buffer Counts Incorrect (#41930)
GH-41903 - [CI][GLib] Use the latest Ruby to use OpenSSL 3 (#42001)
GH-41920 - [CI][JS] Add missing build directory argument (#41921)
GH-41924 - [Python] Fix tests when using NumPy 2.0 on Windows (#42099)
GH-41964 - [CI][C++] Clear cache for mamba on AppVeyor (#41977)
GH-42005 - [Java][Integration][CI] Fix ARROW_BUILD_ROOT Path to find pom.xml (#42008)
GH-42006 - [CI][Python] Use pip install -e instead of setup.py build_ext --inplace for installing pyarrow on verification script (#42007)
GH-42015 - [MATLAB] Executing tfeather.m test class causes MATLAB to crash on windows-2022 after MSVC update from 14.39.33519 to 14.40.33807 (#42123)
GH-42017 - [CI][Python][C++] Fix utf8proc detection for wheel on Windows (#42022)
GH-42039 - [Docs][Go] Fix broken link (#42040)
GH-42041 - [Swift] Fix nullable type decoder issue (#42043)
GH-42065 - [C++] Support list-views on list_slice (#42067)
GH-42104 - [C++] Fix an OTel test failure and remove needless logs (#42122)
GH-42107 - [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol (#42108)
GH-42116 - [C++] Support list-view typed arrays in array_take and array_filter (#42117)
GH-42130 - [GLib] Fix building gir files with MSVC (#42131)
GH-42136 - [CI][Go][Java][JS] Use AMD64-based macOS explicitly (#42175)
GH-42139 - [C++] Fix some potential uninitialized variable warnings (#42207)
GH-42140 - [C++] Avoid invalid accesses in parquet-encoding-benchmark (#42141)
GH-42149 - [C++] Use FetchContent for bundled ORC (#43011)
GH-42170 - [Python][CI] Update expected output for numpy 2.0.0 (#42172)
GH-42197 - [CI][Packaging][Java] Ensure updating “python@*” formulae on macOS (#42202)
GH-42198 - [C++] Fix GetRecordBatchPayload crashes for device data (#42199)
GH-42208 - [Java] Fix the Test in flight-sql-jdbc-driver Module (#42217)
GH-42213 - [Swift] Use “--warnings-as-errors” only on CI (#42214)
GH-42220 - [R] handle vctrs_rcrd extension type in metadata cleaning (#42226)
GH-42224 - [Java] Fix Typo in TestAceroSubstraitConsumer Test Method (#42225)
GH-42232 - [C++] Use non-stale c-ares download URL (#42250)
GH-42234 - [CI][R] Disable libarrow binary use on valgrind tests (#42249)
GH-43048 - [JAVA] Fix IndexOutOfBoundsException message by reporting index correctly (#43049)
GH-43058 - [C#] Revert upgrade of Xunit from 2.8.0 to 2.8.1 (#43074)
GH-43059 - [CI][Gandiva] Disable Python Gandiva tests on AlmaLinux 8 (#43093)
GH-43062 - [Go] Use calloc instead of malloc (#43052)
GH-43070 - [C++][Parquet] Check for valid ciphertext length to prevent segfault (#43071)
GH-43116 - [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as large memory test (#43128)
GH-43119 - [CI][Packaging] Update manylinux 2014 CentOS repos that have been deprecated (#43121)
GH-43122 - [CI][Packaging][RPM][CentOS] Use vault.centos.org for SCL (#43127)
GH-43134 - [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136)
GH-43158 - [Packaging] Use bundled nlohmann/json on AlmaLinux 8/CentOS Stream 8 (#43159)
GH-43199 - [CI][Packaging] dev/release/utils-create-release-tarball.sh should not include the release candidate number in the name of the tarball’s top-level directory. (#43200)
GH-43204 - [CI][Packaging] Apply vcpkg patch to fix Thrift version (#43208)
New Features and Improvements
GH-29537 - [R] Support mutate/summarize with implicit join (#41350)
GH-33484 - [C++][Compute] Implement Grouper::Reset (#41352)
GH-35804 - [CI][Packaging][Conan] Synchronize upstream conan (#39729)
GH-35888 - [Java] Add FlightStatusCode.RESOURCE_EXHAUSTED (#41508)
GH-37333 - [Python] Replace pandas.util.testing.rands with vendored version (#42089)
GH-37720 - [Go][FlightSQL] Add prepared statement handle to DoPut result (#40311)
GH-37728 - [Java] Add methods to get an Iterable for a ValueVector (#41895)
GH-37929 - [Python] begin moving static settings to pyproject.toml (#41041)
GH-37938 - [Swift] Add initial C data interface implementation (#41342)
GH-38255 - [Go][C++] Implement Flight SQL Bulk Ingestion (#38385)
GH-38325 - [Python] Implement PyCapsule interface for Device data in PyArrow (#40717)
GH-38325 - [Python] Expand the Arrow PyCapsule Interface with C Device Data support (#40708)
GH-38692 - [C#] Implement ICollection<T?> on scalar arrays (#41539)
GH-39204 - [Format][FlightRPC][Docs] Stabilize Flight SQL (#41657)
GH-39220 - [Python] Let RecordBatch.filter accept a boolean expression in addition to mask array (#43043)
GH-39301 - [Archery][CI][Integration] Add nanoarrow to archery + integration setup (#39302)
GH-39344 - [C++][FS][Azure] Support azure cli auth (#41976)
GH-39345 - [C++][FS][Azure] Add support for environment credential (#41715)
GH-39649 - [Java][CI] Fix or suppress spurious errorprone warnings stage 2 (#39777)
GH-39722 - [JS] Clean up packaging (#39723)
GH-39798 - [C++] Optimize Take for fixed-size types including nested fixed-size lists (#41297)
GH-39858 - [C++][Device] Add Copy/View slice functions to a CPU pointer (#41477)
GH-39898 - [C++] Add support for OpenTelemetry logging (#39905)
GH-39990 - [Docs][CI] Add sphinx-lint for docs linting (#40022)
GH-40078 - [C++] Import/Export ArrowDeviceArrayStream (#40807)
GH-40339 - [Java] StringView Initial Implementation (#40340)
GH-40342 - [Python] Fix pickling of LocalFileSystem for cython 2 (#41459)
GH-40342 - [C++] move LocalFileSystem to the registry (#40356)
GH-40361 - [C++] Make flatbuffers serialization more deterministic (#40392)
GH-40384 - [Python] Expand the C Device Interface bindings to support import on CUDA device (#40385)
GH-40494 - [Go] add support for protobuf messages (#40496)
GH-40644 - [Python] Allow passing a mapping of column names to rename_columns (#40645)
GH-40734 - [Packaging][Debian] Drop support for Debian bullseye (#41394)
GH-40749 - [Python][Packaging] Strip unnecessary symbols when building wheels (#42028)
GH-40819 - [Java] Adding Spotless to Algorithm module (#41825)
GH-40820 - [Java] Adding Spotless to Adapter module (#42048)
GH-40822 - [Java] Adding Spotless to C module (#42059)
GH-40823 - [Java] Adding Spotless to Compression module (#42060)
GH-40824 - [Java] Adding Spotless to Dataset module (#42062)
GH-40825 - [Java] Adding Spotless to Flight module (#42063)
GH-40826 - [Java] Adding Spotless to Format module
GH-40827 - [Java] Adding Spotless to Gandiva module (#42055)
GH-40828 - [Java] Format arrow-maven-plugins modules (#42054)
GH-40829 - [Java] Adding Spotless to Memory modules (#42056)
GH-40830 - [Java] Adding Spotless to Performance module (#42057)
GH-40831 - [Java] Adding Spotless to Tools module (#42058)
GH-40832 - [Java] Adding Spotless to Vector module (#42061)
GH-40930 - [Java] Implement a function to retrieve reference buffers in StringView (#41796)
GH-40932 - [Java] Implement TransferPair functionality for StringView (#41861)
GH-40933 - [Java] Enhance the copyFrom* functionality in StringView (#41752)
GH-40942 - [Java] Implement C Data Interface for StringView (#41967)
GH-40943 - [Java] Implement RangeEqualsVisitor for StringView (#41636)
GH-40944 - [Java] Implement TypeEqualsVisitor for StringView (#41606)
GH-40968 - [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like function (#40970)
GH-41020 - [C++] Introduce portable compiler assumptions (#41021)
GH-41035 - [C++] Add a grouper benchmark for preventing performance regression (#41036)
GH-41055 - [C++] Support flatten for combining nested list related types (#41092)
GH-41085 - [CI][Java] Add Spark integration tests to “java” group in Crossbow tasks (#41086)
GH-41089 - [C++] Clean up remaining tasks related to half float casts (#41084)
GH-41095 - [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support (#41276)
GH-41102 - [Packaging][Release] Create unique git tags for release candidates (e.g. apache-arrow-{MAJOR}.{MINOR}.{PATCH}-rc{RC_NUM}) (#41131)
GH-41105 - [Python][Docs] Update PyArrow installation docs for conda package split (#41135)
GH-41114 - [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
GH-41116 - [C++] IO: enhance boundary checking in CompressedInputStream (#41117)
GH-41126 - [Python] Basic bindings for Device and MemoryManager classes (#41685)
GH-41134 - [GLib] Support building arrow-glib with MSVC (#41599)
GH-41159 - [Go][Parquet] Improvement Parquet BitWriter WriteVlqInt Performance (#41160)
GH-41173 - [Java] Add spotless configuration for Maven pom.xml files (#41174)
GH-41183 - [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295)
GH-41186 - [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst (#41187)
GH-41203 - [Python][Packaging] Ensure to build with released numpy 2.0 (instead of RC) in the wheel building workflows (#42194)
GH-41240 - [Release][Packaging] Use Debian bookworm for uploading binaries (#41241)
GH-41243 - [Release][Packaging] Avoid needless download by “archery crossbow download-artifacts” (#41244)
GH-41256 - [Format][Docs] Add a canonical extension type specification for JSON (#41257)
GH-41262 - [Java][FlightSQL] Implement stateless prepared statements (#41237)
GH-41287 - [Java] ListViewVector Implementation (#41285)
GH-41298 - [Format][Docs] Add a canonical extension type specification for UUID (#41299)
GH-41301 - [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type (#41373)
GH-41307 - [Java] Use org.apache:apache parent pom version 31 (#41772)
GH-41307 - [Java] Use org.apache:apache parent pom version 31 (#41309)
GH-41314 - [CI][Python] Add a job on ARM64 macOS (#41313)
GH-41316 - [CI][Python] Reduce CI time on macOS (#41378)
GH-41323 - [R] Redo how summarize() evaluates expressions (#41223)
GH-41327 - [Ruby] Show type name in Arrow::Table#to_s (#41328)
GH-41334 - [C++][Acero] Use per-node basis temp vector stack to mitigate overflow (#41335)
GH-41349 - [C#] Optimize DecimalUtility.GetBytes(SqlDecimal) on .NET 7+ (#42150)
GH-41358 - [R] Support join “na_matches” argument (#41372)
GH-41361 - [C++][Parquet] Optimize DelimitRecords by batch execution when max_rep_level > 1 (#41362)
GH-41375 - [C#] Move to .NET 8.0 (#41376)
GH-41385 - [CI][MATLAB][Packaging] Add support for MATLAB R2024a in CI and crossbow packaging workflows (#41504)
GH-41389 - [Python] Expose byte_width and bit_width of ExtensionType in terms of the storage type (#41413)
GH-41400 - [MATLAB] Bump libmexclass version to commit ca3cea6 (#41436)
GH-41410 - [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API reference (#41411)
GH-41420 - [R] Update NEWS.md for 16.1.0 (#41422)
GH-41427 - [Go] Fix stateless prepared statements (#41428)
GH-41430 - [Docs] Use sphinxcontrib-mermaid instead of generating images from .mmd (#41455)
GH-41435 - [CI][MATLAB] Add job to build and test MATLAB Interface on macos-14 (#41592)
GH-41450 - [R][CI] rhub/container follow ons (#41451)
GH-41460 - [C++] Use ASAN to poison temp vector stack memory (#41695)
GH-41480 - [Python] Update Python development guide about components being enabled by default based on Arrow C++ (#41705)
GH-41480 - [Python] Building PyArrow: enable/disable python components by default based on availability in Arrow C++ (#41494)
GH-41493 - [C++][S3] Add a new option to check existence before CreateDir (#41822)
GH-41507 - [MATLAB][CI] Pass strict: true to matlab-actions/run-tests@v2 (#41530)
GH-41527 - [CI][Dev] Remove unnecessary requirements for six (#43087)
GH-41531 - [MATLAB][Packaging] Bump matlab-actions/setup-matlab and matlab-actions/run-command from v1 to v2 in the crossbow job (#41532)
GH-41540 - [R] Simplify arrow_eval() logic and bindings environments (#41537)
GH-41545 - [C++][Parquet] Fix DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
GH-41547 - [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
GH-41558 - [C++] Improve fixed_width_test_util.h (#41575)
GH-41560 - [C++] ChunkResolver: Implement ResolveMany and add unit tests (#41561)
GH-41590 - [Java] Improve BaseRepeatedValueVector function on isEmpty and isNull operations (#41601)
GH-41596 - [C++] fixed_width_internal.h: Simplify docstring and support bit-sized types (BOOL) (#41597)
GH-41608 - [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633)
GH-41611 - [Docs][CI] Enable most sphinx-lint rules for documentation (#41612)
GH-41620 - [Docs] Document merge.conf usage (#41621)
GH-41626 - [R][CI] Update OpenSUSE to 15.5 from 15.3 (#41627)
GH-41652 - [C++][CMake][Windows] Don’t build needless object libraries (#41658)
GH-41653 - [MATLAB] Add new arrow.c.Array MATLAB class which wraps a C Data Interface format ArrowArray C struct (#41655)
GH-41654 - [MATLAB] Add new arrow.c.Schema MATLAB class which wraps a C Data Interface format ArrowSchema C struct (#41674)
GH-41656 - [MATLAB] Add C Data Interface format import/export functionality for arrow.array.Array (#41737)
GH-41662 - [Python] Ensure Buffer methods don’t crash with non-CPU data (#41889)
GH-41664 - [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010)
GH-41675 - [Packaging][MATLAB] Add crossbow job to package MATLAB interface on macos-14 (#41677)
GH-41681 - [GLib] Generate separate version macros for each GLib library (#41721)
GH-41691 - [Doc] Remove notion of “logical type” (#41958)
GH-41702 - [C++][Parquet] Thrift: generate template method to accelerate reading thrift (#41703)
GH-41726 - [C++][Parquet] Minor: moving EncodedStats by default rather than copying (#41727)
GH-41730 - [Java] Adding variadicBufferCounts to RecordBatch (#41732)
GH-41748 - [Python][Parquet] Update BYTE_STREAM_SPLIT description in write_table() docstring (#41759)
GH-41749 - [GLib] Allow getting a RecordBatchReader from a Dataset or Scanner (#41750)
GH-41755 - [C++][ORC] Ensure setting detected ORC version (#41767)
GH-41760 - [C++][Parquet] Add file metadata read/write benchmark (#41761)
GH-41770 - [CI][GLib] Remove temporary files explicitly (#41807)
GH-41783 - [C++] Make git-dependent definitions internal (#41781)
GH-41789 - [Java] Clean up immutables and checkerframework dependencies (#41790)
GH-41797 - [C++][S3] Remove GetBucketRegion hack for newer AWS SDK versions (#41798)
GH-41799 - [Java] Migrate to com.gradle:develocity-maven-extension (#41800)
GH-41803 - [MATLAB] Add C Data Interface format import/export functionality for arrow.tabular.RecordBatch (#41817)
GH-41804 - [Swift] Add Struct (Nested) type (#43082)
GH-41806 - [GLib][CI] Use vcpkg for C++ dependencies when building GLib libraries with MSVC (#41839)
GH-41818 - [C++][Parquet] normalize dictionary encoding to use RLE_DICTIONARY (#41819)
GH-41834 - [R] Better error handling in dplyr code (#41576)
GH-41841 - [R][CI] Remove more defunct rhub containers (#41828)
GH-41887 - [Go] Run linter via pre-commit (#41888)
GH-41899 - [C++] IPC: Minor enhance the code of writer (#41900)
GH-41905 - [JS] Update dependencies (#41906)
GH-41910 - [Python] Add support for Pyodide (#37822)
GH-41923 - [C++] Fix ExecuteScalar deduce all_scalar with chunked_array (#41925)
GH-41929 - [Java] pom.xml license formatting (#42049)
GH-41945 - [Swift] Add interface ArrowArrayHolderBuilder (#41946)
GH-41947 - [Java] Support catalog in JDBC driver with session options (#42035)
GH-41952 - [R] Turn S3 and ZSTD on by default for macOS (#42210)
GH-41953 - [C++] Minor enhance code style for FixedShapeTensorType (#41954)
GH-41955 - [C++] Follow up of adding null_bitmap to MapArray::FromArrays (#41956)
GH-41960 - Expose new S3 option check_directory_existence_before_creation (#41972)
GH-41968 - [Java] Implement TransferPair functionality for BinaryView (#41980)
GH-41970 - [C++] Misc changes making code around list-like types and list-view types behave the same way (#41971)
GH-41978 - [Python] Fix pandas tests to follow downstream datetime64 unit changes (#41979)
GH-41983 - [Dev] Run issue labeling bot only when opening an issue (not editing) (#41986)
GH-41994 - [C++] kernel.cc: Remove defaults on switch so that compiler can check full enum coverage for us (#41995)
GH-41999 - [Swift] Add methods for adding array and vargs to arrow array (#42000)
GH-42002 - [Java] Update Unit Tests for Vector Module (#42019)
GH-42013 - [Python] Allow Array.filter() to take general array input (#42051)
GH-42016 - [Python] Expose new FLOAT16 logical type in the pyarrow.parquet bindings (#42103)
GH-42020 - [Swift] Add Arrow decoding implementation for Swift Codable (#42023)
GH-42021 - [Swift] Add Arrow encoder implementation for Swift Codable (#43063)
GH-42025 - [Java] Update Unit Tests for Algorithm Module (#42029)
GH-42030 - [Java] Update Unit Tests for Adapter Module (#42038)
GH-42042 - [Java] Update Unit Tests for Compressions Module (#42044)
GH-42045 - [Java] Update Unit Tests for Flight Module (#42158)
GH-42087 - [Swift] refactored to remove build warnings (#42088)
GH-42092 - [Java] Update Unit Tests for Tools Module (#42093)
GH-42100 - [C++][Parquet] ParquetFilePrinter::JSONPrint print length of FLBA (#41981)
GH-42101 - [Java] Create File for Output Validation in FileRoundtrip (#42115)
GH-42109 - [C++][CMake] Add preset for Valgrind (#42110)
GH-42112 - [Python] Array gracefully fails on non-cpu device (#42113)
GH-42121 - [Java] Cleanup spotless plugin configuration (#43019)
GH-42124 - [Swift] Add methods for loading and validating builder by type (#42195)
GH-42126 - [C++] Move TakeXXX free functions into TakeMetaFunction and make them private (#42127)
GH-42128 - [Packaging][CentOS] Migrate CentOS 7 and CentOS Stream 8 packaging jobs to use vault.centos.org (#42129)
GH-42134 - [C++][FS][Azure] Validate AzureOptions::{blob,dfs}_storage_scheme (#42135)
GH-42143 - [R] Sanitize R metadata (#41969)
GH-42146 - [MATLAB] Add IPC RecordBatchFileReader and RecordBatchFileWriter MATLAB classes (#42201)
GH-42162 - [Java] Update Unit Tests for Dataset Module (#42163)
GH-42164 - [Java] Update Unit Tests for Gandiva Module (#42166)
GH-42165 - [Java] Update Unit Tests for Memory Module (#42161)
GH-42167 - [CI] Upgrade the version of vcpkg in .env (#42171)
GH-42168 - [Python][Parquet] Pyarrow store decimal as integer (#42169)
GH-42190 - [Python] Add CI job for Numpy 1.X (#42189)
GH-42193 - [Java] Update dependency to maintain JUnit 5 only (#42206)
GH-42228 - [CI][Java] Suppress transfer progress log in java-jars (#42230)
GH-42235 - [C++] list_parent_indices: Add support for list-view types (#42236)
GH-42243 - [Swift] Update isValidBuilderType to not required instance of type (#42244)
GH-42245 - [Swift] Ensure map behavior is the same for all key types (#42246)
GH-43020 - [Java] Simplify flight.properties generation (#43028)
GH-43033 - [CI][Docker] Enable linter for python-wheel-windows-test-vs2019 (#43034)
GH-43040 - [C++] Reduce the recursion of many-join test (#43042)
GH-43045 - [CI][Python] Pin openjdk=17 in python substrait integration (#43051)
GH-43060 - [C++] Limit buffer size in BufferedInputStream::SetBufferSize with raw_read_bound (#43064)
GH-43076 - [C#] Upgrade Xunit and change how Python integration tests are skipped (#43091)
commit 323aba41c4c77db60ba0dff2cabbead08a3c0298
Author: Matthew Danielson <matthewd%fastmail.us@localhost>
Date: Mon Sep 16 05:00:59 2024 -0700
apache-arrow: Update to 17.0.0
Also, enable more features by default
Changelog
Apache Arrow 17.0.0 (2024-07-16 07:00:00+00:00)
Bug Fixes
GH-15053 - [C++] Add option to string ‘center’ kernel to control left/right alignment on odd number of padding (#41449)
GH-30866 - [Java] fix SplitAndTransfer throws for (0,0) if vector empty (#41066)
GH-34484 - [Substrait] add an option to disable augmented fields (#41583)
GH-37669 - [C++][Python] Fix casting to extension type with fixed size list storage type (#42219)
GH-38553 - [C++] Replace null_count with MayHaveNulls in ListArrayFromArray and MapArray (#41957)
GH-38575 - [Python] Include metadata when creating pa.schema from PyCapsule (#41538)
GH-38770 - [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray (#40971)
GH-39129 - [Python] pa.array: add check for byte-swapped numpy arrays inside python objects (#41549)
GH-39489 - [C++][Parquet] Timestamp conversion from Parquet to Arrow does not follow compatibility guidelines for convertedType
GH-39645 - [Python] Fix read_table for encrypted parquet (#39438)
GH-40270 - [C++] Use LargeStringArray for casting when writing tables to CSV (#40271)
GH-40560 - [Python] RunEndEncodedArray.from_arrays: bugfix for Array arguments (#40560) (#41093)
GH-40750 - [C++][Python] Map child Array constructed from keys and items shouldn’t have offset (#40871)
GH-40913 - [C++] Fix compile warning with ‘implicitly-defined constructor does not initialize’ in encoding_benchmark (#41060)
GH-40997 - [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 (#40998)
GH-41112 - [C++] Clean up unused parameter warnings (#41111)
GH-41149 - [C++][Acero] Fix asof join race (#41614)
GH-41164 - [C#] Fix concatenation of sliced arrays (#41245)
GH-41190 - [C++] support for single threaded joins (#41125)
GH-41192 - [C++] Fix hashjoin benchmark failed at make utf8’s random batches (#41195)
GH-41198 - [C#] Fix concatenation of union arrays (#41226)
GH-41199 - [C#] Fix accessing values of a sliced decimal array (#41200)
GH-41258 - [C#][Integration] Fix comparison of sliced validity buffers with non-zero offsets (#41259)
GH-41263 - [C#][Integration] Ensure offset is considered in all branches of the bitmap comparison (#41264)
GH-41282 - [Dev] Always prompt next major version on merge script if it exists (#41305)
GH-41306 - [C++] Check to avoid copying when NullBitmapBuffer is Null (#41452)
GH-41317 - [C++] Fix crash on invalid Parquet file (#41366)
GH-41319 - [Python] `test_numpy_array_protocol` test failures with numpy 2.0.0rc1
GH-41321 - [C++][Parquet] More strict Parquet level checking (#41346)
GH-41329 - [C++][Gandiva] Fix gandiva cache size env var (#41330)
GH-41340 - [C++][CMake][Windows] Remove needless .dll suffix from link libraries (#41341)
GH-41343 - [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345)
GH-41356 - [Release][Docs] Update post release documentation task to remove the warnings banner for stable version (#41377)
GH-41367 - [C++] Replace [[maybe_unused]] with Arrow macro (#41359)
GH-41371 - [CI][Release] Use the latest Ruby on macOS (#41379)
GH-41390 - [CI] Use setup-python GitHub action on csharp macOS job (#41392)
GH-41397 - [C#] Downgrade macOS test runner to avoid infrastructure bug (#41934)
GH-41418 - [C++] Add [Large]ListView and Map nested types for scalar_if_else’s kernel functions (#41419)
GH-41426 - [R][CI] Install CRAN style openssl on gh runners. (#41629)
GH-41433 - [C++][Gandiva] Fix ascii_utf8 function to return same result on x86 and Arm (#41434)
GH-41464 - [Python] Fix StructArray.sort() for by=None (#41495)
GH-41467 - [CI][Release] Don’t push conda-verify-rc image (#41468)
GH-41470 - [C++] Reuse deduplication logic for direct registration (#41466)
GH-41471 - [Java] Fix performance uber-jar (#41473)
GH-41475 - [Python] Build with Python 3.13 (#42034)
GH-41478 - [C++] Clean up more redundant move warnings (#41487)
GH-41491 - [Python] remove special methods related to buffers in python <2.6 (#41492)
GH-41502 - [Python] Fix reading column index with decimal values (#41503)
GH-41529 - [C++][Compute] Remove redundant logic for ArrayData as ExecResults in ExecScalarCaseWhen (#41380)
GH-41534 - [Go] Fix mem leak importing 0 length C Array (#41535)
GH-41541 - [Go][Parquet] More fixes for writer performance regression (#42003)
GH-41541 - [Go][Parquet] Fix writer performance regression (#41638)
GH-41571 - [Java] Revert GH-41307 (#41309) (#41628)
GH-41573 - [Java] VectorSchemaRoot uses inefficient stream to copy fieldVectors (#41574)
GH-41581 - [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582)
GH-41587 - [Docs][Python] Remove duplicate contents (#41588)
GH-41602 - [C#] Resolve build warnings (#41645)
GH-41617 - [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622)
GH-41630 - [Benchmarking] Fix out-of-source build in benchmarks (#41631)
GH-41648 - [Java] Memory Leak about splitAndTransfer (#41898)
GH-41660 - [CI][Java] Restore devtoolset related GANDIVA_CXX_FLAGS (#41661)
GH-41679 - [Release][Packaging][deb] Update package name in 01-preparesh too (#41859)
GH-41684 - [C++][Python] Add optional null_bitmap to MapArray::FromArrays (#41757)
GH-41686 - [Java] Nullability of struct child vectors not preserved in TransferPair (#41785)
GH-41688 - [Dev] Include all relevant CMakeLists.txt files in cmake-format precommit hook (#41689)
GH-41697 - [Go][Parquet] Release BufferWriter when BufferedPageWriter is closed (#41698)
GH-41699 - [Python][Parquet] Implement to_dict method on SortingColumn (#41704)
GH-41711 - [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712)
GH-41717 - [Java][Vector] fix issue with ByteBuffer rewind in MessageSerializer (#41718)
GH-41720 - [C++][Acero] Remove an useless parameter for QueryContext::Init called in hash_join_benchmark (#41716)
GH-41725 - [Python] CMake: ignore Parquet encryption option if Parquet itself is not enabled (fix Java integration build) (#41776)
GH-41735 - [CI][Archery] Update archery to be compatible with pygit2 1.15 API change (#41739)
GH-41738 - [C++] Fix the issue that temp vector stack may be under sized (#41746)
GH-41741 - [C++] Check that extension metadata key is present before attempting to delete it (#41763)
GH-41758 - [Python] Disallow direct pa.RecordBatchReader() construction to avoid segfaults (#41773)
GH-41771 - [C++] Iterator releases its resource immediately when it reads all values (#41824)
GH-41780 - [C++][Flight][Benchmark] Ensure waiting server ready (#41793)
GH-41784 - [Packaging][RPM] Use SO version for -libs package name (#41838)
GH-41787 - Update fmpp-maven-plugin output directory (#41788)
GH-41791 - [CI][Conda] Update azure.linux.yml task, replace CondaEnvironment@1 with Bash@3 (#41883)
GH-41813 - [C++] Fix avx2 gather offset larger than 2GB in CompareColumnsToRows (#42188)
GH-41829 - [R] Update relative URLs in README to absolute paths to prevent CRAN check failures (#41830)
GH-41836 - [Java] Fix an undefined symbol error when ARROW_S3=OFF (#41837)
GH-41862 - [C++][S3] Fix potential deadlock when closing output stream (#41876)
GH-41884 - [Python] Fix RecordBatchReader.cast to support casting to equal schema for all types (#42098)
GH-41902 - [Java] Variadic Buffer Counts Incorrect (#41930)
GH-41903 - [CI][GLib] Use the latest Ruby to use OpenSSL 3 (#42001)
GH-41920 - [CI][JS] Add missing build directory argument (#41921)
GH-41924 - [Python] Fix tests when using NumPy 2.0 on Windows (#42099)
GH-41964 - [CI][C++] Clear cache for mamba on AppVeyor (#41977)
GH-42005 - [Java][Integration][CI] Fix ARROW_BUILD_ROOT Path to find pom.xml (#42008)
GH-42006 - [CI][Python] Use pip install -e instead of setup.py build_ext --inplace for installing pyarrow on verification script (#42007)
GH-42015 - [MATLAB] Executing tfeather.m test class causes MATLAB to crash on windows-2022 after MSVC update from 14.39.33519 to 14.40.33807 (#42123)
GH-42017 - [CI][Python][C++] Fix utf8proc detection for wheel on Windows (#42022)
GH-42039 - [Docs][Go] Fix broken link (#42040)
GH-42041 - [Swift] Fix nullable type decoder issue (#42043)
GH-42065 - [C++] Support list-views on list_slice (#42067)
GH-42104 - [C++] Fix an OTel test failure and remove needless logs (#42122)
GH-42107 - [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol (#42108)
GH-42116 - [C++] Support list-view typed arrays in array_take and array_filter (#42117)
GH-42130 - [GLib] Fix building gir files with MSVC (#42131)
GH-42136 - [CI][Go][Java][JS] Use AMD64-based macOS explicitly (#42175)
GH-42139 - [C++] Fix some potential uninitialized variable warnings (#42207)
GH-42140 - [C++] Avoid invalid accesses in parquet-encoding-benchmark (#42141)
GH-42149 - [C++] Use FetchContent for bundled ORC (#43011)
GH-42170 - [Python][CI] Update expected output for numpy 2.0.0 (#42172)
GH-42197 - [CI][Packaging][Java] Ensure updating “python@*” formulae on macOS (#42202)
GH-42198 - [C++] Fix GetRecordBatchPayload crashes for device data (#42199)
GH-42208 - [Java] Fix the Test in flight-sql-jdbc-driver Module (#42217)
GH-42213 - [Swift] Use “--warnings-as-errors” only on CI (#42214)
GH-42220 - [R] handle vctrs_rcrd extension type in metadata cleaning (#42226)
GH-42224 - [Java] Fix Typo in TestAceroSubstraitConsumer Test Method (#42225)
GH-42232 - [C++] Use non-stale c-ares download URL (#42250)
GH-42234 - [CI][R] Disable libarrow binary use on valgrind tests (#42249)
GH-43048 - [JAVA] Fix IndexOutOfBoundsException message by reporting index correctly (#43049)
GH-43058 - [C#] Revert upgrade of Xunit from 2.8.0 to 2.8.1 (#43074)
GH-43059 - [CI][Gandiva] Disable Python Gandiva tests on AlmaLinux 8 (#43093)
GH-43062 - [Go] Use calloc instead of malloc (#43052)
GH-43070 - [C++][Parquet] Check for valid ciphertext length to prevent segfault (#43071)
GH-43116 - [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as large memory test (#43128)
GH-43119 - [CI][Packaging] Update manylinux 2014 CentOS repos that have been deprecated (#43121)
GH-43122 - [CI][Packaging][RPM][CentOS] Use vault.centos.org for SCL (#43127)
GH-43134 - [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136)
GH-43158 - [Packaging] Use bundled nlohmann/json on AlmaLinux 8/CentOS Stream 8 (#43159)
GH-43199 - [CI][Packaging] dev/release/utils-create-release-tarball.sh should not include the release candidate number in the name of the tarball’s top-level directory. (#43200)
GH-43204 - [CI][Packaging] Apply vcpkg patch to fix Thrift version (#43208)
New Features and Improvements
GH-29537 - [R] Support mutate/summarize with implicit join (#41350)
GH-33484 - [C++][Compute] Implement Grouper::Reset (#41352)
GH-35804 - [CI][Packaging][Conan] Synchronize upstream conan (#39729)
GH-35888 - [Java] Add FlightStatusCode.RESOURCE_EXHAUSTED (#41508)
GH-37333 - [Python] Replace pandas.util.testing.rands with vendored version (#42089)
GH-37720 - [Go][FlightSQL] Add prepared statement handle to DoPut result (#40311)
GH-37728 - [Java] Add methods to get an Iterable for a ValueVector (#41895)
GH-37929 - [Python] begin moving static settings to pyproject.toml (#41041)
GH-37938 - [Swift] Add initial C data interface implementation (#41342)
GH-38255 - [Go][C++] Implement Flight SQL Bulk Ingestion (#38385)
GH-38325 - [Python] Implement PyCapsule interface for Device data in PyArrow (#40717)
GH-38325 - [Python] Expand the Arrow PyCapsule Interface with C Device Data support (#40708)
GH-38692 - [C#] Implement ICollection<T?> on scalar arrays (#41539)
GH-39204 - [Format][FlightRPC][Docs] Stabilize Flight SQL (#41657)
GH-39220 - [Python] Let RecordBatch.filter accept a boolean expression in addition to mask array (#43043)
GH-39301 - [Archery][CI][Integration] Add nanoarrow to archery + integration setup (#39302)
GH-39344 - [C++][FS][Azure] Support azure cli auth (#41976)
GH-39345 - [C++][FS][Azure] Add support for environment credential (#41715)
GH-39649 - [Java][CI] Fix or suppress spurious errorprone warnings stage 2 (#39777)
GH-39722 - [JS] Clean up packaging (#39723)
GH-39798 - [C++] Optimize Take for fixed-size types including nested fixed-size lists (#41297)
GH-39858 - [C++][Device] Add Copy/View slice functions to a CPU pointer (#41477)
GH-39898 - [C++] Add support for OpenTelemetry logging (#39905)
GH-39990 - [Docs][CI] Add sphinx-lint for docs linting (#40022)
GH-40078 - [C++] Import/Export ArrowDeviceArrayStream (#40807)
GH-40339 - [Java] StringView Initial Implementation (#40340)
GH-40342 - [Python] Fix pickling of LocalFileSystem for cython 2 (#41459)
GH-40342 - [C++] move LocalFileSystem to the registry (#40356)
GH-40361 - [C++] Make flatbuffers serialization more deterministic (#40392)
GH-40384 - [Python] Expand the C Device Interface bindings to support import on CUDA device (#40385)
GH-40494 - [Go] add support for protobuf messages (#40496)
GH-40644 - [Python] Allow passing a mapping of column names to rename_columns (#40645)
GH-40734 - [Packaging][Debian] Drop support for Debian bullseye (#41394)
GH-40749 - [Python][Packaging] Strip unnecessary symbols when building wheels (#42028)
GH-40819 - [Java] Adding Spotless to Algorithm module (#41825)
GH-40820 - [Java] Adding Spotless to Adapter module (#42048)
GH-40822 - [Java] Adding Spotless to C module (#42059)
GH-40823 - [Java] Adding Spotless to Compression module (#42060)
GH-40824 - [Java] Adding Spotless to Dataset module (#42062)
GH-40825 - [Java] Adding Spotless to Flight module (#42063)
GH-40826 - [Java] Adding Spotless to Format module
GH-40827 - [Java] Adding Spotless to Gandiva module (#42055)
GH-40828 - [Java] Format arrow-maven-plugins modules (#42054)
GH-40829 - [Java] Adding Spotless to Memory modules (#42056)
GH-40830 - [Java] Adding Spotless to Performance module (#42057)
GH-40831 - [Java] Adding Spotless to Tools module (#42058)
GH-40832 - [Java] Adding Spotless to Vector module (#42061)
GH-40930 - [Java] Implement a function to retrieve reference buffers in StringView (#41796)
GH-40932 - [Java] Implement TransferPair functionality for StringView (#41861)
GH-40933 - [Java] Enhance the copyFrom* functionality in StringView (#41752)
GH-40942 - [Java] Implement C Data Interface for StringView (#41967)
GH-40943 - [Java] Implement RangeEqualsVisitor for StringView (#41636)
GH-40944 - [Java] Implement TypeEqualsVisitor for StringView (#41606)
GH-40968 - [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like function (#40970)
GH-41020 - [C++] Introduce portable compiler assumptions (#41021)
GH-41035 - [C++] Add a grouper benchmark for preventing performance regression (#41036)
GH-41055 - [C++] Support flatten for combining nested list related types (#41092)
GH-41085 - [CI][Java] Add Spark integration tests to “java” group in Crossbow tasks (#41086)
GH-41089 - [C++] Clean up remaining tasks related to half float casts (#41084)
GH-41095 - [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support (#41276)
GH-41102 - [Packaging][Release] Create unique git tags for release candidates (e.g. apache-arrow-{MAJOR}.{MINOR}.{PATCH}-rc{RC_NUM}) (#41131)
GH-41105 - [Python][Docs] Update PyArrow installation docs for conda package split (#41135)
GH-41114 - [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
GH-41116 - [C++] IO: enhance boundary checking in CompressedInputStream (#41117)
GH-41126 - [Python] Basic bindings for Device and MemoryManager classes (#41685)
GH-41134 - [GLib] Support building arrow-glib with MSVC (#41599)
GH-41159 - [Go][Parquet] Improvement Parquet BitWriter WriteVlqInt Performance (#41160)
GH-41173 - [Java] Add spotless configuration for Maven pom.xml files (#41174)
GH-41183 - [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295)
GH-41186 - [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst (#41187)
GH-41203 - [Python][Packaging] Ensure to build with released numpy 2.0 (instead of RC) in the wheel building workflows (#42194)
GH-41240 - [Release][Packaging] Use Debian bookworm for uploading binaries (#41241)
GH-41243 - [Release][Packaging] Avoid needless download by “archery crossbow download-artifacts” (#41244)
GH-41256 - [Format][Docs] Add a canonical extension type specification for JSON (#41257)
GH-41262 - [Java][FlightSQL] Implement stateless prepared statements (#41237)
GH-41287 - [Java] ListViewVector Implementation (#41285)
GH-41298 - [Format][Docs] Add a canonical extension type specification for UUID (#41299)
GH-41301 - [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type (#41373)
GH-41307 - [Java] Use org.apache:apache parent pom version 31 (#41772)
GH-41307 - [Java] Use org.apache:apache parent pom version 31 (#41309)
GH-41314 - [CI][Python] Add a job on ARM64 macOS (#41313)
GH-41316 - [CI][Python] Reduce CI time on macOS (#41378)
GH-41323 - [R] Redo how summarize() evaluates expressions (#41223)
GH-41327 - [Ruby] Show type name in Arrow::Table#to_s (#41328)
GH-41334 - [C++][Acero] Use per-node basis temp vector stack to mitigate overflow (#41335)
GH-41349 - [C#] Optimize DecimalUtility.GetBytes(SqlDecimal) on .NET 7+ (#42150)
GH-41358 - [R] Support join “na_matches” argument (#41372)
GH-41361 - [C++][Parquet] Optimize DelimitRecords by batch execution when max_rep_level > 1 (#41362)
GH-41375 - [C#] Move to .NET 8.0 (#41376)
GH-41385 - [CI][MATLAB][Packaging] Add support for MATLAB R2024a in CI and crossbow packaging workflows (#41504)
GH-41389 - [Python] Expose byte_width and bit_width of ExtensionType in terms of the storage type (#41413)
GH-41400 - [MATLAB] Bump libmexclass version to commit ca3cea6 (#41436)
GH-41410 - [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API reference (#41411)
GH-41420 - [R] Update NEWS.md for 16.1.0 (#41422)
GH-41427 - [Go] Fix stateless prepared statements (#41428)
GH-41430 - [Docs] Use sphinxcontrib-mermaid instead of generating images from .mmd (#41455)
GH-41435 - [CI][MATLAB] Add job to build and test MATLAB Interface on macos-14 (#41592)
GH-41450 - [R][CI] rhub/container follow ons (#41451)
GH-41460 - [C++] Use ASAN to poison temp vector stack memory (#41695)
GH-41480 - [Python] Update Python development guide about components being enabled by default based on Arrow C++ (#41705)
GH-41480 - [Python] Building PyArrow: enable/disable python components by default based on availability in Arrow C++ (#41494)
GH-41493 - [C++][S3] Add a new option to check existence before CreateDir (#41822)
GH-41507 - [MATLAB][CI] Pass strict: true to matlab-actions/run-tests@v2 (#41530)
GH-41527 - [CI][Dev] Remove unnecessary requirements for six (#43087)
GH-41531 - [MATLAB][Packaging] Bump matlab-actions/setup-matlab and matlab-actions/run-command from v1 to v2 in the crossbow job (#41532)
GH-41540 - [R] Simplify arrow_eval() logic and bindings environments (#41537)
GH-41545 - [C++][Parquet] Fix DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
GH-41547 - [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
GH-41558 - [C++] Improve fixed_width_test_util.h (#41575)
GH-41560 - [C++] ChunkResolver: Implement ResolveMany and add unit tests (#41561)
GH-41590 - [Java] Improve BaseRepeatedValueVector function on isEmpty and isNull operations (#41601)
GH-41596 - [C++] fixed_width_internal.h: Simplify docstring and support bit-sized types (BOOL) (#41597)
GH-41608 - [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633)
GH-41611 - [Docs][CI] Enable most sphinx-lint rules for documentation (#41612)
GH-41620 - [Docs] Document merge.conf usage (#41621)
GH-41626 - [R][CI] Update OpenSUSE to 15.5 from 15.3 (#41627)
GH-41652 - [C++][CMake][Windows] Don’t build needless object libraries (#41658)
GH-41653 - [MATLAB] Add new arrow.c.Array MATLAB class which wraps a C Data Interface format ArrowArray C struct (#41655)
GH-41654 - [MATLAB] Add new arrow.c.Schema MATLAB class which wraps a C Data Interface format ArrowSchema C struct (#41674)
GH-41656 - [MATLAB] Add C Data Interface format import/export functionality for arrow.array.Array (#41737)
GH-41662 - [Python] Ensure Buffer methods don’t crash with non-CPU data (#41889)
GH-41664 - [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010)
GH-41675 - [Packaging][MATLAB] Add crossbow job to package MATLAB interface on macos-14 (#41677)
GH-41681 - [GLib] Generate separate version macros for each GLib library (#41721)
GH-41691 - [Doc] Remove notion of “logical type” (#41958)
GH-41702 - [C++][Parquet] Thrift: generate template method to accelerate reading thrift (#41703)
GH-41726 - [C++][Parquet] Minor: moving EncodedStats by default rather than copying (#41727)
GH-41730 - [Java] Adding variadicBufferCounts to RecordBatch (#41732)
GH-41748 - [Python][Parquet] Update BYTE_STREAM_SPLIT description in write_table() docstring (#41759)
GH-41749 - [GLib] Allow getting a RecordBatchReader from a Dataset or Scanner (#41750)
GH-41755 - [C++][ORC] Ensure setting detected ORC version (#41767)
GH-41760 - [C++][Parquet] Add file metadata read/write benchmark (#41761)
GH-41770 - [CI][GLib] Remove temporary files explicitly (#41807)
GH-41783 - [C++] Make git-dependent definitions internal (#41781)
GH-41789 - [Java] Clean up immutables and checkerframework dependencies (#41790)
GH-41797 - [C++][S3] Remove GetBucketRegion hack for newer AWS SDK versions (#41798)
GH-41799 - [Java] Migrate to com.gradle:develocity-maven-extension (#41800)
GH-41803 - [MATLAB] Add C Data Interface format import/export functionality for arrow.tabular.RecordBatch (#41817)
GH-41804 - [Swift] Add Struct (Nested) type (#43082)
GH-41806 - [GLib][CI] Use vcpkg for C++ dependencies when building GLib libraries with MSVC (#41839)
GH-41818 - [C++][Parquet] normalize dictionary encoding to use RLE_DICTIONARY (#41819)
GH-41834 - [R] Better error handling in dplyr code (#41576)
GH-41841 - [R][CI] Remove more defunct rhub containers (#41828)
GH-41887 - [Go] Run linter via pre-commit (#41888)
GH-41899 - [C++] IPC: Minor enhance the code of writer (#41900)
GH-41905 - [JS] Update dependencies (#41906)
GH-41910 - [Python] Add support for Pyodide (#37822)
GH-41923 - [C++] Fix ExecuteScalar deduce all_scalar with chunked_array (#41925)
GH-41929 - [Java] pom.xml license formatting (#42049)
GH-41945 - [Swift] Add interface ArrowArrayHolderBuilder (#41946)
GH-41947 - [Java] Support catalog in JDBC driver with session options (#42035)
GH-41952 - [R] Turn S3 and ZSTD on by default for macOS (#42210)
GH-41953 - [C++] Minor enhance code style for FixedShapeTensorType (#41954)
GH-41955 - [C++] Follow up of adding null_bitmap to MapArray::FromArrays (#41956)
GH-41960 - Expose new S3 option check_directory_existence_before_creation (#41972)
GH-41968 - [Java] Implement TransferPair functionality for BinaryView (#41980)
GH-41970 - [C++] Misc changes making code around list-like types and list-view types behave the same way (#41971)
GH-41978 - [Python] Fix pandas tests to follow downstream datetime64 unit changes (#41979)
GH-41983 - [Dev] Run issue labeling bot only when opening an issue (not editing) (#41986)
GH-41994 - [C++] : kernel.cc: Remove defaults on switch so that compiler can check full enum coverage for us (#41995)
GH-41999 - [Swift] Add methods for adding array and vargs to arrow array (#42000)
GH-42002 - [Java] Update Unit Tests for Vector Module (#42019)
GH-42013 - [Python] Allow Array.filter() to take general array input (#42051)
GH-42016 - [Python] Expose new FLOAT16 logical type in the pyarrow.parquet bindings (#42103)
GH-42020 - [Swift] Add Arrow decoding implementation for Swift Codable (#42023)
GH-42021 - [Swift] Add Arrow encoder implementation for Swift Codable (#43063)
GH-42025 - [Java] Update Unit Tests for Algorithm Module (#42029)
GH-42030 - [Java] Update Unit Tests for Adapter Module (#42038)
GH-42042 - [Java] Update Unit Tests for Compressions Module (#42044)
GH-42045 - [Java] Update Unit Tests for Flight Module (#42158)
GH-42087 - [Swift] refactored to remove build warnings (#42088)
GH-42092 - [Java] Update Unit Tests for Tools Module (#42093)
GH-42100 - [C++][Parquet] ParquetFilePrinter::JSONPrint print length of FLBA (#41981)
GH-42101 - [Java] Create File for Output Validation in FileRoundtrip (#42115)
GH-42109 - [C++][CMake] Add preset for Valgrind (#42110)
GH-42112 - [Python] Array gracefully fails on non-cpu device (#42113)
GH-42121 - [Java] Cleanup spotless plugin configuration (#43019)
GH-42124 - [Swift] Add methods for loading and validating builder by type (#42195)
GH-42126 - [C++] Move TakeXXX free functions into TakeMetaFunction and make them private (#42127)
GH-42128 - [Packaging][CentOS] Migrate CentOS 7 and CentOS Stream 8 packaging jobs to use vault.centos.org (#42129)
GH-42134 - [C++][FS][Azure] Validate AzureOptions::{blob,dfs}_storage_scheme (#42135)
GH-42143 - [R] Sanitize R metadata (#41969)
GH-42146 - [MATLAB] Add IPC RecordBatchFileReader and RecordBatchFileWriter MATLAB classes (#42201)
GH-42162 - [Java] Update Unit Tests for Dataset Module (#42163)
GH-42164 - [Java] Update Unit Tests for Gandiva Module (#42166)
GH-42165 - [Java] Update Unit Tests for Memory Module (#42161)
GH-42167 - [CI] Upgrade the version of vcpkg in .env (#42171)
GH-42168 - [Python][Parquet] Pyarrow store decimal as integer (#42169)
GH-42190 - [Python] Add CI job for Numpy 1.X (#42189)
GH-42193 - [Java] Update dependency to maintain JUnit 5 only (#42206)
GH-42228 - [CI][Java] Suppress transfer progress log in java-jars (#42230)
GH-42235 - [C++] list_parent_indices: Add support for list-view types (#42236)
GH-42243 - [Swift] Update isValidBuilderType to not required instance of type (#42244)
GH-42245 - [Swift] Ensure map behavior is the same for all key types (#42246)
GH-43020 - [Java] Simplify flight.properties generation (#43028)
GH-43033 - [CI][Docker] Enable linter for python-wheel-windows-test-vs2019 (#43034)
GH-43040 - [C++] Reduce the recursion of many-join test (#43042)
GH-43045 - [CI][Python] Pin openjdk=17 in python substrait integration (#43051)
GH-43060 - [C++] Limit buffer size in BufferedInputStream::SetBufferSize with raw_read_bound (#43064)
GH-43076 - [C#] Upgrade Xunit and change how Python integration tests are skipped (#43091)
Author: Matthew Danielson <matthewd%fastmail.us@localhost>
Date: Mon Sep 16 05:00:59 2024 -0700
apache-arrow: Update to 17.0.0
Also, enable more features by default
GH-40823 - [Java] Adding Spotless to Compression module (#42060)
GH-40824 - [Java] Adding Spotless to Dataset module (#42062)
GH-40825 - [Java] Adding Spotless to Flight module (#42063)
GH-40826 - [Java] Adding Spotless to Format module
GH-40827 - [Java] Adding Spotless to Gandiva module (#42055)
GH-40828 - [Java] Format arrow-maven-plugins modules (#42054)
GH-40829 - [Java] Adding Spotless to Memory modules (#42056)
GH-40830 - [Java] Adding Spotless to Performance module (#42057)
GH-40831 - [Java] Adding Spotless to Tools module (#42058)
GH-40832 - [Java] Adding Spotless to Vector module (#42061)
GH-40930 - [Java] Implement a function to retrieve reference buffers in StringView (#41796)
GH-40932 - [Java] Implement TransferPair functionality for StringView (#41861)
GH-40933 - [Java] Enhance the copyFrom* functionality in StringView (#41752)
GH-40942 - [Java] Implement C Data Interface for StringView (#41967)
GH-40943 - [Java] Implement RangeEqualsVisitor for StringView (#41636)
GH-40944 - [Java] Implement TypeEqualsVisitor for StringView (#41606)
GH-40968 - [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like function (#40970)
GH-41020 - [C++] Introduce portable compiler assumptions (#41021)
GH-41035 - [C++] Add a grouper benchmark for preventing performance regression (#41036)
GH-41055 - [C++] Support flatten for combining nested list related types (#41092)
GH-41085 - [CI][Java] Add Spark integration tests to “java” group in Crossbow tasks (#41086)
GH-41089 - [C++] Clean up remaining tasks related to half float casts (#41084)
GH-41095 - [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support (#41276)
GH-41102 - [Packaging][Release] Create unique git tags for release candidates (e.g. apache-arrow-{MAJOR}.{MINOR}.{PATCH}-rc{RC_NUM}) (#41131)
GH-41105 - [Python][Docs] Update PyArrow installation docs for conda package split (#41135)
GH-41114 - [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
GH-41116 - [C++] IO: enhance boundary checking in CompressedInputStream (#41117)
GH-41126 - [Python] Basic bindings for Device and MemoryManager classes (#41685)
GH-41134 - [GLib] Support building arrow-glib with MSVC (#41599)
GH-41159 - [Go][Parquet] Improvement Parquet BitWriter WriteVlqInt Performance (#41160)
GH-41173 - [Java] Add spotless configuration for Maven pom.xml files (#41174)
GH-41183 - [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295)
GH-41186 - [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst (#41187)
GH-41203 - [Python][Packaging] Ensure to build with released numpy 2.0 (instead of RC) in the wheel building workflows (#42194)
GH-41240 - [Release][Packaging] Use Debian bookworm for uploading binaries (#41241)
GH-41243 - [Release][Packaging] Avoid needless download by “archery crossbow download-artifacts” (#41244)
GH-41256 - [Format][Docs] Add a canonical extension type specification for JSON (#41257)
GH-41262 - [Java][FlightSQL] Implement stateless prepared statements (#41237)
GH-41287 - [Java] ListViewVector Implementation (#41285)
GH-41298 - [Format][Docs] Add a canonical extension type specification for UUID (#41299)
GH-41301 - [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type (#41373)
GH-41307 - [Java] Use org.apache:apache parent pom version 31 (#41772)
GH-41307 - [Java] Use org.apache:apache parent pom version 31 (#41309)
GH-41314 - [CI][Python] Add a job on ARM64 macOS (#41313)
GH-41316 - [CI][Python] Reduce CI time on macOS (#41378)
GH-41323 - [R] Redo how summarize() evaluates expressions (#41223)
GH-41327 - [Ruby] Show type name in Arrow::Table#to_s (#41328)
GH-41334 - [C++][Acero] Use per-node basis temp vector stack to mitigate overflow (#41335)
GH-41349 - [C#] Optimize DecimalUtility.GetBytes(SqlDecimal) on .NET 7+ (#42150)
GH-41358 - [R] Support join “na_matches” argument (#41372)
GH-41361 - [C++][Parquet] Optimize DelimitRecords by batch execution when max_rep_level > 1 (#41362)
GH-41375 - [C#] Move to .NET 8.0 (#41376)
GH-41385 - [CI][MATLAB][Packaging] Add support for MATLAB R2024a in CI and crossbow packaging workflows (#41504)
GH-41389 - [Python] Expose byte_width and bit_width of ExtensionType in terms of the storage type (#41413)
GH-41400 - [MATLAB] Bump libmexclass version to commit ca3cea6 (#41436)
GH-41410 - [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API reference (#41411)
GH-41420 - [R] Update NEWS.md for 16.1.0 (#41422)
GH-41427 - [Go] Fix stateless prepared statements (#41428)
GH-41430 - [Docs] Use sphinxcontrib-mermaid instead of generating images from .mmd (#41455)
GH-41435 - [CI][MATLAB] Add job to build and test MATLAB Interface on macos-14 (#41592)
GH-41450 - [R][CI] rhub/container follow ons (#41451)
GH-41460 - [C++] Use ASAN to poison temp vector stack memory (#41695)
GH-41480 - [Python] Update Python development guide about components being enabled by default based on Arrow C++ (#41705)
GH-41480 - [Python] Building PyArrow: enable/disable python components by default based on availability in Arrow C++ (#41494)
GH-41493 - [C++][S3] Add a new option to check existence before CreateDir (#41822)
GH-41507 - [MATLAB][CI] Pass strict: true to matlab-actions/run-tests@v2 (#41530)
GH-41527 - [CI][Dev] Remove unnecessary requirements for six (#43087)
GH-41531 - [MATLAB][Packaging] Bump matlab-actions/setup-matlab and matlab-actions/run-command from v1 to v2 in the crossbow job (#41532)
GH-41540 - [R] Simplify arrow_eval() logic and bindings environments (#41537)
GH-41545 - [C++][Parquet] Fix DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
GH-41547 - [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
GH-41558 - [C++] Improve fixed_width_test_util.h (#41575)
GH-41560 - [C++] ChunkResolver: Implement ResolveMany and add unit tests (#41561)
GH-41590 - [Java] Improve BaseRepeatedValueVector function on isEmpty and isNull operations (#41601)
GH-41596 - [C++] fixed_width_internal.h: Simplify docstring and support bit-sized types (BOOL) (#41597)
GH-41608 - [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633)
GH-41611 - [Docs][CI] Enable most sphinx-lint rules for documentation (#41612)
GH-41620 - [Docs] Document merge.conf usage (#41621)
GH-41626 - [R][CI] Update OpenSUSE to 15.5 from 15.3 (#41627)
GH-41652 - [C++][CMake][Windows] Don’t build needless object libraries (#41658)
GH-41653 - [MATLAB] Add new arrow.c.Array MATLAB class which wraps a C Data Interface format ArrowArray C struct (#41655)
GH-41654 - [MATLAB] Add new arrow.c.Schema MATLAB class which wraps a C Data Interface format ArrowSchema C struct (#41674)
GH-41656 - [MATLAB] Add C Data Interface format import/export functionality for arrow.array.Array (#41737)
GH-41662 - [Python] Ensure Buffer methods don’t crash with non-CPU data (#41889)
GH-41664 - [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010)
GH-41675 - [Packaging][MATLAB] Add crossbow job to package MATLAB interface on macos-14 (#41677)
GH-41681 - [GLib] Generate separate version macros for each GLib library (#41721)
GH-41691 - [Doc] Remove notion of “logical type” (#41958)
GH-41702 - [C++][Parquet] Thrift: generate template method to accelerate reading thrift (#41703)
GH-41726 - [C++][Parquet] Minor: moving EncodedStats by default rather than copying (#41727)
GH-41730 - [Java] Adding variadicBufferCounts to RecordBatch (#41732)
GH-41748 - [Python][Parquet] Update BYTE_STREAM_SPLIT description in write_table() docstring (#41759)
GH-41749 - [GLib] Allow getting a RecordBatchReader from a Dataset or Scanner (#41750)
GH-41755 - [C++][ORC] Ensure setting detected ORC version (#41767)
GH-41760 - [C++][Parquet] Add file metadata read/write benchmark (#41761)
GH-41770 - [CI][GLib] Remove temporary files explicitly (#41807)
GH-41783 - [C++] Make git-dependent definitions internal (#41781)
GH-41789 - [Java] Clean up immutables and checkerframework dependencies (#41790)
GH-41797 - [C++][S3] Remove GetBucketRegion hack for newer AWS SDK versions (#41798)
GH-41799 - [Java] Migrate to com.gradle:develocity-maven-extension (#41800)
GH-41803 - [MATLAB] Add C Data Interface format import/export functionality for arrow.tabular.RecordBatch (#41817)
GH-41804 - [Swift] Add Struct (Nested) type (#43082)
GH-41806 - [GLib][CI] Use vcpkg for C++ dependencies when building GLib libraries with MSVC (#41839)
GH-41818 - [C++][Parquet] normalize dictionary encoding to use RLE_DICTIONARY (#41819)
GH-41834 - [R] Better error handling in dplyr code (#41576)
GH-41841 - [R][CI] Remove more defunct rhub containers (#41828)
GH-41887 - [Go] Run linter via pre-commit (#41888)
GH-41899 - [C++] IPC: Minor enhance the code of writer (#41900)
GH-41905 - [JS] Update dependencies (#41906)
GH-41910 - [Python] Add support for Pyodide (#37822)
GH-41923 - [C++] Fix ExecuteScalar deduce all_scalar with chunked_array (#41925)
GH-41929 - [Java] pom.xml license formatting (#42049)
GH-41945 - [Swift] Add interface ArrowArrayHolderBuilder (#41946)
GH-41947 - [Java] Support catalog in JDBC driver with session options (#42035)
GH-41952 - [R] Turn S3 and ZSTD on by default for macOS (#42210)
GH-41953 - [C++] Minor enhance code style for FixedShapeTensorType (#41954)
GH-41955 - [C++] Follow up of adding null_bitmap to MapArray::FromArrays (#41956)
GH-41960 - Expose new S3 option check_directory_existence_before_creation (#41972)
GH-41968 - [Java] Implement TransferPair functionality for BinaryView (#41980)
GH-41970 - [C++] Misc changes making code around list-like types and list-view types behave the same way (#41971)
GH-41978 - [Python] Fix pandas tests to follow downstream datetime64 unit changes (#41979)
GH-41983 - [Dev] Run issue labeling bot only when opening an issue (not editing) (#41986)
GH-41994 - [C++] : kernel.cc: Remove defaults on switch so that compiler can check full enum coverage for us (#41995)
GH-41999 - [Swift] Add methods for adding array and vargs to arrow array (#42000)
GH-42002 - [Java] Update Unit Tests for Vector Module (#42019)
GH-42013 - [Python] Allow Array.filter() to take general array input (#42051)
GH-42016 - [Python] Expose new FLOAT16 logical type in the pyarrow.parquet bindings (#42103)
GH-42020 - [Swift] Add Arrow decoding implementation for Swift Codable (#42023)
GH-42021 - [Swift] Add Arrow encoder implementation for Swift Codable (#43063)
GH-42025 - [Java] Update Unit Tests for Algorithm Module (#42029)
GH-42030 - [Java] Update Unit Tests for Adapter Module (#42038)
GH-42042 - [Java] Update Unit Tests for Compressions Module (#42044)
GH-42045 - [Java] Update Unit Tests for Flight Module (#42158)
GH-42087 - [Swift] refactored to remove build warnings (#42088)
GH-42092 - [Java] Update Unit Tests for Tools Module (#42093)
GH-42100 - [C++][Parquet] ParquetFilePrinter::JSONPrint print length of FLBA (#41981)
GH-42101 - [Java] Create File for Output Validation in FileRoundtrip (#42115)
GH-42109 - [C++][CMake] Add preset for Valgrind (#42110)
GH-42112 - [Python] Array gracefully fails on non-cpu device (#42113)
GH-42121 - [Java] Cleanup spotless plugin configuration (#43019)
GH-42124 - [Swift] Add methods for loading and validating builder by type (#42195)
GH-42126 - [C++] Move TakeXXX free functions into TakeMetaFunction and make them private (#42127)
GH-42128 - [Packaging][CentOS] Migrate CentOS 7 and CentOS Stream 8 packaging jobs to use vault.centos.org (#42129)
GH-42134 - [C++][FS][Azure] Validate AzureOptions::{blob,dfs}_storage_scheme (#42135)
GH-42143 - [R] Sanitize R metadata (#41969)
GH-42146 - [MATLAB] Add IPC RecordBatchFileReader and RecordBatchFileWriter MATLAB classes (#42201)
GH-42162 - [Java] Update Unit Tests for Dataset Module (#42163)
GH-42164 - [Java] Update Unit Tests for Gandiva Module (#42166)
GH-42165 - [Java] Update Unit Tests for Memory Module (#42161)
GH-42167 - [CI] Upgrade the version of vcpkg in .env (#42171)
GH-42168 - [Python][Parquet] Pyarrow store decimal as integer (#42169)
GH-42190 - [Python] Add CI job for Numpy 1.X (#42189)
GH-42193 - [Java] Update dependency to maintain JUnit 5 only (#42206)
GH-42228 - [CI][Java] Suppress transfer progress log in java-jars (#42230)
GH-42235 - [C++] list_parent_indices: Add support for list-view types (#42236)
GH-42243 - [Swift] Update isValidBuilderType to not required instance of type (#42244)
GH-42245 - [Swift] Ensure map behavior is the same for all key types (#42246)
GH-43020 - [Java] Simplify flight.properties generation (#43028)
GH-43033 - [CI][Docker] Enable linter for python-wheel-windows-test-vs2019 (#43034)
GH-43040 - [C++] Reduce the recursion of many-join test (#43042)
GH-43045 - [CI][Python] Pin openjdk=17 in python substrait integration (#43051)
GH-43060 - [C++] Limit buffer size in BufferedInputStream::SetBufferSize with raw_read_bound (#43064)
GH-43076 - [C#] Upgrade Xunit and change how Python integration tests are skipped (#43091)
commit 323aba41c4c77db60ba0dff2cabbead08a3c0298
Author: Matthew Danielson <matthewd%fastmail.us@localhost>
Date: Mon Sep 16 05:00:59 2024 -0700
apache-arrow: Update to 17.0.0
Also, enable more features by default
Changelog
Apache Arrow 17.0.0 (2024-07-16 07:00:00+00:00)
Bug Fixes
GH-15053 - [C++] Add option to string ‘center’ kernel to control left/right alignment on odd number of padding (#41449)
GH-30866 - [Java] fix SplitAndTransfer throws for (0,0) if vector empty (#41066)
GH-34484 - [Substrait] add an option to disable augmented fields (#41583)
GH-37669 - [C++][Python] Fix casting to extension type with fixed size list storage type (#42219)
GH-38553 - [C++] Replace null_count with MayHaveNulls in ListArrayFromArray and MapArray (#41957)
GH-38575 - [Python] Include metadata when creating pa.schema from PyCapsule (#41538)
GH-38770 - [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray (#40971)
GH-39129 - [Python] pa.array: add check for byte-swapped numpy arrays inside python objects (#41549)
GH-39489 - [C++][Parquet] Timestamp conversion from Parquet to Arrow does not follow compatibility guidelines for convertedType
GH-39645 - [Python] Fix read_table for encrypted parquet (#39438)
GH-40270 - [C++] Use LargeStringArray for casting when writing tables to CSV (#40271)
GH-40560 - [Python] RunEndEncodedArray.from_arrays: bugfix for Array arguments (#40560) (#41093)
GH-40750 - [C++][Python] Map child Array constructed from keys and items shouldn’t have offset (#40871)
GH-40913 - [C++] Fix compile warning with ‘implicitly-defined constructor does not initialize’ in encoding_benchmark (#41060)
GH-40997 - [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 (#40998)
GH-41112 - [C++] Clean up unused parameter warnings (#41111)
GH-41149 - [C++][Acero] Fix asof join race (#41614)
GH-41164 - [C#] Fix concatenation of sliced arrays (#41245)
GH-41190 - [C++] support for single threaded joins (#41125)
GH-41192 - [C++] Fix hashjoin benchmark failed at make utf8’s random batches (#41195)
GH-41198 - [C#] Fix concatenation of union arrays (#41226)
GH-41199 - [C#] Fix accessing values of a sliced decimal array (#41200)
GH-41258 - [C#][Integration] Fix comparison of sliced validity buffers with non-zero offsets (#41259)
GH-41263 - [C#][Integration] Ensure offset is considered in all branches of the bitmap comparison (#41264)
GH-41282 - [Dev] Always prompt next major version on merge script if it exists (#41305)
GH-41306 - [C++] Check to avoid copying when NullBitmapBuffer is Null (#41452)
GH-41317 - [C++] Fix crash on invalid Parquet file (#41366)
GH-41319 - [Python] `test_numpy_array_protocol` test failures with numpy 2.0.0rc1
GH-41321 - [C++][Parquet] More strict Parquet level checking (#41346)
GH-41329 - [C++][Gandiva] Fix gandiva cache size env var (#41330)
GH-41340 - [C++][CMake][Windows] Remove needless .dll suffix from link libraries (#41341)
GH-41343 - [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345)
GH-41356 - [Release][Docs] Update post release documentation task to remove the warnings banner for stable version (#41377)
GH-41367 - [C++] Replace [[maybe_unused]] with Arrow macro (#41359)
GH-41371 - [CI][Release] Use the latest Ruby on macOS (#41379)
GH-41390 - [CI] Use setup-python GitHub action on csharp macOS job (#41392)
GH-41397 - [C#] Downgrade macOS test runner to avoid infrastructure bug (#41934)
GH-41418 - [C++] Add [Large]ListView and Map nested types for scalar_if_else’s kernel functions (#41419)
GH-41426 - [R][CI] Install CRAN style openssl on gh runners. (#41629)
GH-41433 - [C++][Gandiva] Fix ascii_utf8 function to return same result on x86 and Arm (#41434)
GH-41464 - [Python] Fix StructArray.sort() for by=None (#41495)
GH-41467 - [CI][Release] Don’t push conda-verify-rc image (#41468)
GH-41470 - [C++] Reuse deduplication logic for direct registration (#41466)
GH-41471 - [Java] Fix performance uber-jar (#41473)
GH-41475 - [Python] Build with Python 3.13 (#42034)
GH-41478 - [C++] Clean up more redundant move warnings (#41487)
GH-41491 - [Python] remove special methods related to buffers in python <2.6 (#41492)
GH-41502 - [Python] Fix reading column index with decimal values (#41503)
GH-41529 - [C++][Compute] Remove redundant logic for ArrayData as ExecResults in ExecScalarCaseWhen (#41380)
GH-41534 - [Go] Fix mem leak importing 0 length C Array (#41535)
GH-41541 - [Go][Parquet] More fixes for writer performance regression (#42003)
GH-41541 - [Go][Parquet] Fix writer performance regression (#41638)
GH-41571 - [Java] Revert GH-41307 (#41309) (#41628)
GH-41573 - [Java] VectorSchemaRoot uses inefficient stream to copy fieldVectors (#41574)
GH-41581 - [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582)
GH-41587 - [Docs][Python] Remove duplicate contents (#41588)
GH-41602 - [C#] Resolve build warnings (#41645)
GH-41617 - [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622)
GH-41630 - [Benchmarking] Fix out-of-source build in benchmarks (#41631)
GH-41648 - [Java] Memory Leak about splitAndTransfer (#41898)
GH-41660 - [CI][Java] Restore devtoolset related GANDIVA_CXX_FLAGS (#41661)
GH-41679 - [Release][Packaging][deb] Update package name in 01-prepare.sh too (#41859)
GH-41684 - [C++][Python] Add optional null_bitmap to MapArray::FromArrays (#41757)
GH-41686 - [Java] Nullability of struct child vectors not preserved in TransferPair (#41785)
GH-41688 - [Dev] Include all relevant CMakeLists.txt files in cmake-format precommit hook (#41689)
GH-41697 - [Go][Parquet] Release BufferWriter when BufferedPageWriter is closed (#41698)
GH-41699 - [Python][Parquet] Implement to_dict method on SortingColumn (#41704)
GH-41711 - [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712)
GH-41717 - [Java][Vector] fix issue with ByteBuffer rewind in MessageSerializer (#41718)
GH-41720 - [C++][Acero] Remove an useless parameter for QueryContext::Init called in hash_join_benchmark (#41716)
GH-41725 - [Python] CMake: ignore Parquet encryption option if Parquet itself is not enabled (fix Java integration build) (#41776)
GH-41735 - [CI][Archery] Update archery to be compatible with pygit2 1.15 API change (#41739)
GH-41738 - [C++] Fix the issue that temp vector stack may be under sized (#41746)
GH-41741 - [C++] Check that extension metadata key is present before attempting to delete it (#41763)
GH-41758 - [Python] Disallow direct pa.RecordBatchReader() construction to avoid segfaults (#41773)
GH-41771 - [C++] Iterator releases its resource immediately when it reads all values (#41824)
GH-41780 - [C++][Flight][Benchmark] Ensure waiting server ready (#41793)
GH-41784 - [Packaging][RPM] Use SO version for -libs package name (#41838)
GH-41787 - Update fmpp-maven-plugin output directory (#41788)
GH-41791 - [CI][Conda] Update azure.linux.yml task, replace CondaEnvironment@1 with Bash@3 (#41883)
GH-41813 - [C++] Fix avx2 gather offset larger than 2GB in CompareColumnsToRows (#42188)
GH-41829 - [R] Update relative URLs in README to absolute paths to prevent CRAN check failures (#41830)
GH-41836 - [Java] Fix an undefined symbol error when ARROW_S3=OFF (#41837)
GH-41862 - [C++][S3] Fix potential deadlock when closing output stream (#41876)
GH-41884 - [Python] Fix RecordBatchReader.cast to support casting to equal schema for all types (#42098)
GH-41902 - [Java] Variadic Buffer Counts Incorrect (#41930)
GH-41903 - [CI][GLib] Use the latest Ruby to use OpenSSL 3 (#42001)
GH-41920 - [CI][JS] Add missing build directory argument (#41921)
GH-41924 - [Python] Fix tests when using NumPy 2.0 on Windows (#42099)
GH-41964 - [CI][C++] Clear cache for mamba on AppVeyor (#41977)
GH-42005 - [Java][Integration][CI] Fix ARROW_BUILD_ROOT Path to find pom.xml (#42008)
GH-42006 - [CI][Python] Use pip install -e instead of setup.py build_ext --inplace for installing pyarrow on verification script (#42007)
GH-42015 - [MATLAB] Executing tfeather.m test class causes MATLAB to crash on windows-2022 after MSVC update from 14.39.33519 to 14.40.33807 (#42123)
GH-42017 - [CI][Python][C++] Fix utf8proc detection for wheel on Windows (#42022)
GH-42039 - [Docs][Go] Fix broken link (#42040)
GH-42041 - [Swift] Fix nullable type decoder issue (#42043)
GH-42065 - [C++] Support list-views on list_slice (#42067)
GH-42104 - [C++] Fix an OTel test failure and remove needless logs (#42122)
GH-42107 - [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol (#42108)
GH-42116 - [C++] Support list-view typed arrays in array_take and array_filter (#42117)
GH-42130 - [GLib] Fix building gir files with MSVC (#42131)
GH-42136 - [CI][Go][Java][JS] Use AMD64-based macOS explicitly (#42175)
GH-42139 - [C++] Fix some potential uninitialized variable warnings (#42207)
GH-42140 - [C++] Avoid invalid accesses in parquet-encoding-benchmark (#42141)
GH-42149 - [C++] Use FetchContent for bundled ORC (#43011)
GH-42170 - [Python][CI] Update expected output for numpy 2.0.0 (#42172)
GH-42197 - [CI][Packaging][Java] Ensure updating “python@*” formulae on macOS (#42202)
GH-42198 - [C++] Fix GetRecordBatchPayload crashes for device data (#42199)
GH-42208 - [Java] Fix the Test in flight-sql-jdbc-driver Module (#42217)
GH-42213 - [Swift] Use “--warnings-as-errors” only on CI (#42214)
GH-42220 - [R] handle vctrs_rcrd extension type in metadata cleaning (#42226)
GH-42224 - [Java] Fix Typo in TestAceroSubstraitConsumer Test Method (#42225)
GH-42232 - [C++] Use non-stale c-ares download URL (#42250)
GH-42234 - [CI][R] Disable libarrow binary use on valgrind tests (#42249)
GH-43048 - [JAVA] Fix IndexOutOfBoundsException message by reporting index correctly (#43049)
GH-43058 - [C#] Revert upgrade of Xunit from 2.8.0 to 2.8.1 (#43074)
GH-43059 - [CI][Gandiva] Disable Python Gandiva tests on AlmaLinux 8 (#43093)
GH-43062 - [Go] Use calloc instead of malloc (#43052)
GH-43070 - [C++][Parquet] Check for valid ciphertext length to prevent segfault (#43071)
GH-43116 - [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as large memory test (#43128)
GH-43119 - [CI][Packaging] Update manylinux 2014 CentOS repos that have been deprecated (#43121)
GH-43122 - [CI][Packaging][RPM][CentOS] Use vault.centos.org for SCL (#43127)
GH-43134 - [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136)
GH-43158 - [Packaging] Use bundled nlohmann/json on AlmaLinux 8/CentOS Stream 8 (#43159)
GH-43199 - [CI][Packaging] dev/release/utils-create-release-tarball.sh should not include the release candidate number in the name of the tarball’s top-level directory. (#43200)
GH-43204 - [CI][Packaging] Apply vcpkg patch to fix Thrift version (#43208)
New Features and Improvements
GH-29537 - [R] Support mutate/summarize with implicit join (#41350)
GH-33484 - [C++][Compute] Implement Grouper::Reset (#41352)
GH-35804 - [CI][Packaging][Conan] Synchronize upstream conan (#39729)
GH-35888 - [Java] Add FlightStatusCode.RESOURCE_EXHAUSTED (#41508)
GH-37333 - [Python] Replace pandas.util.testing.rands with vendored version (#42089)
GH-37720 - [Go][FlightSQL] Add prepared statement handle to DoPut result (#40311)
GH-37728 - [Java] Add methods to get an Iterable for a ValueVector (#41895)
GH-37929 - [Python] begin moving static settings to pyproject.toml (#41041)
GH-37938 - [Swift] Add initial C data interface implementation (#41342)
GH-38255 - [Go][C++] Implement Flight SQL Bulk Ingestion (#38385)
GH-38325 - [Python] Implement PyCapsule interface for Device data in PyArrow (#40717)
GH-38325 - [Python] Expand the Arrow PyCapsule Interface with C Device Data support (#40708)
GH-38692 - [C#] Implement ICollection<T?> on scalar arrays (#41539)
GH-39204 - [Format][FlightRPC][Docs] Stabilize Flight SQL (#41657)
GH-39220 - [Python] Let RecordBatch.filter accept a boolean expression in addition to mask array (#43043)
GH-39301 - [Archery][CI][Integration] Add nanoarrow to archery + integration setup (#39302)
GH-39344 - [C++][FS][Azure] Support azure cli auth (#41976)
GH-39345 - [C++][FS][Azure] Add support for environment credential (#41715)
GH-39649 - [Java][CI] Fix or suppress spurious errorprone warnings stage 2 (#39777)
GH-39722 - [JS] Clean up packaging (#39723)
GH-39798 - [C++] Optimize Take for fixed-size types including nested fixed-size lists (#41297)
GH-39858 - [C++][Device] Add Copy/View slice functions to a CPU pointer (#41477)
GH-39898 - [C++] Add support for OpenTelemetry logging (#39905)
GH-39990 - [Docs][CI] Add sphinx-lint for docs linting (#40022)
GH-40078 - [C++] Import/Export ArrowDeviceArrayStream (#40807)
GH-40339 - [Java] StringView Initial Implementation (#40340)
GH-40342 - [Python] Fix pickling of LocalFileSystem for cython 2 (#41459)
GH-40342 - [C++] move LocalFileSystem to the registry (#40356)
GH-40361 - [C++] Make flatbuffers serialization more deterministic (#40392)
GH-40384 - [Python] Expand the C Device Interface bindings to support import on CUDA device (#40385)
GH-40494 - [Go] add support for protobuf messages (#40496)
GH-40644 - [Python] Allow passing a mapping of column names to rename_columns (#40645)
GH-40734 - [Packaging][Debian] Drop support for Debian bullseye (#41394)
GH-40749 - [Python][Packaging] Strip unnecessary symbols when building wheels (#42028)
GH-40819 - [Java] Adding Spotless to Algorithm module (#41825)
GH-40820 - [Java] Adding Spotless to Adapter module (#42048)
GH-40822 - [Java] Adding Spotless to C module (#42059)
GH-40823 - [Java] Adding Spotless to Compression module (#42060)
GH-40824 - [Java] Adding Spotless to Dataset module (#42062)
GH-40825 - [Java] Adding Spotless to Flight module (#42063)
GH-40826 - [Java] Adding Spotless to Format module
GH-40827 - [Java] Adding Spotless to Gandiva module (#42055)
GH-40828 - [Java] Format arrow-maven-plugins modules (#42054)
GH-40829 - [Java] Adding Spotless to Memory modules (#42056)
GH-40830 - [Java] Adding Spotless to Performance module (#42057)
GH-40831 - [Java] Adding Spotless to Tools module (#42058)
GH-40832 - [Java] Adding Spotless to Vector module (#42061)
GH-40930 - [Java] Implement a function to retrieve reference buffers in StringView (#41796)
GH-40932 - [Java] Implement TransferPair functionality for StringView (#41861)
GH-40933 - [Java] Enhance the copyFrom* functionality in StringView (#41752)
GH-40942 - [Java] Implement C Data Interface for StringView (#41967)
GH-40943 - [Java] Implement RangeEqualsVisitor for StringView (#41636)
GH-40944 - [Java] Implement TypeEqualsVisitor for StringView (#41606)
GH-40968 - [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like function (#40970)
GH-41020 - [C++] Introduce portable compiler assumptions (#41021)
GH-41035 - [C++] Add a grouper benchmark for preventing performance regression (#41036)
GH-41055 - [C++] Support flatten for combining nested list related types (#41092)
GH-41085 - [CI][Java] Add Spark integration tests to “java” group in Crossbow tasks (#41086)
GH-41089 - [C++] Clean up remaining tasks related to half float casts (#41084)
GH-41095 - [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support (#41276)
GH-41102 - [Packaging][Release] Create unique git tags for release candidates (e.g. apache-arrow-{MAJOR}.{MINOR}.{PATCH}-rc{RC_NUM}) (#41131)
GH-41105 - [Python][Docs] Update PyArrow installation docs for conda package split (#41135)
GH-41114 - [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
GH-41116 - [C++] IO: enhance boundary checking in CompressedInputStream (#41117)
GH-41126 - [Python] Basic bindings for Device and MemoryManager classes (#41685)
GH-41134 - [GLib] Support building arrow-glib with MSVC (#41599)
GH-41159 - [Go][Parquet] Improvement Parquet BitWriter WriteVlqInt Performance (#41160)
GH-41173 - [Java] Add spotless configuration for Maven pom.xml files (#41174)
GH-41183 - [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295)
GH-41186 - [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst (#41187)
GH-41203 - [Python][Packaging] Ensure to build with released numpy 2.0 (instead of RC) in the wheel building workflows (#42194)
GH-41240 - [Release][Packaging] Use Debian bookworm for uploading binaries (#41241)
GH-41243 - [Release][Packaging] Avoid needless download by “archery crossbow download-artifacts” (#41244)
GH-41256 - [Format][Docs] Add a canonical extension type specification for JSON (#41257)
GH-41262 - [Java][FlightSQL] Implement stateless prepared statements (#41237)
GH-41287 - [Java] ListViewVector Implementation (#41285)
GH-41298 - [Format][Docs] Add a canonical extension type specification for UUID (#41299)
GH-41301 - [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type (#41373)
GH-41307 - [Java] Use org.apache:apache parent pom version 31 (#41772)
GH-41307 - [Java] Use org.apache:apache parent pom version 31 (#41309)
GH-41314 - [CI][Python] Add a job on ARM64 macOS (#41313)
GH-41316 - [CI][Python] Reduce CI time on macOS (#41378)
GH-41323 - [R] Redo how summarize() evaluates expressions (#41223)
GH-41327 - [Ruby] Show type name in Arrow::Table#to_s (#41328)
GH-41334 - [C++][Acero] Use per-node basis temp vector stack to mitigate overflow (#41335)
GH-41349 - [C#] Optimize DecimalUtility.GetBytes(SqlDecimal) on .NET 7+ (#42150)
GH-41358 - [R] Support join “na_matches” argument (#41372)
GH-41361 - [C++][Parquet] Optimize DelimitRecords by batch execution when max_rep_level > 1 (#41362)
GH-41375 - [C#] Move to .NET 8.0 (#41376)
GH-41385 - [CI][MATLAB][Packaging] Add support for MATLAB R2024a in CI and crossbow packaging workflows (#41504)
GH-41389 - [Python] Expose byte_width and bit_width of ExtensionType in terms of the storage type (#41413)
GH-41400 - [MATLAB] Bump libmexclass version to commit ca3cea6 (#41436)
GH-41410 - [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API reference (#41411)
GH-41420 - [R] Update NEWS.md for 16.1.0 (#41422)
GH-41427 - [Go] Fix stateless prepared statements (#41428)
GH-41430 - [Docs] Use sphinxcontrib-mermaid instead of generating images from .mmd (#41455)
GH-41435 - [CI][MATLAB] Add job to build and test MATLAB Interface on macos-14 (#41592)
GH-41450 - [R][CI] rhub/container follow ons (#41451)
GH-41460 - [C++] Use ASAN to poison temp vector stack memory (#41695)
GH-41480 - [Python] Update Python development guide about components being enabled by default based on Arrow C++ (#41705)
GH-41480 - [Python] Building PyArrow: enable/disable python components by default based on availability in Arrow C++ (#41494)
GH-41493 - [C++][S3] Add a new option to check existence before CreateDir (#41822)
GH-41507 - [MATLAB][CI] Pass strict: true to matlab-actions/run-tests@v2 (#41530)
GH-41527 - [CI][Dev] Remove unnecessary requirements for six (#43087)
GH-41531 - [MATLAB][Packaging] Bump matlab-actions/setup-matlab and matlab-actions/run-command from v1 to v2 in the crossbow job (#41532)
GH-41540 - [R] Simplify arrow_eval() logic and bindings environments (#41537)
GH-41545 - [C++][Parquet] Fix DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
GH-41547 - [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
GH-41558 - [C++] Improve fixed_width_test_util.h (#41575)
GH-41560 - [C++] ChunkResolver: Implement ResolveMany and add unit tests (#41561)
GH-41590 - [Java] Improve BaseRepeatedValueVector function on isEmpty and isNull operations (#41601)
GH-41596 - [C++] fixed_width_internal.h: Simplify docstring and support bit-sized types (BOOL) (#41597)
GH-41608 - [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633)
GH-41611 - [Docs][CI] Enable most sphinx-lint rules for documentation (#41612)
GH-41620 - [Docs] Document merge.conf usage (#41621)
GH-41626 - [R][CI] Update OpenSUSE to 15.5 from 15.3 (#41627)
GH-41652 - [C++][CMake][Windows] Don’t build needless object libraries (#41658)
GH-41653 - [MATLAB] Add new arrow.c.Array MATLAB class which wraps a C Data Interface format ArrowArray C struct (#41655)
GH-41654 - [MATLAB] Add new arrow.c.Schema MATLAB class which wraps a C Data Interface format ArrowSchema C struct (#41674)
GH-41656 - [MATLAB] Add C Data Interface format import/export functionality for arrow.array.Array (#41737)
GH-41662 - [Python] Ensure Buffer methods don’t crash with non-CPU data (#41889)
GH-41664 - [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010)
GH-41675 - [Packaging][MATLAB] Add crossbow job to package MATLAB interface on macos-14 (#41677)
GH-41681 - [GLib] Generate separate version macros for each GLib library (#41721)
GH-41691 - [Doc] Remove notion of “logical type” (#41958)
GH-41702 - [C++][Parquet] Thrift: generate template method to accelerate reading thrift (#41703)
GH-41726 - [C++][Parquet] Minor: moving EncodedStats by default rather than copying (#41727)
GH-41730 - [Java] Adding variadicBufferCounts to RecordBatch (#41732)
GH-41748 - [Python][Parquet] Update BYTE_STREAM_SPLIT description in write_table() docstring (#41759)
GH-41749 - [GLib] Allow getting a RecordBatchReader from a Dataset or Scanner (#41750)
GH-41755 - [C++][ORC] Ensure setting detected ORC version (#41767)
GH-41760 - [C++][Parquet] Add file metadata read/write benchmark (#41761)
GH-41770 - [CI][GLib] Remove temporary files explicitly (#41807)
GH-41783 - [C++] Make git-dependent definitions internal (#41781)
GH-41789 - [Java] Clean up immutables and checkerframework dependencies (#41790)
GH-41797 - [C++][S3] Remove GetBucketRegion hack for newer AWS SDK versions (#41798)
GH-41799 - [Java] Migrate to com.gradle:develocity-maven-extension (#41800)
GH-41803 - [MATLAB] Add C Data Interface format import/export functionality for arrow.tabular.RecordBatch (#41817)
GH-41804 - [Swift] Add Struct (Nested) type (#43082)
GH-41806 - [GLib][CI] Use vcpkg for C++ dependencies when building GLib libraries with MSVC (#41839)
GH-41818 - [C++][Parquet] normalize dictionary encoding to use RLE_DICTIONARY (#41819)
GH-41834 - [R] Better error handling in dplyr code (#41576)
GH-41841 - [R][CI] Remove more defunct rhub containers (#41828)
GH-41887 - [Go] Run linter via pre-commit (#41888)
GH-41899 - [C++] IPC: Minor enhance the code of writer (#41900)
GH-41905 - [JS] Update dependencies (#41906)
GH-41910 - [Python] Add support for Pyodide (#37822)
GH-41923 - [C++] Fix ExecuteScalar deduce all_scalar with chunked_array (#41925)
GH-41929 - [Java] pom.xml license formatting (#42049)
GH-41945 - [Swift] Add interface ArrowArrayHolderBuilder (#41946)
GH-41947 - [Java] Support catalog in JDBC driver with session options (#42035)
GH-41952 - [R] Turn S3 and ZSTD on by default for macOS (#42210)
GH-41953 - [C++] Minor enhance code style for FixedShapeTensorType (#41954)
GH-41955 - [C++] Follow up of adding null_bitmap to MapArray::FromArrays (#41956)
GH-41960 - Expose new S3 option check_directory_existence_before_creation (#41972)
GH-41968 - [Java] Implement TransferPair functionality for BinaryView (#41980)
GH-41970 - [C++] Misc changes making code around list-like types and list-view types behave the same way (#41971)
GH-41978 - [Python] Fix pandas tests to follow downstream datetime64 unit changes (#41979)
GH-41983 - [Dev] Run issue labeling bot only when opening an issue (not editing) (#41986)
GH-41994 - [C++] : kernel.cc: Remove defaults on switch so that compiler can check full enum coverage for us (#41995)
GH-41999 - [Swift] Add methods for adding array and vargs to arrow array (#42000)
GH-42002 - [Java] Update Unit Tests for Vector Module (#42019)
GH-42013 - [Python] Allow Array.filter() to take general array input (#42051)
GH-42016 - [Python] Expose new FLOAT16 logical type in the pyarrow.parquet bindings (#42103)
GH-42020 - [Swift] Add Arrow decoding implementation for Swift Codable (#42023)
GH-42021 - [Swift] Add Arrow encoder implementation for Swift Codable (#43063)
GH-42025 - [Java] Update Unit Tests for Algorithm Module (#42029)
GH-42030 - [Java] Update Unit Tests for Adapter Module (#42038)
GH-42042 - [Java] Update Unit Tests for Compressions Module (#42044)
GH-42045 - [Java] Update Unit Tests for Flight Module (#42158)
GH-42087 - [Swift] refactored to remove build warnings (#42088)
GH-42092 - [Java] Update Unit Tests for Tools Module (#42093)
GH-42100 - [C++][Parquet] ParquetFilePrinter::JSONPrint print length of FLBA (#41981)
GH-42101 - [Java] Create File for Output Validation in FileRoundtrip (#42115)
GH-42109 - [C++][CMake] Add preset for Valgrind (#42110)
GH-42112 - [Python] Array gracefully fails on non-cpu device (#42113)
GH-42121 - [Java] Cleanup spotless plugin configuration (#43019)
GH-42124 - [Swift] Add methods for loading and validating builder by type (#42195)
GH-42126 - [C++] Move TakeXXX free functions into TakeMetaFunction and make them private (#42127)
GH-42128 - [Packaging][CentOS] Migrate CentOS 7 and CentOS Stream 8 packaging jobs to use vault.centos.org (#42129)
GH-42134 - [C++][FS][Azure] Validate AzureOptions::{blob,dfs}_storage_scheme (#42135)
GH-42143 - [R] Sanitize R metadata (#41969)
GH-42146 - [MATLAB] Add IPC RecordBatchFileReader and RecordBatchFileWriter MATLAB classes (#42201)
GH-42162 - [Java] Update Unit Tests for Dataset Module (#42163)
GH-42164 - [Java] Update Unit Tests for Gandiva Module (#42166)
GH-42165 - [Java] Update Unit Tests for Memory Module (#42161)
GH-42167 - [CI] Upgrade the version of vcpkg in .env (#42171)
GH-42168 - [Python][Parquet] Pyarrow store decimal as integer (#42169)
GH-42190 - [Python] Add CI job for Numpy 1.X (#42189)
GH-42193 - [Java] Update dependency to maintain JUnit 5 only (#42206)
GH-42228 - [CI][Java] Suppress transfer progress log in java-jars (#42230)
GH-42235 - [C++] list_parent_indices: Add support for list-view types (#42236)
GH-42243 - [Swift] Update isValidBuilderType to not required instance of type (#42244)
GH-42245 - [Swift] Ensure map behavior is the same for all key types (#42246)
GH-43020 - [Java] Simplify flight.properties generation (#43028)
GH-43033 - [CI][Docker] Enable linter for python-wheel-windows-test-vs2019 (#43034)
GH-43040 - [C++] Reduce the recursion of many-join test (#43042)
GH-43045 - [CI][Python] Pin openjdk=17 in python substrait integration (#43051)
GH-43060 - [C++] Limit buffer size in BufferedInputStream::SetBufferSize with raw_read_bound (#43064)
GH-43076 - [C#] Upgrade Xunit and change how Python integration tests are skipped (#43091)
commit 323aba41c4c77db60ba0dff2cabbead08a3c0298
Author: Matthew Danielson <matthewd%fastmail.us@localhost>
Date: Mon Sep 16 05:00:59 2024 -0700
apache-arrow: Update to 17.0.0
Also, enable more features by default
Changelog
Apache Arrow 17.0.0 (2024-07-16 07:00:00+00:00)
Bug Fixes
GH-15053 - [C++] Add option to string ‘center’ kernel to control left/right alignment on odd number of padding (#41449)
GH-30866 - [Java] fix SplitAndTransfer throws for (0,0) if vector empty (#41066)
GH-34484 - [Substrait] add an option to disable augmented fields (#41583)
GH-37669 - [C++][Python] Fix casting to extension type with fixed size list storage type (#42219)
GH-38553 - [C++] Replace null_count with MayHaveNulls in ListArrayFromArray and MapArray (#41957)
GH-38575 - [Python] Include metadata when creating pa.schema from PyCapsule (#41538)
GH-38770 - [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray (#40971)
GH-39129 - [Python] pa.array: add check for byte-swapped numpy arrays inside python objects (#41549)
GH-39489 - [C++][Parquet] Timestamp conversion from Parquet to Arrow does not follow compatibility guidelines for convertedType
GH-39645 - [Python] Fix read_table for encrypted parquet (#39438)
GH-40270 - [C++] Use LargeStringArray for casting when writing tables to CSV (#40271)
GH-40560 - [Python] RunEndEncodedArray.from_arrays: bugfix for Array arguments (#40560) (#41093)
GH-40750 - [C++][Python] Map child Array constructed from keys and items shouldn’t have offset (#40871)
GH-40913 - [C++] Fix compile warning with ‘implicitly-defined constructor does not initialize’ in encoding_benchmark (#41060)
GH-40997 - [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 (#40998)
GH-41112 - [C++] Clean up unused parameter warnings (#41111)
GH-41149 - [C++][Acero] Fix asof join race (#41614)
GH-41164 - [C#] Fix concatenation of sliced arrays (#41245)
GH-41190 - [C++] support for single threaded joins (#41125)
GH-41192 - [C++] Fix hashjoin benchmark failed at make utf8’s random batches (#41195)
GH-41198 - [C#] Fix concatenation of union arrays (#41226)
GH-41199 - [C#] Fix accessing values of a sliced decimal array (#41200)
GH-41258 - [C#][Integration] Fix comparison of sliced validity buffers with non-zero offsets (#41259)
GH-41263 - [C#][Integration] Ensure offset is considered in all branches of the bitmap comparison (#41264)
GH-41282 - [Dev] Always prompt next major version on merge script if it exists (#41305)
GH-41306 - [C++] Check to avoid copying when NullBitmapBuffer is Null (#41452)
GH-41317 - [C++] Fix crash on invalid Parquet file (#41366)
GH-41319 - [Python] `test_numpy_array_protocol` test failures with numpy 2.0.0rc1
GH-41321 - [C++][Parquet] More strict Parquet level checking (#41346)
GH-41329 - [C++][Gandiva] Fix gandiva cache size env var (#41330)
GH-41340 - [C++][CMake][Windows] Remove needless .dll suffix from link libraries (#41341)
GH-41343 - [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345)
GH-41356 - [Release][Docs] Update post release documentation task to remove the warnings banner for stable version (#41377)
GH-41367 - [C++][maybe_unused] with Arrow macro (#41359)
GH-41371 - [CI][Release] Use the latest Ruby on macOS (#41379)
GH-41390 - [CI] Use setup-python GitHub action on csharp macOS job (#41392)
GH-41397 - [C#] Downgrade macOS test runner to avoid infrastructure bug (#41934)
GH-41418 - [C++][Large] ListView and Map nested types for scalar_if_else’s kernel functions (#41419)
GH-41426 - [R][CI] Install CRAN style openssl on gh runners. (#41629)
GH-41433 - [C++][Gandiva] Fix ascii_utf8 function to return same result on x86 and Arm (#41434)
GH-41464 - [Python] Fix StructArray.sort() for by=None (#41495)
GH-41467 - [CI][Release] Don’t push conda-verify-rc image (#41468)
GH-41470 - [C++] Reuse deduplication logic for direct registration (#41466)
GH-41471 - [Java] Fix performance uber-jar (#41473)
GH-41475 - [Python] Build with Python 3.13 (#42034)
GH-41478 - [C++] Clean up more redundant move warnings (#41487)
GH-41491 - [Python] remove special methods related to buffers in python <2.6 (#41492)
GH-41502 - [Python] Fix reading column index with decimal values (#41503)
GH-41529 - [C++][Compute] Remove redundant logic for ArrayData as ExecResults in ExecScalarCaseWhen (#41380)
GH-41534 - [Go] Fix mem leak importing 0 length C Array (#41535)
GH-41541 - [Go][Parquet] More fixes for writer performance regression (#42003)
GH-41541 - [Go][Parquet] Fix writer performance regression (#41638)
GH-41571 - [Java] Revert GH-41307 (#41309) (#41628)
GH-41573 - [Java] VectorSchemaRoot uses inefficient stream to copy fieldVectors (#41574)
GH-41581 - [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582)
GH-41587 - [Docs][Python] Remove duplicate contents (#41588)
GH-41602 - [C#] Resolve build warnings (#41645)
GH-41617 - [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622)
GH-41630 - [Benchmarking] Fix out-of-source build in benchmarks (#41631)
GH-41648 - [Java] Memory Leak about splitAndTransfer (#41898)
GH-41660 - [CI][Java] Restore devtoolset related GANDIVA_CXX_FLAGS (#41661)
GH-41679 - [Release][Packaging][deb] Update package name in 01-preparesh too (#41859)
GH-41684 - [C++][Python] Add optional null_bitmap to MapArray::FromArrays (#41757)
GH-41686 - [Java] Nullability of struct child vectors not preserved in TransferPair (#41785)
GH-41688 - [Dev] Include all relevant CMakeLists.txt files in cmake-format precommit hook (#41689)
GH-41697 - [Go][Parquet] Release BufferWriter when BufferedPageWriter is closed (#41698)
GH-41699 - [Python][Parquet] Implement to_dict method on SortingColumn (#41704)
GH-41711 - [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712)
GH-41717 - [Java][Vector] fix issue with ByteBuffer rewind in MessageSerializer (#41718)
GH-41720 - [C++][Acero] Remove an useless parameter for QueryContext::Init called in hash_join_benchmark (#41716)
GH-41725 - [Python] CMake: ignore Parquet encryption option if Parquet itself is not enabled (fix Java integration build) (#41776)
GH-41735 - [CI][Archery] Update archery to be compatible with pygit2 1.15 API change (#41739)
GH-41738 - [C++] Fix the issue that temp vector stack may be under sized (#41746)
GH-41741 - [C++] Check that extension metadata key is present before attempting to delete it (#41763)
GH-41758 - [Python] Disallow direct pa.RecordBatchReader() construction to avoid segfaults (#41773)
GH-41771 - [C++] Iterator releases its resource immediately when it reads all values (#41824)
GH-41780 - [C++][Flight][Benchmark] Ensure waiting server ready (#41793)
GH-41784 - [Packaging][RPM] Use SO version for -libs package name (#41838)
GH-41787 - Update fmpp-maven-plugin output directory (#41788)
GH-41791 - [CI][Conda] Update azure.linux.yml task, replace CondaEnvironment@1 with Bash@3 (#41883)
GH-41813 - [C++] Fix avx2 gather offset larger than 2GB in CompareColumnsToRows (#42188)
GH-41829 - [R] Update relative URLs in README to absolute paths to prevent CRAN check failures (#41830)
GH-41836 - [Java] Fix an undefined symbol error when ARROW_S3=OFF (#41837)
GH-41862 - [C++][S3] Fix potential deadlock when closing output stream (#41876)
GH-41884 - [Python] Fix RecordBatchReader.cast to support casting to equal schema for all types (#42098)
GH-41902 - [Java] Variadic Buffer Counts Incorrect (#41930)
GH-41903 - [CI][GLib] Use the latest Ruby to use OpenSSL 3 (#42001)
GH-41920 - [CI][JS] Add missing build directory argument (#41921)
GH-41924 - [Python] Fix tests when using NumPy 2.0 on Windows (#42099)
GH-41964 - [CI][C++] Clear cache for mamba on AppVeyor (#41977)
GH-42005 - [Java][Integration][CI] Fix ARROW_BUILD_ROOT Path to find pom.xml (#42008)
GH-42006 - [CI][Python] Use pip install -e instead of setup.py build_ext --inplace for installing pyarrow on verification script (#42007)
GH-42015 - [MATLAB] Executing tfeather.m test class causes MATLAB to crash on windows-2022 after MSVC update from 14.39.33519 to 14.40.33807 (#42123)
GH-42017 - [CI][Python][C++] Fix utf8proc detection for wheel on Windows (#42022)
GH-42039 - [Docs][Go] Fix broken link (#42040)
GH-42041 - [Swift] Fix nullable type decoder issue (#42043)
GH-42065 - [C++] Support list-views on list_slice (#42067)
GH-42104 - [C++] Fix an OTel test failure and remove needless logs (#42122)
GH-42107 - [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol (#42108)
GH-42116 - [C++] Support list-view typed arrays in array_take and array_filter (#42117)
GH-42130 - [GLib] Fix building gir files with MSVC (#42131)
GH-42136 - [CI][Go][Java][JS] Use AMD64-based macOS explicitly (#42175)
GH-42139 - [C++] Fix some potential uninitialized variable warnings (#42207)
GH-42140 - [C++] Avoid invalid accesses in parquet-encoding-benchmark (#42141)
GH-42149 - [C++] Use FetchContent for bundled ORC (#43011)
GH-42170 - [Python][CI] Update expected output for numpy 2.0.0 (#42172)
GH-42197 - [CI][Packaging][Java] Ensure updating “python@*” formulae on macOS (#42202)
GH-42198 - [C++] Fix GetRecordBatchPayload crashes for device data (#42199)
GH-42208 - [Java] Fix the Test in flight-sql-jdbc-driver Module (#42217)
GH-42213 - [Swift] Use “--warnings-as-errors” only on CI (#42214)
GH-42220 - [R] handle vctrs_rcrd extension type in metadata cleaning (#42226)
GH-42224 - [Java] Fix Typo in TestAceroSubstraitConsumer Test Method (#42225)
GH-42232 - [C++] Use non-stale c-ares download URL (#42250)
GH-42234 - [CI][R] Disable libarrow binary use on valgrind tests (#42249)
GH-43048 - [JAVA] Fix IndexOutOfBoundsException message by reporting index correctly (#43049)
GH-43058 - [C#] Revert upgrade of Xunit from 2.8.0 to 2.8.1 (#43074)
GH-43059 - [CI][Gandiva] Disable Python Gandiva tests on AlmaLinux 8 (#43093)
GH-43062 - [Go] Use calloc instead of malloc (#43052)
GH-43070 - [C++][Parquet] Check for valid ciphertext length to prevent segfault (#43071)
GH-43116 - [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as large memory test (#43128)
GH-43119 - [CI][Packaging] Update manylinux 2014 CentOS repos that have been deprecated (#43121)
GH-43122 - [CI][Packaging][RPM][CentOS] Use vault.centos.org for SCL (#43127)
GH-43134 - [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136)
GH-43158 - [Packaging] Use bundled nlohmann/json on AlmaLinux 8/CentOS Stream 8 (#43159)
GH-43199 - [CI][Packaging] dev/release/utils-create-release-tarball.sh should not include the release candidate number in the name of the tarball’s top-level directory. (#43200)
GH-43204 - [CI][Packaging] Apply vcpkg patch to fix Thrift version (#43208)
New Features and Improvements
GH-29537 - [R] Support mutate/summarize with implicit join (#41350)
GH-33484 - [C++][Compute] Implement Grouper::Reset (#41352)
GH-35804 - [CI][Packaging][Conan] Synchronize upstream conan (#39729)
GH-35888 - [Java] Add FlightStatusCode.RESOURCE_EXHAUSTED (#41508)
GH-37333 - [Python] Replace pandas.util.testing.rands with vendored version (#42089)
GH-37720 - [Go][FlightSQL] Add prepared statement handle to DoPut result (#40311)
GH-37728 - [Java] Add methods to get an Iterable for a ValueVector (#41895)
GH-37929 - [Python] begin moving static settings to pyproject.toml (#41041)
GH-37938 - [Swift] Add initial C data interface implementation (#41342)
GH-38255 - [Go][C++] Implement Flight SQL Bulk Ingestion (#38385)
GH-38325 - [Python] Implement PyCapsule interface for Device data in PyArrow (#40717)
GH-38325 - [Python] Expand the Arrow PyCapsule Interface with C Device Data support (#40708)
GH-38692 - [C#] Implement ICollection<T?> on scalar arrays (#41539)
GH-39204 - [Format][FlightRPC][Docs] Stabilize Flight SQL (#41657)
GH-39220 - [Python] Let RecordBatch.filter accept a boolean expression in addition to mask array (#43043)
GH-39301 - [Archery][CI][Integration] Add nanoarrow to archery + integration setup (#39302)
GH-39344 - [C++][FS][Azure] Support azure cli auth (#41976)
GH-39345 - [C++][FS][Azure] Add support for environment credential (#41715)
GH-39649 - [Java][CI] Fix or suppress spurious errorprone warnings stage 2 (#39777)
GH-39722 - [JS] Clean up packaging (#39723)
GH-39798 - [C++] Optimize Take for fixed-size types including nested fixed-size lists (#41297)
GH-39858 - [C++][Device] Add Copy/View slice functions to a CPU pointer (#41477)
GH-39898 - [C++] Add support for OpenTelemetry logging (#39905)
GH-39990 - [Docs][CI] Add sphinx-lint for docs linting (#40022)
GH-40078 - [C++] Import/Export ArrowDeviceArrayStream (#40807)
GH-40339 - [Java] StringView Initial Implementation (#40340)
GH-40342 - [Python] Fix pickling of LocalFileSystem for cython 2 (#41459)
GH-40342 - [C++] move LocalFileSystem to the registry (#40356)
GH-40361 - [C++] Make flatbuffers serialization more deterministic (#40392)
GH-40384 - [Python] Expand the C Device Interface bindings to support import on CUDA device (#40385)
GH-40494 - [Go] add support for protobuf messages (#40496)
GH-40644 - [Python] Allow passing a mapping of column names to rename_columns (#40645)
GH-40734 - [Packaging][Debian] Drop support for Debian bullseye (#41394)
GH-40749 - [Python][Packaging] Strip unnecessary symbols when building wheels (#42028)
GH-40819 - [Java] Adding Spotless to Algorithm module (#41825)
GH-40820 - [Java] Adding Spotless to Adapter module (#42048)
GH-40822 - [Java] Adding Spotless to C module (#42059)
GH-40823 - [Java] Adding Spotless to Compression module (#42060)
GH-40824 - [Java] Adding Spotless to Dataset module (#42062)
GH-40825 - [Java] Adding Spotless to Flight module (#42063)
GH-40826 - [Java] Adding Spotless to Format module
GH-40827 - [Java] Adding Spotless to Gandiva module (#42055)
GH-40828 - [Java] Format arrow-maven-plugins modules (#42054)
GH-40829 - [Java] Adding Spotless to Memory modules (#42056)
GH-40830 - [Java] Adding Spotless to Performance module (#42057)
GH-40831 - [Java] Adding Spotless to Tools module (#42058)
GH-40832 - [Java] Adding Spotless to Vector module (#42061)
GH-40930 - [Java] Implement a function to retrieve reference buffers in StringView (#41796)
GH-40932 - [Java] Implement TransferPair functionality for StringView (#41861)
GH-40933 - [Java] Enhance the copyFrom* functionality in StringView (#41752)
GH-40942 - [Java] Implement C Data Interface for StringView (#41967)
GH-40943 - [Java] Implement RangeEqualsVisitor for StringView (#41636)
GH-40944 - [Java] Implement TypeEqualsVisitor for StringView (#41606)
GH-40968 - [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like function (#40970)
GH-41020 - [C++] Introduce portable compiler assumptions (#41021)
GH-41035 - [C++] Add a grouper benchmark for preventing performance regression (#41036)
GH-41055 - [C++] Support flatten for combining nested list related types (#41092)
GH-41085 - [CI][Java] Add Spark integration tests to “java” group in Crossbow tasks (#41086)
GH-41089 - [C++] Clean up remaining tasks related to half float casts (#41084)
GH-41095 - [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support (#41276)
GH-41102 - [Packaging][Release] Create unique git tags for release candidates (e.g. apache-arrow-{MAJOR}.{MINOR}.{PATCH}-rc{RC_NUM}) (#41131)
GH-41105 - [Python][Docs] Update PyArrow installation docs for conda package split (#41135)
GH-41114 - [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
GH-41116 - [C++] IO: enhance boundary checking in CompressedInputStream (#41117)
GH-41126 - [Python] Basic bindings for Device and MemoryManager classes (#41685)
GH-41134 - [GLib] Support building arrow-glib with MSVC (#41599)
GH-41159 - [Go][Parquet] Improvement Parquet BitWriter WriteVlqInt Performance (#41160)
GH-41173 - [Java] Add spotless configuration for Maven pom.xml files (#41174)
GH-41183 - [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295)
GH-41186 - [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst (#41187)
GH-41203 - [Python][Packaging] Ensure to build with released numpy 2.0 (instead of RC) in the wheel building workflows (#42194)
GH-41240 - [Release][Packaging] Use Debian bookworm for uploading binaries (#41241)
GH-41243 - [Release][Packaging] Avoid needless download by “archery crossbow download-artifacts” (#41244)
GH-41256 - [Format][Docs] Add a canonical extension type specification for JSON (#41257)
GH-41262 - [Java][FlightSQL] Implement stateless prepared statements (#41237)
GH-41287 - [Java] ListViewVector Implementation (#41285)
GH-41298 - [Format][Docs] Add a canonical extension type specification for UUID (#41299)
GH-41301 - [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type (#41373)
GH-41307 - [Java] Use org.apache:apache parent pom version 31 (#41772)
GH-41307 - [Java] Use org.apache:apache parent pom version 31 (#41309)
GH-41314 - [CI][Python] Add a job on ARM64 macOS (#41313)
GH-41316 - [CI][Python] Reduce CI time on macOS (#41378)
GH-41323 - [R] Redo how summarize() evaluates expressions (#41223)
GH-41327 - [Ruby] Show type name in Arrow::Table#to_s (#41328)
GH-41334 - [C++][Acero] Use per-node basis temp vector stack to mitigate overflow (#41335)
GH-41349 - [C#] Optimize DecimalUtility.GetBytes(SqlDecimal) on .NET 7+ (#42150)
GH-41358 - [R] Support join “na_matches” argument (#41372)
GH-41361 - [C++][Parquet] Optimize DelimitRecords by batch execution when max_rep_level > 1 (#41362)
GH-41375 - [C#] Move to .NET 8.0 (#41376)
GH-41385 - [CI][MATLAB][Packaging] Add support for MATLAB R2024a in CI and crossbow packaging workflows (#41504)
GH-41389 - [Python] Expose byte_width and bit_width of ExtensionType in terms of the storage type (#41413)
GH-41400 - [MATLAB] Bump libmexclass version to commit ca3cea6 (#41436)
GH-41410 - [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API reference (#41411)
GH-41420 - [R] Update NEWS.md for 16.1.0 (#41422)
GH-41427 - [Go] Fix stateless prepared statements (#41428)
GH-41430 - [Docs] Use sphinxcontrib-mermaid instead of generating images from .mmd (#41455)
GH-41435 - [CI][MATLAB] Add job to build and test MATLAB Interface on macos-14 (#41592)
GH-41450 - [R][CI] rhub/container follow ons (#41451)
GH-41460 - [C++] Use ASAN to poison temp vector stack memory (#41695)
GH-41480 - [Python] Update Python development guide about components being enabled by default based on Arrow C++ (#41705)
GH-41480 - [Python] Building PyArrow: enable/disable python components by default based on availability in Arrow C++ (#41494)
GH-41493 - [C++][S3] Add a new option to check existence before CreateDir (#41822)
GH-41507 - [MATLAB][CI] Pass strict: true to matlab-actions/run-tests@v2 (#41530)
GH-41527 - [CI][Dev] Remove unnecessary requirements for six (#43087)
GH-41531 - [MATLAB][Packaging] Bump matlab-actions/setup-matlab and matlab-actions/run-command from v1 to v2 in the crossbow job (#41532)
GH-41540 - [R] Simplify arrow_eval() logic and bindings environments (#41537)
GH-41545 - [C++][Parquet] Fix DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
GH-41547 - [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
GH-41558 - [C++] Improve fixed_width_test_util.h (#41575)
GH-41560 - [C++] ChunkResolver: Implement ResolveMany and add unit tests (#41561)
GH-41590 - [Java] Improve BaseRepeatedValueVector function on isEmpty and isNull operations (#41601)
GH-41596 - [C++] fixed_width_internal.h: Simplify docstring and support bit-sized types (BOOL) (#41597)
GH-41608 - [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633)
GH-41611 - [Docs][CI] Enable most sphinx-lint rules for documentation (#41612)
GH-41620 - [Docs] Document merge.conf usage (#41621)
GH-41626 - [R][CI] Update OpenSUSE to 15.5 from 15.3 (#41627)
GH-41652 - [C++][CMake][Windows] Don’t build needless object libraries (#41658)
GH-41653 - [MATLAB] Add new arrow.c.Array MATLAB class which wraps a C Data Interface format ArrowArray C struct (#41655)
GH-41654 - [MATLAB] Add new arrow.c.Schema MATLAB class which wraps a C Data Interface format ArrowSchema C struct (#41674)
GH-41656 - [MATLAB] Add C Data Interface format import/export functionality for arrow.array.Array (#41737)
GH-41662 - [Python] Ensure Buffer methods don’t crash with non-CPU data (#41889)
GH-41664 - [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010)
GH-41675 - [Packaging][MATLAB] Add crossbow job to package MATLAB interface on macos-14 (#41677)
GH-41681 - [GLib] Generate separate version macros for each GLib library (#41721)
GH-41691 - [Doc] Remove notion of “logical type” (#41958)
GH-41702 - [C++][Parquet] Thrift: generate template method to accelerate reading thrift (#41703)
GH-41726 - [C++][Parquet] Minor: moving EncodedStats by default rather than copying (#41727)
GH-41730 - [Java] Adding variadicBufferCounts to RecordBatch (#41732)
GH-41748 - [Python][Parquet] Update BYTE_STREAM_SPLIT description in write_table() docstring (#41759)
GH-41749 - [GLib] Allow getting a RecordBatchReader from a Dataset or Scanner (#41750)
GH-41755 - [C++][ORC] Ensure setting detected ORC version (#41767)
GH-41760 - [C++][Parquet] Add file metadata read/write benchmark (#41761)
GH-41770 - [CI][GLib] Remove temporary files explicitly (#41807)
GH-41783 - [C++] Make git-dependent definitions internal (#41781)
GH-41789 - [Java] Clean up immutables and checkerframework dependencies (#41790)
GH-41797 - [C++][S3] Remove GetBucketRegion hack for newer AWS SDK versions (#41798)
GH-41799 - [Java] Migrate to com.gradle:develocity-maven-extension (#41800)
GH-41803 - [MATLAB] Add C Data Interface format import/export functionality for arrow.tabular.RecordBatch (#41817)
GH-41804 - [Swift] Add Struct (Nested) type (#43082)
GH-41806 - [GLib][CI] Use vcpkg for C++ dependencies when building GLib libraries with MSVC (#41839)
GH-41818 - [C++][Parquet] normalize dictionary encoding to use RLE_DICTIONARY (#41819)
GH-41834 - [R] Better error handling in dplyr code (#41576)
GH-41841 - [R][CI] Remove more defunct rhub containers (#41828)
GH-41887 - [Go] Run linter via pre-commit (#41888)
GH-41899 - [C++] IPC: Minor enhance the code of writer (#41900)
GH-41905 - [JS] Update dependencies (#41906)
GH-41910 - [Python] Add support for Pyodide (#37822)
GH-41923 - [C++] Fix ExecuteScalar deduce all_scalar with chunked_array (#41925)
GH-41929 - [Java] pom.xml license formatting (#42049)
GH-41945 - [Swift] Add interface ArrowArrayHolderBuilder (#41946)
GH-41947 - [Java] Support catalog in JDBC driver with session options (#42035)
GH-41952 - [R] Turn S3 and ZSTD on by default for macOS (#42210)
GH-41953 - [C++] Minor enhance code style for FixedShapeTensorType (#41954)
GH-41955 - [C++] Follow up of adding null_bitmap to MapArray::FromArrays (#41956)
GH-41960 - Expose new S3 option check_directory_existence_before_creation (#41972)
GH-41968 - [Java] Implement TransferPair functionality for BinaryView (#41980)
GH-41970 - [C++] Misc changes making code around list-like types and list-view types behave the same way (#41971)
GH-41978 - [Python] Fix pandas tests to follow downstream datetime64 unit changes (#41979)
GH-41983 - [Dev] Run issue labeling bot only when opening an issue (not editing) (#41986)
GH-41994 - [C++] kernel.cc: Remove defaults on switch so that compiler can check full enum coverage for us (#41995)
GH-41999 - [Swift] Add methods for adding array and vargs to arrow array (#42000)
GH-42002 - [Java] Update Unit Tests for Vector Module (#42019)
GH-42013 - [Python] Allow Array.filter() to take general array input (#42051)
GH-42016 - [Python] Expose new FLOAT16 logical type in the pyarrow.parquet bindings (#42103)
GH-42020 - [Swift] Add Arrow decoding implementation for Swift Codable (#42023)
GH-42021 - [Swift] Add Arrow encoder implementation for Swift Codable (#43063)
GH-42025 - [Java] Update Unit Tests for Algorithm Module (#42029)
GH-42030 - [Java] Update Unit Tests for Adapter Module (#42038)
GH-42042 - [Java] Update Unit Tests for Compressions Module (#42044)
GH-42045 - [Java] Update Unit Tests for Flight Module (#42158)
GH-42087 - [Swift] refactored to remove build warnings (#42088)
GH-42092 - [Java] Update Unit Tests for Tools Module (#42093)
GH-42100 - [C++][Parquet] ParquetFilePrinter::JSONPrint print length of FLBA (#41981)
GH-42101 - [Java] Create File for Output Validation in FileRoundtrip (#42115)
GH-42109 - [C++][CMake] Add preset for Valgrind (#42110)
GH-42112 - [Python] Array gracefully fails on non-cpu device (#42113)
GH-42121 - [Java] Cleanup spotless plugin configuration (#43019)
GH-42124 - [Swift] Add methods for loading and validating builder by type (#42195)
GH-42126 - [C++] Move TakeXXX free functions into TakeMetaFunction and make them private (#42127)
GH-42128 - [Packaging][CentOS] Migrate CentOS 7 and CentOS Stream 8 packaging jobs to use vault.centos.org (#42129)
GH-42134 - [C++][FS][Azure] Validate AzureOptions::{blob,dfs}_storage_scheme (#42135)
GH-42143 - [R] Sanitize R metadata (#41969)
GH-42146 - [MATLAB] Add IPC RecordBatchFileReader and RecordBatchFileWriter MATLAB classes (#42201)
GH-42162 - [Java] Update Unit Tests for Dataset Module (#42163)
GH-42164 - [Java] Update Unit Tests for Gandiva Module (#42166)
GH-42165 - [Java] Update Unit Tests for Memory Module (#42161)
GH-42167 - [CI] Upgrade the version of vcpkg in .env (#42171)
GH-42168 - [Python][Parquet] Pyarrow store decimal as integer (#42169)
GH-42190 - [Python] Add CI job for Numpy 1.X (#42189)
GH-42193 - [Java] Update dependency to maintain JUnit 5 only (#42206)
GH-42228 - [CI][Java] Suppress transfer progress log in java-jars (#42230)
GH-42235 - [C++] list_parent_indices: Add support for list-view types (#42236)
GH-42243 - [Swift] Update isValidBuilderType to not required instance of type (#42244)
GH-42245 - [Swift] Ensure map behavior is the same for all key types (#42246)
GH-43020 - [Java] Simplify flight.properties generation (#43028)
GH-43033 - [CI][Docker] Enable linter for python-wheel-windows-test-vs2019 (#43034)
GH-43040 - [C++] Reduce the recursion of many-join test (#43042)
GH-43045 - [CI][Python] Pin openjdk=17 in python substrait integration (#43051)
GH-43060 - [C++] Limit buffer size in BufferedInputStream::SetBufferSize with raw_read_bound (#43064)
GH-43076 - [C#] Upgrade Xunit and change how Python integration tests are skipped (#43091)
To see a diff of this commit:
https://wip.pkgsrc.org/cgi-bin/gitweb.cgi?p=pkgsrc-wip.git;a=commitdiff;h=0bcd1546ad9798e40b42ffb3cd5c535f98108fb0
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
diffstat:
apache-arrow/Makefile | 31 ++++++++++++++++++--
apache-arrow/PLIST | 78 ++++++++++++++++++++++++++++++++++++++++++-------
apache-arrow/distinfo | 15 ++++++----
apache-arrow/version.mk | 2 +-
4 files changed, 106 insertions(+), 20 deletions(-)
diffs:
diff --git a/apache-arrow/Makefile b/apache-arrow/Makefile
index f58843f0fe..2c8db81680 100644
--- a/apache-arrow/Makefile
+++ b/apache-arrow/Makefile
@@ -13,17 +13,20 @@ LICENSE= apache-2.0
# These packages are built within arrow, and it
# looks difficult to decouple them
# They come from ./cpp/thirdparty/versions.txt
-XSIMD= 9.0.1.tar.gz
+XSIMD= 13.0.0.tar.gz
JEMALLOC= jemalloc-5.3.0.tar.bz2
+SUBSTRAIT= v0.44.0.tar.gz
+
DISTFILES+= ${DISTNAME}${EXTRACT_SUFX}
DISTFILES+= ${XSIMD}
DISTFILES+= ${JEMALLOC}
+DISTFILES+= ${SUBSTRAIT}
SITES.${XSIMD}= https://github.com/xtensor-stack/xsimd/archive/
+SITES.${SUBSTRAIT}= https://github.com/substrait-io/substrait/archive/
SITES.${JEMALLOC}= ${MASTER_SITE_GITHUB:=jemalloc/jemalloc/releases/download/5.3.0/}
.include "../../mk/bsd.prefs.mk"
-.include "options.mk"
CONFIGURE_DIR= cpp
@@ -43,7 +46,19 @@ CMAKE_CONFIGURE_ARGS+= -DARROW_BUILD_UTILITIES=ON
CMAKE_CONFIGURE_ARGS+= -DARROW_CSV=ON
CMAKE_CONFIGURE_ARGS+= -DARROW_ACERO=ON
CMAKE_CONFIGURE_ARGS+= -DARROW_DATASET=ON
+CMAKE_CONFIGURE_ARGS+= -DARROW_WITH_BROTLI=ON
+CMAKE_CONFIGURE_ARGS+= -DARROW_WITH_BZ2=ON
+CMAKE_CONFIGURE_ARGS+= -DARROW_USE_GLOG=ON
+CMAKE_CONFIGURE_ARGS+= -DARROW_JSON=ON
+CMAKE_CONFIGURE_ARGS+= -DARROW_WITH_LZ4=ON
+CMAKE_CONFIGURE_ARGS+= -DARROW_PARQUET=ON
CMAKE_CONFIGURE_ARGS+= -DPARQUET_BUILD_EXECUTABLES=ON
+CMAKE_CONFIGURE_ARGS+= -DPARQUET_REQUIRE_ENCRYPTION=ON
+CMAKE_CONFIGURE_ARGS+= -DARROW_SUBSTRAIT=ON
+CMAKE_CONFIGURE_ARGS+= -DARROW_FLIGHT=ON
+CMAKE_CONFIGURE_ARGS+= -DARROW_WITH_SNAPPY=ON
+CMAKE_CONFIGURE_ARGS+= -DARROW_WITH_ZLIB=ON
+CMAKE_CONFIGURE_ARGS+= -DARROW_WITH_ZSTD=ON
# For finding deps
CMAKE_CONFIGURE_ARGS+= -Dxsimd_SOURCE=BUNDLED
@@ -51,6 +66,7 @@ CMAKE_CONFIGURE_ARGS+= -Dxsimd_SOURCE=BUNDLED
# Set environment variable to find the extra source packages
CONFIGURE_ENV+= ARROW_JEMALLOC_URL=/${DISTDIR}/${JEMALLOC}
CONFIGURE_ENV+= ARROW_XSIMD_URL=/${DISTDIR}/${XSIMD}
+CONFIGURE_ENV+= ARROW_SUBSTRAIT_URL=/${DISTDIR}/${SUBSTRAIT}
# To enable tests, devel/googletest needs -fPIE removed
CMAKE_CONFIGURE_ARGS+= -DARROW_BUILD_TESTS=OFF
@@ -61,15 +77,24 @@ post-install:
.include "../../wip/apache-arrow/version.mk"
+.include "../../archivers/brotli/buildlink3.mk"
+.include "../../archivers/bzip2/buildlink3.mk"
+.include "../../archivers/lz4/buildlink3.mk"
+.include "../../archivers/zstd/buildlink3.mk"
.include "../../converters/utf8proc/buildlink3.mk"
.include "../../devel/boost-libs/buildlink3.mk"
.include "../../devel/cmake/build.mk"
.include "../../devel/gflags/buildlink3.mk"
-# .include "../../devel/googletest/buildlink3.mk"
+.include "../../devel/google-glog/buildlink3.mk"
+.include "../../devel/snappy/buildlink3.mk"
+.include "../../devel/zlib/buildlink3.mk"
+#.include "../../devel/googletest/buildlink3.mk"
.include "../../devel/flatbuffers/buildlink3.mk"
.include "../../devel/libthrift/buildlink3.mk"
.include "../../devel/protobuf/buildlink3.mk"
.include "../../devel/re2/buildlink3.mk"
+.include "../../lang/llvm/buildlink3.mk"
.include "../../net/grpc/buildlink3.mk"
+.include "../../textproc/rapidjson/buildlink3.mk"
.include "../../mk/bsd.pkg.mk"
diff --git a/apache-arrow/PLIST b/apache-arrow/PLIST
index 4897b2f231..b644ef3f42 100644
--- a/apache-arrow/PLIST
+++ b/apache-arrow/PLIST
@@ -113,6 +113,19 @@ include/arrow/dataset/type_fwd.h
include/arrow/dataset/visibility.h
include/arrow/datum.h
include/arrow/device.h
+include/arrow/engine/api.h
+include/arrow/engine/pch.h
+include/arrow/engine/substrait/api.h
+include/arrow/engine/substrait/extension_set.h
+include/arrow/engine/substrait/extension_types.h
+include/arrow/engine/substrait/options.h
+include/arrow/engine/substrait/relation.h
+include/arrow/engine/substrait/serde.h
+include/arrow/engine/substrait/test_plan_builder.h
+include/arrow/engine/substrait/test_util.h
+include/arrow/engine/substrait/type_fwd.h
+include/arrow/engine/substrait/util.h
+include/arrow/engine/substrait/visibility.h
include/arrow/extension/fixed_shape_tensor.h
include/arrow/extension_type.h
include/arrow/filesystem/api.h
@@ -128,6 +141,28 @@ include/arrow/filesystem/s3_test_util.h
include/arrow/filesystem/s3fs.h
include/arrow/filesystem/test_util.h
include/arrow/filesystem/type_fwd.h
+include/arrow/flight/api.h
+include/arrow/flight/client.h
+include/arrow/flight/client_auth.h
+include/arrow/flight/client_cookie_middleware.h
+include/arrow/flight/client_middleware.h
+include/arrow/flight/client_tracing_middleware.h
+include/arrow/flight/middleware.h
+include/arrow/flight/otel_logging.h
+include/arrow/flight/pch.h
+include/arrow/flight/platform.h
+include/arrow/flight/server.h
+include/arrow/flight/server_auth.h
+include/arrow/flight/server_middleware.h
+include/arrow/flight/server_tracing_middleware.h
+include/arrow/flight/test_definitions.h
+include/arrow/flight/test_util.h
+include/arrow/flight/transport.h
+include/arrow/flight/transport_server.h
+include/arrow/flight/type_fwd.h
+include/arrow/flight/types.h
+include/arrow/flight/types_async.h
+include/arrow/flight/visibility.h
include/arrow/io/api.h
include/arrow/io/buffered.h
include/arrow/io/caching.h
@@ -186,6 +221,7 @@ include/arrow/testing/async_test_util.h
include/arrow/testing/builder.h
include/arrow/testing/executor_util.h
include/arrow/testing/extension_type.h
+include/arrow/testing/fixed_width_test_util.h
include/arrow/testing/future_util.h
include/arrow/testing/generator.h
include/arrow/testing/gtest_compat.h
@@ -258,6 +294,7 @@ include/arrow/util/iterator.h
include/arrow/util/key_value_metadata.h
include/arrow/util/launder.h
include/arrow/util/list_util.h
+include/arrow/util/logger.h
include/arrow/util/logging.h
include/arrow/util/macros.h
include/arrow/util/map.h
@@ -397,6 +434,8 @@ lib/cmake/Arrow/ArrowOptions.cmake
lib/cmake/Arrow/ArrowTargets-release.cmake
lib/cmake/Arrow/ArrowTargets.cmake
lib/cmake/Arrow/FindBrotliAlt.cmake
+lib/cmake/Arrow/FindOpenSSLAlt.cmake
+lib/cmake/Arrow/FindProtobufAlt.cmake
lib/cmake/Arrow/FindSnappyAlt.cmake
lib/cmake/Arrow/FindglogAlt.cmake
lib/cmake/Arrow/Findlz4Alt.cmake
@@ -412,6 +451,15 @@ lib/cmake/ArrowDataset/ArrowDatasetConfig.cmake
lib/cmake/ArrowDataset/ArrowDatasetConfigVersion.cmake
lib/cmake/ArrowDataset/ArrowDatasetTargets-release.cmake
lib/cmake/ArrowDataset/ArrowDatasetTargets.cmake
+lib/cmake/ArrowFlight/ArrowFlightConfig.cmake
+lib/cmake/ArrowFlight/ArrowFlightConfigVersion.cmake
+lib/cmake/ArrowFlight/ArrowFlightTargets-release.cmake
+lib/cmake/ArrowFlight/ArrowFlightTargets.cmake
+lib/cmake/ArrowFlight/FindgRPCAlt.cmake
+lib/cmake/ArrowSubstrait/ArrowSubstraitConfig.cmake
+lib/cmake/ArrowSubstrait/ArrowSubstraitConfigVersion.cmake
+lib/cmake/ArrowSubstrait/ArrowSubstraitTargets-release.cmake
+lib/cmake/ArrowSubstrait/ArrowSubstraitTargets.cmake
lib/cmake/Parquet/FindThriftAlt.cmake
lib/cmake/Parquet/ParquetConfig.cmake
lib/cmake/Parquet/ParquetConfigVersion.cmake
@@ -419,32 +467,42 @@ lib/cmake/Parquet/ParquetTargets-release.cmake
lib/cmake/Parquet/ParquetTargets.cmake
lib/libarrow.a
lib/libarrow.so
-lib/libarrow.so.1601
-lib/libarrow.so.1601.0.0
+lib/libarrow.so.1700
+lib/libarrow.so.1700.0.0
lib/libarrow_acero.a
lib/libarrow_acero.so
-lib/libarrow_acero.so.1601
-lib/libarrow_acero.so.1601.0.0
+lib/libarrow_acero.so.1700
+lib/libarrow_acero.so.1700.0.0
lib/libarrow_bundled_dependencies.a
lib/libarrow_dataset.a
lib/libarrow_dataset.so
-lib/libarrow_dataset.so.1601
-lib/libarrow_dataset.so.1601.0.0
+lib/libarrow_dataset.so.1700
+lib/libarrow_dataset.so.1700.0.0
+lib/libarrow_flight.a
+lib/libarrow_flight.so
+lib/libarrow_flight.so.1700
+lib/libarrow_flight.so.1700.0.0
+lib/libarrow_substrait.a
+lib/libarrow_substrait.so
+lib/libarrow_substrait.so.1700
+lib/libarrow_substrait.so.1700.0.0
lib/libparquet.a
lib/libparquet.so
-lib/libparquet.so.1601
-lib/libparquet.so.1601.0.0
+lib/libparquet.so.1700
+lib/libparquet.so.1700.0.0
lib/pkgconfig/arrow-acero.pc
lib/pkgconfig/arrow-compute.pc
lib/pkgconfig/arrow-csv.pc
lib/pkgconfig/arrow-dataset.pc
lib/pkgconfig/arrow-filesystem.pc
+lib/pkgconfig/arrow-flight.pc
lib/pkgconfig/arrow-json.pc
+lib/pkgconfig/arrow-substrait.pc
lib/pkgconfig/arrow.pc
lib/pkgconfig/parquet.pc
share/arrow/gdb/gdb_arrow.py
-share/arrow/gdb/libarrow.so.1601.0.0-gdb.py
+share/arrow/gdb/libarrow.so.1700.0.0-gdb.py
share/doc/arrow/LICENSE.txt
share/doc/arrow/NOTICE.txt
share/doc/arrow/README.md
-@pkgdir share/gdb/auto-load/usr/pkg/lib
+@pkgdir share/gdb/auto-load/home/matthew/pkgsrc/install.20240810/lib
diff --git a/apache-arrow/distinfo b/apache-arrow/distinfo
index af9e5ee814..99188f243e 100644
--- a/apache-arrow/distinfo
+++ b/apache-arrow/distinfo
@@ -1,12 +1,15 @@
$NetBSD$
-BLAKE2s (9.0.1.tar.gz) = a785e1ad5fd5df76c95e7cf9a6eadeb86ffbc46ea4342f49f19381434bd0f78c
-SHA512 (9.0.1.tar.gz) = ed56287f608ccdf5bc5d5fc2918e313e7c4cecdd9ef2c9993a72ea900d9ff662c57ac5326c7a809eb11505c6f39d4599f3f161b97b6e03c65783b824b8d700d2
-Size (9.0.1.tar.gz) = 215065 bytes
-BLAKE2s (apache-arrow-16.1.0.tar.gz) = 086a7c5d98488d5934d5b3284a93c064c6d4a38767735dbb359ddfac71f0a2bd
-SHA512 (apache-arrow-16.1.0.tar.gz) = 28975f59e1fdde2dba4afaf4a5ba934b63db3a7f27656e2aa0af0f0d2a046c9dbfa9a6082de94629c36d03809b296566a37ea65ec5a2fc17fedac7d21e272d31
-Size (apache-arrow-16.1.0.tar.gz) = 21707079 bytes
+BLAKE2s (13.0.0.tar.gz) = b2edcdae20ea56461825c8ccd41f69ac17c3ffaf06b73239d1c62675e1b5ecf4
+SHA512 (13.0.0.tar.gz) = cdc42ddad3353297cf25ea2b6b3f09967f5f388efc26241f2997979fdbbac072819ff771145bc5bfa86cb326cca84b4119e8e6e3f658407961cf203a40603a7f
+Size (13.0.0.tar.gz) = 259967 bytes
+BLAKE2s (apache-arrow-17.0.0.tar.gz) = f43e8c901e26fe2b17ebd36d0a6758ad3dfa5cf5c92cdc908c62b7cb869a2a8a
+SHA512 (apache-arrow-17.0.0.tar.gz) = 4e2a617b8deeb9f94ee085653a721904a75696f0827bcba82b535cc7f4f723066a09914c7fa83c593e51a8a4031e8bf99e563cac1ebb1d89604cb406975d4864
+Size (apache-arrow-17.0.0.tar.gz) = 21822331 bytes
BLAKE2s (jemalloc-5.3.0.tar.bz2) = 285e6145b9d3b575b1ec5cfdae8af40b461149085f001839d64685c0d56e2689
SHA512 (jemalloc-5.3.0.tar.bz2) = 22907bb052096e2caffb6e4e23548aecc5cc9283dce476896a2b1127eee64170e3562fa2e7db9571298814a7a2c7df6e8d1fbe152bd3f3b0c1abec22a2de34b1
Size (jemalloc-5.3.0.tar.bz2) = 736023 bytes
+BLAKE2s (v0.44.0.tar.gz) = 02ac19748c8d788fde68954028a4cb196814eb1a8d1703e4c12ed8d58c7578ba
+SHA512 (v0.44.0.tar.gz) = 6cc2502a976ec93a8cbdeaa1a6dfe72fd9a96d44520ffa8b1cca522e9d3cb4c3df224bd8619c37cd14b53bf749aeb6e74129ac496c3fd3429936cdff21beb2d4
+Size (v0.44.0.tar.gz) = 131614 bytes
SHA1 (patch-cpp_src_arrow_compute_kernels_vector__pairwise.cc) = 808d9f035b413c95531fc1a82c3c838ebbb14297
diff --git a/apache-arrow/version.mk b/apache-arrow/version.mk
index 3719a67525..9ee843c808 100644
--- a/apache-arrow/version.mk
+++ b/apache-arrow/version.mk
@@ -1,2 +1,2 @@
# $NetBSD$
-APACHE_ARROW_VERSION= 16.1.0
+APACHE_ARROW_VERSION= 17.0.0
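
The distinfo hunk above records a BLAKE2s digest, a SHA512 digest, and a byte size for each distfile. As a rough illustration of how those entries could be checked by hand, here is a minimal Python sketch. It is not part of the commit and not how pkgsrc itself performs verification. The helper names parse_distinfo and check_distfile are hypothetical, and it assumes that the BLAKE2s entries are plain 32-byte (64 hex digit) digests, which matches the length of the values shown.

```python
import hashlib
import re

# Matches distinfo lines such as
#   "SHA512 (13.0.0.tar.gz) = cdc4..."  or  "Size (13.0.0.tar.gz) = 259967 bytes".
_LINE = re.compile(r"^(\w+) \((.+)\) = (\S+)(?: bytes)?$")


def parse_distinfo(text):
    """Return {filename: {algorithm: value}} for every recognised line."""
    entries = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            algo, name, value = match.groups()
            entries.setdefault(name, {})[algo] = value
    return entries


def check_distfile(data, expected):
    """Compare the bytes of a distfile against one parsed distinfo entry.

    Only BLAKE2s, SHA512 and Size are checked; other keys (for example the
    SHA1 lines used for patches) are ignored. Returns False when nothing
    could be checked at all.
    """
    actual = {
        "BLAKE2s": hashlib.blake2s(data).hexdigest(),  # assumed 32-byte digest
        "SHA512": hashlib.sha512(data).hexdigest(),
        "Size": str(len(data)),
    }
    checked = [actual[key] == value
               for key, value in expected.items() if key in actual]
    return bool(checked) and all(checked)
```

To use it, read apache-arrow/distinfo as text, pass it to parse_distinfo, then read a downloaded distfile in binary mode and pass its bytes with the matching entry to check_distfile. Reading the whole file into memory is acceptable for archives of the sizes listed here; a streaming variant would update the hash objects chunk by chunk.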