Restoring authorship annotation for <primorial@yandex-team.ru>. Commit 2 of 2.

primorial committed 3 years ago (commit b228f91bb4)
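The diffs below are uniform: every changed line simply loses its trailing whitespace. A minimal sketch of how such a cleanup might be performed with GNU `sed` (this is an illustration, not the author's actual tooling; the file path is a placeholder):

```shell
# Create a sample file whose lines carry trailing spaces, then strip them.
# /tmp/example.md is a hypothetical path; 'sed -i' assumes GNU sed.
file=/tmp/example.md
printf 'Parquet C++ 1.5.0 \n## Bug \n' > "$file"   # lines end with a space
sed -i 's/[[:space:]]*$//' "$file"                 # delete trailing whitespace in place
cat "$file"
```

Running this over every tracked text file would reproduce the whitespace-only changes shown in this commit.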

build/rules/contrib_restricted.policy (+2 −2)

@@ -38,9 +38,9 @@ ALLOW passport/infra -> contrib/restricted/thrift
 # keyutils is LGPL: CONTRIB-2236
 ALLOW passport/infra -> contrib/restricted/keyutils
 
-# For Apache Arrow: CONTRIB-1662 
+# For Apache Arrow: CONTRIB-1662
 ALLOW mds -> contrib/restricted/uriparser
- 
+
 # https://st.yandex-team.ru/CONTRIB-2020
 ALLOW weather -> contrib/restricted/range-v3
 

contrib/libs/apache/arrow/README.md (+25 −25)

@@ -1,35 +1,35 @@
-<!--- 
-  Licensed to the Apache Software Foundation (ASF) under one 
-  or more contributor license agreements.  See the NOTICE file 
-  distributed with this work for additional information 
-  regarding copyright ownership.  The ASF licenses this file 
-  to you under the Apache License, Version 2.0 (the 
-  "License"); you may not use this file except in compliance 
-  with the License.  You may obtain a copy of the License at 
- 
-    http://www.apache.org/licenses/LICENSE-2.0 
- 
-  Unless required by applicable law or agreed to in writing, 
-  software distributed under the License is distributed on an 
-  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 
-  KIND, either express or implied.  See the License for the 
-  specific language governing permissions and limitations 
-  under the License. 
---> 
- 
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
 # Apache Arrow
- 
+
 [![Fuzzing Status](https://oss-fuzz-build-logs.storage.googleapis.com/badges/arrow.svg)](https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=1&q=proj:arrow)
 [![License](http://img.shields.io/:license-Apache%202-blue.svg)](https://github.com/apache/arrow/blob/master/LICENSE.txt)
 [![Twitter Follow](https://img.shields.io/twitter/follow/apachearrow.svg?style=social&label=Follow)](https://twitter.com/apachearrow)
- 
+
 ## Powering In-Memory Analytics
- 
+
 Apache Arrow is a development platform for in-memory analytics. It contains a
 set of technologies that enable big data systems to process and move data fast.
- 
+
 Major components of the project include:
- 
+
  - [The Arrow Columnar In-Memory Format](https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst):
    a standard and efficient in-memory representation of various datatypes, plain or nested
  - [The Arrow IPC Format](https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#serialization-and-interprocess-communication-ipc):
@@ -52,7 +52,7 @@ Major components of the project include:
  - [R libraries](https://github.com/apache/arrow/tree/master/r)
  - [Ruby libraries](https://github.com/apache/arrow/tree/master/ruby)
  - [Rust libraries](https://github.com/apache/arrow-rs)
- 
+
 Arrow is an [Apache Software Foundation](https://www.apache.org) project. Learn more at
 [arrow.apache.org](https://arrow.apache.org).
 

contrib/libs/apache/arrow/cpp/CHANGELOG_PARQUET.md (+501 −501)

@@ -1,501 +1,501 @@
-Parquet C++ 1.5.0 
--------------------------------------------------------------------------------- 
-## Bug 
-    * [PARQUET-979] - [C++] Limit size of min, max or disable stats for long binary types 
-    * [PARQUET-1071] - [C++] parquet::arrow::FileWriter::Close is not idempotent 
-    * [PARQUET-1349] - [C++] PARQUET_RPATH_ORIGIN is not picked by the build 
-    * [PARQUET-1334] - [C++] memory_map parameter seems missleading in parquet file opener 
-    * [PARQUET-1333] - [C++] Reading of files with dictionary size 0 fails on Windows with bad_alloc 
-    * [PARQUET-1283] - [C++] FormatStatValue appends trailing space to string and int96 
-    * [PARQUET-1270] - [C++] Executable tools do not get installed 
-    * [PARQUET-1272] - [C++] ScanFileContents reports wrong row count for nested columns 
-    * [PARQUET-1268] - [C++] Conversion of Arrow null list columns fails 
-    * [PARQUET-1255] - [C++] Exceptions thrown in some tests 
-    * [PARQUET-1358] - [C++] index_page_offset should be unset as it is not supported. 
-    * [PARQUET-1357] - [C++] FormatStatValue truncates binary statistics on zero character 
-    * [PARQUET-1319] - [C++] Pass BISON_EXECUTABLE to Thrift EP for MacOS 
-    * [PARQUET-1313] - [C++] Compilation failure with VS2017 
-    * [PARQUET-1315] - [C++] ColumnChunkMetaData.has_dictionary_page() should return bool, not int64_t 
-    * [PARQUET-1307] - [C++] memory-test fails with latest Arrow 
-    * [PARQUET-1274] - [Python] SegFault in pyarrow.parquet.write_table with specific options 
-    * [PARQUET-1209] - locally defined symbol ... imported in function .. 
-    * [PARQUET-1245] - [C++] Segfault when writing Arrow table with duplicate columns 
-    * [PARQUET-1273] - [Python] Error writing to partitioned Parquet dataset 
-    * [PARQUET-1384] - [C++] Clang compiler warnings in bloom_filter-test.cc 
- 
-## Improvement 
-    * [PARQUET-1348] - [C++] Allow Arrow FileWriter To Write FileMetaData 
-    * [PARQUET-1346] - [C++] Protect against null values data in empty Arrow array 
-    * [PARQUET-1340] - [C++] Fix Travis Ci valgrind errors related to std::random_device 
-    * [PARQUET-1323] - [C++] Fix compiler warnings with clang-6.0 
-    * [PARQUET-1279] - Use ASSERT_NO_FATAIL_FAILURE in C++ unit tests 
-    * [PARQUET-1262] - [C++] Use the same BOOST_ROOT and Boost_NAMESPACE for Thrift 
-    * [PARQUET-1267] - replace "unsafe" std::equal by std::memcmp 
-    * [PARQUET-1360] - [C++] Minor API + style changes follow up to PARQUET-1348 
-    * [PARQUET-1166] - [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h 
-    * [PARQUET-1378] - [c++] Allow RowGroups with zero rows to be written 
-    * [PARQUET-1256] - [C++] Add --print-key-value-metadata option to parquet_reader tool 
-    * [PARQUET-1276] - [C++] Reduce the amount of memory used for writing null decimal values 
- 
-## New Feature 
-    * [PARQUET-1392] - [C++] Supply row group indices to parquet::arrow::FileReader::ReadTable 
- 
-## Sub-task 
-    * [PARQUET-1227] - Thrift crypto metadata structures 
-    * [PARQUET-1332] - [C++] Add bloom filter utility class 
- 
-## Task 
-    * [PARQUET-1350] - [C++] Use abstract ResizableBuffer instead of concrete PoolBuffer 
-    * [PARQUET-1366] - [C++] Streamline use of Arrow bit-util.h 
-    * [PARQUET-1308] - [C++] parquet::arrow should use thread pool, not ParallelFor 
-    * [PARQUET-1382] - [C++] Prepare for arrow::test namespace removal 
-    * [PARQUET-1372] - [C++] Add an API to allow writing RowGroups based on their size rather than num_rows 
- 
- 
-Parquet C++ 1.4.0 
--------------------------------------------------------------------------------- 
-## Bug 
-    * [PARQUET-1193] - [CPP] Implement ColumnOrder to support min_value and max_value 
-    * [PARQUET-1180] - C++: Fix behaviour of num_children element of primitive nodes 
-    * [PARQUET-1146] - C++: Add macOS-compatible sha512sum call to release verify script 
-    * [PARQUET-1167] - [C++] FieldToNode function should return a status when throwing an exception 
-    * [PARQUET-1175] - [C++] Fix usage of deprecated Arrow API 
-    * [PARQUET-1113] - [C++] Incorporate fix from ARROW-1601 on bitmap read path 
-    * [PARQUET-1111] - dev/release/verify-release-candidate has stale help 
-    * [PARQUET-1109] - C++: Update release verification script to SHA512 
-    * [PARQUET-1179] - [C++] Support Apache Thrift 0.11 
-    * [PARQUET-1226] - [C++] Fix new build warnings with clang 5.0 
-    * [PARQUET-1233] - [CPP ]Enable option to switch between stl classes and boost classes for thrift header 
-    * [PARQUET-1205] - Fix msvc static build 
-    * [PARQUET-1210] - [C++] Boost 1.66 compilation fails on Windows on linkage stage 
- 
-## Improvement 
-    * [PARQUET-1092] - [C++] Write Arrow tables with chunked columns 
-    * [PARQUET-1086] - [C++] Remove usage of arrow/util/compiler-util.h after 1.3.0 release 
-    * [PARQUET-1097] - [C++] Account for Arrow API deprecation in ARROW-1511 
-    * [PARQUET-1150] - C++: Hide statically linked boost symbols 
-    * [PARQUET-1151] - [C++] Add build options / configuration to use static runtime libraries with MSVC 
-    * [PARQUET-1147] - [C++] Account for API deprecation / change in ARROW-1671 
-    * [PARQUET-1162] - C++: Update dev/README after migration to Gitbox 
-    * [PARQUET-1165] - [C++] Pin clang-format version to 4.0 
-    * [PARQUET-1164] - [C++] Follow API changes in ARROW-1808 
-    * [PARQUET-1177] - [C++] Add more extensive compiler warnings when using Clang 
-    * [PARQUET-1110] - [C++] Release verification script for Windows 
-    * [PARQUET-859] - [C++] Flatten parquet/file directory 
-    * [PARQUET-1220] - [C++] Don't build Thrift examples and tutorials in the ExternalProject 
-    * [PARQUET-1219] - [C++] Update release-candidate script links to gitbox 
-    * [PARQUET-1196] - [C++] Provide a parquet_arrow example project incl. CMake setup 
-    * [PARQUET-1200] - [C++] Support reading a single Arrow column from a Parquet file 
- 
-## New Feature 
-    * [PARQUET-1095] - [C++] Read and write Arrow decimal values 
-    * [PARQUET-970] - Add Add Lz4 and Zstd compression codecs 
- 
-## Task 
-    * [PARQUET-1221] - [C++] Extend release README 
-    * [PARQUET-1225] - NaN values may lead to incorrect filtering under certain circumstances 
- 
- 
-Parquet C++ 1.3.1 
--------------------------------------------------------------------------------- 
-## Bug 
-    * [PARQUET-1105] - [CPP] Remove libboost_system dependency 
-    * [PARQUET-1138] - [C++] Fix compilation with Arrow 0.7.1 
-    * [PARQUET-1123] - [C++] Update parquet-cpp to use Arrow's AssertArraysEqual 
-    * [PARQUET-1121] - C++: DictionaryArrays of NullType cannot be written 
-    * [PARQUET-1139] - Add license to cmake_modules/parquet-cppConfig.cmake.in 
- 
-## Improvement 
-    * [PARQUET-1140] - [C++] Fail on RAT errors in CI 
-    * [PARQUET-1070] - Add CPack support to the build 
- 
- 
-Parquet C++ 1.3.0 
--------------------------------------------------------------------------------- 
-## Bug 
-    * [PARQUET-1098] - [C++] Install new header in parquet/util 
-    * [PARQUET-1085] - [C++] Backwards compatibility from macro cleanup in transitive dependencies in ARROW-1452 
-    * [PARQUET-1074] - [C++] Switch to long key ids in KEYs file 
-    * [PARQUET-1075] - C++: Coverage upload is broken 
-    * [PARQUET-1088] - [CPP] remove parquet_version.h from version control since it gets auto generated 
-    * [PARQUET-1002] - [C++] Compute statistics based on Logical Types 
-    * [PARQUET-1100] - [C++] Reading repeated types should decode number of records rather than number of values 
-    * [PARQUET-1090] - [C++] Fix int32 overflow in Arrow table writer, add max row group size property 
-    * [PARQUET-1108] - [C++] Fix Int96 comparators 
- 
-## Improvement 
-    * [PARQUET-1104] - [C++] Upgrade to Apache Arrow 0.7.0 RC0 
-    * [PARQUET-1072] - [C++] Add ARROW_NO_DEPRECATED_API to CI to check for deprecated API use 
-    * [PARQUET-1096] - C++: Update sha{1, 256, 512} checksums per latest ASF release policy 
-    * [PARQUET-1079] - [C++] Account for Arrow API change in ARROW-1335 
-    * [PARQUET-1087] - [C++] Add wrapper for ScanFileContents in parquet::arrow that catches exceptions 
-    * [PARQUET-1093] - C++: Improve Arrow level generation error message 
-    * [PARQUET-1094] - C++: Add benchmark for boolean Arrow column I/O 
-    * [PARQUET-1083] - [C++] Refactor core logic in parquet-scan.cc so that it can be used as a library function for benchmarking 
-    * [PARQUET-1037] - Allow final RowGroup to be unfilled 
- 
-## New Feature 
-    * [PARQUET-1078] - [C++] Add Arrow writer option to coerce timestamps to milliseconds or microseconds 
-    * [PARQUET-929] - [C++] Handle arrow::DictionaryArray when writing Arrow data 
- 
- 
-Parquet C++ 1.2.0 
--------------------------------------------------------------------------------- 
-## Bug 
-    * [PARQUET-1029] - [C++] TypedColumnReader/TypeColumnWriter symbols are no longer being exported 
-    * [PARQUET-997] - Fix override compiler warnings 
-    * [PARQUET-1033] - Mismatched Read and Write 
-    * [PARQUET-1007] - [C++ ] Update parquet.thrift from https://github.com/apache/parquet-format 
-    * [PARQUET-1039] - PARQUET-911 Breaks Arrow 
-    * [PARQUET-1038] - Key value metadata should be nullptr if not set 
-    * [PARQUET-1018] - [C++] parquet.dll has runtime dependencies on one or more libraries in the build toolchain 
-    * [PARQUET-1003] - [C++] Modify DEFAULT_CREATED_BY value for every new release version 
-    * [PARQUET-1004] - CPP Building fails on windows 
-    * [PARQUET-1040] - Missing writer method implementations 
-    * [PARQUET-1054] - [C++] Account for Arrow API changes in ARROW-1199 
-    * [PARQUET-1042] - C++: Compilation breaks on GCC 4.8 
-    * [PARQUET-1048] - [C++] Static linking of libarrow is no longer supported 
-    * [PARQUET-1013] - Fix ZLIB_INCLUDE_DIR 
-    * [PARQUET-998] - C++: Release script is not usable 
-    * [PARQUET-1023] - [C++] Brotli libraries are not being statically linked on Windows 
-    * [PARQUET-1000] - [C++] Do not build thirdparty Arrow with /WX on MSVC 
-    * [PARQUET-1052] - [C++] add_compiler_export_flags() throws warning with CMake >= 3.3 
-    * [PARQUET-1069] - C++: ./dev/release/verify-release-candidate is broken due to missing Arrow dependencies 
- 
-## Improvement 
-    * [PARQUET-996] - Improve MSVC build - ThirdpartyToolchain - Arrow 
-    * [PARQUET-911] - C++: Support nested structs in parquet_arrow 
-    * [PARQUET-986] - Improve MSVC build - ThirdpartyToolchain - Thrift 
-    * [PARQUET-864] - [C++] Consolidate non-Parquet-specific bit utility code into Apache Arrow 
-    * [PARQUET-1043] - [C++] Raise minimum supported CMake version to 3.2 
-    * [PARQUET-1016] - Upgrade thirdparty Arrow to 0.4.0 
-    * [PARQUET-858] - [C++] Flatten parquet/column directory, consolidate related code 
-    * [PARQUET-978] - [C++] Minimizing footer reads for small(ish) metadata 
-    * [PARQUET-991] - [C++] Fix compiler warnings on MSVC and build with /WX in Appveyor 
-    * [PARQUET-863] - [C++] Move SIMD, CPU info, hashing, and other generic utilities into Apache Arrow 
-    * [PARQUET-1053] - Fix unused result warnings due to unchecked Statuses 
-    * [PARQUET-1067] - C++: Update arrow hash to 0.5.0 
-    * [PARQUET-1041] - C++: Support Arrow's NullArray 
-    * [PARQUET-1008] - Update TypedColumnReader::ReadBatch method to accept batch_size as int64_t 
-    * [PARQUET-1044] - [C++] Use compression libraries from Apache Arrow 
-    * [PARQUET-999] - Improve MSVC build - Enable PARQUET_BUILD_BENCHMARKS 
-    * [PARQUET-967] - [C++] Combine libparquet/libparquet_arrow libraries 
-    * [PARQUET-1045] - [C++] Refactor to account for computational utility code migration in ARROW-1154 
- 
-## New Feature 
-    * [PARQUET-1035] - Write Int96 from Arrow Timestamp(ns) 
- 
-## Task 
-    * [PARQUET-994] - C++: release-candidate script should not push to master 
-    * [PARQUET-902] - [C++] Move compressor interfaces into Apache Arrow 
- 
-## Test 
-    * [PARQUET-706] - [C++] Create test case that uses libparquet as a 3rd party library 
- 
- 
-Parquet C++ 1.1.0 
--------------------------------------------------------------------------------- 
-## Bug 
-    * [PARQUET-898] - [C++] Change Travis CI OS X image to Xcode 6.4 and fix our thirdparty build 
-    * [PARQUET-976] - [C++] Pass unit test suite with MSVC, build in Appveyor 
-    * [PARQUET-963] - [C++] Disallow reading struct types in Arrow reader for now 
-    * [PARQUET-959] - [C++] Arrow thirdparty build fails on multiarch systems 
-    * [PARQUET-962] - [C++] GTEST_MAIN_STATIC_LIB is not defined in FindGTest.cmake 
-    * [PARQUET-958] - [C++] Print Parquet metadata in JSON format 
-    * [PARQUET-956] - C++: BUILD_BYPRODUCTS not specified anymore for gtest 
-    * [PARQUET-948] - [C++] Account for API changes in ARROW-782 
-    * [PARQUET-947] - [C++] Refactor to account for ARROW-795 Arrow core library consolidation 
-    * [PARQUET-965] - [C++] FIXED_LEN_BYTE_ARRAY types are unhandled in the Arrow reader 
-    * [PARQUET-949] - [C++] Arrow version pinning seems to not be working properly 
-    * [PARQUET-955] - [C++] pkg_check_modules will override $ARROW_HOME if it is set in the environment 
-    * [PARQUET-945] - [C++] Thrift static libraries are not used with recent patch 
-    * [PARQUET-943] - [C++] Overflow build error on x86 
-    * [PARQUET-938] - [C++] There is a typo in cmake_modules/FindSnappy.cmake comment 
-    * [PARQUET-936] - [C++] parquet::arrow::WriteTable can enter infinite loop if chunk_size is 0 
-    * [PARQUET-981] - Repair usage of *_HOME 3rd party dependencies environment variables during Windows build 
-    * [PARQUET-992] - [C++] parquet/compression.h leaks zlib.h 
-    * [PARQUET-987] - [C++] Fix regressions caused by PARQUET-981 
-    * [PARQUET-933] - [C++] Account for Arrow Table API changes coming in ARROW-728 
-    * [PARQUET-915] - Support Arrow Time Types in Schema 
-    * [PARQUET-914] - [C++] Throw more informative exception when user writes too many values to a column in a row group 
-    * [PARQUET-923] - [C++] Account for Time metadata changes in ARROW-686 
-    * [PARQUET-918] - FromParquetSchema API crashes on nested schemas 
-    * [PARQUET-925] - [C++] FindArrow.cmake sets the wrong library path after ARROW-648 
-    * [PARQUET-932] - [c++] Add option to build parquet library with minimal dependency 
-    * [PARQUET-919] - [C++] Account for API changes in ARROW-683 
-    * [PARQUET-995] - [C++] Int96 reader in parquet_arrow uses size of Int96Type instead of Int96 
- 
-## Improvement 
-    * [PARQUET-508] - Add ParquetFilePrinter 
-    * [PARQUET-595] - Add API for key-value metadata 
-    * [PARQUET-897] - [C++] Only use designated public headers from libarrow 
-    * [PARQUET-679] - [C++] Build and unit tests support for MSVC on Windows 
-    * [PARQUET-977] - Improve MSVC build 
-    * [PARQUET-957] - [C++] Add optional $PARQUET_BUILD_TOOLCHAIN environment variable option for configuring build environment 
-    * [PARQUET-961] - [C++] Strip debug symbols from libparquet libraries in release builds by default 
-    * [PARQUET-954] - C++: Use Brolti 0.6 release 
-    * [PARQUET-953] - [C++] Change arrow::FileWriter API to be initialized from a Schema, and provide for writing multiple tables 
-    * [PARQUET-941] - [C++] Stop needless Boost static library detection for CentOS 7 support 
-    * [PARQUET-942] - [C++] Fix wrong variabe use in FindSnappy 
-    * [PARQUET-939] - [C++] Support Thrift_HOME CMake variable like FindSnappy does as Snappy_HOME 
-    * [PARQUET-940] - [C++] Fix Arrow library path detection 
-    * [PARQUET-937] - [C++] Support CMake < 3.4 again for Arrow detection 
-    * [PARQUET-935] - [C++] Set shared library version for .deb packages 
-    * [PARQUET-934] - [C++] Support multiarch on Debian 
-    * [PARQUET-984] - C++: Add abi and so version to pkg-config 
-    * [PARQUET-983] - C++: Update Thirdparty hash to Arrow 0.3.0 
-    * [PARQUET-989] - [C++] Link dynamically to libarrow in toolchain build, set LD_LIBRARY_PATH 
-    * [PARQUET-988] - [C++] Add Linux toolchain-based build to Travis CI 
-    * [PARQUET-928] - [C++] Support pkg-config 
-    * [PARQUET-927] - [C++] Specify shared library version of Apache Arrow 
-    * [PARQUET-931] - [C++] Add option to pin thirdparty Arrow version used in ExternalProject 
-    * [PARQUET-926] - [C++] Use pkg-config to find Apache Arrow 
-    * [PARQUET-917] - C++: Build parquet_arrow by default 
-    * [PARQUET-910] - C++: Support TIME logical type in parquet_arrow 
-    * [PARQUET-909] - [CPP]: Reduce buffer allocations (mallocs) on critical path 
- 
-## New Feature 
-    * [PARQUET-853] - [C++] Add option to link with shared boost libraries when building Arrow in the thirdparty toolchain 
-    * [PARQUET-946] - [C++] Refactoring in parquet::arrow::FileReader to be able to read a single row group 
-    * [PARQUET-930] - [C++] Account for all Arrow date/time types 
- 
- 
-Parquet C++ 1.0.0 
--------------------------------------------------------------------------------- 
-## Bug 
-    * [PARQUET-455] - Fix compiler warnings on OS X / Clang 
-    * [PARQUET-558] - Support ZSH in build scripts 
-    * [PARQUET-720] - Parquet-cpp fails to link when included in multiple TUs 
-    * [PARQUET-718] - Reading boolean pages written by parquet-cpp fails 
-    * [PARQUET-640] - [C++] Force the use of gcc 4.9 in conda builds 
-    * [PARQUET-643] - Add const modifier to schema pointer reference in ParquetFileWriter 
-    * [PARQUET-672] - [C++] Build testing conda artifacts in debug mode 
-    * [PARQUET-661] - [C++] Do not assume that perl is found in /usr/bin 
-    * [PARQUET-659] - [C++] Instantiated template visibility is broken on clang / OS X 
-    * [PARQUET-657] - [C++] Don't define DISALLOW_COPY_AND_ASSIGN if already defined 
-    * [PARQUET-656] - [C++] Revert PARQUET-653 
-    * [PARQUET-676] - MAX_VALUES_PER_LITERAL_RUN causes RLE encoding failure 
-    * [PARQUET-614] - C++: Remove unneeded LZ4-related code 
-    * [PARQUET-604] - Install writer.h headers 
-    * [PARQUET-621] - C++: Uninitialised DecimalMetadata is read 
-    * [PARQUET-620] - C++: Duplicate calls to ParquetFileWriter::Close cause duplicate metdata writes 
-    * [PARQUET-599] - ColumnWriter::RleEncodeLevels' size estimation might be wrong 
-    * [PARQUET-617] - C++: Enable conda build to work on systems with non-default C++ toolchains 
-    * [PARQUET-627] - Ensure that thrift headers are generated before source compilation 
-    * [PARQUET-745] - TypedRowGroupStatistics fails to PlainDecode min and max in ByteArrayType 
-    * [PARQUET-738] - Update arrow version that also supports newer Xcode 
-    * [PARQUET-747] - [C++] TypedRowGroupStatistics are not being exported in libparquet.so 
-    * [PARQUET-711] - Use metadata builders in parquet writer 
-    * [PARQUET-732] - Building a subset of dependencies does not work 
-    * [PARQUET-760] - On switching from dictionary to the fallback encoding, an incorrect encoding is set 
-    * [PARQUET-691] - [C++] Write ColumnChunk metadata after each column chunk in the file 
-    * [PARQUET-797] - [C++] Update for API changes in ARROW-418 
-    * [PARQUET-837] - [C++] SerializedFile::ParseMetaData uses Seek, followed by Read, and could have race conditions 
-    * [PARQUET-827] - [C++] Incorporate addition of arrow::MemoryPool::Reallocate 
-    * [PARQUET-502] - Scanner segfaults when its batch size is smaller than the number of rows 
-    * [PARQUET-469] - Roll back Thrift bindings to 0.9.0 
-    * [PARQUET-889] - Fix compilation when PARQUET_USE_SSE is on 
-    * [PARQUET-888] - C++ Memory leak in RowGroupSerializer 
-    * [PARQUET-819] - C++: Trying to install non-existing parquet/arrow/utils.h 
-    * [PARQUET-736] - XCode 8.0 breaks builds 
-    * [PARQUET-505] - Column reader: automatically handle large data pages 
-    * [PARQUET-615] - C++: Building static or shared libparquet should not be mutually exclusive 
-    * [PARQUET-658] - ColumnReader has no virtual destructor 
-    * [PARQUET-799] - concurrent usage of the file reader API 
-    * [PARQUET-513] - Valgrind errors are not failing the Travis CI build 
-    * [PARQUET-841] - [C++] Writing wrong format version when using ParquetVersion::PARQUET_1_0 
-    * [PARQUET-742] - Add missing license headers 
-    * [PARQUET-741] - compression_buffer_ is reused although it shouldn't 
-    * [PARQUET-700] - C++: Disable dictionary encoding for boolean columns 
-    * [PARQUET-662] - [C++] ParquetException must be explicitly exported in dynamic libraries 
-    * [PARQUET-704] - [C++] scan-all.h is not being installed 
-    * [PARQUET-865] - C++: Pass all CXXFLAGS to Thrift ExternalProject 
-    * [PARQUET-875] - [C++] Fix coveralls build given changes to thirdparty build procedure 
-    * [PARQUET-709] - [C++] Fix conda dev binary builds 
-    * [PARQUET-638] - [C++] Revert static linking of libstdc++ in conda builds until symbol visibility addressed 
-    * [PARQUET-606] - Travis coverage is broken 
-    * [PARQUET-880] - [CPP] Prevent destructors from throwing 
-    * [PARQUET-886] - [C++] Revise build documentation and requirements in README.md 
-    * [PARQUET-900] - C++: Fix NOTICE / LICENSE issues 
-    * [PARQUET-885] - [C++] Do not search for Thrift in default system paths 
-    * [PARQUET-879] - C++: ExternalProject compilation for Thrift fails on older CMake versions 
-    * [PARQUET-635] - [C++] Statically link libstdc++ on Linux in conda recipe 
-    * [PARQUET-710] - Remove unneeded private member variables from RowGroupReader ABI 
-    * [PARQUET-766] - C++: Expose ParquetFileReader through Arrow reader as const 
-    * [PARQUET-876] - C++: Correct snapshot version 
-    * [PARQUET-821] - [C++] zlib download link is broken 
-    * [PARQUET-818] - [C++] Refactor library to share IO, Buffer, and memory management abstractions with Apache Arrow 
-    * [PARQUET-537] - LocalFileSource leaks resources 
-    * [PARQUET-764] - [CPP] Parquet Writer does not write Boolean values correctly 
-    * [PARQUET-812] - [C++] Failure reading BYTE_ARRAY data from file in parquet-compatibility project 
-    * [PARQUET-759] - Cannot store columns consisting of empty strings 
-    * [PARQUET-846] - [CPP] CpuInfo::Init() is not thread safe 
-    * [PARQUET-694] - C++: Revert default data page size back to 1M 
-    * [PARQUET-842] - [C++] Impala rejects DOUBLE columns if decimal metadata is set 
-    * [PARQUET-708] - [C++] RleEncoder does not account for "worst case scenario" in MaxBufferSize for bit_width > 1 
-    * [PARQUET-639] - Do not export DCHECK in public headers 
-    * [PARQUET-828] - [C++] "version" field set improperly in file metadata 
-    * [PARQUET-891] - [C++] Do not search for Snappy in default system paths 
-    * [PARQUET-626] - Fix builds due to unavailable llvm.org apt mirror 
-    * [PARQUET-629] - RowGroupSerializer should only close itself once 
-    * [PARQUET-472] - Clean up InputStream ownership semantics in ColumnReader 
-    * [PARQUET-739] - Rle-decoding uses static buffer that is shared accross threads 
-    * [PARQUET-561] - ParquetFileReader::Contents PIMPL missing a virtual destructor 
-    * [PARQUET-892] - [C++] Clean up link library targets in CMake files 
-    * [PARQUET-454] - Address inconsistencies in boolean decoding 
-    * [PARQUET-816] - [C++] Failure decoding sample dict-encoded file from parquet-compatibility project 
-    * [PARQUET-565] - Use PATH instead of DIRECTORY in get_filename_component to support CMake<2.8.12 
-    * [PARQUET-446] - Hide thrift dependency in parquet-cpp 
-    * [PARQUET-843] - [C++] Impala unable to read files created by parquet-cpp 
-    * [PARQUET-555] - Dictionary page metadata handling inconsistencies 
-    * [PARQUET-908] - Fix for PARQUET-890 introduces undefined symbol in libparquet_arrow.so 
-    * [PARQUET-793] - [CPP] Do not return incorrect statistics 
-    * [PARQUET-887] - C++: Fix issues in release scripts arise in RC1 
- 
-## Improvement 
-    * [PARQUET-277] - Remove boost dependency 
-    * [PARQUET-500] - Enable coveralls.io for apache/parquet-cpp 
-    * [PARQUET-497] - Decouple Parquet physical file structure from FileReader class 
-    * [PARQUET-597] - Add data rates to benchmark output 
-    * [PARQUET-522] - #include cleanup with include-what-you-use 
-    * [PARQUET-515] - Add "Reset" to LevelEncoder and LevelDecoder 
-    * [PARQUET-514] - Automate coveralls.io updates in Travis CI 
-    * [PARQUET-551] - Handle compiler warnings due to disabled DCHECKs in release builds 
-    * [PARQUET-559] - Enable InputStream as a source to the ParquetFileReader 
-    * [PARQUET-562] - Simplified ZSH support in build scripts 
-    * [PARQUET-538] - Improve ColumnReader Tests 
-    * [PARQUET-541] - Portable build scripts 
-    * [PARQUET-724] - Test more advanced properties setting 
-    * [PARQUET-641] - Instantiate stringstream only if needed in SerializedPageReader::NextPage 
-    * [PARQUET-636] - Expose selection for different encodings 
-    * [PARQUET-603] - Implement missing information in schema descriptor 
-    * [PARQUET-610] - Print ColumnMetaData for each RowGroup 
-    * [PARQUET-600] - Add benchmarks for RLE-Level encoding 
-    * [PARQUET-592] - Support compressed writes 
-    * [PARQUET-593] - Add API for writing Page statistics 
-    * [PARQUET-589] - Implement Chunked InMemoryInputStream for better memory usage 
-    * [PARQUET-587] - Implement BufferReader::Read(int64_t,uint8_t*) 
-    * [PARQUET-616] - C++: WriteBatch should accept const arrays 
-    * [PARQUET-630] - C++: Support link flags for older CMake versions 
-    * [PARQUET-634] - Consistent private linking of dependencies 
-    * [PARQUET-633] - Add version to WriterProperties 
-    * [PARQUET-625] - Improve RLE read performance 
-    * [PARQUET-737] - Use absolute namespace in macros 
-    * [PARQUET-762] - C++: Use optimistic allocation instead of Arrow Builders 
-    * [PARQUET-773] - C++: Check licenses with RAT in CI 
-    * [PARQUET-687] - C++: Switch to PLAIN encoding if dictionary grows too large 
-    * [PARQUET-784] - C++: Reference Spark, Kudu and FrameOfReference in LICENSE 
-    * [PARQUET-809] - [C++] Add API to determine if two files' schemas are compatible 
-    * [PARQUET-778] - Standardize the schema output to match the parquet-mr format 
-    * [PARQUET-463] - Add DCHECK* macros for assertions in debug builds 
-    * [PARQUET-471] - Use the same environment setup script for Travis CI as local sandbox development 
-    * [PARQUET-449] - Update to latest parquet.thrift 
-    * [PARQUET-496] - Fix cpplint configuration to be more restrictive 
-    * [PARQUET-468] - Add a cmake option to generate the Parquet thrift headers with the thriftc in the environment 
-    * [PARQUET-482] - Organize src code file structure to have a very clear folder with public headers. 
-    * [PARQUET-591] - Page size estimation during writes 
-    * [PARQUET-518] - Review usages of size_t and unsigned integers generally per Google style guide 
-    * [PARQUET-533] - Simplify RandomAccessSource API to combine Seek/Read 
-    * [PARQUET-767] - Add release scripts for parquet-cpp 
-    * [PARQUET-699] - Update parquet.thrift from https://github.com/apache/parquet-format 
-    * [PARQUET-653] - [C++] Re-enable -static-libstdc++ in dev artifact builds 
-    * [PARQUET-763] - C++: Expose ParquetFileReader through Arrow reader 
-    * [PARQUET-857] - [C++] Flatten parquet/encodings directory 
-    * [PARQUET-862] - Provide defaut cache size values if CPU info probing is not available 
-    * [PARQUET-689] - C++: Compress DataPages eagerly 
-    * [PARQUET-874] - [C++] Use default memory allocator from Arrow 
-    * [PARQUET-267] - Detach thirdparty code from build configuration. 
-    * [PARQUET-418] - Add a utility to print contents of a Parquet file to stdout 
-    * [PARQUET-519] - Disable compiler warning supressions and fix all DEBUG build warnings 
-    * [PARQUET-447] - Add Debug and Release build types and associated compiler flags 
-    * [PARQUET-868] - C++: Build snappy with optimizations 
-    * [PARQUET-894] - Fix compilation warning 
-    * [PARQUET-883] - C++: Support non-standard gcc version strings 
-    * [PARQUET-607] - Public Writer header 
-    * [PARQUET-731] - [CPP] Add API to return metadata size and Skip reading values 
-    * [PARQUET-628] - Link thrift privately 
-    * [PARQUET-877] - C++: Update Arrow Hash, update Version in metadata. 
-    * [PARQUET-547] - Refactor most templates to use DataType structs rather than the Type::type enum 
-    * [PARQUET-882] - [CPP] Improve Application Version parsing 
-    * [PARQUET-448] - Add cmake option to skip building the unit tests 
-    * [PARQUET-721] - Performance benchmarks for reading into Arrow structures 
-    * [PARQUET-820] - C++: Decoders should directly emit arrays with spacing for null entries 
-    * [PARQUET-813] - C++: Build dependencies using CMake External project 
-    * [PARQUET-488] - Add SSE-related cmake options to manage compiler flags 
-    * [PARQUET-564] - Add option to run unit tests with valgrind --tool=memcheck 
-    * [PARQUET-572] - Rename parquet_cpp namespace to parquet 
-    * [PARQUET-829] - C++: Make use of ARROW-469 
-    * [PARQUET-501] - Add an OutputStream abstraction (capable of memory allocation) for Encoder public API 
-    * [PARQUET-744] - Clarifications on build instructions 
-    * [PARQUET-520] - Add version of LocalFileSource that uses memory-mapping for zero-copy reads 
-    * [PARQUET-556] - Extend RowGroupStatistics to include "min" "max" statistics 
-    * [PARQUET-671] - Improve performance of RLE/bit-packed decoding in parquet-cpp 
-    * [PARQUET-681] - Add tool to scan a parquet file 
- 
-## New Feature 
-    * [PARQUET-499] - Complete PlainEncoder implementation for all primitive types and test end to end 
-    * [PARQUET-439] - Conform all copyright headers to ASF requirements 
-    * [PARQUET-436] - Implement ParquetFileWriter class entry point for generating new Parquet files 
-    * [PARQUET-435] - Provide vectorized ColumnReader interface 
-    * [PARQUET-438] - Update RLE encoder/decoder modules from Impala upstream changes and adapt unit tests 
-    * [PARQUET-512] - Add optional google/benchmark 3rd-party dependency for performance testing 
-    * [PARQUET-566] - Add method to retrieve the full column path 
-    * [PARQUET-613] - C++: Add conda packaging recipe 
-    * [PARQUET-605] - Expose schema node in ColumnDescriptor 
-    * [PARQUET-619] - C++: Add OutputStream for local files 
-    * [PARQUET-583] - Implement Parquet to Thrift schema conversion 
-    * [PARQUET-582] - Conversion functions for Parquet enums to Thrift enums 
-    * [PARQUET-728] - [C++] Bring parquet::arrow up to date with API changes in arrow::io 
-    * [PARQUET-752] - [C++] Conform parquet_arrow to upstream API changes 
-    * [PARQUET-788] - [C++] Reference Impala / Apache Impala (incubating) in LICENSE 
-    * [PARQUET-808] - [C++] Add API to read file given externally-provided FileMetadata 
-    * [PARQUET-807] - [C++] Add API to read file metadata only from a file handle 
-    * [PARQUET-805] - C++: Read Int96 into Arrow Timestamp(ns) 
-    * [PARQUET-836] - [C++] Add column selection to parquet::arrow::FileReader 
-    * [PARQUET-835] - [C++] Add option to parquet::arrow to read columns in parallel using a thread pool 
-    * [PARQUET-830] - [C++] Add additional configuration options to parquet::arrow::OpenFile 
-    * [PARQUET-769] - C++: Add support for Brotli Compression 
-    * [PARQUET-489] - Add visibility macros to be used for public and internal APIs of libparquet 
-    * [PARQUET-542] - Support memory allocation from external memory 
-    * [PARQUET-844] - [C++] Consolidate encodings, schema, and compression subdirectories into fewer files 
-    * [PARQUET-848] - [C++] Consolidate libparquet_thrift subcomponent 
-    * [PARQUET-646] - [C++] Enable easier 3rd-party toolchain clang builds on Linux 
-    * [PARQUET-598] - [C++] Test writing all primitive data types 
-    * [PARQUET-442] - Convert flat SchemaElement vector to implied nested schema data structure 
-    * [PARQUET-867] - [C++] Support writing sliced Arrow arrays 
-    * [PARQUET-456] - Add zlib codec support 
-    * [PARQUET-834] - C++: Support r/w of arrow::ListArray 
-    * [PARQUET-485] - Decouple data page delimiting from column reader / scanner classes, create test fixtures 
-    * [PARQUET-434] - Add a ParquetFileReader class to encapsulate some low-level details of interacting with Parquet files 
-    * [PARQUET-666] - PLAIN_DICTIONARY write support 
-    * [PARQUET-437] - Incorporate googletest thirdparty dependency and add cmake tools (ADD_PARQUET_TEST) to simplify adding new unit tests 
-    * [PARQUET-866] - [C++] Account for API changes in ARROW-33 
-    * [PARQUET-545] - Improve API to support Decimal type 
-    * [PARQUET-579] - Add API for writing Column statistics 
-    * [PARQUET-494] - Implement PLAIN_DICTIONARY encoding and decoding 
-    * [PARQUET-618] - C++: Automatically upload conda build artifacts on commits to master 
-    * [PARQUET-833] - C++: Provide API to write spaced arrays (e.g. Arrow) 
-    * [PARQUET-903] - C++: Add option to set RPATH to ORIGIN 
-    * [PARQUET-451] - Add a RowGroup reader interface class 
-    * [PARQUET-785] - C++: List conversion for Arrow Schemas 
-    * [PARQUET-712] - C++: Read into Arrow memory 
-    * [PARQUET-890] - C++: Support I/O of DATE columns in parquet_arrow 
-    * [PARQUET-782] - C++: Support writing to Arrow sinks 
-    * [PARQUET-849] - [C++] Upgrade default Thrift in thirdparty toolchain to 0.9.3 or 0.10 
-    * [PARQUET-573] - C++: Create a public API for reading and writing file metadata 
- 
-## Task 
-    * [PARQUET-814] - C++: Remove Conda recipes 
-    * [PARQUET-503] - Re-enable parquet 2.0 encodings 
-    * [PARQUET-169] - Parquet-cpp: Implement support for bulk reading and writing repetition/definition levels. 
-    * [PARQUET-878] - C++: Remove setup_build_env from rc-verification script 
-    * [PARQUET-881] - C++: Update Arrow hash to 0.2.0-rc2 
-    * [PARQUET-771] - C++: Sync KEYS file 
-    * [PARQUET-901] - C++: Publish RCs in apache-parquet-VERSION in SVN 
- 
-## Test 
-    * [PARQUET-525] - Test coverage for malformed file failure modes on the read path 
-    * [PARQUET-703] - [C++] Validate num_values metadata for columns with nulls 
-    * [PARQUET-507] - Improve runtime of rle-test.cc 
-    * [PARQUET-549] - Add scanner and column reader tests for dictionary data pages 
-    * [PARQUET-457] - Add compressed data page unit tests 
+Parquet C++ 1.5.0
+--------------------------------------------------------------------------------
+## Bug
+    * [PARQUET-979] - [C++] Limit size of min, max or disable stats for long binary types
+    * [PARQUET-1071] - [C++] parquet::arrow::FileWriter::Close is not idempotent
+    * [PARQUET-1349] - [C++] PARQUET_RPATH_ORIGIN is not picked by the build
+    * [PARQUET-1334] - [C++] memory_map parameter seems misleading in parquet file opener
+    * [PARQUET-1333] - [C++] Reading of files with dictionary size 0 fails on Windows with bad_alloc
+    * [PARQUET-1283] - [C++] FormatStatValue appends trailing space to string and int96
+    * [PARQUET-1270] - [C++] Executable tools do not get installed
+    * [PARQUET-1272] - [C++] ScanFileContents reports wrong row count for nested columns
+    * [PARQUET-1268] - [C++] Conversion of Arrow null list columns fails
+    * [PARQUET-1255] - [C++] Exceptions thrown in some tests
+    * [PARQUET-1358] - [C++] index_page_offset should be unset as it is not supported.
+    * [PARQUET-1357] - [C++] FormatStatValue truncates binary statistics on zero character
+    * [PARQUET-1319] - [C++] Pass BISON_EXECUTABLE to Thrift EP for MacOS
+    * [PARQUET-1313] - [C++] Compilation failure with VS2017
+    * [PARQUET-1315] - [C++] ColumnChunkMetaData.has_dictionary_page() should return bool, not int64_t
+    * [PARQUET-1307] - [C++] memory-test fails with latest Arrow
+    * [PARQUET-1274] - [Python] SegFault in pyarrow.parquet.write_table with specific options
+    * [PARQUET-1209] - locally defined symbol ... imported in function ..
+    * [PARQUET-1245] - [C++] Segfault when writing Arrow table with duplicate columns
+    * [PARQUET-1273] - [Python] Error writing to partitioned Parquet dataset
+    * [PARQUET-1384] - [C++] Clang compiler warnings in bloom_filter-test.cc
+
+## Improvement
+    * [PARQUET-1348] - [C++] Allow Arrow FileWriter To Write FileMetaData
+    * [PARQUET-1346] - [C++] Protect against null values data in empty Arrow array
+    * [PARQUET-1340] - [C++] Fix Travis Ci valgrind errors related to std::random_device
+    * [PARQUET-1323] - [C++] Fix compiler warnings with clang-6.0
+    * [PARQUET-1279] - Use ASSERT_NO_FATAL_FAILURE in C++ unit tests
+    * [PARQUET-1262] - [C++] Use the same BOOST_ROOT and Boost_NAMESPACE for Thrift
+    * [PARQUET-1267] - replace "unsafe" std::equal by std::memcmp
+    * [PARQUET-1360] - [C++] Minor API + style changes follow up to PARQUET-1348
+    * [PARQUET-1166] - [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h
+    * [PARQUET-1378] - [C++] Allow RowGroups with zero rows to be written
+    * [PARQUET-1256] - [C++] Add --print-key-value-metadata option to parquet_reader tool
+    * [PARQUET-1276] - [C++] Reduce the amount of memory used for writing null decimal values
+
+## New Feature
+    * [PARQUET-1392] - [C++] Supply row group indices to parquet::arrow::FileReader::ReadTable
+
+## Sub-task
+    * [PARQUET-1227] - Thrift crypto metadata structures
+    * [PARQUET-1332] - [C++] Add bloom filter utility class
+
+## Task
+    * [PARQUET-1350] - [C++] Use abstract ResizableBuffer instead of concrete PoolBuffer
+    * [PARQUET-1366] - [C++] Streamline use of Arrow bit-util.h
+    * [PARQUET-1308] - [C++] parquet::arrow should use thread pool, not ParallelFor
+    * [PARQUET-1382] - [C++] Prepare for arrow::test namespace removal
+    * [PARQUET-1372] - [C++] Add an API to allow writing RowGroups based on their size rather than num_rows
+
+
+Parquet C++ 1.4.0
+--------------------------------------------------------------------------------
+## Bug
+    * [PARQUET-1193] - [CPP] Implement ColumnOrder to support min_value and max_value
+    * [PARQUET-1180] - C++: Fix behaviour of num_children element of primitive nodes
+    * [PARQUET-1146] - C++: Add macOS-compatible sha512sum call to release verify script
+    * [PARQUET-1167] - [C++] FieldToNode function should return a status when throwing an exception
+    * [PARQUET-1175] - [C++] Fix usage of deprecated Arrow API
+    * [PARQUET-1113] - [C++] Incorporate fix from ARROW-1601 on bitmap read path
+    * [PARQUET-1111] - dev/release/verify-release-candidate has stale help
+    * [PARQUET-1109] - C++: Update release verification script to SHA512
+    * [PARQUET-1179] - [C++] Support Apache Thrift 0.11
+    * [PARQUET-1226] - [C++] Fix new build warnings with clang 5.0
+    * [PARQUET-1233] - [CPP] Enable option to switch between stl classes and boost classes for thrift header
+    * [PARQUET-1205] - Fix msvc static build
+    * [PARQUET-1210] - [C++] Boost 1.66 compilation fails on Windows on linkage stage
+
+## Improvement
+    * [PARQUET-1092] - [C++] Write Arrow tables with chunked columns
+    * [PARQUET-1086] - [C++] Remove usage of arrow/util/compiler-util.h after 1.3.0 release
+    * [PARQUET-1097] - [C++] Account for Arrow API deprecation in ARROW-1511
+    * [PARQUET-1150] - C++: Hide statically linked boost symbols
+    * [PARQUET-1151] - [C++] Add build options / configuration to use static runtime libraries with MSVC
+    * [PARQUET-1147] - [C++] Account for API deprecation / change in ARROW-1671
+    * [PARQUET-1162] - C++: Update dev/README after migration to Gitbox
+    * [PARQUET-1165] - [C++] Pin clang-format version to 4.0
+    * [PARQUET-1164] - [C++] Follow API changes in ARROW-1808
+    * [PARQUET-1177] - [C++] Add more extensive compiler warnings when using Clang
+    * [PARQUET-1110] - [C++] Release verification script for Windows
+    * [PARQUET-859] - [C++] Flatten parquet/file directory
+    * [PARQUET-1220] - [C++] Don't build Thrift examples and tutorials in the ExternalProject
+    * [PARQUET-1219] - [C++] Update release-candidate script links to gitbox
+    * [PARQUET-1196] - [C++] Provide a parquet_arrow example project incl. CMake setup
+    * [PARQUET-1200] - [C++] Support reading a single Arrow column from a Parquet file
+
+## New Feature
+    * [PARQUET-1095] - [C++] Read and write Arrow decimal values
+    * [PARQUET-970] - Add Lz4 and Zstd compression codecs
+
+## Task
+    * [PARQUET-1221] - [C++] Extend release README
+    * [PARQUET-1225] - NaN values may lead to incorrect filtering under certain circumstances
+
+
+Parquet C++ 1.3.1
+--------------------------------------------------------------------------------
+## Bug
+    * [PARQUET-1105] - [CPP] Remove libboost_system dependency
+    * [PARQUET-1138] - [C++] Fix compilation with Arrow 0.7.1
+    * [PARQUET-1123] - [C++] Update parquet-cpp to use Arrow's AssertArraysEqual
+    * [PARQUET-1121] - C++: DictionaryArrays of NullType cannot be written
+    * [PARQUET-1139] - Add license to cmake_modules/parquet-cppConfig.cmake.in
+
+## Improvement
+    * [PARQUET-1140] - [C++] Fail on RAT errors in CI
+    * [PARQUET-1070] - Add CPack support to the build
+
+
+Parquet C++ 1.3.0
+--------------------------------------------------------------------------------
+## Bug
+    * [PARQUET-1098] - [C++] Install new header in parquet/util
+    * [PARQUET-1085] - [C++] Backwards compatibility from macro cleanup in transitive dependencies in ARROW-1452
+    * [PARQUET-1074] - [C++] Switch to long key ids in KEYs file
+    * [PARQUET-1075] - C++: Coverage upload is broken
+    * [PARQUET-1088] - [CPP] remove parquet_version.h from version control since it gets auto generated
+    * [PARQUET-1002] - [C++] Compute statistics based on Logical Types
+    * [PARQUET-1100] - [C++] Reading repeated types should decode number of records rather than number of values
+    * [PARQUET-1090] - [C++] Fix int32 overflow in Arrow table writer, add max row group size property
+    * [PARQUET-1108] - [C++] Fix Int96 comparators
+
+## Improvement
+    * [PARQUET-1104] - [C++] Upgrade to Apache Arrow 0.7.0 RC0
+    * [PARQUET-1072] - [C++] Add ARROW_NO_DEPRECATED_API to CI to check for deprecated API use
+    * [PARQUET-1096] - C++: Update sha{1, 256, 512} checksums per latest ASF release policy
+    * [PARQUET-1079] - [C++] Account for Arrow API change in ARROW-1335
+    * [PARQUET-1087] - [C++] Add wrapper for ScanFileContents in parquet::arrow that catches exceptions
+    * [PARQUET-1093] - C++: Improve Arrow level generation error message
+    * [PARQUET-1094] - C++: Add benchmark for boolean Arrow column I/O
+    * [PARQUET-1083] - [C++] Refactor core logic in parquet-scan.cc so that it can be used as a library function for benchmarking
+    * [PARQUET-1037] - Allow final RowGroup to be unfilled
+
+## New Feature
+    * [PARQUET-1078] - [C++] Add Arrow writer option to coerce timestamps to milliseconds or microseconds
+    * [PARQUET-929] - [C++] Handle arrow::DictionaryArray when writing Arrow data
+
+
+Parquet C++ 1.2.0
+--------------------------------------------------------------------------------
+## Bug
+    * [PARQUET-1029] - [C++] TypedColumnReader/TypeColumnWriter symbols are no longer being exported
+    * [PARQUET-997] - Fix override compiler warnings
+    * [PARQUET-1033] - Mismatched Read and Write
+    * [PARQUET-1007] - [C++] Update parquet.thrift from https://github.com/apache/parquet-format
+    * [PARQUET-1039] - PARQUET-911 Breaks Arrow
+    * [PARQUET-1038] - Key value metadata should be nullptr if not set
+    * [PARQUET-1018] - [C++] parquet.dll has runtime dependencies on one or more libraries in the build toolchain
+    * [PARQUET-1003] - [C++] Modify DEFAULT_CREATED_BY value for every new release version
+    * [PARQUET-1004] - CPP Building fails on windows
+    * [PARQUET-1040] - Missing writer method implementations
+    * [PARQUET-1054] - [C++] Account for Arrow API changes in ARROW-1199
+    * [PARQUET-1042] - C++: Compilation breaks on GCC 4.8
+    * [PARQUET-1048] - [C++] Static linking of libarrow is no longer supported
+    * [PARQUET-1013] - Fix ZLIB_INCLUDE_DIR
+    * [PARQUET-998] - C++: Release script is not usable
+    * [PARQUET-1023] - [C++] Brotli libraries are not being statically linked on Windows
+    * [PARQUET-1000] - [C++] Do not build thirdparty Arrow with /WX on MSVC
+    * [PARQUET-1052] - [C++] add_compiler_export_flags() throws warning with CMake >= 3.3
+    * [PARQUET-1069] - C++: ./dev/release/verify-release-candidate is broken due to missing Arrow dependencies
+
+## Improvement
+    * [PARQUET-996] - Improve MSVC build - ThirdpartyToolchain - Arrow
+    * [PARQUET-911] - C++: Support nested structs in parquet_arrow
+    * [PARQUET-986] - Improve MSVC build - ThirdpartyToolchain - Thrift
+    * [PARQUET-864] - [C++] Consolidate non-Parquet-specific bit utility code into Apache Arrow
+    * [PARQUET-1043] - [C++] Raise minimum supported CMake version to 3.2
+    * [PARQUET-1016] - Upgrade thirdparty Arrow to 0.4.0
+    * [PARQUET-858] - [C++] Flatten parquet/column directory, consolidate related code
+    * [PARQUET-978] - [C++] Minimizing footer reads for small(ish) metadata
+    * [PARQUET-991] - [C++] Fix compiler warnings on MSVC and build with /WX in Appveyor
+    * [PARQUET-863] - [C++] Move SIMD, CPU info, hashing, and other generic utilities into Apache Arrow
+    * [PARQUET-1053] - Fix unused result warnings due to unchecked Statuses
+    * [PARQUET-1067] - C++: Update arrow hash to 0.5.0
+    * [PARQUET-1041] - C++: Support Arrow's NullArray
+    * [PARQUET-1008] - Update TypedColumnReader::ReadBatch method to accept batch_size as int64_t
+    * [PARQUET-1044] - [C++] Use compression libraries from Apache Arrow
+    * [PARQUET-999] - Improve MSVC build - Enable PARQUET_BUILD_BENCHMARKS
+    * [PARQUET-967] - [C++] Combine libparquet/libparquet_arrow libraries
+    * [PARQUET-1045] - [C++] Refactor to account for computational utility code migration in ARROW-1154
+
+## New Feature
+    * [PARQUET-1035] - Write Int96 from Arrow Timestamp(ns)
+
+## Task
+    * [PARQUET-994] - C++: release-candidate script should not push to master
+    * [PARQUET-902] - [C++] Move compressor interfaces into Apache Arrow
+
+## Test
+    * [PARQUET-706] - [C++] Create test case that uses libparquet as a 3rd party library
+
+
+Parquet C++ 1.1.0
+--------------------------------------------------------------------------------
+## Bug
+    * [PARQUET-898] - [C++] Change Travis CI OS X image to Xcode 6.4 and fix our thirdparty build
+    * [PARQUET-976] - [C++] Pass unit test suite with MSVC, build in Appveyor
+    * [PARQUET-963] - [C++] Disallow reading struct types in Arrow reader for now
+    * [PARQUET-959] - [C++] Arrow thirdparty build fails on multiarch systems
+    * [PARQUET-962] - [C++] GTEST_MAIN_STATIC_LIB is not defined in FindGTest.cmake
+    * [PARQUET-958] - [C++] Print Parquet metadata in JSON format
+    * [PARQUET-956] - C++: BUILD_BYPRODUCTS not specified anymore for gtest
+    * [PARQUET-948] - [C++] Account for API changes in ARROW-782
+    * [PARQUET-947] - [C++] Refactor to account for ARROW-795 Arrow core library consolidation
+    * [PARQUET-965] - [C++] FIXED_LEN_BYTE_ARRAY types are unhandled in the Arrow reader
+    * [PARQUET-949] - [C++] Arrow version pinning seems to not be working properly
+    * [PARQUET-955] - [C++] pkg_check_modules will override $ARROW_HOME if it is set in the environment
+    * [PARQUET-945] - [C++] Thrift static libraries are not used with recent patch
+    * [PARQUET-943] - [C++] Overflow build error on x86
+    * [PARQUET-938] - [C++] There is a typo in cmake_modules/FindSnappy.cmake comment
+    * [PARQUET-936] - [C++] parquet::arrow::WriteTable can enter infinite loop if chunk_size is 0
+    * [PARQUET-981] - Repair usage of *_HOME 3rd party dependencies environment variables during Windows build
+    * [PARQUET-992] - [C++] parquet/compression.h leaks zlib.h
+    * [PARQUET-987] - [C++] Fix regressions caused by PARQUET-981
+    * [PARQUET-933] - [C++] Account for Arrow Table API changes coming in ARROW-728
+    * [PARQUET-915] - Support Arrow Time Types in Schema
+    * [PARQUET-914] - [C++] Throw more informative exception when user writes too many values to a column in a row group
+    * [PARQUET-923] - [C++] Account for Time metadata changes in ARROW-686
+    * [PARQUET-918] - FromParquetSchema API crashes on nested schemas
+    * [PARQUET-925] - [C++] FindArrow.cmake sets the wrong library path after ARROW-648
+    * [PARQUET-932] - [C++] Add option to build parquet library with minimal dependency
+    * [PARQUET-919] - [C++] Account for API changes in ARROW-683
+    * [PARQUET-995] - [C++] Int96 reader in parquet_arrow uses size of Int96Type instead of Int96
+
+## Improvement
+    * [PARQUET-508] - Add ParquetFilePrinter
+    * [PARQUET-595] - Add API for key-value metadata
+    * [PARQUET-897] - [C++] Only use designated public headers from libarrow
+    * [PARQUET-679] - [C++] Build and unit tests support for MSVC on Windows
+    * [PARQUET-977] - Improve MSVC build
+    * [PARQUET-957] - [C++] Add optional $PARQUET_BUILD_TOOLCHAIN environment variable option for configuring build environment
+    * [PARQUET-961] - [C++] Strip debug symbols from libparquet libraries in release builds by default
+    * [PARQUET-954] - C++: Use Brotli 0.6 release
+    * [PARQUET-953] - [C++] Change arrow::FileWriter API to be initialized from a Schema, and provide for writing multiple tables
+    * [PARQUET-941] - [C++] Stop needless Boost static library detection for CentOS 7 support
+    * [PARQUET-942] - [C++] Fix wrong variable use in FindSnappy
+    * [PARQUET-939] - [C++] Support Thrift_HOME CMake variable like FindSnappy does as Snappy_HOME
+    * [PARQUET-940] - [C++] Fix Arrow library path detection
+    * [PARQUET-937] - [C++] Support CMake < 3.4 again for Arrow detection
+    * [PARQUET-935] - [C++] Set shared library version for .deb packages
+    * [PARQUET-934] - [C++] Support multiarch on Debian
+    * [PARQUET-984] - C++: Add abi and so version to pkg-config
+    * [PARQUET-983] - C++: Update Thirdparty hash to Arrow 0.3.0
+    * [PARQUET-989] - [C++] Link dynamically to libarrow in toolchain build, set LD_LIBRARY_PATH
+    * [PARQUET-988] - [C++] Add Linux toolchain-based build to Travis CI
+    * [PARQUET-928] - [C++] Support pkg-config
+    * [PARQUET-927] - [C++] Specify shared library version of Apache Arrow
+    * [PARQUET-931] - [C++] Add option to pin thirdparty Arrow version used in ExternalProject
+    * [PARQUET-926] - [C++] Use pkg-config to find Apache Arrow
+    * [PARQUET-917] - C++: Build parquet_arrow by default
+    * [PARQUET-910] - C++: Support TIME logical type in parquet_arrow
+    * [PARQUET-909] - [CPP]: Reduce buffer allocations (mallocs) on critical path
+
+## New Feature
+    * [PARQUET-853] - [C++] Add option to link with shared boost libraries when building Arrow in the thirdparty toolchain
+    * [PARQUET-946] - [C++] Refactoring in parquet::arrow::FileReader to be able to read a single row group
+    * [PARQUET-930] - [C++] Account for all Arrow date/time types
+
+
+Parquet C++ 1.0.0
+--------------------------------------------------------------------------------
+## Bug
+    * [PARQUET-455] - Fix compiler warnings on OS X / Clang
+    * [PARQUET-558] - Support ZSH in build scripts
+    * [PARQUET-720] - Parquet-cpp fails to link when included in multiple TUs
+    * [PARQUET-718] - Reading boolean pages written by parquet-cpp fails
+    * [PARQUET-640] - [C++] Force the use of gcc 4.9 in conda builds
+    * [PARQUET-643] - Add const modifier to schema pointer reference in ParquetFileWriter
+    * [PARQUET-672] - [C++] Build testing conda artifacts in debug mode
+    * [PARQUET-661] - [C++] Do not assume that perl is found in /usr/bin
+    * [PARQUET-659] - [C++] Instantiated template visibility is broken on clang / OS X
+    * [PARQUET-657] - [C++] Don't define DISALLOW_COPY_AND_ASSIGN if already defined
+    * [PARQUET-656] - [C++] Revert PARQUET-653
+    * [PARQUET-676] - MAX_VALUES_PER_LITERAL_RUN causes RLE encoding failure
+    * [PARQUET-614] - C++: Remove unneeded LZ4-related code
+    * [PARQUET-604] - Install writer.h headers
+    * [PARQUET-621] - C++: Uninitialised DecimalMetadata is read
+    * [PARQUET-620] - C++: Duplicate calls to ParquetFileWriter::Close cause duplicate metadata writes
+    * [PARQUET-599] - ColumnWriter::RleEncodeLevels' size estimation might be wrong
+    * [PARQUET-617] - C++: Enable conda build to work on systems with non-default C++ toolchains
+    * [PARQUET-627] - Ensure that thrift headers are generated before source compilation
+    * [PARQUET-745] - TypedRowGroupStatistics fails to PlainDecode min and max in ByteArrayType
+    * [PARQUET-738] - Update arrow version that also supports newer Xcode
+    * [PARQUET-747] - [C++] TypedRowGroupStatistics are not being exported in libparquet.so
+    * [PARQUET-711] - Use metadata builders in parquet writer
+    * [PARQUET-732] - Building a subset of dependencies does not work
+    * [PARQUET-760] - On switching from dictionary to the fallback encoding, an incorrect encoding is set
+    * [PARQUET-691] - [C++] Write ColumnChunk metadata after each column chunk in the file
+    * [PARQUET-797] - [C++] Update for API changes in ARROW-418
+    * [PARQUET-837] - [C++] SerializedFile::ParseMetaData uses Seek, followed by Read, and could have race conditions
+    * [PARQUET-827] - [C++] Incorporate addition of arrow::MemoryPool::Reallocate
+    * [PARQUET-502] - Scanner segfaults when its batch size is smaller than the number of rows
+    * [PARQUET-469] - Roll back Thrift bindings to 0.9.0
+    * [PARQUET-889] - Fix compilation when PARQUET_USE_SSE is on
+    * [PARQUET-888] - C++ Memory leak in RowGroupSerializer
+    * [PARQUET-819] - C++: Trying to install non-existing parquet/arrow/utils.h
+    * [PARQUET-736] - XCode 8.0 breaks builds
+    * [PARQUET-505] - Column reader: automatically handle large data pages
+    * [PARQUET-615] - C++: Building static or shared libparquet should not be mutually exclusive
+    * [PARQUET-658] - ColumnReader has no virtual destructor
+    * [PARQUET-799] - concurrent usage of the file reader API
+    * [PARQUET-513] - Valgrind errors are not failing the Travis CI build
+    * [PARQUET-841] - [C++] Writing wrong format version when using ParquetVersion::PARQUET_1_0
+    * [PARQUET-742] - Add missing license headers
+    * [PARQUET-741] - compression_buffer_ is reused although it shouldn't
+    * [PARQUET-700] - C++: Disable dictionary encoding for boolean columns
+    * [PARQUET-662] - [C++] ParquetException must be explicitly exported in dynamic libraries
+    * [PARQUET-704] - [C++] scan-all.h is not being installed
+    * [PARQUET-865] - C++: Pass all CXXFLAGS to Thrift ExternalProject
+    * [PARQUET-875] - [C++] Fix coveralls build given changes to thirdparty build procedure
+    * [PARQUET-709] - [C++] Fix conda dev binary builds
+    * [PARQUET-638] - [C++] Revert static linking of libstdc++ in conda builds until symbol visibility addressed
+    * [PARQUET-606] - Travis coverage is broken
+    * [PARQUET-880] - [CPP] Prevent destructors from throwing
+    * [PARQUET-886] - [C++] Revise build documentation and requirements in README.md
+    * [PARQUET-900] - C++: Fix NOTICE / LICENSE issues
+    * [PARQUET-885] - [C++] Do not search for Thrift in default system paths
+    * [PARQUET-879] - C++: ExternalProject compilation for Thrift fails on older CMake versions
+    * [PARQUET-635] - [C++] Statically link libstdc++ on Linux in conda recipe
+    * [PARQUET-710] - Remove unneeded private member variables from RowGroupReader ABI
+    * [PARQUET-766] - C++: Expose ParquetFileReader through Arrow reader as const
+    * [PARQUET-876] - C++: Correct snapshot version
+    * [PARQUET-821] - [C++] zlib download link is broken
+    * [PARQUET-818] - [C++] Refactor library to share IO, Buffer, and memory management abstractions with Apache Arrow
+    * [PARQUET-537] - LocalFileSource leaks resources
+    * [PARQUET-764] - [CPP] Parquet Writer does not write Boolean values correctly
+    * [PARQUET-812] - [C++] Failure reading BYTE_ARRAY data from file in parquet-compatibility project
+    * [PARQUET-759] - Cannot store columns consisting of empty strings
+    * [PARQUET-846] - [CPP] CpuInfo::Init() is not thread safe
+    * [PARQUET-694] - C++: Revert default data page size back to 1M
+    * [PARQUET-842] - [C++] Impala rejects DOUBLE columns if decimal metadata is set
+    * [PARQUET-708] - [C++] RleEncoder does not account for "worst case scenario" in MaxBufferSize for bit_width > 1
+    * [PARQUET-639] - Do not export DCHECK in public headers
+    * [PARQUET-828] - [C++] "version" field set improperly in file metadata
+    * [PARQUET-891] - [C++] Do not search for Snappy in default system paths
+    * [PARQUET-626] - Fix builds due to unavailable llvm.org apt mirror
+    * [PARQUET-629] - RowGroupSerializer should only close itself once
+    * [PARQUET-472] - Clean up InputStream ownership semantics in ColumnReader
+    * [PARQUET-739] - Rle-decoding uses static buffer that is shared across threads
+    * [PARQUET-561] - ParquetFileReader::Contents PIMPL missing a virtual destructor
+    * [PARQUET-892] - [C++] Clean up link library targets in CMake files
+    * [PARQUET-454] - Address inconsistencies in boolean decoding
+    * [PARQUET-816] - [C++] Failure decoding sample dict-encoded file from parquet-compatibility project
+    * [PARQUET-565] - Use PATH instead of DIRECTORY in get_filename_component to support CMake<2.8.12
+    * [PARQUET-446] - Hide thrift dependency in parquet-cpp
+    * [PARQUET-843] - [C++] Impala unable to read files created by parquet-cpp
+    * [PARQUET-555] - Dictionary page metadata handling inconsistencies
+    * [PARQUET-908] - Fix for PARQUET-890 introduces undefined symbol in libparquet_arrow.so
+    * [PARQUET-793] - [CPP] Do not return incorrect statistics
+    * [PARQUET-887] - C++: Fix issues in release scripts arise in RC1
+
+## Improvement
+    * [PARQUET-277] - Remove boost dependency
+    * [PARQUET-500] - Enable coveralls.io for apache/parquet-cpp
+    * [PARQUET-497] - Decouple Parquet physical file structure from FileReader class
+    * [PARQUET-597] - Add data rates to benchmark output
+    * [PARQUET-522] - #include cleanup with include-what-you-use
+    * [PARQUET-515] - Add "Reset" to LevelEncoder and LevelDecoder
+    * [PARQUET-514] - Automate coveralls.io updates in Travis CI
+    * [PARQUET-551] - Handle compiler warnings due to disabled DCHECKs in release builds
+    * [PARQUET-559] - Enable InputStream as a source to the ParquetFileReader
+    * [PARQUET-562] - Simplified ZSH support in build scripts
+    * [PARQUET-538] - Improve ColumnReader Tests
+    * [PARQUET-541] - Portable build scripts
+    * [PARQUET-724] - Test more advanced properties setting
+    * [PARQUET-641] - Instantiate stringstream only if needed in SerializedPageReader::NextPage
+    * [PARQUET-636] - Expose selection for different encodings
+    * [PARQUET-603] - Implement missing information in schema descriptor
+    * [PARQUET-610] - Print ColumnMetaData for each RowGroup
+    * [PARQUET-600] - Add benchmarks for RLE-Level encoding
+    * [PARQUET-592] - Support compressed writes
+    * [PARQUET-593] - Add API for writing Page statistics
+    * [PARQUET-589] - Implement Chunked InMemoryInputStream for better memory usage
+    * [PARQUET-587] - Implement BufferReader::Read(int64_t,uint8_t*)
+    * [PARQUET-616] - C++: WriteBatch should accept const arrays
+    * [PARQUET-630] - C++: Support link flags for older CMake versions
+    * [PARQUET-634] - Consistent private linking of dependencies
+    * [PARQUET-633] - Add version to WriterProperties
+    * [PARQUET-625] - Improve RLE read performance
+    * [PARQUET-737] - Use absolute namespace in macros
+    * [PARQUET-762] - C++: Use optimistic allocation instead of Arrow Builders
+    * [PARQUET-773] - C++: Check licenses with RAT in CI
+    * [PARQUET-687] - C++: Switch to PLAIN encoding if dictionary grows too large
+    * [PARQUET-784] - C++: Reference Spark, Kudu and FrameOfReference in LICENSE
+    * [PARQUET-809] - [C++] Add API to determine if two files' schemas are compatible
+    * [PARQUET-778] - Standardize the schema output to match the parquet-mr format
+    * [PARQUET-463] - Add DCHECK* macros for assertions in debug builds
+    * [PARQUET-471] - Use the same environment setup script for Travis CI as local sandbox development
+    * [PARQUET-449] - Update to latest parquet.thrift
+    * [PARQUET-496] - Fix cpplint configuration to be more restrictive
+    * [PARQUET-468] - Add a cmake option to generate the Parquet thrift headers with the thriftc in the environment
+    * [PARQUET-482] - Organize src code file structure to have a very clear folder with public headers.
+    * [PARQUET-591] - Page size estimation during writes
+    * [PARQUET-518] - Review usages of size_t and unsigned integers generally per Google style guide
+    * [PARQUET-533] - Simplify RandomAccessSource API to combine Seek/Read
+    * [PARQUET-767] - Add release scripts for parquet-cpp
+    * [PARQUET-699] - Update parquet.thrift from https://github.com/apache/parquet-format
+    * [PARQUET-653] - [C++] Re-enable -static-libstdc++ in dev artifact builds
+    * [PARQUET-763] - C++: Expose ParquetFileReader through Arrow reader
+    * [PARQUET-857] - [C++] Flatten parquet/encodings directory
+    * [PARQUET-862] - Provide default cache size values if CPU info probing is not available
+    * [PARQUET-689] - C++: Compress DataPages eagerly
+    * [PARQUET-874] - [C++] Use default memory allocator from Arrow
+    * [PARQUET-267] - Detach thirdparty code from build configuration.
+    * [PARQUET-418] - Add a utility to print contents of a Parquet file to stdout
+    * [PARQUET-519] - Disable compiler warning suppressions and fix all DEBUG build warnings
+    * [PARQUET-447] - Add Debug and Release build types and associated compiler flags
+    * [PARQUET-868] - C++: Build snappy with optimizations
+    * [PARQUET-894] - Fix compilation warning
+    * [PARQUET-883] - C++: Support non-standard gcc version strings
+    * [PARQUET-607] - Public Writer header
+    * [PARQUET-731] - [CPP] Add API to return metadata size and Skip reading values
+    * [PARQUET-628] - Link thrift privately
+    * [PARQUET-877] - C++: Update Arrow Hash, update Version in metadata.
+    * [PARQUET-547] - Refactor most templates to use DataType structs rather than the Type::type enum
+    * [PARQUET-882] - [CPP] Improve Application Version parsing
+    * [PARQUET-448] - Add cmake option to skip building the unit tests
+    * [PARQUET-721] - Performance benchmarks for reading into Arrow structures
+    * [PARQUET-820] - C++: Decoders should directly emit arrays with spacing for null entries
+    * [PARQUET-813] - C++: Build dependencies using CMake External project
+    * [PARQUET-488] - Add SSE-related cmake options to manage compiler flags
+    * [PARQUET-564] - Add option to run unit tests with valgrind --tool=memcheck
+    * [PARQUET-572] - Rename parquet_cpp namespace to parquet
+    * [PARQUET-829] - C++: Make use of ARROW-469
+    * [PARQUET-501] - Add an OutputStream abstraction (capable of memory allocation) for Encoder public API
+    * [PARQUET-744] - Clarifications on build instructions
+    * [PARQUET-520] - Add version of LocalFileSource that uses memory-mapping for zero-copy reads
+    * [PARQUET-556] - Extend RowGroupStatistics to include "min" "max" statistics
+    * [PARQUET-671] - Improve performance of RLE/bit-packed decoding in parquet-cpp
+    * [PARQUET-681] - Add tool to scan a parquet file
+
+## New Feature
+    * [PARQUET-499] - Complete PlainEncoder implementation for all primitive types and test end to end
+    * [PARQUET-439] - Conform all copyright headers to ASF requirements
+    * [PARQUET-436] - Implement ParquetFileWriter class entry point for generating new Parquet files
+    * [PARQUET-435] - Provide vectorized ColumnReader interface
+    * [PARQUET-438] - Update RLE encoder/decoder modules from Impala upstream changes and adapt unit tests
+    * [PARQUET-512] - Add optional google/benchmark 3rd-party dependency for performance testing
+    * [PARQUET-566] - Add method to retrieve the full column path
+    * [PARQUET-613] - C++: Add conda packaging recipe
+    * [PARQUET-605] - Expose schema node in ColumnDescriptor
+    * [PARQUET-619] - C++: Add OutputStream for local files
+    * [PARQUET-583] - Implement Parquet to Thrift schema conversion
+    * [PARQUET-582] - Conversion functions for Parquet enums to Thrift enums
+    * [PARQUET-728] - [C++] Bring parquet::arrow up to date with API changes in arrow::io
+    * [PARQUET-752] - [C++] Conform parquet_arrow to upstream API changes
+    * [PARQUET-788] - [C++] Reference Impala / Apache Impala (incubating) in LICENSE
+    * [PARQUET-808] - [C++] Add API to read file given externally-provided FileMetadata
+    * [PARQUET-807] - [C++] Add API to read file metadata only from a file handle
+    * [PARQUET-805] - C++: Read Int96 into Arrow Timestamp(ns)
+    * [PARQUET-836] - [C++] Add column selection to parquet::arrow::FileReader
+    * [PARQUET-835] - [C++] Add option to parquet::arrow to read columns in parallel using a thread pool
+    * [PARQUET-830] - [C++] Add additional configuration options to parquet::arrow::OpenFile
+    * [PARQUET-769] - C++: Add support for Brotli Compression
+    * [PARQUET-489] - Add visibility macros to be used for public and internal APIs of libparquet
+    * [PARQUET-542] - Support memory allocation from external memory
+    * [PARQUET-844] - [C++] Consolidate encodings, schema, and compression subdirectories into fewer files
+    * [PARQUET-848] - [C++] Consolidate libparquet_thrift subcomponent
+    * [PARQUET-646] - [C++] Enable easier 3rd-party toolchain clang builds on Linux
+    * [PARQUET-598] - [C++] Test writing all primitive data types
+    * [PARQUET-442] - Convert flat SchemaElement vector to implied nested schema data structure
+    * [PARQUET-867] - [C++] Support writing sliced Arrow arrays
+    * [PARQUET-456] - Add zlib codec support
+    * [PARQUET-834] - C++: Support r/w of arrow::ListArray
+    * [PARQUET-485] - Decouple data page delimiting from column reader / scanner classes, create test fixtures
+    * [PARQUET-434] - Add a ParquetFileReader class to encapsulate some low-level details of interacting with Parquet files
+    * [PARQUET-666] - PLAIN_DICTIONARY write support
+    * [PARQUET-437] - Incorporate googletest thirdparty dependency and add cmake tools (ADD_PARQUET_TEST) to simplify adding new unit tests
+    * [PARQUET-866] - [C++] Account for API changes in ARROW-33
+    * [PARQUET-545] - Improve API to support Decimal type
+    * [PARQUET-579] - Add API for writing Column statistics
+    * [PARQUET-494] - Implement PLAIN_DICTIONARY encoding and decoding
+    * [PARQUET-618] - C++: Automatically upload conda build artifacts on commits to master
+    * [PARQUET-833] - C++: Provide API to write spaced arrays (e.g. Arrow)
+    * [PARQUET-903] - C++: Add option to set RPATH to ORIGIN
+    * [PARQUET-451] - Add a RowGroup reader interface class
+    * [PARQUET-785] - C++: List conversion for Arrow Schemas
+    * [PARQUET-712] - C++: Read into Arrow memory
+    * [PARQUET-890] - C++: Support I/O of DATE columns in parquet_arrow
+    * [PARQUET-782] - C++: Support writing to Arrow sinks
+    * [PARQUET-849] - [C++] Upgrade default Thrift in thirdparty toolchain to 0.9.3 or 0.10
+    * [PARQUET-573] - C++: Create a public API for reading and writing file metadata
+
+## Task
+    * [PARQUET-814] - C++: Remove Conda recipes
+    * [PARQUET-503] - Re-enable parquet 2.0 encodings
+    * [PARQUET-169] - Parquet-cpp: Implement support for bulk reading and writing repetition/definition levels.
+    * [PARQUET-878] - C++: Remove setup_build_env from rc-verification script
+    * [PARQUET-881] - C++: Update Arrow hash to 0.2.0-rc2
+    * [PARQUET-771] - C++: Sync KEYS file
+    * [PARQUET-901] - C++: Publish RCs in apache-parquet-VERSION in SVN
+
+## Test
+    * [PARQUET-525] - Test coverage for malformed file failure modes on the read path
+    * [PARQUET-703] - [C++] Validate num_values metadata for columns with nulls
+    * [PARQUET-507] - Improve runtime of rle-test.cc
+    * [PARQUET-549] - Add scanner and column reader tests for dictionary data pages
+    * [PARQUET-457] - Add compressed data page unit tests

+ 34 - 34
contrib/libs/apache/arrow/cpp/README.md

@@ -1,34 +1,34 @@
-<!--- 
-  Licensed to the Apache Software Foundation (ASF) under one 
-  or more contributor license agreements.  See the NOTICE file 
-  distributed with this work for additional information 
-  regarding copyright ownership.  The ASF licenses this file 
-  to you under the Apache License, Version 2.0 (the 
-  "License"); you may not use this file except in compliance 
-  with the License.  You may obtain a copy of the License at 
- 
-    http://www.apache.org/licenses/LICENSE-2.0 
- 
-  Unless required by applicable law or agreed to in writing, 
-  software distributed under the License is distributed on an 
-  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 
-  KIND, either express or implied.  See the License for the 
-  specific language governing permissions and limitations 
-  under the License. 
---> 
- 
-# Apache Arrow C++ 
- 
-This directory contains the code and build system for the Arrow C++ libraries, 
-as well as for the C++ libraries for Apache Parquet. 
- 
-## Installation 
- 
-See https://arrow.apache.org/install/ for the latest instructions how 
-to install pre-compiled binary versions of the library. 
- 
-## Source Builds and Development 
- 
-Please refer to our latest [C++ Development Documentation][1]. 
- 
-[1]: https://github.com/apache/arrow/blob/master/docs/source/developers/cpp 
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Apache Arrow C++
+
+This directory contains the code and build system for the Arrow C++ libraries,
+as well as for the C++ libraries for Apache Parquet.
+
+## Installation
+
+See https://arrow.apache.org/install/ for the latest instructions on how
+to install pre-compiled binary versions of the library.
+
+## Source Builds and Development
+
+Please refer to our latest [C++ Development Documentation][1].
+
+[1]: https://github.com/apache/arrow/blob/master/docs/source/developers/cpp

+ 32 - 32
contrib/libs/apache/arrow/cpp/src/arrow/array.h

@@ -1,32 +1,32 @@
-// Licensed to the Apache Software Foundation (ASF) under one 
-// or more contributor license agreements.  See the NOTICE file 
-// distributed with this work for additional information 
-// regarding copyright ownership.  The ASF licenses this file 
-// to you under the Apache License, Version 2.0 (the 
-// "License"); you may not use this file except in compliance 
-// with the License.  You may obtain a copy of the License at 
-// 
-//   http://www.apache.org/licenses/LICENSE-2.0 
-// 
-// Unless required by applicable law or agreed to in writing, 
-// software distributed under the License is distributed on an 
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 
-// KIND, either express or implied.  See the License for the 
-// specific language governing permissions and limitations 
-// under the License. 
- 
-// Kitchen-sink public API for arrow::Array data structures. C++ library code 
-// (especially header files) in Apache Arrow should use more specific headers 
-// unless it's a file that uses most or all Array types in which case using 
-// arrow/array.h is fine. 
- 
-#pragma once 
- 
-#include "arrow/array/array_base.h"       // IWYU pragma: keep 
-#include "arrow/array/array_binary.h"     // IWYU pragma: keep 
-#include "arrow/array/array_decimal.h"    // IWYU pragma: keep 
-#include "arrow/array/array_dict.h"       // IWYU pragma: keep 
-#include "arrow/array/array_nested.h"     // IWYU pragma: keep 
-#include "arrow/array/array_primitive.h"  // IWYU pragma: keep 
-#include "arrow/array/data.h"             // IWYU pragma: keep 
-#include "arrow/array/util.h"             // IWYU pragma: keep 
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Kitchen-sink public API for arrow::Array data structures. C++ library code
+// (especially header files) in Apache Arrow should use more specific headers
+// unless it's a file that uses most or all Array types in which case using
+// arrow/array.h is fine.
+
+#pragma once
+
+#include "arrow/array/array_base.h"       // IWYU pragma: keep
+#include "arrow/array/array_binary.h"     // IWYU pragma: keep
+#include "arrow/array/array_decimal.h"    // IWYU pragma: keep
+#include "arrow/array/array_dict.h"       // IWYU pragma: keep
+#include "arrow/array/array_nested.h"     // IWYU pragma: keep
+#include "arrow/array/array_primitive.h"  // IWYU pragma: keep
+#include "arrow/array/data.h"             // IWYU pragma: keep
+#include "arrow/array/util.h"             // IWYU pragma: keep

+ 293 - 293
contrib/libs/apache/arrow/cpp/src/arrow/array/array_base.cc

@@ -1,308 +1,308 @@
-// Licensed to the Apache Software Foundation (ASF) under one 
-// or more contributor license agreements.  See the NOTICE file 
-// distributed with this work for additional information 
-// regarding copyright ownership.  The ASF licenses this file 
-// to you under the Apache License, Version 2.0 (the 
-// "License"); you may not use this file except in compliance 
-// with the License.  You may obtain a copy of the License at 
-// 
-//   http://www.apache.org/licenses/LICENSE-2.0 
-// 
-// Unless required by applicable law or agreed to in writing, 
-// software distributed under the License is distributed on an 
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 
-// KIND, either express or implied.  See the License for the 
-// specific language governing permissions and limitations 
-// under the License. 
- 
-#include "arrow/array/array_base.h" 
- 
-#include <cstdint> 
-#include <memory> 
-#include <sstream>  // IWYU pragma: keep 
-#include <string> 
-#include <type_traits> 
-#include <utility> 
- 
-#include "arrow/array/array_binary.h" 
-#include "arrow/array/array_dict.h" 
-#include "arrow/array/array_nested.h" 
-#include "arrow/array/array_primitive.h" 
-#include "arrow/array/util.h" 
-#include "arrow/array/validate.h" 
-#include "arrow/buffer.h" 
-#include "arrow/compare.h" 
-#include "arrow/pretty_print.h" 
-#include "arrow/scalar.h" 
-#include "arrow/status.h" 
-#include "arrow/type.h" 
-#include "arrow/type_fwd.h" 
-#include "arrow/type_traits.h" 
-#include "arrow/util/logging.h" 
-#include "arrow/visitor.h" 
-#include "arrow/visitor_inline.h" 
- 
-namespace arrow { 
- 
-class ExtensionArray; 
- 
-// ---------------------------------------------------------------------- 
-// Base array class 
- 
-int64_t Array::null_count() const { return data_->GetNullCount(); } 
- 
-namespace internal { 
- 
-struct ScalarFromArraySlotImpl { 
-  template <typename T> 
-  using ScalarType = typename TypeTraits<T>::ScalarType; 
- 
-  Status Visit(const NullArray& a) { 
-    out_ = std::make_shared<NullScalar>(); 
-    return Status::OK(); 
-  } 
- 
-  Status Visit(const BooleanArray& a) { return Finish(a.Value(index_)); } 
- 
-  template <typename T> 
-  Status Visit(const NumericArray<T>& a) { 
-    return Finish(a.Value(index_)); 
-  } 
- 
-  Status Visit(const Decimal128Array& a) { 
-    return Finish(Decimal128(a.GetValue(index_))); 
-  } 
- 
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/array/array_base.h"
+
+#include <cstdint>
+#include <memory>
+#include <sstream>  // IWYU pragma: keep
+#include <string>
+#include <type_traits>
+#include <utility>
+
+#include "arrow/array/array_binary.h"
+#include "arrow/array/array_dict.h"
+#include "arrow/array/array_nested.h"
+#include "arrow/array/array_primitive.h"
+#include "arrow/array/util.h"
+#include "arrow/array/validate.h"
+#include "arrow/buffer.h"
+#include "arrow/compare.h"
+#include "arrow/pretty_print.h"
+#include "arrow/scalar.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/type_fwd.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/logging.h"
+#include "arrow/visitor.h"
+#include "arrow/visitor_inline.h"
+
+namespace arrow {
+
+class ExtensionArray;
+
+// ----------------------------------------------------------------------
+// Base array class
+
+int64_t Array::null_count() const { return data_->GetNullCount(); }
+
+namespace internal {
+
+struct ScalarFromArraySlotImpl {
+  template <typename T>
+  using ScalarType = typename TypeTraits<T>::ScalarType;
+
+  Status Visit(const NullArray& a) {
+    out_ = std::make_shared<NullScalar>();
+    return Status::OK();
+  }
+
+  Status Visit(const BooleanArray& a) { return Finish(a.Value(index_)); }
+
+  template <typename T>
+  Status Visit(const NumericArray<T>& a) {
+    return Finish(a.Value(index_));
+  }
+
+  Status Visit(const Decimal128Array& a) {
+    return Finish(Decimal128(a.GetValue(index_)));
+  }
+
   Status Visit(const Decimal256Array& a) {
     return Finish(Decimal256(a.GetValue(index_)));
   }
 
-  template <typename T> 
-  Status Visit(const BaseBinaryArray<T>& a) { 
-    return Finish(a.GetString(index_)); 
-  } 
- 
-  Status Visit(const FixedSizeBinaryArray& a) { return Finish(a.GetString(index_)); } 
- 
-  Status Visit(const DayTimeIntervalArray& a) { return Finish(a.Value(index_)); } 
- 
-  template <typename T> 
-  Status Visit(const BaseListArray<T>& a) { 
-    return Finish(a.value_slice(index_)); 
-  } 
- 
-  Status Visit(const FixedSizeListArray& a) { return Finish(a.value_slice(index_)); } 
- 
-  Status Visit(const StructArray& a) { 
-    ScalarVector children; 
-    for (const auto& child : a.fields()) { 
-      children.emplace_back(); 
-      ARROW_ASSIGN_OR_RAISE(children.back(), child->GetScalar(index_)); 
-    } 
-    return Finish(std::move(children)); 
-  } 
- 
-  Status Visit(const SparseUnionArray& a) { 
-    // child array which stores the actual value 
-    auto arr = a.field(a.child_id(index_)); 
-    // no need to adjust the index 
-    ARROW_ASSIGN_OR_RAISE(auto value, arr->GetScalar(index_)); 
-    if (value->is_valid) { 
-      out_ = std::shared_ptr<Scalar>(new SparseUnionScalar(value, a.type())); 
-    } else { 
-      out_ = MakeNullScalar(a.type()); 
-    } 
-    return Status::OK(); 
-  } 
- 
-  Status Visit(const DenseUnionArray& a) { 
-    // child array which stores the actual value 
-    auto arr = a.field(a.child_id(index_)); 
-    // need to look up the value based on offsets 
-    auto offset = a.value_offset(index_); 
-    ARROW_ASSIGN_OR_RAISE(auto value, arr->GetScalar(offset)); 
-    if (value->is_valid) { 
-      out_ = std::shared_ptr<Scalar>(new DenseUnionScalar(value, a.type())); 
-    } else { 
-      out_ = MakeNullScalar(a.type()); 
-    } 
-    return Status::OK(); 
-  } 
- 
-  Status Visit(const DictionaryArray& a) { 
-    auto ty = a.type(); 
- 
-    ARROW_ASSIGN_OR_RAISE(auto index, 
-                          MakeScalar(checked_cast<DictionaryType&>(*ty).index_type(), 
-                                     a.GetValueIndex(index_))); 
- 
-    auto scalar = DictionaryScalar(ty); 
-    scalar.is_valid = a.IsValid(index_); 
-    scalar.value.index = index; 
-    scalar.value.dictionary = a.dictionary(); 
- 
-    out_ = std::make_shared<DictionaryScalar>(std::move(scalar)); 
-    return Status::OK(); 
-  } 
- 
-  Status Visit(const ExtensionArray& a) { 
-    return Status::NotImplemented("Non-null ExtensionScalar"); 
-  } 
- 
-  template <typename Arg> 
-  Status Finish(Arg&& arg) { 
-    return MakeScalar(array_.type(), std::forward<Arg>(arg)).Value(&out_); 
-  } 
- 
-  Status Finish(std::string arg) { 
-    return MakeScalar(array_.type(), Buffer::FromString(std::move(arg))).Value(&out_); 
-  } 
- 
-  Result<std::shared_ptr<Scalar>> Finish() && { 
-    if (index_ >= array_.length()) { 
-      return Status::IndexError("tried to refer to element ", index_, 
-                                " but array is only ", array_.length(), " long"); 
-    } 
- 
-    if (array_.IsNull(index_)) { 
-      auto null = MakeNullScalar(array_.type()); 
-      if (is_dictionary(array_.type()->id())) { 
-        auto& dict_null = checked_cast<DictionaryScalar&>(*null); 
-        const auto& dict_array = checked_cast<const DictionaryArray&>(array_); 
-        dict_null.value.dictionary = dict_array.dictionary(); 
-      } 
-      return null; 
-    } 
- 
-    RETURN_NOT_OK(VisitArrayInline(array_, this)); 
-    return std::move(out_); 
-  } 
- 
-  ScalarFromArraySlotImpl(const Array& array, int64_t index) 
-      : array_(array), index_(index) {} 
- 
-  const Array& array_; 
-  int64_t index_; 
-  std::shared_ptr<Scalar> out_; 
-}; 
- 
-}  // namespace internal 
- 
-Result<std::shared_ptr<Scalar>> Array::GetScalar(int64_t i) const { 
-  return internal::ScalarFromArraySlotImpl{*this, i}.Finish(); 
-} 
- 
-std::string Array::Diff(const Array& other) const { 
-  std::stringstream diff; 
-  ARROW_IGNORE_EXPR(Equals(other, EqualOptions().diff_sink(&diff))); 
-  return diff.str(); 
-} 
- 
-bool Array::Equals(const Array& arr, const EqualOptions& opts) const { 
-  return ArrayEquals(*this, arr, opts); 
-} 
- 
-bool Array::Equals(const std::shared_ptr<Array>& arr, const EqualOptions& opts) const { 
-  if (!arr) { 
-    return false; 
-  } 
-  return Equals(*arr, opts); 
-} 
- 
-bool Array::ApproxEquals(const Array& arr, const EqualOptions& opts) const { 
-  return ArrayApproxEquals(*this, arr, opts); 
-} 
- 
-bool Array::ApproxEquals(const std::shared_ptr<Array>& arr, 
-                         const EqualOptions& opts) const { 
-  if (!arr) { 
-    return false; 
-  } 
-  return ApproxEquals(*arr, opts); 
-} 
- 
-bool Array::RangeEquals(const Array& other, int64_t start_idx, int64_t end_idx, 
+  template <typename T>
+  Status Visit(const BaseBinaryArray<T>& a) {
+    return Finish(a.GetString(index_));
+  }
+
+  Status Visit(const FixedSizeBinaryArray& a) { return Finish(a.GetString(index_)); }
+
+  Status Visit(const DayTimeIntervalArray& a) { return Finish(a.Value(index_)); }
+
+  template <typename T>
+  Status Visit(const BaseListArray<T>& a) {
+    return Finish(a.value_slice(index_));
+  }
+
+  Status Visit(const FixedSizeListArray& a) { return Finish(a.value_slice(index_)); }
+
+  Status Visit(const StructArray& a) {
+    ScalarVector children;
+    for (const auto& child : a.fields()) {
+      children.emplace_back();
+      ARROW_ASSIGN_OR_RAISE(children.back(), child->GetScalar(index_));
+    }
+    return Finish(std::move(children));
+  }
+
+  Status Visit(const SparseUnionArray& a) {
+    // child array which stores the actual value
+    auto arr = a.field(a.child_id(index_));
+    // no need to adjust the index
+    ARROW_ASSIGN_OR_RAISE(auto value, arr->GetScalar(index_));
+    if (value->is_valid) {
+      out_ = std::shared_ptr<Scalar>(new SparseUnionScalar(value, a.type()));
+    } else {
+      out_ = MakeNullScalar(a.type());
+    }
+    return Status::OK();
+  }
+
+  Status Visit(const DenseUnionArray& a) {
+    // child array which stores the actual value
+    auto arr = a.field(a.child_id(index_));
+    // need to look up the value based on offsets
+    auto offset = a.value_offset(index_);
+    ARROW_ASSIGN_OR_RAISE(auto value, arr->GetScalar(offset));
+    if (value->is_valid) {
+      out_ = std::shared_ptr<Scalar>(new DenseUnionScalar(value, a.type()));
+    } else {
+      out_ = MakeNullScalar(a.type());
+    }
+    return Status::OK();
+  }
+
+  Status Visit(const DictionaryArray& a) {
+    auto ty = a.type();
+
+    ARROW_ASSIGN_OR_RAISE(auto index,
+                          MakeScalar(checked_cast<DictionaryType&>(*ty).index_type(),
+                                     a.GetValueIndex(index_)));
+
+    auto scalar = DictionaryScalar(ty);
+    scalar.is_valid = a.IsValid(index_);
+    scalar.value.index = index;
+    scalar.value.dictionary = a.dictionary();
+
+    out_ = std::make_shared<DictionaryScalar>(std::move(scalar));
+    return Status::OK();
+  }
+
+  Status Visit(const ExtensionArray& a) {
+    return Status::NotImplemented("Non-null ExtensionScalar");
+  }
+
+  template <typename Arg>
+  Status Finish(Arg&& arg) {
+    return MakeScalar(array_.type(), std::forward<Arg>(arg)).Value(&out_);
+  }
+
+  Status Finish(std::string arg) {
+    return MakeScalar(array_.type(), Buffer::FromString(std::move(arg))).Value(&out_);
+  }
+
+  Result<std::shared_ptr<Scalar>> Finish() && {
+    if (index_ >= array_.length()) {
+      return Status::IndexError("tried to refer to element ", index_,
+                                " but array is only ", array_.length(), " long");
+    }
+
+    if (array_.IsNull(index_)) {
+      auto null = MakeNullScalar(array_.type());
+      if (is_dictionary(array_.type()->id())) {
+        auto& dict_null = checked_cast<DictionaryScalar&>(*null);
+        const auto& dict_array = checked_cast<const DictionaryArray&>(array_);
+        dict_null.value.dictionary = dict_array.dictionary();
+      }
+      return null;
+    }
+
+    RETURN_NOT_OK(VisitArrayInline(array_, this));
+    return std::move(out_);
+  }
+
+  ScalarFromArraySlotImpl(const Array& array, int64_t index)
+      : array_(array), index_(index) {}
+
+  const Array& array_;
+  int64_t index_;
+  std::shared_ptr<Scalar> out_;
+};
+
+}  // namespace internal
+
+Result<std::shared_ptr<Scalar>> Array::GetScalar(int64_t i) const {
+  return internal::ScalarFromArraySlotImpl{*this, i}.Finish();
+}
+
+std::string Array::Diff(const Array& other) const {
+  std::stringstream diff;
+  ARROW_IGNORE_EXPR(Equals(other, EqualOptions().diff_sink(&diff)));
+  return diff.str();
+}
+
+bool Array::Equals(const Array& arr, const EqualOptions& opts) const {
+  return ArrayEquals(*this, arr, opts);
+}
+
+bool Array::Equals(const std::shared_ptr<Array>& arr, const EqualOptions& opts) const {
+  if (!arr) {
+    return false;
+  }
+  return Equals(*arr, opts);
+}
+
+bool Array::ApproxEquals(const Array& arr, const EqualOptions& opts) const {
+  return ArrayApproxEquals(*this, arr, opts);
+}
+
+bool Array::ApproxEquals(const std::shared_ptr<Array>& arr,
+                         const EqualOptions& opts) const {
+  if (!arr) {
+    return false;
+  }
+  return ApproxEquals(*arr, opts);
+}
+
+bool Array::RangeEquals(const Array& other, int64_t start_idx, int64_t end_idx,
                         int64_t other_start_idx, const EqualOptions& opts) const {
   return ArrayRangeEquals(*this, other, start_idx, end_idx, other_start_idx, opts);
-} 
- 
-bool Array::RangeEquals(const std::shared_ptr<Array>& other, int64_t start_idx, 
+}
+
+bool Array::RangeEquals(const std::shared_ptr<Array>& other, int64_t start_idx,
                         int64_t end_idx, int64_t other_start_idx,
                         const EqualOptions& opts) const {
-  if (!other) { 
-    return false; 
-  } 
+  if (!other) {
+    return false;
+  }
   return ArrayRangeEquals(*this, *other, start_idx, end_idx, other_start_idx, opts);
-} 
- 
-bool Array::RangeEquals(int64_t start_idx, int64_t end_idx, int64_t other_start_idx, 
+}
+
+bool Array::RangeEquals(int64_t start_idx, int64_t end_idx, int64_t other_start_idx,
                         const Array& other, const EqualOptions& opts) const {
   return ArrayRangeEquals(*this, other, start_idx, end_idx, other_start_idx, opts);
-} 
- 
-bool Array::RangeEquals(int64_t start_idx, int64_t end_idx, int64_t other_start_idx, 
+}
+
+bool Array::RangeEquals(int64_t start_idx, int64_t end_idx, int64_t other_start_idx,
                         const std::shared_ptr<Array>& other,
                         const EqualOptions& opts) const {
-  if (!other) { 
-    return false; 
-  } 
+  if (!other) {
+    return false;
+  }
   return ArrayRangeEquals(*this, *other, start_idx, end_idx, other_start_idx, opts);
-} 
- 
-std::shared_ptr<Array> Array::Slice(int64_t offset, int64_t length) const { 
-  return MakeArray(data_->Slice(offset, length)); 
-} 
- 
-std::shared_ptr<Array> Array::Slice(int64_t offset) const { 
-  int64_t slice_length = data_->length - offset; 
-  return Slice(offset, slice_length); 
-} 
- 
-Result<std::shared_ptr<Array>> Array::SliceSafe(int64_t offset, int64_t length) const { 
-  ARROW_ASSIGN_OR_RAISE(auto sliced_data, data_->SliceSafe(offset, length)); 
-  return MakeArray(std::move(sliced_data)); 
-} 
- 
-Result<std::shared_ptr<Array>> Array::SliceSafe(int64_t offset) const { 
-  if (offset < 0) { 
-    // Avoid UBSAN in subtraction below 
-    return Status::Invalid("Negative buffer slice offset"); 
-  } 
-  return SliceSafe(offset, data_->length - offset); 
-} 
- 
-std::string Array::ToString() const { 
-  std::stringstream ss; 
-  ARROW_CHECK_OK(PrettyPrint(*this, 0, &ss)); 
-  return ss.str(); 
-} 
- 
-Result<std::shared_ptr<Array>> Array::View( 
-    const std::shared_ptr<DataType>& out_type) const { 
-  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<ArrayData> result, 
-                        internal::GetArrayView(data_, out_type)); 
-  return MakeArray(result); 
-} 
- 
-// ---------------------------------------------------------------------- 
-// NullArray 
- 
-NullArray::NullArray(int64_t length) { 
-  SetData(ArrayData::Make(null(), length, {nullptr}, length)); 
-} 
- 
-// ---------------------------------------------------------------------- 
-// Implement Array::Accept as inline visitor 
- 
-Status Array::Accept(ArrayVisitor* visitor) const { 
-  return VisitArrayInline(*this, visitor); 
-} 
- 
-Status Array::Validate() const { return internal::ValidateArray(*this); } 
- 
-Status Array::ValidateFull() const { 
-  RETURN_NOT_OK(internal::ValidateArray(*this)); 
+}
+
+std::shared_ptr<Array> Array::Slice(int64_t offset, int64_t length) const {
+  return MakeArray(data_->Slice(offset, length));
+}
+
+std::shared_ptr<Array> Array::Slice(int64_t offset) const {
+  int64_t slice_length = data_->length - offset;
+  return Slice(offset, slice_length);
+}
+
+Result<std::shared_ptr<Array>> Array::SliceSafe(int64_t offset, int64_t length) const {
+  ARROW_ASSIGN_OR_RAISE(auto sliced_data, data_->SliceSafe(offset, length));
+  return MakeArray(std::move(sliced_data));
+}
+
+Result<std::shared_ptr<Array>> Array::SliceSafe(int64_t offset) const {
+  if (offset < 0) {
+    // Avoid UBSAN in subtraction below
+    return Status::Invalid("Negative buffer slice offset");
+  }
+  return SliceSafe(offset, data_->length - offset);
+}
+
+std::string Array::ToString() const {
+  std::stringstream ss;
+  ARROW_CHECK_OK(PrettyPrint(*this, 0, &ss));
+  return ss.str();
+}
+
+Result<std::shared_ptr<Array>> Array::View(
+    const std::shared_ptr<DataType>& out_type) const {
+  ARROW_ASSIGN_OR_RAISE(std::shared_ptr<ArrayData> result,
+                        internal::GetArrayView(data_, out_type));
+  return MakeArray(result);
+}
+
+// ----------------------------------------------------------------------
+// NullArray
+
+NullArray::NullArray(int64_t length) {
+  SetData(ArrayData::Make(null(), length, {nullptr}, length));
+}
+
+// ----------------------------------------------------------------------
+// Implement Array::Accept as inline visitor
+
+Status Array::Accept(ArrayVisitor* visitor) const {
+  return VisitArrayInline(*this, visitor);
+}
+
+Status Array::Validate() const { return internal::ValidateArray(*this); }
+
+Status Array::ValidateFull() const {
+  RETURN_NOT_OK(internal::ValidateArray(*this));
   return internal::ValidateArrayFull(*this);
-} 
- 
-}  // namespace arrow 
+}
+
+}  // namespace arrow

+ 244 - 244
contrib/libs/apache/arrow/cpp/src/arrow/array/array_base.h

@@ -1,260 +1,260 @@
-// Licensed to the Apache Software Foundation (ASF) under one 
-// or more contributor license agreements.  See the NOTICE file 
-// distributed with this work for additional information 
-// regarding copyright ownership.  The ASF licenses this file 
-// to you under the Apache License, Version 2.0 (the 
-// "License"); you may not use this file except in compliance 
-// with the License.  You may obtain a copy of the License at 
-// 
-//   http://www.apache.org/licenses/LICENSE-2.0 
-// 
-// Unless required by applicable law or agreed to in writing, 
-// software distributed under the License is distributed on an 
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 
-// KIND, either express or implied.  See the License for the 
-// specific language governing permissions and limitations 
-// under the License. 
- 
-#pragma once 
- 
-#include <cstdint> 
-#include <iosfwd> 
-#include <memory> 
-#include <string> 
-#include <vector> 
- 
-#include "arrow/array/data.h" 
-#include "arrow/buffer.h" 
-#include "arrow/compare.h" 
-#include "arrow/result.h" 
-#include "arrow/status.h" 
-#include "arrow/type.h" 
-#include "arrow/util/bit_util.h" 
-#include "arrow/util/macros.h" 
-#include "arrow/util/visibility.h" 
-#include "arrow/visitor.h" 
- 
-namespace arrow { 
- 
-// ---------------------------------------------------------------------- 
-// User array accessor types 
- 
-/// \brief Array base type 
-/// Immutable data array with some logical type and some length. 
-/// 
-/// Any memory is owned by the respective Buffer instance (or its parents). 
-/// 
-/// The base class is only required to have a null bitmap buffer if the null 
-/// count is greater than 0 
-/// 
-/// If known, the null count can be provided in the base Array constructor. If 
-/// the null count is not known, pass -1 to indicate that the null count is to 
-/// be computed on the first call to null_count() 
-class ARROW_EXPORT Array { 
- public: 
-  virtual ~Array() = default; 
- 
-  /// \brief Return true if value at index is null. Does not boundscheck 
-  bool IsNull(int64_t i) const { 
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <cstdint>
+#include <iosfwd>
+#include <memory>
+#include <string>
+#include <vector>
+
+#include "arrow/array/data.h"
+#include "arrow/buffer.h"
+#include "arrow/compare.h"
+#include "arrow/result.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/macros.h"
+#include "arrow/util/visibility.h"
+#include "arrow/visitor.h"
+
+namespace arrow {
+
+// ----------------------------------------------------------------------
+// User array accessor types
+
+/// \brief Array base type
+/// Immutable data array with some logical type and some length.
+///
+/// Any memory is owned by the respective Buffer instance (or its parents).
+///
+/// The base class is only required to have a null bitmap buffer if the null
+/// count is greater than 0
+///
+/// If known, the null count can be provided in the base Array constructor. If
+/// the null count is not known, pass -1 to indicate that the null count is to
+/// be computed on the first call to null_count()
+class ARROW_EXPORT Array {
+ public:
+  virtual ~Array() = default;
+
+  /// \brief Return true if value at index is null. Does not boundscheck
+  bool IsNull(int64_t i) const {
     return null_bitmap_data_ != NULLPTR
                ? !BitUtil::GetBit(null_bitmap_data_, i + data_->offset)
                : data_->null_count == data_->length;
-  } 
- 
-  /// \brief Return true if value at index is valid (not null). Does not 
-  /// boundscheck 
-  bool IsValid(int64_t i) const { 
+  }
+
+  /// \brief Return true if value at index is valid (not null). Does not
+  /// boundscheck
+  bool IsValid(int64_t i) const {
     return null_bitmap_data_ != NULLPTR
                ? BitUtil::GetBit(null_bitmap_data_, i + data_->offset)
                : data_->null_count != data_->length;
-  } 
- 
-  /// \brief Return a Scalar containing the value of this array at i 
-  Result<std::shared_ptr<Scalar>> GetScalar(int64_t i) const; 
- 
-  /// Size in the number of elements this array contains. 
-  int64_t length() const { return data_->length; } 
- 
-  /// A relative position into another array's data, to enable zero-copy 
-  /// slicing. This value defaults to zero 
-  int64_t offset() const { return data_->offset; } 
- 
-  /// The number of null entries in the array. If the null count was not known 
-  /// at time of construction (and set to a negative value), then the null 
-  /// count will be computed and cached on the first invocation of this 
-  /// function 
-  int64_t null_count() const; 
- 
-  std::shared_ptr<DataType> type() const { return data_->type; } 
-  Type::type type_id() const { return data_->type->id(); } 
- 
-  /// Buffer for the validity (null) bitmap, if any. Note that Union types 
-  /// never have a null bitmap. 
-  /// 
-  /// Note that for `null_count == 0` or for null type, this will be null. 
-  /// This buffer does not account for any slice offset 
+  }
+
+  /// \brief Return a Scalar containing the value of this array at i
+  Result<std::shared_ptr<Scalar>> GetScalar(int64_t i) const;
+
+  /// Size in the number of elements this array contains.
+  int64_t length() const { return data_->length; }
+
+  /// A relative position into another array's data, to enable zero-copy
+  /// slicing. This value defaults to zero
+  int64_t offset() const { return data_->offset; }
+
+  /// The number of null entries in the array. If the null count was not known
+  /// at time of construction (and set to a negative value), then the null
+  /// count will be computed and cached on the first invocation of this
+  /// function
+  int64_t null_count() const;
+
+  std::shared_ptr<DataType> type() const { return data_->type; }
+  Type::type type_id() const { return data_->type->id(); }
+
+  /// Buffer for the validity (null) bitmap, if any. Note that Union types
+  /// never have a null bitmap.
+  ///
+  /// Note that for `null_count == 0` or for null type, this will be null.
+  /// This buffer does not account for any slice offset
   const std::shared_ptr<Buffer>& null_bitmap() const { return data_->buffers[0]; }
- 
-  /// Raw pointer to the null bitmap. 
-  /// 
-  /// Note that for `null_count == 0` or for null type, this will be null. 
-  /// This buffer does not account for any slice offset 
-  const uint8_t* null_bitmap_data() const { return null_bitmap_data_; } 
- 
-  /// Equality comparison with another array 
-  bool Equals(const Array& arr, const EqualOptions& = EqualOptions::Defaults()) const; 
-  bool Equals(const std::shared_ptr<Array>& arr, 
-              const EqualOptions& = EqualOptions::Defaults()) const; 
- 
-  /// \brief Return the formatted unified diff of arrow::Diff between this 
-  /// Array and another Array 
-  std::string Diff(const Array& other) const; 
- 
-  /// Approximate equality comparison with another array 
-  /// 
-  /// epsilon is only used if this is FloatArray or DoubleArray 
-  bool ApproxEquals(const std::shared_ptr<Array>& arr, 
-                    const EqualOptions& = EqualOptions::Defaults()) const; 
-  bool ApproxEquals(const Array& arr, 
-                    const EqualOptions& = EqualOptions::Defaults()) const; 
- 
-  /// Compare if the range of slots specified are equal for the given array and 
-  /// this array.  end_idx exclusive.  This methods does not bounds check. 
-  bool RangeEquals(int64_t start_idx, int64_t end_idx, int64_t other_start_idx, 
+
+  /// Raw pointer to the null bitmap.
+  ///
+  /// Note that for `null_count == 0` or for null type, this will be null.
+  /// This buffer does not account for any slice offset
+  const uint8_t* null_bitmap_data() const { return null_bitmap_data_; }
+
+  /// Equality comparison with another array
+  bool Equals(const Array& arr, const EqualOptions& = EqualOptions::Defaults()) const;
+  bool Equals(const std::shared_ptr<Array>& arr,
+              const EqualOptions& = EqualOptions::Defaults()) const;
+
+  /// \brief Return the formatted unified diff of arrow::Diff between this
+  /// Array and another Array
+  std::string Diff(const Array& other) const;
+
+  /// Approximate equality comparison with another array
+  ///
+  /// epsilon is only used if this is FloatArray or DoubleArray
+  bool ApproxEquals(const std::shared_ptr<Array>& arr,
+                    const EqualOptions& = EqualOptions::Defaults()) const;
+  bool ApproxEquals(const Array& arr,
+                    const EqualOptions& = EqualOptions::Defaults()) const;
+
+  /// Compare if the range of slots specified are equal for the given array and
+  /// this array.  end_idx exclusive.  This method does not bounds check.
+  bool RangeEquals(int64_t start_idx, int64_t end_idx, int64_t other_start_idx,
                    const Array& other,
                    const EqualOptions& = EqualOptions::Defaults()) const;
-  bool RangeEquals(int64_t start_idx, int64_t end_idx, int64_t other_start_idx, 
+  bool RangeEquals(int64_t start_idx, int64_t end_idx, int64_t other_start_idx,
                    const std::shared_ptr<Array>& other,
                    const EqualOptions& = EqualOptions::Defaults()) const;
-  bool RangeEquals(const Array& other, int64_t start_idx, int64_t end_idx, 
+  bool RangeEquals(const Array& other, int64_t start_idx, int64_t end_idx,
                    int64_t other_start_idx,
                    const EqualOptions& = EqualOptions::Defaults()) const;
-  bool RangeEquals(const std::shared_ptr<Array>& other, int64_t start_idx, 
+  bool RangeEquals(const std::shared_ptr<Array>& other, int64_t start_idx,
                    int64_t end_idx, int64_t other_start_idx,
                    const EqualOptions& = EqualOptions::Defaults()) const;
- 
-  Status Accept(ArrayVisitor* visitor) const; 
- 
-  /// Construct a zero-copy view of this array with the given type. 
-  /// 
-  /// This method checks if the types are layout-compatible. 
-  /// Nested types are traversed in depth-first order. Data buffers must have 
-  /// the same item sizes, even though the logical types may be different. 
-  /// An error is returned if the types are not layout-compatible. 
-  Result<std::shared_ptr<Array>> View(const std::shared_ptr<DataType>& type) const; 
- 
-  /// Construct a zero-copy slice of the array with the indicated offset and 
-  /// length 
-  /// 
-  /// \param[in] offset the position of the first element in the constructed 
-  /// slice 
-  /// \param[in] length the length of the slice. If there are not enough 
-  /// elements in the array, the length will be adjusted accordingly 
-  /// 
-  /// \return a new object wrapped in std::shared_ptr<Array> 
-  std::shared_ptr<Array> Slice(int64_t offset, int64_t length) const; 
- 
-  /// Slice from offset until end of the array 
-  std::shared_ptr<Array> Slice(int64_t offset) const; 
- 
-  /// Input-checking variant of Array::Slice 
-  Result<std::shared_ptr<Array>> SliceSafe(int64_t offset, int64_t length) const; 
-  /// Input-checking variant of Array::Slice 
-  Result<std::shared_ptr<Array>> SliceSafe(int64_t offset) const; 
- 
+
+  Status Accept(ArrayVisitor* visitor) const;
+
+  /// Construct a zero-copy view of this array with the given type.
+  ///
+  /// This method checks if the types are layout-compatible.
+  /// Nested types are traversed in depth-first order. Data buffers must have
+  /// the same item sizes, even though the logical types may be different.
+  /// An error is returned if the types are not layout-compatible.
+  Result<std::shared_ptr<Array>> View(const std::shared_ptr<DataType>& type) const;
+
+  /// Construct a zero-copy slice of the array with the indicated offset and
+  /// length
+  ///
+  /// \param[in] offset the position of the first element in the constructed
+  /// slice
+  /// \param[in] length the length of the slice. If there are not enough
+  /// elements in the array, the length will be adjusted accordingly
+  ///
+  /// \return a new object wrapped in std::shared_ptr<Array>
+  std::shared_ptr<Array> Slice(int64_t offset, int64_t length) const;
+
+  /// Slice from offset until end of the array
+  std::shared_ptr<Array> Slice(int64_t offset) const;
+
+  /// Input-checking variant of Array::Slice
+  Result<std::shared_ptr<Array>> SliceSafe(int64_t offset, int64_t length) const;
+  /// Input-checking variant of Array::Slice
+  Result<std::shared_ptr<Array>> SliceSafe(int64_t offset) const;
+
   const std::shared_ptr<ArrayData>& data() const { return data_; }
- 
-  int num_fields() const { return static_cast<int>(data_->child_data.size()); } 
- 
-  /// \return PrettyPrint representation of array suitable for debugging 
-  std::string ToString() const; 
- 
-  /// \brief Perform cheap validation checks to determine obvious inconsistencies 
-  /// within the array's internal data. 
-  /// 
-  /// This is O(k) where k is the number of descendents. 
-  /// 
-  /// \return Status 
-  Status Validate() const; 
- 
-  /// \brief Perform extensive validation checks to determine inconsistencies 
-  /// within the array's internal data. 
-  /// 
-  /// This is potentially O(k*n) where k is the number of descendents and n 
-  /// is the array length. 
-  /// 
-  /// \return Status 
-  Status ValidateFull() const; 
- 
- protected: 
-  Array() : null_bitmap_data_(NULLPTR) {} 
- 
-  std::shared_ptr<ArrayData> data_; 
-  const uint8_t* null_bitmap_data_; 
- 
-  /// Protected method for constructors 
-  void SetData(const std::shared_ptr<ArrayData>& data) { 
-    if (data->buffers.size() > 0) { 
-      null_bitmap_data_ = data->GetValuesSafe<uint8_t>(0, /*offset=*/0); 
-    } else { 
-      null_bitmap_data_ = NULLPTR; 
-    } 
-    data_ = data; 
-  } 
- 
- private: 
-  ARROW_DISALLOW_COPY_AND_ASSIGN(Array); 
-}; 
- 
-static inline std::ostream& operator<<(std::ostream& os, const Array& x) { 
-  os << x.ToString(); 
-  return os; 
-} 
- 
-/// Base class for non-nested arrays 
-class ARROW_EXPORT FlatArray : public Array { 
- protected: 
-  using Array::Array; 
-}; 
- 
-/// Base class for arrays of fixed-size logical types 
-class ARROW_EXPORT PrimitiveArray : public FlatArray { 
- public: 
-  PrimitiveArray(const std::shared_ptr<DataType>& type, int64_t length, 
-                 const std::shared_ptr<Buffer>& data, 
-                 const std::shared_ptr<Buffer>& null_bitmap = NULLPTR, 
-                 int64_t null_count = kUnknownNullCount, int64_t offset = 0); 
- 
-  /// Does not account for any slice offset 
-  std::shared_ptr<Buffer> values() const { return data_->buffers[1]; } 
- 
- protected: 
-  PrimitiveArray() : raw_values_(NULLPTR) {} 
- 
-  void SetData(const std::shared_ptr<ArrayData>& data) { 
-    this->Array::SetData(data); 
-    raw_values_ = data->GetValuesSafe<uint8_t>(1, /*offset=*/0); 
-  } 
- 
-  explicit PrimitiveArray(const std::shared_ptr<ArrayData>& data) { SetData(data); } 
- 
-  const uint8_t* raw_values_; 
-}; 
- 
-/// Degenerate null type Array 
-class ARROW_EXPORT NullArray : public FlatArray { 
- public: 
-  using TypeClass = NullType; 
- 
-  explicit NullArray(const std::shared_ptr<ArrayData>& data) { SetData(data); } 
-  explicit NullArray(int64_t length); 
- 
- private: 
-  void SetData(const std::shared_ptr<ArrayData>& data) { 
-    null_bitmap_data_ = NULLPTR; 
-    data->null_count = data->length; 
-    data_ = data; 
-  } 
-}; 
- 
-}  // namespace arrow 
+
+  int num_fields() const { return static_cast<int>(data_->child_data.size()); }
+
+  /// \return PrettyPrint representation of array suitable for debugging
+  std::string ToString() const;
+
+  /// \brief Perform cheap validation checks to determine obvious inconsistencies
+  /// within the array's internal data.
+  ///
+  /// This is O(k) where k is the number of descendants.
+  ///
+  /// \return Status
+  Status Validate() const;
+
+  /// \brief Perform extensive validation checks to determine inconsistencies
+  /// within the array's internal data.
+  ///
+  /// This is potentially O(k*n) where k is the number of descendants and n
+  /// is the array length.
+  ///
+  /// \return Status
+  Status ValidateFull() const;
+
+ protected:
+  Array() : null_bitmap_data_(NULLPTR) {}
+
+  std::shared_ptr<ArrayData> data_;
+  const uint8_t* null_bitmap_data_;
+
+  /// Protected method for constructors
+  void SetData(const std::shared_ptr<ArrayData>& data) {
+    if (data->buffers.size() > 0) {
+      null_bitmap_data_ = data->GetValuesSafe<uint8_t>(0, /*offset=*/0);
+    } else {
+      null_bitmap_data_ = NULLPTR;
+    }
+    data_ = data;
+  }
+
+ private:
+  ARROW_DISALLOW_COPY_AND_ASSIGN(Array);
+};
+
+static inline std::ostream& operator<<(std::ostream& os, const Array& x) {
+  os << x.ToString();
+  return os;
+}
+
+/// Base class for non-nested arrays
+class ARROW_EXPORT FlatArray : public Array {
+ protected:
+  using Array::Array;
+};
+
+/// Base class for arrays of fixed-size logical types
+class ARROW_EXPORT PrimitiveArray : public FlatArray {
+ public:
+  PrimitiveArray(const std::shared_ptr<DataType>& type, int64_t length,
+                 const std::shared_ptr<Buffer>& data,
+                 const std::shared_ptr<Buffer>& null_bitmap = NULLPTR,
+                 int64_t null_count = kUnknownNullCount, int64_t offset = 0);
+
+  /// Does not account for any slice offset
+  std::shared_ptr<Buffer> values() const { return data_->buffers[1]; }
+
+ protected:
+  PrimitiveArray() : raw_values_(NULLPTR) {}
+
+  void SetData(const std::shared_ptr<ArrayData>& data) {
+    this->Array::SetData(data);
+    raw_values_ = data->GetValuesSafe<uint8_t>(1, /*offset=*/0);
+  }
+
+  explicit PrimitiveArray(const std::shared_ptr<ArrayData>& data) { SetData(data); }
+
+  const uint8_t* raw_values_;
+};
+
+/// Degenerate null type Array
+class ARROW_EXPORT NullArray : public FlatArray {
+ public:
+  using TypeClass = NullType;
+
+  explicit NullArray(const std::shared_ptr<ArrayData>& data) { SetData(data); }
+  explicit NullArray(int64_t length);
+
+ private:
+  void SetData(const std::shared_ptr<ArrayData>& data) {
+    null_bitmap_data_ = NULLPTR;
+    data->null_count = data->length;
+    data_ = data;
+  }
+};
+
+}  // namespace arrow

+ 102 - 102
contrib/libs/apache/arrow/cpp/src/arrow/array/array_binary.cc

@@ -1,108 +1,108 @@
-// Licensed to the Apache Software Foundation (ASF) under one 
-// or more contributor license agreements.  See the NOTICE file 
-// distributed with this work for additional information 
-// regarding copyright ownership.  The ASF licenses this file 
-// to you under the Apache License, Version 2.0 (the 
-// "License"); you may not use this file except in compliance 
-// with the License.  You may obtain a copy of the License at 
-// 
-//   http://www.apache.org/licenses/LICENSE-2.0 
-// 
-// Unless required by applicable law or agreed to in writing, 
-// software distributed under the License is distributed on an 
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 
-// KIND, either express or implied.  See the License for the 
-// specific language governing permissions and limitations 
-// under the License. 
- 
-#include "arrow/array/array_binary.h" 
- 
-#include <cstdint> 
-#include <memory> 
- 
-#include "arrow/array/array_base.h" 
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/array/array_binary.h"
+
+#include <cstdint>
+#include <memory>
+
+#include "arrow/array/array_base.h"
 #include "arrow/array/validate.h"
-#include "arrow/type.h" 
+#include "arrow/type.h"
 #include "arrow/type_traits.h"
-#include "arrow/util/checked_cast.h" 
-#include "arrow/util/logging.h" 
- 
-namespace arrow { 
- 
-using internal::checked_cast; 
- 
-BinaryArray::BinaryArray(const std::shared_ptr<ArrayData>& data) { 
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
+
+namespace arrow {
+
+using internal::checked_cast;
+
+BinaryArray::BinaryArray(const std::shared_ptr<ArrayData>& data) {
   ARROW_CHECK(is_binary_like(data->type->id()));
-  SetData(data); 
-} 
- 
-BinaryArray::BinaryArray(int64_t length, const std::shared_ptr<Buffer>& value_offsets, 
-                         const std::shared_ptr<Buffer>& data, 
-                         const std::shared_ptr<Buffer>& null_bitmap, int64_t null_count, 
-                         int64_t offset) { 
-  SetData(ArrayData::Make(binary(), length, {null_bitmap, value_offsets, data}, 
-                          null_count, offset)); 
-} 
- 
-LargeBinaryArray::LargeBinaryArray(const std::shared_ptr<ArrayData>& data) { 
+  SetData(data);
+}
+
+BinaryArray::BinaryArray(int64_t length, const std::shared_ptr<Buffer>& value_offsets,
+                         const std::shared_ptr<Buffer>& data,
+                         const std::shared_ptr<Buffer>& null_bitmap, int64_t null_count,
+                         int64_t offset) {
+  SetData(ArrayData::Make(binary(), length, {null_bitmap, value_offsets, data},
+                          null_count, offset));
+}
+
+LargeBinaryArray::LargeBinaryArray(const std::shared_ptr<ArrayData>& data) {
   ARROW_CHECK(is_large_binary_like(data->type->id()));
-  SetData(data); 
-} 
- 
-LargeBinaryArray::LargeBinaryArray(int64_t length, 
-                                   const std::shared_ptr<Buffer>& value_offsets, 
-                                   const std::shared_ptr<Buffer>& data, 
-                                   const std::shared_ptr<Buffer>& null_bitmap, 
-                                   int64_t null_count, int64_t offset) { 
-  SetData(ArrayData::Make(large_binary(), length, {null_bitmap, value_offsets, data}, 
-                          null_count, offset)); 
-} 
- 
-StringArray::StringArray(const std::shared_ptr<ArrayData>& data) { 
-  ARROW_CHECK_EQ(data->type->id(), Type::STRING); 
-  SetData(data); 
-} 
- 
-StringArray::StringArray(int64_t length, const std::shared_ptr<Buffer>& value_offsets, 
-                         const std::shared_ptr<Buffer>& data, 
-                         const std::shared_ptr<Buffer>& null_bitmap, int64_t null_count, 
-                         int64_t offset) { 
-  SetData(ArrayData::Make(utf8(), length, {null_bitmap, value_offsets, data}, null_count, 
-                          offset)); 
-} 
- 
+  SetData(data);
+}
+
+LargeBinaryArray::LargeBinaryArray(int64_t length,
+                                   const std::shared_ptr<Buffer>& value_offsets,
+                                   const std::shared_ptr<Buffer>& data,
+                                   const std::shared_ptr<Buffer>& null_bitmap,
+                                   int64_t null_count, int64_t offset) {
+  SetData(ArrayData::Make(large_binary(), length, {null_bitmap, value_offsets, data},
+                          null_count, offset));
+}
+
+StringArray::StringArray(const std::shared_ptr<ArrayData>& data) {
+  ARROW_CHECK_EQ(data->type->id(), Type::STRING);
+  SetData(data);
+}
+
+StringArray::StringArray(int64_t length, const std::shared_ptr<Buffer>& value_offsets,
+                         const std::shared_ptr<Buffer>& data,
+                         const std::shared_ptr<Buffer>& null_bitmap, int64_t null_count,
+                         int64_t offset) {
+  SetData(ArrayData::Make(utf8(), length, {null_bitmap, value_offsets, data}, null_count,
+                          offset));
+}
+
 Status StringArray::ValidateUTF8() const { return internal::ValidateUTF8(*data_); }
- 
-LargeStringArray::LargeStringArray(const std::shared_ptr<ArrayData>& data) { 
-  ARROW_CHECK_EQ(data->type->id(), Type::LARGE_STRING); 
-  SetData(data); 
-} 
- 
-LargeStringArray::LargeStringArray(int64_t length, 
-                                   const std::shared_ptr<Buffer>& value_offsets, 
-                                   const std::shared_ptr<Buffer>& data, 
-                                   const std::shared_ptr<Buffer>& null_bitmap, 
-                                   int64_t null_count, int64_t offset) { 
-  SetData(ArrayData::Make(large_utf8(), length, {null_bitmap, value_offsets, data}, 
-                          null_count, offset)); 
-} 
- 
+
+LargeStringArray::LargeStringArray(const std::shared_ptr<ArrayData>& data) {
+  ARROW_CHECK_EQ(data->type->id(), Type::LARGE_STRING);
+  SetData(data);
+}
+
+LargeStringArray::LargeStringArray(int64_t length,
+                                   const std::shared_ptr<Buffer>& value_offsets,
+                                   const std::shared_ptr<Buffer>& data,
+                                   const std::shared_ptr<Buffer>& null_bitmap,
+                                   int64_t null_count, int64_t offset) {
+  SetData(ArrayData::Make(large_utf8(), length, {null_bitmap, value_offsets, data},
+                          null_count, offset));
+}
+
 Status LargeStringArray::ValidateUTF8() const { return internal::ValidateUTF8(*data_); }
- 
-FixedSizeBinaryArray::FixedSizeBinaryArray(const std::shared_ptr<ArrayData>& data) { 
-  SetData(data); 
-} 
- 
-FixedSizeBinaryArray::FixedSizeBinaryArray(const std::shared_ptr<DataType>& type, 
-                                           int64_t length, 
-                                           const std::shared_ptr<Buffer>& data, 
-                                           const std::shared_ptr<Buffer>& null_bitmap, 
-                                           int64_t null_count, int64_t offset) 
-    : PrimitiveArray(type, length, data, null_bitmap, null_count, offset), 
-      byte_width_(checked_cast<const FixedSizeBinaryType&>(*type).byte_width()) {} 
- 
-const uint8_t* FixedSizeBinaryArray::GetValue(int64_t i) const { 
-  return raw_values_ + (i + data_->offset) * byte_width_; 
-} 
- 
-}  // namespace arrow 
+
+FixedSizeBinaryArray::FixedSizeBinaryArray(const std::shared_ptr<ArrayData>& data) {
+  SetData(data);
+}
+
+FixedSizeBinaryArray::FixedSizeBinaryArray(const std::shared_ptr<DataType>& type,
+                                           int64_t length,
+                                           const std::shared_ptr<Buffer>& data,
+                                           const std::shared_ptr<Buffer>& null_bitmap,
+                                           int64_t null_count, int64_t offset)
+    : PrimitiveArray(type, length, data, null_bitmap, null_count, offset),
+      byte_width_(checked_cast<const FixedSizeBinaryType&>(*type).byte_width()) {}
+
+const uint8_t* FixedSizeBinaryArray::GetValue(int64_t i) const {
+  return raw_values_ + (i + data_->offset) * byte_width_;
+}
+
+}  // namespace arrow

+ 234 - 234
contrib/libs/apache/arrow/cpp/src/arrow/array/array_binary.h

@@ -1,76 +1,76 @@
-// Licensed to the Apache Software Foundation (ASF) under one 
-// or more contributor license agreements.  See the NOTICE file 
-// distributed with this work for additional information 
-// regarding copyright ownership.  The ASF licenses this file 
-// to you under the Apache License, Version 2.0 (the 
-// "License"); you may not use this file except in compliance 
-// with the License.  You may obtain a copy of the License at 
-// 
-//   http://www.apache.org/licenses/LICENSE-2.0 
-// 
-// Unless required by applicable law or agreed to in writing, 
-// software distributed under the License is distributed on an 
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 
-// KIND, either express or implied.  See the License for the 
-// specific language governing permissions and limitations 
-// under the License. 
- 
-// Array accessor classes for Binary, LargeBinart, String, LargeString, 
-// FixedSizeBinary 
- 
-#pragma once 
- 
-#include <cstdint> 
-#include <memory> 
-#include <string> 
-#include <vector> 
- 
-#include "arrow/array/array_base.h" 
-#include "arrow/array/data.h" 
-#include "arrow/buffer.h" 
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Array accessor classes for Binary, LargeBinary, String, LargeString,
+// FixedSizeBinary
+
+#pragma once
+
+#include <cstdint>
+#include <memory>
+#include <string>
+#include <vector>
+
+#include "arrow/array/array_base.h"
+#include "arrow/array/data.h"
+#include "arrow/buffer.h"
 #include "arrow/stl_iterator.h"
-#include "arrow/type.h" 
-#include "arrow/util/checked_cast.h" 
-#include "arrow/util/macros.h" 
-#include "arrow/util/string_view.h"  // IWYU pragma: export 
-#include "arrow/util/visibility.h" 
- 
-namespace arrow { 
- 
-// ---------------------------------------------------------------------- 
-// Binary and String 
- 
-/// Base class for variable-sized binary arrays, regardless of offset size 
-/// and logical interpretation. 
-template <typename TYPE> 
-class BaseBinaryArray : public FlatArray { 
- public: 
-  using TypeClass = TYPE; 
-  using offset_type = typename TypeClass::offset_type; 
+#include "arrow/type.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/macros.h"
+#include "arrow/util/string_view.h"  // IWYU pragma: export
+#include "arrow/util/visibility.h"
+
+namespace arrow {
+
+// ----------------------------------------------------------------------
+// Binary and String
+
+/// Base class for variable-sized binary arrays, regardless of offset size
+/// and logical interpretation.
+template <typename TYPE>
+class BaseBinaryArray : public FlatArray {
+ public:
+  using TypeClass = TYPE;
+  using offset_type = typename TypeClass::offset_type;
   using IteratorType = stl::ArrayIterator<BaseBinaryArray<TYPE>>;
- 
-  /// Return the pointer to the given elements bytes 
-  // XXX should GetValue(int64_t i) return a string_view? 
-  const uint8_t* GetValue(int64_t i, offset_type* out_length) const { 
-    // Account for base offset 
-    i += data_->offset; 
-    const offset_type pos = raw_value_offsets_[i]; 
-    *out_length = raw_value_offsets_[i + 1] - pos; 
-    return raw_data_ + pos; 
-  } 
- 
-  /// \brief Get binary value as a string_view 
-  /// 
-  /// \param i the value index 
-  /// \return the view over the selected value 
-  util::string_view GetView(int64_t i) const { 
-    // Account for base offset 
-    i += data_->offset; 
-    const offset_type pos = raw_value_offsets_[i]; 
-    return util::string_view(reinterpret_cast<const char*>(raw_data_ + pos), 
-                             raw_value_offsets_[i + 1] - pos); 
-  } 
- 
+
+  /// Return the pointer to the given elements bytes
+  // XXX should GetValue(int64_t i) return a string_view?
+  const uint8_t* GetValue(int64_t i, offset_type* out_length) const {
+    // Account for base offset
+    i += data_->offset;
+    const offset_type pos = raw_value_offsets_[i];
+    *out_length = raw_value_offsets_[i + 1] - pos;
+    return raw_data_ + pos;
+  }
+
+  /// \brief Get binary value as a string_view
+  ///
+  /// \param i the value index
+  /// \return the view over the selected value
+  util::string_view GetView(int64_t i) const {
+    // Account for base offset
+    i += data_->offset;
+    const offset_type pos = raw_value_offsets_[i];
+    return util::string_view(reinterpret_cast<const char*>(raw_data_ + pos),
+                             raw_value_offsets_[i + 1] - pos);
+  }
+
   /// \brief Get binary value as a string_view
   /// Provided for consistency with other arrays.
   ///
@@ -78,178 +78,178 @@ class BaseBinaryArray : public FlatArray {
   /// \return the view over the selected value
   util::string_view Value(int64_t i) const { return GetView(i); }
 
-  /// \brief Get binary value as a std::string 
-  /// 
-  /// \param i the value index 
-  /// \return the value copied into a std::string 
-  std::string GetString(int64_t i) const { return std::string(GetView(i)); } 
- 
-  /// Note that this buffer does not account for any slice offset 
-  std::shared_ptr<Buffer> value_offsets() const { return data_->buffers[1]; } 
- 
-  /// Note that this buffer does not account for any slice offset 
-  std::shared_ptr<Buffer> value_data() const { return data_->buffers[2]; } 
- 
-  const offset_type* raw_value_offsets() const { 
-    return raw_value_offsets_ + data_->offset; 
-  } 
- 
-  const uint8_t* raw_data() const { return raw_data_; } 
- 
-  /// \brief Return the data buffer absolute offset of the data for the value 
-  /// at the passed index. 
-  /// 
-  /// Does not perform boundschecking 
-  offset_type value_offset(int64_t i) const { 
-    return raw_value_offsets_[i + data_->offset]; 
-  } 
- 
-  /// \brief Return the length of the data for the value at the passed index. 
-  /// 
-  /// Does not perform boundschecking 
-  offset_type value_length(int64_t i) const { 
-    i += data_->offset; 
-    return raw_value_offsets_[i + 1] - raw_value_offsets_[i]; 
-  } 
- 
-  /// \brief Return the total length of the memory in the data buffer 
-  /// referenced by this array. If the array has been sliced then this may be 
-  /// less than the size of the data buffer (data_->buffers[2]). 
-  offset_type total_values_length() const { 
-    if (data_->length > 0) { 
-      return raw_value_offsets_[data_->length + data_->offset] - 
-             raw_value_offsets_[data_->offset]; 
-    } else { 
-      return 0; 
-    } 
-  } 
- 
+  /// \brief Get binary value as a std::string
+  ///
+  /// \param i the value index
+  /// \return the value copied into a std::string
+  std::string GetString(int64_t i) const { return std::string(GetView(i)); }
+
+  /// Note that this buffer does not account for any slice offset
+  std::shared_ptr<Buffer> value_offsets() const { return data_->buffers[1]; }
+
+  /// Note that this buffer does not account for any slice offset
+  std::shared_ptr<Buffer> value_data() const { return data_->buffers[2]; }
+
+  const offset_type* raw_value_offsets() const {
+    return raw_value_offsets_ + data_->offset;
+  }
+
+  const uint8_t* raw_data() const { return raw_data_; }
+
+  /// \brief Return the data buffer absolute offset of the data for the value
+  /// at the passed index.
+  ///
+  /// Does not perform boundschecking
+  offset_type value_offset(int64_t i) const {
+    return raw_value_offsets_[i + data_->offset];
+  }
+
+  /// \brief Return the length of the data for the value at the passed index.
+  ///
+  /// Does not perform boundschecking
+  offset_type value_length(int64_t i) const {
+    i += data_->offset;
+    return raw_value_offsets_[i + 1] - raw_value_offsets_[i];
+  }
+
+  /// \brief Return the total length of the memory in the data buffer
+  /// referenced by this array. If the array has been sliced then this may be
+  /// less than the size of the data buffer (data_->buffers[2]).
+  offset_type total_values_length() const {
+    if (data_->length > 0) {
+      return raw_value_offsets_[data_->length + data_->offset] -
+             raw_value_offsets_[data_->offset];
+    } else {
+      return 0;
+    }
+  }
+
   IteratorType begin() const { return IteratorType(*this); }
 
   IteratorType end() const { return IteratorType(*this, length()); }
 
- protected: 
-  // For subclasses 
+ protected:
+  // For subclasses
   BaseBinaryArray() = default;
- 
-  // Protected method for constructors 
-  void SetData(const std::shared_ptr<ArrayData>& data) { 
-    this->Array::SetData(data); 
-    raw_value_offsets_ = data->GetValuesSafe<offset_type>(1, /*offset=*/0); 
-    raw_data_ = data->GetValuesSafe<uint8_t>(2, /*offset=*/0); 
-  } 
- 
+
+  // Protected method for constructors
+  void SetData(const std::shared_ptr<ArrayData>& data) {
+    this->Array::SetData(data);
+    raw_value_offsets_ = data->GetValuesSafe<offset_type>(1, /*offset=*/0);
+    raw_data_ = data->GetValuesSafe<uint8_t>(2, /*offset=*/0);
+  }
+
   const offset_type* raw_value_offsets_ = NULLPTR;
   const uint8_t* raw_data_ = NULLPTR;
-}; 
- 
-/// Concrete Array class for variable-size binary data 
-class ARROW_EXPORT BinaryArray : public BaseBinaryArray<BinaryType> { 
- public: 
-  explicit BinaryArray(const std::shared_ptr<ArrayData>& data); 
- 
-  BinaryArray(int64_t length, const std::shared_ptr<Buffer>& value_offsets, 
-              const std::shared_ptr<Buffer>& data, 
-              const std::shared_ptr<Buffer>& null_bitmap = NULLPTR, 
-              int64_t null_count = kUnknownNullCount, int64_t offset = 0); 
- 
- protected: 
-  // For subclasses such as StringArray 
-  BinaryArray() : BaseBinaryArray() {} 
-}; 
- 
-/// Concrete Array class for variable-size string (utf-8) data 
-class ARROW_EXPORT StringArray : public BinaryArray { 
- public: 
-  using TypeClass = StringType; 
- 
-  explicit StringArray(const std::shared_ptr<ArrayData>& data); 
- 
-  StringArray(int64_t length, const std::shared_ptr<Buffer>& value_offsets, 
-              const std::shared_ptr<Buffer>& data, 
-              const std::shared_ptr<Buffer>& null_bitmap = NULLPTR, 
-              int64_t null_count = kUnknownNullCount, int64_t offset = 0); 
- 
-  /// \brief Validate that this array contains only valid UTF8 entries 
-  /// 
-  /// This check is also implied by ValidateFull() 
-  Status ValidateUTF8() const; 
-}; 
- 
-/// Concrete Array class for large variable-size binary data 
-class ARROW_EXPORT LargeBinaryArray : public BaseBinaryArray<LargeBinaryType> { 
- public: 
-  explicit LargeBinaryArray(const std::shared_ptr<ArrayData>& data); 
- 
-  LargeBinaryArray(int64_t length, const std::shared_ptr<Buffer>& value_offsets, 
-                   const std::shared_ptr<Buffer>& data, 
-                   const std::shared_ptr<Buffer>& null_bitmap = NULLPTR, 
-                   int64_t null_count = kUnknownNullCount, int64_t offset = 0); 
- 
- protected: 
-  // For subclasses such as LargeStringArray 
-  LargeBinaryArray() : BaseBinaryArray() {} 
-}; 
- 
-/// Concrete Array class for large variable-size string (utf-8) data 
-class ARROW_EXPORT LargeStringArray : public LargeBinaryArray { 
- public: 
-  using TypeClass = LargeStringType; 
- 
-  explicit LargeStringArray(const std::shared_ptr<ArrayData>& data); 
- 
-  LargeStringArray(int64_t length, const std::shared_ptr<Buffer>& value_offsets, 
-                   const std::shared_ptr<Buffer>& data, 
-                   const std::shared_ptr<Buffer>& null_bitmap = NULLPTR, 
-                   int64_t null_count = kUnknownNullCount, int64_t offset = 0); 
- 
-  /// \brief Validate that this array contains only valid UTF8 entries 
-  /// 
-  /// This check is also implied by ValidateFull() 
-  Status ValidateUTF8() const; 
-}; 
- 
-// ---------------------------------------------------------------------- 
-// Fixed width binary 
- 
-/// Concrete Array class for fixed-size binary data 
-class ARROW_EXPORT FixedSizeBinaryArray : public PrimitiveArray { 
- public: 
-  using TypeClass = FixedSizeBinaryType; 
+};
+
+/// Concrete Array class for variable-size binary data
+class ARROW_EXPORT BinaryArray : public BaseBinaryArray<BinaryType> {
+ public:
+  explicit BinaryArray(const std::shared_ptr<ArrayData>& data);
+
+  BinaryArray(int64_t length, const std::shared_ptr<Buffer>& value_offsets,
+              const std::shared_ptr<Buffer>& data,
+              const std::shared_ptr<Buffer>& null_bitmap = NULLPTR,
+              int64_t null_count = kUnknownNullCount, int64_t offset = 0);
+
+ protected:
+  // For subclasses such as StringArray
+  BinaryArray() : BaseBinaryArray() {}
+};
+
+/// Concrete Array class for variable-size string (utf-8) data
+class ARROW_EXPORT StringArray : public BinaryArray {
+ public:
+  using TypeClass = StringType;
+
+  explicit StringArray(const std::shared_ptr<ArrayData>& data);
+
+  StringArray(int64_t length, const std::shared_ptr<Buffer>& value_offsets,
+              const std::shared_ptr<Buffer>& data,
+              const std::shared_ptr<Buffer>& null_bitmap = NULLPTR,
+              int64_t null_count = kUnknownNullCount, int64_t offset = 0);
+
+  /// \brief Validate that this array contains only valid UTF8 entries
+  ///
+  /// This check is also implied by ValidateFull()
+  Status ValidateUTF8() const;
+};
+
+/// Concrete Array class for large variable-size binary data
+class ARROW_EXPORT LargeBinaryArray : public BaseBinaryArray<LargeBinaryType> {
+ public:
+  explicit LargeBinaryArray(const std::shared_ptr<ArrayData>& data);
+
+  LargeBinaryArray(int64_t length, const std::shared_ptr<Buffer>& value_offsets,
+                   const std::shared_ptr<Buffer>& data,
+                   const std::shared_ptr<Buffer>& null_bitmap = NULLPTR,
+                   int64_t null_count = kUnknownNullCount, int64_t offset = 0);
+
+ protected:
+  // For subclasses such as LargeStringArray
+  LargeBinaryArray() : BaseBinaryArray() {}
+};
+
+/// Concrete Array class for large variable-size string (utf-8) data
+class ARROW_EXPORT LargeStringArray : public LargeBinaryArray {
+ public:
+  using TypeClass = LargeStringType;
+
+  explicit LargeStringArray(const std::shared_ptr<ArrayData>& data);
+
+  LargeStringArray(int64_t length, const std::shared_ptr<Buffer>& value_offsets,
+                   const std::shared_ptr<Buffer>& data,
+                   const std::shared_ptr<Buffer>& null_bitmap = NULLPTR,
+                   int64_t null_count = kUnknownNullCount, int64_t offset = 0);
+
+  /// \brief Validate that this array contains only valid UTF8 entries
+  ///
+  /// This check is also implied by ValidateFull()
+  Status ValidateUTF8() const;
+};
+
+// ----------------------------------------------------------------------
+// Fixed width binary
+
+/// Concrete Array class for fixed-size binary data
+class ARROW_EXPORT FixedSizeBinaryArray : public PrimitiveArray {
+ public:
+  using TypeClass = FixedSizeBinaryType;
   using IteratorType = stl::ArrayIterator<FixedSizeBinaryArray>;
- 
-  explicit FixedSizeBinaryArray(const std::shared_ptr<ArrayData>& data); 
- 
-  FixedSizeBinaryArray(const std::shared_ptr<DataType>& type, int64_t length, 
-                       const std::shared_ptr<Buffer>& data, 
-                       const std::shared_ptr<Buffer>& null_bitmap = NULLPTR, 
-                       int64_t null_count = kUnknownNullCount, int64_t offset = 0); 
- 
-  const uint8_t* GetValue(int64_t i) const; 
-  const uint8_t* Value(int64_t i) const { return GetValue(i); } 
- 
-  util::string_view GetView(int64_t i) const { 
-    return util::string_view(reinterpret_cast<const char*>(GetValue(i)), byte_width()); 
-  } 
- 
-  std::string GetString(int64_t i) const { return std::string(GetView(i)); } 
- 
-  int32_t byte_width() const { return byte_width_; } 
- 
-  const uint8_t* raw_values() const { return raw_values_ + data_->offset * byte_width_; } 
- 
+
+  explicit FixedSizeBinaryArray(const std::shared_ptr<ArrayData>& data);
+
+  FixedSizeBinaryArray(const std::shared_ptr<DataType>& type, int64_t length,
+                       const std::shared_ptr<Buffer>& data,
+                       const std::shared_ptr<Buffer>& null_bitmap = NULLPTR,
+                       int64_t null_count = kUnknownNullCount, int64_t offset = 0);
+
+  const uint8_t* GetValue(int64_t i) const;
+  const uint8_t* Value(int64_t i) const { return GetValue(i); }
+
+  util::string_view GetView(int64_t i) const {
+    return util::string_view(reinterpret_cast<const char*>(GetValue(i)), byte_width());
+  }
+
+  std::string GetString(int64_t i) const { return std::string(GetView(i)); }
+
+  int32_t byte_width() const { return byte_width_; }
+
+  const uint8_t* raw_values() const { return raw_values_ + data_->offset * byte_width_; }
+
   IteratorType begin() const { return IteratorType(*this); }
 
   IteratorType end() const { return IteratorType(*this, length()); }
 
- protected: 
-  void SetData(const std::shared_ptr<ArrayData>& data) { 
-    this->PrimitiveArray::SetData(data); 
-    byte_width_ = 
-        internal::checked_cast<const FixedSizeBinaryType&>(*type()).byte_width(); 
-  } 
- 
-  int32_t byte_width_; 
-}; 
- 
-}  // namespace arrow 
+ protected:
+  void SetData(const std::shared_ptr<ArrayData>& data) {
+    this->PrimitiveArray::SetData(data);
+    byte_width_ =
+        internal::checked_cast<const FixedSizeBinaryType&>(*type()).byte_width();
+  }
+
+  int32_t byte_width_;
+};
+
+}  // namespace arrow

+ 47 - 47
contrib/libs/apache/arrow/cpp/src/arrow/array/array_decimal.cc

@@ -1,51 +1,51 @@
-// Licensed to the Apache Software Foundation (ASF) under one 
-// or more contributor license agreements.  See the NOTICE file 
-// distributed with this work for additional information 
-// regarding copyright ownership.  The ASF licenses this file 
-// to you under the Apache License, Version 2.0 (the 
-// "License"); you may not use this file except in compliance 
-// with the License.  You may obtain a copy of the License at 
-// 
-//   http://www.apache.org/licenses/LICENSE-2.0 
-// 
-// Unless required by applicable law or agreed to in writing, 
-// software distributed under the License is distributed on an 
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 
-// KIND, either express or implied.  See the License for the 
-// specific language governing permissions and limitations 
-// under the License. 
- 
-#include "arrow/array/array_decimal.h" 
- 
-#include <cstdint> 
-#include <memory> 
-#include <string> 
- 
-#include "arrow/array/array_binary.h" 
-#include "arrow/array/data.h" 
-#include "arrow/type.h" 
-#include "arrow/util/checked_cast.h" 
-#include "arrow/util/decimal.h" 
-#include "arrow/util/logging.h" 
- 
-namespace arrow { 
- 
-using internal::checked_cast; 
- 
-// ---------------------------------------------------------------------- 
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/array/array_decimal.h"
+
+#include <cstdint>
+#include <memory>
+#include <string>
+
+#include "arrow/array/array_binary.h"
+#include "arrow/array/data.h"
+#include "arrow/type.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/decimal.h"
+#include "arrow/util/logging.h"
+
+namespace arrow {
+
+using internal::checked_cast;
+
+// ----------------------------------------------------------------------
 // Decimal128
- 
-Decimal128Array::Decimal128Array(const std::shared_ptr<ArrayData>& data) 
-    : FixedSizeBinaryArray(data) { 
+
+Decimal128Array::Decimal128Array(const std::shared_ptr<ArrayData>& data)
+    : FixedSizeBinaryArray(data) {
   ARROW_CHECK_EQ(data->type->id(), Type::DECIMAL128);
-} 
- 
-std::string Decimal128Array::FormatValue(int64_t i) const { 
-  const auto& type_ = checked_cast<const Decimal128Type&>(*type()); 
-  const Decimal128 value(GetValue(i)); 
-  return value.ToString(type_.scale()); 
-} 
- 
+}
+
+std::string Decimal128Array::FormatValue(int64_t i) const {
+  const auto& type_ = checked_cast<const Decimal128Type&>(*type());
+  const Decimal128 value(GetValue(i));
+  return value.ToString(type_.scale());
+}
+
 // ----------------------------------------------------------------------
 // Decimal256
 
@@ -60,4 +60,4 @@ std::string Decimal256Array::FormatValue(int64_t i) const {
   return value.ToString(type_.scale());
 }
 
-}  // namespace arrow 
+}  // namespace arrow

Some files were not shown because too many files changed in this diff