[fix](paimon-cpp) deduplicate Arrow linking to fix SIGSEGV in FilterRowGroupsByPredicate#60883
xylaaaaa wants to merge 3 commits into apache:master
…owGroupsByPredicate

When ENABLE_PAIMON_CPP is ON, both Doris's own libarrow.a and paimon-cpp's libarrow.a were linked into doris_be, causing 3698 duplicate global symbols. This led to SIGSEGV crashes in paimon::parquet::ParquetFileBatchReader::FilterRowGroupsByPredicate when libarrow_dataset.a resolved arrow core calls to the wrong copy (compiled with different feature flags).

Both are Arrow 17.0.0 but compiled with different options:
- Doris: COMPUTE=OFF, DATASET=OFF, ACERO=OFF, FLIGHT=ON
- paimon: COMPUTE=ON, DATASET=ON, ACERO=ON, FLIGHT=OFF

Fix: when paimon_deps Arrow stack is selected, remove Doris's 'arrow' from COMMON_THIRDPARTY. paimon's libarrow.a is a superset and provides all symbols needed by Doris's arrow_flight / arrow_flight_sql.
Pull request overview
This PR fixes SIGSEGV crashes in paimon-cpp's ParquetFileBatchReader when ENABLE_PAIMON_CPP is ON. The crash was caused by linking both Doris's libarrow.a and paimon-cpp's libarrow.a into the binary, creating 3698 duplicate global symbols. Although both are Arrow 17.0.0, they were compiled with different feature flags (Doris: FLIGHT enabled, paimon: COMPUTE/DATASET/ACERO/FILESYSTEM enabled), causing memory layout incompatibilities that led to crashes when arrow_dataset resolved symbols to the wrong copy.
Changes:
- Implement stack-based Arrow library selection logic that chooses either the complete Doris or paimon_deps Arrow stack
- When paimon_deps stack is selected, remove Doris's arrow from COMMON_THIRDPARTY to eliminate duplicate symbols
- Add status messages to indicate which Arrow stack is being used
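The removal and status-message steps listed above could look roughly like the following CMake sketch. It assumes `_selected_arrow_stack` already holds the chosen stack name (as in the hunk under review) and that `COMMON_THIRDPARTY` is a plain CMake list containing the entry `arrow`. The message wording is illustrative, and the actual patch may place this logic differently.

```cmake
# Sketch only: variable names follow the PR text; message text is hypothetical.
if ("${_selected_arrow_stack}" STREQUAL "paimon_deps")
    # Drop Doris's own Arrow core so that only paimon's libarrow.a is linked
    # into doris_be, which removes the duplicate global symbols.
    list(REMOVE_ITEM COMMON_THIRDPARTY arrow)
    message(STATUS "paimon-cpp: using paimon_deps Arrow stack; removed Doris 'arrow' from COMMON_THIRDPARTY")
else()
    # Doris's Arrow stack (or none) is in use; COMMON_THIRDPARTY is left untouched.
    message(STATUS "paimon-cpp: using Arrow stack '${_selected_arrow_stack}'")
endif()
```

Because `list(REMOVE_ITEM ...)` is a no-op when the entry is absent, the fragment should be safe to run even if `arrow` was never added to the list.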
Comments suppressed due to low confidence (1)
be/CMakeLists.txt:634
- The comment states "mixing different Arrow versions" but both Doris and paimon use Arrow 17.0.0 according to the PR description. Consider updating the comment to clarify that the issue is mixing the same Arrow version compiled with different feature flags, not different versions.
# mixing different Arrow versions (e.g. Doris core + paimon dataset/acero),
be/CMakeLists.txt
Outdated
if (_doris_arrow_core AND _doris_arrow_dataset AND _doris_arrow_acero)
    set(_selected_arrow_stack "doris")
    set(_selected_arrow_core "${_doris_arrow_core}")
    set(_selected_arrow_dataset "${_doris_arrow_dataset}")
    set(_selected_arrow_acero "${_doris_arrow_acero}")
    set(_selected_arrow_filesystem "${_doris_arrow_filesystem}")
elseif (_paimon_arrow_core AND _paimon_arrow_dataset AND _paimon_arrow_acero)
    set(_selected_arrow_stack "paimon_deps")
    set(_selected_arrow_core "${_paimon_arrow_core}")
    set(_selected_arrow_dataset "${_paimon_arrow_dataset}")
    set(_selected_arrow_acero "${_paimon_arrow_acero}")
    set(_selected_arrow_filesystem "${_paimon_arrow_filesystem}")
The selection logic prioritizes Doris's Arrow stack over paimon_deps when both are complete. If Doris's build configuration changes in the future to include DATASET and ACERO modules, this could cause the same duplicate symbol issues this PR is fixing, because paimon's code would be linked against Doris's Arrow instead of paimon_deps's Arrow. Consider adding a comment explaining this priority decision, or if paimon_deps should always be preferred when ENABLE_PAIMON_CPP is ON, swap the priority order.
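One way to act on this suggestion is to check the paimon_deps stack first. The sketch below reuses the variable names from the hunk and only swaps the branch order. It assumes the block already sits inside an `ENABLE_PAIMON_CPP` guard and that a closing `endif()` follows, neither of which is visible in the excerpt.

```cmake
# Sketch only: same variables as the reviewed hunk, with the priority swapped.
# Priority note: paimon-cpp's dataset/acero archives were built against
# paimon_deps' Arrow core, so that stack is preferred whenever it is complete.
if (_paimon_arrow_core AND _paimon_arrow_dataset AND _paimon_arrow_acero)
    set(_selected_arrow_stack "paimon_deps")
    set(_selected_arrow_core "${_paimon_arrow_core}")
    set(_selected_arrow_dataset "${_paimon_arrow_dataset}")
    set(_selected_arrow_acero "${_paimon_arrow_acero}")
    set(_selected_arrow_filesystem "${_paimon_arrow_filesystem}")
elseif (_doris_arrow_core AND _doris_arrow_dataset AND _doris_arrow_acero)
    # Fallback: Doris's own stack, used only if paimon_deps is incomplete.
    set(_selected_arrow_stack "doris")
    set(_selected_arrow_core "${_doris_arrow_core}")
    set(_selected_arrow_dataset "${_doris_arrow_dataset}")
    set(_selected_arrow_acero "${_doris_arrow_acero}")
    set(_selected_arrow_filesystem "${_doris_arrow_filesystem}")
endif()
```

With this ordering, a future Doris build that enables DATASET and ACERO would not silently switch paimon's code onto Doris's Arrow. Whether that trade-off is preferable to keeping Doris first with an explanatory comment is left to the PR author.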
Proposed changes
Problem
When ENABLE_PAIMON_CPP is ON, both Doris's own libarrow.a and paimon-cpp's libarrow.a are linked into doris_be, causing 3698 duplicate global symbols. This leads to SIGSEGV crashes in paimon::parquet::ParquetFileBatchReader::FilterRowGroupsByPredicate when libarrow_dataset.a resolves arrow core calls to the wrong copy (compiled with different feature flags).
Both are Arrow 17.0.0 but compiled with different options:
- Doris: COMPUTE=OFF, DATASET=OFF, ACERO=OFF, FLIGHT=ON
- paimon: COMPUTE=ON, DATASET=ON, ACERO=ON, FLIGHT=OFF
Crash Stack
Root Cause
Inside -Wl,--start-group ... --end-group, the linker may resolve symbols from libarrow_dataset.a (paimon's) to Doris's libarrow.a, which was compiled without COMPUTE/FILESYSTEM modules. The internal object memory layout differs, causing arrow::Status and other objects to trigger illegal memory access when passed across library boundaries.
Fix
When the paimon_deps Arrow stack is selected (because Doris lacks libarrow_dataset.a / libarrow_acero.a), remove Doris's arrow from COMMON_THIRDPARTY.
paimon's libarrow.a is a superset of Doris's version (same 17.0.0, with additional modules enabled), so it provides all symbols needed by Doris's libarrow_flight.a / libarrow_flight_sql.a.
Impact
- be/CMakeLists.txt changed (~10 lines).
- ENABLE_PAIMON_CPP=OFF.
Types of changes