Cache type codes to eliminate shared_ptr overhead in hot paths #465

Open
iskakaushik wants to merge 1 commit into ClickHouse:master from iskakaushik:cache-type-codes-perf
iskakaushik commented Feb 25, 2026

  • ColumnDecimal: Cache data_type_code_ at construction so that Append(Int128) and At() no longer create a shared_ptr<Type> temporary or perform a dynamic_cast via As<>() on every call. The underlying storage type (Int32/Int64/Int128) is invariant after construction, so the cached code cannot go stale.
  • ColumnLowCardinality: Cache index_type_code_ and replace VisitIndexColumn() with a direct static_cast in getDictionaryIndex, appendIndex, and removeLastIndex. VisitIndexColumn() called Type()->GetCode(), which creates a shared_ptr temporary, and then performed a dynamic_cast on every invocation. The cache is updated in LoadBody and Swap, where index_column_ may change.
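
A minimal sketch of the ColumnDecimal pattern, for readers unfamiliar with it. Everything here is a hypothetical stand-in rather than clickhouse-cpp code: `TypeCode`, `Column`, `ColumnVector`, and `DecimalSketch` are invented for illustration, and only two storage widths are modeled instead of the real Int32/Int64/Int128. The point is that the storage type is chosen once in the constructor, so the hot path can switch on a cached enum and use `static_cast` instead of querying the type object and running `dynamic_cast` per call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins; not the clickhouse-cpp classes.
enum class TypeCode { Int32, Int64 };

struct Column {
    virtual ~Column() = default;
    virtual TypeCode Code() const = 0;
};

template <typename T, TypeCode C>
struct ColumnVector final : Column {
    std::vector<T> data;
    TypeCode Code() const override { return C; }
};

using ColumnInt32 = ColumnVector<std::int32_t, TypeCode::Int32>;
using ColumnInt64 = ColumnVector<std::int64_t, TypeCode::Int64>;

class DecimalSketch {
public:
    explicit DecimalSketch(std::size_t precision) {
        // The storage width is decided here and never changes afterwards.
        if (precision <= 9) {
            data_ = std::make_shared<ColumnInt32>();
        } else {
            data_ = std::make_shared<ColumnInt64>();
        }
        // Cache the code once; hot paths below never ask the column again.
        data_type_code_ = data_->Code();
    }

    void Append(std::int64_t value) {
        // static_cast is safe because data_type_code_ mirrors the dynamic type.
        switch (data_type_code_) {
        case TypeCode::Int32:
            static_cast<ColumnInt32&>(*data_).data.push_back(
                static_cast<std::int32_t>(value));
            return;
        case TypeCode::Int64:
            static_cast<ColumnInt64&>(*data_).data.push_back(value);
            return;
        }
        assert(false && "unknown storage type code");
    }

    std::int64_t At(std::size_t i) const {
        switch (data_type_code_) {
        case TypeCode::Int32:
            return static_cast<const ColumnInt32&>(*data_).data[i];
        case TypeCode::Int64:
            return static_cast<const ColumnInt64&>(*data_).data[i];
        }
        assert(false && "unknown storage type code");
        return 0;
    }

    TypeCode StorageCode() const { return data_type_code_; }

private:
    std::shared_ptr<Column> data_;
    TypeCode data_type_code_ = TypeCode::Int32;
};
```

The trade-off is that the cached code must stay in sync with the column it describes. That is cheap to guarantee when the column is fixed at construction, as it is in this sketch.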

Profiling showed shared_ptr<Type> destructors consuming 5.54% of CPU in ColumnDecimal::Append and 3.91% in VisitIndexColumn, making these the highest-impact non-compression optimizations available.

Estimated gain: ~7-9% combined insert throughput improvement.

ColumnDecimal: cache data_type_code_ to avoid shared_ptr<Type>
temporaries and dynamic_cast via As<>() on every Append/At call.

ColumnLowCardinality: cache index_type_code_ to replace
VisitIndexColumn (which called Type()->GetCode() + dynamic_cast
per invocation) with direct static_cast in getDictionaryIndex,
appendIndex, and removeLastIndex.
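
The index-column case differs in one respect: the column can be replaced after construction, so the cached code has to be refreshed wherever that happens. A rough sketch under stated assumptions follows; `IndexCode`, `IndexColumn`, `IndexVector`, and `LowCardinalitySketch` are hypothetical names, only three index widths are modeled, and `Swap` stands in for any operation that may change the index column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not the clickhouse-cpp classes.
enum class IndexCode { UInt8, UInt16, UInt32 };

struct IndexColumn {
    virtual ~IndexColumn() = default;
    virtual IndexCode Code() const = 0;
};

template <typename T, IndexCode C>
struct IndexVector final : IndexColumn {
    std::vector<T> values;
    IndexCode Code() const override { return C; }
};

using IndexUInt8 = IndexVector<std::uint8_t, IndexCode::UInt8>;
using IndexUInt16 = IndexVector<std::uint16_t, IndexCode::UInt16>;
using IndexUInt32 = IndexVector<std::uint32_t, IndexCode::UInt32>;

class LowCardinalitySketch {
public:
    explicit LowCardinalitySketch(std::shared_ptr<IndexColumn> index)
        : index_column_(std::move(index)),
          index_type_code_(index_column_->Code()) {}

    std::uint64_t GetDictionaryIndex(std::size_t i) const {
        switch (index_type_code_) {
        case IndexCode::UInt8:
            return static_cast<const IndexUInt8&>(*index_column_).values[i];
        case IndexCode::UInt16:
            return static_cast<const IndexUInt16&>(*index_column_).values[i];
        case IndexCode::UInt32:
            return static_cast<const IndexUInt32&>(*index_column_).values[i];
        }
        assert(false && "unknown index type code");
        return 0;
    }

    void AppendIndex(std::uint64_t v) {
        switch (index_type_code_) {
        case IndexCode::UInt8:
            static_cast<IndexUInt8&>(*index_column_).values.push_back(
                static_cast<std::uint8_t>(v));
            return;
        case IndexCode::UInt16:
            static_cast<IndexUInt16&>(*index_column_).values.push_back(
                static_cast<std::uint16_t>(v));
            return;
        case IndexCode::UInt32:
            static_cast<IndexUInt32&>(*index_column_).values.push_back(
                static_cast<std::uint32_t>(v));
            return;
        }
        assert(false && "unknown index type code");
    }

    void RemoveLastIndex() {
        switch (index_type_code_) {
        case IndexCode::UInt8:
            static_cast<IndexUInt8&>(*index_column_).values.pop_back();
            return;
        case IndexCode::UInt16:
            static_cast<IndexUInt16&>(*index_column_).values.pop_back();
            return;
        case IndexCode::UInt32:
            static_cast<IndexUInt32&>(*index_column_).values.pop_back();
            return;
        }
        assert(false && "unknown index type code");
    }

    // The index column may change here, so the cached code moves with it.
    void Swap(LowCardinalitySketch& other) {
        std::swap(index_column_, other.index_column_);
        std::swap(index_type_code_, other.index_type_code_);
    }

private:
    std::shared_ptr<IndexColumn> index_column_;
    IndexCode index_type_code_;
};
```

If `Swap` exchanged only the column pointers, the next `static_cast` would target the wrong type, which is undefined behavior. Updating the cache at every mutation point is the cost of dropping the checked cast.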

Benchmarked with a 10M row insert (43-column schema, 100K rows/batch,
LZ4 1MB chunks) over 10 interleaved iterations. Block build time
improved by 7.0% (3477ms → 3233ms, t=30.8, p≈0) and grand total
by 1.6% (12086ms → 11890ms, t=6.4, p<0.001). Insert phase itself
is unchanged as it remains LZ4-compression-bound.
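
The percentages above follow directly from the reported timings. A small helper to reproduce them is sketched here; `ImprovementPercent` is a hypothetical name, and the t and p values are not recomputed because the per-iteration samples are not included in the message.

```cpp
#include <cassert>
#include <cmath>

// Relative improvement of after_ms over before_ms, in percent.
constexpr double ImprovementPercent(double before_ms, double after_ms) {
    return (before_ms - after_ms) / before_ms * 100.0;
}

// Block build:  (3477 - 3233) / 3477   = 244 / 3477,  about 7.0%
// Grand total:  (12086 - 11890) / 12086 = 196 / 12086, about 1.6%
```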