refactor(dataset-raw): bucketed dump partitions #1888
Open
Conversation
Review Summary
Critical
- `BlockStreamerWithRetry` does not delegate `bucket_size()`: the new bucketed partitioning path is unreachable in production because `.with_retry()` wraps the client before it reaches `dump_ranges`, and the retry wrapper inherits the default `None` return. This makes the Solana `bucket_size` implementation and `split_and_partition_bucketed` effectively dead code.
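A minimal sketch of the fix (trait and type names taken from the review; bodies and the slots-per-epoch value are illustrative assumptions, not the PR's actual code) — the retry wrapper must forward `bucket_size()` to the wrapped streamer instead of inheriting the trait's default:

```rust
use std::num::NonZeroU64;

// Trait shape per the PR description; the default returns None.
trait BlockStreamer {
    fn bucket_size() -> Option<NonZeroU64> {
        None
    }
}

// Stand-in for the Solana client, which fetches blocks in epoch-sized buckets.
struct SolanaClient;

impl BlockStreamer for SolanaClient {
    fn bucket_size() -> Option<NonZeroU64> {
        NonZeroU64::new(432_000) // illustrative slots-per-epoch value
    }
}

struct BlockStreamerWithRetry<S>(S);

impl<S: BlockStreamer> BlockStreamer for BlockStreamerWithRetry<S> {
    fn bucket_size() -> Option<NonZeroU64> {
        // Delegate; omitting this override reinstates the default `None`
        // and makes the bucketed path dead code behind `.with_retry()`.
        S::bucket_size()
    }
}

fn main() {
    let wrapped = <BlockStreamerWithRetry<SolanaClient>>::bucket_size();
    assert_eq!(wrapped, NonZeroU64::new(432_000));
    println!("bucket_size delegated: {:?}", wrapped);
}
```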
Bugs / Robustness
- Silent fallback in Solana `bucket_size()`: `NonZeroU64::new(DEFAULT_SLOTS_PER_EPOCH)` should use `.expect()` or a const assertion instead of silently returning `None` if the constant were ever 0.
- Potential arithmetic overflow in Phase 1: `this_end + 1` and the bucket end calculation can overflow near `u64::MAX` (theoretical, but worth guarding).
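Both robustness items can be sketched together (constant value and function names are illustrative assumptions, not the PR's actual code): fail loudly on a zero constant, and saturate instead of wrapping near `u64::MAX`.

```rust
use std::num::NonZeroU64;

const DEFAULT_SLOTS_PER_EPOCH: u64 = 432_000; // illustrative value

// Fail loudly instead of silently degrading to the un-bucketed path.
fn solana_bucket_size() -> Option<NonZeroU64> {
    Some(
        NonZeroU64::new(DEFAULT_SLOTS_PER_EPOCH)
            .expect("DEFAULT_SLOTS_PER_EPOCH must be non-zero"),
    )
}

// Overflow-safe end of the bucket containing `block`: saturates at
// u64::MAX instead of wrapping when the next bucket start overflows.
fn bucket_end(block: u64, bucket_size: NonZeroU64) -> u64 {
    let b = bucket_size.get();
    (block / b)
        .checked_add(1)
        .and_then(|next| next.checked_mul(b))
        .map(|next_start| next_start - 1)
        .unwrap_or(u64::MAX)
}

fn main() {
    let b = NonZeroU64::new(10).unwrap();
    assert_eq!(bucket_end(5, b), 9);
    assert_eq!(bucket_end(u64::MAX, b), u64::MAX); // saturates, no wrap
    println!("ok");
}
```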
Consistency / Guidelines
- Missing `min_partition_blocks` assertion: `split_and_partition_bucketed` lacks the invariant check that `split_and_partition` has for non-leftover partitions.
- Missing `# Panics` documentation: both partition functions use `assert!` without documenting panic conditions per `docs/code/rust-documentation.md`.
- Documentation mismatch in `extractors.md`: `bucket_size()` is shown as a required method (no body) but has a default impl; the checklist says "four required methods" but there are now five.
Testing
- Additional test cases suggested: `n = 1`, `n` greater than the number of buckets, non-contiguous ranges with gaps across buckets, and a case exercising the Phase 3 merge logic.
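To make the suggested edge cases concrete, here is a hypothetical bucket-boundary splitter (not the PR's `split_and_partition_bucketed`, whose signature is unknown) with the kind of edge-case assertions the review asks for, and a `# Panics` section per the documentation guideline:

```rust
use std::num::NonZeroU64;

/// Splits a contiguous, inclusive block range into chunks that each lie
/// entirely within one bucket of size `bucket`.
///
/// # Panics
///
/// Panics if `start > end`.
fn split_on_bucket_boundaries(start: u64, end: u64, bucket: NonZeroU64) -> Vec<(u64, u64)> {
    assert!(start <= end, "invalid range: {start} > {end}");
    let b = bucket.get();
    let mut chunks = Vec::new();
    let mut cur = start;
    loop {
        // Last block of the bucket containing `cur`, saturating at u64::MAX.
        let bucket_end = (cur / b)
            .checked_add(1)
            .and_then(|n| n.checked_mul(b))
            .map(|next| next - 1)
            .unwrap_or(u64::MAX);
        let chunk_end = bucket_end.min(end);
        chunks.push((cur, chunk_end));
        if chunk_end == end {
            return chunks;
        }
        cur = chunk_end + 1;
    }
}

fn main() {
    let b = NonZeroU64::new(10).unwrap();
    // Range spanning three buckets.
    assert_eq!(
        split_on_bucket_boundaries(5, 25, b),
        vec![(5, 9), (10, 19), (20, 25)]
    );
    // Single-block range.
    assert_eq!(split_on_bucket_boundaries(7, 7, b), vec![(7, 7)]);
    println!("ok");
}
```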
- In preparation for a Solana extractor refactoring, introduce `fn bucket_size() -> Option<NonZeroU64>`, a new method on the `BlockStreamer` trait.
- The return value of this method (if `Some`) is the size of the buckets in which this extractor fetches blocks. This matters for determining how to split block ranges for parallel streaming. For example, the Solana extractor benefits from knowing that each instance of it is the only one responsible for the buckets that encompass the range that the instance needs to extract.
- Refactors the dump logic to respect `bucket_size` (if `Some`) when partitioning block ranges that should be dumped.
Force-pushed from 3a3b34a to fd3e758