
feat(benchmarking): adding ERC20 benchmarking test #3114

Open
chatton wants to merge 26 commits into main from cian/erc20-benchmark
Conversation

@chatton
Contributor

@chatton chatton commented Mar 2, 2026

Overview

Closes #3122

Adds TestERC20Throughput to measure ERC-20 token transfer throughput using spamoor for load generation. Reports MGas/s, TPS, and per-span latency breakdown.

Summary by CodeRabbit

  • Tests

    • Added ERC20 throughput benchmark test measuring transaction/gas/block metrics and end-to-end throughput.
    • Added benchmarking helpers for metric collection, summaries, trace aggregation, and batch result writing.
  • Chores

    • Added dedicated CI job to run the ERC20 throughput benchmark.
    • Streamlined test setup/teardown, improved trace collection, removed fragile span assertions, and updated a test dependency.

…mark

- Create test/e2e/benchmark/ subpackage with SpamoorSuite (testify/suite)
- Move spamoor smoke test into suite as TestSpamoorSmoke
- Split helpers into focused files: traces.go, output.go, metrics.go
- Introduce resultWriter for defer-based benchmark JSON output
- Export shared symbols from evm_test_common.go for cross-package use
- Restructure CI to fan-out benchmark jobs and fan-in publishing
- Run benchmarks on PRs only when benchmark-related files change
Resolve conflicts keeping the benchmark suite refactoring:
- benchmark.yml: keep path filters and suite-style test command
- evm_spamoor_smoke_test.go: keep deleted (moved to benchmark pkg)
- evm_test_common.go: keep exported types, drop writeTraceBenchmarkJSON
  (now in benchmark/output.go)
go test sets the working directory to the package under test, so the
env var should be relative to test/e2e/benchmark/, not test/e2e/.
go test treats all arguments after an unknown flag (--evm-binary) as
test binary args, so ./benchmark/ was never recognized as a package
pattern.
go test sets the cwd to the package directory (test/e2e/benchmark/),
so the binary path needs an extra parent traversal.
The benchmark package doesn't define the --binary flag that test-e2e
passes. It has its own CI workflow so it doesn't need to run here.
…nfig

collectBlockMetrics hit reth's 20K FilterLogs limit at high tx volumes.
Replace with direct header iteration over [startBlock, endBlock] and add
Phase 1 metrics: non-empty ratio, block interval p50/p99, gas/block and
tx/block p50/p99.

Optimize spamoor configuration for 100ms block time:
- --slot-duration 100ms, --startup-delay 0 on daemon
- throughput=50 per 100ms slot (500 tx/s per spammer)
- max_pending=50000 to avoid 3s block poll backpressure
- 5 staggered spammers with 50K txs each

Results: 55 MGas/s, 1414 TPS, 19.8% non-empty blocks (up from 6%).
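The p50/p99 figures described above can be computed with a nearest-rank percentile over the per-block samples. The following is a minimal illustrative sketch, not the repo's actual helper (whose name and signature may differ):

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the value at percentile p (0-100) using the
// nearest-rank method over a copy of the input slice.
func percentile(values []float64, p float64) float64 {
	if len(values) == 0 {
		return 0
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := int(p/100*float64(len(sorted)) + 0.5)
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// hypothetical gas-per-block samples, including empty blocks
	gasPerBlock := []float64{0, 0, 3_000_000, 5_000_000, 5_500_000, 8_000_000}
	fmt.Printf("%.0f %.0f\n", percentile(gasPerBlock, 50), percentile(gasPerBlock, 99))
	// prints: 3000000 8000000
}
```

The same function covers block intervals, gas/block, and tx/block; only the sample slice changes.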
- Move startBlock capture after spammer creation to exclude warm-up
- Replace 20s drain sleep with smart poll (waitForDrain)
- Add deleteAllSpammers cleanup to handle stale spamoor DB entries
- Lower trace sample rate to 10% to prevent Jaeger OOM
- make reth tag configurable via EV_RETH_TAG env var (default pr-140)
- fix OTLP config: remove duplicate env vars, use http/protobuf protocol
- use require.Eventually for host readiness polling
- rename requireHTTP to requireHostUp
- use non-fatal logging in resultWriter.flush deferred context
- fix stale doc comment (setupCommonEVMEnv -> SetupCommonEVMEnv)
- rename loop variable to avoid shadowing testing.TB convention
- add block/internal/executing/** to CI path trigger
- remove unused require import from output.go
# Conflicts:
#	scripts/test.mk
# Conflicts:
#	test/e2e/benchmark/suite_test.go
# Conflicts:
#	test/e2e/benchmark/suite_test.go
move EV_RETH_TAG resolution and rpc connection limits into setupEnv
so all benchmark tests share the same reth configuration. lower ERC20
spammer count from 5 to 2 to reduce resource contention on local
hardware while keeping the loop for easy scaling on dedicated infra.
- add blockMetricsSummary with summarize(), log(), and entries() methods
- add evNodeOverhead() for computing ProduceBlock vs ExecuteTxs overhead
- add collectTraces() suite method to deduplicate trace collection pattern
- add addEntries() convenience method on resultWriter
- slim TestERC20Throughput from ~217 to ~119 lines
- reuse collectTraces in TestSpamoorSmoke
@coderabbitai
Contributor

coderabbitai bot commented Mar 2, 2026

📝 Walkthrough

Walkthrough

Adds an ERC20 throughput benchmark and CI job, plus e2e benchmarking utilities for block-level metrics, throughput/percentile calculations, spamoor lifecycle control, trace collection (ev-node and optional ev-reth), and related test/suite updates.

Changes

Cohort / File(s) Summary
CI Workflow
.github/workflows/benchmark.yml
Added erc20-benchmark job to run the ERC20 throughput test (setup Go, Buildx, install just, build binaries, run Go test for ERC20 throughput with a 15m test timeout).
ERC20 Benchmark Tests & Helpers
test/e2e/benchmark/spamoor_erc20_test.go, test/e2e/benchmark/helpers.go, test/e2e/benchmark/output.go
New TestERC20Throughput; large helpers for blockMetrics collection, percentile/stats, throughput (MGas/s), polling utilities (waitForSpamoorDone, waitForDrain), ev-node overhead analysis, and batch resultWriter.addEntries.
Suite & Trace Infrastructure
test/e2e/benchmark/suite_test.go, test/e2e/benchmark/traces.go
Added rethTag resolution and default tag, traceResult type, collectTraces/tryCollectServiceTraces to gather ev-node and optional ev-reth traces; reduced sampling rate, adjusted node start args, and removed assertSpanNames.
Smoke Test Cleanup
test/e2e/benchmark/spamoor_smoke_test.go
Removed env-based reth tag logic, added deleteAllSpammers pre-check, simplified trace collection to unified collectTraces flow.
Dependency Update
test/e2e/go.mod
Bumped tastora dependency from v0.16.0 to v0.16.1-0.20260302131806-2816c7b82bfb.

Sequence Diagram

sequenceDiagram
    participant Test as Test/ERC20 Benchmark
    participant Spamoor as Spamoor
    participant Chain as EVM Chain
    participant Metrics as Metrics Collector
    participant Traces as Trace Collector

    Test->>Spamoor: configure & launch spammers
    Test->>Chain: record start block header
    Spamoor->>Chain: submit ERC20 transfer txs
    Test->>Spamoor: waitForSpamoorDone (monitor sent/failed)
    Test->>Chain: waitForDrain (observe empty blocks)
    Test->>Chain: record end block header
    Test->>Metrics: collectBlockMetrics(start,end)
    Metrics->>Chain: iterate blocks, gather gas/tx/intervals
    Metrics-->>Test: aggregated stats (MGas/s, p50/p99, non-empty ratio)
    Test->>Traces: collectTraces(ev-node, ev-reth)
    Traces-->>Test: spans & ev-node overhead
    Test->>Test: log summary, write benchmark entries, cleanup spammers

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • alpe
  • randygrok

Poem

🐰 Hopping through blocks with a ledger bright,

ERC20s dancing into the night,
Spamoor hums and packets take wing,
Metrics bloom — MGas and p50 sing,
I nibble traces and watch throughput light.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation / Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 77.78%, which is below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title 'feat(benchmarking): adding ERC20 benchmarking test' clearly and specifically describes the main change: adding an ERC20 benchmarking test to the benchmarking suite.
Description check | ✅ Passed | The description provides an overview with a linked issue reference (#3122) and explains that the test adds TestERC20Throughput to measure ERC-20 token transfer throughput using spamoor, reporting MGas/s, TPS, and per-span latency metrics.
Linked Issues check | ✅ Passed | The PR implements all required objectives from issue #3122: adds TestERC20Throughput, uses spamoor for load generation, and reports MGas/s, TPS, and per-span latency breakdown metrics.
Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the ERC20 benchmark feature: the new CI job, test implementation, helper utilities, and minor cleanup of related benchmark infrastructure all align with issue #3122 objectives.



 cosmossdk.io/math v1.5.3
 github.com/celestiaorg/go-square/v3 v3.0.2
-github.com/celestiaorg/tastora v0.16.0
+github.com/celestiaorg/tastora v0.16.1-0.20260302131806-2816c7b82bfb
Contributor Author

Working off a feature branch in tastora lets us add things as we go instead of creating a lot of separate releases for incrementally added features.

// ev-reth reads OTEL_EXPORTER_OTLP_ENDPOINT and passes it directly
// to with_endpoint(). opentelemetry-otlp v0.31 HTTP exporter does
// not auto-append /v1/traces, so the full path is required.
"OTEL_EXPORTER_OTLP_ENDPOINT="+jg.Internal.IngestHTTPEndpoint()+"/v1/traces",
Contributor Author

eventually this will be VictoriaTraces and not a local jaeger deployment.

// different tests are distinguishable in the same output file.
func (s *blockMetricsSummary) entries(prefix string) []entry {
	return []entry{
		{Name: prefix + " - MGas/s", Unit: "MGas/s", Value: s.AchievedMGas},
Contributor Author

initially just dumping a bunch of stats, we can see what these look like when running against a real deployment to establish a baseline.

@github-actions
Contributor

github-actions bot commented Mar 2, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed | Format: ⏩ skipped | Lint: ✅ passed | Breaking: ✅ passed | Updated (UTC): Mar 3, 2026, 11:29 AM

@claude
Contributor

claude bot commented Mar 2, 2026

Claude finished @chatton's task in 3m 1s


Code Review — feat(benchmarking): adding ERC20 benchmarking test

Overall the code is well-structured and readable with good doc comments throughout. The benchmark design (excluding warm-up from measurement window, using consecutive empty blocks as drain signal, percentile stats) is solid. A few issues worth addressing before merge:


🔴 Drain timeout silently passes the test

test/e2e/benchmark/spamoor_erc20_test.go:77–79

if err := waitForDrain(drainCtx, t.Logf, e.ethClient, 10); err != nil {
    t.Logf("warning: %v", err)
}

waitForDrain now correctly returns an error on timeout, but the call site only logs a warning and lets the test continue. If drain times out after 30 seconds, endBlock will be set before all transactions are included, causing throughput and gas metrics to be undercounted while the test still reports success. This violates determinism.

Change to s.Require().NoError(err, "mempool did not drain in time"), consistent with how waitForSpamoorDone is handled on the line above.


🟡 Performance regression in CI needs investigation

The CI benchmark alert shows 9–18x slowdowns in SpamoorSmoke span durations (e.g. Engine.GetPayload went from ~1.6ms to ~30ms avg). This PR changes the sampling rate from 1 to 0.1, which could shift the reported averages by filtering out representative spans, but a 10x change in wall-clock duration suggests something more — possibly environment noise or a real regression. The fact that the two jobs (spamoor-benchmark and erc20-benchmark) run in parallel may be introducing resource contention on the shared runner.

Worth confirming whether this regression is reproducible or just run-to-run noise before merging.


🟡 BENCH_JSON_OUTPUT not set in the erc20-benchmark CI job

.github/workflows/benchmark.yml:97–101

- name: Run ERC20 throughput test
  run: |
    cd test/e2e && go test -tags evm \
      -run='^TestSpamoorSuite$/^TestERC20Throughput$' -v -timeout=15m \
      ./benchmark/ --evm-binary=../../../build/evm

Unlike the spamoor-benchmark job which sets BENCH_JSON_OUTPUT=spamoor_bench.json, the erc20-benchmark job does not set this variable. resultWriter.flush() will always be a no-op, so the w.addEntries(summary.entries(...)) and w.addSpans(...) calls accumulate data that is never persisted. The output file artifact step doesn't exist either — there's nothing to upload. This is noted in the TODO comment, but it means the job produces no machine-readable output at all in CI currently.


🟡 actions/setup-go version inconsistency in CI

.github/workflows/benchmark.yml:88

The erc20-benchmark job uses actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0 while evm-benchmark and spamoor-benchmark both use @4b73464bb391d4059bd26b0524d20df3927bd417 # v6.3.0. Minor, but should be consistent to avoid subtle build toolchain differences across jobs.


🟡 Hardcoded defaultRethTag = "pr-140" used in all CI runs

test/e2e/benchmark/suite_test.go:56

Acknowledged by the author as a temporary measure. The erc20-benchmark CI job also doesn't set EV_RETH_TAG, so the fallback "pr-140" is always used in CI. This is worth tracking as a follow-up issue (and a TODO comment has been added) since pr-140 is a branch-scoped Docker tag that could disappear unexpectedly.


🔵 Redundant loop variable copy in smoke test

test/e2e/benchmark/spamoor_smoke_test.go:63–65

for _, id := range ids {
    idToDelete := id   // unnecessary with Go 1.22+ per-iteration loop variables
    t.Cleanup(func() { _ = api.DeleteSpammer(idToDelete) })
}

The go.mod declares go 1.25.6, so loop variables are already per-iteration. The idToDelete := id copy is redundant. The ERC20 test correctly uses id directly in the cleanup closure.


🔵 collectBlockMetrics makes two sequential RPC calls per block

test/e2e/benchmark/helpers.go:382–399

Each block incurs a HeaderByNumber call plus a TransactionCount call. For a long benchmark producing thousands of blocks, this is O(2N) sequential RPCs. Since TransactionCount requires the block hash (from the header), it's unavoidable without fetching full blocks, but BlockByNumber would return the tx count directly in one call. Not a blocking issue since this runs post-load, but worth noting for longer benchmarks.
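The one-fetch-per-block alternative can be sketched as follows. This is an illustrative mock, not the repo's code: fakeBlock, blockFetcher, and fakeChain are stand-ins for go-ethereum's types.Block and ethclient.Client, which are not imported here. The point is that one call per height yields both gas used and the tx count.

```go
package main

import "fmt"

// fakeBlock stands in for the header + transaction data that a single
// BlockByNumber-style RPC would return for one height.
type fakeBlock struct {
	gasUsed uint64
	txCount int
}

// blockFetcher abstracts the single-call lookup (hypothetical interface).
type blockFetcher interface {
	BlockByNumber(n uint64) (fakeBlock, error)
}

// collectBlockMetrics iterates [start, end] with one fetch per block,
// halving the RPC count versus a HeaderByNumber + TransactionCount pair.
func collectBlockMetrics(c blockFetcher, start, end uint64) (totalGas uint64, totalTxs int, err error) {
	for n := start; n <= end; n++ {
		blk, err := c.BlockByNumber(n)
		if err != nil {
			return 0, 0, err
		}
		totalGas += blk.gasUsed
		totalTxs += blk.txCount
	}
	return totalGas, totalTxs, nil
}

// fakeChain is an in-memory chain for the example.
type fakeChain map[uint64]fakeBlock

func (f fakeChain) BlockByNumber(n uint64) (fakeBlock, error) { return f[n], nil }

func main() {
	chain := fakeChain{1: {21000, 1}, 2: {42000, 2}, 3: {0, 0}}
	gas, txs, _ := collectBlockMetrics(chain, 1, 3)
	fmt.Println(gas, txs)
	// prints: 63000 3
}
```

With go-ethereum the trade-off is that BlockByNumber transfers full transaction bodies, so for very large blocks the two lighter calls may still be preferable.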


🔵 Potential NaN in waitForSpamoorDone progress log (theoretical)

test/e2e/benchmark/helpers.go:175

sent/time.Since(start).Seconds()

If the ticker fires before any measurable time has elapsed from start, this division yields +Inf (or NaN when sent is also zero). With a 2-second poll interval this is effectively impossible, but a zero-check, or reusing the elapsed value already computed on the previous line, would be more robust.


Summary

Severity | Issue | File
🔴 | Drain timeout should fail the test, not warn | spamoor_erc20_test.go:77
🟡 | CI perf regression needs investigation | benchmark.yml / CI alert
🟡 | BENCH_JSON_OUTPUT not set → flush is always no-op | benchmark.yml:97
🟡 | setup-go version inconsistency across jobs | benchmark.yml:88
🟡 | pr-140 hardcoded tag used in all CI runs | suite_test.go:56
🔵 | Redundant loop variable copy | spamoor_smoke_test.go:64
🔵 | Two RPC calls per block in metrics collection | helpers.go:382

The drain timeout issue is the only one that could meaningfully skew benchmark results. The rest are CI hygiene or minor nits. The code quality and architecture are good overall.

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0
- name: Install just
  uses: extractions/setup-just@v3

Check warning

Code scanning / CodeQL

Unpinned tag for a non-immutable Action in workflow (Medium)

The 'Benchmarks' workflow step uses 'extractions/setup-just' with ref 'v3', not a pinned commit hash.
@codecov

codecov bot commented Mar 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.80%. Comparing base (042b75a) to head (a135436).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3114      +/-   ##
==========================================
+ Coverage   60.78%   60.80%   +0.01%     
==========================================
  Files         113      113              
  Lines       11693    11693              
==========================================
+ Hits         7108     7110       +2     
+ Misses       3782     3779       -3     
- Partials      803      804       +1     
Flag Coverage Δ
combined 60.80% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.


Contributor

@github-actions github-actions bot left a comment


⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Spamoor Trace Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.50.

Benchmark suite Current: a135436 Previous: 042b75a Ratio
SpamoorSmoke - BlockExecutor.ApplyBlock (avg) 68722.21875 us 7392.792792792793 us 9.30
SpamoorSmoke - BlockExecutor.CreateBlock (avg) 8.8125 us 3.24024024024024 us 2.72
SpamoorSmoke - BlockExecutor.ProduceBlock (avg) 69674.09375 us 8343.466867469879 us 8.35
SpamoorSmoke - DA.Submit (avg) 1123.2222222222222 us 707.7727272727273 us 1.59
SpamoorSmoke - DASubmitter.SubmitData (avg) 4631 us 1043.258064516129 us 4.44
SpamoorSmoke - Engine.GetPayload (avg) 29268.6875 us 1667.3753753753754 us 17.55
SpamoorSmoke - Engine.NewPayload (avg) 36524.78125 us 2610.774774774775 us 13.99
SpamoorSmoke - Executor.ExecuteTxs (avg) 68710.375 us 7375.666666666667 us 9.32
SpamoorSmoke - Executor.GetTxs (avg) 10623.42857142857 us 1272.2777777777778 us 8.35
SpamoorSmoke - TxPool.GetTxs (avg) 10479.857142857143 us 1250.4444444444443 us 8.38

This comment was automatically generated by workflow using github-action-benchmark.

@chatton chatton marked this pull request as ready for review March 2, 2026 14:52
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
.github/workflows/benchmark.yml (1)

34-34: ⚠️ Potential issue | 🟠 Major

Pin extractions/setup-just to an immutable commit SHA in all jobs.

Using @v3 is mutable and can drift unexpectedly across runs. Pin to a 40-char SHA for reproducibility and supply-chain safety.

#!/bin/bash
# List workflow action refs that are not pinned to a 40-char commit SHA.
rg -nP '^\s*-\s*uses:\s*[^@]+@([^\s#]+)' .github/workflows \
| gawk '!/@[0-9a-f]{40}([[:space:]]|$)/ {print}'

Also applies to: 66-66, 94-94

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/benchmark.yml at line 34, Replace the mutable action ref
"extractions/setup-just@v3" with an immutable 40-character commit SHA (e.g.,
extractions/setup-just@<40-char-sha>) so the workflow is pinned; update every
occurrence of that action (the other instances flagged in the workflow) to use
the same full commit SHA and verify no other actions in the workflow use short
tags like `@v3`, `@v2`, or `@main`—ensure the new refs are 40 hex chars to guarantee
reproducibility and supply-chain safety.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/benchmark/helpers.go`:
- Around line 201-214: Change waitForDrain to return an error instead of only
logging on timeout: update its signature (waitForDrain(ctx context.Context, log
func(string, ...any), client *ethclient.Client, consecutiveEmpty int) error),
replace the ctx.Done() branch to return a descriptive error (including emptyRun
and consecutiveEmpty) rather than calling log and returning, and update all call
sites (e.g., the test call in spamoor_erc20_test.go) to assert/handle the
returned error (for example using s.Require().NoError(waitForDrain(...))). Also
apply the same pattern to the other similar timeout branch noted in the file so
all timeouts propagate errors to callers.
- Around line 290-294: The AchievedTPS calculation in the blockMetricsSummary
return unconditionally divides by ss.Seconds(), which can be zero; update the
code that builds the blockMetricsSummary so you compute seconds := ss.Seconds()
(or similar), check if seconds == 0 and set AchievedTPS = 0 (or a safe default)
when zero, otherwise set AchievedTPS = float64(m.TotalTxCount) / seconds; modify
the return to use that guarded value (reference blockMetricsSummary,
AchievedTPS, ss.Seconds(), and m.TotalTxCount).

In `@test/e2e/benchmark/suite_test.go`:
- Around line 55-63: The current rethTag() function falls back to the hardcoded
defaultRethTag ("pr-140"), which is unsafe for CI; update the implementation so
it either (A) replaces defaultRethTag with an immutable stable release tag or
digest (e.g., a real release string) and use that as the fallback, or (B) remove
the fallback and make rethTag() fail fast when EV_RETH_TAG is not set
(log.Fatalf or return an error) to force CI configuration; change the symbol
defaultRethTag and the rethTag() function accordingly and ensure callers handle
the fatal/error behavior if you choose the fail-fast option.

---

Duplicate comments:
In @.github/workflows/benchmark.yml:
- Line 34: Replace the mutable action ref "extractions/setup-just@v3" with an
immutable 40-character commit SHA (e.g., extractions/setup-just@<40-char-sha>)
so the workflow is pinned; update every occurrence of that action (the other
instances flagged in the workflow) to use the same full commit SHA and verify no
other actions in the workflow use short tags like `@v3`, `@v2`, or `@main`—ensure the
new refs are 40 hex chars to guarantee reproducibility and supply-chain safety.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 805f927 and 4c7b7e1.

⛔ Files ignored due to path filters (1)
  • test/e2e/go.sum is excluded by !**/*.sum
📒 Files selected for processing (8)
  • .github/workflows/benchmark.yml
  • test/e2e/benchmark/helpers.go
  • test/e2e/benchmark/output.go
  • test/e2e/benchmark/spamoor_erc20_test.go
  • test/e2e/benchmark/spamoor_smoke_test.go
  • test/e2e/benchmark/suite_test.go
  • test/e2e/benchmark/traces.go
  • test/e2e/go.mod

Comment on lines +55 to +63
// TODO: temporary hardcoded tag, will be replaced with a proper release tag
const defaultRethTag = "pr-140"

func rethTag() string {
	if tag := os.Getenv("EV_RETH_TAG"); tag != "" {
		return tag
	}
	return defaultRethTag
}
Contributor

@coderabbitai coderabbitai bot Mar 2, 2026


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for EV_RETH_TAG usage across the codebase
rg -n 'EV_RETH_TAG|defaultRethTag|rethTag\(\)|WithTag\(rethTag' --type-list | head -20

Repository: evstack/ev-node

Length of output: 448


🏁 Script executed:

# Search for CI/workflow files that might set EV_RETH_TAG
find . -type f \( -name "*.yml" -o -name "*.yaml" \) | head -20

Repository: evstack/ev-node

Length of output: 759


🏁 Script executed:

# Check the actual test file and surrounding context
cat -n test/e2e/benchmark/suite_test.go | head -80

Repository: evstack/ev-node

Length of output: 2877


🏁 Script executed:

# Search for any references to pr-140 in the codebase
rg -n 'pr-140|pr-\d+' --type-list

Repository: evstack/ev-node

Length of output: 448


🏁 Script executed:

# Search for EV_RETH_TAG in workflows
rg 'EV_RETH_TAG' .github/workflows/

Repository: evstack/ev-node

Length of output: 41


🏁 Script executed:

# Search for rethTag usage in the codebase
rg 'rethTag|WithTag'

Repository: evstack/ev-node

Length of output: 173


🏁 Script executed:

# Check the benchmark workflow specifically
cat -n .github/workflows/benchmark.yml

Repository: evstack/ev-node

Length of output: 8038


🏁 Script executed:

# Search for any references to pr-140 or similar tag patterns
rg 'pr-140|pr-\d+|reth.*tag'

Repository: evstack/ev-node

Length of output: 123


Replace hardcoded reth tag with immutable release tag or require explicit CI configuration.

The fallback to "pr-140" is the only active code path in CI—EV_RETH_TAG is never configured in .github/workflows/benchmark.yml (lines 69–73). This makes benchmark runs non-deterministic and fragile: if that PR-scoped tag is removed or retargeted, all benchmarks fail silently or use unintended image versions. Either use a stable release tag/digest, or explicitly require EV_RETH_TAG to be set (fail fast if unset) to enforce reproducibility and match the test file determinism guideline.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/suite_test.go` around lines 55 - 63, The current rethTag()
function falls back to the hardcoded defaultRethTag ("pr-140"), which is unsafe
for CI; update the implementation so it either (A) replaces defaultRethTag with
an immutable stable release tag or digest (e.g., a real release string) and use
that as the fallback, or (B) remove the fallback and make rethTag() fail fast
when EV_RETH_TAG is not set (log.Fatalf or return an error) to force CI
configuration; change the symbol defaultRethTag and the rethTag() function
accordingly and ensure callers handle the fatal/error behavior if you choose the
fail-fast option.

Contributor Author

will be addressed in a follow up when there is a proper tag to use.

Contributor

@chatton, sounds good! Since you'll be addressing this in a follow-up, you might want to track it with a GitHub issue or add a TODO comment in the code pointing to the future work—it'll make it easier to find when the proper release tag is available. 👍


🧠 Learnings used
Learnt from: CR
Repo: evstack/ev-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T20:31:15.823Z
Learning: Applies to **/*_test.go : Ensure tests are deterministic

@chatton chatton requested a review from alpe March 2, 2026 15:30
@chatton chatton mentioned this pull request Mar 2, 2026
8 tasks
- waitForDrain returns an error on timeout instead of silently logging
- guard AchievedTPS computation when steady-state duration is zero
Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
test/e2e/benchmark/spamoor_erc20_test.go (1)

77-79: ⚠️ Potential issue | 🟠 Major

Fail the benchmark when mempool drain times out.

Continuing after waitForDrain timeout can undercount processed txs/gas while still passing, which makes reported throughput non-deterministic. Use s.Require().NoError(err, "mempool did not drain in time") here instead of warning-only logging.

As per coding guidelines, **/*_test.go: Ensure tests are deterministic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/spamoor_erc20_test.go` around lines 77 - 79, The benchmark
currently only logs a warning when waitForDrain(drainCtx, t.Logf, e.ethClient,
10) returns an error, which can allow timeouts to undercount throughput; replace
the warning-only handling with a hard test failure by calling
s.Require().NoError(err, "mempool did not drain in time") (use the same err from
waitForDrain) so the benchmark fails deterministically on drain timeout; locate
the call to waitForDrain and swap the t.Logf branch to invoke
s.Require().NoError(err, "mempool did not drain in time") referencing
waitForDrain, drainCtx, e.ethClient and err.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@test/e2e/benchmark/spamoor_erc20_test.go`:
- Around line 77-79: The benchmark currently only logs a warning when
waitForDrain(drainCtx, t.Logf, e.ethClient, 10) returns an error, which can
allow timeouts to undercount throughput; replace the warning-only handling with
a hard test failure by calling s.Require().NoError(err, "mempool did not drain
in time") (use the same err from waitForDrain) so the benchmark fails
deterministically on drain timeout; locate the call to waitForDrain and swap the
t.Logf branch to invoke s.Require().NoError(err, "mempool did not drain in
time") referencing waitForDrain, drainCtx, e.ethClient and err.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c7b7e1 and b14085e.

📒 Files selected for processing (2)
  • test/e2e/benchmark/helpers.go
  • test/e2e/benchmark/spamoor_erc20_test.go

Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
.github/workflows/benchmark.yml (1)

94-94: ⚠️ Potential issue | 🟠 Major

Pin extractions/setup-just to an immutable commit SHA.

Line [94] uses a mutable tag (@v3) for a third-party action, which weakens supply-chain integrity.

🔒 Suggested fix
       - name: Install just
-        uses: extractions/setup-just@v3
+        uses: extractions/setup-just@<full-40-char-commit-sha> # v3
#!/bin/bash
# Verify mutable vs pinned refs for setup-just in this workflow.
rg -nP 'uses:\s*extractions/setup-just@v[0-9]+(\.[0-9]+)?(\.[0-9]+)?\b' .github/workflows/benchmark.yml
rg -nP 'uses:\s*extractions/setup-just@[0-9a-f]{40}\b' .github/workflows/benchmark.yml

# Expected after fix:
# - First command: no matches
# - Second command: one match per setup-just usage
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/benchmark.yml at line 94, Replace the mutable action
reference "uses: extractions/setup-just@v3" with a pinned immutable commit SHA;
locate the workflow line that contains uses: extractions/setup-just@v3 and
change it to uses: extractions/setup-just@<commit-sha> (the full 40-character
commit hash from the action repo) so the workflow references a specific
immutable revision instead of the v3 tag.
🧹 Nitpick comments (1)
.github/workflows/benchmark.yml (1)

88-88: Align actions/setup-go version in erc20-benchmark job with other benchmark jobs.

The erc20-benchmark job uses v6.2.0 while evm-benchmark and spamoor-benchmark use v6.3.0, creating inconsistency across benchmark tooling.

♻️ Suggested fix
-        uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
+        uses: actions/setup-go@4b73464bb391d4059bd26b0524d20df3927bd417 # v6.3.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/benchmark.yml at line 88, Update the actions/setup-go step
in the erc20-benchmark job to use the same version as the other benchmark jobs:
replace the pinned ref
"actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5" (v6.2.0) with the
ref used by evm-benchmark and spamoor-benchmark (v6.3.0) so the setup-go action
version is consistent across benchmark jobs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @.github/workflows/benchmark.yml:
- Line 94: Replace the mutable action reference "uses:
extractions/setup-just@v3" with a pinned immutable commit SHA; locate the
workflow line that contains uses: extractions/setup-just@v3 and change it to
uses: extractions/setup-just@<commit-sha> (the full 40-character commit hash
from the action repo) so the workflow references a specific immutable revision
instead of the v3 tag.

---

Nitpick comments:
In @.github/workflows/benchmark.yml:
- Line 88: Update the actions/setup-go step in the erc20-benchmark job to use
the same version as the other benchmark jobs: replace the pinned ref
"actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5" (v6.2.0) with the
ref used by evm-benchmark and spamoor-benchmark (v6.3.0) so the setup-go action
version is consistent across benchmark jobs.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b14085e and af2aebd.

⛔ Files ignored due to path filters (1)
  • test/e2e/go.sum is excluded by !**/*.sum
📒 Files selected for processing (2)
  • .github/workflows/benchmark.yml
  • test/e2e/go.mod
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/e2e/go.mod



Development

Successfully merging this pull request may close these issues.

[FEATURE] Add ERC20 throughput benchmark test

1 participant