refactor: move spamoor benchmark into testify suite #3107
Conversation
…mark
- Create test/e2e/benchmark/ subpackage with SpamoorSuite (testify/suite)
- Move spamoor smoke test into suite as TestSpamoorSmoke
- Split helpers into focused files: traces.go, output.go, metrics.go
- Introduce resultWriter for defer-based benchmark JSON output
- Export shared symbols from evm_test_common.go for cross-package use
- Restructure CI to fan-out benchmark jobs and fan-in publishing
- Run benchmarks on PRs only when benchmark-related files change
Resolve conflicts keeping the benchmark suite refactoring:
- benchmark.yml: keep path filters and suite-style test command
- evm_spamoor_smoke_test.go: keep deleted (moved to benchmark pkg)
- evm_test_common.go: keep exported types, drop writeTraceBenchmarkJSON (now in benchmark/output.go)
Note: Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. Use the following commands to manage reviews:
📝 Walkthrough

This pull request reorganizes and refactors the Spamoor smoke test from a standalone file into a dedicated benchmark subpackage.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches

🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Claude finished @chatton's task in 2m 46s — Code Review:
```go
defer w.flush()
```

```go
e := s.setupEnv(config{
	rethTag: "pr-140",
```
This is the PR the traces were added in; we can change this to a proper tag in a follow-up.
```yaml
  pull_request:
    branches:
      - main
    paths:
```
only run benchmark tests for PRs when relevant files are changed.
go test sets the working directory to the package under test, so the env var should be relative to test/e2e/benchmark/, not test/e2e/.
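A quick way to see the effect described above, using a throwaway directory tree that mirrors the repo layout (the paths here are illustrative, not the repo itself):

```shell
# Throwaway tree mirroring the repo layout (paths illustrative only).
mkdir -p /tmp/cwd-demo/test/e2e/benchmark

# From test/e2e/, two parent traversals reach the repo root:
cd /tmp/cwd-demo/test/e2e && cd ../.. && pwd    # /tmp/cwd-demo

# From test/e2e/benchmark/ (the cwd go test uses for that package),
# the same target needs one extra "..":
cd /tmp/cwd-demo/test/e2e/benchmark && cd ../../.. && pwd    # /tmp/cwd-demo
```

Any relative path in an env var therefore needs one more `..` once the test moves into the subpackage.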
Actionable comments posted: 2
♻️ Duplicate comments (1)
test/e2e/benchmark/spamoor_smoke_test.go (1)
20-23: ⚠️ Potential issue | 🟠 Major

Hardcoded `rethTag: "pr-140"` will become stale and cause opaque failures.

This tag refers to a specific container image built for a PR. Once that image is cleaned up or the tag is overwritten, the test will silently fail to start or pull the container with a confusing error. Expose this via an environment variable or CLI flag so it can be updated without a code change.
🔧 Suggested refactor
```diff
+rethTagFromEnv := os.Getenv("BENCH_RETH_TAG")
+if rethTagFromEnv == "" {
+	rethTagFromEnv = "pr-140"
+}
 e := s.setupEnv(config{
-	rethTag: "pr-140",
+	rethTag: rethTagFromEnv,
 	serviceName: "ev-node-smoke",
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/benchmark/spamoor_smoke_test.go` around lines 20 - 23, Replace the hardcoded rethTag value ("pr-140") in the test setup by reading it from configuration (environment variable or test CLI flag) so tests don't break when image tags change: update the call site where s.setupEnv(config{ rethTag: "pr-140", ... }) is used in spamoor_smoke_test.go to instead pull the tag from an env var (eg. os.Getenv("RETH_TAG")) or a provided flag, and adjust the config struct and any test bootstrap that constructs config to accept and pass that dynamic value (look for the config type, its rethTag field, and the setupEnv function to make the change).
🧹 Nitpick comments (5)
test/e2e/benchmark/metrics.go (1)
15-34: Consider accepting a `context.Context` for cancellable polling.

`requireHTTP` drives a blocking polling loop with no way to interrupt it if the parent test is cancelled mid-flight (e.g., via a `t.Context()` deadline or a `FailNow` from another goroutine). For long timeouts this can leave dangling goroutines/blocked tests.

🔧 Suggested refactor
```diff
-func requireHTTP(t testing.TB, url string, timeout time.Duration) {
+func requireHTTP(t testing.TB, ctx context.Context, url string, timeout time.Duration) {
 	t.Helper()
 	client := &http.Client{Timeout: 200 * time.Millisecond}
-	deadline := time.Now().Add(timeout)
+	ctx, cancel := context.WithTimeout(ctx, timeout)
+	defer cancel()
 	var lastErr error
-	for time.Now().Before(deadline) {
+	ticker := time.NewTicker(100 * time.Millisecond)
+	defer ticker.Stop()
+	for {
+		select {
+		case <-ctx.Done():
+			t.Fatalf("daemon not ready at %s: %v", url, lastErr)
+			return
+		case <-ticker.C:
+		}
 		resp, err := client.Get(url)
 		if err == nil {
 			_ = resp.Body.Close()
 			if resp.StatusCode >= 200 && resp.StatusCode < 300 {
 				return
 			}
 			lastErr = fmt.Errorf("status %d", resp.StatusCode)
 		} else {
 			lastErr = err
 		}
-		time.Sleep(100 * time.Millisecond)
 	}
-	t.Fatalf("daemon not ready at %s: %v", url, lastErr)
 }
```

As per coding guidelines, "Use context.Context for cancellation in Go."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/benchmark/metrics.go` around lines 15 - 34, The requireHTTP function currently blocks and can't be cancelled; change its signature to accept a context.Context (e.g., requireHTTP(ctx context.Context, t testing.TB, url string, timeout time.Duration) or use t.Context() inside) and use that context for cancellation inside the polling loop: create requests with http.NewRequestWithContext or attach ctx to the client.Do call, replace time.Sleep with a select on ctx.Done() and time.After(100*time.Millisecond), check ctx.Err() and call t.Fatalf when the context is cancelled, and ensure you still honor the timeout logic and preserve the lastErr handling in the loop.

test/e2e/benchmark/spamoor_smoke_test.go (1)
68-70: Fixed 60s `time.Sleep` is fragile for trace accumulation.

On fast machines traces may be ready much sooner; on slow CI runners traces may not be ready even after 60s. Consider polling the Jaeger API with a timeout instead of an unconditional sleep, making the test both faster in the happy path and more robust in the slow path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/benchmark/spamoor_smoke_test.go` around lines 68 - 70, Replace the fixed time.Sleep(60 * time.Second) wait with a polling loop that queries the Jaeger API until the expected traces/spans from spamoor appear or a context deadline elapses; implement this by replacing the direct call to time.Sleep with a function (e.g., waitForTraces(ctx, serviceName, expectedSpanCount)) that uses a context with timeout, exponential/backoff retries, and HTTP calls to the Jaeger query endpoint, returning early when traces are observed and failing the test only if the timeout is reached—update the test to call waitForTraces instead of time.Sleep and use the existing service identifier used in the test to locate traces.

test/e2e/benchmark/traces.go (1)
23-47: Loop variable `t` shadows the conventional test-helper name.

Using `t` as a range variable in `extractSpansFromTraces` (which imports `"testing"`) is not a bug but is slightly confusing, particularly for readers accustomed to `t` meaning `testing.TB`. A more descriptive name like `traceEntry` or `raw` would improve clarity.

🔧 Suggested refactor
```diff
-	for _, t := range traces {
-		traceMap, ok := t.(map[string]any)
+	for _, traceEntry := range traces {
+		traceMap, ok := traceEntry.(map[string]any)
```
Verify each finding against the current code and only fix it if needed. In `@test/e2e/benchmark/traces.go` around lines 23 - 47, In extractSpansFromTraces, rename the range variable t to a clearer name (e.g., traceEntry or rawTrace) to avoid shadowing the conventional testing variable t; update all uses of t in that function (the outer loop in extractSpansFromTraces and its casts to traceMap) so the logic remains unchanged but the variable is more descriptive and less confusing to readers familiar with testing.TB.

test/e2e/benchmark/output.go (1)
67-76: `require.NoError` inside a deferred `flush` will call `runtime.Goexit` on failure, which is safe but worth documenting.

When `flush` is called via `defer` after a test has already failed, `require.NoError` → `FailNow` → `runtime.Goexit` is invoked from within the deferred function, which Go handles correctly (deferred calls complete). No behavioral bug here, but the interaction with `require` vs `os.WriteFile` returning an error silently (by being suppressed) could be surprising. Consider logging the error path rather than asserting fatally so benchmark output failures don't obscure the real test failure.

🔧 Suggested refactor
```diff
 func (w *resultWriter) flush() {
 	outputPath := os.Getenv("BENCH_JSON_OUTPUT")
 	if outputPath == "" || len(w.entries) == 0 {
 		return
 	}
 	data, err := json.MarshalIndent(w.entries, "", "  ")
-	require.NoError(w.t, err, "failed to marshal benchmark JSON")
-	require.NoError(w.t, os.WriteFile(outputPath, data, 0644), "failed to write benchmark JSON to %s", outputPath)
+	if err != nil {
+		w.t.Logf("WARNING: failed to marshal benchmark JSON: %v", err)
+		return
+	}
+	if err := os.WriteFile(outputPath, data, 0644); err != nil {
+		w.t.Logf("WARNING: failed to write benchmark JSON to %s: %v", outputPath, err)
+		return
+	}
 	w.t.Logf("wrote %d benchmark entries to %s", len(w.entries), outputPath)
 }
```
Verify each finding against the current code and only fix it if needed. In `@test/e2e/benchmark/output.go` around lines 67 - 76, In resultWriter.flush, avoid using require.NoError (which calls FailNow/Goexit when deferred) for the json.MarshalIndent and os.WriteFile errors; instead check the error values from json.MarshalIndent and os.WriteFile and log them non-fatally (e.g., use w.t.Errorf or w.t.Logf with descriptive messages referencing outputPath and the error) so flush reports failures without invoking runtime.Goexit from a deferred context; update the calls that currently use require.NoError(w.t, ...) to explicit if err != nil { w.t.Errorf("...: %v", err); return } or similar logging behavior for both the marshal and write steps in resultWriter.flush.

test/e2e/evm_test_common.go (1)
532-534: Doc comment references the old unexported name.

Line 532 says `setupCommonEVMEnv` but the function is now exported as `SetupCommonEVMEnv`. Per Go convention, the doc comment should start with the exported identifier name.

Proposed fix
```diff
-// setupCommonEVMEnv creates and initializes ev-reth instances, while also initializing the local ev-node instance
+// SetupCommonEVMEnv creates and initializes ev-reth instances, while also initializing the local ev-node instance
```
Verify each finding against the current code and only fix it if needed. In `@test/e2e/evm_test_common.go` around lines 532 - 534, The doc comment currently starts with the old unexported name; update the comment to begin with the exported function name SetupCommonEVMEnv so it follows Go doc conventions (e.g., change the leading sentence from "setupCommonEVMEnv creates..." to "SetupCommonEVMEnv creates..."), leaving the rest of the description intact and ensuring the comment now documents the exported SetupCommonEVMEnv function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/benchmark.yml:
- Around line 11-16: Update the PR path filter in the GitHub Actions workflow to
include the block executor benchmark sources so changes to that package trigger
the workflow; specifically add the path pattern 'block/internal/executing/**'
(or more broadly 'block/**' if preferred) to the existing paths array in
.github/workflows/benchmark.yml so the job that runs `go test
./block/internal/executing/...` will trigger on relevant PRs.
In `@test/e2e/benchmark/suite_test.go`:
- Around line 73-78: The OTLP env setup in the b.WithTag(...).WithEnv call is
wrong: change the value for "OTEL_EXPORTER_OTLP_ENDPOINT" to use the base ingest
URL (remove the "/v1/traces" suffix) so the SDK can append per-signal paths,
keep "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT" as-is (full URL), and update
"OTEL_EXPORTER_OTLP_PROTOCOL" and "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL" from
"http" to the spec-correct "http/protobuf" (or another valid spec value like
"grpc" or "http/json" if intended).
---
Duplicate comments:
In `@test/e2e/benchmark/spamoor_smoke_test.go`:
- Around line 20-23: Replace the hardcoded rethTag value ("pr-140") in the test
setup by reading it from configuration (environment variable or test CLI flag)
so tests don't break when image tags change: update the call site where
s.setupEnv(config{ rethTag: "pr-140", ... }) is used in spamoor_smoke_test.go to
instead pull the tag from an env var (eg. os.Getenv("RETH_TAG")) or a provided
flag, and adjust the config struct and any test bootstrap that constructs config
to accept and pass that dynamic value (look for the config type, its rethTag
field, and the setupEnv function to make the change).
---
Nitpick comments:
In `@test/e2e/benchmark/metrics.go`:
- Around line 15-34: The requireHTTP function currently blocks and can't be
cancelled; change its signature to accept a context.Context (e.g.,
requireHTTP(ctx context.Context, t testing.TB, url string, timeout
time.Duration) or use t.Context() inside) and use that context for cancellation
inside the polling loop: create requests with http.NewRequestWithContext or
attach ctx to the client.Do call, replace time.Sleep with a select on ctx.Done()
and time.After(100*time.Millisecond), check ctx.Err() and call t.Fatalf when the
context is cancelled, and ensure you still honor the timeout logic and preserve
the lastErr handling in the loop.
In `@test/e2e/benchmark/output.go`:
- Around line 67-76: In resultWriter.flush, avoid using require.NoError (which
calls FailNow/Goexit when deferred) for the json.MarshalIndent and os.WriteFile
errors; instead check the error values from json.MarshalIndent and os.WriteFile
and log them non-fatally (e.g., use w.t.Errorf or w.t.Logf with descriptive
messages referencing outputPath and the error) so flush reports failures without
invoking runtime.Goexit from a deferred context; update the calls that currently
use require.NoError(w.t, ...) to explicit if err != nil { w.t.Errorf("...: %v",
err); return } or similar logging behavior for both the marshal and write steps
in resultWriter.flush.
In `@test/e2e/benchmark/spamoor_smoke_test.go`:
- Around line 68-70: Replace the fixed time.Sleep(60 * time.Second) wait with a
polling loop that queries the Jaeger API until the expected traces/spans from
spamoor appear or a context deadline elapses; implement this by replacing the
direct call to time.Sleep with a function (e.g., waitForTraces(ctx, serviceName,
expectedSpanCount)) that uses a context with timeout, exponential/backoff
retries, and HTTP calls to the Jaeger query endpoint, returning early when
traces are observed and failing the test only if the timeout is reached—update
the test to call waitForTraces instead of time.Sleep and use the existing
service identifier used in the test to locate traces.
In `@test/e2e/benchmark/traces.go`:
- Around line 23-47: In extractSpansFromTraces, rename the range variable t to a
clearer name (e.g., traceEntry or rawTrace) to avoid shadowing the conventional
testing variable t; update all uses of t in that function (the outer loop in
extractSpansFromTraces and its casts to traceMap) so the logic remains unchanged
but the variable is more descriptive and less confusing to readers familiar with
testing.TB.
In `@test/e2e/evm_test_common.go`:
- Around line 532-534: The doc comment currently starts with the old unexported
name; update the comment to begin with the exported function name
SetupCommonEVMEnv so it follows Go doc conventions (e.g., change the leading
sentence from "setupCommonEVMEnv creates..." to "SetupCommonEVMEnv creates..."),
leaving the rest of the description intact and ensuring the comment now
documents the exported SetupCommonEVMEnv function.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- .github/workflows/benchmark.yml
- test/e2e/benchmark/metrics.go
- test/e2e/benchmark/output.go
- test/e2e/benchmark/spamoor_smoke_test.go
- test/e2e/benchmark/suite_test.go
- test/e2e/benchmark/traces.go
- test/e2e/evm_contract_bench_test.go
- test/e2e/evm_force_inclusion_e2e_test.go
- test/e2e/evm_full_node_e2e_test.go
- test/e2e/evm_spamoor_smoke_test.go
- test/e2e/evm_test_common.go
- test/e2e/failover_e2e_test.go
💤 Files with no reviewable changes (1)
- test/e2e/evm_spamoor_smoke_test.go
go test treats all arguments after an unknown flag (--evm-binary) as test binary args, so ./benchmark/ was never recognized as a package pattern.
go test sets the cwd to the package directory (test/e2e/benchmark/), so the binary path needs an extra parent traversal.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Spamoor Trace Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.50.
| Benchmark suite | Current: 8752fee | Previous: f73a124 | Ratio |
|---|---|---|---|
| SpamoorSmoke - Executor.SetFinal (avg) | 2078.176646706587 us | 1257 us | 1.65 |
This comment was automatically generated by workflow using github-action-benchmark.
The benchmark package doesn't define the --binary flag that test-e2e passes. It has its own CI workflow so it doesn't need to run here.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #3107      +/-   ##
==========================================
+ Coverage   60.90%   60.95%   +0.05%
==========================================
  Files         113      113
  Lines       11617    11617
==========================================
+ Hits         7075     7081       +6
+ Misses       3743     3737       -6
  Partials      799      799
```

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
test/e2e/benchmark/metrics.go
Outdated
```go
client := &http.Client{Timeout: 200 * time.Millisecond}
deadline := time.Now().Add(timeout)
var lastErr error
for time.Now().Before(deadline) {
```
personal preference: require.Eventually()
test/e2e/benchmark/metrics.go
Outdated
```go
)

// requireHTTP polls a URL until it returns a 2xx status code or the timeout expires.
func requireHTTP(t testing.TB, url string, timeout time.Duration) {
```
nit: naming is hard. how about requireHostUp
- make reth tag configurable via EV_RETH_TAG env var (default pr-140)
- fix OTLP config: remove duplicate env vars, use http/protobuf protocol
- use require.Eventually for host readiness polling
- rename requireHTTP to requireHostUp
- use non-fatal logging in resultWriter.flush deferred context
- fix stale doc comment (setupCommonEVMEnv -> SetupCommonEVMEnv)
- rename loop variable to avoid shadowing testing.TB convention
- add block/internal/executing/** to CI path trigger
- remove unused require import from output.go
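The OTLP fix listed above amounts to env settings along these lines. This is a sketch of the spec-compliant shape only; the host and port here are placeholders, not values taken from the repo:

```shell
# Base endpoint carries no per-signal path; the SDK appends /v1/traces itself.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://jaeger:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
# A per-signal override, when used, is the full URL including the path:
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://jaeger:4318/v1/traces"
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf"
```

Valid protocol values per the OTLP exporter spec are `grpc`, `http/protobuf`, and `http/json`; bare `http` is not one of them.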
# Conflicts:
#	scripts/test.mk
🧹 Nitpick comments (1)
test/e2e/evm_test_common.go (1)
306-318: Align the exported function comment with `SetupSequencerNode`.

Line 306 still starts with `setupSequencerNode` while the exported symbol at Line 318 is `SetupSequencerNode`. Please update the comment prefix to match the exported name for doc/lint consistency.

As per coding guidelines: `**/*.go: Document exported types and functions.`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/evm_test_common.go` around lines 306 - 318, The function doc comment currently starts with the unexported name `setupSequencerNode` but the exported function is `SetupSequencerNode`; update the comment's first line to begin with `SetupSequencerNode` so the doc comment matches the exported symbol and satisfies the Go linter rules for exported functions (refer to the comment block above the `SetupSequencerNode` function).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/e2e/evm_test_common.go`:
- Around line 306-318: The function doc comment currently starts with the
unexported name `setupSequencerNode` but the exported function is
`SetupSequencerNode`; update the comment's first line to begin with
`SetupSequencerNode` so the doc comment matches the exported symbol and
satisfies the Go linter rules for exported functions (refer to the comment block
above the `SetupSequencerNode` function).
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- .github/workflows/benchmark.yml
- test/e2e/benchmark/metrics.go
- test/e2e/benchmark/output.go
- test/e2e/benchmark/spamoor_smoke_test.go
- test/e2e/benchmark/suite_test.go
- test/e2e/benchmark/traces.go
- test/e2e/evm_contract_bench_test.go
- test/e2e/evm_test_common.go
🚧 Files skipped from review as they are similar to previous changes (3)
- test/e2e/benchmark/spamoor_smoke_test.go
- test/e2e/benchmark/suite_test.go
- test/e2e/benchmark/metrics.go
Overview
Introduces a benchmarking package within the e2e dir: a test suite we can add additional spamoor tests to.
Summary by CodeRabbit
Tests
Refactor
Chores