Skip to content

feat: auto-detect Engine API GetPayload version for Osaka fork#3113

Merged
tac0turtle merged 4 commits intomainfrom
feat/auto-detect-osaka-getpayload-version
Mar 2, 2026
Merged

feat: auto-detect Engine API GetPayload version for Osaka fork#3113
tac0turtle merged 4 commits intomainfrom
feat/auto-detect-osaka-getpayload-version

Conversation

@tac0turtle
Copy link
Contributor

@tac0turtle tac0turtle commented Feb 27, 2026

GetPayload now automatically selects between engine_getPayloadV4 (Prague) and engine_getPayloadV5 (Osaka) by caching the last successful version and retrying with the alternative on "Unsupported fork" errors (code -38005).

This handles Prague chains, Osaka-at-genesis chains, and time-based Prague-to-Osaka upgrades with zero configuration. At most one extra RPC call occurs at the fork transition point.

Overview

Summary by CodeRabbit

  • New Features

    • Automatic fallback between payload API versions with caching to prefer the successful version.
    • Exposes the currently used payload method for improved tracing and diagnostics.
    • Enhanced error messages that indicate which payload method and payload ID failed, improving observability.
  • Tests

    • Added comprehensive tests validating version selection, fallback behavior, caching, fork upgrade scenarios, and error propagation.

GetPayload now automatically selects between engine_getPayloadV4 (Prague)
and engine_getPayloadV5 (Osaka) by caching the last successful version and
retrying with the alternative on "Unsupported fork" errors (code -38005).

This handles Prague chains, Osaka-at-genesis chains, and time-based
Prague-to-Osaka upgrades with zero configuration. At most one extra
RPC call occurs at the fork transition point.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link
Contributor

claude bot commented Feb 27, 2026

Claude finished @tac0turtle's task in 2m 3s —— View job


Code Review: Auto-detect Engine API GetPayload version for Osaka fork

  • Gather context and understand the request
  • Read changed files
  • Review code quality & architecture
  • Review error handling & resilience
  • Review security considerations
  • Review performance & resource efficiency
  • Review testing & reliability
  • Post final review feedback

Summary

This is a well-scoped, clean implementation. The fallback-and-cache pattern is correct, the atomic flag provides safe lock-free coordination, and the test coverage is thorough. A few observations below.


Architecture & Code Quality

engine_rpc_client.go

The implementation is minimal and correct. The atomic.Bool for useV5 is the right primitive here — lock-free, zero-cost on the hot path once the fork is resolved.

One subtle point: the expression e.useV5.Store(altMethod == getPayloadV5Method) correctly derives the stored value from the local variable altMethod (not from a re-read of useV5). This avoids a potential TOCTOU issue under concurrent calls. Good.

Concurrent fork transition safety: Two simultaneous calls during the transition (both reading useV5=false) will each try V4, each fall back to V5, and both store useV5=true. Result: correct, at most N×2 RPCs for N concurrent callers at the exact fork boundary. Acceptable trade-off given fork transitions are rare.

One-way transition assumption: useV5 only transitions false → true, never in reverse. This is valid for time-based Ethereum forks which don't downgrade. The code relies on this implicit assumption — a brief comment near the useV5 declaration would make this explicit for future readers.

engine_rpc_tracing.go

The payloadMethodGetter interface + type assertion pattern is clean and avoids adding an optional method to the EngineRPCClient interface. One edge case: if tracedEngineRPCClient itself were ever wrapped by another tracedEngineRPCClient (double-wrapping), the inner assertion t.inner.(payloadMethodGetter) would fail (since tracedEngineRPCClient doesn't implement GetPayloadMethod()), and the span would silently omit the method attribute. This is unlikely in the current wiring (execution.go:255) but worth noting.


Error Handling

The error messages are well-formed and include method name and payload ID for diagnosability:

  • engine_getPayloadV4 payload <id>: <err> for non-fork errors
  • engine_getPayloadV5 fallback after engine_getPayloadV4 unsupported fork, payload <id>: <err> for failed fallbacks

One subtle observability gap: when the fallback succeeds (V4→V5 transition call), the tracing span records method=engine_getPayloadV5 — the winning method — without indicating that V4 was attempted first. An operator investigating a slightly slower call won't see in the span that two RPCs were made. Consider adding a "fallback": true attribute Fix this → on the span when altMethod was used.


Testing

The tests in engine_rpc_client_test.go are comprehensive and cover the critical scenarios:

  • Prague (V4-only) chain with caching
  • Osaka-at-genesis (V4→V5 fallback + caching)
  • Time-based Prague→Osaka upgrade with three-phase assertion (V4 works, transition, V5 cached)
  • Non-fork error propagation
  • isUnsupportedForkErr table-driven test including wrapped error

One gap: there is no direct test asserting GetPayloadMethod() returns the correct value before/after the transition. While this is tested implicitly through the call sequence assertions, an explicit test would guard against GetPayloadMethod() drifting from the actual used method. Fix this →

Minor: the fakeEngineServer has an internal mu that holds the lock for the entire HTTP handler, while test closures also lock their own mu. The nesting order is always fakeEngineServer.mu → test.mu, so no deadlock risk, but it's worth being aware of when adding new tests.


Performance

  • atomic.Bool load/store is O(1) and non-blocking ✓
  • Constants avoid string allocation on every call ✓
  • At most 1 extra RPC per node lifetime (at fork boundary) ✓
  • No heap allocations on the success path beyond what CallContext already does ✓

Minor Nits

  1. engine_rpc_client.go:51-56 — The method/altMethod swap could include a brief comment explaining this is symmetric: either direction (V4→V5 or V5→V4) uses the same fallback path, covering the theoretical edge case of Osaka-to-Prague downgrade (e.g., reorg recovery). Not strictly necessary but improves understanding.

  2. engine_rpc_tracing.go:83-85 — The method attribute is set even on error. For the case where both methods fail, useV5 is not updated, so the span will show the pre-failure preference. This is technically correct but might be non-obvious to readers. A comment clarifying this behavior would help.

  3. NewPayload and ForkchoiceUpdated remain hardcoded to V4 and V3 respectively. If Osaka introduces new versions of these methods, they'll need similar treatment. Not in scope for this PR, but the pattern established here makes it straightforward to extend.


Overall Assessment

The implementation is correct, thread-safe, and well-tested. The approach of using an atomic.Bool with retry-on-unsupported-fork is idiomatic and keeps the complexity minimal. The issues noted above are minor and don't block merging.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 27, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedMar 2, 2026, 11:30 AM

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 954b703 and 4f7a617.

📒 Files selected for processing (1)
  • execution/evm/engine_rpc_tracing.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • execution/evm/engine_rpc_tracing.go

📝 Walkthrough

Walkthrough

Adds negotiated Engine API selection to the Engine RPC client: it prefers a cached method (V4 or V5), calls that method, falls back on Engine "Unsupported fork" (-38005) errors to the alternate method, caches the successful method, and exposes the resolved method via GetPayloadMethod for tracing.

Changes

Cohort / File(s) Summary
Engine RPC Client Implementation
execution/evm/engine_rpc_client.go
Adds engineErrUnsupportedFork (-38005), getPayloadV4Method/getPayloadV5Method constants, isUnsupportedForkErr helper, a useV5 atomic.Bool flag, fallback-and-caching logic in GetPayload, and GetPayloadMethod() to report the resolved method.
Engine RPC Client Tests
execution/evm/engine_rpc_client_test.go
New tests with a fake JSON-RPC server covering V4/V5 calls, unsupported-fork fallback, caching behavior across upgrades, error propagation, and isUnsupportedForkErr cases.
Tracing Integration
execution/evm/engine_rpc_tracing.go
Adds payloadMethodGetter interface (GetPayloadMethod() string) and updates tracing to read the actual resolved method from the client instead of using a hardcoded method name.

Sequence Diagram

sequenceDiagram
    participant Client
    participant EngineRPCClient
    participant Engine_V4 as Engine API V4
    participant Engine_V5 as Engine API V5

    Client->>EngineRPCClient: GetPayload(id)
    alt useV5 == false (or unset)
        EngineRPCClient->>Engine_V4: engine_getPayloadV4(id)
        Engine_V4-->>EngineRPCClient: error (-38005) "Unsupported fork"
        EngineRPCClient->>Engine_V5: engine_getPayloadV5(id)
        Engine_V5-->>EngineRPCClient: Payload
        EngineRPCClient->>EngineRPCClient: set useV5 = true
    else useV5 == true
        EngineRPCClient->>Engine_V5: engine_getPayloadV5(id)
        Engine_V5-->>EngineRPCClient: Payload
    end
    EngineRPCClient-->>Client: Payload or wrapped error (includes method + id)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I hopped between V4 and V5 today,

When forks said "no," I found another way.
I cached the route and stayed upbeat —
Now payloads land right on my little feet. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: introducing automatic version detection for the Engine API GetPayload method to handle Osaka fork transitions.
Description check ✅ Passed The description provides a clear overview of the feature and how it works, but lacks link to any related issue (e.g., 'Closes #').

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/auto-detect-osaka-getpayload-version

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.95%. Comparing base (f73a124) to head (4f7a617).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3113      +/-   ##
==========================================
+ Coverage   60.90%   60.95%   +0.05%     
==========================================
  Files         113      113              
  Lines       11617    11617              
==========================================
+ Hits         7075     7081       +6     
+ Misses       3743     3738       -5     
+ Partials      799      798       -1     
Flag Coverage Δ
combined 60.95% <ø> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
execution/evm/engine_rpc_client.go (1)

52-64: Wrap RPC errors with call context to improve diagnosability.

The current returns lose method-level context when failures bubble up. Wrapping here will make operational debugging much easier.

♻️ Proposed patch
 import (
 	"context"
 	"errors"
+	"fmt"
 	"sync/atomic"
@@
 	err := e.client.CallContext(ctx, &result, method, payloadID)
 	if err == nil {
 		return &result, nil
 	}
 
 	if !isUnsupportedForkErr(err) {
-		return nil, err
+		return nil, fmt.Errorf("%s failed for payload %x: %w", method, payloadID, err)
 	}
@@
 	err = e.client.CallContext(ctx, &result, altMethod, payloadID)
 	if err != nil {
-		return nil, err
+		return nil, fmt.Errorf("%s failed after %s unsupported fork for payload %x: %w", altMethod, method, payloadID, err)
 	}

As per coding guidelines "Wrap errors with context using fmt.Errorf".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@execution/evm/engine_rpc_client.go` around lines 52 - 64, The RPC CallContext
error returns currently lose method context; when returning errors from the
calls to e.client.CallContext (both the primary call using method and the
fallback using altMethod) wrap the original error using fmt.Errorf to include
the method name and payloadID (e.g. fmt.Errorf("engine rpc %s payload %v: %w",
method, payloadID, err)), and do the same for the fallback call (include
altMethod in the message); update returns in the block around
isUnsupportedForkErr(err) and the subsequent fallback so errors preserve call
context for diagnosability.
execution/evm/engine_rpc_client_test.go (1)

96-215: Consider table-driving the GetPayload scenario tests.

These three scenario tests share setup/teardown and method-tracking logic; table-driven structure would reduce duplication and make new fork-transition cases easier to add.

As per coding guidelines "Use table-driven tests where appropriate".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@execution/evm/engine_rpc_client_test.go` around lines 96 - 215, The three
tests TestGetPayload_PragueChain_UsesV4, TestGetPayload_OsakaChain_FallsBackToV5
and TestGetPayload_ForkUpgrade_SwitchesV4ToV5 duplicate setup/teardown and
call-tracking; refactor them into a single table-driven test that iterates over
cases (name, server behavior closure for fakeEngineServer, expected call
sequences before/after cache or upgrade, and any state transitions like toggling
osakaActive), reuse NewEngineRPCClient and client.GetPayload(ctx,
engine.PayloadID{}) in a loop, and assert calledMethods for each subcase; keep
unique identifiers (fakeEngineServer, NewEngineRPCClient, GetPayload,
calledMethods, osakaActive) so each case can provide the server callback and
expected sequences, reducing duplication and making future fork scenarios easier
to add.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@execution/evm/engine_rpc_client_test.go`:
- Around line 172-215: The test TestGetPayload_ForkUpgrade_SwitchesV4ToV5
currently only asserts success but not that the client cached the switch to V5;
instrument the fakeEngineServer handler with counters (e.g., v4Calls, v5Calls)
or a boolean flag to record how many times engine_getPayloadV4 and
engine_getPayloadV5 were invoked, call client.GetPayload as already written,
then after the first post-upgrade call assert that v4Calls == 1 and v5Calls >= 1
(first fallback used V5), and after the subsequent call assert v4Calls remains 1
and v5Calls increased (or v4Calls did not increase) to prove the client cached
and directly used V5; reference the test function
TestGetPayload_ForkUpgrade_SwitchesV4ToV5, the fakeEngineServer handler, and
client.GetPayload when adding these assertions.

---

Nitpick comments:
In `@execution/evm/engine_rpc_client_test.go`:
- Around line 96-215: The three tests TestGetPayload_PragueChain_UsesV4,
TestGetPayload_OsakaChain_FallsBackToV5 and
TestGetPayload_ForkUpgrade_SwitchesV4ToV5 duplicate setup/teardown and
call-tracking; refactor them into a single table-driven test that iterates over
cases (name, server behavior closure for fakeEngineServer, expected call
sequences before/after cache or upgrade, and any state transitions like toggling
osakaActive), reuse NewEngineRPCClient and client.GetPayload(ctx,
engine.PayloadID{}) in a loop, and assert calledMethods for each subcase; keep
unique identifiers (fakeEngineServer, NewEngineRPCClient, GetPayload,
calledMethods, osakaActive) so each case can provide the server callback and
expected sequences, reducing duplication and making future fork scenarios easier
to add.

In `@execution/evm/engine_rpc_client.go`:
- Around line 52-64: The RPC CallContext error returns currently lose method
context; when returning errors from the calls to e.client.CallContext (both the
primary call using method and the fallback using altMethod) wrap the original
error using fmt.Errorf to include the method name and payloadID (e.g.
fmt.Errorf("engine rpc %s payload %v: %w", method, payloadID, err)), and do the
same for the fallback call (include altMethod in the message); update returns in
the block around isUnsupportedForkErr(err) and the subsequent fallback so errors
preserve call context for diagnosability.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c449847 and ce75825.

📒 Files selected for processing (2)
  • execution/evm/engine_rpc_client.go
  • execution/evm/engine_rpc_client_test.go

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Spamoor Trace Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.50.

Benchmark suite Current: 954b703 Previous: c449847 Ratio
SpamoorSmoke - BlockExecutor.CreateBlock (avg) 4.144171779141105 us 2.4564564564564564 us 1.69

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@chatton chatton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good, I think we need to update it to make the actual method that is configured to be called exposed in some way so that the tracing wrapper reports the correct method call in the span.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
execution/evm/engine_rpc_client.go (1)

50-77: Minor observability gap on failed fallback attempts.

When the fallback method also fails (line 71), GetPayloadMethod() will return the primary method name (since useV5 isn't updated on failure), but the trace won't reflect that both methods were attempted. This could be confusing when debugging failed fork transitions.

Consider: Is the detailed error message (which includes both method names) sufficient for debugging, or should the traced method also indicate a fallback was attempted?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@execution/evm/engine_rpc_client.go` around lines 50 - 77, GetPayload
currently tries a primary RPC method then falls back to an alternate but does
not record that a fallback was attempted when the fallback also fails; update
engineRPCClient.GetPayload to record the fallback attempt in observability
(e.g., attach a span tag/log field) before returning the final error so traces
reflect both getPayloadV4Method and getPayloadV5Method were tried; reference
useV5, getPayloadV4Method/getPayloadV5Method and the error branch where the
altMethod CallContext fails to add the trace/log annotation indicating a
fallback was attempted (do not change useV5 on failure).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@execution/evm/engine_rpc_client.go`:
- Around line 50-77: GetPayload currently tries a primary RPC method then falls
back to an alternate but does not record that a fallback was attempted when the
fallback also fails; update engineRPCClient.GetPayload to record the fallback
attempt in observability (e.g., attach a span tag/log field) before returning
the final error so traces reflect both getPayloadV4Method and getPayloadV5Method
were tried; reference useV5, getPayloadV4Method/getPayloadV5Method and the error
branch where the altMethod CallContext fails to add the trace/log annotation
indicating a fallback was attempted (do not change useV5 on failure).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce75825 and 954b703.

📒 Files selected for processing (3)
  • execution/evm/engine_rpc_client.go
  • execution/evm/engine_rpc_client_test.go
  • execution/evm/engine_rpc_tracing.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • execution/evm/engine_rpc_client_test.go

Copy link
Contributor

@chatton chatton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one nit on naming 👍

@tac0turtle tac0turtle enabled auto-merge March 2, 2026 11:29
@tac0turtle tac0turtle added this pull request to the merge queue Mar 2, 2026
Merged via the queue into main with commit 805f927 Mar 2, 2026
33 checks passed
@tac0turtle tac0turtle deleted the feat/auto-detect-osaka-getpayload-version branch March 2, 2026 12:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants