
Implement dynamic token stale period based on token TTL #677

Open
mihaimitrea-db wants to merge 6 commits into main from mihaimitrea-db/auth_token_buffer_extension

@mihaimitrea-db

Summary

Extends the token refresh buffer from a fixed 5 minutes to a dynamic period that adapts to token lifetime, improving reliability across different token types while maintaining backward compatibility.

The TTL used here is the token's remaining time to live at the moment the SDK receives it, not the full TTL it was issued with. For example, if a token was issued with a TTL of 60 minutes but was acquired 20 minutes before the SDK first uses it (e.g. through the CLI), its effective TTL is 60 − 20 = 40 minutes.
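The effective-TTL arithmetic above can be sketched as follows. This is a minimal illustration, not the SDK's code: the class `EffectiveTtlSketch` and the method `effectiveTtl` are hypothetical names, and the timestamps are made up for the example.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper (not part of the SDK) illustrating the "effective TTL" idea.
final class EffectiveTtlSketch {

  // Remaining lifetime of a token as seen at the moment the SDK receives it.
  static Duration effectiveTtl(Instant receivedAt, Instant expiry) {
    Duration remaining = Duration.between(receivedAt, expiry);
    // A token that has already expired has no usable lifetime left.
    return remaining.isNegative() ? Duration.ZERO : remaining;
  }

  public static void main(String[] args) {
    Instant issuedAt = Instant.parse("2024-01-01T00:00:00Z"); // example timestamp
    Instant expiry = issuedAt.plus(Duration.ofMinutes(60));   // 60-minute TTL
    Instant receivedAt = issuedAt.plus(Duration.ofMinutes(20)); // SDK sees it 20 minutes later
    System.out.println(effectiveTtl(receivedAt, expiry).toMinutes()); // prints 40
  }
}
```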

Why

The previous 5-minute stale period barely covered the allowed monthly downtime of ~4.32 minutes. If the auth services were down and a request arrived 4 minutes into the stale period, the SDK would have only 1 minute to obtain a new valid token before expiry. With an extended stale period of 20 minutes, the SDK has an extra ~15 minutes of stale but valid tokens to use, allowing the auth system to recover.
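For reference, the ~4.32 minute figure is what a 99.99% availability target allows over a 30-day month. Both the 30-day month and the 99.99% target are assumptions made for this sketch; the description above states only the resulting number. `DowntimeBudgetSketch` is a hypothetical name.

```java
import java.time.Duration;

// Hypothetical helper: converts an availability target into a monthly downtime budget.
final class DowntimeBudgetSketch {

  // Assumes a 30-day month; availabilityPercent is e.g. 99.99.
  static Duration monthlyDowntime(double availabilityPercent) {
    long monthMillis = Duration.ofDays(30).toMillis();
    double downFraction = (100.0 - availabilityPercent) / 100.0;
    return Duration.ofMillis(Math.round(monthMillis * downFraction));
  }

  public static void main(String[] args) {
    Duration budget = monthlyDowntime(99.99);
    // 259.2 seconds, i.e. about 4.32 minutes
    System.out.println(budget.toMillis() / 1000.0 + " s");
    // Slack left by the old fixed 5-minute stale window
    System.out.println(Duration.ofMinutes(5).minus(budget).toMillis() / 1000.0 + " s");
  }
}
```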

Changes

  • Increase the maximum stale period from 5 to 20 minutes to support 99.95%
    availability.
  • Implement dynamic stale period calculation: min(TTL × 0.5, 20 minutes).
  • Compute stale period per-token at acquisition time.
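The formula in the list above can be sketched like this. It is a simplified illustration: the PR's `computeStaleDuration` takes a `Token`, whereas this hypothetical version takes the remaining TTL directly, and the class name `StaleDurationSketch` is invented for the example.

```java
import java.time.Duration;

// Simplified sketch of min(TTL / 2, 20 minutes); not the SDK's actual method.
final class StaleDurationSketch {
  static final Duration MAX_STALE_DURATION = Duration.ofMinutes(20);

  // remainingTtl is the token's remaining lifetime at acquisition time.
  static Duration computeStaleDuration(Duration remainingTtl) {
    if (remainingTtl.isNegative()) {
      return Duration.ZERO; // assumption: an expired token gets no stale window
    }
    Duration half = remainingTtl.dividedBy(2);
    return half.compareTo(MAX_STALE_DURATION) < 0 ? half : MAX_STALE_DURATION;
  }
}
```

Under this sketch, a token with 10 minutes remaining gets a 5-minute stale window, while a token with 60 minutes remaining is capped at 20 minutes.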

Backward compatibility:

  • The public Builder method setStaleDuration() is preserved. Calling it disables dynamic mode via a useDynamicStaleDuration flag, reverting the behavior to the legacy fixed-duration stale window. This ensures that any caller already configuring a custom stale period is unaffected.
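The opt-out behavior can be sketched with a stripped-down builder. Only `setStaleDuration()` and `useDynamicStaleDuration` come from the description above; the class `CacheConfigSketch`, its fields, and the 5-minute legacy default are assumptions for illustration and do not mirror the real `CachedTokenSource` builder.

```java
import java.time.Duration;

// Hypothetical, stripped-down stand-in for the real builder.
final class CacheConfigSketch {
  final Duration staleDuration;
  final boolean useDynamicStaleDuration;

  private CacheConfigSketch(Builder b) {
    this.staleDuration = b.staleDuration;
    this.useDynamicStaleDuration = b.useDynamicStaleDuration;
  }

  static final class Builder {
    // Assumed legacy default, taken from the "fixed 5 minutes" in the summary.
    private Duration staleDuration = Duration.ofMinutes(5);
    // Dynamic mode is on unless the caller configures a fixed window.
    private boolean useDynamicStaleDuration = true;

    Builder setStaleDuration(Duration staleDuration) {
      this.staleDuration = staleDuration;
      this.useDynamicStaleDuration = false; // explicit value means legacy fixed mode
      return this;
    }

    CacheConfigSketch build() {
      return new CacheConfigSketch(this);
    }
  }
}
```

A caller that never touches `setStaleDuration()` gets the dynamic window; a caller that sets, say, 3 minutes keeps exactly that fixed window.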

Implementation:

  • Add computeStaleDuration(Token) that computes min(TTL / 2, MAX_STALE_DURATION).
  • Add useDynamicStaleDuration flag to the Builder, defaulting to true; setStaleDuration() sets it to false.
  • Add volatile dynamicStaleDuration field; initialized to MAX_STALE_DURATION as a safe default when no token is pre-set, or computed from the pre-set token via computeStaleDuration() when one is provided.
  • Update getTokenState() to use dynamicStaleDuration or the legacy staleDuration based on the flag.
  • Update getTokenBlocking() to recompute the stale period after a successful synchronous refresh.
  • Update triggerAsyncRefresh() to recompute the stale period after a successful async refresh.
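A rough sketch of how the state check might choose between the two windows is shown below. The state names `FRESH` and `EXPIRED`, the method signature, and the class `TokenStateSketch` are assumptions; the real `getTokenState()` may take other factors into account.

```java
import java.time.Duration;

// Hypothetical sketch of the state check; not the SDK's getTokenState().
final class TokenStateSketch {
  enum TokenState { FRESH, STALE, EXPIRED }

  // lifeTime is the time left until the token expires.
  static TokenState getTokenState(
      Duration lifeTime,
      boolean useDynamicStaleDuration,
      Duration dynamicStaleDuration,
      Duration staleDuration) {
    if (lifeTime.isZero() || lifeTime.isNegative()) {
      return TokenState.EXPIRED;
    }
    // Pick the per-token window or the legacy fixed window based on the flag.
    Duration window = useDynamicStaleDuration ? dynamicStaleDuration : staleDuration;
    return lifeTime.compareTo(window) <= 0 ? TokenState.STALE : TokenState.FRESH;
  }
}
```

With 19 minutes of lifetime left, a 20-minute dynamic window yields `STALE`, whereas the legacy 5-minute window would still report `FRESH`.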

Testing

  • Update testAsyncRefreshParametrized to use TestClockSupplier for deterministic time control, adding a clockAdvanceMinutes parameter to bring tokens into the stale window without relying on wall-clock timing.
  • Add a capped stale duration scenario: a 60-minute TTL token with the clock advanced by 41 minutes has lifeTime = 19 min ≤ 20 min cap → STALE, verifying that MAX_STALE_DURATION is correctly applied.
  • Update testAsyncRefreshFailureFallback to use a 4-minute TTL token advanced by 3 minutes to reliably enter the stale window under the dynamic formula.
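The two scenarios above can be checked by hand with a controllable clock. The sketch below uses a hypothetical `FakeClock` as a stand-in for `TestClockSupplier` (whose API is not shown here) and a hypothetical `isStaleAfter` helper; it is not the actual test code.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical walk-through of the test scenarios; not the PR's test code.
final class StaleWindowScenarioSketch {

  // Minimal controllable clock, standing in for a test clock supplier.
  static final class FakeClock {
    private Instant now;
    FakeClock(Instant start) { this.now = start; }
    Instant now() { return now; }
    void advance(Duration d) { now = now.plus(d); }
  }

  static final Duration MAX_STALE_DURATION = Duration.ofMinutes(20);

  // True if a token with the given TTL is in the stale window after the clock advance.
  static boolean isStaleAfter(int ttlMinutes, int clockAdvanceMinutes) {
    FakeClock clock = new FakeClock(Instant.parse("2024-01-01T00:00:00Z"));
    Instant expiry = clock.now().plus(Duration.ofMinutes(ttlMinutes));

    // The stale window is fixed at acquisition time: min(TTL / 2, cap).
    Duration half = Duration.between(clock.now(), expiry).dividedBy(2);
    Duration staleWindow = half.compareTo(MAX_STALE_DURATION) < 0 ? half : MAX_STALE_DURATION;

    clock.advance(Duration.ofMinutes(clockAdvanceMinutes));
    Duration lifeTime = Duration.between(clock.now(), expiry);

    boolean expired = lifeTime.isZero() || lifeTime.isNegative();
    return !expired && lifeTime.compareTo(staleWindow) <= 0;
  }
}
```

Here `isStaleAfter(60, 41)` is true (19 minutes left against a 20-minute cap) and `isStaleAfter(4, 3)` is true (1 minute left against a 2-minute window), while `isStaleAfter(60, 39)` is false because 21 minutes still remain.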

Replace the fixed 5-minute stale window with a per-token formula: stale_period = min(TTL / 2, 20 minutes). This ensures that short-lived tokens (e.g. FastPath tokens with a 10-minute TTL) enter the stale window early enough to trigger a proactive async refresh, while long-lived tokens are capped at 20 minutes to maintain a meaningful refresh buffer under a 99.99% uptime SLA.

The stale duration is computed at token fetch time using the remaining TTL as a proxy, since the SDK does not track token issuance timestamps. It is updated after every successful refresh (both synchronous and asynchronous) via computeStaleDuration(), and stored in a volatile field so that the unsynchronized fast-path read in getTokenAsync() sees a consistent value.

Because CachedTokenSource exposes the builder function setStalePeriod, backward compatibility is maintained through the useDynamicStalePeriod flag. Calling setStalePeriod disables this flag, reverting the token cache to its legacy behavior.
…ion.

Tests were failing because the TokenSource getToken function could return null tokens. Introduced null checks so that the staleDuration is updated only if the refreshed token is not null. Otherwise the stale duration is left unchanged, since the null token will be marked as expired at the next call.
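The null-guarded update after a refresh, together with the volatile field mentioned earlier, might look roughly like this. The `Token` class, the `onRefreshed` method, and `RefreshUpdateSketch` itself are hypothetical stand-ins; only `dynamicStaleDuration` and `MAX_STALE_DURATION` are names taken from the description.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the post-refresh update; not the SDK's implementation.
final class RefreshUpdateSketch {

  // Minimal stand-in for the SDK's token type.
  static final class Token {
    final Instant expiry;
    Token(Instant expiry) { this.expiry = expiry; }
  }

  static final Duration MAX_STALE_DURATION = Duration.ofMinutes(20);

  // volatile so that readers outside any lock observe the latest value.
  private volatile Duration dynamicStaleDuration = MAX_STALE_DURATION;

  // Called after a synchronous or asynchronous refresh attempt completes.
  void onRefreshed(Token refreshed, Instant now) {
    if (refreshed == null) {
      return; // keep the previous window; a null token is handled as expired elsewhere
    }
    Duration remaining = Duration.between(now, refreshed.expiry);
    if (remaining.isNegative()) {
      remaining = Duration.ZERO;
    }
    Duration half = remaining.dividedBy(2);
    dynamicStaleDuration = half.compareTo(MAX_STALE_DURATION) < 0 ? half : MAX_STALE_DURATION;
  }

  Duration currentStaleDuration() {
    return dynamicStaleDuration;
  }
}
```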
@github-actions

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-java

Inputs:

  • PR number: 677
  • Commit SHA: 539c66974ba0695efab03b56c10f6f37c5f12f3d

Checks will be approved automatically on success.
