Implement dynamic token stale period based on token TTL by mihaimitrea-db · Pull Request #677 · databricks/databricks-sdk-java

mihaimitrea-db · 2026-02-24T15:58:45Z

Summary

Extends the token refresh buffer from a fixed 5 minutes to a dynamic period that adapts to token lifetime, improving reliability across different token types while maintaining backward compatibility.

The TTL is computed based on the remaining time to live at the moment the token is received and not its real TTL. For example, if the token has a TTL of 60 minutes but was acquired 20 minutes before being used by the SDK (e.g. through the CLI), its effective TTL will be 60 − 20 = 40 minutes.
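
The effective-TTL arithmetic above can be sketched as follows. This is only an illustration using `java.time`; the class and method names (`EffectiveTtlSketch`, `effectiveTtl`) are hypothetical and not part of the SDK.

```java
import java.time.Duration;
import java.time.Instant;

final class EffectiveTtlSketch {
  /** Remaining lifetime of a token, measured at the moment the SDK receives it. */
  static Duration effectiveTtl(Instant expiry, Instant receivedAt) {
    return Duration.between(receivedAt, expiry);
  }

  public static void main(String[] args) {
    Instant issuedAt = Instant.parse("2026-02-24T10:00:00Z");
    Instant expiry = issuedAt.plus(Duration.ofMinutes(60));     // real TTL: 60 minutes
    Instant receivedAt = issuedAt.plus(Duration.ofMinutes(20)); // SDK sees the token 20 minutes later
    System.out.println(effectiveTtl(expiry, receivedAt).toMinutes()); // prints 40
  }
}
```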

Why

The previous 5-minute stale period barely covered the allowed monthly downtime of ~4.32 minutes (99.99% availability over a 30-day month). If the auth service were down and a request arrived 4 minutes into the stale period, the SDK would have only 1 minute to obtain a new valid token before expiry. With an extended stale period of 20 minutes, the SDK has an extra ~15 minutes of stale but valid tokens to use, allowing the auth system to recover.

Changes

Increase the maximum stale period from 5 to 20 minutes to support 99.95% availability.
Implement dynamic stale period calculation: min(TTL × 0.5, 20 minutes).
Compute stale period per-token at acquisition time.
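
A minimal sketch of the formula min(TTL × 0.5, 20 minutes), assuming the remaining TTL is already available as a `Duration`. The PR's `computeStaleDuration` takes a `Token`; the `staleDurationFor` helper and the class name here are hypothetical simplifications.

```java
import java.time.Duration;

final class StaleDurationSketch {
  static final Duration MAX_STALE_DURATION = Duration.ofMinutes(20);

  /** Half of the remaining TTL, capped at MAX_STALE_DURATION. */
  static Duration staleDurationFor(Duration remainingTtl) {
    Duration half = remainingTtl.dividedBy(2);
    return half.compareTo(MAX_STALE_DURATION) < 0 ? half : MAX_STALE_DURATION;
  }

  public static void main(String[] args) {
    System.out.println(staleDurationFor(Duration.ofMinutes(10)).toMinutes()); // prints 5
    System.out.println(staleDurationFor(Duration.ofMinutes(40)).toMinutes()); // prints 20
    System.out.println(staleDurationFor(Duration.ofMinutes(60)).toMinutes()); // prints 20 (capped)
  }
}
```

A 10-minute token therefore becomes stale after 5 minutes, while anything with 40 minutes or more remaining hits the 20-minute cap.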

Backward compatibility:

The public Builder method setStaleDuration() is preserved. Calling it disables dynamic mode via a useDynamicStaleDuration flag, reverting the behavior to the legacy fixed-duration stale window. This ensures that any caller already configuring a custom stale period is unaffected.
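
The flag handling could look roughly like the sketch below. It is a stripped-down stand-in, not the real `CachedTokenSource`: the class name `BuilderSketch` and its accessors are hypothetical, and only `setStaleDuration()` and `useDynamicStaleDuration` come from the description above.

```java
import java.time.Duration;

final class BuilderSketch {
  private final Duration staleDuration;
  private final boolean useDynamicStaleDuration;

  private BuilderSketch(Builder b) {
    this.staleDuration = b.staleDuration;
    this.useDynamicStaleDuration = b.useDynamicStaleDuration;
  }

  boolean usesDynamicStaleDuration() { return useDynamicStaleDuration; }
  Duration staleDuration() { return staleDuration; }

  static final class Builder {
    // Legacy fixed window of 5 minutes, as stated in the PR description.
    private Duration staleDuration = Duration.ofMinutes(5);
    // Dynamic mode is on unless the caller opts out.
    private boolean useDynamicStaleDuration = true;

    /** Setting an explicit duration switches back to the legacy fixed window. */
    Builder setStaleDuration(Duration staleDuration) {
      this.staleDuration = staleDuration;
      this.useDynamicStaleDuration = false;
      return this;
    }

    BuilderSketch build() { return new BuilderSketch(this); }
  }
}
```

Callers that never touch `setStaleDuration()` get the dynamic window; callers that already set a custom value keep exactly the window they asked for.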

Implementation:

Add computeStaleDuration(Token) that computes min(TTL / 2, MAX_STALE_DURATION).
Add useDynamicStaleDuration flag to the Builder, defaulting to true; setStaleDuration() sets it to false.
Add volatile dynamicStaleDuration field; initialized to MAX_STALE_DURATION as a safe default when no token is pre-set, or computed from the pre-set token via computeStaleDuration() when one is provided.
Update getTokenState() to use dynamicStaleDuration or the legacy staleDuration based on the flag.
Update getTokenBlocking() to recompute the stale period after a successful synchronous refresh.
Update triggerAsyncRefresh() to recompute the stale period after a successful async refresh.
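
To make the `getTokenState()` change concrete, here is a hedged sketch of the selection between the two stale windows. The state names `FRESH` and `EXPIRED`, the `classify` signature, and the omission of any expiry buffer are assumptions for illustration; only `STALE` and the "remaining lifetime ≤ stale duration" comparison are taken from this description.

```java
import java.time.Duration;

final class TokenStateSketch {
  enum State { FRESH, STALE, EXPIRED }

  /** Classifies a token by its remaining lifetime, using whichever stale window is active. */
  static State classify(
      Duration remaining, Duration dynamicStale, Duration staticStale, boolean useDynamic) {
    if (remaining.isZero() || remaining.isNegative()) {
      return State.EXPIRED;
    }
    Duration stale = useDynamic ? dynamicStale : staticStale;
    return remaining.compareTo(stale) <= 0 ? State.STALE : State.FRESH;
  }

  public static void main(String[] args) {
    Duration remaining = Duration.ofMinutes(19);
    Duration dynamicStale = Duration.ofMinutes(20);
    Duration staticStale = Duration.ofMinutes(5);
    System.out.println(classify(remaining, dynamicStale, staticStale, true));  // prints STALE
    System.out.println(classify(remaining, dynamicStale, staticStale, false)); // prints FRESH
  }
}
```

The same token with 19 minutes left is refreshed proactively under the dynamic window but would still count as fresh under the legacy 5-minute one.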

Testing

Update testAsyncRefreshParametrized to use TestClockSupplier for deterministic time control, adding a clockAdvanceMinutes parameter to bring tokens into the stale window without relying on wall-clock timing.
Add a capped stale duration scenario: 60-min TTL token advanced 41 minutes leaves lifeTime = 19 min ≤ 20 min cap → STALE, verifying MAX_STALE_DURATION is correctly applied.
Update testAsyncRefreshFailureFallback to use a 4-minute TTL token advanced by 3 minutes to reliably enter the stale window under the dynamic formula.
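
The two scenarios above can be reproduced with a hand-advanced clock. The sketch below does not use the SDK's `TestClockSupplier`; `ManualClockSketch` and `isStale` are hypothetical helpers that only mirror the idea of deterministic time control.

```java
import java.time.Duration;
import java.time.Instant;

final class ManualClockSketch {
  private Instant now;

  ManualClockSketch(Instant start) { this.now = start; }

  Instant now() { return now; }

  /** Moves the clock forward without touching wall-clock time. */
  void advance(Duration d) { now = now.plus(d); }

  /** True when the token is still valid but inside its stale window. */
  static boolean isStale(Instant expiry, Duration staleDuration, Instant now) {
    Duration remaining = Duration.between(now, expiry);
    return !remaining.isNegative() && !remaining.isZero()
        && remaining.compareTo(staleDuration) <= 0;
  }

  public static void main(String[] args) {
    Instant start = Instant.parse("2026-02-24T10:00:00Z");

    // 60-minute token: stale window is min(60 / 2, 20) = 20 minutes.
    ManualClockSketch clock = new ManualClockSketch(start);
    Instant longExpiry = start.plus(Duration.ofMinutes(60));
    clock.advance(Duration.ofMinutes(41)); // 19 minutes remain
    System.out.println(isStale(longExpiry, Duration.ofMinutes(20), clock.now())); // prints true

    // 4-minute token: stale window is min(4 / 2, 20) = 2 minutes.
    ManualClockSketch shortClock = new ManualClockSketch(start);
    Instant shortExpiry = start.plus(Duration.ofMinutes(4));
    shortClock.advance(Duration.ofMinutes(3)); // 1 minute remains
    System.out.println(isStale(shortExpiry, Duration.ofMinutes(2), shortClock.now())); // prints true
  }
}
```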

Replace the fixed 5-minute stale window with a per-token formula: stale_period = min(TTL / 2, 20 minutes). This ensures that short-lived tokens (e.g. FastPath tokens with a 10-minute TTL) enter the stale window early enough to trigger a proactive async refresh, while long-lived tokens are capped at 20 minutes to maintain a meaningful refresh buffer under a 99.99% uptime SLA. The stale duration is computed at token fetch time using the remaining TTL as a proxy, since the SDK does not track token issuance timestamps. It is updated after every successful refresh (both synchronous and asynchronous) via computeStaleDuration(), and stored in a volatile field so that the unsynchronized fast-path read in getTokenAsync() sees a consistent value. Because the CachedTokenSource builder exposes setStalePeriod, backward compatibility is maintained through the useDynamicStalePeriod flag. Calling setStalePeriod disables this flag, reverting the token cache to the legacy behaviour.

…ion. Tests were failing because the TokenSource getToken function could return null tokens. Introduced null checks that update the staleDuration only if the refreshed token is not null. Otherwise, the stale duration is left unchanged, since the null token will be marked as expired at the next call.
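
A rough sketch of the null check described above, assuming the stale duration lives in a `volatile` field that is overwritten only when a refresh actually returns a token. `RefreshUpdateSketch`, `SketchToken`, and `onRefreshCompleted` are hypothetical names; the real update happens inside `getTokenBlocking()` and `triggerAsyncRefresh()`.

```java
import java.time.Duration;
import java.time.Instant;

final class RefreshUpdateSketch {
  static final Duration MAX_STALE_DURATION = Duration.ofMinutes(20);

  /** Hypothetical stand-in for the SDK's Token; only the expiry matters here. */
  static final class SketchToken {
    final Instant expiry;
    SketchToken(Instant expiry) { this.expiry = expiry; }
  }

  // volatile so a reader on another thread sees the latest value without locking.
  private volatile Duration dynamicStaleDuration = MAX_STALE_DURATION;

  Duration dynamicStaleDuration() { return dynamicStaleDuration; }

  /** Recomputes the stale window after a refresh; a null token leaves it untouched. */
  void onRefreshCompleted(SketchToken refreshed, Instant now) {
    if (refreshed == null) {
      return; // keep the previous value
    }
    Duration half = Duration.between(now, refreshed.expiry).dividedBy(2);
    dynamicStaleDuration = half.compareTo(MAX_STALE_DURATION) < 0 ? half : MAX_STALE_DURATION;
  }
}
```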


github-actions · 2026-02-25T09:16:10Z

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-java

Inputs:

PR number: 677
Commit SHA: 539c66974ba0695efab03b56c10f6f37c5f12f3d

Checks will be approved automatically on success.

mihaimitrea-db added 5 commits February 24, 2026 10:22

Fix linting errors.

8a77237

Fix small linting error.

83788ef

Introduce checks for permanent tokens with no expiry time.

21e755a

mihaimitrea-db temporarily deployed to test-trigger-is February 24, 2026 15:59 — with GitHub Actions Inactive

Rename stale duration to staticStaleDuration to differentiate between…

539c669

… legacy version and new dynamic one.

mihaimitrea-db temporarily deployed to test-trigger-is February 25, 2026 09:16 — with GitHub Actions Inactive

mihaimitrea-db temporarily deployed to test-trigger-is February 25, 2026 09:17 — with GitHub Actions Inactive

mihaimitrea-db requested a review from parthban-db February 25, 2026 09:50

mihaimitrea-db wants to merge 6 commits into main from mihaimitrea-db/auth_token_buffer_extension