HDDS-14724. Fix infinite CPU spin loop in ECBlockInputStream #9833

Merged
adoroszlai merged 1 commit into apache:master from stuxuhai:HDDS-14724 on Mar 3, 2026

@stuxuhai
Contributor

What changes were proposed in this pull request?

This PR fixes a critical bug in ECBlockInputStream: an infinite spin loop that pins a CPU at 100% during transient network unavailability or DataNode unreadiness (e.g., when the underlying NIO channel returns 0 bytes).

Currently, ECBlockInputStream#readFromStream only checks whether actualRead == -1 (EOF). If 0 bytes are returned while expectedRead > 0, the stream position does not advance, yet the while loop keeps iterating, which causes an infinite CPU spin and thread starvation.

Proposed Solution:
Instead of introducing complex timeouts or backoff loops, this patch aligns the EC read path with the traditional replica read path (BlockInputStream#readWithStrategy).
The patch strictly checks whether actualRead != expectedRead (or explicitly intercepts a 0-byte read) and throws an IOException immediately on an inconsistent read.
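
As an illustration of the guard described above, here is a minimal, self-contained sketch. It is not the code from this patch: `ChunkSource`, `readStep`, `readAll`, `STALLED`, and `HEALTHY` are hypothetical names invented for the example, and the real `readFromStream` works with Ozone's own stream and strategy types. It only shows how rejecting any read where `actualRead != expectedRead` turns a stalled source into an immediate `IOException` instead of an endless loop.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

class ZeroReadGuardSketch {

  /** Hypothetical stand-in for the underlying block stream (not an Ozone type). */
  interface ChunkSource {
    /** Returns the bytes copied into dst (at most len), 0 if none were available, -1 at EOF. */
    int read(ByteBuffer dst, int len) throws IOException;
  }

  /** One read step: anything other than exactly expectedRead bytes is treated as an error. */
  static int readStep(ChunkSource src, ByteBuffer dst, int expectedRead) throws IOException {
    int actualRead = src.read(dst, expectedRead);
    if (actualRead == -1) {
      throw new IOException("Unexpected EOF, expected " + expectedRead + " bytes");
    }
    if (actualRead != expectedRead) {
      // Covers the 0-byte case: without this check the caller's loop would never advance.
      throw new IOException("Inconsistent read: expected " + expectedRead
          + " bytes, got " + actualRead);
    }
    return actualRead;
  }

  /** Fills dst in chunkSize steps; every iteration either makes progress or throws. */
  static int readAll(ChunkSource src, ByteBuffer dst, int chunkSize) throws IOException {
    int total = 0;
    while (dst.hasRemaining()) {
      int expectedRead = Math.min(chunkSize, dst.remaining());
      total += readStep(src, dst, expectedRead);
    }
    return total;
  }

  /** Simulates a channel that is temporarily unable to deliver data. */
  static final ChunkSource STALLED = (dst, len) -> 0;

  /** Simulates a healthy source that always delivers the requested length. */
  static final ChunkSource HEALTHY = (dst, len) -> {
    for (int i = 0; i < len; i++) {
      dst.put((byte) 1);
    }
    return len;
  };

  public static void main(String[] args) throws IOException {
    System.out.println("healthy bytes read: " + readAll(HEALTHY, ByteBuffer.allocate(10), 4));
    try {
      readAll(STALLED, ByteBuffer.allocate(10), 4);
    } catch (IOException e) {
      System.out.println("stalled read rejected: " + e.getMessage());
    }
  }
}
```

In this sketch the loop condition alone no longer decides termination: each pass through `readAll` must either consume bytes or raise, so a source that keeps returning 0 cannot hold the thread.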

This naturally integrates with the existing Ozone client architecture:

  1. ECBlockInputStream throws IOException.
  2. readWithStrategy wraps it into BadDataLocationException.
  3. ECBlockInputStreamProxy catches it and gracefully fails over to a reconstruction read via failoverToReconstructionRead.
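
The three steps above can be sketched roughly as follows. This is a simplified model under stated assumptions, not Ozone's implementation: `BadLocationException` is a hypothetical analogue of `BadDataLocationException`, and `Reader`, `readWithWrapping`, and `proxyRead` are made-up names standing in for the direct EC read, the wrapping done in `readWithStrategy`, and the proxy's failover, respectively.

```java
import java.io.IOException;

class FailoverSketch {

  /** Hypothetical analogue of BadDataLocationException (not the Ozone class). */
  static class BadLocationException extends IOException {
    BadLocationException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Hypothetical read path; returns the number of bytes read. */
  interface Reader {
    int read() throws IOException;
  }

  /** Step 2: an IOException from the direct read is wrapped to mark the location as bad. */
  static int readWithWrapping(Reader direct) throws BadLocationException {
    try {
      return direct.read();
    } catch (IOException e) {
      throw new BadLocationException("Read failed at data location", e);
    }
  }

  /** Step 3: the proxy catches the wrapped exception and switches to the reconstruction path. */
  static int proxyRead(Reader direct, Reader reconstruction) throws IOException {
    try {
      return readWithWrapping(direct);
    } catch (BadLocationException e) {
      return reconstruction.read();
    }
  }

  public static void main(String[] args) throws IOException {
    // Step 1: the direct read fails fast instead of spinning on a 0-byte read.
    Reader failing = () -> {
      throw new IOException("Inconsistent read: expected 4 bytes, got 0");
    };
    Reader reconstruction = () -> 4;
    System.out.println("bytes served after failover: " + proxyRead(failing, reconstruction));
  }
}
```

The point of the design is that the read path only needs to fail fast; the retry policy lives in the layer that already knows about the alternative (reconstruction) path.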

This minimalist approach completely eliminates the spin loop while fully leveraging the ecosystem's native reconstruction/failover mechanisms without modifying the Proxy class.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14724

How was this patch tested?

Added a JUnit test testZeroByteReadTriggersFailoverException to verify that a 0-byte read in ECBlockInputStream correctly throws an IOException (which translates to BadDataLocationException), instantly breaking the loop.

@adoroszlai adoroszlai requested a review from sodonnel February 26, 2026 10:50
@adoroszlai adoroszlai merged commit f2c7b6f into apache:master Mar 3, 2026
44 checks passed
@adoroszlai
Contributor

Thanks @stuxuhai for the patch.

@stuxuhai stuxuhai deleted the HDDS-14724 branch March 3, 2026 15:13
@stuxuhai
Contributor Author

stuxuhai commented Mar 3, 2026

Thanks for the review and the merge!
