HDDS-11765. ContainerChecksumTreeManager to handle missed block deletions from the deleted block ids#9855

Open
hevinhsu wants to merge 1 commit into apache:master from hevinhsu:HDDS-11765
@hevinhsu hevinhsu commented Mar 2, 2026

What changes were proposed in this pull request?

ContainerChecksumTreeManager to handle missed block deletions from the deleted block ids.

Please describe your PR in detail:
During reconciliation with a peer, if deleted blocks are present on the peer but still exist locally, those blocks are deleted locally as well. This addresses cases where replicas may miss block delete transactions from SCM.

  • Introduce a deleteBlockForReconciliation method that invokes existing APIs (deleteBlock, deleteUnreferenced) to perform physical deletion and remove related metadata from RocksDB when necessary.
  • Add unit and integration tests.
  • Generated-by: GitHub Copilot with Claude Opus 4.6 (with manual tweaks)
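The selection step described above (blocks the peer reports as deleted that still exist locally) can be sketched as a set intersection. This is a minimal illustration only: the class name, the method `blocksToDeleteLocally`, and the use of plain `Long` block IDs are assumptions made for the example, not code from this patch or from the Ozone API.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; names and types are illustrative, not the patch's API.
public class ReconcileDeletedBlocksSketch {

    /**
     * Returns the block IDs that the peer reports as deleted
     * but that are still present in the local replica.
     */
    static Set<Long> blocksToDeleteLocally(Set<Long> peerDeleted, Set<Long> localLive) {
        // Copy first so the caller's set is never mutated.
        Set<Long> result = new TreeSet<>(peerDeleted);
        // Keep only the IDs that still exist locally.
        result.retainAll(localLive);
        return result;
    }

    public static void main(String[] args) {
        Set<Long> peerDeleted = Set.of(1L, 2L, 5L);
        Set<Long> localLive = Set.of(2L, 3L, 5L, 8L);
        // Blocks 2 and 5 were deleted on the peer but survive locally.
        System.out.println(blocksToDeleteLocally(peerDeleted, localLive)); // prints [2, 5]
    }
}
```

In the actual change, each selected block would then be passed to deleteBlockForReconciliation; the sketch stops at the selection.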

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11765

How was this patch tested?

https://github.com/hevinhsu/ozone/actions/runs/22558617239

// Remove block metadata from DB and update counters.
try (BatchOperation batch = db.getStore().getBatchHandler().initBatchOperation()) {
db.getStore().getBlockDataTable().deleteWithBatch(batch, blockKey);
// Also remove from lastChunkInfoTable for schema V2/V3.

What would the recovery process be if an exception is thrown by one of the calls after the block has been deleted?

Contributor Author

Thanks for the question. There is no explicit rollback mechanism here. The recovery relies on retry and idempotent deletion, which is the same approach used by BlockDeletingTask.

Since the question is about failures after the block is physically deleted (L2055), here are the two cases:

  1. DB batch commit fails (L2059-2072):
    The chunk file is gone but DB metadata still references it.
    The caller (reconcileContainerInternal L1756-1761) catches the IOException and continues. Since the block metadata still exists in DB, the next reconciliation will detect the divergence again and retry deleteBlockForReconciliation, and the batch commit will succeed.
    This is the same trade-off BlockDeletingTask makes — see the TODO at deleteTransactions L470-473 acknowledging this gap.

  2. In-memory stats update fails (L2075-2077):
    These operations only update in-memory counters (decDeletion, decrementUsedSpace) and do not throw IOException.
    Even if a failure occurs (e.g. process crash or OOM), the DB state is already correct, and the in-memory statistics are rebuilt from RocksDB on DN restart.

So the behavior is consistent with the eventual-consistency model already used by BlockDeletingTask.
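The retry behavior in case 1 can be illustrated with a small in-memory model. It is a sketch under stated assumptions: the two sets stand in for chunk files on disk and block metadata in RocksDB, `commitFailuresLeft` is a made-up knob for simulating a failed batch commit, and none of the names below come from the patch or from BlockDeletingTask.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model; not the patch's implementation.
public class IdempotentDeleteSketch {
    final Set<Long> chunkFiles = new HashSet<>(); // stands in for chunk files on disk
    final Set<Long> dbMetadata = new HashSet<>(); // stands in for block metadata in the DB
    int commitFailuresLeft;                       // number of simulated commit failures

    /** Deletes the chunk file first, then the metadata; safe to call repeatedly. */
    void deleteBlock(long id) throws IOException {
        chunkFiles.remove(id); // no-op if the file is already gone
        if (commitFailuresLeft > 0) {
            commitFailuresLeft--;
            throw new IOException("simulated batch commit failure for block " + id);
        }
        dbMetadata.remove(id);
    }

    /** One reconciliation pass: log-and-continue on failure, returns the failure count. */
    int reconcilePass(Set<Long> peerDeleted) {
        int failures = 0;
        for (long id : peerDeleted) {
            if (!dbMetadata.contains(id)) {
                continue; // already consistent with the peer
            }
            try {
                deleteBlock(id);
            } catch (IOException e) {
                failures++; // metadata remains, so the next pass sees the divergence again
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        IdempotentDeleteSketch dn = new IdempotentDeleteSketch();
        dn.chunkFiles.add(7L);
        dn.dbMetadata.add(7L);
        dn.commitFailuresLeft = 1;

        System.out.println(dn.reconcilePass(Set.of(7L))); // prints 1
        System.out.println(dn.reconcilePass(Set.of(7L))); // prints 0
        System.out.println(dn.dbMetadata.isEmpty());      // prints true
    }
}
```

After the first pass the chunk file is gone while the metadata remains, which is the intermediate state described in case 1; the second pass converges because removing an already-missing file is a no-op.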
