HDDS-11765. ContainerChecksumTreeManager to handle missed block deletions from the deleted block ids#9855
hevinhsu wants to merge 1 commit into apache:master
// Remove block metadata from DB and update counters.
try (BatchOperation batch = db.getStore().getBatchHandler().initBatchOperation()) {
  db.getStore().getBlockDataTable().deleteWithBatch(batch, blockKey);
  // Also remove from lastChunkInfoTable for schema V2/V3.
What would the recovery process be if an exception is thrown by one of the calls after the block has been deleted?
Thanks for the question. There is no explicit rollback mechanism here. The recovery relies on retry and idempotent deletion, which is the same approach used by BlockDeletingTask.
Since the question is about failures after the block is physically deleted (L2055), here are the two cases:
- DB batch commit fails (L2059-2072):
  The chunk file is gone but DB metadata still references it.
  The caller (reconcileContainerInternal, L1756-1761) catches the IOException and continues. Since the block metadata still exists in DB, the next reconciliation will detect the divergence again and retry deleteBlockForReconciliation, and the batch commit will succeed.
  This is the same trade-off BlockDeletingTask makes; see the TODO at deleteTransactions (L470-473) acknowledging this gap.
- In-memory stats update fails (L2075-2077):
  These operations only update in-memory counters (decDeletion, decrementUsedSpace) and do not throw IOException.
  Even if a failure occurs (e.g. process crash or OOM), the DB state is already correct, and the in-memory statistics are rebuilt from RocksDB on DN restart.
So the behavior is consistent with the eventual-consistency model already used by BlockDeletingTask.
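To make the retry-and-idempotency argument concrete, here is a minimal, self-contained sketch of the first failure case. It is a toy in-memory model, not the Ozone code: the class ToyReplica, its fields, and its methods are all hypothetical names invented for illustration, and the "batch commit" is simulated with a failure counter.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of a container replica. None of these names are Ozone APIs. */
final class ToyReplica {
    // Stands in for chunk files on disk.
    final Set<Long> chunkFiles = new HashSet<>();
    // Stands in for the block metadata table in the DB.
    final Map<Long, String> blockMetadata = new HashMap<>();
    // Number of upcoming metadata commits that should fail (simulated).
    int commitFailuresToInject = 0;

    void addBlock(long id) {
        chunkFiles.add(id);
        blockMetadata.put(id, "meta-" + id);
    }

    /** Idempotent delete: safe to call again after a partial failure. */
    void deleteBlock(long id) throws IOException {
        // Removing an already-missing file is a no-op.
        chunkFiles.remove(id);
        if (commitFailuresToInject > 0) {
            commitFailuresToInject--;
            throw new IOException("simulated batch commit failure");
        }
        blockMetadata.remove(id);
    }

    /** One reconciliation pass. Returns how many blocks were fully deleted. */
    int reconcile(Set<Long> deletedOnPeer) {
        int deleted = 0;
        for (long id : deletedOnPeer) {
            if (!blockMetadata.containsKey(id)) {
                continue; // nothing left to reconcile for this block
            }
            try {
                deleteBlock(id);
                deleted++;
            } catch (IOException e) {
                // Swallow and continue: the metadata is still present,
                // so the next pass will see the divergence and retry.
            }
        }
        return deleted;
    }
}
```

In this model, a pass whose commit fails leaves the file gone and the metadata in place; a later pass with the same peer input removes the metadata, and any pass after that is a no-op.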
What changes were proposed in this pull request?
ContainerChecksumTreeManager to handle missed block deletions from the deleted block ids.
Please describe your PR in detail:
During reconciliation with a peer, if deleted blocks are present on the peer but still exist locally, those blocks are deleted locally as well. This addresses cases where replicas may miss block delete transactions from SCM.
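The selection rule described above amounts to a set intersection: a block is a candidate for local deletion when the peer reports it as deleted and it is still live locally. The following sketch shows only that rule, under the assumption that both sides can be represented as sets of block IDs; MissedDeletions and missedDeletions are hypothetical names, not part of the patch.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper illustrating the selection rule only. */
final class MissedDeletions {
    /**
     * Returns the IDs that the peer has deleted but that still exist
     * locally, in ascending order. Neither input set is modified.
     */
    static Set<Long> missedDeletions(Set<Long> peerDeletedIds, Set<Long> localLiveIds) {
        Set<Long> result = new TreeSet<>(localLiveIds);
        result.retainAll(peerDeletedIds); // keep only IDs present on both sides
        return result;
    }
}
```

Blocks that the peer deleted and that are already absent locally fall out of the intersection, so repeated reconciliation against the same peer does not select them again.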
Adds a deleteBlockForReconciliation method that invokes existing APIs (deleteBlock, deleteUnreferenced) to perform physical deletion and remove related metadata from RocksDB when necessary.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-11765
How was this patch tested?
https://github.com/hevinhsu/ozone/actions/runs/22558617239