HDDS-14730. Update Recon container sync to use container IDs#9842
HDDS-14730. Update Recon container sync to use container IDs#9842jasonosullivan34 wants to merge 7 commits intoapache:masterfrom
Conversation
devmadhuu
left a comment
There was a problem hiding this comment.
Thanks @jasonosullivan34 for the patch. Few comments in line with code. Pls check.
Also please add in PR description , how the patch was tested. Also better write following tests:
-
A unit test for ContainerStateMap.getContainerIDs(state, start, count) verifying pagination and state filtering
-
A unit or integration test for syncWithSCMContainerInfo() covering the "container missing from Recon, add it" path, and the "container already present, skip it" path
...e/recon/src/main/java/org/apache/hadoop/ozone/recon/spi/StorageContainerServiceProvider.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/apache/hadoop/ozone/recon/spi/impl/StorageContainerServiceProviderImpl.java
Outdated
Show resolved
Hide resolved
...pache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java
Show resolved
Hide resolved
...work/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocol.java
Outdated
Show resolved
Hide resolved
.../apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocolServerSideTranslatorPB.java
Outdated
Show resolved
Hide resolved
| if (isSyncDataFromSCMRunning.compareAndSet(false, true)) { | ||
| try { | ||
| List<ContainerInfo> containers = containerManager.getContainers(); | ||
| List<ContainerID> containerIDs = containerManager.getContainerIDs(); |
There was a problem hiding this comment.
We need not to load this list in memory, containerManager.getContainerIDs() loads all container IDs from Recon's local ContainerStateManager into a List upfront. Then for every SCM-returned ID, List.contains() does an O(n) linear scan. On a cluster with a million containers, this is O(n²) work for a full sync.
The ContainerManager interface already exposes containerExist(ContainerID id), which does a direct O(log n) map lookup without materializing any list.
...econ/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconStorageContainerManagerFacade.java
Outdated
Show resolved
Hide resolved
.../apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocolServerSideTranslatorPB.java
Outdated
Show resolved
Hide resolved
...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java
Show resolved
Hide resolved
|
|
||
| /** | ||
| * Returns container IDs under certain conditions. | ||
| * Search container IDs from start ID(exclusive), |
There was a problem hiding this comment.
ContainerStateMap.getContainerIDs() uses tailMap(start) which is inclusive. Pls correct the javadoc.
There was a problem hiding this comment.
I copied this javadoc from ContainerManager.getContainers, will I correct that one too?
devmadhuu
left a comment
There was a problem hiding this comment.
Thanks @jasonosullivan34 for improving the patch, however, still few points need to handle. Pls check.
| public boolean syncWithSCMContainerInfo() | ||
| throws IOException { | ||
| throws Exception { | ||
| if (isSyncDataFromSCMRunning.compareAndSet(false, true)) { |
There was a problem hiding this comment.
| if (isSyncDataFromSCMRunning.compareAndSet(false, true)) { | |
| if (isSyncDataFromSCMRunning.compareAndSet(false, true)) { | |
| try { | |
| return containerSyncHelper.syncWithSCMContainerInfo(); | |
| } finally { | |
| isSyncDataFromSCMRunning.compareAndSet(true, false); | |
| } | |
| } | |
| LOG.debug("SCM DB sync is already running."); | |
| return false; |
| return true; | ||
| } | ||
|
|
||
| private long getContainerCountPerCall(long totalContainerCount) { |
There was a problem hiding this comment.
Earlier, CONTAINER_METADATA_SIZE was defined as 1MB to estimate a ContainerInfo object. A ContainerID proto is ~8–12 bytes. With an IPC max of 128MB, the old code limited batches to floor(128MB / 1MB) = 128 containers per call. The new code does the same — 128 IDs per call — when it could safely fetch floor(128MB / 12 bytes) ≈ 5.5 million IDs per call. This means the change may actually increase the number of RPCs instead of reducing them. So I think, we should test with large set of container Ids specially when a cluster will have 4-5 millions CLOSED containers and record the SCM latency and impact. Because that was the whole objective behind why this JIRA was raised. And based on impact, we should think of innovative way to handle impact on SCM as well as rpc message length also should not exceed default 128 MB.
|
|
||
| public boolean syncWithSCMContainerInfo() | ||
| throws IOException { | ||
| throws Exception { |
There was a problem hiding this comment.
The helper already catches all Exception internally and returns false — so the throws Exception is misleading. The facade should declare throws IOException (or nothing if the helper swallows everything).
|
|
||
| import static org.apache.hadoop.fs.CommonConfigurationKeys.IPC_MAXIMUM_DATA_LENGTH; | ||
| import static org.apache.hadoop.fs.CommonConfigurationKeys.IPC_MAXIMUM_DATA_LENGTH_DEFAULT; | ||
| import static org.apache.hadoop.ozone.recon.scm.ReconStorageContainerManagerFacade.CONTAINER_METADATA_SIZE; |
There was a problem hiding this comment.
We are importing CONTAINER_METADATA_SIZE from ReconStorageContainerManagerFacade. Since the helper now owns the getContainerCountPerCall() logic, the constant (and the new CONTAINER_ID_SIZE) should live in the helper or a shared constants class.
| import org.apache.hadoop.ozone.recon.spi.StorageContainerServiceProvider; | ||
| import org.junit.jupiter.api.Test; | ||
|
|
||
| class TestReconStorageContainerSyncHelper { |
There was a problem hiding this comment.
Can we add a test verifying the multi-page pagination path — specifically, that the startContainerId is correctly advanced to lastID + 1 after each batch, and that a second call is made with the updated start ID. A cluster with hundreds of thousands of containers will always hit the paginated path.
What changes were proposed in this pull request?
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14730
How was this patch tested?
Unit tests
Manual testing
@devmadhuu