Closed
Description
problem
A CKS cluster remains in the Alert state if scaling fails because of insufficient capacity on the hypervisor host.
versions
ACS 4.22
The steps to reproduce the bug
- Have a CloudStack environment with 2 KVM hosts in a cluster
- Launch a CKS cluster with size 2 (worker nodes)
  The worker nodes are deployed on KVM host 2
- The CKS cluster is in the Running state
- Deploy other VMs in the CloudStack environment until the capacity of the KVM hosts is reached
- Scale the CKS cluster to size 3
- Scaling of the CKS cluster fails due to the capacity issue
  The new worker node is left in the Stopped state
- The CKS cluster moves to the Alert state
2026-02-24 11:12:14,223 DEBUG [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:[ctx-c196e036]) (logid:43979d1a) Found VM: VM instance {"id":16,"instanceName":"i-2-16-VM","state":"Stopped","type":"User","uuid":"47386d74-3c9f-49aa-b102-1c10537c8350"} in the Kubernetes cluster KubernetesCluster {"id":2,"name":"test","uuid":"e155ab23-68ca-4c3e-b8c5-7175a3f65fda"} in state: Stopped while expected to be in state: Running. So moving the cluster to Alert state for reconciliation
2026-02-24 11:12:14,224 DEBUG [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:[ctx-c196e036]) (logid:43979d1a) Found VM: VM instance {"id":9,"instanceName":"i-2-9-VM","state":"Running","type":"User","uuid":"ebf0a5a6-01b7-462a-bad6-1f61887f0f41"} in the Kubernetes cluster KubernetesCluster {"id":2,"name":"test","uuid":"e155ab23-68ca-4c3e-b8c5-7175a3f65fda"} in state: Running while expected to be in state: Stopped. So moving the cluster to Alert state for reconciliation
- The worker node that is in the Stopped state cannot be removed
  An exception is thrown
What to do about it?
The CKS cluster should go back to the Running state, since the scaling failed only because of insufficient capacity.
Currently, with this PR, only the resource limit is checked during the scaling operation.
Host capacity should also be checked before scaling.
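As a rough illustration of the kind of pre-check suggested here, the sketch below rejects a scale-up request when the hosts do not have room for the additional worker nodes. It is not CloudStack code: `ScaleCapacityCheck`, `HostCapacity`, `NodeRequirement`, and `canFit` are hypothetical names, and the model is deliberately simplified (free CPU and memory per host only, no overprovisioning factors, tags, or affinity rules). A real fix would presumably go through CloudStack's own capacity and deployment planning logic.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these types are CloudStack classes.
final class ScaleCapacityCheck {

    /** Free resources on one host (simplified, assumed model). */
    static final class HostCapacity {
        final String name;
        final long freeCpuMhz;
        final long freeMemoryMb;

        HostCapacity(String name, long freeCpuMhz, long freeMemoryMb) {
            this.name = name;
            this.freeCpuMhz = freeCpuMhz;
            this.freeMemoryMb = freeMemoryMb;
        }
    }

    /** Resources one new worker node needs (e.g. from its service offering). */
    static final class NodeRequirement {
        final long cpuMhz;
        final long memoryMb;

        NodeRequirement(long cpuMhz, long memoryMb) {
            if (cpuMhz <= 0 || memoryMb <= 0) {
                throw new IllegalArgumentException("requirements must be positive");
            }
            this.cpuMhz = cpuMhz;
            this.memoryMb = memoryMb;
        }
    }

    /**
     * Returns true if additionalNodes identical worker nodes fit on the hosts.
     * Each host contributes as many slots as its scarcer resource allows.
     */
    static boolean canFit(List<HostCapacity> hosts, NodeRequirement req, int additionalNodes) {
        long slots = 0;
        for (HostCapacity host : hosts) {
            long byCpu = host.freeCpuMhz / req.cpuMhz;
            long byMemory = host.freeMemoryMb / req.memoryMb;
            slots += Math.min(byCpu, byMemory);
        }
        return slots >= additionalNodes;
    }

    public static void main(String[] args) {
        // Two nearly full KVM hosts, as in the reproduction steps (values are made up).
        List<HostCapacity> hosts = Arrays.asList(
                new HostCapacity("kvm1", 500, 512),
                new HostCapacity("kvm2", 1000, 1024));
        NodeRequirement worker = new NodeRequirement(2000, 2048);

        // Scaling from 2 to 3 workers needs room for 1 additional node.
        System.out.println("canScale=" + canFit(hosts, worker, 1)); // prints canScale=false
    }
}
```

With a check like this run before any VM is created, a request that cannot be satisfied would fail fast and the cluster could stay in the Running state, instead of leaving a Stopped worker node behind for the state scanner to flag.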