CKS cluster remains in Alert state if the scaling fails due to capacity issue on the hypervisor host

### problem

A CKS cluster remains in the Alert state when a scale operation fails because the hypervisor host has insufficient capacity.

### versions

ACS 4.22

### The steps to reproduce the bug

Set up a CloudStack environment with two KVM hosts in a cluster.

1. Launch a CKS cluster of size 2 (worker nodes).

The worker nodes are deployed on KVM host 2.

2. The CKS cluster reaches the Running state.

3. Deploy other VMs in the CloudStack environment until the KVM host capacity is exhausted.

4. Scale the CKS cluster to size 3.

5. Scaling of the CKS cluster fails because of insufficient capacity.

The new worker node is left in the Stopped state.

6. The CKS cluster moves to the Alert state.

```

2026-02-24 11:12:14,223 DEBUG [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:[ctx-c196e036]) (logid:43979d1a) Found VM: VM instance {"id":16,"instanceName":"i-2-16-VM","state":"Stopped","type":"User","uuid":"47386d74-3c9f-49aa-b102-1c10537c8350"} in the Kubernetes cluster KubernetesCluster {"id":2,"name":"test","uuid":"e155ab23-68ca-4c3e-b8c5-7175a3f65fda"} in state: Stopped while expected to be in state: Running. So moving the cluster to Alert state for reconciliation
2026-02-24 11:12:14,224 DEBUG [c.c.k.c.KubernetesClusterManagerImpl] (Kubernetes-Cluster-State-Scanner-1:[ctx-c196e036]) (logid:43979d1a) Found VM: VM instance {"id":9,"instanceName":"i-2-9-VM","state":"Running","type":"User","uuid":"ebf0a5a6-01b7-462a-bad6-1f61887f0f41"} in the Kubernetes cluster KubernetesCluster {"id":2,"name":"test","uuid":"e155ab23-68ca-4c3e-b8c5-7175a3f65fda"} in state: Running while expected to be in state: Stopped. So moving the cluster to Alert state for reconciliation
```

7. The worker node in the Stopped state cannot be removed.

An exception is thrown:

<img width="1623" height="528" alt="Image" src="https://github.com/user-attachments/assets/32b8bad2-db9c-4686-ac57-3c26b9f9d378" />


<img width="1628" height="719" alt="Image" src="https://github.com/user-attachments/assets/31f3768f-b5c0-4b7c-9f07-c48da3debe42" />

<img width="1571" height="178" alt="Image" src="https://github.com/user-attachments/assets/905e2efd-9de3-46f4-8cc8-08e3cd669d32" />
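The two log lines in step 6 suggest why the cluster stays in Alert: one VM is Stopped while Running is expected, and another is Running while Stopped is expected, so the cluster matches neither target state. The following is a minimal sketch of that reading, not the actual `KubernetesClusterManagerImpl` logic; `VmState`, `ClusterStateCheck`, and both method names are hypothetical.

```java
import java.util.List;

// Hypothetical, simplified model of VM states; not CloudStack's own enum.
enum VmState { RUNNING, STOPPED }

final class ClusterStateCheck {
    // True only if every VM in the cluster is in the desired state.
    static boolean allVmsInState(List<VmState> vmStates, VmState desired) {
        return vmStates.stream().allMatch(s -> s == desired);
    }

    // A cluster with mixed VM states matches neither "all Running" nor
    // "all Stopped", so a scanner that only reconciles toward one of
    // those two targets would leave it in Alert.
    static boolean staysInAlert(List<VmState> vmStates) {
        return !allVmsInState(vmStates, VmState.RUNNING)
            && !allVmsInState(vmStates, VmState.STOPPED);
    }
}
```

Under this assumption, two running workers plus one stopped worker from the failed scale operation would keep the cluster in Alert until the stopped node is started or removed.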

### What to do about it?

The CKS cluster should return to the Running state, since the scaling failed because of insufficient capacity.

Currently, only resource limits are checked during the scaling operation, as introduced by this PR:

https://github.com/apache/cloudstack/pull/12167

Host capacity should also be checked before scaling.
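As an illustration of the proposed pre-check, here is a minimal sketch that tests whether the additional worker nodes would fit on the available hosts before any VM is created. It is an assumption-laden model, not CloudStack's capacity API: `HostFree`, `ScaleCapacityCheck`, and `canPlace` are hypothetical names, and overcommit ratios, host tags, and affinity rules are ignored.

```java
import java.util.List;

// Hypothetical view of a host's free resources; not a CloudStack class.
final class HostFree {
    final long cpuMhz;
    final long memoryMb;

    HostFree(long cpuMhz, long memoryMb) {
        this.cpuMhz = cpuMhz;
        this.memoryMb = memoryMb;
    }
}

final class ScaleCapacityCheck {
    // First-fit placement: each additional node goes on the first host
    // that still has enough free CPU and memory, and that capacity is
    // reserved before the next node is considered.
    static boolean canPlace(List<HostFree> hosts, long nodeCpuMhz,
                            long nodeMemoryMb, int additionalNodes) {
        // Work on copies so the caller's capacity figures are not modified.
        long[] cpu = hosts.stream().mapToLong(h -> h.cpuMhz).toArray();
        long[] mem = hosts.stream().mapToLong(h -> h.memoryMb).toArray();

        for (int n = 0; n < additionalNodes; n++) {
            boolean placed = false;
            for (int i = 0; i < cpu.length && !placed; i++) {
                if (cpu[i] >= nodeCpuMhz && mem[i] >= nodeMemoryMb) {
                    cpu[i] -= nodeCpuMhz;
                    mem[i] -= nodeMemoryMb;
                    placed = true;
                }
            }
            if (!placed) {
                return false; // reject the scale request up front
            }
        }
        return true;
    }
}
```

If such a check returned `false`, the scale request could be rejected before a worker VM is created, which would leave the cluster in the Running state instead of Alert.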

CKS cluster remains in Alert state if the scaling fails due to capacity issue on the hypervisor host #12699