Is it possible to scale down a pod to 0 replicas when another pod is down?
I'm familiar with the basics of the Horizontal Auto-Scaling concept, but as I understand it, it scales pods up or down only when demand for resources (CPU, memory) changes.
My CI pipeline follows a green/blue pattern, so when the new version of the application is being deployed, the other one is scaled down to 0 replicas, leaving other pods belonging to the same environment up and wasting resources. Do you have any idea how to solve this using Kubernetes or Helm features?
Thanks
If you have a CI pipeline, you can simply run a kubectl command to scale down the deployment before the blue-green deploy; that way no resources are wasted.
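As a rough sketch of that CI step, assuming the pipeline can run Python and has kubectl already configured for the cluster: the deployment name `my-app-blue` and the namespace `staging` are hypothetical placeholders, and `kubectl scale` is the only real command involved.

```python
import subprocess
from typing import List


def scale_command(deployment: str, namespace: str, replicas: int) -> List[str]:
    """Build the kubectl invocation that sets a Deployment's replica count."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return [
        "kubectl", "scale", f"deployment/{deployment}",
        f"--replicas={replicas}",
        f"--namespace={namespace}",
    ]


def scale_deployment(deployment: str, namespace: str, replicas: int) -> None:
    """Run the command; raises CalledProcessError if kubectl exits non-zero."""
    subprocess.run(scale_command(deployment, namespace, replicas), check=True)


if __name__ == "__main__":
    # Hypothetical names: show the command that scales the idle colour to zero.
    print(" ".join(scale_command("my-app-blue", "staging", 0)))
```

Calling `scale_deployment` once per Deployment in the idle environment, before the new colour is rolled out, would cover the "other pods" mentioned in the question.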
That said, yes, you can also scale a deployment or application up or down based on custom metrics.
I would recommend checking out the cloud-native project KEDA: https://keda.sh/
Keda:
KEDA is a Kubernetes-based Event Driven Autoscaler. With KEDA, you can
drive the scaling of any container in Kubernetes based on the number
of events needing to be processed.
Example
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: {scaled-object-name}
spec:
  scaleTargetRef:
    deploymentName: {deployment-name} # must be in the same namespace as the ScaledObject
    containerName: {container-name}   # Optional. Default: deployment.spec.template.spec.containers[0]
  pollingInterval: 30   # Optional. Default: 30 seconds
  cooldownPeriod: 300   # Optional. Default: 300 seconds
  minReplicaCount: 0    # Optional. Default: 0
  maxReplicaCount: 100  # Optional. Default: 100
  triggers:
  # {list of triggers to activate the deployment}
ScaledObject spec reference: https://keda.sh/docs/1.4/concepts/scaling-deployments/#scaledobject-spec
Hello, I am running a .NET application in Azure Kubernetes Service as a 3-pod cluster (1 pod per node).
I am trying to understand how I can make my cluster elastic depending on load.
How can I configure the deployment.yaml so that after a certain % of CPU utilization and/or % of memory per pod it spawns another pod? The same goes for when load decreases: how do I shut down instances?
Is there any guide/tutorial to set this up based on percentage (ideally)?
The basic feature you need is called HorizontalPodAutoscaler, or HPA for short. There you configure a target for CPU or memory utilization, and if the observed value exceeds the target, the number of pod replicas is increased. E.g. from this walkthrough:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
This will scale out the php-apache deployment as soon as the average CPU utilization of its pods exceeds 50% of their CPU requests. Be aware that calculating the resource utilization and the resulting number of replicas is not as intuitive as it might seem. Also see docs (the whole page should be quite interesting too). You can also combine criteria for scale out.
There are also add-ons that help you scale based on other parameters, like the number of messages in a queue. Check out KEDA; it provides different scalers, like RabbitMQ, Kafka, AWS CloudWatch, Azure Monitor, etc.
And since you wrote
1 pod per node
you might be running a DaemonSet. In that case your only option to scale out would be to add nodes, since a DaemonSet always runs exactly one pod per node. If that's the case, you could think about using a Deployment combined with a podAntiAffinity rule instead, see docs. That way you can configure pods to preferably run on nodes where pods of the same deployment are not running yet, e.g.:
[...]
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: topology.kubernetes.io/zone
[...]
From docs:
The anti-affinity rule says that the scheduler should try to avoid scheduling the Pod onto a node that is in the same zone as one or more Pods with the label security=S2. More precisely, the scheduler should try to avoid placing the Pod on a node that has the topology.kubernetes.io/zone=R label if there are other nodes in the same zone currently running Pods with the Security=S2 Pod label.
That would make scale out more flexible than it is with a DaemonSet, yet you get a similar effect of pods being evenly distributed throughout the cluster. (To spread across individual nodes rather than zones, use kubernetes.io/hostname as the topologyKey and match on your own pods' labels.)
If you want/need to stick to a DaemonSet, you can check out the AKS Cluster Autoscaler, which adds nodes when pods cannot be scheduled for lack of resources and removes nodes that are underutilized.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: testingHPA
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: my_app
  minReplicas: 3
  maxReplicas: 5
  targetCPUUtilizationPercentage: 85
Above is the normal hpa.yaml structure. Is it possible to use Pod as the kind and autoscale it?
As others have already pointed out, it is not possible to set Pod as the kind of the target resource for an HPA.
The document describes HPA as:
The Horizontal Pod Autoscaler automatically scales the number of Pods
in a replication controller, deployment, replica set or stateful set
based on observed CPU utilization (or, with custom metrics support, on
some other application-provided metrics). Note that Horizontal Pod
Autoscaling does not apply to objects that can't be scaled, for
example, DaemonSets.
The document also describes how the algorithm works:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
Since the Pod resource does not have a replicas field in its spec, we can conclude that a Pod cannot be autoscaled using the HPA.
A single Pod is only ever one Pod. It does not have any mechanism for horizontal scaling because it is that mechanism for everything else.
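To make the formula quoted above concrete, here is a small Python sketch of the arithmetic. It is not the controller's code: the function names are made up for illustration, and the 0.1 tolerance is the documented default, which a given cluster may override.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Apply desiredReplicas = ceil(currentReplicas * current / target).

    No scaling happens when the ratio is within `tolerance` of 1.0
    (0.1 is the documented default; treat it as an assumption here).
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def clamp(replicas: int, min_replicas: int, max_replicas: int) -> int:
    """Keep the result inside the HPA's minReplicas/maxReplicas bounds."""
    return max(min_replicas, min(max_replicas, replicas))


if __name__ == "__main__":
    # 3 pods averaging 170% CPU against an 85% target: ratio 2.0, so 6 pods,
    # which maxReplicas: 5 from the question's HPA then caps at 5.
    raw = desired_replicas(3, 170.0, 85.0)
    print(raw, clamp(raw, 3, 5))
```

The calculation multiplies a replica count taken from the target's scale, which is exactly what a bare Pod does not have.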
I have a cluster with Cluster Autoscaler activated and HPA for one of my deployments.
This is the HPA definition:
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-resource-metrics-cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: ReplicationController
    name: hello-hpa-cpu
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
Now in a situation where my cluster is being used very lightly, that means this deployment will only have 1 available replica.
And since the cluster is not under high usage, it could be the case that the node containing that replica is scheduled for deletion (downscaling).
In that case, it would make my deployment have a downtime (when the cluster node is deleted, the only replica for the deployment is deleted as well, so it needs to be rescheduled in a new pod). I don't want that to happen (the downtime).
From this issue: https://github.com/kubernetes/kubernetes/issues/48307, it seems that Pod Disruption Budgets are not applicable to deployments with only 1 replica.
So the only solution to my problem would be to have minReplicas set to 2?
Or is there something else I could do to prevent this downtime and still keep minReplicas at 1?
Kubernetes has the notion of a disruption. The cluster autoscaler (or an administrator) taking a node offline is a "voluntary" disruption (as distinct from, say, the node losing power) and so you have some control over it. If you create a pod disruption budget:
apiVersion: policy/v1beta1  # use policy/v1 on Kubernetes v1.21+; v1beta1 was removed in v1.25
kind: PodDisruptionBudget
metadata:
  name: hello-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: hello
You have specified that there shouldn't be fewer than one pod, with a label app: hello, when the cluster tries to perform a voluntary disruption.
Doing this can prevent the cluster autoscaler from actually deleting the node. The examples in the PDB documentation generally have multiple replicas and can tolerate some of them being offline, so it's possible to delete 1 replica of 3 and recreate it on a different node. There is an extended example where there's not capacity in the cluster to start a rescheduled pod, and this blocks destroying a node. You might set the HPA to minReplicas: 3 to avoid this case, even if it means your system will be overprovisioned at the quietest times.
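A minimal sketch of the arithmetic behind this, assuming an integer minAvailable (percentages are not handled) and a hypothetical function name: the number of voluntary evictions a budget permits is the count of healthy pods minus minAvailable, never below zero.

```python
def allowed_disruptions_min_available(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions a PDB with an integer minAvailable would permit."""
    return max(0, healthy_pods - min_available)


if __name__ == "__main__":
    # HPA at minReplicas: 1 with minAvailable: 1: no eviction is allowed,
    # so draining the node is blocked instead of causing downtime.
    print(allowed_disruptions_min_available(1, 1))
    # With 3 healthy replicas, two of them may be evicted.
    print(allowed_disruptions_min_available(3, 1))
```

With a single replica the budget is already exhausted, which is why the node cannot be drained; with more replicas there is room to evict one and recreate it elsewhere.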
On a k8s cluster (GCP), my pods are rescheduled automatically during node auto-scaling. The main problem is that they perform computations and keep the results in memory. Because of the rescheduling, the pods lose all results and tasks.
I want to disable rescheduling for specified pods. I know a few possible solutions:
nodeSelector (not very flexible due to the dynamic nature of a cluster)
pod disruption budget (PDB)
I have tried a PDB with minAvailable = 1, but it didn't work. I found that you can also set maxUnavailable=0; will that be more effective? I didn't understand exactly how maxUnavailable behaves when it's set to 0. Could you explain it in more detail? Thank you!
Link for more details - https://github.com/dask/dask-kubernetes/issues/112
Setting maxUnavailable to 0 is the way to go; using node pools can also be a good workaround.
gcloud container node-pools create <nodepool> --node-taints=app=dask-scheduler:NoSchedule
gcloud container node-pools create <nodepool> --node-labels app=dask-scheduler
This will create the node pool with the label app=dask-scheduler. Then, in the pod spec, you can do this:
nodeSelector:
  app: dask-scheduler
And put the dask scheduler on a node-pool that doesn't autoscale.
There's an object called a PodDisruptionBudget (PDB); in its spec you can set maxUnavailable.
With maxUnavailable=1, if you had 100 pods defined, at most one of them is removed/drained/rescheduled at a time.
With maxUnavailable=0, if you have 2 pods, it will never remove your pods. "It" here being the component that performs the voluntary eviction, for example a node drain or the cluster autoscaler.
apiVersion: policy/v1beta1  # use policy/v1 on Kubernetes v1.21+; v1beta1 was removed in v1.25
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: zookeeper
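The two maxUnavailable cases described above can be checked with a short Python sketch. It assumes an integer maxUnavailable (no percentages), and the function name is hypothetical: the budget wants at least expected minus maxUnavailable pods healthy, and only the surplus may be evicted.

```python
def allowed_disruptions_max_unavailable(expected_pods: int,
                                        healthy_pods: int,
                                        max_unavailable: int) -> int:
    """Voluntary evictions a PDB with an integer maxUnavailable would permit."""
    desired_healthy = expected_pods - max_unavailable
    return max(0, healthy_pods - desired_healthy)


if __name__ == "__main__":
    # 100 pods, maxUnavailable: 1, all healthy: one eviction at a time.
    print(allowed_disruptions_max_unavailable(100, 100, 1))
    # While that one pod is still down, no further eviction is allowed.
    print(allowed_disruptions_max_unavailable(100, 99, 1))
    # 2 pods, maxUnavailable: 0: evictions are never allowed.
    print(allowed_disruptions_max_unavailable(2, 2, 0))
```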
Are you specifying resource requests and limits?
I have an application with some endpoints that are quite CPU intensive. Because of that I have configured a Horizontal Pod Autoscaler like this:
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: DeploymentConfig
    name: some-app
  targetCPUUtilizationPercentage: 30
The point is, suppose there's a request on a pod that keeps it working at 100% CPU for 5 minutes. It takes two minutes until OpenShift/Kubernetes schedules new pods.
Is there a way to speed up this process? It forces us to be almost unresponsive for two minutes.
The same thing happens to downscale, having to wait for two minutes until it destroys the unnecessary pods.
Ideally there should be some config option to set this up.
For OpenShift, please modify /etc/origin/master/master-config.yaml:
kubernetesMasterConfig:
  controllerArguments:
    horizontal-pod-autoscaler-downscale-delay: 2m0s
    horizontal-pod-autoscaler-upscale-delay: 2m0s
and restart the OpenShift master.
It's not a scalar; you should set it as a list, like this:
kubernetesMasterConfig:
  controllerArguments:
    horizontal-pod-autoscaler-downscale-delay:
    - 2m0s
    horizontal-pod-autoscaler-upscale-delay:
    - 2m0s
At least in OpenShift Origin v3.11