Kubernetes: scale down a particular pod

I have a Kubernetes deployment which can have multiple replica pods. I wish to horizontally scale the pods up and down based on some logic in my Python application (not via custom metrics in the HPA).
I have two ways to do this:
Using the Horizontal Pod Autoscaler and changing minReplicas, maxReplicas through my application using the Kubernetes API
Directly updating the /spec/replicas field of my deployment using the API
Both of the above work for scaling up and down.
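For reference, a minimal sketch of the second approach with the official kubernetes Python client (the deployment name, namespace and replica count below are placeholders):

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster
apps = client.AppsV1Api()

# Patch the scale subresource, which amounts to updating /spec/replicas.
apps.patch_namespaced_deployment_scale(
    name="my-deployment",        # hypothetical deployment name
    namespace="default",
    body={"spec": {"replicas": 3}},
)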
But when I scale down, I want to remove a particular pod, not just any pod.
If I update minReplicas/maxReplicas in the HPA, it deletes an arbitrary pod.
The same happens when I update the /spec/replicas field in the deployment.
How can I delete a particular pod while scaling down?

I am not aware of any way to ensure that a particular pod in a ReplicaSet will be deleted during a scale down. You could achieve this behavior with a StatefulSet which will always delete the last pod on scale down.
For example, if we had a StatefulSet foo that was scaled to 3 we would have pods:
foo-0
foo-1
foo-2
And if we scaled the StatefulSet to 2, the controller would delete foo-2. But note that there are other limitations to be aware of with StatefulSets.
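A minimal sketch of that scale-down with the Python client (the StatefulSet name and namespace are assumptions); the controller removes the pod with the highest ordinal first:

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Scale the StatefulSet "foo" from 3 to 2; the controller deletes foo-2.
apps.patch_namespaced_stateful_set_scale(
    name="foo",
    namespace="default",
    body={"spec": {"replicas": 2}},
)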

Related

What handles StatefulSet replication?

If a Deployment uses ReplicaSets to scale Pods up and down, and StatefulSets don't have ReplicaSets...
So, how does it manage to scale Pods up and down? I mean, what resource is responsible? What requests does a StatefulSet make in order to scale?
In short, the StatefulSet controller handles StatefulSet replicas.
A StatefulSet is a Kubernetes API object for managing stateful application workloads. StatefulSets handle the deployment and scaling of sets of Kubernetes pods, providing guarantees about their uniqueness and ordering.
Similar to deployments, StatefulSets manage pods with identical container specifications. They differ in terms of maintaining a persistent identity for each pod. While the pods are all created based on the same spec, they are not interchangeable, so each pod is given a persistent identifier that is maintained through rescheduling.
Benefits of a StatefulSet deployment include:
Unique identifiers: every pod in the StatefulSet is assigned a unique, stable network identity, consisting of a hostname based on the application name and the instance's ordinal. For example, a StatefulSet for a web application with three instances would have pods named web-0, web-1 and web-2.
Persistent storage: every pod has its own stable, persistent volume, either by default or as defined per storage class. When the pods in a cluster are scaled down or deleted, their associated volumes are not lost, and the data persists. Unneeded resources can be purged by scaling the StatefulSet down to 0 before deleting the unused pods.
Ordered deployment and scaling: the pods in a StatefulSet are created and deployed in order, according to their ordinals. Pods are also shut down in reverse order, ensuring that deployment and runtime are reliable and repeatable. The StatefulSet won't scale until every required pod is running, so if a pod fails, the controller recreates that pod before adding more instances to meet the scaling requirements.
Automated, ordered updates: a StatefulSet can handle rolling updates, shutting down each pod and rebuilding it according to the original order, until every pod has been replaced and the older versions cleaned up. The persistent volumes can be reused, so data is migrated to the new version automatically.
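To see that the StatefulSet controller manages these pods directly (there is no ReplicaSet in between, unlike with a Deployment), you can inspect the pods' owner references. A hedged sketch with the Python client, assuming a StatefulSet named web whose pods carry the label app=web:

from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Pods created by a StatefulSet are owned by the StatefulSet itself
# and carry stable, ordinal-based names.
for pod in core.list_namespaced_pod("default", label_selector="app=web").items:
    owner = (pod.metadata.owner_references or [None])[0]
    print(pod.metadata.name, "owned by", owner.kind if owner else "nothing")

# Expected output for a 3-replica StatefulSet named "web":
#   web-0 owned by StatefulSet
#   web-1 owned by StatefulSet
#   web-2 owned by StatefulSet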

Can a node autoscaler automatically start an extra pod when replica count is 1 & minAvailable is also 1?

Our autoscaling (horizontal and vertical) works pretty well, except that downscaling is somehow not working (yes, we checked the usual suspects like https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#i-have-a-couple-of-nodes-with-low-utilization-but-they-are-not-scaled-down-why).
Since we want to save resources and have pods which are not ultra-sensitive, we are setting the following:
Deployment
replicas: 1
PodDisruptionBudget
minAvailable: 1
HorizontalPodAutoscaler
minReplicas: 1
maxReplicas: 10
But it now seems that this is why the autoscaler is not scaling down the nodes (even though the node is only at about 30% CPU and memory utilization, and we have other nodes with more than enough memory and CPU to take these pods).
Is it possible in general that the auto scaler starts an extra pod on the free node and removes the old pod from the old node?
Is it possible in general that the auto scaler starts an extra pod on the free node and removes the old pod from the old node?
Yes, that should be possible in general, but in order for the cluster autoscaler to remove a node, it must be possible to move all pods running on the node somewhere else.
According to docs there are a few type of pods that are not movable:
Pods with restrictive PodDisruptionBudget.
Kube-system pods that:
are not run on the node by default
don't have a pod disruption budget set or their PDB is too restrictive (since CA 0.6).
Pods that are not backed by a controller object (so not created by deployment, replica set, job, stateful set etc).
Pods with local storage.
Pods that cannot be moved elsewhere due to various constraints (lack of resources, non-matching node selectors or affinity, matching anti-affinity, etc)
Pods that have the following annotation set:
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
You could check the cluster autoscaler logs, they may provide a hint to why no scale in happens:
kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler
Without more information about your setup it is hard to guess what is going wrong, but unless you are using local storage, node selectors, affinity/anti-affinity rules etc., pod disruption budgets are a likely candidate. Even if you are not using them explicitly, they can still prevent node scale-in if there are pods in the kube-system namespace that are missing pod disruption budgets (see this answer for an example of such a scenario in GKE).
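If you want to rule the PDBs out programmatically, here is a minimal sketch with the Python client (it assumes a reasonably recent cluster and client that serve policy/v1). Any budget with disruptionsAllowed = 0, which is what replicas: 1 plus minAvailable: 1 produces, blocks eviction of its pods and therefore node scale-in:

from kubernetes import client, config

config.load_kube_config()
policy = client.PolicyV1Api()

# List every PodDisruptionBudget and how many disruptions it currently allows.
for pdb in policy.list_pod_disruption_budget_for_all_namespaces().items:
    allowed = pdb.status.disruptions_allowed if pdb.status else None
    print(pdb.metadata.namespace, pdb.metadata.name, "disruptionsAllowed =", allowed)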

ReplicationController wait for pods to terminate

I'm currently learning Kubernetes and I'm facing a problem with trying to realize a concept using Kubernetes.
I'm looking for something that works like a ReplicationController where I can tell K8s to start 50 replicas. But when I reduce the number of replicas, I need K8s to wait for the pods to terminate by themselves.
I know that there are Jobs, but from what I've read they don't seem to be the fitting solution, since Jobs are kind of a one-time thing. I need to keep the desired number of pods running until I decrease that number.
Basically a behavior like this:
You can use the kind Deployment; in the background it uses ReplicaSets.
ReplicationController is the older approach, while ReplicaSet is the updated one, and it is what kind: Deployment uses in the background.
You can set the desired number of replicas in the YAML file.
When you scale the deployment, it will spin up that number of replicas; to scale down, you pass the new desired replica count.
For example :
kubectl scale deployment test-deployment --replicas=50
Now 50 replicas are running and you want to scale down:
kubectl scale deployment test-deployment --replicas=40
You can also check out the HPA
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
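For completeness, a hedged sketch of the same scale-down through the Python client, followed by waiting until the extra pods are actually gone (the deployment name, namespace and the app=test-deployment label are assumptions):

import time

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()
core = client.CoreV1Api()

# Equivalent of `kubectl scale deployment test-deployment --replicas=40`.
apps.patch_namespaced_deployment_scale(
    name="test-deployment",
    namespace="default",
    body={"spec": {"replicas": 40}},
)

# Poll until no more than 40 pods remain that are not already terminating.
while True:
    pods = core.list_namespaced_pod("default", label_selector="app=test-deployment").items
    alive = [p for p in pods if p.metadata.deletion_timestamp is None]
    if len(alive) <= 40:
        break
    time.sleep(5)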

What is the relationship between the HPA and ReplicaSet in Kubernetes?

I can't seem to find an answer to this, but what is the relationship between an HPA and a ReplicaSet? From what I know, we define a Deployment object which defines replicas, which creates the RS, and the RS is responsible for supervising our pods and scaling them up and down.
Where does the HPA fit into this picture? Does it wrap over the Deployment object? I'm a bit confused as you define the number of replicas in the manifest for the Deployment object.
Thank you!
When we create a Deployment, it creates a ReplicaSet and the number of pods we specify in replicas. The Deployment controls the RS, and the RS controls the pods. The HPA is another abstraction that gives instructions to the Deployment and, through the RS, makes sure the pods fulfil the desired scaling.
As per the k8s docs: The Horizontal Pod Autoscaler automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics). Note that Horizontal Pod Autoscaling does not apply to objects that can't be scaled, for example, DaemonSets.
A brief high-level overview: basically it's all about controllers. Every k8s object has a controller. When a Deployment object is created, the respective controller creates the RS and the associated pods; the RS controls the pods and the Deployment controls the RS. The HPA controller, in turn, talks to the Deployment whenever it sees that the current number of pods is higher or lower than expected.
Read more from k8s doc
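One way to see the relationship concretely: the HPA names its scale target, and the Deployment's replica count reflects what the HPA decided. A minimal sketch with the Python client (the HPA name and namespace are assumptions):

from kubernetes import client, config

config.load_kube_config()
autoscaling = client.AutoscalingV1Api()
apps = client.AppsV1Api()

hpa = autoscaling.read_namespaced_horizontal_pod_autoscaler("my-hpa", "default")
target = hpa.spec.scale_target_ref
print("HPA targets:", target.kind, target.name)
print("HPA desired replicas:", hpa.status.desired_replicas)

# The Deployment (and, through it, the ReplicaSet) is what the HPA actually resizes.
dep = apps.read_namespaced_deployment(target.name, "default")
print("Deployment spec.replicas:", dep.spec.replicas, "ready:", dep.status.ready_replicas)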

How many pods can be configured per deployment in kubernetes?

As per the Kubernetes documentation there is a 1:1 correspondence between a Deployment and its ReplicaSet. Similarly, depending on the replicas attribute, a ReplicaSet can manage n pods of the same nature. Is this a correct understanding?
Logically (assuming Deployment is a wrapper/controller) I feel a Deployment can have multiple ReplicaSets and each ReplicaSet can have multiple pods (of the same or different kind). If this statement is correct, can someone share an example K8s template?
1.) Yes, a Deployment is a ReplicaSet, managed at a higher level.
2.) No, a Deployment cannot have multiple ReplicaSets; a Deployment pretty much IS a ReplicaSet. Typically you never use a ReplicaSet directly; the Deployment is all you need. And no, you can't have different Pod templates in one Deployment or ReplicaSet. The point of replication is to create copies of the same thing.
As to how many pods can be run per Deployment, the limits aren't really per Deployment, unless specified. Typically you'd either set the wanted number of replicas in the Deployment or you use the Horizontal Pod Autoscaler with a minimum and a maximum number of Pods. And unless Node limits are smaller, the following limits apply:
No more than 110 pods per node
No more than 150,000 total pods
https://kubernetes.io/docs/setup/best-practices/cluster-large/
As per the Kubernetes documentation there is 1:1 correspondence between Deployment and ReplicaSets. Similarly depending on the replicas attribute , a ReplicaSet can manage n number of pods of same nature. Is this a correct understanding ?
Yes. It will create a number of pods equal to the value of the replicas field.
Deployment manages a replica set, you don't/shouldn't interact with the replica set directly.
Logically (assuming Deployment is a wrapper/Controller) I feel Deployment can have multiple replicaSets and each replicaSet can have multiple Pods (same or different kind). If this statement is correct, can some one share an example K8S template ?
When you do a rolling deployment, it creates a new ReplicaSet with the new pods (updated containers) and scales down the pods running in the older ReplicaSet.
I guess it does not support running two different ReplicaSets (outside of deployment updates) with different pods/containers.
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment
After the deployment has been updated:
Run:
kubectl describe deployments
Output:
.
.
.
OldReplicaSets: <none>
NewReplicaSet: nginx-deployment-1564180365 (3/3 replicas created)
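To see the same thing from the API side, here is a hedged sketch with the Python client that lists the ReplicaSets owned by a Deployment after a rolling update (the deployment name nginx-deployment and the namespace are assumptions):

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

dep = apps.read_namespaced_deployment("nginx-deployment", "default")

# Every ReplicaSet created for this Deployment carries an ownerReference back to it;
# after a rolling update the old RS is scaled to 0 and the new one to the desired count.
for rs in apps.list_namespaced_replica_set("default").items:
    owners = rs.metadata.owner_references or []
    if any(o.kind == "Deployment" and o.name == dep.metadata.name for o in owners):
        print(rs.metadata.name, "replicas:", rs.spec.replicas, "ready:", rs.status.ready_replicas)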