Can a deployment kind with a replica count = 1 ever result in two Pods in the 'Running' phase? - kubernetes

From what I understand, with the above configuration, it is possible to have 2 pods that exist in the cluster associated with the deployment. However, the old Pod is guaranteed to be in the 'Terminated' state. An example scenario is updating the image version associated with the deployment.
There should not be any scenario where there are 2 Pods that are associated with the deployment and both are in the 'Running' phase. Is this correct?
In the scenarios I tried, for example Pod eviction or updating the Pod spec, the existing Pod enters the 'Terminating' state and a new Pod is deployed.
This is what I expected. Just wanted to make sure that all possible scenarios around updating Pod spec or Pod eviction cannot end up with two Pods in the 'Running' state as it would violate the replica count = 1 config.

It depends on your update strategy. Often you want the new pod running and healthy before you shut down the old pod; otherwise you have downtime, which may not be acceptable per your business requirements. By default, a Deployment does rolling updates.
The defaults look like the below, so if you don't specify anything, that's what will be used.
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
So usually, you would have a moment where both pods are running. Kubernetes starts terminating the old pod as soon as the new pod becomes ready, so the window in which both are ready is short and hard to observe, but the old pod can stay in the Running phase until it has finished shutting down.
You can read about it in the docs: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment
Deployment ensures that only a certain number of Pods are down while they are being updated. By default, it ensures that at least 75% of the desired number of Pods are up (25% max unavailable).
Deployment also ensures that only a certain number of Pods are created above the desired number of Pods. By default, it ensures that at most 125% of the desired number of Pods are up (25% max surge).
For example, if you look at the above Deployment closely, you will see that it first creates a new Pod, then deletes an old Pod, and creates another new one. It does not kill old Pods until a sufficient number of new Pods have come up, and does not create new Pods until a sufficient number of old Pods have been killed. It makes sure that at least 3 Pods are available and that at max 4 Pods in total are available. In case of a Deployment with 4 replicas, the number of Pods would be between 3 and 5.
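To make the percentages concrete for the single-replica case, here is a minimal sketch of the arithmetic. It assumes the documented rounding rules (maxSurge rounds up, maxUnavailable rounds down); the function names `resolve` and `rollout_bounds` are made up for illustration and are not part of any Kubernetes API.

```python
def resolve(value, replicas, round_up):
    """Resolve an int-or-percent setting (e.g. 1 or "25%") to a Pod count."""
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    if round_up:
        return -(-percent * replicas // 100)  # integer ceiling
    return percent * replicas // 100          # integer floor


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return (min_available, max_total) Pods during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # Assumption: if both resolve to zero the rollout could never
        # progress, so one unavailable Pod is allowed as a fallback.
        unavailable = 1
    return replicas - unavailable, replicas + surge


print(rollout_bounds(1))  # (1, 2)
print(rollout_bounds(3))  # (3, 4)
print(rollout_bounds(4))  # (3, 5)
```

Under these assumptions, a Deployment with replicas: 1 and the default strategy keeps 1 Pod available and may briefly run 2 Pods during a rollout, which is the situation the question asks about.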
This is also explained here: https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/
Users expect applications to be available all the time and developers are expected to deploy new versions of them several times a day. In Kubernetes this is done with rolling updates. Rolling updates allow Deployments' update to take place with zero downtime by incrementally updating Pods instances with new ones. The new Pods will be scheduled on Nodes with available resources.
To get the behaviour you describe, set spec.strategy.type to Recreate.
All existing Pods are killed before new ones are created when .spec.strategy.type==Recreate.

Related

Kubernetes AKS PodDisruptionBudget, HorizontalPodAutoscaler & RollingUpdate Interaction - Without Load

My question is similar to this one, about PDB, HPA and drain.
Bear in mind that most of the time my pod is not under load and so has only 1 replica, but at times it scales up to ~7 or 8 replicas.
I have the following Kubernetes objects in an AKS cluster:
Deployment with rollingUpdate.maxUnavailable set to 0, rollingUpdate.maxSurge set to 1
PodDisruptionBudget with minAvailable set to 1.
HorizontalPodAutoscaler setup to allow auto scaling, with bounds 1-10 lower-upper
Cluster auto-scaling is enabled.
When I trigger an update of the cluster (not an update of the service pod),
AKS first triggers a drain on the node that hosts my pod.
The update fails because, at this time, the number of replicas is 1 and the PodDisruptionBudget is 1, so the eviction of the pod fails, the drain fails, and AKS eventually fails the update.
Other pods on the same node are handled fine: new nodes are created, new pods are scheduled there, and then the old pods and the old node are terminated. Those pods don't specify a PodDisruptionBudget at all.
My question is: is this just an AKS-specific thing in how they implemented the cluster upgrade, or do these constraints work the same way in other Kubernetes implementations?
I would expect that the drain should allow my pod to be evicted, but scheduled onto another node, creating the new pod first, as per the rollingUpdate strategy stipulated.
The HorizontalPodAutoscaler should allow a second replica to be created by rollingUpdate.maxSurge=1.
I am assuming that the rollingUpdate is never called when I upgrade the cluster, though I am not sure.
The quickest solution I can think of is HorizontalPodAutoscaler = 2-10, so that there's never only one replica left, but this is wasteful most of the time.
What is my best option?

pod - How to kill or stop only one pod from n replicas of a deployment

I have a testing scenario to check whether API requests are handled by another pod if one goes down. I know this is the default behaviour, but I want to simulate the following scenario.
Pod replicas - 2 (pod A and B)
During my API requests, I want to kill/stop only pod A.
During downtime of A, requests should be handled by B.
I am aware that we can restart the deployment and also scale replicas to 0 and again to 2, but this won't work for me.
Is there any way to kill/stop/crash only pod A?
Any help will be appreciated.
If you want to simulate what happens if one of the pods just gets lost, you can scale down the deployment
kubectl scale deployment the-deployment-name --replicas=1
and Kubernetes will terminate all but one of the pods; you should almost immediately see all of the traffic going to the surviving pod.
But if instead you want to simulate what happens if one of the pods crashes and restarts, you can delete the pod
# kubectl scale deployment the-deployment-name --replicas=2
kubectl get pods
kubectl delete pod the-deployment-name-12345-f7h9j
Once the pod starts getting deleted, the Kubernetes Service should route all of the traffic to the surviving pod(s) (those with Running status). However, the pod is managed by a ReplicaSet that wants there to be 2 replicas, so if one of the pods is deleted, the ReplicaSet will immediately create a new one. This is similar to what would happen if the pod crashes and restarts; the difference is that after a crash you'd get the same pod on the same node, whereas the replacement for a deleted pod might come back in a different place.
As you mentioned, you can manually kill or restart the pod; that is the only way to test this case. Alternatively, you can try crashing that single pod, but the end result is the same: the pod restarts automatically.
Or you could increase the graceful shutdown period (terminationGracePeriodSeconds in the pod spec) for the deployment, so that the pod stays in the Terminating state for a good amount of time while you perform the test.
In Kubernetes, where pods are controlled by a ReplicaSet, a pod you kill will be recreated. So the only way to do this is to scale down the number of replicas.
Let's say your deployment had 4 replicas. You can scale down to 3 by running the command below
kubectl scale deployment <deployment-name> --replicas=3
My example is as shown below
kubectl scale deployment hello-world --replicas=3
deployment.apps/hello-world scaled

ReplicationController wait for pods to terminate

I'm currently learning Kubernetes and I'm facing a problem with trying to realize a concept using Kubernetes.
I'm looking for something that works like a ReplicationController where I can tell K8s to start 50 replicas. But when I reduce the number of replicas, I need K8s to wait for the pods to terminate by themselves.
I know that there are Jobs, but from what I've read they don't seem to be a fitting solution, since Jobs are kind of a one-time thing. I need to keep the desired number of pods until I decrease it.
Basically a behavior like this:
You can use the Deployment kind; in the background it uses ReplicaSets.
ReplicationController is the older approach, while ReplicaSet is its updated replacement.
You can set the desired number of replicas in the YAML file.
When you scale the deployment up, it spins up that number of replicas; when you want to terminate some, you pass the new desired replica count again.
For example :
kubectl scale deployment test-deployment --replicas=50
Now 50 replicas are running and you want to scale down
kubectl scale deployment test-deployment --replicas=40
You can also check out the HPA
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

POD affinity rule to schedule pods across all nodes

We are running 6 nodes in a K8s cluster. Of the 6, 2 run RabbitMQ, Redis & Prometheus; we used a node selector and cordoned those nodes so that no other pods are scheduled on them.
The application pods run on the remaining 4 nodes; we have around 18-19 microservices.
For GKE there is an open issue in the official K8s repo regarding automatic scale-down (https://github.com/kubernetes/kubernetes/issues/69696#issuecomment-651741837). People there suggest setting a PDB, and we tested that on Dev/Stag.
What we are looking for now is to pin pods to a particular node pool that does not scale, as we run single replicas of some services.
For now, we are thinking of applying affinity to the services that run with a single replica and do not need to scale.
For scalable services we won't specify any rule, so by default the K8s scheduler will spread the pods across different nodes; this way, if any node scales down, we don't face downtime for a single-replica service.
Affinity example :
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      # each preferred term needs a weight (1-100); 100 is an arbitrary example value
      - weight: 100
        preference:
          matchExpressions:
            - key: do-not-scale
              operator: In
              values:
                - 'true'
We are planning to use affinity type preferredDuringSchedulingIgnoredDuringExecution instead of requiredDuringSchedulingIgnoredDuringExecution.
Note: K8s does not create the new replica on another node first during a node drain (scale-down of any node), as we are running single replicas with a rolling update strategy and minAvailable: 25%.
Why: If PodDisruptionBudget is not specified and we have a deployment with one replica, the pod will be terminated and then a new pod will be scheduled on a new node.
To make sure the application will be available during the node draining process we have to specify PodDisruptionBudget and create more replicas. If we have 1 pod with minAvailable: 30% it will refuse to drain node (scaledown).
Please point out any mistakes you see and suggest a better option.
First of all, defining a PodDisruptionBudget does not make much sense when you have only one replica. minAvailable expressed as a percentage is rounded up to an integer, as it represents the minimum number of Pods that need to be available at all times.
Keep in mind that you have no guarantee for any High Availability when launching only one-replica Deployments.
Why: If PodDisruptionBudget is not specified and we have a deployment
with one replica, the pod will be terminated and then a new pod will
be scheduled on a new node.
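If you didn't explicitly define the value of maxUnavailable in your Deployment's spec, it defaults to 25%. For a Deployment that percentage is rounded down, so with 1 replica it resolves to 0 (maxSurge, by contrast, is rounded up to 1). Note, however, that these settings only govern rollouts triggered by a change to the Pod template. An eviction during a node drain is not a rolling update, so the evicted Pod is deleted first and the ReplicaSet creates its replacement afterwards.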
If we have 1 pod with minAvailable: 30% it will refuse to drain node
(scaledown).
Single replica with minAvailable: 30% is rounded up to 1 anyway. 1/1 should be still up and running so Pod cannot be evicted and node cannot be drained in this case.
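The rounding can be checked with a small calculation. This is only a sketch of the arithmetic, assuming that a percentage minAvailable is rounded up and that an eviction is allowed only while the number of healthy Pods exceeds the required minimum; the function names are hypothetical.

```python
def required_available(min_available, expected_pods):
    """Resolve minAvailable (an int, or a string like "30%") to a Pod count."""
    if isinstance(min_available, int):
        return min_available
    percent = int(min_available.rstrip("%"))
    return -(-percent * expected_pods // 100)  # percentages round up


def disruptions_allowed(healthy_pods, expected_pods, min_available):
    """How many Pods could be evicted right now without violating the budget."""
    required = required_available(min_available, expected_pods)
    return max(0, healthy_pods - required)


print(disruptions_allowed(1, 1, "30%"))  # 0: the only Pod cannot be evicted
print(disruptions_allowed(1, 1, 1))      # 0: same result with an absolute value
print(disruptions_allowed(2, 2, "30%"))  # 1: a second replica unblocks the drain
```

With one replica the budget leaves no room for a voluntary disruption, which is why the drain is refused until a second healthy replica exists.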
You can try the following solution; however, I'm not 100% sure it will work when your Pod is re-scheduled to another node due to its eviction from the one it is currently running on.
But if you re-create your Pod, e.g. because you update its image to a new version, you can guarantee that at least one replica stays up and running (the old Pod won't be deleted until the new one enters the Ready state) by setting maxUnavailable: 0. As per the docs, the default is 25%, which is rounded down, so with a single replica it already resolves to 0; setting it explicitly keeps the guarantee if the replica count changes later. At the same time, maxSurge: 2 allows up to 2 Pods above the desired count to exist temporarily during the update.
Your Deployment definition may begin as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0  # 👈
      maxSurge: 2
  selector:
    ...
Compare it with this answer, provided by mdaniel, where I originally found it.

Can a Pod be managed by two different ReplicaSets?

3 pods were running under the ReplicationController 'rc1'. Then I deleted only rc1 (not the pods) and created a new ReplicaSet 'rs1' with the same label selector as rc1. As expected, rs1 matched the existing pods created by rc1.
After some time, I created the ReplicationController rc2 with the same manifest file as rc1. Now rc2 spun up new pods instead of adopting the existing pods with the same labels.
So I was wondering if it is possible for a pod to be scoped under two different ReplicaSets/ReplicationControllers?
A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
So I was wondering if it is possible for a pod to be scoped under two different ReplicaSets/ReplicationControllers?
The link a ReplicaSet has to its Pods is via the Pods’ metadata.ownerReferences field, which specifies what resource the current object is owned by. All Pods acquired by a ReplicaSet have their owning ReplicaSet’s identifying information within their ownerReferences field. It’s through this link that the ReplicaSet knows of the state of the Pods it is maintaining and plans accordingly.
A ReplicaSet identifies new Pods to acquire by using its selector. If there is a Pod that has no OwnerReference or the OwnerReference is not a Controller and it matches a ReplicaSet’s selector, it will be immediately acquired by said ReplicaSet. That is explained very well (with examples) in the official documentation.
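The acquisition rule can be modelled in a few lines. This is a toy sketch, not the controller's real code: Pods are plain dicts with a flattened layout (real Pods keep ownerReferences under metadata), and the helper names and the `app: web` selector are invented for illustration.

```python
def controller_of(pod):
    """Return the name of the owner flagged as controller, or None."""
    for ref in pod.get("ownerReferences", []):
        if ref.get("controller"):
            return ref["name"]
    return None


def selector_matches(selector, labels):
    """True when every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def can_adopt(pod, selector):
    """A Pod is adoptable only if it matches and has no controller yet."""
    return controller_of(pod) is None and selector_matches(selector, pod["labels"])


def adopt(pod, owner_name):
    """Record owner_name as the Pod's single controlling owner."""
    pod.setdefault("ownerReferences", []).append(
        {"name": owner_name, "controller": True}
    )


selector = {"app": "web"}         # hypothetical selector shared by rs1 and rc2
pod = {"labels": {"app": "web"}}  # orphan left behind after rc1 was deleted

print(can_adopt(pod, selector))  # True: rs1 may acquire the orphan
adopt(pod, "rs1")
print(can_adopt(pod, selector))  # False: rc2 has to create its own Pods
print(controller_of(pod))        # rs1
```

In this model a Pod has at most one controlling owner at a time, which matches the behaviour in the question: once rs1 has acquired the Pods, a later controller with the same selector creates new ones instead of sharing them.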
After some time, I created the ReplicationController rc2 with the same manifest file as rc1. Now rc2 spun up new pods instead of adopting the existing pods with the same labels.
Please note that a Deployment that configures a ReplicaSet is now the recommended way to set up replication.
A ReplicationController ensures that a specified number of pod replicas are running at any one time. In other words, a ReplicationController makes sure that a pod or a homogeneous set of pods is always up and available.
If there are too many pods, the ReplicationController terminates the extra pods. If there are too few, the ReplicationController starts more pods. Unlike manually created pods, the pods maintained by a ReplicationController are automatically replaced if they fail, are deleted, or are terminated.
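The "too many / too few" rule described above amounts to a simple reconcile step. The sketch below is a simplified model under stated assumptions: `reconcile` is a hypothetical name, and which surplus Pods get deleted is chosen arbitrarily here (the first ones in the list), whereas the real controller applies its own ranking.

```python
def reconcile(desired, running):
    """Return (number_to_create, pods_to_delete) for one reconcile pass."""
    diff = desired - len(running)
    if diff > 0:
        return diff, []            # too few Pods: start more
    return 0, running[:-diff]      # too many (or exact): delete the surplus


print(reconcile(2, ["pod-a"]))                    # (1, [])
print(reconcile(2, ["pod-a", "pod-b"]))           # (0, [])
print(reconcile(2, ["pod-a", "pod-b", "pod-c"]))  # (0, ['pod-a'])
```

This is also why deleting one Pod of a two-replica workload only removes it briefly: the next pass sees one running Pod and asks for a replacement.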
Hope that helps.