k8s ReplicaSet status is confusing

I have two services:
bayonetta: a backend ClusterIP Service, replicas=2
hide: a frontend NodePort Service, replicas=1
I ran kubectl get all. Two of the lines in the ReplicaSet section show everything as 0. Why do we have those two lines when nothing is available?
NAME                                              DESIRED   CURRENT   READY   AGE
replicaset.apps/bayonetta-deployment-5b75868d89   2         2         2       3h36m
replicaset.apps/bayonetta-deployment-5c65f74c8b   0         0         0       176m
replicaset.apps/hide-deployment-575b6bc68d        0         0         0       3h12m
replicaset.apps/hide-deployment-66d955986b        1         1         1       155m

You probably updated your Deployments, which results in scaling up new ReplicaSets and scaling down the existing ones. See the Kubernetes docs, which give this example:
Run kubectl get rs to see that the Deployment updated the Pods by creating a new ReplicaSet and scaling it up to 3 replicas, as well as scaling down the old ReplicaSet to 0 replicas.
kubectl get rs
The output is similar to this:
NAME                          DESIRED   CURRENT   READY   AGE
nginx-deployment-1564180365   3         3         3       6s
nginx-deployment-2035384211   0         0         0       36s

Kubernetes maintains multiple versions of ReplicaSets; this enables rolling a Deployment back because of a bug or some other reason. More about it here (1). Kubernetes keeps revisionHistoryLimit old ReplicaSets, which defaults to 10 (2).
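A minimal sketch of where that limit lives, using the bayonetta Deployment from the question (revisionHistoryLimit is a standard apps/v1 field; the image is a placeholder):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bayonetta-deployment
spec:
  replicas: 2
  revisionHistoryLimit: 3        # keep only the 3 most recent old ReplicaSets (default: 10)
  selector:
    matchLabels:
      app: bayonetta
  template:
    metadata:
      labels:
        app: bayonetta
    spec:
      containers:
      - name: bayonetta
        image: my-registry/bayonetta:1.0   # placeholder image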

Related

Deploy a feature with zero downtime

How do you deploy a feature with zero downtime in Kubernetes?
kubectl run nginx --image=nginx # created a Deployment in older kubectl; on current versions use: kubectl create deployment nginx --image=nginx
$ kubectl get deploy
NAME    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx   1         1         1            0           7s
Now let’s assume we are going to update the nginx image
kubectl set image deployment nginx nginx=nginx:1.15 # updates the image
Now when we check the ReplicaSets:
kubectl get replicasets # get ReplicaSets
NAME               DESIRED   CURRENT   READY   AGE
nginx-65899c769f   0         0         0       7m
nginx-6c9655f5bb   1         1         1       13s
From the above we can see that a new ReplicaSet was created and scaled up, while the old one was scaled down to 0.
kubectl rollout status deployment nginx    # check the status of a deployment rollout
kubectl rollout history deployment nginx   # check the revisions in a deployment
$ kubectl rollout history deployment nginx
deployment.extensions/nginx
REVISION   CHANGE-CAUSE
1
2
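Since those revisions are retained, rolling back is a single command; --to-revision selects a specific entry from the history above:
kubectl rollout undo deployment nginx                   # roll back to the previous revision
kubectl rollout undo deployment nginx --to-revision=1   # or roll back to a specific revision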
You should use the RollingUpdate strategy with maxSurge and maxUnavailable defined, as sketched below.
For more information, go here: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
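A minimal sketch of that strategy block as it would sit in the Deployment spec (the values are illustrative, not taken from the question):

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # at most one extra pod above the desired count during the update
    maxUnavailable: 0  # never drop below the desired count, so traffic is always served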

Start one pod at a time when replica is greater than one

Is there a way to ensure that pods are scaled one at a time when setting replicas greater than one?
Example: replicas set to 3
Pod 1 - Initializing , pod 2 - Waiting, pod 3 - Waiting
Pod 1 - Running , pod 2 - Initializing, pod 3 - Waiting
Pod 1 - Running , pod 2 - Running, pod 3 - Initializing
Pod 1 - Running , pod 2 - Running, pod 3 - Running
You can accomplish this behavior using StatefulSets. As the Kubernetes docs put it:
For a StatefulSet with N replicas, when Pods are being deployed, they are created sequentially, in order from {0..N-1}.
When Pods are being deleted, they are terminated in reverse order, from {N-1..0}.
Before a scaling operation is applied to a Pod, all of its predecessors must be Running and Ready.
Before a Pod is terminated, all of its successors must be completely shutdown.
So, as you can see here, a new pod is not booted up until the previous one is Running and Ready.
Note: this behavior is guaranteed by Kubernetes when the OrderedReady pod management policy is used (which is the default).
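A minimal sketch of such a StatefulSet (the name, image, and probe endpoint are placeholders; the fields are standard apps/v1):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app
spec:
  serviceName: my-app                 # headless Service that StatefulSets require
  replicas: 3
  podManagementPolicy: OrderedReady   # the default: pods start one at a time, in order
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-registry/my-app:1.0   # placeholder image
        readinessProbe:                 # ordering waits on Ready, so a probe matters
          httpGet:
            path: /healthz              # placeholder endpoint
            port: 8080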

HPA not scaling down

I hope you can shed some light on this.
I am facing the same issue as described here: Kubernetes deployment not scaling down even though usage is below threshold
My configuration is almost identical.
I have checked the HPA algorithm, but I cannot find an explanation for the fact that I have only one replica of my-app3.
Any hints?
kubectl get hpa -A
NAMESPACE   NAME      REFERENCE            TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
my-ns1      my-app1   Deployment/my-app1   49%/75%, 2%/75%    1         10        2          20h
my-ns2      my-app2   Deployment/my-app2   50%/75%, 10%/75%   1         10        2          22h
my-ns2      my-app3   Deployment/my-app3   47%/75%, 10%/75%   1         10        1          22h
kubectl top po -A
NAMESPACE   NAME                        CPU(cores)   MEMORY(bytes)
my-ns1      pod-app1-8d694bc8f-mkbrh    1m           76Mi
my-ns1      pod-app1-8d694bc8f-qmlnw    1m           72Mi
my-ns2      pod-app2-59d895d96d-86fgm   1m           77Mi
my-ns2      pod-app2-59d895d96d-zr67g   1m           73Mi
my-ns2      pod-app3-6f8cbb68bf-vdhsd   1m           47Mi
Posting this answer as it could be beneficial to community members wondering why exactly the Horizontal Pod Autoscaler decided not to change the number of replicas in this particular setup.
The formula for the number of replicas a workload will have is:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
Looking again at the HPA output:
NAMESPACE   NAME      REFERENCE            TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
my-ns1      my-app1   Deployment/my-app1   49%/75%, 2%/75%    1         10        2          20h
my-ns2      my-app2   Deployment/my-app2   50%/75%, 10%/75%   1         10        2          22h
my-ns2      my-app3   Deployment/my-app3   47%/75%, 10%/75%   1         10        1          22h
The HPA decides on the number of replicas based on their current number.
A side note: in a setup that uses multiple metrics (for example CPU and RAM), it will act on the metric that yields the higher desired replica count.
Also please consider that downscaling has a cooldown (a stabilization window, 5 minutes by default), as sketched below.
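As a sketch, an HPA like my-app1's could be declared in autoscaling/v2 syntax roughly like this (the two Resource metrics mirror the two targets in the output above; behavior.scaleDown sets the cooldown explicitly):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app1
  namespace: my-ns1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app1
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # the downscale cooldown (default is 5 minutes)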
Calculation for each of the Deployments
ceil[] rounds a number up:
ceil(4.55) = 5
ceil(4.01) = 5
app1:
Replicas = ceil[2 * (49 / 75)]
Replicas = ceil[2 * 0.6533..]
Replicas = ceil[1.3066..]
Replicas = 2
This example shows that there will be no change to the number of replicas.
The number of replicas would go:
Up when the currentMetricValue (49) exceeded the desiredMetricValue (75)
Down when the currentMetricValue (49) dropped below half of the desiredMetricValue (75), since ceil[2 * (37 / 75)] = ceil[0.986..] = 1
app2 is in the same situation as app1, so it can be skipped.
app3:
Replicas = ceil[1 * (47 / 75)]
Replicas = ceil[1 * 0.6266..]
Replicas = ceil[0.6266..]
Replicas = 1
This example also shows that there will be no change to the number of replicas.
The number of replicas would only go:
Up when the currentMetricValue (47) exceeded the desiredMetricValue (75); with a single replica and MINPODS of 1, it cannot scale down any further.
Additional resources:
Kubernetes.io: Docs: Tasks: Run application: Horizontal pod autoscale
Indeed, from my research it seems that the HPA algorithm works this way:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details
I don't know why my-app3 was assigned one replica while the other two apps got two each, but according to the algorithm there is no need to scale out at this time.

Kubernetes HPA patch command not working

I have a Kubernetes cluster hosted in Google Cloud.
I deployed my deployment and added an HPA rule for scaling:
kubectl autoscale deployment MY_DEP --max 10 --min 6 --cpu-percent 60
After waiting a minute, I ran the kubectl get hpa command to verify my scale rule. As expected, I had 6 pods running (according to the min parameter).
$ kubectl get hpa
NAME     REFERENCE           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
MY_DEP   Deployment/MY_DEP   <unknown>/60%   6         10        6          1m
Now, I want to change the min parameter:
kubectl patch hpa MY_DEP -p '{"spec":{"minReplicas": 1}}'
After waiting 30 minutes, I run the command:
$ kubectl get hpa
NAME     REFERENCE           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
MY_DEP   Deployment/MY_DEP   <unknown>/60%   1         10        6          30m
expected replicas: 1, actual replicas: 6
More information:
You can assume that the system is not computing anything (0% CPU utilization).
I waited for more than an hour. Nothing changed.
The same behavior is seen when I delete the scaling rule and deploy it again; the replicas parameter does not change.
Question:
If I changed the MINPODS parameter to "1", why do I still have 6 pods? How do I make Kubernetes actually change the minimum number of pods in my deployment?
If I changed the MINPODS parameter to "1", why do I still have 6 pods?
I believe the answer is because of the <unknown>/60% present in the output. The fine manual states:
Please note that if some of the pod's containers do not have the relevant resource request set, CPU utilization for the pod will not be defined and the autoscaler will not take any action for that metric
and one can see an example of 0% / 50% in the walkthrough page. Thus, I would believe that since Kubernetes cannot prove what percentage of CPU is being consumed (neither above nor below the target), it takes no action for fear of making whatever the situation is worse.
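A minimal sketch of the fix: give every container in the Deployment a CPU request so the HPA can compute a utilization percentage (the container name and image here are placeholders):

containers:
- name: my-dep                    # placeholder container name
  image: my-registry/my-dep:1.0   # placeholder image
  resources:
    requests:
      cpu: 100m                   # HPA computes utilization as actual usage / this request
      memory: 128Mi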
As for why there is <unknown>, I would hazard a guess that it's the dreaded heapster-to-metrics-server cutover obfuscating that information from the Kubernetes API. Regrettably, I don't have first-hand experience testing that theory, so I can't offer concrete steps beyond "see if your cluster is collecting metrics in a place that Kubernetes can see them."

Kubernetes monitoring service heapster keeps restarting

I am running a Kubernetes cluster using Azure's container engine. I have an issue with one of the Kubernetes services, the one that does resource monitoring: heapster. The pod is relaunched every minute or so. I have tried removing the heapster deployment, ReplicaSet, and pods, and recreating the deployment. It goes back to the same behaviour instantly.
When I look at the resources with the heapster label it looks a little bit weird:
$ kubectl get deploy,rs,po -l k8s-app=heapster --namespace=kube-system
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/heapster   1         1         1            1           17h
NAME                     DESIRED   CURRENT   READY   AGE
rs/heapster-2708163903   1         1         1       17h
rs/heapster-867061013    0         0         0       17h
NAME                           READY   STATUS    RESTARTS   AGE
po/heapster-2708163903-vvs1d   2/2     Running   0          0s
For some reason there are two ReplicaSets. The one called rs/heapster-867061013 keeps reappearing even when I delete all of the resources and redeploy them. The above also shows that the pod has just started, and this is the issue: it keeps getting created, runs for some seconds, and then a new one is created. I am new to running Kubernetes, so I am unsure which log files are relevant to this issue.
Logs from heapster container
heapster.go:72] /heapster source=kubernetes.summary_api:""
heapster.go:73] Heapster version v1.3.0
configs.go:61] Using Kubernetes client with master "https://10.0.0.1:443" and version v1
configs.go:62] Using kubelet port 10255
heapster.go:196] Starting with Metric Sink
heapster.go:106] Starting heapster on port 8082
Logs from heapster-nanny container
pod_nanny.go:56] Invoked by [/pod_nanny --cpu=80m --extra-cpu=0.5m --memory=140Mi --extra-memory=4Mi --threshold=5 --deployment=heapster --container=heapster --poll-period=300000 --estimator=exponential]
pod_nanny.go:68] Watching namespace: kube-system, pod: heapster-2708163903-mqlsq, container: heapster.
pod_nanny.go:69] cpu: 80m, extra_cpu: 0.5m, memory: 140Mi, extra_memory: 4Mi, storage: MISSING, extra_storage: 0Gi
pod_nanny.go:110] Resources: [{Base:{i:{value:80 scale:-3} d:{Dec:<nil>} s:80m Format:DecimalSI} ExtraPerNode:{i:{value:5 scale:-4} d:{Dec:<nil>} s: Format:DecimalSI} Name:cpu} {Base:{i:{value:146800640 scale:0} d:{Dec:<nil>} s:140Mi Format:BinarySI} ExtraPerNode:{i:{value:4194304 scale:0} d:{Dec:<nil>} s:4Mi Format:BinarySI} Name:memory}]
It is completely normal and important that the Deployment Controller keeps old ReplicaSet resources in order to do fast rollbacks.
A Deployment resource manages ReplicaSet resources. Your heapster Deployment is configured to run 1 pod; this means it will always try to create one ReplicaSet with 1 pod. In case you make an update to the Deployment (say, a new heapster version), the Deployment resource creates a new ReplicaSet, which schedules a pod with the new version. At the same time, the old ReplicaSet resource sets its desired pods to 0, but the resource itself is still kept for easy rollbacks. As you can see, the old ReplicaSet rs/heapster-867061013 has 0 pods running. In case you make a rollback, the Deployment deploy/heapster will increase the number of pods in rs/heapster-867061013 to 1 and decrease the number in rs/heapster-2708163903 back to 0. You should also check out the documentation about the Deployment Controller (in case you haven't done it yet).
Still, it seems odd to me that your newly created Deployment would instantly create 2 ReplicaSets. Did you wait a few seconds (say, 20) after deleting the Deployment and before creating a new one? For me it sometimes takes some time before deletions propagate throughout the whole cluster, and if I recreate too quickly, the same resource is reused.
Concerning the heapster pod recreation you mentioned: pods have a restartPolicy. If it is set to Never, the pod will be recreated by its ReplicaSet in case it exits (a new pod resource is created and the old one is deleted). My guess is that your heapster pod has this Never policy set. It might exit due to some error and reach a Failed state (you need to check that with the logs). Then, after a short while, the ReplicaSet creates a new pod.
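To check that theory, both of these are standard kubectl calls (the pod name is taken from the output above):
$ kubectl get pod heapster-2708163903-vvs1d -n kube-system -o jsonpath='{.spec.restartPolicy}'
$ kubectl describe pod heapster-2708163903-vvs1d -n kube-system   # look at Last State / Reason / Exit Code per container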
OK, so it happens to be a problem with the Azure Container Service default Kubernetes configuration. I got some help from an Azure support engineer.
The problem is fixed by adding the label addonmanager.kubernetes.io/mode: EnsureExists to the heapster deployment. Here is the pull request that the support engineer referenced: https://github.com/Azure/acs-engine/pull/1133
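If you want to apply the label in place rather than through the acs-engine template the PR patches, a standard kubectl label command should do it (a sketch; not verified on ACS):
kubectl label deployment heapster -n kube-system addonmanager.kubernetes.io/mode=EnsureExists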