Kubernetes HPA patch command not working - kubernetes

I have a Kubernetes cluster hosted in Google Cloud.
I deployed my deployment and added an HPA rule for scaling:
kubectl autoscale deployment MY_DEP --max 10 --min 6 --cpu-percent 60
After waiting a minute, I ran the kubectl get hpa command to verify my scaling rule. As expected, I had 6 pods running (matching the min parameter).
$ kubectl get hpa
NAME     REFERENCE           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
MY_DEP   Deployment/MY_DEP   <unknown>/60%   6         10        6          1m
Now, I want to change the min parameter:
kubectl patch hpa MY_DEP -p '{"spec":{"minReplicas": 1}}'
I waited 30 minutes and ran the command again:
$ kubectl get hpa
NAME     REFERENCE           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
MY_DEP   Deployment/MY_DEP   <unknown>/60%   1         10        6          30m
expected replicas: 1, actual replicas: 6
More information:
You can assume that the system is not computing anything (0% CPU
utilization).
I waited for more than an hour. Nothing changed.
I saw the same behavior when I deleted the scaling rule and deployed it
again; the replica count did not change.
Question:
If I changed the MINPODS parameter to "1", why do I still have 6 pods? How do I make Kubernetes actually change the minimum number of pods in my deployment?

If I changed the MINPODS parameter to "1", why do I still have 6 pods?
I believe the answer is because of the <unknown>/60% present in the output. The fine manual states:
Please note that if some of the pod's containers do not have the relevant resource request set, CPU utilization for the pod will not be defined and the autoscaler will not take any action for that metric
and one can see an example of 0% / 50% in the walkthrough page. Thus, I would believe that since kubernetes cannot prove what percentage of CPU is being consumed -- neither above nor below the target -- it takes no action for fear of making whatever the situation is worse.
As for why there is a <unknown>, I would hazard a guess it's the dreaded heapster-to-metrics-server cutover that might be obfuscating that information from the kubernetes API. Regrettably, I don't have first-hand experience testing that theory, in order to offer you concrete steps beyond "see if your cluster is collecting metrics in a place that kubernetes can see them."
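In line with the quoted note about missing resource requests, one common fix is to make sure every container in the target Deployment declares a CPU request, since without one the utilization percentage cannot be computed. A minimal sketch (the container name and image are placeholders, not taken from the question):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: MY_DEP
spec:
  template:
    spec:
      containers:
      - name: my-container        # placeholder name
        image: my-image:latest    # placeholder image
        resources:
          requests:
            cpu: 100m             # needed so HPA can compute CPU utilization
```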

Related

Can a deployment kind with a replica count = 1 ever result in two Pods in the 'Running' phase?

From what I understand, with the above configuration, it is possible to have 2 Pods in the cluster associated with the deployment. However, the old Pod is guaranteed to be in the 'Terminated' state. An example scenario is updating the image version associated with the deployment.
There should not be any scenario where there are 2 Pods that are associated with the deployment and both are in the 'Running' phase. Is this correct?
In the scenarios I tried (for example, Pod eviction or updating the Pod spec), the existing Pod enters the 'Terminating' state and a new Pod is deployed.
This is what I expected. Just wanted to make sure that all possible scenarios around updating Pod spec or Pod eviction cannot end up with two Pods in the 'Running' state as it would violate the replica count = 1 config.
It depends on your update strategy. Often it's desirable to have the new pod running and healthy before you shut down the old pod; otherwise you have downtime, which may not be acceptable per business requirements. By default, Deployments do rolling updates.
The defaults look like the below, so if you don't specify anything, that's what will be used.
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
So usually there will be a moment where both pods are running. But Kubernetes will terminate the old pod as soon as the new pod becomes ready, so it will be hard, if not impossible, to literally see both in the Ready state.
You can read about it in the docs: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment
Deployment ensures that only a certain number of Pods are down while they are being updated. By default, it ensures that at least 75% of the desired number of Pods are up (25% max unavailable).
Deployment also ensures that only a certain number of Pods are created above the desired number of Pods. By default, it ensures that at most 125% of the desired number of Pods are up (25% max surge).
For example, if you look at the above Deployment closely, you will see that it first creates a new Pod, then deletes an old Pod, and creates another new one. It does not kill old Pods until a sufficient number of new Pods have come up, and does not create new Pods until a sufficient number of old Pods have been killed. It makes sure that at least 3 Pods are available and that at max 4 Pods in total are available. In case of a Deployment with 4 replicas, the number of Pods would be between 3 and 5.
This is also explained here: https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/
Users expect applications to be available all the time and developers are expected to deploy new versions of them several times a day. In Kubernetes this is done with rolling updates. Rolling updates allow Deployments' update to take place with zero downtime by incrementally updating Pods instances with new ones. The new Pods will be scheduled on Nodes with available resources.
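The 25% figures quoted above can be checked with a quick sketch. This is a simplified model of the rounding rules described in the Deployment docs (maxUnavailable rounds down, maxSurge rounds up), not the actual controller code:

```python
import math

def rolling_update_bounds(replicas, max_unavailable_pct, max_surge_pct):
    """Pod-count bounds during a rolling update.

    maxUnavailable is rounded down and maxSurge is rounded up,
    per the Kubernetes Deployment documentation.
    """
    min_available = replicas - math.floor(replicas * max_unavailable_pct / 100)
    max_total = replicas + math.ceil(replicas * max_surge_pct / 100)
    return min_available, max_total

# With the default 25%/25% strategy and 4 replicas, the pod count
# stays between 3 and 5, matching the docs excerpt above.
print(rolling_update_bounds(4, 25, 25))  # (3, 5)
```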
To get the behaviour you described, you would set spec.strategy.type to Recreate.
All existing Pods are killed before new ones are created when .spec.strategy.type==Recreate.
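For completeness, a minimal sketch of that strategy setting might look like:

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: Recreate   # old Pods are killed before new ones are created
```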

Can a node autoscaler automatically start an extra pod when replica count is 1 & minAvailable is also 1?

Our autoscaling (horizontal and vertical) works fine, except that downscaling is somehow not working (yes, we checked the usual suspects like https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#i-have-a-couple-of-nodes-with-low-utilization-but-they-are-not-scaled-down-why ).
Since we want to save resources and our pods are not ultra-sensitive, we set the following:
Deployment
replicas: 1
PodDisruptionBudget
minAvailable: 1
HorizontalPodAutoscaler
minReplicas: 1
maxReplicas: 10
But it now seems that this is exactly why the autoscaler is not scaling down the nodes (even though the node is only at about 30% CPU and memory utilization, and we have other nodes with more than enough memory and CPU to take these pods).
Is it possible in general that the auto scaler starts an extra pod on the free node and removes the old pod from the old node?
Is it possible in general that the auto scaler starts an extra pod on the free node and removes the old pod from the old node?
Yes, that should be possible in general, but in order for the cluster autoscaler to remove a node, it must be possible to move all pods running on the node somewhere else.
According to the docs there are a few types of pods that are not movable:
Pods with restrictive PodDisruptionBudget.
Kube-system pods that:
are not run on the node by default
don't have a pod disruption budget set or their PDB is too restrictive (since CA 0.6).
Pods that are not backed by a controller object (so not created by deployment, replica set, job, stateful set etc).
Pods with local storage.
Pods that cannot be moved elsewhere due to various constraints (lack of resources, non-matching node selectors or affinity, matching anti-affinity, etc).
Pods that have the following annotation set:
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
You could check the cluster autoscaler logs, they may provide a hint to why no scale in happens:
kubectl -n kube-system logs -f deployment.apps/cluster-autoscaler
Without more information about your setup it is hard to guess what is going wrong, but unless you are using local storage, node selectors, affinity/anti-affinity rules, etc., Pod disruption budgets are a likely candidate. Even if you are not using them explicitly, they can still prevent node scale-in if there are pods in the kube-system namespace that are missing pod disruption budgets (see this answer for an example of such a scenario in GKE).

HPA could not get CPU metric during GKE node auto-scaling

Cluster information:
Kubernetes version: 1.12.8-gke.10
Cloud being used: GKE
Installation method: gcloud
Host OS: (machine type) n1-standard-1
CNI and version: default
CRI and version: default
During node scaling, HPA couldn't get CPU metric.
At the same time, kubectl top pod and kubectl top node output is:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
For more details, I'll show you the flow of my problem occurs:
Suddenly many requests arrive at the GKE server. (Using testing tool)
HPA detects current CPU usage above the target CPU usage (50%), and thus tries to scale pods up
incrementally.
An Insufficient CPU warning occurs when creating pods, so GKE tries to scale nodes up
incrementally.
Soon the HPA fails to get the metric, and kubectl top node or kubectl top pod
gets no response.
At this time one or more OutOfcpu pods are found, and several pods are in
ContainerCreating (moved from the Pending state).
After node scale-up is complete and some time has elapsed (a few minutes),
HPA starts to fetch the CPU metric successfully and tries to scale up/down based on
the metric.
The same situation happens when nodes scale down.
This causes pod scaling to stop and raises some failures on responding to client’s requests. Is this normal?
I think HPA should get the CPU metric (or other metrics) for running pods even during node scaling, to keep track of the optimal pod count at that moment. Then, when node scaling is done, HPA could create the necessary pods at once (rather than incrementally).
Can I make my cluster work like this?
Maybe your node runs out of one resource, either memory or CPU. There are config maps that describe how add-ons are scaled depending on the cluster size. You need to edit the metrics-server-config config map in the kube-system namespace:
kubectl edit cm/metrics-server-config -n kube-system
You should add
baseCPU
cpuPerNode
baseMemory
memoryPerNode
to the NannyConfiguration; here you can find an extensive manual:
Heapster also suffers from the same OOM issue (too many pods to handle all metrics within the assigned resources), so please modify heapster's config map accordingly:
kubectl edit cm/heapster-config -n kube-system
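For reference, a metrics-server-config ConfigMap carrying those Nanny fields might look roughly like this (the resource values are illustrative placeholders, not recommendations):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-server-config
  namespace: kube-system
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
    baseCPU: 80m        # base allocation, grown per node below
    cpuPerNode: 1m
    baseMemory: 140Mi
    memoryPerNode: 4Mi
```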

Doesn't Kubernetes honor HPA configuration when we execute "kubectl scale deploy"?

The Scenario:
I have deployed a service using a helm chart; I can see my service, hpa, deployment, pods, etc.
In my hpa settings, the min pod count is set to 1.
I can see my Pod is running and able to handle service requests.
After a while ---
I have executed -- "kubectl scale deploy --replicas=0"
Once I ran the above command, my pod got deleted (although the hpa min pod setting was 1). I was expecting hpa to scale back up to the min pod count, i.e. 1, after a while.
However, that didn't happen; I waited more than an hour and no new pod was created by hpa.
I also tried sending a request to my Kubernetes service, thinking hpa would then scale up the pods since there was no pod to serve the request; however, hpa didn't seem to do that, and I got a response that my service was not available.
Here is what I can see in kubectl get hpa
NAME   REFERENCE         TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
test   Deployment/xxxx   /1000%    1         4         0          1h
Interestingly, I found that hpa scales down quickly: when I execute "kubectl scale deploy --replicas=2" (note that the min count in hpa is 1), I can see 2 pods get created quickly, but within 5 minutes 1 pod gets removed by hpa.
Is this is expected behavior of Kubernetes (particularly hpa) ?
as in, if we delete all pods by executing "kubectl scale deploy --replicas=0",
a) hpa won't block reducing the replica count below the pod count configured in the hpa config, and
b) hpa won't scale up (on its next cycle) to the min number of pods as configured,
and essentially c) until we redeploy or execute another round of "kubectl scale deploy" to update the replica count, there will be no pods for this service.
Is this expected behavior or a (possible) bug in the Kubernetes codebase ?
I am using Kubernetes 1.8 version.
That was a great observation. I was going through the HPA documentation and came across the mathematical formula HPA uses to scale pods, and it looks like:
TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)
In your case, current pod utilization is zero, since your pod count is zero. So mathematically the equation results in zero. That is the reason HPA does not act when the pod count is zero.
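The formula can be sketched in a few lines to see why a zero pod count pins the result at zero. This is a simplified model, ignoring HPA's tolerance and stabilization behavior:

```python
import math

def target_num_pods(current_utilizations, target):
    """TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)."""
    return math.ceil(sum(current_utilizations) / target)

# Three pods each at 90% against a 60% target -> scale up to 5.
print(target_num_pods([90, 90, 90], 60))  # 5
# With zero pods there is no utilization to sum, so the result is 0
# and HPA has nothing to act on.
print(target_num_pods([], 60))            # 0
```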
a: HPA should not block manual scaling of pods, since it is triggered only by resource metrics (CPU, memory, etc.). Once you do scaling using "kubectl scale" or by any other means, HPA will come into the picture depending on the min and max replica counts and the average utilization value.
b: HPA scales up to the min number of replicas if the current count is non-zero. I tried it and it's working perfectly fine.
c: Yes, unless you bring the replica count to a non-zero value, HPA will not act. So you have to scale up to some non-zero value.
Hope this answers your doubts about HPA.

Autoscaling in Google Container Engine

I understand that Container Engine is currently in alpha and not yet complete.
From the docs I assume there is no auto-scaling of pods (e.g. depending on CPU load) yet, correct? I'd love to be able to configure a replication controller to automatically add pods (and VM instances) when the average CPU load reaches a defined threshold.
Is this somewhere on the near future roadmap?
Or is it possible to use the Compute Engine Autoscaler for this? (if so, how?)
As we work towards a Beta release, we're definitely looking at integrating the Google Compute Engine AutoScaler.
There are actually two different kinds of scaling:
Scaling up/down the number of worker nodes in the cluster depending on # of containers in the cluster
Scaling pods up and down.
Since Kubernetes is an OSS project as well, we'd also like to add a Kubernetes native autoscaler that can scale replication controllers. It's definitely something that's on the roadmap. I expect we will actually have multiple autoscaler implementations, since it can be very application specific...
Kubernetes autoscaling: http://kubernetes.io/docs/user-guide/horizontal-pod-autoscaling/
kubectl command: http://kubernetes.io/docs/user-guide/kubectl/kubectl_autoscale/
Example:
kubectl autoscale deployment foo --min=2 --max=5 --cpu-percent=80
You can autoscale your deployment by using kubectl autoscale.
Autoscaling is when you want the number of pods to be modified automatically as the need arises.
kubectl autoscale deployment task2deploy1 --cpu-percent=50 --min=1 --max=10
kubectl get deployment task2deploy1
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
task2deploy1   1         1         1            1           49s
As resource consumption increases, the number of pods will increase; it may exceed the number of pods specified in your deployment.yaml file, but it will never exceed the maximum number of pods specified in the kubectl autoscale command.
kubectl get deployment task2deploy1
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
task2deploy1   7         7         7            3           4m
Similarly, as resource consumption decreases, the number of pods will go down, but never below the minimum number of pods specified in the kubectl autoscale command.
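Put together, the min/max bounds described above amount to clamping the desired replica count. A simplified sketch of that bound, not the real controller logic:

```python
def clamp_replicas(desired, min_replicas, max_replicas):
    """The final replica count never goes below --min or above --max."""
    return max(min_replicas, min(desired, max_replicas))

print(clamp_replicas(7, 1, 10))   # 7  (within bounds, unchanged)
print(clamp_replicas(15, 1, 10))  # 10 (capped at --max)
print(clamp_replicas(0, 1, 10))   # 1  (raised to --min)
```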