HPA could not get CPU metric during GKE node auto-scaling - kubernetes

Cluster information:
Kubernetes version: 1.12.8-gke.10
Cloud being used: GKE
Installation method: gcloud
Host OS: (machine type) n1-standard-1
CNI and version: default
CRI and version: default
During node scaling, HPA couldn't get the CPU metric.
At the same time, the output of kubectl top pod and kubectl top node was:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
For more detail, here is the flow in which my problem occurs:
1. Suddenly many requests arrive at the GKE server (from a load-testing tool).
2. HPA detects that current CPU usage is above the target CPU usage (50%), so it tries to scale pods up incrementally.
3. An "Insufficient CPU" warning occurs when creating pods, so GKE tries to scale nodes up incrementally.
4. Soon the HPA fails to get the metric, and kubectl top node and kubectl top pod get no response.
   - At this time one or more OutOfcpu pods are found, and several pods are in ContainerCreating (from the Pending state).
5. After node scale-up is complete and some time has elapsed (about a few minutes), HPA starts to fetch the CPU metric successfully and tries to scale up/down based on it.
The same situation happens when nodes scale down.
This causes pod scaling to stop and leads to some failures in responding to clients' requests. Is this normal?
I think HPA should be able to get the CPU metric (or other metrics) from running pods even during node scaling, to keep track of the optimal number of pods at that moment. Then, when node scaling is done, HPA could create the necessary pods at once (rather than incrementally).
Can I make my cluster work like this?

Maybe your node ran out of one resource, either memory or CPU. There are config maps that describe how add-ons are scaled depending on the cluster size. You need to edit the metrics-server-config config map in the kube-system namespace:
kubectl edit cm/metrics-server-config -n kube-system
You should add
baseCPU
cpuPerNode
baseMemory
memoryPerNode
to the NannyConfiguration; an extensive manual covers these settings.
Heapster also suffers from the same OOM issue: too many pods to handle all metrics within the assigned resources. Please modify heapster's config map accordingly:
kubectl edit cm/heapster-config -n kube-system
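As a sketch, the edited metrics-server-config config map might look like this. The resource values below are illustrative, not recommendations; tune them to your cluster size (the NannyConfiguration format comes from the addon-resizer):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-server-config
  namespace: kube-system
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
    baseCPU: 100m        # baseline CPU for metrics-server
    cpuPerNode: 5m       # extra CPU granted per node in the cluster
    baseMemory: 150Mi    # baseline memory
    memoryPerNode: 8Mi   # extra memory granted per node
```

With per-node increments set, the addon-resizer grows metrics-server's requests as the cluster scales, so it is less likely to be OOM-killed mid scale-up.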

Related

How to implement horizontal auto scaling in GKE autopilot based on a custom metric

I'm running a Kubernetes cluster on GKE autopilot
I have pods that do the following: wait for a job, run the job (this can take minutes or hours), then go to the Pod Succeeded state, which causes Kubernetes to restart the pod.
The number of pods I need is variable depending on how many users are on the platform. Each user can request a job that needs a pod to run.
I don't want users to have to wait for pods to scale up so I want to keep a number of extra pods ready and waiting to execute.
The application my pods are running can be in 3 states - { waiting for job, running job, completed job}
Scaling up is fine, as I can just use the scale API and always request to have a certain percentage of pods in the waiting-for-job state.
When scaling down I want to ensure that Kubernetes doesn't kill any pods that are in the running job state.
Should I implement a Custom Horizontal Pod Autoscaler?
Can I configure custom probes for my pod's application state?
I could also use pod priority or a preStop hook.
You can configure horizontal Pod autoscaling to ensure that Kubernetes doesn't kill any pods.
Steps for configuring horizontal pod scaling:
Create the Deployment by applying the nginx.yaml manifest. Run the following command:
kubectl apply -f nginx.yaml
Autoscaling based on resources utilization
1. Go to the Workloads page in the Cloud Console.
2. Click the name of the nginx Deployment.
3. Click list Actions > Autoscale.
4. Specify the following values:
   - Minimum number of replicas: 1
   - Maximum number of replicas: 10
   - Autoscaling metric: CPU
   - Target: 50
   - Unit: %
5. Click Done.
6. Click Autoscale.
To get a list of Horizontal Pod Autoscalers in the cluster, use the following command:
kubectl get hpa
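Alternatively, the same autoscaler can be declared in a manifest. A sketch, assuming the Deployment from the steps above is named nginx:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx        # must match your Deployment's name
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
```

Apply it with kubectl apply -f (saved to a file such as nginx-hpa.yaml), then verify with kubectl get hpa.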
Guide on how to Configure horizontal pod autoscaling.
You can also refer to this guide on creating auto-scaling rules for a GKE Autopilot cluster based on a custom metric in the Cloud Console.

Check pod resources consumption

I've got some deployments on a basic k8s cluster without defined requests and limits.
Is there any way to check how much memory and CPU a pod is consuming?
Depending on whether the metrics-server is installed in your cluster, you can use:
kubectl top pod
kubectl top node
After installing the Metrics Server, you can query the Resource Metrics API directly for the resource usages of pods and nodes:
All nodes in the cluster:
kubectl get --raw=/apis/metrics.k8s.io/v1beta1/nodes
A specific node:
kubectl get --raw=/apis/metrics.k8s.io/v1beta1/nodes/{node}
All pods in the cluster:
kubectl get --raw=/apis/metrics.k8s.io/v1beta1/pods
All pods in a specific namespace:
kubectl get --raw=/apis/metrics.k8s.io/v1beta1/namespaces/{namespace}/pods
A specific pod:
kubectl get --raw=/apis/metrics.k8s.io/v1beta1/namespaces/{namespace}/pods/{pod}
The API returns the absolute CPU and memory usage of the pods and nodes.
From this, you should be able to figure out how much of each resource a pod consumes and how much free capacity is left on each node.
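If you later do set CPU requests, you can relate these absolute numbers to the utilization percentage that HPA works with. A quick sketch with made-up values (120m of usage against a 200m request):

```shell
# Hypothetical values read from the metrics API and the pod spec
usage_millicores=120     # current CPU usage of the pod (120m)
request_millicores=200   # CPU request in the pod spec (200m)

# Utilization as a percentage: the ratio HPA compares against its CPU target
utilization=$(( usage_millicores * 100 / request_millicores ))
echo "${utilization}%"   # prints 60%
```
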

kubectl get pod status always ContainerCreating

k8s version: 1.12.1
I created a pod with the API on a node and allocated an IP (through flanneld). When I used the kubectl describe pod command, I could not get the pod IP, and there was no such IP in etcd storage.
Only a few minutes later could the IP be obtained, and then kubectl get pod showed STATUS Running.
Has anyone ever encountered this problem?
As MatthiasSommer mentioned in a comment, the process of creating a pod might take a while.
If a pod stays in the ContainerCreating status for a longer time, you can check what is stopping it from changing to Running with:
kubectl describe pod <pod_name>
Why can creating a pod take a long time?
Depending on what is included in the manifest, a pod may share namespaces, mount storage volumes, secrets, and config maps, have resources assigned, etc.
kube-apiserver validates and configures data for API objects.
kube-scheduler needs to check and collect resource requirements, constraints, etc., and assign the pod to a node.
kubelet runs on each node and ensures that all containers fulfill the pod specification and are healthy.
kube-proxy also runs on each node and is responsible for the pod's networking.
As you can see, there are many requests, validations, and syncs, so it takes a while to create a pod that fulfills all requirements.
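For example, a pod like the following (all names are illustrative) stays in ContainerCreating until its config map and secret exist and its volume is mounted, so a single missing dependency can hold up the whole startup:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-start-example    # illustrative name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
    envFrom:
    - configMapRef:
        name: app-config      # startup waits if this config map is missing
    volumeMounts:
    - name: creds
      mountPath: /etc/creds
  volumes:
  - name: creds
    secret:
      secretName: app-secret  # startup waits if this secret is missing
```

kubectl describe pod on such a pod lists the missing dependency in its Events section.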

Kubernetes Horizontal Pod Autoscaler on GKE - "failed to get CPU utilization"

I am fairly new to Kubernetes and GKE (Google Container Engine) as a whole, so I was playing with the horizontal pod autoscaling and cluster autoscaling features. I hit my load balancer hard enough that the autoscaler created so many pods that more instances were needed, and those were scaled up too. Eventually some pods were stuck in the Pending state, because the cluster had also reached its maximum number of instances.
I then stopped the load test, hoping it would come down on its own, but it didn't. I looked at kubectl describe hpa and saw errors like:
7m 18s 18 {horizontal-pod-autoscaler } Warning FailedGetMetrics failed to get CPU consumption and request: metrics obtained for 4/5 of pods
7m 18s 18 {horizontal-pod-autoscaler } Warning FailedComputeReplicas failed to get CPU utilization: failed to get CPU consumption and request: metrics obtained for 4/5 of pods
There are actually only 4 pods running (and none in pending state), and looking at the heapster logs (kubectl logs -f heapster-v1.1.0-<id> --namespace=kube-system heapster) I can see it is actually looking for metrics in a pod that doesn't exist anymore (this would be the mystery 5th pod it's complaining about).
The issue with this is that because it is missing the 5th pod, it can't finish getting the current CPU utilization for the 4 pods that are running, and thus horizontal pod autoscaling doesn't work.
Any ideas how to get out of a situation like this?
I've tried removing the hpa and creating it again, but it didn't help.

Autoscaling in Google Container Engine

I understand that Container Engine is currently in alpha and not yet complete.
From the docs I assume there is no auto-scaling of pods (e.g. depending on CPU load) yet, correct? I'd love to be able to configure a replication controller to automatically add pods (and VM instances) when the average CPU load reaches a defined threshold.
Is this somewhere on the near future roadmap?
Or is it possible to use the Compute Engine Autoscaler for this? (if so, how?)
As we work towards a Beta release, we're definitely looking at integrating the Google Compute Engine AutoScaler.
There are actually two different kinds of scaling:
Scaling up/down the number of worker nodes in the cluster depending on # of containers in the cluster
Scaling pods up and down.
Since Kubernetes is an OSS project as well, we'd also like to add a Kubernetes native autoscaler that can scale replication controllers. It's definitely something that's on the roadmap. I expect we will actually have multiple autoscaler implementations, since it can be very application specific...
Kubernetes autoscaling: http://kubernetes.io/docs/user-guide/horizontal-pod-autoscaling/
kubectl command: http://kubernetes.io/docs/user-guide/kubectl/kubectl_autoscale/
Example:
kubectl autoscale deployment foo --min=2 --max=5 --cpu-percent=80
You can autoscale your deployment by using kubectl autoscale. Autoscaling automatically adjusts the number of pods as demand changes.
kubectl autoscale deployment task2deploy1 --cpu-percent=50 --min=1 --max=10
kubectl get deployment task2deploy1
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
task2deploy1 1 1 1 1 49s
As resource consumption increases, the number of pods will grow beyond the number specified in your deployment.yaml file, but never above the maximum number of pods specified in the kubectl autoscale command.
kubectl get deployment task2deploy1
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
task2deploy1 7 7 7 3 4m
Similarly, as the resource consumption decreases, the number of pods will go down but never less than the number of minimum pods specified in the kubectl autoscale command.
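Under the hood, the autoscaler picks the replica count with a simple documented rule: desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue). A quick sketch with hypothetical numbers for the deployment above:

```shell
# Hypothetical observations for task2deploy1
current_replicas=4
current_cpu=100   # observed average CPU utilization, percent
target_cpu=50     # the --cpu-percent target

# Integer ceiling of (current_replicas * current_cpu / target_cpu)
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"   # prints 8
```

At 100% observed utilization against a 50% target, the HPA doubles the replica count (clamped between --min and --max).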