CPUThrottlingHigh alert for metrics-server-nanny container in GKE - kubernetes

I noticed some of my clusters were reporting a CPUThrottlingHigh alert for metrics-server-nanny container (image: gke.gcr.io/addon-resizer:1.8.11-gke.0) in GKE. I couldn't see a way to configure this container to give it more CPU because it's automatically deployed as part of the metrics-server pod, and Google automatically resets any changes to the deployment/pod resource settings.
So out of curiosity, I created a small Kubernetes cluster in GKE (3 standard nodes) with autoscaling turned on to scale up to 5 nodes. No apps or anything else installed. Then I installed the kube-prometheus monitoring stack (https://github.com/prometheus-operator/kube-prometheus), which includes the CPUThrottlingHigh alert. Soon after installing the monitoring stack, this same alert popped up for this container. I don't see anything in the logs of this container or of the related metrics-server container.
Also, I don't notice this same issue on AWS or Azure because while they do have a similar metrics-server pod in the kube-system namespace, they do not contain the sidecar metrics-server-nanny container in the pod.
Has anyone seen this or something similar? Is there a way to give this thing more resources without Google overwriting config changes?

This is a known issue with GKE metrics-server.
You can't fix the error on GKE, because GKE controls the metrics-server configuration and any changes you make are reverted.
You should silence the alert on GKE or update to a GKE cluster version that fixes this.
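For context on what is being silenced: the CPUThrottlingHigh rule in kube-prometheus divides the increase of throttled CFS periods by the increase of all CFS periods over a window. The Python sketch below reproduces that ratio from two scrapes of the cAdvisor counters container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total. The 25% threshold is assumed to be the stack's default (check the rule deployed in your cluster), and counter resets are ignored.

```python
from dataclasses import dataclass

# Assumed default threshold of the kube-prometheus CPUThrottlingHigh rule (25%).
THROTTLING_THRESHOLD = 0.25


@dataclass
class CfsSample:
    """One scrape of the two cAdvisor CFS counters for a single container."""
    periods_total: int            # container_cpu_cfs_periods_total
    throttled_periods_total: int  # container_cpu_cfs_throttled_periods_total


def throttled_ratio(start: CfsSample, end: CfsSample) -> float:
    """increase(throttled periods) / increase(periods); counter resets are ignored."""
    periods = end.periods_total - start.periods_total
    throttled = end.throttled_periods_total - start.throttled_periods_total
    if periods <= 0:
        return 0.0
    return throttled / periods


def alert_fires(start: CfsSample, end: CfsSample,
                threshold: float = THROTTLING_THRESHOLD) -> bool:
    """True when the throttled share of periods exceeds the threshold."""
    return throttled_ratio(start, end) > threshold


if __name__ == "__main__":
    # Hypothetical counter values from two scrapes a few minutes apart.
    before = CfsSample(periods_total=1_000, throttled_periods_total=100)
    after = CfsSample(periods_total=4_000, throttled_periods_total=1_000)
    print(f"throttled ratio: {throttled_ratio(before, after):.0%}")  # 900 of 3000 periods
    print("alert fires:", alert_fires(before, after))
```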

This is a known issue in Kubernetes: CFS throttles pods that have a spiky CPU usage pattern. Because Kubernetes (and therefore GKE) uses CFS to implement CPU limits, pods can get throttled even when they aren't busy on average.
Kubernetes uses CFS quotas to enforce the CPU limits of the pods running an application. The Completely Fair Scheduler (CFS) is a process scheduler that allocates CPU time to running processes per fixed enforcement period, rather than based on how much CPU capacity is currently available.
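To see why a mostly idle container can still be throttled, here is a toy single-threaded model in Python. It assumes the default 100 ms CFS period and a quota of limit × period / 1000, which is how the kubelet derives the quota from a CPU limit; the burst pattern is hypothetical and real kernel accounting differs in detail.

```python
# Default CFS enforcement period the kubelet uses (cpu.cfs_period_us = 100 ms).
CFS_PERIOD_US = 100_000


def cfs_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time per period a container may use under a given CPU limit."""
    return limit_millicores * period_us // 1000


def simulate_cfs(demand_us: list, limit_millicores: int, period_us: int = CFS_PERIOD_US):
    """Toy single-threaded model (limit <= 1000m): work over the quota waits a period.

    demand_us holds the CPU microseconds of new work arriving in each period.
    Returns (throttled_periods, average_usage_in_millicores).
    """
    quota = cfs_quota_us(limit_millicores, period_us)
    backlog = 0
    throttled = 0
    used_total = 0
    for new_work in demand_us:
        wanted = backlog + new_work
        used = min(wanted, quota)
        if wanted > quota:
            throttled += 1       # the quota ran out before the work was done
        backlog = wanted - used  # the remainder spills into the next period
        used_total += used
    average = used_total * 1000 // (period_us * len(demand_us))
    return throttled, average


if __name__ == "__main__":
    # Hypothetical: a 35 ms burst once per second (10 periods) under a 100m limit (10 ms quota).
    throttled, average = simulate_cfs([35_000] + [0] * 9, limit_millicores=100)
    print(f"average usage {average}m of 100m, throttled in {throttled} of 10 periods")
```

Averaged over the second, the container uses about a third of its limit, yet about a third of its periods hit the quota, which is exactly the pattern a throttling alert reports.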
We have no direct control over CFS via Kubernetes, so the only solution is to disable it. This is done via node config.
See the feature request "Allow users to tune Kubelet configs CPUManagerPolicy and CPUCFSQuota".
The workaround is to temporarily disable Kubernetes CFS quotas entirely (kubelet's flag --cpu-cfs-quota=false)
$ cat node-config.yaml
kubeletConfig:
  cpuCFSQuota: false
  cpuManagerPolicy: static
$ gcloud container clusters create CLUSTER_NAME --system-config-from-file=node-config.yaml
gcloud will map the fields from the YAML node config file to the newly added GKE API fields.

Related

Is it possible to schedule a pod to run for say 24 hours and then remove deployment/statefulset? or need to use jobs?

We have a bunch of pods running in dev environment. The pods are auto-provisioned by an application on every business action. The problem is that across various namespaces they are accumulating and eating available resources in EKS.
Is there a way, without Jenkins or Kubernetes Jobs, to simply put a parameter in the pod manifest that tells it to self-destruct after, say, 24 hours?
Add to your pod.spec:
activeDeadlineSeconds: 86400
After the deadline, your Pod is stopped for good and marked as failed with the reason DeadlineExceeded.
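A minimal Python sketch that builds such a bare Pod manifest as JSON (kubectl apply -f accepts JSON as well as YAML). The pod name and image are hypothetical. This is meant for bare pods: a pod owned by a Deployment or ReplicaSet would simply be replaced by its controller once it fails.

```python
import json

# activeDeadlineSeconds is expressed in seconds: 24 h = 86400.
ONE_DAY_SECONDS = 24 * 60 * 60


def pod_with_deadline(name: str, image: str, deadline_seconds: int = ONE_DAY_SECONDS) -> dict:
    """Bare Pod manifest that is stopped once it has been active for deadline_seconds."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "activeDeadlineSeconds": deadline_seconds,
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image; pipe the output to `kubectl apply -f -`.
    print(json.dumps(pod_with_deadline("dev-action-1234", "nginx:1.25"), indent=2))
```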
If I understood your situation properly, you would like to scale your cluster down in order to save resources.
Kubernetes has the ability to autoscale your application in a cluster: it can start additional pods when the load increases and terminate excess pods when the load decreases.
It is possible to downscale the application to zero pods, but, in this case, you will have a delay serving the first request while the pod is starting.
This functionality relies on performance metrics. In practice, this means that autoscaling doesn't happen instantly, because it takes some time for the performance metrics to reach the configured threshold.
This Kubernetes feature, the Horizontal Pod Autoscaler (HPA), is described in this document.
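The core rule in the HPA documentation is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). A simplified Python sketch of that rule; the min/max bounds are hypothetical, and the real controller adds stabilization windows and readiness handling on top.

```python
import math

# Default tolerance of the HPA controller (--horizontal-pod-autoscaler-tolerance).
DEFAULT_TOLERANCE = 0.1


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = DEFAULT_TOLERANCE) -> int:
    """ceil(current_replicas * current_metric / target_metric), clamped to [min, max]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: leave the count alone
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 3 pods averaging 90% CPU utilization against a 60% target.
    print(desired_replicas(3, current_metric=90, target_metric=60))
```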
In case you are running your cluster on GCP or GKE, you are able to go further and automatically start additional nodes for your cluster when you need more computing capacity and shut down nodes when they are not running application pods anymore.
More information about this functionality can be found following the link.
Last but not least, you can use a tool like Ansible to manage all your Kubernetes assets (it can create and manage deployments via playbooks).
If you decide to give it a try, you might find this information useful:
Creating a Container cluster in GKE
70% cheaper Kubernetes cluster on AWS
How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics

GKE and NodeLocal DNSCache

We have a Kubernetes deployment in Google Cloud Platform. Recently we hit one of the well-known issues with kube-dns under a high volume of requests, https://github.com/kubernetes/kubernetes/issues/56903 (it's more related to SNAT/DNAT and conntrack, but the end result is that kube-dns stops serving).
After a few days of digging into the topic, we found that Kubernetes already has a solution, which is currently in alpha (https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/).
The solution is to run a caching CoreDNS as a DaemonSet on each Kubernetes node. So far, so good.
The problem is that after you create the DaemonSet, you have to tell the kubelet to use it with the --cluster-dns option, and we can't find any way to do that in the GKE environment. Google bootstraps the cluster with the "configure-sh" script in the instance metadata. There is an option to edit the instance template and hardcode the required values, but that is not viable: if you upgrade the cluster or use horizontal autoscaling, all of the modified values will be lost.
The last idea was to use a custom startup script that pulls the configuration and updates the metadata server, but this is too complicated.
As of 2019/12/10, GKE supports this in beta through the gcloud CLI:
Kubernetes Engine
Promoted NodeLocalDNS Addon to beta. Use --addons=NodeLocalDNS with gcloud beta container clusters create. This addon can be enabled or disabled on existing clusters using --update-addons=NodeLocalDNS=ENABLED or --update-addons=NodeLocalDNS=DISABLED with gcloud container clusters update.
See https://cloud.google.com/sdk/docs/release-notes#27300_2019-12-10
You can spin up another kube-dns deployment, e.g. in a different node pool, and thus have two nameservers in the pod's resolv.conf.
This would mitigate the evictions and other failures and generally allow you to completely control your kube-dns service in the whole cluster.
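Getting the second server into a pod's resolv.conf still has to be configured explicitly. On a per-pod basis, one way is the Pod-level dnsPolicy: "None" with a dnsConfig. A Python sketch that builds that spec fragment, assuming hypothetical Service IPs 10.0.0.10 and 10.0.0.53 and the default cluster.local domain:

```python
import json

MAX_NAMESERVERS = 3  # Kubernetes rejects a dnsConfig with more than three nameservers


def pod_dns_settings(namespace: str, nameservers: list,
                     cluster_domain: str = "cluster.local") -> dict:
    """Pod spec fields that replace the default DNS setup with explicit nameservers."""
    if not 1 <= len(nameservers) <= MAX_NAMESERVERS:
        raise ValueError("dnsPolicy None needs between 1 and 3 nameservers")
    return {
        "dnsPolicy": "None",  # ignore the cluster DNS settings the kubelet would inject
        "dnsConfig": {
            "nameservers": nameservers,  # written to resolv.conf in this order
            # Cluster-specific part of the search path ClusterFirst generates.
            "searches": [
                f"{namespace}.svc.{cluster_domain}",
                f"svc.{cluster_domain}",
                cluster_domain,
            ],
            "options": [{"name": "ndots", "value": "5"}],
        },
    }


if __name__ == "__main__":
    # Hypothetical ClusterIPs of the original and the second kube-dns Service.
    print(json.dumps(pod_dns_settings("default", ["10.0.0.10", "10.0.0.53"]), indent=2))
```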
In addition to what was mentioned in this answer: with beta support on GKE, the node-local caches now listen on the kube-dns service IP, so there is no need to change the kubelet flag.

Kubernetes Engine: Node keeps getting unhealthy and rebooted for no apparent reason

My Kubernetes Engine cluster keeps rebooting one of my nodes, even though all pods on the node are "well-behaved". I've tried to look at the cluster's Stackdriver logs, but was not able to find a reason. After a while, the continuous reboots usually stop, only to occur again a few hours or days later.
Usually only one single node is affected, while the other nodes are fine, but deleting that node and creating a new one in its place only helps temporarily.
I have already disabled node auto-repair to see if that makes a difference (it was turned on before), and if I recall correctly this started after upgrading my cluster to Kubernetes 1.13 (specifically version 1.13.5-gke). The issue has persisted after upgrading to 1.13.6-gke.0. Even creating a new node pool and migrating to it had no effect.
The cluster consists of four nodes with 1 CPU and 3 GB RAM each. I know that's small for a k8s cluster, but this has worked fine in the past.
I am using the new Stackdriver Kubernetes Monitoring as well as Istio on GKE.
Any pointers as to what could be the reason or where I look for possible causes would be appreciated.
Screenshots of the Node event list (happy to provide other logs; couldn't find anything meaningful in Stackdriver Logging yet):
Posting this answer as a community wiki to give some troubleshooting tips/steps as the underlying issue wasn't found.
Feel free to expand it.
After the steps below, the node rebooting issue was no longer present:
Updated the Kubernetes version (GKE)
Uninstalled Istio
Switched to e2-medium instances as nodes
As pointed out by user @aurelius:
I would start from posting the kubectl describe node maybe there is something going on before your Node gets rebooted and unhealthy. Also do you use resources and limits? Can this restarts be a result of some burstable workload? Also have you tried checking system logs after the restart on the Node itself? Can you post the results? – aurelius Jun 7 '19 at 15:38
The above comment could be a good starting point for troubleshooting issues with the cluster.
Options to troubleshoot the cluster pointed out in the comment:
$ kubectl describe node focusing on output in:
Conditions - KubeletReady, KubeletHasSufficientMemory, KubeletHasNoDiskPressure, etc.
Allocated resources - Requests and Limits of scheduled workloads
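The Conditions check can be scripted. A Python sketch that takes the parsed output of kubectl get node NODE_NAME -o json and reports every condition not in its healthy state; the healthy-state table covers the standard condition types, and the sample node is made up.

```python
# Standard node condition types and the status each one has on a healthy node.
HEALTHY_STATUS = {
    "Ready": "True",
    "MemoryPressure": "False",
    "DiskPressure": "False",
    "PIDPressure": "False",
    "NetworkUnavailable": "False",
}


def unhealthy_conditions(node: dict) -> list:
    """Return (type, status, reason) for each known condition not in its healthy state."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        expected = HEALTHY_STATUS.get(cond.get("type"))
        if expected is not None and cond.get("status") != expected:
            problems.append((cond["type"], cond.get("status"), cond.get("reason", "")))
    return problems


if __name__ == "__main__":
    # Made-up excerpt of `kubectl get node NODE_NAME -o json` output.
    sample = {"status": {"conditions": [
        {"type": "MemoryPressure", "status": "False", "reason": "KubeletHasSufficientMemory"},
        {"type": "Ready", "status": "Unknown", "reason": "NodeStatusUnknown"},
    ]}}
    for ctype, status, reason in unhealthy_conditions(sample):
        print(f"{ctype}={status} ({reason})")
```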
Checking system logs after the restart on the node itself:
GCP Cloud Console (Web UI) -> Logging -> Legacy Logs Viewer/Logs Explorer -> VM Instance/GCE Instance
It could also be beneficial to check the CPU/RAM usage in:
GCP Cloud Console (Web UI) -> Monitoring -> Metrics Explorer
You can also check if there are any operations on the cluster:
gcloud container operations list
Adding to the above points:
Creating a cluster with Istio on GKE
We suggest creating at least a 4 node cluster with the 2 vCPU machine type when using this add-on. You can deploy Istio itself with the default GKE new cluster setup but this may not provide enough resources to explore sample applications.
-- Cloud.google.com: Istio: Docs: Istio on GKE: Installing
Also, the official docs of Istio are stating:
CPU and memory
Since the sidecar proxy performs additional work on the data path, it consumes CPU and memory. As of Istio 1.7, a proxy consumes about 0.5 vCPU per 1000 requests per second.
-- Istio.io: Docs: Performance and scalability: CPU and memory
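Applied to the cluster in the question (four nodes with 1 vCPU each), the quoted figure turns into a rough budget. A Python sketch, assuming the 0.5 vCPU per 1000 requests/second rate scales linearly and that in-mesh traffic passes through two proxies (client and server sidecars); the request rate is hypothetical, and node system reservations are ignored, so the real share is higher.

```python
# Figure quoted from the Istio docs above: about 0.5 vCPU per 1000 requests/second per proxy.
VCPU_PER_1000_RPS = 0.5


def sidecar_vcpu(requests_per_second: float, proxy_hops: int = 1) -> float:
    """Estimated proxy CPU when each request passes through `proxy_hops` sidecars."""
    return requests_per_second / 1000 * VCPU_PER_1000_RPS * proxy_hops


def cluster_cpu_share(requests_per_second: float, nodes: int, vcpu_per_node: float,
                      proxy_hops: int = 1) -> float:
    """Fraction of raw node CPU spent on proxies (ignores system reservations)."""
    return sidecar_vcpu(requests_per_second, proxy_hops) / (nodes * vcpu_per_node)


if __name__ == "__main__":
    # Hypothetical 2000 req/s of in-mesh traffic on 4 x 1 vCPU nodes.
    share = cluster_cpu_share(2000, nodes=4, vcpu_per_node=1.0, proxy_hops=2)
    print(f"proxies would need about {share:.0%} of the cluster's CPU")
```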
Additional resources:
Cloud.google.com: Kubernetes Engine: Docs: Troubleshooting
Kubernetes.io: Docs: Debug cluster

Kubernetes Deployment with Zero Down Time

As a learner of Kubernetes concepts, how they work, and deploying with Kubernetes, I have a couple of cases that I don't know how to achieve. I am looking for advice or some guidelines to achieve them.
I am using Google Cloud Platform. The current flow is as follows: a push to the Google source repository triggers Cloud Build, which creates a Docker image and pushes the image to the running cluster nodes.
Case 1: When the new pods are up and running, I want traffic to be routed to them, and each old pod to be killed only after it has completed its in-flight requests. Zero downtime is what I'm looking to achieve.
Case 2: What will happen if the disk space of a running pod reaches 100%, or, in the Debian case, the inode count reaches full capacity? Will Kubernetes create new pods to handle it?
Case 3: How to manage pod to database connection limits?
As the other answer says, use liveness and readiness probes. A new pod is added to the Service pool, but it only serves traffic after its readiness probe has passed. The old pod is removed from the Service pool, drained, and then terminated. This happens in a rolling fashion, one pod at a time.
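A Python sketch of a Deployment manifest along those lines, with maxUnavailable: 0 so old pods are only removed once a new one is ready, plus readiness/liveness probes and a short preStop pause. The image, port, /healthz path and timings are hypothetical, the preStop command assumes the image contains sleep, and the application itself still has to finish in-flight requests when it receives SIGTERM.

```python
import json


def zero_downtime_deployment(name: str, image: str, port: int = 8080,
                             health_path: str = "/healthz", replicas: int = 3) -> dict:
    """Deployment manifest aimed at rolling updates that do not drop traffic."""
    labels = {"app": name}
    probe = {"httpGet": {"path": health_path, "port": port}, "periodSeconds": 5}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Add one new pod at a time and never go below `replicas` available pods.
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Total time for the preStop hook plus shutdown after SIGTERM.
                    "terminationGracePeriodSeconds": 30,
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # The pod only receives Service traffic once this probe passes.
                        "readinessProbe": probe,
                        # The container is restarted if this probe keeps failing.
                        "livenessProbe": {**probe, "initialDelaySeconds": 15},
                        # Brief pause so endpoint removal propagates before SIGTERM is sent.
                        "lifecycle": {"preStop": {"exec": {"command": ["sleep", "5"]}}},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image; pipe the output to `kubectl apply -f -`.
    print(json.dumps(zero_downtime_deployment("web", "example.com/web:1.0"), indent=2))
```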
This really depends on the capacity of your cluster and the ability to schedule pods given the limits of the containers in them. For more about setting limits for containers, refer to here. In terms of the inode limit, if you reach it on a node, the kubelet won't be able to run any more pods on that node. The kubelet eviction manager also has a mechanism whereby it evicts the pods using the most inodes. You can also configure the eviction thresholds on the kubelet.
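The inode side can be illustrated with the kubelet's nodefs.inodesFree eviction signal, whose default hard threshold on Linux nodes is less than 5% free. A Python sketch of that comparison; os.statvfs is only used to read the local filesystem in the example and is Unix-only.

```python
import os

# Default hard eviction threshold for nodefs.inodesFree on Linux nodes (5% free).
DEFAULT_INODES_FREE_THRESHOLD = 0.05


def under_inode_pressure(free_inodes: int, total_inodes: int,
                         threshold: float = DEFAULT_INODES_FREE_THRESHOLD) -> bool:
    """True when the free-inode fraction has dropped below the eviction threshold."""
    if total_inodes <= 0:
        return False  # some filesystems do not report inode counts
    return free_inodes / total_inodes < threshold


if __name__ == "__main__" and hasattr(os, "statvfs"):
    st = os.statvfs("/")
    print(f"free inodes: {st.f_ffree}/{st.f_files}, "
          f"under pressure: {under_inode_pressure(st.f_ffree, st.f_files)}")
```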
This would be more of a limitation at the OS level, combined with your stateful application's configuration. You can keep this configuration in a ConfigMap; for MySQL, for example, the option would be max_connections.
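The limit to respect is the worst case: every replica the autoscaler may create, plus any pods surged during a rollout, each holding a full connection pool. A Python sketch of that budget, using MySQL's default max_connections of 151; the 10 reserved connections, the replica count and the surge are hypothetical.

```python
# Default max_connections in MySQL 5.7 and 8.0.
MYSQL_DEFAULT_MAX_CONNECTIONS = 151


def max_pool_size_per_pod(max_connections: int, max_replicas: int,
                          surge: int = 0, reserved: int = 10) -> int:
    """Largest per-pod pool that keeps the worst case within max_connections.

    Worst case: max_replicas pods plus `surge` extra pods during a rolling update,
    each holding a full pool, with `reserved` connections kept for admins and migrations.
    """
    pods = max_replicas + surge
    available = max_connections - reserved
    if pods <= 0 or available < pods:
        raise ValueError("not enough connections for one per pod")
    return available // pods


if __name__ == "__main__":
    # Hypothetical: HPA maxReplicas=10, Deployment maxSurge=1.
    print(max_pool_size_per_pod(MYSQL_DEFAULT_MAX_CONNECTIONS, max_replicas=10, surge=1))
```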
I can answer Case 1 since I've done it myself.
Use Deployments with readinessProbes and livenessProbes.

Specifying memory in Kubernetes pods for deployment of Docker image

I am exploring the implementation of a Kubernetes cluster and deployment into it using Jenkins via a CI/CD pipeline. While exploring, I found that we don't need to define which worker node our pods are deployed on. The Kubernetes master takes care of where to place pods on the worker machines; we only need to define in the pod definition how much memory the pod needs.
My confusion is this: we have already assigned and configured the Kubernetes cluster for deployment, and every node has its own memory according to the AWS EC2 instance it was created from (I am planning to use AWS EC2 with Ubuntu 16.04 LTS).
So why do we need to define memory again in the pod? Is that the proper way to deploy pods?
I have only just started in the CI/CD pipeline world.
Specifying memory and cpu in the pod specification is completely optional. Still there are a couple of aspects to specifying memory and CPU at pod level:
As explained here, if you don't specify CPU/memory, the pod/container can consume all resources on that node and potentially affect other pods/containers running on that node.
Each application should specify the memory and CPU they need for running the application. This information is used by Kubernetes during scheduling the pod on one of the nodes in the cluster where enough resources are available. This information ensures better scheduling decisions.
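The scheduling decision is based on requests, not on actual usage. A simplified, memory-only Python sketch of that check (the real scheduler also checks CPU, pod count and more); the node size and requests are hypothetical, and only Ki/Mi/Gi quantities are parsed. A pod with no request counts as 0 here, which is how pods without requests can overcommit a node.

```python
# Only the binary suffixes used in this sketch; real quantities support more forms.
_MIB_PER_UNIT = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}


def to_mib(quantity: str) -> float:
    """Convert a memory quantity such as "512Mi" or "1.5Gi" to MiB."""
    for suffix, factor in _MIB_PER_UNIT.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity!r}")


def fits_on_node(allocatable: str, scheduled_requests: list, new_request: str) -> bool:
    """Memory-only fit check: the sum of requests, not live usage, must fit."""
    requested = sum(to_mib(q) for q in scheduled_requests)
    return requested + to_mib(new_request) <= to_mib(allocatable)


if __name__ == "__main__":
    # Hypothetical node with 3Gi allocatable and two pods already requesting 1Gi each.
    print(fits_on_node("3Gi", ["1Gi", "1024Mi"], "512Mi"))  # 2.5Gi requested in total
    print(fits_on_node("3Gi", ["1Gi", "1024Mi"], "1.5Gi"))  # 3.5Gi would be too much
```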
It enables the Horizontal Pod Autoscaler (HPA) to scale the pods when resource consumption goes beyond a certain threshold. The details are explained in this doc. Unless a memory/CPU request is specified, you cannot calculate that the pod is running at 80% of that metric and should be scaled to two replicas.
You can also set a default at the namespace level and then override it only for specific applications; details here.
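A Python sketch of such a namespace-level default, built as a LimitRange manifest in JSON (kubectl apply -f accepts JSON). The object name and the request/limit values are hypothetical.

```python
import json


def namespace_defaults(namespace: str, cpu_request: str = "100m", memory_request: str = "128Mi",
                       cpu_limit: str = "500m", memory_limit: str = "256Mi") -> dict:
    """LimitRange giving containers in `namespace` default requests and limits."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "default-resources", "namespace": namespace},
        "spec": {
            "limits": [{
                "type": "Container",
                # Used as the request for containers that set no request (and no limit).
                "defaultRequest": {"cpu": cpu_request, "memory": memory_request},
                # Used as the limit for containers that set no limit of their own.
                "default": {"cpu": cpu_limit, "memory": memory_limit},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace; pipe the output to `kubectl apply -f -`.
    print(json.dumps(namespace_defaults("dev"), indent=2))
```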