Kubernetes Engine: Node keeps getting unhealthy and rebooted for no apparent reason

My Kubernetes Engine cluster keeps rebooting one of my nodes, even though all pods on the node are "well-behaved". I've tried to look at the cluster's Stackdriver logs, but was not able to find a reason. After a while, the continuous reboots usually stop, only to occur again a few hours or days later.
Usually only one single node is affected, while the other nodes are fine, but deleting that node and creating a new one in its place only helps temporarily.
I have already disabled node auto-repair to see if that makes a difference (it was turned on before), and if I recall correctly this started after upgrading my cluster to Kubernetes 1.13 (specifically version 1.13.5-gke). The issue has persisted after upgrading to 1.13.6-gke.0. Even creating a new node pool and migrating to it had no effect.
The cluster consists of four nodes with 1 CPU and 3 GB RAM each. I know that's small for a k8s cluster, but this has worked fine in the past.
I am using the new Stackdriver Kubernetes Monitoring as well as Istio on GKE.
Any pointers as to what could be the reason or where I look for possible causes would be appreciated.
Screenshots of the Node event list (happy to provide other logs; couldn't find anything meaningful in Stackdriver Logging yet).

Posting this answer as a community wiki to give some troubleshooting tips/steps as the underlying issue wasn't found.
Feel free to expand it.
After the steps below, the issue with the node rebooting was no longer present:
Updating the Kubernetes version (GKE)
Uninstalling Istio
Using e2-medium instances as nodes.
As pointed out by user aurelius:
I would start from posting the kubectl describe node maybe there is something going on before your Node gets rebooted and unhealthy. Also do you use resources and limits? Can this restarts be a result of some burstable workload? Also have you tried checking system logs after the restart on the Node itself? Can you post the results? – aurelius Jun 7 '19 at 15:38
The above comment could be a good starting point for troubleshooting issues with the cluster.
Options to troubleshoot the cluster, as pointed out in the comment (a combined command sketch follows this list):
$ kubectl describe node, focusing on the output in:
Conditions - KubeletReady, KubeletHasSufficientMemory, KubeletHasNoDiskPressure, etc.
Allocated resources - Requests and Limits of scheduled workloads
Checking system logs after the restart on the node itself:
GCP Cloud Console (Web UI) -> Logging -> Legacy Logs Viewer/Logs Explorer -> VM Instance/GCE Instance
It could also be beneficial to check the CPU/RAM usage in:
GCP Cloud Console (Web UI) -> Monitoring -> Metrics Explorer
You can also check if there are any operations on the cluster:
gcloud container operations list
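As a minimal sketch combining these checks from the command line (NODE_NAME is a placeholder for the affected node's name):

$ kubectl describe node NODE_NAME | grep -A 10 "Conditions:"          # node conditions (Ready, MemoryPressure, DiskPressure, ...)
$ kubectl describe node NODE_NAME | grep -A 10 "Allocated resources:" # requests/limits of the workloads scheduled on the node
$ kubectl get nodes -w                                                 # watch Ready/NotReady transitions live
$ gcloud container operations list                                     # repairs/upgrades/resizes performed on the cluster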
Adding to the above points:
Creating a cluster with Istio on GKE
We suggest creating at least a 4 node cluster with the 2 vCPU machine type when using this add-on. You can deploy Istio itself with the default GKE new cluster setup but this may not provide enough resources to explore sample applications.
-- Cloud.google.com: Istio: Docs: Istio on GKE: Installing
Also, the official Istio docs state:
CPU and memory
Since the sidecar proxy performs additional work on the data path, it consumes CPU and memory. As of Istio 1.7, a proxy consumes about 0.5 vCPU per 1000 requests per second.
-- Istio.io: Docs: Performance and scalability: CPU and memory
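On small nodes like the ones described in the question, one option (a sketch only, assuming the default Istio sidecar injector is in use; the workload name and image are placeholders) is to lower the injected proxy's resource requests per pod via annotations on the pod template:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                               # placeholder name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        sidecar.istio.io/proxyCPU: "50m"     # lower CPU request for the injected istio-proxy
        sidecar.istio.io/proxyMemory: "64Mi" # lower memory request for the injected istio-proxy
    spec:
      containers:
      - name: app
        image: nginx                         # placeholder image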
Additional resources:
Cloud.google.com: Kubernetes Engine: Docs: Troubleshooting
Kubernetes.io: Docs: Debug cluster


CPUThrottlingHigh alert for metrics-server-nanny container in GKE

I noticed some of my clusters were reporting a CPUThrottlingHigh alert for metrics-server-nanny container (image: gke.gcr.io/addon-resizer:1.8.11-gke.0) in GKE. I couldn't see a way to configure this container to give it more CPU because it's automatically deployed as part of the metrics-server pod, and Google automatically resets any changes to the deployment/pod resource settings.
So out of curiosity, I created a small kubernetes cluster in GKE (3 standard nodes) with autoscaling turned on to scale up to 5 nodes. No apps or anything installed. Then I installed the kube-prometheus monitoring stack (https://github.com/prometheus-operator/kube-prometheus) which includes the CPUThrottlingHigh alert. Soon after installing the monitoring stack, this same alert popped up for this container. I don't see anything in the logs of this container or the related metrics-server-nanny container.
Also, I don't notice this same issue on AWS or Azure because while they do have a similar metrics-server pod in the kube-system namespace, they do not contain the sidecar metrics-server-nanny container in the pod.
Has anyone seen this or something similar? Is there a way to give this thing more resources without Google overwriting config changes?
This is a known issue with GKE metrics-server.
You can't fix the error on GKE, as GKE controls the metrics-server configuration and any changes you make are reverted.
You should silence the alert on GKE or update to a GKE cluster version that fixes this.
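If you are running kube-prometheus, one hedged way to do that (a sketch, assuming amtool is configured against your Alertmanager; the matchers and duration are examples) is:

$ amtool silence add \
    alertname="CPUThrottlingHigh" \
    container="metrics-server-nanny" \
    --comment="Known GKE metrics-server issue" \
    --duration="720h"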
This is a known issue in Kubernetes: CFS quota enforcement throttles Pods that exhibit a spiky CPU usage pattern. Because Kubernetes / GKE uses CFS quotas to implement CPU limits, pods can get throttled even when they really aren't busy.
Kubernetes uses CFS quotas to enforce CPU limits for the pods running an application. The Completely Fair Scheduler (CFS) is a process scheduler that handles CPU resource allocation for executing processes, based on time period and not on available CPU power.
We have no direct control over CFS via Kubernetes, so the only solution is to disable it. This is done via node config.
Allow users to tune Kubelet configs "CPUManagerPolicy" and "CPUCFSQuota"
The workaround is to temporarily disable Kubernetes CFS quotas entirely (kubelet's flag --cpu-cfs-quota=false)
$ cat node-config.yaml
kubeletConfig:
  cpuCFSQuota: false
  cpuManagerPolicy: static
$ gcloud container clusters create CLUSTER_NAME --system-config-from-file=node-config.yaml
gcloud will map the fields from the YAML node config file to the newly added GKE API fields.

Is it possible to schedule a pod to run for say 24 hours and then remove the deployment/statefulset? Or do I need to use jobs?

We have a bunch of pods running in dev environment. The pods are auto-provisioned by an application on every business action. The problem is that across various namespaces they are accumulating and eating available resources in EKS.
Is there a way, without Jenkins or k8s Jobs, to simply put some parameter in the pod manifest to tell it to self-destruct in, say, 24 hours?
Add to your pod.spec:
activeDeadlineSeconds: 86400
After the deadline, your Pod will be stopped for good with the status DeadlineExceeded
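A minimal sketch of a complete Pod manifest with this field set (the name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: short-lived-pod          # placeholder name
spec:
  activeDeadlineSeconds: 86400   # the pod is terminated 24 hours after it starts
  containers:
  - name: app
    image: nginx                 # placeholder image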
If I understood your situation properly, you would like to scale your cluster down in order to save resources.
Kubernetes has the ability to autoscale your application in a cluster. In practice, this means that Kubernetes can start additional pods when the load increases and terminate excess pods when the load decreases.
It is possible to downscale the application to zero pods, but, in this case, you will have a delay serving the first request while the pod is starting.
This functionality relies on performance metrics. From the practical side, autoscaling doesn't happen instantly, because it takes some time for the performance metrics to reach the configured threshold.
This Kubernetes feature, the HPA (Horizontal Pod Autoscaler), is described in this document.
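As a minimal sketch, an HPA can be created imperatively for an existing Deployment (the deployment name and thresholds here are placeholders):

$ kubectl autoscale deployment my-app --cpu-percent=70 --min=1 --max=5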
In case you are running your cluster on GCP or GKE, you are able to go further and automatically start additional nodes for your cluster when you need more computing capacity and shut down nodes when they are not running application pods anymore.
More information about this functionality can be found following the link.
Last but not least, you can use a tool like Ansible to manage all your Kubernetes assets (it can create/manage deployments via playbooks).
If you decide to give it a try, you might find this information useful:
Creating a Container cluster in GKE
70% cheaper Kubernetes cluster on AWS
How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics

How to keep minikube kube-apiserver from using up a ton of %CPU?

After deploying anything to minikube it seems as though the apiserver starts eating up all the CPU and makes the dashboard mostly unusable until the apiserver dies and gets restarted.
I've read through a bit of the references found in this post: kube-apiserver high CPU and requests
However, those seem to specifically target deployed k8s clusters on many machines, or at least where the master isn't on the same machine.
That's not how it would work with minikube, since it's a single-node cluster. Not to mention it typically isn't given a ton of resources (neither CPU nor memory).
Is there a way to curb or eliminate this behavior? Perhaps I've missed some important configuration for running on minikube?

Kubernetes automatic shutdown after some idle time

Does Kubernetes or Helm support shutting down pods if they are idle for more than a given threshold time?
This would be very useful in a development environment, to free up resources for other processes and save cost.
Kubernetes has the ability to autoscale your application in a cluster. In practice, this means that Kubernetes can start additional pods when the load increases and terminate excess pods when the load decreases.
It is possible to downscale the application to zero pods, but, in this case, you will have a delay serving the first request while the pod is starting.
This functionality relies on performance metrics provided by the Heapster application, which must be running in the cluster. From the practical side, autoscaling doesn't happen instantly, because it takes some time for the performance metrics to reach the configured threshold.
This Kubernetes feature, the HPA (Horizontal Pod Autoscaler), is described in this document.
In case you are running your cluster on GCP or GKE, you are able to go further and automatically start additional nodes for your cluster when you need more computing capacity and shut down nodes when they are not running application pods anymore.
More information about this functionality can be found following the link.
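If autoscaling is more than you need in a dev environment, a simpler manual sketch is to scale the workload down to zero outside working hours and back up when needed (the deployment name is a placeholder):

$ kubectl scale deployment my-app --replicas=0   # free the resources
$ kubectl scale deployment my-app --replicas=1   # bring it back when needed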
If you decide to give it a try, you might find this information useful:
Creating a Container cluster in GKE
70% cheaper Kubernetes cluster on AWS
How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics

How to determine efficiently if a Minion has joined a Kubernetes Cluster

I am working on writing some automation to setup a Kubernetes Cluster. The automation deploys the Kubernetes Master and once that is setup, it starts adding Minions in parallel. What is the most efficient way to determine programmatically if a Minion has joined the Kubernetes Cluster?
Currently I am querying the REST endpoint /api/v1/nodes exposed by the Kubernetes API server. My concern is that as the size of the cluster increases, querying the API server to pull details about all the minions may be compute- and I/O-intensive for the API server. I also did not find paging support in this API.
You should look into kube-register: https://github.com/kelseyhightower/kube-register. It uses fleet to register minions as they spin up. You should probably run it as a systemd unit so it starts on boot. Then, for status, let the API server do its thing with polling. Most clusters probably wouldn't be larger than 9 main nodes (you can have plenty of worker nodes; I recommend looking at CoreOS's etcd docs to see about clustering) due to etcd's latency constraints on its Raft quorum, so I wouldn't worry too much about the size of the cluster.
This is a mix between an answer and a comment on the other answer (I cannot comment yet, sorry...).
As far as I know, using the REST endpoint /api/v1/nodes is the best way to check whether nodes are registered. How often do you call that endpoint? I wouldn't expect compute or I/O problems very quickly.
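If your automation has kubectl available, a hedged alternative sketch (on newer Kubernetes versions; NODE_NAME is a placeholder) is to poll a single node instead of listing all of them:

$ kubectl wait --for=condition=Ready node/NODE_NAME --timeout=300s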
kube-register was a useful tool for registering new CoreOS nodes with the Kubernetes cluster, but it is not needed anymore, since the kubelet now registers itself with the API server.
I think there is some misunderstanding in the other answer. I think you are talking about 2 different clusters:
the etcd cluster: CoreOS recommends running 3, 5, or 7 etcd instances in a cluster (https://coreos.com/etcd/docs/latest/admin_guide.html#cluster-management). On the remaining nodes you can configure etcd to run as a proxy (https://coreos.com/etcd/docs/latest/proxy.html). This should solve your etcd connection problem.
the kubernetes cluster: here you typically run 1 master and x "worker" nodes, just as you do already.