AWS EKS - Dead container cleanup - kubernetes

I am using Terraform to create infrastructure in an AWS environment. Among other services, we are also creating AWS EKS using the terraform-aws-modules/eks/aws module. The EKS cluster is primarily used for spinning up dynamic containers to handle asynchronous job execution. Once a given task is completed, the container releases its resources and terminates.
What I have noticed is that dead containers lie on the EKS cluster forever. This results in too many dead containers just sitting on EKS and consuming storage. I came across a few blogs which mention that Kubernetes has a garbage collection process, but none describes how it can be configured using Terraform or explicitly for AWS EKS.
Hence I am looking for a solution that will help specify a garbage collection policy for dead containers on AWS EKS. If that is not achievable via Terraform, I am OK with using kubectl against AWS EKS.

These two kubelet flags will cause the node to clean up Docker images when filesystem usage reaches those percentages. https://kubernetes.io/docs/concepts/architecture/garbage-collection/#container-image-lifecycle
--image-gc-high-threshold="85"
--image-gc-low-threshold="80"
But you probably also want to set --maximum-dead-containers 1 so that repeatedly running the same image doesn't leave dead containers around.
In EKS you can add these flags to the UserData section of your EC2 instance/Autoscaling group.
#!/bin/bash
set -o xtrace
/etc/eks/bootstrap.sh --apiserver-endpoint ..... --kubelet-extra-args '<here>'
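For example, a minimal sketch of that last line with the garbage-collection flags filled in (the cluster name my-cluster is a placeholder, and the endpoint/CA arguments elided above still apply):
# my-cluster is a placeholder; keep your real --apiserver-endpoint and other bootstrap arguments
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--image-gc-high-threshold=85 --image-gc-low-threshold=80 --maximum-dead-containers=1'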

Related

Running other non-cluster containers on k8s node

I have a k8s cluster that runs the main workload and has a lot of nodes.
I also have a node (I call it the special node) that is NOT part of the cluster and that some special containers are running on. The node has access to some resources that are required by those special containers.
I want to be able to manage the containers on the special node along with the cluster, and make it possible to access them from inside the cluster, so the idea is to add the node to the cluster as a worker node and taint it to prevent normal workloads from being scheduled on it, and add tolerations to the pods running the special containers.
The idea looks fine, but there may be a problem. There will be some other containers and non-container daemons and services running on the special node that are not managed by the cluster (they belong to other activities that have to be kept separate from the cluster). I'm not sure whether that will be a problem, but I have not seen non-cluster containers running alongside pod containers on a worker node before, and I could not find a similar question on the web about that.
So please enlighten me: is it OK to have non-cluster containers and other daemon services on a worker node? Does it require some precautions, or am I just worrying too much?
Ahmad, from the above description I understand that you are trying to deploy a Kubernetes cluster using kubeadm, minikube, or another similar kind of solution. In this setup you have some servers, and one of them has some special functionality such as a GPU; for deploying your special pods you can use a node selector, and I hope you are already doing this.
Coming to running a separate container runtime on one of these nodes, you mainly need to consider two points:
This can be done, and if you haven't integrated the container runtime with Kubernetes it is just one more piece of software running on your server. Let's say you used kubeadm on all the nodes and you want to run Docker containers separately; this is fine provided you have drafted a proper architecture and configured a separate, isolated virtual network accordingly.
Now comes the storage part: you need to create separate storage volumes for Kubernetes and for the standalone container runtime, because if either piece of software fails or gets corrupted it should not affect the other, and also to provide isolation.
If you maintain proper isolation from storage through to the network, then you can run Kubernetes and a separate container runtime side by side; however, it is not a suggested implementation for production environments.
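As for the taint-and-tolerate approach described in the question, here is a minimal sketch, assuming hypothetical node, taint, and pod names (the special node is registered as special-node):
# Taint the special node so that ordinary workloads are not scheduled onto it
kubectl taint nodes special-node dedicated=special:NoSchedule
# Run a special pod on it by tolerating the taint and pinning it to that node
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: special-worker
spec:
  nodeSelector:
    kubernetes.io/hostname: special-node
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "special"
    effect: "NoSchedule"
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
EOF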

CPUThrottlingHigh alert for metrics-server-nanny container in GKE

I noticed some of my clusters were reporting a CPUThrottlingHigh alert for metrics-server-nanny container (image: gke.gcr.io/addon-resizer:1.8.11-gke.0) in GKE. I couldn't see a way to configure this container to give it more CPU because it's automatically deployed as part of the metrics-server pod, and Google automatically resets any changes to the deployment/pod resource settings.
So out of curiosity, I created a small kubernetes cluster in GKE (3 standard nodes) with autoscaling turned on to scale up to 5 nodes. No apps or anything installed. Then I installed the kube-prometheus monitoring stack (https://github.com/prometheus-operator/kube-prometheus) which includes the CPUThrottlingHigh alert. Soon after installing the monitoring stack, this same alert popped up for this container. I don't see anything in the logs of this container or the related metrics-server container.
Also, I don't notice this same issue on AWS or Azure because while they do have a similar metrics-server pod in the kube-system namespace, they do not contain the sidecar metrics-server-nanny container in the pod.
Has anyone seen this or something similar? Is there a way to give this thing more resources without Google overwriting config changes?
This is a known issue with GKE metrics-server.
You can't fix the error on GKE, as GKE controls the metrics-server configuration and any changes you make are reverted.
You should silence the alert on GKE or update to a GKE cluster version that fixes this.
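If you go the silencing route, a minimal sketch using Alertmanager's amtool (the Alertmanager URL and the duration are placeholders you would adjust) might look like:
# Silence the alert for the nanny container; URL and duration are placeholders
amtool silence add \
  --alertmanager.url=http://alertmanager.example.com:9093 \
  --comment="Known GKE metrics-server-nanny throttling issue" \
  --duration="720h" \
  alertname="CPUThrottlingHigh" container="metrics-server-nanny"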
This is a known issue in Kubernetes: CFS leads to throttling of pods that exhibit a spiky CPU usage pattern. Because Kubernetes / GKE uses CFS quotas to implement CPU limits, pods get throttled even when they really aren't busy.
Kubernetes uses CFS quotas to enforce CPU limits for the pods running an application. The Completely Fair Scheduler (CFS) is a process scheduler that handles CPU resource allocation for executing processes, based on time period and not on available CPU power.
We have no direct control over CFS via Kubernetes, so the only solution is to disable it. This is done via node config.
Allow users to tune Kubelet configs "CPUManagerPolicy" and "CPUCFSQuota"
The workaround is to temporarily disable Kubernetes CFS quotas entirely (kubelet's flag --cpu-cfs-quota=false)
$ cat node-config.yaml
kubeletConfig:
  cpuCFSQuota: false
  cpuManagerPolicy: static
$ gcloud container clusters create --node-config=node-config.yaml
gcloud will map the fields from the YAML node config file to the newly added GKE API fields.
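If I recall correctly, in current gcloud releases the node system configuration is passed with --system-config-from-file instead; a hedged sketch, assuming placeholder cluster and node pool names:
# my-cluster and my-pool are placeholders; node-config.yaml is the same kubeletConfig file as above
gcloud container node-pools create my-pool \
  --cluster=my-cluster \
  --system-config-from-file=node-config.yaml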

Where is KOPS located/running from?

I am new to Docker and Kubernetes, though I have mostly figured out how it all works at this point.
I inherited an app that uses both, as well as KOPS.
One of the last things I am having trouble with is the KOPS setup. I know for absolute certain that Kubernetes is set up via KOPS. There are two KOPS state stores in an S3 bucket (corresponding to a dev and a prod cluster, respectively).
However, while I can find the server that kubectl/kubernetes is running on, absolutely none of the servers I have access to seem to have a kops command.
Am I misunderstanding how KOPS works? Does it not do some sort of dynamic monitoring (would that just be done by ReplicaSets by themselves?), but rather just set up a cluster and be done?
I can include my cluster.spec or config files, if they're helpful to anyone, but I can't really see how they're super relevant to this question.
I guess I'm just confused - as far as I can tell from my perspective, it looks like KOPS is run once, sets up a cluster, and is done. But then whenever one of my node or master servers goes down, it is self-healing. I would expect that of the node servers, but not the master servers.
This is all on AWS.
Sorry if this is a dumb question, I am just having trouble conceptually understanding what is going on here.
kops is a command-line tool; you run it from your own machine (or a jumpbox) and it creates clusters for you. It's not a long-running server itself. It's like Terraform if you're familiar with that, but tailored specifically to spinning up Kubernetes clusters.
kops creates nodes on AWS via autoscaling groups. It’s this construct (which is an AWS thing) that ensures your nodes come back to the desired number.
kops is used for managing Kubernetes clusters themselves, like creating them, scaling, updating, deleting. kubectl is used for managing container workloads that run on Kubernetes. You can create, scale, update, and delete your replica sets with that. How you run workloads on Kubernetes should have nothing to do with how/what tool you (or some cluster admin) use to manage the Kubernetes cluster itself. That is, unless you’re trying to change the “system components” of Kubernetes, like the Kubernetes API or kubedns, which are cluster-admin-level concerns but happen to run on top of Kubernetes as container workloads.
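For example, a minimal sketch of driving an existing cluster from a workstation (the bucket and cluster names are placeholders standing in for the dev/prod state stores mentioned in the question):
# Placeholder bucket and cluster names; point kops at the S3 state store first
export KOPS_STATE_STORE=s3://my-kops-state-bucket
kops get clusters                                   # list clusters tracked in that state store
kops edit ig nodes --name dev.example.com           # tweak an instance group, e.g. node count
kops update cluster --name dev.example.com --yes    # apply the change to AWS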
As for how pods get spun up when nodes go down, that’s what Kubernetes as a container orchestrator strives to do. You declare the desired state you want, and the Kubernetes system makes it so. If things crash or fail or disappear, Kubernetes aims to reconcile this difference between actual state and desired state, and schedules desired container workloads to run on available nodes to bring the actual state of the world back in line with your desired state. At a lower level, AWS does similar things — it creates VMs and keeps them running. If Amazon needs to take down a host for maintenance it will figure out how to run your VM (and attach volumes, etc.) elsewhere automatically.

How can Kubernetes auto scale nodes?

I am using Kubernetes to manage a Docker cluster. Right now, I can set up pod autoscaling using the Horizontal Pod Autoscaler; that is fine.
Now I think the next step is to autoscale nodes. With HPA, the auto-created pods are only started on already created nodes, but if all the available nodes are fully utilised and there is no resource available for any more pods, the next step is to automatically create a node and have it join the k8s master.
I googled a lot and there are very limited resources introducing this topic.
Can anyone please point me to any resource on how to implement this requirement?
Thanks
One way to do this on AWS, setting up your own Kubernetes cluster, is by following these steps:
Create an instance larger than t2.micro (this will be the master node).
Initialize the Kubernetes cluster using a tool like kubeadm. After the initialisation is completed you get a join command, which needs to be run on all the nodes that want to join the cluster.
Now create an Auto Scaling group on AWS with a start/boot script containing that join command (see the sketch after this list).
Now whenever the utilisation threshold you specified in the Auto Scaling group is breached, scaling happens and the node(s) automatically join the Kubernetes cluster. This allows Kubernetes to schedule pods on the newly joined nodes based on the HPA.
(I would suggest using Flannel as the pod network, as it automatically removes a node from the Kubernetes cluster when it is not available)
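A minimal sketch of such a boot script, assuming hypothetical endpoint, token, and hash values (the real join command is printed by kubeadm init):
#!/bin/bash
# Hypothetical values for illustration; paste the join command that 'kubeadm init' printed
kubeadm join 10.0.0.10:6443 \
  --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:<hash-from-kubeadm-init>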
kubernetes operations (kops) helps you create, destroy, upgrade and maintain production-grade, highly available, Kubernetes clusters from the command line.
Features:
Automates the provisioning of Kubernetes clusters in AWS and GCE
Deploys Highly Available (HA) Kubernetes Masters
Most managed Kubernetes service providers offer an autoscaling feature for nodes:
Elastic Kubernetes Service (EKS) - configure the cluster autoscaler (see the sketch after this list)
Google Kubernetes Engine (GKE) - configure cluster autoscaling
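For EKS, for example, a minimal sketch using eksctl (the cluster name and node counts are hypothetical; the cluster autoscaler itself still has to be deployed into the cluster afterwards):
# Hypothetical name and sizes; --asg-access attaches the IAM policy the cluster autoscaler needs
eksctl create cluster --name my-cluster \
  --nodes 3 --nodes-min 1 --nodes-max 5 \
  --asg-access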
The autoscaling feature needs to be supported by the underlying cloud provider. Google Cloud supports autoscaling during cluster creation or update by passing the flags --enable-autoscaling, --min-nodes, and --max-nodes to the corresponding gcloud commands.
Examples:
gcloud container clusters create mytestcluster --zone=us-central1-b --enable-autoscaling --min-nodes=3 --max-nodes=10 --num-nodes=5
gcloud container clusters update mytestcluster --enable-autoscaling --min-nodes=1 --max-nodes=15
The link below may be helpful:
https://medium.com/kubecost/understanding-kubernetes-cluster-autoscaling-675099a1db92

Does Kubernetes provision new VMs for pods on my cloud platform?

I'm currently learning about Kubernetes and still trying to figure it out. I get the general use of it, but I think there are still plenty of things I'm missing; here's one of them. If I want to run Kubernetes on my public cloud, like GCE or AWS, will Kubernetes spin up new VMs by itself in order to provide more compute for new pods that might be needed? Or will it only use a certain number of VMs that were pre-configured as the compute pool? I heard Brendan say, in his talk at CoreOS Fest, that Kubernetes sees the VMs as a "sea of compute" and the user doesn't have to worry about which VM is running which pod - I'm interested to know where that pool of compute comes from. Is it configured when setting up Kubernetes, or will it scale by itself and create new machines as needed?
I hope I managed to be coherent.
Thanks!
Kubernetes supports scaling, but not auto-scaling. The addition and removal of pods in a Kubernetes cluster is performed by replication controllers. The size of a replication controller can be changed by updating its replicas field. This can be done in a couple of ways:
Using kubectl, you can use the scale command, as sketched after this list.
Using the Kubernetes API, you can update your config with a new value in the replicas field.
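A minimal sketch of the kubectl route (the replication controller name frontend is a placeholder):
# Scale a replication controller named 'frontend' (placeholder) to 5 replicas
kubectl scale rc frontend --replicas=5
# Or open the spec in an editor and change the replicas field by hand
kubectl edit rc frontend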
Kubernetes has been designed for auto-scaling to be handled by an external auto-scaler. This is discussed in responsibilities of the replication controller in the Kubernetes docs.