GKE and NodeLocal DNSCache - kubernetes

We have a Kubernetes deployment in Google Cloud Platform. Recently we hit a well-known kube-dns problem that occurs under a high volume of requests: https://github.com/kubernetes/kubernetes/issues/56903 (it is really related to SNAT/DNAT and conntrack, but the end result is that kube-dns goes out of service).
After a few days of digging into the topic, we found that k8s already has a solution, which is currently in alpha (https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/).
The solution is to run a caching CoreDNS as a DaemonSet on each k8s node. So far so good.
The problem is that after you create the DaemonSet, you have to tell the kubelet to use it with the --cluster-dns option, and we can't find any way to do that in a GKE environment. Google bootstraps the cluster with a "configure-sh" script in the instance metadata. You can edit the instance template and hardcode the required values, but that is not viable: if you upgrade the cluster or use horizontal autoscaling, all of the modified values are lost.
The last idea was to use a custom startup script that pulls the configuration and updates the metadata server, but that is too complicated.

As of 2019/12/10, GKE supports this in beta through the gcloud CLI:
Kubernetes Engine
Promoted NodeLocalDNS Addon to beta. Use --addons=NodeLocalDNS with gcloud beta container clusters create. This addon can be enabled or disabled on existing clusters using --update-addons=NodeLocalDNS=ENABLED or --update-addons=NodeLocalDNS=DISABLED with gcloud container clusters update.
See https://cloud.google.com/sdk/docs/release-notes#27300_2019-12-10

You can spin up another kube-dns deployment, e.g. in a different node pool, so that pods have two nameservers in their resolv.conf.
This would mitigate the evictions and other failures and generally allow you to completely control your kube-dns service in the whole cluster.

In addition to what was mentioned in this answer: with beta support on GKE, the NodeLocal caches now listen on the kube-dns service IP, so there is no need for a kubelet flag change.

Related

CPUThrottlingHigh alert for metrics-server-nanny container in GKE

I noticed some of my clusters were reporting a CPUThrottlingHigh alert for metrics-server-nanny container (image: gke.gcr.io/addon-resizer:1.8.11-gke.0) in GKE. I couldn't see a way to configure this container to give it more CPU because it's automatically deployed as part of the metrics-server pod, and Google automatically resets any changes to the deployment/pod resource settings.
So out of curiosity, I created a small kubernetes cluster in GKE (3 standard nodes) with autoscaling turned on to scale up to 5 nodes. No apps or anything installed. Then I installed the kube-prometheus monitoring stack (https://github.com/prometheus-operator/kube-prometheus) which includes the CPUThrottlingHigh alert. Soon after installing the monitoring stack, this same alert popped up for this container. I don't see anything in the logs of this container or the related metrics-server-nanny container.
Also, I don't notice this same issue on AWS or Azure because while they do have a similar metrics-server pod in the kube-system namespace, they do not contain the sidecar metrics-server-nanny container in the pod.
Has anyone seen this or something similar? Is there a way to give this thing more resources without Google overwriting config changes?
This is a known issue with GKE metrics-server.
You can't fix the error on GKE, as GKE controls the metrics-server configuration and any changes you make are reverted.
You should silence the alert on GKE or update to a GKE cluster version that fixes this.
This is a known issue in Kubernetes: CFS throttles pods that exhibit a spiky CPU usage pattern. Because Kubernetes / GKE uses CFS to implement CPU quotas, pods get throttled even when they really aren't busy.
Kubernetes uses CFS quotas to enforce CPU limits for the pods running an application. The Completely Fair Scheduler (CFS) is a process scheduler that handles CPU resource allocation for executing processes, based on time period and not on available CPU power.
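To make the mapping from a CPU limit to a CFS quota concrete, here is a minimal sketch of a pod with a small CPU limit. The pod name, container name, image, and the specific values are hypothetical placeholders and are not the settings of the metrics-server pod. With the default 100 ms CFS period, a limit of 100m corresponds to 10 ms of CPU time per period, so a container that bursts past 10 ms inside one period is throttled until the next period starts, even if its average usage is low.

```yaml
# Hypothetical pod, used only to illustrate how a CPU limit becomes a CFS quota.
apiVersion: v1
kind: Pod
metadata:
  name: cfs-quota-demo        # placeholder name
spec:
  containers:
  - name: app                 # placeholder container name
    image: nginx              # placeholder image
    resources:
      requests:
        cpu: 50m
      limits:
        # 100m = 0.1 CPU. With the default 100 ms CFS period this is a
        # quota of 10 ms of CPU time per period (on cgroup v1:
        # cpu.cfs_quota_us = 10000, cpu.cfs_period_us = 100000).
        cpu: 100m
```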
We have no direct control over CFS via Kubernetes, so the only solution is to disable it. This is done via node config.
Allow users to tune Kubelet configs "CPUManagerPolicy" and "CPUCFSQuota".
The workaround is to temporarily disable Kubernetes CFS quotas entirely (kubelet's flag --cpu-cfs-quota=false).
$ cat node-config.yaml
kubeletConfig:
  cpuCFSQuota: false
  cpuManagerPolicy: static
$ gcloud container clusters create --node-config=node-config.yaml
gcloud will map the fields from the YAML node config file to the newly added GKE API fields.

Network policy among pods

My scenario is shown in the image below:
After a couple of days trying to find a way to block connections among pods based on a rule, I found Network Policy. But it's not working for me, either on Google Cloud Platform or on local Kubernetes!
My scenario is quite simple: I need a way to block connections among pods based on a rule (e.g. namespace, workload label and so on). At first glance I thought it would work for me, but I don't know why it's not working on Google Cloud, even when I create a cluster from scratch with the "Network policy" option enabled.
Network policy will allow you to do exactly what you described in the picture. You can allow or block traffic based on labels or namespaces.
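As a minimal sketch of such a rule, the policy below allows ingress to pods labelled app: backend only from pods labelled app: frontend in the same namespace. The policy name, namespace, and labels are hypothetical and not taken from the question. Once a pod is selected by a policy whose policyTypes include Ingress, any ingress traffic not matched by a rule is denied. The policy only has an effect on clusters whose network plugin enforces network policies.

```yaml
# Hypothetical example: names, namespace and labels are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: demo
spec:
  # Pods this policy applies to.
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  # Only pods with this label, in the same namespace, may connect.
  - from:
    - podSelector:
        matchLabels:
          app: frontend
```

To select by namespace instead, a namespaceSelector with matchLabels can be used in the from entry in place of (or together with) the podSelector.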
It's difficult to help you when you don't explain exactly what you did and what is not working. Update your question with the actual network policy YAML you created, and ideally also the output of kubectl get pod --show-labels from the namespace with the pods.
What you mean by 'local kubernetes' is also unclear, but it depends largely on the network CNI you're using, as it must support network policies. For example, Calico and Cilium support them. Minikube in its default setting doesn't, so you should follow e.g. this guide: https://medium.com/#atsvetkov906090/enable-network-policy-on-minikube-f7e250f09a14
You can use the Istio Sidecar resource to solve this: https://istio.io/latest/docs/reference/config/networking/sidecar/
Another Istio solution is to use an AuthorizationPolicy: https://istio.io/latest/docs/reference/config/security/authorization-policy/
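As a rough illustration of the AuthorizationPolicy approach, the sketch below allows requests to workloads labelled app: backend only when they come from a given namespace. The policy name, namespaces, and label are hypothetical. It also assumes that both sides have the Istio sidecar injected and that mutual TLS is in place between them, since matching on the source namespace relies on the peer identity.

```yaml
# Hypothetical example: names, namespaces and labels are placeholders.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-namespace
  namespace: backend
spec:
  # Workloads in this namespace that the policy applies to.
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  # Allow only requests whose source workload runs in the "frontend"
  # namespace (requires mTLS so the source identity is known).
  - from:
    - source:
        namespaces: ["frontend"]
```

With an ALLOW policy selecting a workload, requests that match none of its rules are denied.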
Just an update, because I was involved in the problem described in this post.
The problem was with pods that had Istio injected. In this case that was all pods in the namespace, because it had istio-injection=enabled.
The NetworkPolicy rule did not take effect when the selection was made by a match selector (egress or ingress) and the pods involved were already running before the NetworkPolicy was created. After killing a pod and starting it again, the pods that matched the label had access normally. I don't know if there is a way to tell the sidecar inside the pods to refresh without restarting them.
Pods started after the NetworkPolicy was created did not show the problem described in this post.

EKS Fargate pod isolation

In ECS with Fargate, we can manage service isolation via security groups. However, that is no longer the case with EKS on Fargate.
Is there a way to isolate pods on the same cluster from each other, as with a Network Policy? I know this is possible with Kubernetes, but it needs to be implemented by the network plugin. I tried to install the network providers listed here without success, as they need a DaemonSet (a limitation of EKS Fargate: it cannot run DaemonSets, privileged pods, or pods that use HostNetwork or HostPort).
This is something we are tracking in this roadmap item. There isn't a viable workaround for now. As you pointed out, when using EC2 we'd suggest using the Calico network policy engine, but with Fargate there is no DaemonSet support, so it can't be used.
Given that the SG associated with a pod is defined at the cluster level, one way to try to mitigate this would be to spread like pods across different clusters, where the pod SG is configured for that specific type of workload. However, this means more work and higher control plane costs.

What is the minimum Google Kubernetes Engine cluster size / Configuration for Istio?

I tried to launch Istio on Google Kubernetes Engine using the Google Cloud Deployment Manager as described in the Istio Quick Start Guide.
My goal is to have a cluster as small as possible for a few very lightweight microservices.
Unfortunately, Istio pods in the cluster failed to boot up correctly when using a 1-node GKE g1-small or n1-standard-1 cluster.
For example, istio-pilot fails and the status is "0 of 1 updated replicas available - Unschedulable".
I did not find any hints that the resources of my cluster are exceeded so I am wondering:
What is the minimum GKE cluster size to successfully run Istio (and a few lightweight microservices)?
What I found is the issue Istio#216 but it did not contain the answer. Also, of course, the cluster size depends on the microservices but I am basically interested in the minimum cluster to start with.
As per this page:
If you use GKE, please ensure your cluster has at least 4 standard GKE nodes. If you use Minikube, please ensure you have at least 4GB RAM.

DaemonSets on Google Container Engine (Kubernetes)

I have a Google Container Engine cluster with 21 nodes. There is one pod in particular that I need to always be running on a node with a static IP address (for outbound purposes).
Kubernetes supports DaemonSets.
This is a way to have a pod deployed to a specific node (or set of nodes) by giving the node a label that matches the nodeSelector in the DaemonSet. You can then assign a static IP to the VM instance that the labeled node is on. However, GKE doesn't appear to support the DaemonSet kind.
$ kubectl create -f go-daemonset.json
error validating "go-daemonset.json": error validating data: the server could not find the requested resource; if you choose to ignore these errors, turn validation off with --validate=false
$ kubectl create -f go-daemonset.json --validate=false
unable to recognize "go-daemonset.json": no kind named "DaemonSet" is registered in versions ["" "v1"]
When will this functionality be supported and what are the workarounds?
If you only want to run the pod on a single node, you actually don't want to use a DaemonSet. DaemonSets are designed for running a pod on every node, not a single specific node.
To run a pod on a specific node, you can use a nodeSelector in the pod specification, as documented in the Node Selection example in the docs.
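A minimal sketch of that approach, assuming the node with the static IP has been given a label first; the label key and value, pod name, and image below are hypothetical placeholders.

```yaml
# Hypothetical example. Label the target node first, for instance:
#   kubectl label nodes <node-name> static-ip=true
apiVersion: v1
kind: Pod
metadata:
  name: outbound-worker       # placeholder name
spec:
  # The scheduler only places this pod on nodes carrying this label.
  nodeSelector:
    static-ip: "true"
  containers:
  - name: app                 # placeholder container name
    image: nginx              # placeholder image
```

If no node carries the label, the pod stays Pending rather than being scheduled elsewhere.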
edit: But for anyone reading this that does want to run something on every node in GKE, there are two things I can say:
First, DaemonSet will be enabled in GKE in version 1.2, which is planned for March. It isn't enabled in GKE in version 1.1 because it wasn't considered stable enough at the time 1.1 was cut.
Second, if you want to run something on every node before 1.2 is out, we recommend creating a replication controller with a number of replicas greater than your number of nodes and asking for a hostPort in the container spec. The hostPort will ensure that no more than one pod from the RC will be run per node.
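The replication controller workaround could look roughly like the sketch below. The name, image, port numbers, and replica count are hypothetical; the replica count only needs to exceed the number of nodes (21 in the question).

```yaml
# Hypothetical example of the "RC with hostPort" workaround.
apiVersion: v1
kind: ReplicationController
metadata:
  name: per-node-agent        # placeholder name
spec:
  # More replicas than nodes, so every node ends up with one pod.
  replicas: 25
  selector:
    app: per-node-agent
  template:
    metadata:
      labels:
        app: per-node-agent
    spec:
      containers:
      - name: agent           # placeholder container name
        image: nginx          # placeholder image
        ports:
        - containerPort: 80
          # A host port can be bound once per node, so at most one of
          # these pods can be scheduled on each node.
          hostPort: 8080
```

The surplus replicas cannot be scheduled and remain Pending, which is expected with this workaround.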
DaemonSets are still an alpha feature, and Google Container Engine supports only production Kubernetes features. Workaround: build your own Kubernetes cluster (GCE, AWS, bare metal, ...) and enable alpha/beta features.