How to Add or Repair kube-dns in EKS? - kubernetes

I'm running Kubernetes 1.10.13 on EKS on two clusters. I'm aware kube-dns will soon be superseded by CoreDNS on 1.11+.
One of our clusters has a functioning kube-dns deployment.
The other cluster does not have kube-dns objects running.
I've pulled kube-dns serviceAccount, clusterRole, clusterRoleBinding, deployment, and service manifests from here using kubectl get <k8s object> --export.
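Roughly like this, exporting each object from the working cluster (the file names here are just examples):
kubectl -n kube-system get serviceaccount kube-dns --export -o yaml > kube-dns-serviceaccount.yaml
kubectl -n kube-system get deployment kube-dns --export -o yaml > kube-dns-deployment.yaml
kubectl -n kube-system get service kube-dns --export -o yaml > kube-dns-service.yaml
(and the same for the ClusterRole and ClusterRoleBinding, which are cluster-scoped and so don't need the -n flag.)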
Now I plan on applying those files to a different cluster.
However, I still see a kube-dns secret and I'm not sure how that is created or where I can get it.
This all seems pretty roundabout. What is the proper way of installing or repairing kube-dns on an EKS cluster?

I believe the secret is created automatically as part of the ServiceAccount (it's the service account token), so it isn't something you need to source yourself; you'd still need to delete the old one if it's there.
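For example, once the ServiceAccount exists you should be able to confirm the token secret was generated automatically (the exact secret name suffix is assigned by the cluster):
$ kubectl -n kube-system get serviceaccount kube-dns -o yaml    # the secrets: list references the auto-generated token
$ kubectl -n kube-system get secrets | grep kube-dns-token      # the token secret itself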
To create kube-dns you can try applying the official manifest:
$ kubectl apply -f https://storage.googleapis.com/kubernetes-the-hard-way/kube-dns.yaml
Like you mentioned, you should consider moving to CoreDNS as soon as possible.

Related

Issues setting up Prometheus on EKS - Pods in Pending State (Seems to be dependent on PVCs waiting on Volume being created)

I have an EKS cluster for my university project and I want to set up Prometheus on the cluster. To do this I am using Helm with the following commands (see this tutorial https://archive.eksworkshop.com/intermediate/240_monitoring/deploy-prometheus/):
kubectl create namespace prometheus
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus \
  --namespace prometheus \
  --set alertmanager.persistentVolume.storageClass="gp2" \
  --set server.persistentVolume.storageClass="gp2"
When I check the status of the prometheus pods, the alert-manager and server seem to be in an infinite Pending state:
When I describe the prometheus-alertmanager-0 pod I see the following VolumeBinding error:
When I describe the prometheus-server-5d858bd4bd-6xmws pod I see the following VolumeBinding error:
I can also see there are 2 pvcs in Pending state:
When I describe the prometheus-server pvc, I can see its waiting for a volume to be created:
I'm familiar with Kubernetes basics, but PVCs are not something I have used before. Is the solution here to create a "volume", and if so, how do I do that? Would that solve the issue, or am I way off the mark?
Should I try to install Prometheus in a different way?
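For reference, the checks above correspond roughly to these commands (the pod and PVC names are the chart defaults), plus a check that the gp2 StorageClass exists:
kubectl -n prometheus get pods
kubectl -n prometheus describe pod prometheus-alertmanager-0
kubectl -n prometheus get pvc
kubectl -n prometheus describe pvc prometheus-server
kubectl get storageclass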
Any help on this is greatly appreciated.
Note: Although similar, this is not a duplicate of Prometheus server in pending state after installation using Helm. For one, the errors highlighted there are different; that question also involved other manual steps such as creating volumes, which I have not done. Finally, I am following the specific tutorial referenced above, and I am also asking whether I should set up Prometheus a different way if there is a simpler one.

kubernetes HPA deleted automatically

I wanted to set up HPA for a deployment on my Kubernetes cluster (1.14.0 on bare metal), so I followed the instructions to set up metrics-server here: https://github.com/kubernetes-sigs/metrics-server.
After deploying metrics-server, I am able to issue commands like kubectl top nodes and deploy HPAs using kubectl autoscale deployment <deployment-name> --min=1 ...
Currently, the issue I am facing is that the HPAs created with kubectl autoscale ... seem to be deleted automatically for some reason after around 4-5 minutes. I feel like there is some important piece of information or step I am missing related to HPA on Kubernetes, but I couldn't find anything about this particular issue when searching online...
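(A sketch of how I watch this happen, with my deployment name substituted for the placeholder:)
kubectl get hpa -w                            # watch the HPA list and note the exact time it disappears
kubectl get hpa <deployment-name> -o yaml     # capture the full object (labels, annotations) before it goes away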

k3s cleanup of HelmChart?

I have followed the instructions from this blog post to set up a k3s cluster on a couple of Raspberry Pi 4s.
I'm now trying to get my hands dirty with Traefik as the front end, but I'm having issues with the way it has been deployed, as a 'HelmChart' I think.
From the k3s docs
It is also possible to deploy Helm charts. k3s supports a CRD controller for installing charts. A YAML file specification can look as following (example taken from /var/lib/rancher/k3s/server/manifests/traefik.yaml):
I have been starting k3s with the --no-deploy traefik option so that I can add Traefik manually with my own settings. I therefore apply a YAML like this:
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: traefik
  namespace: kube-system
spec:
  chart: https://%{KUBERNETES_API}%/static/charts/traefik-1.64.0.tgz
  set:
    rbac.enabled: "true"
    ssl.enabled: "true"
    kubernetes.ingressEndpoint.useDefaultPublishedService: "true"
    dashboard:
      enabled: true
      domain: "traefik.k3s1.local"
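I apply it with kubectl (the file name is just whatever I saved it as):
kubectl apply -f traefik.yaml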
But when trying to iterate over settings to get it working as I want, I'm having trouble tearing it down. If I try kubectl delete -f on this yaml it just hangs indefinitely. And I can't seem to find a clean way to delete all the resources manually either.
I've now been resorting to just reinstalling my entire cluster over and over because I can't seem to clean up properly.
Is there a way to delete all the resources created by a chart like this without the helm cli (which I don't even have)?
Are you sure that kubectl delete -f is hanging?
I had the same issue as you and it seemed like kubectl delete -f was hanging, but it was really just taking a long time.
As far as I can tell, when you issue the kubectl delete -f, a pod in the kube-system namespace with a name of helm-delete-* should spin up and try to delete the resources deployed via Helm. You can get the full name of that pod by running kubectl -n kube-system get pods and finding the one named helm-delete-<name of yaml>-<id>. Then use the pod name to look at the logs using kubectl -n kube-system logs helm-delete-<name of yaml>-<id>.
An example of what I did was:
kubectl delete -f jenkins.yaml # seems to hang
kubectl -n kube-system get pods # look at pods in kube-system namespace
kubectl -n kube-system logs helm-delete-jenkins-wkjct # look at the delete logs
I see two options here:
Use the --now flag to delete your yaml file with minimal delay.
Use --grace-period=0 --force flags to force delete the resource.
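Applied to the traefik.yaml from the question, those would look something like:
kubectl delete -f traefik.yaml --now
kubectl delete -f traefik.yaml --grace-period=0 --force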
There are other options but you'll need Helm CLI for them.
Please let me know if that helped.

How to set kube-proxy settings using kubectl on AKS

I keep reading documentation that gives parameters for kube-proxy, but it does not explain where these parameters are supposed to be set. I create my cluster using az aks create with the azure-cli program, then I get credentials and use kubectl. So far everything I've done has involved YAML for services, deployments, and such, but I can't figure out where all this kube-proxy configuration fits in.
I've googled for days. I've opened question issues on github with AKS. I've asked on the kubernetes slack channel, but nobody has responded.
kube-proxy runs on all your Kubernetes nodes as a DaemonSet, and its configuration is stored in a ConfigMap. To make any changes or add/remove options, you will have to edit the kube-proxy DaemonSet or ConfigMap in the kube-system namespace.
$ kubectl -n kube-system edit daemonset kube-proxy
or
$ kubectl -n kube-system edit configmap kube-proxy
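If you just want to inspect the current settings before changing anything, dumping the objects works too:
$ kubectl -n kube-system get configmap kube-proxy -o yaml
$ kubectl -n kube-system get daemonset kube-proxy -o yaml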
For a reference on the kube-proxy command-line options, see here.

How to override kops' default kube-dns add-on spec

We're running kops->terraform k8s clusters in AWS, and because of the number of k8s jobs we have in flight at once, the kube-dns container is getting OOMkilled, forcing us to raise the memory limit.
As for automating this so it both survives cluster upgrades and is automatically applied to new clusters created from the same template, I don't see a way to override the canned kops spec. The only options I can see involve some manual update (kubectl edit deployment kube-dns, deleting the kube-dns add-on deployment and using our own, overwriting the spec uploaded to the kops state store, etc.) that probably needs to be repeated each time after using kops to update the cluster.
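(For illustration, the kubectl edit route amounts to something like this patch, which would have to be re-run after every kops update; the 300Mi value is just an example and the container index assumes the kubedns container is listed first:)
kubectl -n kube-system patch deployment kube-dns --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"300Mi"}]'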
I've checked the docs and even the spec source and no other options stand out. Is there a way to pass a custom kube-dns deployment spec to kops? Or tell it not to install the kube-dns add-on?