We're running kops-generated Terraform k8s clusters in AWS, and because of the number of k8s jobs we have in flight at once, the kube-dns container is getting OOMKilled, forcing us to raise its memory limit.
As for automating this so it both survives cluster upgrades and is applied automatically to new clusters created from the same template, I don't see a way to override the canned kops spec. Every option I can find involves some manual step (kubectl edit deployment kube-dns, deleting the kube-dns add-on deployment and using our own, overwriting the spec uploaded to the kops state store, etc.) that would probably have to be repeated each time we use kops to update the cluster.
I've checked the docs and even the spec source and no other options stand out. Is there a way to pass a custom kube-dns deployment spec to kops? Or tell it not to install the kube-dns add-on?
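For reference, this is the kind of post-update step I'd rather not have to script by hand (a sketch only; the container index and the 250Mi value are assumptions about our deployment):
$ kubectl -n kube-system patch deployment kube-dns --type=json \
    -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "250Mi"}]'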
Related
I am running an AKS cluster and have deployed Prometheus and Alertmanager via Deployment resources in k8s, so they are also controlled by a ReplicaSet. The issue is that sometimes a restart of Alertmanager gets stuck. It is related to the accessMode of the PVC. During a restart, k8s may start the new pod on a different node from the one the currently running pod is assigned to, depending on resource utilization on the nodes. In simple words, the same PVC is accessed from 2 different pods assigned to different nodes. This is not allowed because in the config of the PVC I am using accessMode ReadWriteOnce. Looking at this comment on GitHub for the Prometheus Operator, it seems to be by design that the accessMode ReadWriteMany option is not allowed.
So my question is: why such a design, and what could happen if I change accessMode to ReadWriteMany? Any practical experience?
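For clarity, the relevant part of the PVC config I'm referring to looks roughly like this (a sketch; the name and size are placeholders):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alertmanager-data
spec:
  accessModes:
  - ReadWriteOnce   # switching this to ReadWriteMany is what I'm asking about
  resources:
    requests:
      storage: 10Gi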
A while back a GKE cluster got created which came with a daemonset of:
kubectl get daemonsets --all-namespaces
...
kube-system prometheus-to-sd 6 6 6 3 6 beta.kubernetes.io/os=linux 355d
Can I delete this daemonset without issue?
What is it being used for?
What functionality would I be losing without it?
TL;DR
Even if you delete it, it will be back.
A little bit more explanation
Citing the explanation by user #Yasen of what prometheus-to-sd is:
prometheus-to-sd is a simple component that can scrape metrics stored in prometheus text format from one or multiple components and push them to the Stackdriver. Main requirement: k8s cluster should run on GCE or GKE.
Github.com: Prometheus-to-sd
Assuming that the command to delete this daemonset would be:
$ kubectl delete daemonset prometheus-to-sd --namespace=kube-system
Executing this command will indeed delete the daemonset, but it will come back after a while.
The prometheus-to-sd daemonset is managed by the Addon Manager, which will recreate the deleted daemonset and restore it to its original state.
Below is the part of the prometheus-to-sd daemonset YAML definition which states that this daemonset is managed by addonmanager:
labels:
  addonmanager.kubernetes.io/mode: Reconcile
You can read more about it by following: Github.com: Kubernetes: addon-manager
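If you want to confirm this on your own cluster, you can check the labels directly (a sketch, using the daemonset name and namespace from your output):
$ kubectl -n kube-system get daemonset prometheus-to-sd --show-labels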
Deleting this daemonset is strictly connected to the monitoring/logging solution you are using with your GKE cluster. There are 2 options:
Stackdriver logging/monitoring
Legacy logging/monitoring
Stackdriver logging/monitoring
You need to completely disable logging and monitoring of your GKE cluster to delete this daemonset.
You can do it by following this path:
GCP -> Kubernetes Engine -> Cluster -> Edit -> Kubernetes Engine Monitoring -> Set to disabled.
Legacy logging/monitoring
If you are using the legacy solution, which is available up to GKE version 1.14, you need to disable the Legacy Stackdriver Monitoring option by following the same path as above.
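If you prefer the CLI over the console, the same can likely be achieved with something like the following (a sketch; the cluster name and zone are placeholders, and the legacy --logging-service/--monitoring-service flags are assumed to apply to your cluster version):
$ gcloud container clusters update CLUSTER_NAME --zone=ZONE \
    --logging-service=none --monitoring-service=none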
Let me know if you have any questions about that.
TL;DR - it's ok
Given your context, I suppose it's OK to shut down the Prometheus component of your cluster.
Except in cases where reports, alerts, and monitoring are critical parts of your system.
Let's dive into the GCP sources
As per source code at GoogleCloudPlatform:
prometheus-to-sd is a simple component that can scrape metrics stored in prometheus text format from one or multiple components and push them to the Stackdriver. Main requirement: k8s cluster should run on GCE or GKE.
Prometheus
From their Prometheus Github Page:
The Prometheus monitoring system and time series database.
To get a picture of what it's for, you can read the excellent guide on Prometheus: Prometheus Monitoring: The Definitive Guide in 2019 – devconnected
Also, there are hundreds of videos on their YouTube channel, Prometheus Monitoring
Your questions
So, answering your questions:
Can I delete this daemonset without issue?
It depends. As I said, you can, except when reports, alerts, and monitoring are critical parts of your system.
What is it being used for?
It's a TSDB for monitoring
What functionality would I be losing without it?
metrics
→ therefore dashboards
→ therefore alerting
I wanted to setup HPA for a deployment on my kubernetes cluster (1.14.0 on bare metal) so I followed the instructions to setup metrics-server here: https://github.com/kubernetes-sigs/metrics-server.
After deploying metrics-server, I am able to issue commands like kubectl top nodes and deploy HPAs using kubectl autoscale deployment <deployment-name> --min=1 ...
Currently, the issue I am facing is that the HPAs created from kubectl autoscale ... seem to be deleted automatically for some reason after around 4-5 minutes. So I feel like there is some important piece of information or step I am missing related to HPA on Kubernetes, but I couldn't find anything about this particular issue when searching online...
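For context, this is roughly what I'm running and how I check on the HPA afterwards (the deployment name and scaling targets are placeholders):
$ kubectl autoscale deployment my-app --min=1 --max=5 --cpu-percent=70
$ kubectl get hpa
$ kubectl describe hpa my-app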
I'm setting up a GKE cluster on Google Kubernetes Engine to run some heavy jobs. I have a render-pool of big machines that I want to autoscale from 0 to N (using the cluster autoscaler). My default-pool is a cheap g1-small to run the system pods (those never go away so the default pool can't autoscale to 0, too bad).
My problem is that the render-pool doesn't want to scale down to 0. It has some system pods running on it; are those the problem? The default pool has plenty of resources to run all of them as far as I can tell. I've read the autoscaler FAQ, and it looks like it should delete my node after 10 min of inactivity. I've waited an hour though.
I created the render pool like this:
gcloud container node-pools create render-pool-1 --cluster=test-zero-cluster-2 \
--disk-size=60 --machine-type=n2-standard-8 --image-type=COS \
--disk-type=pd-standard --preemptible --num-nodes=1 --max-nodes=3 --min-nodes=0 \
--enable-autoscaling
The cluster-autoscaler-status configmap says ScaleDown: NoCandidates and it is probing the pool frequently, as it should.
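For reference, this is how I'm reading that status (a sketch):
$ kubectl -n kube-system describe configmap cluster-autoscaler-status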
What am I doing wrong, and how do I debug it? Can I see why the autoscaler doesn't think it can delete the node?
As pointed out in the comments, some pods, under specific circumstances, will prevent the CA from scaling down.
In GKE, you have logging pods (fluentd), kube-dns, monitoring, etc., all considered system pods. This means that any node where they're scheduled will not be a candidate for scale-down.
Considering this, it all boils down to creating a scenario where all the previous conditions for scale-down are met.
Since you only want to scale down a specific node pool, I'd use taints and tolerations to keep system pods in the default pool.
For GKE specifically, you can pick each app by their k8s-app label, for instance:
$ kubectl taint nodes GPU-NODE k8s-app=heapster:NoSchedule
This will prevent pods without a matching toleration (Heapster, in this case) from being scheduled on the tainted nodes.
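The flip side is that your own render pods then need a matching toleration so they can still land on the tainted nodes. A sketch of that toleration (the key and value must match the taint above):
tolerations:
- key: "k8s-app"
  operator: "Equal"
  value: "heapster"
  effect: "NoSchedule"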
Not recommended, but you can go broader and try to cover all the GKE system pods using the kubernetes.io/cluster-service label instead:
$ kubectl taint nodes GPU-NODE kubernetes.io/cluster-service=true:NoSchedule
Just be careful, as the scope of this label is broader and you'll have to keep track of upcoming changes, since this label may be deprecated someday.
Another thing that you might want to consider is using Pod Disruption Budgets. This tends to be more effective for stateless workloads, but setting it too tight is likely to cause instability.
The idea of a PDB is to tell GKE the minimum number of pods that must remain available at any given time, which allows the CA to evict the rest. It can be applied to system pods like below:
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: dns-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
This tells GKE that, although there are usually 3 replicas of kube-dns, the application can tolerate 2 disruptions and run temporarily with only 1 replica, allowing the CA to evict these pods and reschedule them on other nodes.
As you probably noticed, this will put stress on DNS resolution in the cluster (in this particular example), so be careful.
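Assuming you save that manifest as dns-pdb.yaml, applying and checking it would look roughly like this (kube-dns lives in kube-system, so the PDB has to be created there too):
$ kubectl -n kube-system apply -f dns-pdb.yaml
$ kubectl -n kube-system get pdb dns-pdb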
Finally, regarding how to debug the CA: for now, consider that GKE is a managed version of Kubernetes where you don't really have direct access to tweak some features (for better or worse). You cannot set flags on the CA, and access to its logs may require going through GCP support. The idea is to protect the workloads running in the cluster rather than to optimize for cost.
Downscaling in GKE is more about using different features in Kubernetes together until the CA conditions for downscaling are met.
I'm running 1.10.13 on EKS on two clusters. I'm aware kube-dns will soon be obsolete in favor of CoreDNS on 1.11+.
One of our clusters has a functioning kube-dns deployment.
The other cluster does not have kube-dns objects running.
I've pulled kube-dns serviceAccount, clusterRole, clusterRoleBinding, deployment, and service manifests from here using kubectl get <k8s object> --export.
Now I plan on applying those files to a different cluster.
However, I still see a kube-dns secret and I'm not sure how that is created or where I can get it.
This all seems pretty roundabout. What is the proper way of installing or repairing kube-dns on an EKS cluster?
I believe the secret is usually created as part of the ServiceAccount; you'd still need to delete it if it's already there.
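To see where that secret comes from, you can inspect the ServiceAccount and the token it references (a sketch; on 1.10 the token secret is generated automatically):
$ kubectl -n kube-system get serviceaccount kube-dns -o yaml
$ kubectl -n kube-system get secrets | grep kube-dns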
To create kube-dns you can try applying the official manifest:
$ kubectl apply -f https://storage.googleapis.com/kubernetes-the-hard-way/kube-dns.yaml
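After applying, a quick sanity check would be something like this (assuming the standard k8s-app=kube-dns label and service name):
$ kubectl -n kube-system get pods -l k8s-app=kube-dns
$ kubectl -n kube-system get svc kube-dns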
Like you mentioned, you should consider moving to coredns as soon as possible.