Is the prometheus-to-sd required for GKE? Can I delete it? - kubernetes

A while back a GKE cluster got created which came with a daemonset of:
kubectl get daemonsets --all-namespaces
NAMESPACE     NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
...
kube-system   prometheus-to-sd   6         6         6       3            6           beta.kubernetes.io/os=linux   355d
Can I delete this daemonset without issue?
What is it being used for?
What functionality would I be losing without it?

TL;DR
Even if you delete it, it will be back.
A little bit more explanation
Citing the explanation by user Yasen of what prometheus-to-sd is:
prometheus-to-sd is a simple component that can scrape metrics stored in prometheus text format from one or multiple components and push them to the Stackdriver. Main requirement: k8s cluster should run on GCE or GKE.
Github.com: Prometheus-to-sd
Assuming that the command deleting this daemonset will be:
$ kubectl delete daemonset prometheus-to-sd --namespace=kube-system
Executing this command will indeed delete the daemonset but it will be back after a while.
The prometheus-to-sd daemonset is managed by the Addon Manager, which will recreate the deleted daemonset and restore it to its original state.
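If you want to see that reconciliation happen, you can keep a watch on the namespace open while you run the delete command above:
$ kubectl get daemonsets --namespace=kube-system --watch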
Below is the part of the prometheus-to-sd daemonset YAML definition which states that this daemonset is managed by addonmanager:
labels:
  addonmanager.kubernetes.io/mode: Reconcile
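You can verify this on your own cluster by printing that label from the live object (a plain grep over the rendered YAML keeps it simple):
$ kubectl get daemonset prometheus-to-sd --namespace=kube-system -o yaml | grep addonmanager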
You can read more about it by following: Github.com: Kubernetes: addon-manager
Whether you can delete this daemonset permanently depends on the monitoring/logging solution you are using with your GKE cluster. There are 2 options:
Stackdriver logging/monitoring
Legacy logging/monitoring
Stackdriver logging/monitoring
You need to completely disable logging and monitoring of your GKE cluster to delete this daemonset.
You can do it by following a path:
GCP -> Kubernetes Engine -> Cluster -> Edit -> Kubernetes Engine Monitoring -> Set to disabled.
Legacy logging/monitoring
If you are using the legacy solution, which is available up to GKE version 1.14, you need to disable the Legacy Stackdriver Monitoring option by following the same path as above.
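If you prefer the CLI over the console, the same thing can be done with gcloud. A minimal sketch, assuming the legacy Stackdriver flags and a placeholder cluster name and zone (check the flags against your gcloud version, since Kubernetes Engine Monitoring uses different values for the same flags):
$ gcloud container clusters update my-cluster --zone us-central1-a \
    --logging-service none --monitoring-service none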
Let me know if you have any questions about that.

TL;DR - it's ok
Given your context, I suppose it's OK to shut down the Prometheus component of your cluster,
except in cases where reports, alerts, and monitoring are critical parts of your system.
Let's dive into the GCP sources
As per source code at GoogleCloudPlatform:
prometheus-to-sd is a simple component that can scrape metrics stored in prometheus text format from one or multiple components and push them to the Stackdriver. Main requirement: k8s cluster should run on GCE or GKE.
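For context, the "Prometheus text format" mentioned there is the plain exposition format that components serve on a /metrics endpoint; a made-up counter for illustration:
# HELP http_requests_total Total number of HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{method="GET",code="200"} 1027
http_requests_total{method="POST",code="500"} 3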
Prometheus
From the Prometheus GitHub page:
The Prometheus monitoring system and time series database.
To get a picture of what it is for, you can read the excellent guide on Prometheus: Prometheus Monitoring: The Definitive Guide in 2019 – devconnected
Also, there are hundreds of videos on their YouTube channel, Prometheus Monitoring.
Your questions
So, answering your questions:
Can I delete this daemonset without issue?
It depends. As I said, you can, except in cases where reports, alerts, and monitoring are critical parts of your system.
What is it being used for?
It's a TSDB for monitoring.
What functionality would I be losing without it?
metrics
→ therefore dashboards
→ therefore alerting

Related

RabbitMQ and Prometheus - cannot get prometheus to scrape from targets

I'm trying to set up monitoring of RabbitMQ using Prometheus in a Kubernetes cluster.
I've been following the guide on how to set this up using both the Prometheus and RabbitMQ operators for Kubernetes. However, when I deploy the PodMonitor and ServiceMonitors to the cluster, Prometheus doesn't seem to be scraping the metrics as expected.
I've correctly set the metadata.labels.release property in the YAML for these two resources, and can see them listed in Prometheus' Status -> Service Discovery UI, but the 'active targets' always reports 0.
My current suspicion is that there are no prometheus or prometheus-tls ports declared on the RabbitMQ Service in the cluster, which is what the ServiceMonitor is expecting to scrape from. Presumably declaring this port on the Service is controlled by the RabbitMQ cluster operator. The documentation doesn't mention any additional steps to set up these ports, so I'm not sure if I am understanding the problem correctly.
Update:
I have confirmed my suspicions by manually configuring the prometheus port by copying the additionalPorts example from the operator's GitHub repo, but it's still not clear to me whether this is expected.
Now the first ServiceMonitor correctly reports 2/2 targets are up. A second ServiceMonitor is also configured by following the guide, but it still doesn't work. It is trying to scrape metrics from a /metrics/detailed endpoint and getting a 404 error.
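For reference, a rough sketch of the kind of Service port override being described, assuming the rabbitmq_prometheus plugin's default port 15692 (the exact shape of the override should be checked against the cluster operator's documentation):
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: my-rabbit
spec:
  override:
    service:
      spec:
        ports:
          - name: prometheus
            port: 15692
            protocol: TCP
            targetPort: 15692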

Kubernetes: Monitoring throughput of each Ingress

We have a bare-metal K8s cluster with an NGINX Ingress Controller.
Is there a way to tell how much traffic is transmitted/received by each Ingress?
Thanks!
Ingress Controllers are implemented as standard Kubernetes applications, so any monitoring method your organization has adopted can be applied to them to track the health and lifetime of k8s workloads. To track network traffic statistics, however, controller-specific mechanisms should be used.
To observe Kubernetes Ingress traffic, you can send your statistics to Prometheus and view them in Grafana (widely adopted open-source software for data visualization).
Here is a monitoring guide from the ingress-nginx project, where you can read how to do it step by step. Start by installing those tools.
To deploy Prometheus in Kubernetes, run the command below:
kubectl apply --kustomize github.com/kubernetes/ingress-nginx/deploy/prometheus/
To install Grafana, run this one:
kubectl apply --kustomize github.com/kubernetes/ingress-nginx/deploy/grafana/
Follow the next steps in the monitoring guide mentioned before.
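Once the controller's metrics are being scraped, per-Ingress throughput can be approximated with PromQL queries along these lines (metric and label names vary between ingress-nginx versions, so treat them as a starting point):
# bytes received per Ingress (request bodies), rate over 5m
sum(rate(nginx_ingress_controller_request_size_sum[5m])) by (ingress)
# bytes sent per Ingress (response bodies), rate over 5m
sum(rate(nginx_ingress_controller_response_size_sum[5m])) by (ingress)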
See also this article and this similar question.

Installed prometheus-community / helm-charts but I can't get metrics on "default" namespace

I recently learned about helm and how easy it is to deploy the whole prometheus stack for monitoring a Kubernetes cluster, so I decided to try it out on a staging cluster at my work.
I started by creating a dedicated namespace on the cluster for monitoring with:
kubectl create namespace monitoring
Then, with helm, I added the prometheus-community repo with:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
Next, I installed the chart with a prometheus release name:
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring
At this time I didn't pass any custom configuration because I'm still trying it out.
After the install is finished, it all looks good. I can access the prometheus dashboard with:
kubectl port-forward prometheus-prometheus-kube-prometheus-prometheus-0 9090 -n monitoring
There, I see a bunch of pre-defined alerts and rules for monitoring, but the problem is that I don't quite understand how to create new rules to check the pods in the default namespace, where I actually have my services deployed.
I am looking at http://localhost:9090/graph to play around with the queries, and I can't seem to use any that will give me metrics on my pods in the default namespace.
I am a bit overwhelmed with the amount of information, so I would like to know what I missed or what I am doing wrong here.
The Prometheus Operator includes several Custom Resource Definitions (CRDs), including ServiceMonitor (and PodMonitor). ServiceMonitors are used to tell the Operator which services should be monitored.
I'm familiar with the Operator, although not the Helm deployment, but I suspect you'll want to create ServiceMonitors to generate metrics for your apps in any namespace (including default).
See: https://github.com/prometheus-operator/prometheus-operator#customresourcedefinitions
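A minimal sketch of such a ServiceMonitor, assuming the release name prometheus from your install command, a Service labeled app: my-app in the default namespace, and a port on that Service named metrics; all of those names are placeholders to adjust:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: prometheus      # kube-prometheus-stack's default selector only picks up ServiceMonitors carrying the chart's release label
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics          # the name of the port on the Service, not the number
      interval: 30s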
ServiceMonitors and PodMonitors are CRDs for the Prometheus Operator. When working directly with the Prometheus Helm chart (without the operator), you have to configure your targets directly in values.yaml by editing the scrape_configs section.
This is more complex to do, so take a deep breath and start by reading this: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
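A minimal sketch of such a scrape_configs entry, using the conventional prometheus.io/scrape pod annotation to select pods in the default namespace (where exactly this lands in values.yaml depends on the chart version, so check its documented defaults):
scrape_configs:
  - job_name: default-namespace-pods
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - default
    relabel_configs:
      # keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"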

Prometheus is not collecting pod metrics

I deployed Prometheus and Grafana into my cluster.
When I open the dashboards I don't get data for pod CPU usage.
When I check the Prometheus UI, it shows pods 0/0 up; however, I have many pods running in my cluster.
What could be the reason? I have node exporter running on all of my nodes.
I am getting this for kube-state-metrics:
I0218 14:52:42.595711 1 builder.go:112] Active collectors: configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,jobs,limitranges,namespaces,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets
I0218 14:52:42.595735 1 main.go:208] Starting metrics server: 0.0.0.0:8080
Here is my Prometheus config file:
https://gist.github.com/karthikeayan/41ab3dc4ed0c344bbab89ebcb1d33d16
I'm able to hit and get data for:
http://localhost:8080/api/v1/nodes/<my_worker_node>/proxy/metrics/cadvisor
As mentioned by karthikeayan in the comments:
ok, i found something interesting in the values.yaml comments, prometheus.io/scrape: Only scrape pods that have a value of true, when i remove this relabel_config in k8s configmap, i got the data in prometheus ui.. unfortunately k8s configmap doesn't have comments, i believe helm will remove the comments before deploying it.
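An alternative to removing that relabel_config is to keep it and annotate the pods you do want scraped; these annotation keys are the convention the stock chart config looks for (the port and path values here are illustrative):
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: /metrics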
And just for clarification:
kube-state-metrics vs. metrics-server
The metrics-server is a project that has been inspired by Heapster and is implemented to serve the goals of the Kubernetes Monitoring Pipeline. It is a cluster-level component which periodically scrapes metrics from all Kubernetes nodes served by Kubelet through the Summary API. The metrics are aggregated, stored in memory and served in Metrics API format. The metrics-server stores the latest values only and is not responsible for forwarding metrics to third-party destinations.
kube-state-metrics is focused on generating completely new metrics from Kubernetes' object state (e.g. metrics based on deployments, replica sets, etc.). It holds an entire snapshot of Kubernetes state in memory and continuously generates new metrics based off of it. And just like the metrics-server, it too is not responsible for exporting its metrics anywhere.
Having kube-state-metrics as a separate project also enables access to these metrics from monitoring systems such as Prometheus.
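To make the distinction concrete: metrics-server is what backs kubectl top, while kube-state-metrics exposes object-state series on its own /metrics endpoint. Sample output with illustrative names and values:
$ kubectl top pod my-app-7d9f8c6b5-x2x9z
NAME                     CPU(cores)   MEMORY(bytes)
my-app-7d9f8c6b5-x2x9z   12m          180Mi

# a couple of kube-state-metrics series
kube_pod_status_phase{namespace="default",pod="my-app-7d9f8c6b5-x2x9z",phase="Running"} 1
kube_deployment_status_replicas_available{namespace="default",deployment="my-app"} 3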

How to override kops' default kube-dns add-on spec

We're running kops->terraform k8s clusters in AWS, and because of the number of k8s jobs we have in flight at once, the kube-dns container is getting OOMKilled, forcing us to raise the memory limit.
As for automating this so it both survives cluster upgrades and is automatically done for new clusters created from the same template, I don't see a way to override the canned kops spec. The only options I can see involve some update (kubectl edit deployment kube-dns, delete the kube-dns add-on deployment and use our own, overwrite the spec uploaded to the kops state store, etc.) that probably needs to be done each time after using kops to update the cluster.
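For concreteness, the kind of one-off tweak being described is roughly the following, with an illustrative container name and limit value:
kubectl --namespace kube-system set resources deployment kube-dns \
    --containers=kubedns --limits=memory=256Mi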
I've checked the docs and even the spec source and no other options stand out. Is there a way to pass a custom kube-dns deployment spec to kops? Or tell it not to install the kube-dns add-on?