Stackdriver Prometheus sidecar with Prometheus Operator helm chart - kubernetes

We have setup Prometheus + Grafana on our GKE cluster using the stable/prometheus-operator helm chart. Now we want to export some metrics to Stackdriver because we have installed custom metrics Stackdriver adapter. We are already using some Pub/Sub metrics from Stackdriver for autoscaling few deployments. Now we also want to use some Prometheus metrics (mainly nginx request rate) in the autoscaling of other deployments.
So, my first question: Can we use Prometheus adapter in parallel with Stackdriver adapter for autoscaling in the same cluster?
If not, we will need to install Stackdriver Prometheus Sidecar for exporting the Prometheus metrics to Stackdriver and then use them for autoscaling via Stackdriver adapter.
From the instructions here, it looks like we need to install Stackdriver sidecar on same pod on which Prometheus is running. I gave it a try. When I run the patch.sh script, I got the message back: statefulset.apps/prometheus-prom-operator-prometheus-o-prometheus patched but when I inspected the statefulset again, it didn't have the Stackdriver sidecar container in it. Since this statefulset is created by a Helm chart, we probably can't modify it directly. Is there a recommended way of doing this in Helm?

Thanks to this comment on GitHub, I figured it out. There are so many configuration options accepted by this Helm chart that I missed it while reading the docs.
So, turns out that this Helm chart accepts a configuration option prometheus.prometheusSpec.containers. Its description in the docs says: "Containers allows injecting additional containers. This is meant to allow adding an authentication proxy to a Prometheus pod". But obviously, it is not limited to the authentication proxy and you can pass any container spec here and it will be added to Prometheus StatefulSet created by this Helm chart.
Here is the sample configuration I used. Some key points:
Please replace the values in angle brackets with your actual values.
Feel free to remove the arg --include. I added it because nginx_http_requests_total is the only Prometheus metric I want to send to Stackdriver for now. Check Managing costs for Prometheus-derived metrics for more details about it.
To figure out the name of volume to use in volumeMounts:
List down StatefulSets in Prometheus Operator namespace. Assuming that you installed it in monitoring namespace: kubectl get statefulsets -n monitoring
Describe the Prometheus StatefulSet assuming that its name is prometheus-prom-operator-prometheus-o-prometheus: kubectl describe statefulset prometheus-prom-operator-prometheus-o-prometheus -n monitoring
In details of this StatefulSet, find container named prometheus. Note the value passed to it in arg --storage.tsdb.path
Find the volume that is mounted on this container on same path. In my case, it was prometheus-prom-operator-prometheus-o-prometheus-db so I mounted the same volume on my Stackdriver sidecar container as well.
prometheus:
prometheusSpec:
containers:
- name: stackdriver-sidecar
image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.7.5
imagePullPolicy: Always
args:
- --stackdriver.project-id=<GCP PROJECT ID>
- --prometheus.wal-directory=/prometheus/wal
- --stackdriver.kubernetes.location=<GCP PROJECT REGION>
- --stackdriver.kubernetes.cluster-name=<GKE CLUSTER NAME>
- --include=nginx_http_requests_total
ports:
- name: stackdriver
containerPort: 9091
volumeMounts:
- name: prometheus-prom-operator-prometheus-o-prometheus-db
mountPath: /prometheus
Save this yaml to a file. Let's assume you saved it to prom-config.yaml
Now, find the release name you have used to install Prometheus Operator Helm chart on your cluster:
helm list
Assuming that release name is prom-operator, you can update this release according to the config composed above by running this command:
helm upgrade -f prom-config.yaml prom-operator stable/prometheus-operator
I hope you found this helpful.

Related

How to configure istio helm chart to use external kube-prometheus-stack?

I have deployed the istio service mesh on the GKE cluster using base & istiod helm charts using this documents in the istio-system namespace.
I have deployed Prometheus, grafana & alert-manager using kube-prometheus-stack helm chart.
Every pod of this workload is working fine; I didn't see any error. Somehow I didn't get any metrics in Prometheus UI related to istio workload. Because of that, I didn't see any network graph in kiali dashboard.
Can anyone help me resolve this issue?
Istio expects Prometheus to discover which pods are exposing metrics through the use of the Kubernetes annotations prometheus.io/scrape, prometheus.io/port, and prometheus.io/path.
The Prometheus community has decided that those annotations, while popular, are insufficiently useful to be enabled by default. Because of this the kube-prometheus-stack helm chart does not discover pods using those annotations.
To get your installation of Prometheus to scrape your Istio metrics you need to either configure Istio to expose metrics in a way that your installation of Prometheus expects (you'll have to check the Prometheus configuration for that, I do not know what it does by default) or add a Prometheus scrape job which will do discovery using the above annotations.
Details about how to integrate Prometheus with Istio are available here and an example Prometheus configuration file is available here.
Need to add additionalScrapConfigs for istio in kube-prometheus-stack helm chart values.yaml.
prometheus:
prometheusSpec:
additionalScrapeConfigs:
- {{ add your scrap config for istio }}

Installed prometheus-community / helm-charts but I can't get metrics on "default" namespace

I recently learned about helm and how easy it is to deploy the whole prometheus stack for monitoring a Kubernetes cluster, so I decided to try it out on a staging cluster at my work.
I started by creating a dedicates namespace on the cluster for monitoring with:
kubectl create namespace monitoring
Then, with helm, I added the prometheus-community repo with:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
Next, I installed the chart with a prometheus release name:
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring
At this time I didn't pass any custom configuration because I'm still trying it out.
After the install is finished, it all looks good. I can access the prometheus dashboard with:
kubectl port-forward prometheus-prometheus-kube-prometheus-prometheus-0 9090 -n monitoring
There, I see a bunch of pre-defined alerts and rules that are monitoring but the problem is that I don't quite understand how to create new rules to check the pods in the default namespace, where I actually have my services deployed.
I am looking at http://localhost:9090/graph to play around with the queries and I can't seem to use any that will give me metrics on my pods in the default namespace.
I am a bit overwhelmed with the amount of information so I would like to know what did I miss or what am I doing wrong here?
The Prometheus Operator includes several Custom Resource Definitions (CRDs) including ServiceMonitor (and PodMonitor). ServiceMonitor's are used to define services to the Operator to be monitored.
I'm familiar with the Operator although not the Helm deployment but I suspect you'll want to create ServiceMonitors to generate metrics for your apps in any (including default) namespace.
See: https://github.com/prometheus-operator/prometheus-operator#customresourcedefinitions
ServiceMonitors and PodMonitors are CRDs for Prometheus Operator. When working directly with Prometheus helm chart (without operator), you need have to configure your targets directly in values.yaml by editing the scrape_configs section.
It is more complex to do it, so take a deep breath and start by reading this: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config

Is the prometheus-to-sd required for GKE? Can I delete it?

A while back a GKE cluster got created which came with a daemonset of:
kubectl get daemonsets --all-namespaces
...
kube-system prometheus-to-sd 6 6 6 3 6 beta.kubernetes.io/os=linux 355d
Can I delete this daemonset without issue?
What is it being used for?
What functionality would I be losing without it?
TL;DR
Even if you delete it, it will be back.
A little bit more explanation
Citing explanation by user #Yasen what prometheus-to-sd is:
prometheus-to-sd is a simple component that can scrape metrics stored in prometheus text format from one or multiple components and push them to the Stackdriver. Main requirement: k8s cluster should run on GCE or GKE.
Github.com: Prometheus-to-sd
Assuming that the command deleting this daemonset will be:
$ kubectl delete daemonset prometheus-to-sd --namespace=kube-system
Executing this command will indeed delete the daemonset but it will be back after a while.
prometheus-to-sd daemonset is managed by Addon-Manager which will recreate deleted daemonset back to original state.
Below is the part of the prometheus-to-sd daemonset YAML definition which states that this daemonset is managed by addonmanager:
labels:
addonmanager.kubernetes.io/mode: Reconcile
You can read more about it by following: Github.com: Kubernetes: addon-manager
Deleting this daemonset is strictly connected to the monitoring/logging solution you are using with your GKE cluster. There are 2 options:
Stackdriver logging/monitoring
Legacy logging/monitoring
Stackdriver logging/monitoring
You need to completely disable logging and monitoring of your GKE cluster to delete this daemonset.
You can do it by following a path:
GCP -> Kubernetes Engine -> Cluster -> Edit -> Kubernetes Engine Monitoring -> Set to disabled.
Legacy logging/monitoring
If you are using a legacy solution which is available to GKE version 1.14, you need to disable the option of Legacy Stackdriver Monitoring by following the same path as above.
Let me know if you have any questions in that.
TL;DR - it's ok
Assuming your context, I suppose, it's ok to shutdown prometheus component of your cluster.
Except cases when reports, alerts and monitoring - are critical parts of your system.
Let dive in the sources of GCP
As per source code at GoogleCloudPlatform:
prometheus-to-sd is a simple component that can scrape metrics stored in prometheus text format from one or multiple components and push them to the Stackdriver. Main requirement: k8s cluster should run on GCE or GKE.
Prometheus
From their Prometheus Github Page:
The Prometheus monitoring system and time series database.
To get a picture what is it for - you can read awesome guide on Prometheus: Prometheus Monitoring : The Definitive Guide in 2019 – devconnected
Also, there are hundreds of videos on their Youtube channel Prometheus Monitoring
Your questions
So, answering to your questions:
Can I delete this daemonset without issue?
It depends. As I said, you can. Except cases when reports, alerts and monitoring - are critical parts of your system.
What is it being used for
It's a TSDB for monitoring
what functionality would I be loosing without it?
metrics
→ therefore dashboards
→ therefore alerting

Grafana dashboard not displaying pod name instead pod_name

I have deployed application on kubernetes cluster and for monitoring using prometheus and grafana. For kubernetes pods monitoring using Grafana dashboard: Kubernetes cluster monitoring (via Prometheus) https://grafana.com/grafana/dashboards/315
I had imported the dashboard using id 315 and its reflecting without pod name and containers name instead getting pod_name . Can anyone pls help how can i get pod name and container name in dashboard.
Provided tutorial was updated 2 years ago.
Current version of Kubernetes is 1.17. As per tags, tutorial was tested on Prometheus v. 1.3.0, Kubernetes v.1.4.0 and Grafana v.3.1.1 which are quite old at the moment.
In requirements you have statement:
Prometheus will use metrics provided by cAdvisor via kubelet service (runs on each node of Kubernetes cluster by default) and via kube-apiserver service only.
In Kubernetes 1.16 metrics labels like pod_name and container_name was removed. Instead of that you need to use pod and container. You can verify it here.
Any Prometheus queries that match pod_name and container_name labels (e.g. cadvisor or kubelet probe metrics) must be updated to use pod and container instead.
Please check this Github Thread about dashboard bug for more information.
Solution
Please change pod_name to pod in your query.
Kubernetes version v1.16.0 has Removed cadvisor metric labels pod_name and container_name to match instrumentation guidelines. Any Prometheus queries that match pod_name and container_name labels (e.g. cadvisor or kubelet probe metrics) must be updated to use pod and container instead.
You can check:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.16.md#metrics-changes

not able to run hpa, get metrics to api metrics

I am trying to run horizontal pod autoscaler in kubernetes, want to auto scale my pods based on cpu default metrics.
For that I installed metrics server after that I can see metrics - metrics.k8s.io/v1beta1 (kubectl api-versions). Then I tried deploying prometheus-operator. But upon runnning kubectl top node/pod - error I am getting is
error: Metrics not available for pod default/web-deployment-658cd556f8-ztf6c, age: 35m23.264812635s" and "error: metrics not available yet"
Do I need to run heapster?
#batman, as you said enabling minikube metrics-server add-on is enough in case of using minikube.
In general case, if using metrics-server you edited the metrics server deployment by running: kubectl edit deployment metrics-server -n kube-system
Under spec: -> containers: add following flag:
spec:
containers:
- command:
- /metrics-server
- --kubelet-insecure-tls
As described on metrics-server github:
--kubelet-insecure-tls: skip verifying Kubelet CA certificates. Not recommended for production usage, but can be useful in test clusters
with self-signed Kubelet serving certificates.
Here you can find tutorial describing HPA using custom metrics and Prometheus.
In minikube, we have to enable metrics-server add-on.
minikube addons list
minikube addons enable metrics-server
Then create hpa, deployment and boom!!
Anyone has done autoscaling based on custom metrics? like based on no. of http requests?