I followed a bunch of tutorials on how to monitor Kubernetes with Prometheus and Grafana, all of them referring to a deprecated Helm chart.
According to the tutorials, Grafana comes out of the box, complete with cluster monitoring.
In practice, Grafana is not installed with that chart:
helm install prometheus-operator stable/prometheus -n monitor
nor is it installed with the chart from the newer community repo:
helm install prometheus-operator prometheus-community/prometheus -n monitor
I installed the Grafana chart independently:
helm install grafana-operator grafana/grafana -n monitor
And through the UI I tried to connect to Prometheus using in-cluster URLs:
prometheus-operator-server.monitor.svc.cluster.local:80
prometheus-operator-alertmanager.monitor.svc.cluster.local:80
The UI test reports success but produces no metrics.
Is there a ready-made Helm chart with Grafana included out of the box?
How can Grafana interact with Prometheus?
You've used the wrong charts. Currently the project is named kube-prometheus-stack:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
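With Helm 3, installing it can look something like this (the release name "monitoring" and the namespace "monitor" are just examples):
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack -n monitor --create-namespace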
If you look at values.yaml, you'll notice switches for everything: Prometheus, all the exporters, Grafana, all the standard dashboards, alerts for Kubernetes, and so on. It's all installed by one chart, and it's all linked together out of the box.
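For example, a small values override might look like this sketch (keys as they appear in the chart's values.yaml; double-check them against your chart version):
# values-override.yaml (hypothetical file name), passed via helm install -f values-override.yaml
grafana:
  enabled: true        # ships Grafana plus the standard dashboards
alertmanager:
  enabled: true
nodeExporter:
  enabled: true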
The only additional thing you might need is an Ingress/ELB for Grafana, Prometheus, and Alertmanager, so that you can open them without port-forwarding (don't forget to add oauth2-proxy or something similar, because everything is exposed with no password by default).
I wouldn't bother; look at a PaaS like Datadog, New Relic, etc. What you are describing becomes a costly nightmare at scale. It's just not worth the hassle for what you get, IMHO.
I am using Istio 1.6, and I was trying to store metrics from Istio's Prometheus in an external Prometheus, following the Istio best-practices doc. But as the first step, I have to edit my configuration and add recording rules. I tried editing the ConfigMap of Istio's Prometheus and adding the recording rules. The edit succeeds, but when I try to see the rules in the Prometheus dashboard, they do not appear (which I believe means the config was not applied). I also tried simply deleting the pod to see if the new pod would have the new configuration, but the problem remains.
What am I doing wrong? Any suggestions and answers are appreciated.
The problem was the way I added the recording rules. I added the rules in rules.yaml but forgot to mention that file in the rule_files field of the Prometheus config. I didn't know how to do Prometheus configuration, and that was the problem.
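For anyone hitting the same thing, the relevant part of the Prometheus config looks roughly like this (the exact path is an assumption; it depends on where the ConfigMap entry is mounted in the pod):
# prometheus.yml
rule_files:
  - /etc/config/rules.yaml   # assumption: mount path of the rules.yaml ConfigMap entry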
I also referred to this GitHub example.
Also check out this post on Prometheus federation.
I am looking for some recommendations on how to manage Prometheus configuration (such as providing our custom alert-rule files and reloading the config dynamically when those files are modified) using the Prometheus Operator Helm chart.
Installing Prometheus via the Prometheus Operator generates a default configuration, so I am trying to feed in our custom configuration through the operator chart's values.yaml file.
Any suggestions?
Thanks.
The Prometheus Operator exposes a CRD called PrometheusRule. If you intend to provide your own custom rules, the best thing would be to start out with some simple rules, just to get used to the vector-centered query language, PromQL. You could also expose the Prometheus web UI just to get used to the interface: have a look at the metrics, play with the console, evaluate your targets, and so on.
You do not have to worry about rules being picked up; it happens automatically as long as everything is fine with your targets and the required TCP/IP ports are open.
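As a starting point, a minimal PrometheusRule could look like the sketch below (the name and labels here are hypothetical; the release label must match the ruleSelector of your Prometheus resource, which the Helm chart sets up for you). The operator picks up changes to these objects and reloads Prometheus automatically, which covers the dynamic-reload part of your question:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-alert-rules          # hypothetical name
  labels:
    release: prometheus-operator    # assumption: matches the chart's ruleSelector
spec:
  groups:
    - name: custom.rules
      rules:
        - alert: InstanceDown
          expr: up == 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Instance {{ $labels.instance }} is down"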
Also, in case you face some challenge, their GitHub page is quite responsive, and they have a 'support' label for open issues.
The GitHub repo of the Prometheus Operator project, https://github.com/coreos/prometheus-operator/, says that:
The Prometheus Operator makes the Prometheus configuration Kubernetes native and manages and operates Prometheus and Alertmanager clusters. It is a piece of the puzzle regarding full end-to-end monitoring.
kube-prometheus combines the Prometheus Operator with a collection of manifests to help getting started with monitoring Kubernetes itself and applications running on top of it.
Can someone elaborate on this?
I've always had this exact same question and repeatedly bumped into both, but to be honest, reading the above answer didn't clarify it for me; I needed a short explanation. I found this GitHub issue that made it crystal clear to me:
https://github.com/coreos/prometheus-operator/issues/2619
Quoting nicgirault on GitHub:
At last I realized that prometheus-operator chart was packaging kube-prometheus stack but it took me around 10 hours playing around to realize this.
Here's my summarized explanation:
"kube-prometheus" and the "Prometheus Operator Helm chart" both do the same thing: basically, it's the Ingress/Ingress Controller concept, applied to metrics/the Prometheus Operator.
Both are a means of easily configuring, installing, and managing a huge distributed application (the Kubernetes Prometheus stack) on Kubernetes.
What is the entire Kube Prometheus stack, you ask? Prometheus, Grafana, Alertmanager, CRDs (Custom Resource Definitions), the Prometheus Operator (a software bot app), IaC alert rules, IaC Grafana dashboards, and IaC ServiceMonitor CRDs (which auto-generate Prometheus metric-collection configuration and auto hot-import it into the Prometheus server).
(Also, when I say easily configuring, I mean 1,000-10,000++ lines of easy-for-humans-to-understand config that generates and auto-manages 10,000-100,000 lines of machine config, plus stuff with sensible defaults, monitoring-configuration self-service, and distributed configuration sharding, with an operator/controller to combine config and generate verbose boilerplate machine-readable config from nice human-readable config.)
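To give a feel for that self-service config, here's a sketch of a ServiceMonitor (all names here are made up); the operator turns objects like this into Prometheus scrape configuration automatically:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                       # hypothetical
  labels:
    release: prometheus-operator     # assumption: matches the serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app                    # scrape every Service carrying this label
  endpoints:
    - port: metrics                  # named port on the Service
      interval: 30s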
If they achieve the same end goal, you might ask what's the difference between them?
https://github.com/coreos/kube-prometheus
https://github.com/helm/charts/tree/master/stable/prometheus-operator
Basically, CoreOS's kube-prometheus deploys the Prometheus Stack using Ksonnet.
The Prometheus Operator Helm chart wraps kube-prometheus and achieves the same end result, but with Helm.
So which one to use?
Doesn't matter + they achieve the same end result + shouldn't be crazy difficult to start with 1 and switch to the other.
Helm tends to be faster to learn/develop basic mastery of.
Ksonnet is harder to learn/develop basic mastery of, but:
it's more idempotent (better for CI/CD automation), though it's only a difference of 99% idempotent vs 99.99% idempotent.
has built-in templating, which means that if you have multiple clusters that you need to manage and want to always keep consistent with each other, you can leverage ksonnet's templating to manage multiple instances of the Kube Prometheus stack (for multiple envs) using a DRY code base with lots of code reuse. (If you only have a few envs and Prometheus doesn't need to change often, it's not completely unreasonable to keep 4 Helm values files in sync by hand. I've also seen Jinja2 templating used to template out Helm values files, but if you're going to bother with that, you may as well just consider ksonnet.)
Kubernetes operators are Kubernetes-specific applications (pods) that configure, manage, and optimize other Kubernetes deployments automatically. They are implemented as custom controllers.
According to the official CoreOS website:
Operators were introduced by CoreOS as a class of software that operates other software, putting operational knowledge collected by humans into software.
The Prometheus Operator provides an easy way to deploy, configure, and monitor your Prometheus instances on a Kubernetes cluster. To do so, the Prometheus Operator introduces three types of custom resource definitions (CRDs) in Kubernetes:
Prometheus
Alertmanager
ServiceMonitor
Now, with the help of the above CRDs, you can directly create a Prometheus instance by providing kind: Prometheus, and the instance is ready to serve; likewise, you can do the same for Alertmanager. Without this, you would have to set up the Deployment for Prometheus yourself, with its image, configuration, and many other things.
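For illustration, a minimal Prometheus resource might look like this sketch (assuming a service account with the right RBAC already exists; names are hypothetical):
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example                      # hypothetical
spec:
  replicas: 2
  serviceAccountName: prometheus     # assumption: RBAC already set up for it
  serviceMonitorSelector: {}         # empty selector = pick up all ServiceMonitors
  resources:
    requests:
      memory: 400Mi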
The Prometheus Operator serves to make running Prometheus on top of Kubernetes as easy as possible, while preserving Kubernetes-native configuration options.
Now, kube-prometheus builds on the Prometheus Operator and provides you with minimal YAML files to create a basic setup of Prometheus, Alertmanager, and Grafana by running a single command:
git clone https://github.com/coreos/prometheus-operator.git
kubectl apply -f prometheus-operator/contrib/kube-prometheus/manifests/
By running the above command from the kube-prometheus directory, you will get a monitoring namespace containing an instance of Alertmanager, an instance of Prometheus, and Grafana for the UI. This is enough for most basic setups, and if you need anything more specific for your application, you can add more YAMLs for the exporters you need.
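To confirm everything came up, you can list the pods in that namespace:
kubectl --namespace monitoring get pods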
Kube-prometheus is more of a contribution to the prometheus-operator project; it uses the Prometheus Operator's functionality very well and provides you with a complete monitoring setup for your Kubernetes cluster. You can start with kube-prometheus and extend the functionality of your monitoring setup according to your application from there.
You can learn more about prometheus-operator here
As of today, 28-09-2020, this is the way to install Prometheus in a Kubernetes cluster:
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#kube-prometheus-stack
According to the official documentation, kube-prometheus-stack is a rename of prometheus-operator.
As I understand it, kube-prometheus-stack also ships with preinstalled Grafana dashboards and Prometheus rules.
Note: This chart was formerly named prometheus-operator chart, now renamed to more clearly reflect that it installs the kube-prometheus project stack, within which Prometheus Operator is only one component.
Taken from https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
Architecturally, the containers in the cluster run on Docker. Container logs are managed by Docker by default, and the default log driver is json-file:
"log-driver": "json-file"
https://docs.docker.com/config/containers/logging/configure/
If the default json-file driver is used to manage container logs, no log rotation is performed by default. As a result, for containers that generate a large amount of output, the files stored by the json-file log driver can consume a large amount of disk space and eventually fill the disk.
In this case, one option is to ship the logs to Elasticsearch, store them there separately, and run a scheduled task in Kubernetes that periodically deletes old indices using Curator.
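A sketch of such a scheduled cleanup, assuming you run Curator from a container image with its configuration mounted from a ConfigMap (the image, names, and paths here are all assumptions):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: curator-cleanup                      # hypothetical name
spec:
  schedule: "0 1 * * *"                      # run daily at 01:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: curator
              image: untergeek/curator:8.0.4     # assumption: use whichever Curator image/tag you trust
              args: ["--config", "/etc/curator/config.yml", "/etc/curator/actions.yml"]
              volumeMounts:
                - name: curator-config
                  mountPath: /etc/curator
          volumes:
            - name: curator-config
              configMap:
                name: curator-config           # hypothetical ConfigMap holding Curator's config and action files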
Another solution for the disk-space problem is to have Docker periodically delete old json-file logs itself.
Typically, we cap both the size and the number of log files in Docker's daemon configuration (/etc/docker/daemon.json):
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "20m",
    "max-file": "10"
  }
}
This allows a maximum of 10 log files, each with a maximum size of 20 MB, so a container keeps at most 200 MB of logs.
Note: In general, Docker places container logs under
/var/lib/docker/containers/
Kubernetes also keeps logs and creates a directory structure that helps you find logs per pod, so you can find the container logs for each Pod running on a node:
/var/log/pods/<namespace>_<pod_name>_<pod_id>/<container_name>/
When a pod is removed, both the Docker logs under /var/lib/docker/containers/ and the Kubernetes-created logs under /var/log/pods/ are deleted.
For example, if a pod is restarted in production, its old logs are deleted, whether the pod comes back on the original node or is scheduled onto another node.
Therefore, these logs need to be saved to Elasticsearch for centralized management; in most cases, engineering teams will check the logs for troubleshooting.
I use Prometheus to monitor a Kubernetes cluster. When I use sum(container_fs_reads_total), the result is 0. How can I find a pod's filesystem reads per second?
The Prometheus graphing dashboard may or may not be getting values for that metric.
Since this metric comes from cAdvisor, work through the following:
Verify that the k8s pods associated with cAdvisor are up and running.
Check that your cAdvisor web UI has data under /containers for the metric.
Verify in the ConfigMap for Prometheus that you are scraping /containers inside scrape_configs.
Once you have the Prometheus dashboard up, go to the Graph tab and see whether the metric has any values for the last couple of days or so.
Then check the Targets tab and make sure the cAdvisor host is a target and is up.
Those are some suggestions to narrow down your search when verifying that the data is being collected and scraped.
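One more note: once the metric is actually being scraped, per-second reads come from rate() over the counter, for example (the grouping label may be pod or pod_name depending on your cAdvisor/Kubernetes version):
sum by (pod) (rate(container_fs_reads_total[5m]))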