Configure Prometheus for monitoring multiple microservices - docker-compose

I want to monitor a Spring Boot microservices application (about 20 services) running on Docker Compose using Prometheus and Grafana.
What is the best approach:
1- Having one job with multiple targets, one for each microservice?
scrape_configs:
  - job_name: 'services-job'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['service-one:8080']
        labels:
          group: 'service-one'
      - targets: ['service-two:8081']
        labels:
          group: 'service-two'
2- Having multiple jobs with a single target for each service?
scrape_configs:
  - job_name: 'service-one-job'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['service-one:8080']
        labels:
          group: 'service-one'
  - job_name: 'service-two-job'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['service-two:8081']
        labels:
          group: 'service-two'

The way you group your targets by job has nothing to do with the number of endpoints to scrape.
You need to group all the targets with the same purpose in the same job. That's exactly what the documentation says:
A collection of instances with the same purpose, a process replicated for scalability or reliability for example, is called a job.
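For the setup in the question that means option 1: one job for all the Spring Boot services. A minimal sketch (service names and ports are just examples); the per-target group labels are optional here, since Prometheus already attaches an instance label carrying each target's host:port:

scrape_configs:
  - job_name: 'spring-boot-services'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets:
          - 'service-one:8080'
          - 'service-two:8081'
          # ...one entry per microservice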

Related

Missing Kube State Metrics in remote write with Prometheus

Hey, I'm currently trying to determine the uptime of a pod with kube-state-metrics, specifically when a pod has started or stopped. I am using a Prometheus Deployment together with kube-state-metrics to determine when a pod has been started and stopped.
Specifically I want to get the following metrics:
kube_pod_completion_time
kube_pod_created
As a test I've configured Prometheus to gather metrics with the following config.yml file:
global:
  scrape_interval: 10m
  scrape_timeout: 10s
  evaluation_interval: 10m
scrape_configs:
  - job_name: kubernetes-nodes-cadvisor
    honor_timestamps: true
    scrape_interval: 10m
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: https
    authorization:
      type: Bearer
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    follow_redirects: true
    enable_http2: true
    relabel_configs:
      - separator: ;
        regex: __meta_kubernetes_node_label_(.+)
        replacement: $1
        action: labelmap
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: kubernetes.default.svc:443
        action: replace
      - source_labels: [__meta_kubernetes_node_name]
        separator: ;
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
        action: replace
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: '(container_cpu_usage_seconds_total|container_fs_reads_bytes_total|container_fs_writes_bytes_total|container_memory_max_usage_bytes|container_network_receive_bytes_total|container_network_transmit_bytes_total)'
        action: keep
    kubernetes_sd_configs:
      - role: node
        kubeconfig_file: ''
        follow_redirects: true
        enable_http2: true
  - job_name: 'kube-state-metrics'
    scrape_interval: 10m
    static_configs:
      - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: '(kube_pod_labels|kube_pod_created|kube_pod_completion_time|kube_pod_container_resource_limits)'
        action: keep
remote_write:
  - url: http://example.com
    remote_timeout: 30s
    follow_redirects: true
    enable_http2: true
    oauth2:
      token_url: https://example.com
      client_id: myCoolID
      client_secret: myCoolPassword
    queue_config:
      capacity: 2500
      max_shards: 200
      min_shards: 1
      max_samples_per_send: 10
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 5s
    metadata_config:
      send: false
Additionally I also have the following test pod deployment running:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busy-box-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busy-box-test
  template:
    metadata:
      labels:
        app: busy-box-test
    spec:
      containers:
        - command:
            - sleep
            - '300'
          image: busybox
          name: test-box
However, when I search for kube_pod_completion_time I cannot find any of it in my remote-write destination, while I do have all the other metrics specified in the regex (kube_pod_labels|kube_pod_created ... kube_pod_container_resource_limits).
Additionally I've tried the following commands to see if they are present in the cluster:
kubectl get --raw '/metrics' | grep kube_ and kubectl get --raw 'kube-state-metrics.kube-system.svc.cluster.local:8080', but I don't find anything definitive. I suspect the commands are looking in the wrong location.
So, beyond anything obvious I may have missed, I have the following open questions:
Is there an endpoint I should hit inside the cluster which should return the completion time? Is there an issue with the polling interval being once every 10 minutes for a pod that comes up and down every 5? (If anyone knows how long a terminated pod's history will stick around in kube-state-metrics, that would be great to know as well.)
I've included the configuration for kube state metrics here: https://gist.github.com/twosdai/12607c8459bdb73fc98edbbcb17b5eb5 in order to keep the post a bit more concise. The cluster is running in AWS EKS Version: 1.22
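For reference, the interval question is about this part of the config; the variant I would test is simply a shorter scrape interval on the kube-state-metrics job (1m is an arbitrary example value):

  - job_name: 'kube-state-metrics'
    scrape_interval: 1m   # shorter than the ~5 minute lifetime of the test pod
    static_configs:
      - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']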

How to pass sensitive data to helm values file that is committed?

I am installing kube-prometheus-stack with Helm and I am adding some custom scrape configuration to Prometheus which requires authentication. I need to pass basic_auth with a username and password in the values.yaml file.
The thing is that I need to commit the values.yaml file to a repo, so I am wondering how I can have the username and password set in the values file, maybe from a secret in Kubernetes or some other way?
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: myjob
        scrape_interval: 20s
        metrics_path: /metrics
        static_configs:
          - targets:
              - myservice.default.svc.cluster.local:80
        basic_auth:
          username: prometheus
          password: prom123456
Scrape configs support specifying a password_file parameter, so you can mount your own secret via volumes and volumeMounts.
Disclaimer: I haven't tested it myself and am not using kube-prometheus-stack, but I guess something like this should work:
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: myjob
        scrape_interval: 20s
        metrics_path: /metrics
        static_configs:
          - targets:
              - myservice.default.svc.cluster.local:80
        basic_auth:
          username: prometheus
          # "password" here is the key inside the mounted secret
          password_file: /etc/scrape_passwordfile/password
    # Additional volumes on the output StatefulSet definition.
    volumes:
      - name: scrape-passwordfile
        secret:
          secretName: scrape-passwordfile
          optional: false
    # Additional VolumeMounts on the output StatefulSet definition.
    volumeMounts:
      - name: scrape-passwordfile
        mountPath: "/etc/scrape_passwordfile"
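The secret itself is created out-of-band (not committed to the repo) and carries the password under that key; a minimal sketch, assuming the key name password used above:

apiVersion: v1
kind: Secret
metadata:
  name: scrape-passwordfile   # create it in the namespace where Prometheus runs
type: Opaque
stringData:
  password: prom123456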
Another option is to ditch additionalScrapeConfigs and use additionalScrapeConfigsSecret to store the whole config inside a secret:
## If additional scrape configurations are already deployed in a single secret file you can use this section.
## Expected values are the secret name and key
## Cannot be used with additionalScrapeConfigs
additionalScrapeConfigsSecret: {}
# enabled: false
# name:
# key:
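Again untested, but roughly: create a secret containing the whole scrape config (including the credentials) outside of the committed values.yaml, then reference it from the chart. Names and the key are placeholders:

apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs   # created out-of-band, not committed
stringData:
  additional-scrape-configs.yaml: |
    - job_name: myjob
      scrape_interval: 20s
      metrics_path: /metrics
      static_configs:
        - targets:
            - myservice.default.svc.cluster.local:80
      basic_auth:
        username: prometheus
        password: prom123456

prometheus:
  prometheusSpec:
    additionalScrapeConfigsSecret:
      enabled: true
      name: additional-scrape-configs
      key: additional-scrape-configs.yaml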

Why doesn't prometheus.exe execute?

I am currently working on scraping metrics from a WebLogic server using the WebLogic Monitoring Exporter. I am trying to display these metrics using Prometheus. My prometheus.yml file contents are:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'wls-exporter'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    metrics_path: '/wls-exporter/metrics'
    static_configs:
      - targets: ['localhost:7001']
    basic_auth:
      username: 'weblogic'
      password: 'password1'
Now, whenever I execute prometheus.exe, nothing happens.
So what am I doing wrong here?
PS: I am on Windows 7.
Based on your last log I suggest trying to run Prometheus with --storage.tsdb.no-lockfile.
I had several cases on Win7 where the data folder got corrupted and Prometheus would not start. When I used the above flag I had no issues running Prometheus on Win7 or Win10.

Grafana template showing data only for remote and local hosts but not for containers when the drop-down does not show an IP

I have Prometheus with node-exporter, cadvisor and Grafana on the same instance.
I have other instances with node-exporter and cadvisor for collecting metrics into Grafana.
Now I have created a Grafana template variable that accepts the instance name.
As we have 2 instances here, the template shows the following in the drop-down:
the IP address of the second instance
node-exporter in the case of the first instance
So when selecting the instance with the IP it works great, but in the case of the instance shown as node-exporter it is not working. It does work if I manually put cadvisor into the query.
Here is the query:
count(container_last_seen{instance=~"$server:.*",image!=""})
Here is the prometheus.yml file where all the targets are set. As the node-exporter runs on the same instance where Prometheus is, I have used localhost there. Please check below:
prometheus.yml
global:
  scrape_interval: 5s
  external_labels:
    monitor: 'my-monitor'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'lab2'
    static_configs:
      - targets: ['52.32.2.X:9100']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['52.32.2.X:8080', 'cadvisor:8080']
If I try to edit the targets and use localhost instead of node-exporter, it does not even show up in the drop-down.
The node selection works well for the HOST metrics but not for the container metrics.
NOTE: It works for the containers whose host IP is shown in the drop-down, but not for the host that does not show an IP.
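My guess is the mismatch comes from the local targets exposing container host names (node-exporter:9100, cadvisor:8080) in their instance labels while the remote targets expose an IP. An untested sketch of what I would try, overriding instance so the local node-exporter and cadvisor share the same host part (monitor-host is a placeholder name):

scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
        labels:
          instance: 'monitor-host:9100'   # placeholder host name
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['52.32.2.X:8080']
      - targets: ['cadvisor:8080']
        labels:
          instance: 'monitor-host:8080'   # same host part so instance=~"$server:.*" matches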

Unable to run haproxy_exporter on localhost:9101

I am using haproxy_exporter with Prometheus, which runs on the default port 9101.
After configuring the files I am not able to run it on the default port.
Config file for haproxy:
frontend frontend
    bind :1234
    use_backend backend

backend backend
    server server 0.0.0.0:9000

frontend monitoring
    bind :1235
    no log
    stats uri /
    stats enable
Config file for prometheus
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'production'
    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'
  - job_name: 'canary'
    static_configs:
      - targets: ['localhost:8082']
        labels:
          group: 'canary'
  - job_name: 'test'
    static_configs:
      - targets: ['localhost:9091']
  - job_name: 'test1'
    static_configs:
      - targets: ['localhost:9091']
  - job_name: 'test2'
    static_configs:
      - targets: ['localhost:9091']
  - job_name: 'haproxy'
    static_configs:
      - targets: ['localhost:9188']
Please, can anyone help me out with this?
You should not set up stats on a frontend, but on a listener (keyword listen, not frontend):
listen monitoring
    mode http
    bind *:1235
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /
    stats auth username:password
I strongly recommend that you also use a username/password to access your stats.
Finally, you can scrape data from haproxy with haproxy_exporter with the command:
haproxy_exporter -haproxy.scrape-uri="http://username:password@<haproxy-dns>:1235/?stats;csv"
If everything is fine with your setup, you should be able to query the haproxy exporter with this curl:
curl http://localhost:9101/metrics
And the output should contain:
haproxy_up 1
If the output is haproxy_up 0, then there is a communication issue between haproxy and the haproxy_exporter; double-check the -haproxy.scrape-uri value.
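Once the exporter responds on 9101, the Prometheus job also needs to point at that port rather than 9188. A minimal sketch, assuming the exporter runs on the same host as Prometheus:

  - job_name: 'haproxy'
    static_configs:
      - targets: ['localhost:9101']   # haproxy_exporter's default listen port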