Filtering through scraping with Grafana Loki

I am trying to get logs from a single namespace through Promtail and scrape_configs, but I am not getting results. I am installing in k8s with
helm install loki grafana/loki-stack -n loki-test -f
~/loki-stack-values.yml
and the contents of my values file are:
loki:
  enabled: true
promtail:
  enabled: true
  pipelineStages:
    - cri: {}
    - json:
        expressions:
          is_even: is_even
          level: level
          version: version
  scrape_configs:
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - source_labels: [__meta_kubernetes_namespace]
          action: keep
          regex: mongodb-test
    # [...]
    - job_name: kubernetes-pods-app
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - source_labels: [__meta_kubernetes_namespace]
          action: keep
          regex: mongodb-test
grafana:
  enabled: true
  sidecar:
    datasources:
      enabled: true
  image:
    tag: 8.3.5
My expectation was that I would only get logs from the mongodb-test namespace, but I can view logs from every namespace present.
I also tried with drop, but it did not do anything.
What should I do here?
Thank you so much

Using a match stage to drop other namespaces under pipelineStages worked for me. In your case:
config:
  snippets:
    pipelineStages:
      - cri: {}
      - match:
          selector: '{namespace!~"mongodb-test"}'
          action: drop
    common:
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: namespace
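For the loki-stack chart specifically, the snippet above lives under the promtail key of the values file. A minimal sketch of where it sits (the surrounding loki and grafana values are assumptions carried over from the question, not confirmed):

```yaml
# Hedged sketch: in loki-stack values, promtail pipeline stages go under
# promtail.config.snippets. The loki/grafana sections mirror the question above.
loki:
  enabled: true
promtail:
  enabled: true
  config:
    snippets:
      pipelineStages:
        - cri: {}
        # drop every log line whose namespace label does not match mongodb-test;
        # this relies on the chart's default relabeling that sets the namespace label
        - match:
            selector: '{namespace!~"mongodb-test"}'
            action: drop
grafana:
  enabled: true
```

Note that the match stage runs after relabeling, so it needs the namespace label to already exist on the stream; the common relabel rule shown above provides it.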

Related

Promtail + Loki - Only shows some namespaces not all

We recently decided to install Loki and Promtail via the loki-stack helm chart. Loki and Promtail kind of work: we get some logs from Promtail and we can visualize them in Grafana, but our development namespace is nowhere to be found in Loki. Promtail shows the development pod as an active target and has already collected the logs from the pod, but we can't seem to get them into Loki somehow... Any ideas?
tl;dr
set loki.monitoring.selfMonitoring.grafanaAgent.installOperator to false
This problem is caused by grafana-agent, which is installed by default as a sub-chart of the grafana/loki chart.
The agent creates a secret 'loki-logs-config' (loki in this case is the Helm release name) which contains the following configuration:
agent.yml: |+
  logs:
    configs:
      - clients:
          - external_labels:
              cluster: loki
            url: http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push
        name: monitoring/loki
        scrape_configs:
          - job_name: podLogs/monitoring/loki
            kubernetes_sd_configs:
              - namespaces:
                  names:
                    - monitoring
                role: pod
            pipeline_stages:
              - cri: {}
            relabel_configs:
              - source_labels:
                  - job
                target_label: __tmp_prometheus_job_name
              - action: keep
                regex: loki
                source_labels:
                  - __meta_kubernetes_pod_label_app_kubernetes_io_instance
              - action: keep
                regex: loki
                source_labels:
                  - __meta_kubernetes_pod_label_app_kubernetes_io_name
              - source_labels:
                  - __meta_kubernetes_namespace
                target_label: namespace
              - source_labels:
                  - __meta_kubernetes_service_name
                target_label: service
              - source_labels:
                  - __meta_kubernetes_pod_name
                target_label: pod
              - source_labels:
                  - __meta_kubernetes_pod_container_name
                target_label: container
              - replacement: monitoring/loki
                target_label: job
              - replacement: /var/log/pods/*$1/*.log
                separator: /
                source_labels:
                  - __meta_kubernetes_pod_uid
                  - __meta_kubernetes_pod_container_name
                target_label: __path__
              - action: replace
                source_labels:
                  - __meta_kubernetes_pod_node_name
                target_label: __host__
              - action: labelmap
                regex: __meta_kubernetes_pod_label_(.+)
              - action: replace
                replacement: monitoring/$1
                source_labels:
                  - __meta_kubernetes_pod_controller_name
                target_label: job
              - action: replace
                source_labels:
                  - __meta_kubernetes_pod_container_name
                target_label: container
              - action: replace
                replacement: loki
                target_label: cluster
    positions_directory: /var/lib/grafana-agent/data
    server: {}
As you can see, under kubernetes_sd_configs there is a namespaces list with the value monitoring - I have no idea why it is there, but that's the namespace I installed this chart into.
You won't see this secret after executing helm template - it seems that Grafana Agent creates it somehow after startup.
It has the label app.kubernetes.io/managed-by=grafana-agent-operator.
Pretty magical if you ask me...
The solution for me was disabling the installation of Grafana Agent:
loki:
  loki:
    commonConfig:
      replication_factor: 1
    storage:
      type: 'filesystem'
    auth_enabled: false
  monitoring:
    dashboards:
      enabled: false
    selfMonitoring:
      enabled: true
      grafanaAgent:
        installOperator: false
    lokiCanary:
      enabled: false
Note: the top-level loki element in the code block above is needed only if you add the grafana/loki chart as a subchart of your own chart.
IMO, enabling a beta feature (Grafana Agent is v0.30.0 today) in a chart used as a reference in Loki's docs is insane :)

Missing Kube State Metrics in remote write with Prometheus

Hey, I'm currently trying to determine the uptime of a pod, specifically when it has started or stopped, using a Prometheus Deployment with kube-state-metrics.
Specifically I want to get the following metrics:
kube_pod_completion_time
kube_pod_created
As a test I've configured Prometheus to gather metrics with the following config.yml file:
global:
  scrape_interval: 10m
  scrape_timeout: 10s
  evaluation_interval: 10m
scrape_configs:
  - job_name: kubernetes-nodes-cadvisor
    honor_timestamps: true
    scrape_interval: 10m
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: https
    authorization:
      type: Bearer
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    follow_redirects: true
    enable_http2: true
    relabel_configs:
      - separator: ;
        regex: __meta_kubernetes_node_label_(.+)
        replacement: $1
        action: labelmap
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: kubernetes.default.svc:443
        action: replace
      - source_labels: [__meta_kubernetes_node_name]
        separator: ;
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
        action: replace
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: '(container_cpu_usage_seconds_total|container_fs_reads_bytes_total|container_fs_writes_bytes_total|container_memory_max_usage_bytes|container_network_receive_bytes_total|container_network_transmit_bytes_total)'
        action: keep
    kubernetes_sd_configs:
      - role: node
        kubeconfig_file: ''
        follow_redirects: true
        enable_http2: true
  - job_name: 'kube-state-metrics'
    scrape_interval: 10m
    static_configs:
      - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: '(kube_pod_labels|kube_pod_created|kube_pod_completion_time|kube_pod_container_resource_limits)'
        action: keep
remote_write:
  - url: http://example.com
    remote_timeout: 30s
    follow_redirects: true
    enable_http2: true
    oauth2:
      token_url: https://example.com
      client_id: myCoolID
      client_secret: myCoolPassword
    queue_config:
      capacity: 2500
      max_shards: 200
      min_shards: 1
      max_samples_per_send: 10
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 5s
    metadata_config:
      send: false
Additionally, I have the following test pod deployment running:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busy-box-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busy-box-test
  template:
    metadata:
      labels:
        app: busy-box-test
    spec:
      containers:
        - command:
            - sleep
            - '300'
          image: busybox
          name: test-box
However, when I search for kube_pod_completion_time in my remote-write source I cannot find any results, while I do have all the other metrics specified in the regex (kube_pod_labels|kube_pod_created ... kube_pod_container_resource_limits).
Additionally, I've tried the following commands to see if the metrics are present in the cluster:
kubectl get --raw '/metrics' | grep kube_ and kubectl get --raw 'kube-state-metrics.kube-system.svc.cluster.local:8080', but I don't find anything definitive. I suspect the commands are looking in the wrong location.
So, beyond anything obvious I may have missed, I have the following open questions:
Is there an endpoint I should hit inside the cluster which should return the completion time? Is there an issue with the polling interval being once every 10 minutes for a pod that comes up and down every 5? (If anyone knows how long a terminated pod's history sticks around in kube-state-metrics, that would be great to know as well.)
I've included the configuration for kube state metrics here: https://gist.github.com/twosdai/12607c8459bdb73fc98edbbcb17b5eb5 in order to keep the post a bit more concise. The cluster is running in AWS EKS Version: 1.22
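One thing worth checking, as a hypothesis rather than a confirmed diagnosis: kube_pod_completion_time is only emitted for pods that have actually terminated, and a Deployment restarts its container when sleep exits, so the pod itself never reaches a completed phase. A Job, by contrast, does complete, which makes it a useful test case. A hedged sketch (the name busy-box-completion-test is a placeholder, not from the post):

```yaml
# Hedged test case: a Job's pod reaches phase Succeeded after sleep exits,
# which should make kube-state-metrics emit kube_pod_completion_time for it.
apiVersion: batch/v1
kind: Job
metadata:
  name: busy-box-completion-test  # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never  # required so the pod terminates instead of restarting
      containers:
        - name: test-box
          image: busybox
          command: ["sleep", "300"]
```

If the metric shows up for this Job's pod but not for the Deployment's pod, the issue is pod lifecycle rather than the scrape or remote-write config.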

How to scrape metrics from postgresql helm deployment with kube-prometheus-stack

I have seamlessly been using kube-prometheus-stack for monitoring.
After realizing there is a metrics: section in the values.yaml file, I wanted to enable metrics for both existing deployments:
postgresql (https://github.com/bitnami/charts/tree/master/bitnami/postgresql)
cassandra (https://github.com/bitnami/charts/tree/master/bitnami/cassandra)
for cassandra deployment
altering the value of the enabled: key from false to true was sufficient. After upgrading the helm release with the new values, a sidecar container was created. I confirmed that I see the metrics displayed at /metrics, and the targets are listed and being scraped in Prometheus.
metrics:
enabled: true
but for the postgresql deployment
doing the same did not work; setting
metrics:
enabled: true
resulted in the creation of a sidecar container, and the metrics are displayable at /metrics, but the target is not listed nor being scraped in Prometheus.
So my question is: why does the same setting give the desired result for the cassandra deployment but not for postgresql? Am I missing something, and what else do I need to check?
Also, I don't need to enable a serviceMonitor for these deployments(?), because Prometheus can scrape pods based on annotations, right?
Any help is appreciated.
annotation config
additionalScrapeConfigs in the values.yaml file of kube-prometheus-stack is edited to work with prometheus.io/* annotations (ref: Monitor custom kubernetes pod metrics using Prometheus):
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_scrape ]
            action: keep
            regex: true
          - source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_path ]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [ __address__, __meta_kubernetes_pod_annotation_prometheus_io_port ]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [ __meta_kubernetes_namespace ]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [ __meta_kubernetes_pod_name ]
            action: replace
            target_label: kubernetes_pod_name
versions
kubectl version:
v1.22.6
chart versions:
kube-prometheus-stack-35.0.3
postgresql-11.1.28
cassandra-9.1.9
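If the annotation-based job never picks up the postgresql target, one hedged alternative is to let the chart create a ServiceMonitor that kube-prometheus-stack discovers directly. Whether the bitnami/postgresql chart version in use exposes these exact keys is an assumption; check its values.yaml:

```yaml
# Hedged sketch for the bitnami/postgresql values file; the serviceMonitor keys
# are assumed from the chart's metrics section and may differ by chart version.
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    # labels must match the serviceMonitorSelector of the Prometheus resource,
    # typically the kube-prometheus-stack release label (assumed release name here)
    labels:
      release: kube-prometheus-stack
```

This sidesteps the annotation path entirely, which also helps narrow down whether the annotations or the scrape job are at fault.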

Changing Prometheus job label in scraper for cAdvisor breaks Grafana dashboards

I installed Prometheus on my Kubernetes cluster with Helm, using the community chart kube-prometheus-stack - and I get some beautiful dashboards in the bundled Grafana instance. I now wanted the recommender from the Vertical Pod Autoscaler to use Prometheus as a data source for historic metrics, as described here. Meaning, I had to make a change to the Prometheus scraper settings for cAdvisor, and this answer pointed me in the right direction: after making that change, I can now see the correct job tag on metrics from cAdvisor.
Unfortunately, now some of the charts in the Grafana dashboards are broken. It looks like it no longer picks up the CPU metrics - and instead just displays "No data" for the CPU-related charts.
So, I assume I have to tweak the charts to be able to pick up the metrics correctly again, but I don't see any obvious places to do this in Grafana?
Not sure if it is relevant for the question, but I am running my Kubernetes cluster on Azure Kubernetes Service (AKS).
This is the full values.yaml I supply to the Helm chart when installing Prometheus:
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
kubeEtcd:
  enabled: false
kubeProxy:
  enabled: false
kubelet:
  serviceMonitor:
    # Disables the normal cAdvisor scraping, as we add it with the job name "kubernetes-cadvisor" under additionalScrapeConfigs
    # The reason for doing this is to enable the VPA to use the metrics for the recommender
    # https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/FAQ.md#how-can-i-use-prometheus-as-a-history-provider-for-the-vpa-recommender
    cAdvisor: false
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          # the azurefile storage class is created automatically on AKS
          storageClassName: azurefile
          accessModes: ["ReadWriteMany"]
          resources:
            requests:
              storage: 50Gi
    additionalScrapeConfigs:
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        metrics_path: /metrics/cadvisor
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
Kubernetes version: 1.21.2
kube-prometheus-stack version: 18.1.1
helm version: version.BuildInfo{Version:"v3.6.3", GitCommit:"d506314abfb5d21419df8c7e7e68012379db2354", GitTreeState:"dirty", GoVersion:"go1.16.5"}
Unfortunately, I don't have access to Azure AKS, so I've reproduced this issue on my GKE cluster. Below I'll provide some explanations that may help to resolve your problem.
First, you can try to execute the node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate rule to see if it returns any result.
If it doesn't return any records, please read the following paragraphs.
Creating a scrape configuration for cAdvisor
Rather than creating a completely new scrape configuration for cAdvisor, I would suggest using the one that is generated by default when kubelet.serviceMonitor.cAdvisor: true, but with a few modifications, such as changing the label to job=kubernetes-cadvisor.
In my example, the 'kubernetes-cadvisor' scrape configuration looks like this:
NOTE: I added this config under the additionalScrapeConfigs in the values.yaml file (the rest of the values.yaml file may be like yours).
- job_name: 'kubernetes-cadvisor'
  honor_labels: true
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  follow_redirects: true
  relabel_configs:
    - source_labels: [job]
      separator: ;
      regex: (.*)
      target_label: __tmp_prometheus_job_name
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
      separator: ;
      regex: kubelet
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_service_label_k8s_app]
      separator: ;
      regex: kubelet
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: https-metrics
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
      separator: ;
      regex: Node;(.*)
      target_label: node
      replacement: ${1}
      action: replace
    - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
      separator: ;
      regex: Pod;(.*)
      target_label: pod
      replacement: ${1}
      action: replace
    - source_labels: [__meta_kubernetes_namespace]
      separator: ;
      regex: (.*)
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: service
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_pod_name]
      separator: ;
      regex: (.*)
      target_label: pod
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_pod_container_name]
      separator: ;
      regex: (.*)
      target_label: container
      replacement: $1
      action: replace
    - separator: ;
      regex: (.*)
      target_label: endpoint
      replacement: https-metrics
      action: replace
    - source_labels: [__metrics_path__]
      separator: ;
      regex: (.*)
      target_label: metrics_path
      replacement: $1
      action: replace
    - source_labels: [__address__]
      separator: ;
      regex: (.*)
      modulus: 1
      target_label: __tmp_hash
      replacement: $1
      action: hashmod
    - source_labels: [__tmp_hash]
      separator: ;
      regex: "0"
      replacement: $1
      action: keep
  kubernetes_sd_configs:
    - role: endpoints
      kubeconfig_file: ""
      follow_redirects: true
      namespaces:
        names:
          - kube-system
Modifying Prometheus Rules
By default, Prometheus rules that fetch data from cAdvisor use job="kubelet" in their PromQL expressions:
After changing job=kubelet to job=kubernetes-cadvisor, we also need to modify this label in the Prometheus rules:
NOTE: We just need to modify the rules that have metrics_path="/metrics/cadvisor" (these are the rules that retrieve data from cAdvisor).
$ kubectl get prometheusrules prom-1-kube-prometheus-sta-k8s.rules -o yaml
...
- name: k8s.rules
  rules:
    - expr: |-
        sum by (cluster, namespace, pod, container) (
          irate(container_cpu_usage_seconds_total{job="kubernetes-cadvisor", metrics_path="/metrics/cadvisor", image!=""}[5m])
        ) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (
          1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
        )
      record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
...
Here we have a few more rules to modify...
After modifying the Prometheus rules and waiting some time, we can check whether it works as expected by executing node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate as at the beginning.
Additionally, let's check Grafana to make sure it has started displaying our dashboards correctly.

Prometheus only scrapes one pod

I'm using Prometheus to scrape metrics from my pods. The application I'm interested in is replicated a couple of times with one service providing access. Prometheus uses this service to scrape the metrics. In my app the metrics are setup as follows:
import * as Prometheus from 'prom-client';
const httpRequestDurationMicroseconds = new Prometheus.Histogram({
  name: 'transaction_amounts',
  help: 'Amount',
  labelNames: ['amount'],
  buckets: [0, 5, 15, 50, 100, 200, 300, 400, 500, 10000],
});
const totalPayments = new Prometheus.Counter('transaction_totals', 'Total payments');
I'm using helm to install Prometheus and the scrape config looks like this:
prometheus.yml:
  rule_files:
    - /etc/config/rules
    - /etc/config/alerts
  scrape_configs:
    - job_name: prometheus
      static_configs:
        - targets:
            - localhost:9090
    - job_name: transactions
      scrape_interval: 1s
      static_configs:
        - targets:
            - transaction-metrics-service:3001
I can see the metrics inside Prometheus, but they seem to come from just one pod. For example, in Prometheus, when I query for transaction_totals it gives:
I don't think that the instance label can uniquely identify my pods. What should I do to be able to query all pods?
Instead of using a static_config that scrapes just one host, try using kubernetes_sd_configs (Kubernetes Service Discovery) as provided by Prometheus.
Your config file would look something like this:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # only scrape when annotation prometheus.io/scrape: 'true' is set
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      # the optional (?::\d+)? group also matches addresses without an explicit port
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: ${1}:${2}
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
and then add the annotation to your Kubernetes Deployment yaml config like this:
kind: Deployment
...
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "<< PORT OF YOUR CONTAINER >>"
You can see a full working example here.
Add Prometheus annotations to your service, since Prometheus will only scrape a service that:
- Exposes the exporter port
- Has a prometheus.io/scrape: "true" annotation
- Has a prometheus.io/port: "<exporter_port_here>" annotation
Here is an official example.
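The annotations above can be sketched on a Service like this. The service name and port are taken from the question's scrape target (transaction-metrics-service:3001); the selector label is an assumption:

```yaml
# Hedged sketch: annotating the Service that fronts the replicated pods.
apiVersion: v1
kind: Service
metadata:
  name: transaction-metrics-service
  annotations:
    prometheus.io/scrape: "true"  # opt this service in to scraping
    prometheus.io/port: "3001"    # the exporter port from the question
spec:
  selector:
    app: transactions  # assumed pod label, not from the question
  ports:
    - name: metrics
      port: 3001
      targetPort: 3001
```

Note that with role: endpoints (rather than role: pod) service discovery, each backing pod becomes a separate target, which is what makes per-pod queries possible.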
The scraped pod is probably Prometheus itself.