Promtail + Loki only shows some namespaces, not all

We recently decided to install Loki and Promtail via the loki-stack Helm chart. Loki and Promtail mostly work: we get some logs from Promtail and can visualize them in Grafana, but our development namespace is nowhere to be found in Loki. Promtail shows the development pod as an active target and has already collected its logs, but we can't seem to get them into Loki. Any ideas?

tl;dr
Set loki.monitoring.selfMonitoring.grafanaAgent.installOperator to false.
This problem is caused by Grafana Agent, which is installed by default as a sub-chart of the grafana/loki chart.
The agent creates a secret 'loki-logs-config' (loki in this case is the Helm release name) which contains the following configuration:
agent.yml: |+
  logs:
    configs:
    - clients:
      - external_labels:
          cluster: loki
        url: http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push
      name: monitoring/loki
      scrape_configs:
      - job_name: podLogs/monitoring/loki
        kubernetes_sd_configs:
        - namespaces:
            names:
            - monitoring
          role: pod
        pipeline_stages:
        - cri: {}
        relabel_configs:
        - source_labels:
          - job
          target_label: __tmp_prometheus_job_name
        - action: keep
          regex: loki
          source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_instance
        - action: keep
          regex: loki
          source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_name
        - source_labels:
          - __meta_kubernetes_namespace
          target_label: namespace
        - source_labels:
          - __meta_kubernetes_service_name
          target_label: service
        - source_labels:
          - __meta_kubernetes_pod_name
          target_label: pod
        - source_labels:
          - __meta_kubernetes_pod_container_name
          target_label: container
        - replacement: monitoring/loki
          target_label: job
        - replacement: /var/log/pods/*$1/*.log
          separator: /
          source_labels:
          - __meta_kubernetes_pod_uid
          - __meta_kubernetes_pod_container_name
          target_label: __path__
        - action: replace
          source_labels:
          - __meta_kubernetes_pod_node_name
          target_label: __host__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - action: replace
          replacement: monitoring/$1
          source_labels:
          - __meta_kubernetes_pod_controller_name
          target_label: job
        - action: replace
          source_labels:
          - __meta_kubernetes_pod_container_name
          target_label: container
        - action: replace
          replacement: loki
          target_label: cluster
    positions_directory: /var/lib/grafana-agent/data
  server: {}
As you can see, under kubernetes_sd_configs there is a namespaces list with the value monitoring - I have no idea why it is there, but that's the namespace I installed this chart into.
You won't see this secret after executing helm template - it seems that Grafana Agent creates it itself after startup.
It has the label app.kubernetes.io/managed-by=grafana-agent-operator.
Pretty magical, if you ask me...
The solution for me was to disable the installation of Grafana Agent:
loki:
  loki:
    commonConfig:
      replication_factor: 1
    storage:
      type: 'filesystem'
    auth_enabled: false
  monitoring:
    dashboards:
      enabled: false
    selfMonitoring:
      enabled: true
    grafanaAgent:
      installOperator: false
    lokiCanary:
      enabled: false
Note: the top-level loki element in the code block above is needed only if you add the grafana/loki chart as a subchart of your own chart; without the wrapper, the same values look like the sketch below.
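For completeness, here is the same configuration as a standalone values file, i.e. when grafana/loki is installed directly rather than as a subchart. This is a straight rearrangement of the block above (dropping the outer loki wrapper), not a separately tested variant:
loki:
  commonConfig:
    replication_factor: 1
  storage:
    type: 'filesystem'
  auth_enabled: false
monitoring:
  dashboards:
    enabled: false
  selfMonitoring:
    enabled: true
  grafanaAgent:
    installOperator: false
  lokiCanary:
    enabled: false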
IMO, enabling a beta feature (Grafana Agent is at v0.30.0 today) in a chart used as a reference in Loki's docs is insane :)

Related

Kubernetes and logs from localhost with promtail

I have Kubernetes installed on a virtual machine. I have installed Grafana, Loki and Promtail on it with Helm charts. I want to collect logs from /var/log/syslog on the virtual machine.
I tried
extraScrapeConfigs: |
  - job_name: syslog
    syslog:
      listen_address: 0.0.0.0:1514
      label_structured_data: yes
      labels:
        job: "syslog"
    relabel_configs:
      - source_labels: ["__syslog_connection_ip_address"]
        target_label: "ip_address"
      - source_labels: ["__syslog_message_severity"]
        target_label: "severity"
      - source_labels: ["__syslog_message_facility"]
        target_label: "facility"
      - source_labels: ["__syslog_message_hostname"]
        target_label: "host"
and I added a forwarding entry for nodeip:1514 in rsyslog.conf on the VM,
but nothing happens. Any idea how to get this working?
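One thing worth checking (a sketch, not a verified fix): Promtail's syslog target expects RFC5424-formatted messages, over TCP by default, and the listener on 0.0.0.0:1514 only exists inside the Promtail pod, so rsyslog on the VM can only reach it if that port is exposed outside the cluster. A hypothetical NodePort Service for the Promtail pods could look like the following; the namespace and selector labels are assumptions and depend on how the chart was installed:
apiVersion: v1
kind: Service
metadata:
  name: promtail-syslog
  namespace: monitoring               # assumption: the namespace promtail runs in
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: promtail  # assumption: label applied by the promtail chart
  ports:
    - name: tcp-syslog
      protocol: TCP
      port: 1514                      # service port
      targetPort: 1514                # matches listen_address in the syslog job
      nodePort: 31514                 # rsyslog on the VM would forward to <node-ip>:31514
With something like this in place, the rsyslog forwarding rule on the VM would point at the node IP and node port rather than at 1514 directly.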

Filtering through scraping with Grafana Loki

I am trying to get logs from a single namespace through Promtail and scrape_configs, but I am not getting the expected results. I am installing in k8s with
helm install loki grafana/loki-stack -n loki-test -f ~/loki-stack-values.yml
and the contents of my values file are:
loki:
  enabled: true
promtail:
  enabled: true
  pipelineStages:
    - cri: {}
    - json:
        expressions:
          is_even: is_even
          level: level
          version: version
  scrape_configs:
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - source_labels: [__meta_kubernetes_namespace]
          action: keep
          regex: mongodb-test
    # [...]
    - job_name: kubernetes-pods-app
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - source_labels: [__meta_kubernetes_namespace]
          action: keep
          regex: mongodb-test
grafana:
  enabled: true
  sidecar:
    datasources:
      enabled: true
  image:
    tag: 8.3.5
My expectation was that I would only get logs from the mongodb-test namespace, but I can view logs from every namespace present.
I also tried with drop, but it did not do anything.
What should I do here?
Thank you so much.
Using a match stage to drop the other namespaces under pipelineStages worked for me. In your case:
config:
  snippets:
    pipelineStages:
      - cri: {}
      - match:
          selector: '{namespace!~"mongodb-test"}'
          action: drop
    common:
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: namespace
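As an alternative sketch (assuming the promtail subchart bundled with loki-stack exposes config.snippets.extraRelabelConfigs, which newer chart versions do), you could drop the other namespaces at service-discovery time instead of dropping already-collected lines in the pipeline:
promtail:
  enabled: true
  config:
    snippets:
      extraRelabelConfigs:
        # keep only targets from the mongodb-test namespace;
        # everything else is never tailed in the first place
        - source_labels: [__meta_kubernetes_namespace]
          action: keep
          regex: mongodb-test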

How to scrape metrics from postgresql helm deployment with kube-prometheus-stack

I have been using kube-prometheus-stack for monitoring without any issues.
After realizing there is a metrics: section in the values.yaml file, I wanted to enable metrics for both existing deployments:
postgresql (https://github.com/bitnami/charts/tree/master/bitnami/postgresql)
cassandra (https://github.com/bitnami/charts/tree/master/bitnami/cassandra)
for the cassandra deployment
altering the value of the enabled: key from false to true was sufficient. After upgrading the Helm release with the new values, a sidecar container was created. I confirmed that the metrics are displayed at /metrics, and the targets are listed and being scraped in Prometheus.
metrics:
  enabled: true
but for the postgresql deployment
doing the same did not work; setting
metrics:
  enabled: true
resulted in the creation of a sidecar container, and the metrics are displayed at /metrics, but the target is neither listed nor scraped in Prometheus.
So my question is: why does the same setting give the desired result for the cassandra deployment but not for postgresql? Am I missing something, and what else do I need to check?
Also, I don't need to enable a serviceMonitor for these deployments(?), because Prometheus can scrape pods based on annotations, right?
Any help is appreciated.
annotation config
additionalScrapeConfigs in the values.yaml file of kube-prometheus-stack is edited to work with prometheus.io/* annotations (ref: Monitor custom kubernetes pod metrics using Prometheus); see the annotation sketch after the config below.
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_scrape ]
            action: keep
            regex: true
          - source_labels: [ __meta_kubernetes_pod_annotation_prometheus_io_path ]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [ __address__, __meta_kubernetes_pod_annotation_prometheus_io_port ]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [ __meta_kubernetes_namespace ]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [ __meta_kubernetes_pod_name ]
            action: replace
            target_label: kubernetes_pod_name
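For reference, the job above only keeps pods whose own pod annotations (not service annotations) opt in to scraping. A hypothetical set of pod annotations that would satisfy it looks like this; 9187 is the usual postgres-exporter port, but check what your chart actually sets and where it sets it:
metadata:
  annotations:
    prometheus.io/scrape: "true"   # matched by the first keep rule
    prometheus.io/path: "/metrics" # rewritten into __metrics_path__
    prometheus.io/port: "9187"     # assumption: default postgres-exporter port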
versions
kubectl version: v1.22.6
chart versions:
kube-prometheus-stack-35.0.3
postgresql-11.1.28
cassandra-9.1.9

Scrape metrics from multiple containers in prometheus with Istio

Our application is deployed in the Istio service mesh and we are trying to scrape metrics at the container level using the prometheus.io annotations.
We have enabled Spring Boot metrics in our application and we are able to fetch the metrics at the path '/manage/prometheus'.
We have enabled Prometheus annotations in the deployment file of our application as follows:
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8080'
    prometheus.io/path: '/manage/prometheus'
This works fine when there is a single container in the pod.
But for pods that have multiple containers, we are unable to scrape the metrics with the container port.
Following are the workarounds we tried:
Following the reference https://gist.github.com/bakins/5bf7d4e719f36c1c555d81134d8887eb we tried to add the relabel configs for scraping data at container level:
prometheus-config.yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: (.*)
      - source_labels: [ __address__, __meta_kubernetes_pod_container_port_number]
        action: replace
        regex: (.+):(?:\d+);(\d+)
        replacement: ${1}:${2}
        target_label: __address__
      - action: replace
        regex: (https?)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
      - action: drop
        regex: Pending|Succeeded|Failed
        source_labels:
          - __meta_kubernetes_pod_phase
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_container_name
        target_label: container
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_container_port_number
        target_label: container_port
But after applying the above configuration we get the following error:
Get "http://10.x.x.x:8080/stats/prometheus": read tcp 10.y.y.y:45542->10.x.x.x:8080: read: connection reset by peer
Here 10.x.x.x is the pod IP and 8080 is the container port; Prometheus is not able to scrape using the container port.
We tried the above configuration after removing the Istio mesh, i.e. by removing the Istio sidecar from all the microservice pods, and then we could see container-level metrics being scraped.
So Istio's proxy seems to be blocking the metrics from being scraped at the container level.
Has anyone faced a similar issue?
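One direction worth trying (a sketch to verify against your Istio version, not a confirmed fix) is to exclude the metrics port from the sidecar's inbound traffic capture so Prometheus can reach the container directly, for example with a pod annotation like this:
metadata:
  annotations:
    # assumption: standard Istio sidecar-injection annotation; keeps the proxy
    # from intercepting inbound traffic on the listed port
    traffic.sidecar.istio.io/excludeInboundPorts: "8080"
Note that this also bypasses the mesh (and mTLS) for that port, so it may not be acceptable in every setup.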

Changing Prometheus job label in scraper for cAdvisor breaks Grafana dashboards

I installed Prometheus on my Kubernetes cluster with Helm, using the community chart kube-prometheus-stack - and I get some beautiful dashboards in the bundled Grafana instance. I now wanted the recommender from the Vertical Pod Autoscaler to use Prometheus as a data source for historic metrics, as described here. Meaning, I had to make a change to the Prometheus scraper settings for cAdvisor, and this answer pointed me in the right direction, as after making that change I can now see the correct job tag on metrics from cAdvisor.
Unfortunately, now some of the charts in the Grafana dashboards are broken. It looks like it no longer picks up the CPU metrics - and instead just displays "No data" for the CPU-related charts.
So, I assume I have to tweak the charts to be able to pick up the metrics correctly again, but I don't see any obvious places to do this in Grafana?
Not sure if it is relevant for the question, but I am running my Kubernetes cluster on Azure Kubernetes Service (AKS).
This is the full values.yaml I supply to the Helm chart when installing Prometheus:
kubeControllerManager:
enabled: false
kubeScheduler:
enabled: false
kubeEtcd:
enabled: false
kubeProxy:
enabled: false
kubelet:
serviceMonitor:
# Diables the normal cAdvisor scraping, as we add it with the job name "kubernetes-cadvisor" under additionalScrapeConfigs
# The reason for doing this is to enable the VPA to use the metrics for the recommender
# https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/FAQ.md#how-can-i-use-prometheus-as-a-history-provider-for-the-vpa-recommender
cAdvisor: false
prometheus:
prometheusSpec:
retention: 15d
storageSpec:
volumeClaimTemplate:
spec:
# the azurefile storage class is created automatically on AKS
storageClassName: azurefile
accessModes: ["ReadWriteMany"]
resources:
requests:
storage: 50Gi
additionalScrapeConfigs:
- job_name: 'kubernetes-cadvisor'
scheme: https
metrics_path: /metrics/cadvisor
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
Kubernetes version: 1.21.2
kube-prometheus-stack version: 18.1.1
helm version: version.BuildInfo{Version:"v3.6.3", GitCommit:"d506314abfb5d21419df8c7e7e68012379db2354", GitTreeState:"dirty", GoVersion:"go1.16.5"}
Unfortunately, I don't have access to Azure AKS, so I've reproduced this issue on my GKE cluster. Below I'll provide some explanations that may help to resolve your problem.
First, you can try to execute the node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate recording rule to see if it returns any result.
If it doesn't return any records, read the following paragraphs.
Creating a scrape configuration for cAdvisor
Rather than creating a completely new scrape configuration for cAdvisor, I would suggest using the one that is generated by default when kubelet.serviceMonitor.cAdvisor: true, but with a few modifications, such as changing the label to job=kubernetes-cadvisor.
In my example, the 'kubernetes-cadvisor' scrape configuration looks like this:
NOTE: I added this config under the additionalScrapeConfigs in the values.yaml file (the rest of the values.yaml file may be like yours).
- job_name: 'kubernetes-cadvisor'
  honor_labels: true
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
    separator: ;
    regex: kubelet
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: kubelet
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: https-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: https-metrics
    action: replace
  - source_labels: [__metrics_path__]
    separator: ;
    regex: (.*)
    target_label: metrics_path
    replacement: $1
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: endpoints
    kubeconfig_file: ""
    follow_redirects: true
    namespaces:
      names:
      - kube-system
Modifying Prometheus Rules
By default, the Prometheus rules that fetch data from cAdvisor use job="kubelet" in their PromQL expressions.
After changing job=kubelet to job=kubernetes-cadvisor, we also need to modify this label in the Prometheus rules.
NOTE: We only need to modify the rules that contain metrics_path="/metrics/cadvisor" (these are the rules that retrieve data from cAdvisor).
$ kubectl get prometheusrules prom-1-kube-prometheus-sta-k8s.rules -o yaml
...
    - name: k8s.rules
      rules:
      - expr: |-
          sum by (cluster, namespace, pod, container) (
            irate(container_cpu_usage_seconds_total{job="kubernetes-cadvisor", metrics_path="/metrics/cadvisor", image!=""}[5m])
          ) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (
            1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
          )
        record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
...
There are a few more rules like this to modify as well.
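For example, the memory counterparts in the same rule group follow the same pattern. A hypothetical sketch of one of them after the label change (the exact expression varies between chart versions, so copy from your own PrometheusRule rather than from here):
      - expr: |-
          container_memory_working_set_bytes{job="kubernetes-cadvisor", metrics_path="/metrics/cadvisor", image!=""}
          * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (
            1, max by (cluster, namespace, pod, node) (kube_pod_info{node!=""})
          )
        record: node_namespace_pod_container:container_memory_working_set_bytes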
After modifying Prometheus rules and waiting some time, we can see if it works as expected. We can try to execute node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate as in the beginning.
Additionally, let's check Grafana to make sure it has started displaying our dashboards correctly.