How to pass sensitive data to helm values file that is committed? - kubernetes

I am installing kube-prometheus-stack with Helm and I am adding some custome scraping configuration to Prometheus which requires authentication. I need to pass basic_auth with username and password in the values.yaml file.
The thing is that I need to commit the values.yaml file to a repo so I am wondering how can I have the username and password set on values file, maybe from a secret in Kubernetes or some other way?
prometheus:
prometheusSpec:
additionalScrapeConfigs:
- job_name: myjob
scrape_interval: 20s
metrics_path: /metrics
static_configs:
- targets:
- myservice.default.svc.cluster.local:80
basic_auth:
username: prometheus
password: prom123456

Scrape config support specifying password_file parameter, so you can mount your own secret in volumes and volumemMounts:
Disclaimer, haven't tested it myself, not using a kube-prometheus-stack, but i guess something like this should work:
prometheus:
prometheusSpec:
additionalScrapeConfigs:
- job_name: myjob
scrape_interval: 20s
metrics_path: /metrics
static_configs:
- targets:
- myservice.default.svc.cluster.local:80
basic_auth:
password_file: /etc/scrape_passwordfile
# Additional volumes on the output StatefulSet definition.
volumes:
- name: scrape_passwordfile
secret:
secretName: scrape_passwordfile
optional: false
# Additional VolumeMounts on the output StatefulSet definition.
volumeMounts:
- name: scrape_passwordfile
mountPath: "/etc/scrape_passwordfile"
Another option is to ditch additionalScrapeConfigs and use additionalScrapeConfigsSecretto store whole config inside secret
## If additional scrape configurations are already deployed in a single secret file you can use this section.
## Expected values are the secret name and key
## Cannot be used with additionalScrapeConfigs
additionalScrapeConfigsSecret: {}
# enabled: false
# name:
# key:

Related

I created a serviceemointer using jsonexporter in Prometheus environment, but the metrics could not be verified. Is there a way to check the metric?

I am a beginner who is using Prometheus and Grapana to monitor the value of REST API.
Prometheus, json-exporrter, and grafana both used the Helm chart, Prometheus installed as default values.yaml, and json-exporter installed as custom values.yaml.
I checked that the prometheus set the service monitor of json-exporter as a target, but I couldn't check its metrics.
How can I check the metrics? Below is the environment , screenshots and code.
environment :
kubernetes : v1.22.9
helm : v3.9.2
prometheus-json-exporter helm chart : v0.5.0
kube-prometheus-stack helm chart : 0.58.0
screenshots :
https://drive.google.com/drive/folders/1vfjbidNpE2_yXfxdX8oX5eWh4-wAx7Ql?usp=sharing
values.yaml
in custom_jsonexporter_values.yaml
# Default values for prometheus-json-exporter.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
replicaCount: 1
image:
repository: quay.io/prometheuscommunity/json-exporter
pullPolicy: IfNotPresent
# Overrides the image tag whose default is the chart appVersion.
tag: ""
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
serviceAccount:
# Specifies whether a service account should be created
create: true
# Annotations to add to the service account
annotations: []
# The name of the service account to use.
# If not set and create is true, a name is generated using the fullname template
name: ""
podAnnotations: []
podSecurityContext: {}
# fsGroup: 2000
# podLabels:
# Custom labels for the pod
securityContext: {}
# capabilities:
# drop:
# - ALL
# readOnlyRootFilesystem: true
# runAsNonRoot: true
# runAsUser: 1000
service:
type: ClusterIP
port: 7979
targetPort: http
name: http
serviceMonitor:
## If true, a ServiceMonitor CRD is created for a prometheus operator
## https://github.com/coreos/prometheus-operator
##
enabled: true
namespace: monitoring
scheme: http
# Default values that will be used for all ServiceMonitors created by `targets`
defaults:
additionalMetricsRelabels: {}
interval: 60s
labels:
release: prometheus
scrapeTimeout: 60s
targets:
- name : pi2
url: http://xxx.xxx.xxx.xxx:xxxx
labels: {} # Map of labels for ServiceMonitor. Overrides value set in `defaults`
interval: 60s # Scraping interval. Overrides value set in `defaults`
scrapeTimeout: 60s # Scrape timeout. Overrides value set in `defaults`
additionalMetricsRelabels: {} # Map of metric labels and values to add
ingress:
enabled: false
className: ""
annotations: []
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
hosts:
- host: chart-example.local
paths:
- path: /
pathType: ImplementationSpecific
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
resources: {}
# We usually recommend not to specify default resources and to leave this as a conscious
# choice for the user. This also increases chances charts run on environments with little
# resources, such as Minikube. If you do want to specify resources, uncomment the following
# lines, adjust them as necessary, and remove the curly braces after 'resources:'.
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
autoscaling:
enabled: false
minReplicas: 1
maxReplicas: 100
targetCPUUtilizationPercentage: 80
# targetMemoryUtilizationPercentage: 80
nodeSelector: []
tolerations: []
affinity: []
configuration:
config: |
---
modules:
default:
metrics:
- name: used_storage_byte
path: '{ .used }'
help: used storage byte
values:
used : '{ .used }'
labels: {}
- name: free_storage_byte
path: '{ .free }'
help: free storage byte
labels: {}
values :
free : '{ .free }'
- name: total_storage_byte
path: '{ .total }'
help: total storage byte
labels: {}
values :
total : '{ .total }'
prometheusRule:
enabled: false
additionalLabels: {}
namespace: ""
rules: []
additionalVolumes: []
# - name: password-file
# secret:
# secretName: secret-name
additionalVolumeMounts: []
# - name: password-file
# mountPath: "/tmp/mysecret.txt"
# subPath: mysecret.txt
Firstly you can check the targets page on the Prometheus UI to see if a) your desired target is even defined and b) if the endpoint is reachable and being scraped.
However, you may need to troubleshoot a little if either of the above is not the case:
It is important to understand what is happening. You have deployed a Prometheus Operator to the cluster. If you have used the default values from the helm chart, you also deployed a Prometheus custom resource(CR). This instance is what is telling the Prometheus Operator how to ultimately configure the Prometheus running inside the pod. Certain things are static, like global metric relabeling for example, but most are dynamic, such as picking up new targets to actually scrape. Inside the Prometheus CR you will find options to specify serviceMonitorSelector and serviceMonitorNamespaceSelector (The behaviour is the same also for probes and podmonitors so I'm just going over it once). Assuming you leave the default set like serviceMonitorNamespaceSelector: {}, Prometheus Operator will look for ServiceMonitors in all namespaces on the cluster to which it has access via its serviceAccount. The serviceMonitorSelector field lets you specify a label and value combination that must be present on a serviceMonitor that must be present for it to be picked up. Once a or multiple serviceMonitors are found, that match the criteria in the selectors, Prometheus Operator adjusts the configuration in the actual Prometheus instance(tl;dr version) so you end up with proper scrape targets.
Step 1 for trouble shooting: Do your selectors match the labels and namespace of the serviceMonitors? Actually check those. The default on the prometheus operator helm chart expects a label release: prometheus-operator and in your config, you don't seem to add that to your json-exporter's serviceMonitor.
Step 2: The same behaviour as outline for how serviceMonitors are picked up, is happening in turn inside the serviceMonitor itself, so make sure that your service actually matches what is specced out in the serviceMonitor.
To deep dive further into the options you have and what the fields do, check the API documentation.

Deploy Traefik with ArgoCD and additional values file

I'm trying to install Traefik on a K8s cluster using ArgoCD to deploy the official Helm chart. But I also need it to us an additional "values.yml" file. When I try to specify in the Application yaml file what additional values file to use, it fails to file not found for it.
Here is what I'm using:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: argo-traefik-chart
namespace: argocd
spec:
project: default
source:
path: traefik
repoURL: https://github.com/traefik/traefik-helm-chart.git
targetRevision: HEAD
helm:
valueFiles:
- /traefik-values.yml
destination:
server: https://kubernetes.default.svc
namespace: 2195-leaf-dev-traefik
syncPolicy:
syncOptions:
- CreateNamespace=true
automated:
prune: true
selfHeal: true
Here is the traefik-value.yml file.
additionalArguments:
# Configure your CertificateResolver here...
#
# HTTP Challenge
# ---
# Generic Example:
# - --certificatesresolvers.generic.acme.email=your-email#example.com
# - --certificatesresolvers.generic.acme.caServer=https://acme-v02.api.letsencrypt.org/directory
# - --certificatesresolvers.generic.acme.httpChallenge.entryPoint=web
# - --certificatesresolvers.generic.acme.storage=/ssl-certs/acme-generic.json
#
# Prod / Staging Example:
# - --certificatesresolvers.staging.acme.email=your-email#example.com
# - --certificatesresolvers.staging.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory
# - --certificatesresolvers.staging.acme.httpChallenge.entryPoint=web
# - --certificatesresolvers.staging.acme.storage=/ssl-certs/acme-staging.json
# - --certificatesresolvers.production.acme.email=your-email#example.com
# - --certificatesresolvers.production.acme.caServer=https://acme-v02.api.letsencrypt.org/directory
# - --certificatesresolvers.production.acme.httpChallenge.entryPoint=web
# - --certificatesresolvers.production.acme.storage=/ssl-certs/acme-production.json
#
# DNS Challenge
# ---
# Cloudflare Example:
# - --certificatesresolvers.cloudflare.acme.dnschallenge.provider=cloudflare
# - --certificatesresolvers.cloudflare.acme.email=your-email#example.com
# - --certificatesresolvers.cloudflare.acme.dnschallenge.resolvers=1.1.1.1
# - --certificatesresolvers.cloudflare.acme.storage=/ssl-certs/acme-cloudflare.json
#
# Generic (replace with your DNS provider):
# - --certificatesresolvers.generic.acme.dnschallenge.provider=generic
# - --certificatesresolvers.generic.acme.email=your-email#example.com
# - --certificatesresolvers.generic.acme.storage=/ssl-certs/acme-generic.json
logs:
# Configure log settings here...
general:
level: DEBUG
ports:
# Configure your entrypoints here...
web:
# (optional) Permanent Redirect to HTTPS
redirectTo: websecure
websecure:
tls:
enabled: true
# (optional) Set a Default CertResolver
# certResolver: cloudflare
#env:
# Set your environment variables here...
#
# DNS Challenge Credentials
# ---
# Cloudflare Example:
# - name: CF_API_EMAIL
# valueFrom:
# secretKeyRef:
# key: email
# name: cloudflare-credentials
# - name: CF_API_KEY
# valueFrom:
# secretKeyRef:
# key: apiKey
# name: cloudflare-credentials
# Just to do it for now
envFrom:
- secretRef:
name: traefik-secrets
# Disable Dashboard
ingressRoute:
dashboard:
enabled: true
# Persistent Storage
persistence:
enabled: true
name: ssl-certs
size: 1Gi
path: /ssl-certs
# deployment:
# initContainers:
# # The "volume-permissions" init container is required if you run into permission issues.
# # Related issue: https://github.com/containous/traefik/issues/6972
# - name: volume-permissions
# image: busybox:1.31.1
# command: ["sh", "-c", "chmod -Rv 600 /ssl-certs/*"]
# volumeMounts:
# - name: ssl-certs
# mountPath: /ssl-certs
# Set Traefik as your default Ingress Controller, according to Kubernetes 1.19+ changes.
ingressClass:
enabled: true
isDefaultClass: true
The traefik-values.yml file is in the same sub-directory as this file. I fire this of with kubectl apply -f but when I got to look at it in the Argo GUI, it shows an error. I'll paste the entire thing below, but it looks like the important part is this:
` failed exit status 1: Error: open .traefik-values.yml: no such file or directory
It's putting a period before the name of the file. I tried different ways of specifying the file: .traefik-values.yml and ./treafik-values.yml. Those get translated to:
: Error: open .traefik/.traefik-values.yml: no such file or directory
When I do a helm install using the exact same traefik-values.yml file, I get exactly what I expect. And when I run the Argo without the alternate file, it deploys but with out the needed options of course.
Any ideas?
I assume this is because Argo will look for traefik-values.yml file in the repoURL (so, not in the location where Application file is), and it obviously doesn't exist there.
You can check more about this issue here. There you can also find a couple of proposed solutions. Some of them are:
a plugin to do a helm template with your values files
a custom CI pipeline to take the content of your values.yaml file and add it to Application manifest
putting values directly in Application manifest, skipping the values.yaml file altogether
having a chart that depends on a chart like here (I don't like this one as it is downloading twice from two different locations, plus this issue)
play around with kustomize
or wait for ArgoCD 2.5, it seems it will include a native solution out of the box, according to mentioned github issue

Auto-scrape realm metrics from Keycloak with Prometheus-Operator

I installed Keycloak using the bitnami/keycloak Helm chart (https://bitnami.com/stack/keycloak/helm).
As I'm also using Prometheus-Operator for monitoring I enabled the metrics endpoint and the service monitor:
keycloak:
...
metrics:
enabled: true
serviceMonitor:
enabled: true
namespace: monitoring
additionalLabels:
release: my-prom-operator-release
As I'm way more interested in actual realm metrics I installed the keycloak-metrics-spi provider (https://github.com/aerogear/keycloak-metrics-spi) by setting up an init container that downloads it to a shared volume.
keycloak:
...
extraVolumeMounts:
- name: providers
mountPath: /opt/bitnami/keycloak/providers
extraVolumes:
- name: providers
emptyDir: {}
...
initContainers:
- name: metrics-spi-provider
image: SOME_IMAGE_WITH_WGET_INSTALLED
imagePullPolicy: Always
command:
- sh
args:
- -c
- |
KEYCLOAK_METRICS_SPI_VERSION=2.5.2
wget --no-check-certificate -O /providers/keycloak-metrics-spi-${KEYCLOAK_METRICS_SPI_VERSION}.jar \
https://github.com/aerogear/keycloak-metrics-spi/releases/download/${KEYCLOAK_METRICS_SPI_VERSION}/keycloak-metrics-spi-${KEYCLOAK_METRICS_SPI_VERSION}.jar
chmod +x /providers/keycloak-metrics-spi-${KEYCLOAK_METRICS_SPI_VERSION}.jar
touch /providers/keycloak-metrics-spi-${KEYCLOAK_METRICS_SPI_VERSION}.jar.dodeploy
volumeMounts:
- name: providers
mountPath: /providers
The provider enables metrics endpoints on the regular public-facing http port instead of the http-management port, which is not great for me. But I can block external access to them in my reverse proxy.
What I'm missing is some kind of auto-scraping of those endpoints. Right now I created an additional template, that creates a new service monitor for each element of a predefined list in my chart:
values.yaml
keycloak:
...
metrics:
extraServiceMonitors:
- realmName: master
- realmName: my-realm
servicemonitor-metrics-spi.yaml
{{- range $serviceMonitor := .Values.keycloak.metrics.extraServiceMonitors }}
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: {{ $.Release.Name }}-spi-{{ $serviceMonitor.realmName }}
...
spec:
endpoints:
- port: http
path: /auth/realms/{{ $serviceMonitor.realmName }}/metrics
...
{{- end }}
Is there a better way of doing this? So that Prometheus can auto-detect all my realms and scrape their endpoints?
Thanks in advance!
As commented by #jan-garaj there is no need to query all the endpoints. All return the accumulated data of all realms. So it is enough to just scrape the endpoint of one realm (e.g. the master realm).
Thanks a lot!
It might help someone, the bitnami image so the helm chart already include the metrics-spi-provider. So do not need any further installation action but the metrics must be enabled in values.

How to use Helm to install Kibana with the Logtrail plugin enabled?

I'm new to Helm and Kubernetes and cannot figure out how to use helm install --name kibana --namespace logging stable/kibana with the Logtrail plugin enabled. I can see there's an option in the values.yaml file to enable plugins during installation but I cannot figure out how to set it.
I've tried this without success:
helm install --name kibana --namespace logging stable/kibana \
--set plugins.enabled=true,plugins.value=logtrail,0.1.30,https://github.com/sivasamyk/logtrail/releases/download/v0.1.30/logtrail-6.5.4-0.1.30.zip
Update:
As Ryan suggested, it's best to provide such complex settings via a custom values file. But as it turned out, the above mentioned settings are not the only ones that one would have to provide to get the Logtrail plugin working in Kibana. Some configuration for Logtrail must be set before doing the helm install. And here's how to set it. In your custom values file set the following:
extraConfigMapMounts:
- name: logtrail
configMap: logtrail
mountPath: /usr/share/kibana/plugins/logtrail/logtrail.json
subPath: logtrail.json
After that the full content of your custom values file should look similar to this:
image:
repository: "docker.elastic.co/kibana/kibana-oss"
tag: "6.5.4"
pullPolicy: "IfNotPresent"
commandline:
args: []
env: {}
# All Kibana configuration options are adjustable via env vars.
# To adjust a config option to an env var uppercase + replace `.` with `_`
# Ref: https://www.elastic.co/guide/en/kibana/current/settings.html
#
# ELASTICSEARCH_URL: http://elasticsearch-client:9200
# SERVER_PORT: 5601
# LOGGING_VERBOSE: "true"
# SERVER_DEFAULTROUTE: "/app/kibana"
files:
kibana.yml:
## Default Kibana configuration from kibana-docker.
server.name: kibana
server.host: "0"
elasticsearch.url: http://elasticsearch:9200
## Custom config properties below
## Ref: https://www.elastic.co/guide/en/kibana/current/settings.html
# server.port: 5601
# logging.verbose: "true"
# server.defaultRoute: "/app/kibana"
deployment:
annotations: {}
service:
type: ClusterIP
externalPort: 443
internalPort: 5601
# authProxyPort: 5602 To be used with authProxyEnabled and a proxy extraContainer
## External IP addresses of service
## Default: nil
##
# externalIPs:
# - 192.168.0.1
#
## LoadBalancer IP if service.type is LoadBalancer
## Default: nil
##
# loadBalancerIP: 10.2.2.2
annotations: {}
# Annotation example: setup ssl with aws cert when service.type is LoadBalancer
# service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:EXAMPLE_CERT
labels: {}
## Label example: show service URL in `kubectl cluster-info`
# kubernetes.io/cluster-service: "true"
## Limit load balancer source ips to list of CIDRs (where available)
# loadBalancerSourceRanges: []
ingress:
enabled: false
# hosts:
# - kibana.localhost.localdomain
# - localhost.localdomain/kibana
# annotations:
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
# tls:
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
serviceAccount:
# Specifies whether a service account should be created
create: false
# The name of the service account to use.
# If not set and create is true, a name is generated using the fullname template
# If set and create is false, the service account must be existing
name:
livenessProbe:
enabled: false
initialDelaySeconds: 30
timeoutSeconds: 10
readinessProbe:
enabled: false
initialDelaySeconds: 30
timeoutSeconds: 10
periodSeconds: 10
successThreshold: 5
# Enable an authproxy. Specify container in extraContainers
authProxyEnabled: false
extraContainers: |
# - name: proxy
# image: quay.io/gambol99/keycloak-proxy:latest
# args:
# - --resource=uri=/*
# - --discovery-url=https://discovery-url
# - --client-id=client
# - --client-secret=secret
# - --listen=0.0.0.0:5602
# - --upstream-url=http://127.0.0.1:5601
# ports:
# - name: web
# containerPort: 9090
resources: {}
# limits:
# cpu: 100m
# memory: 300Mi
# requests:
# cpu: 100m
# memory: 300Mi
priorityClassName: ""
# Affinity for pod assignment
# Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
# affinity: {}
# Tolerations for pod assignment
# Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
tolerations: []
# Node labels for pod assignment
# Ref: https://kubernetes.io/docs/user-guide/node-selection/
nodeSelector: {}
podAnnotations: {}
replicaCount: 1
revisionHistoryLimit: 3
# To export a dashboard from a running Kibana 6.3.x use:
# curl --user <username>:<password> -XGET https://kibana.yourdomain.com:5601/api/kibana/dashboards/export?dashboard=<some-dashboard-uuid> > my-dashboard.json
# A dashboard is defined by a name and a string with the json payload or the download url
dashboardImport:
timeout: 60
xpackauth:
enabled: false
username: myuser
password: mypass
dashboards: {}
# k8s: https://raw.githubusercontent.com/monotek/kibana-dashboards/master/k8s-fluentd-elasticsearch.json
# List of plugins to install using initContainer
# NOTE : We notice that lower resource constraints given to the chart + plugins are likely not going to work well.
plugins:
# set to true to enable plugins installation
enabled: false
# set to true to remove all kibana plugins before installation
reset: false
# Use <plugin_name,version,url> to add/upgrade plugin
values:
- logtrail,0.1.30,https://github.com/sivasamyk/logtrail/releases/download/v0.1.30/logtrail-6.5.4-0.1.30.zip
# - elastalert-kibana-plugin,1.0.1,https://github.com/bitsensor/elastalert-kibana-plugin/releases/download/1.0.1/elastalert-kibana-plugin-1.0.1-6.4.2.zip
# - logtrail,0.1.30,https://github.com/sivasamyk/logtrail/releases/download/v0.1.30/logtrail-6.4.2-0.1.30.zip
# - other_plugin
persistentVolumeClaim:
# set to true to use pvc
enabled: false
# set to true to use you own pvc
existingClaim: false
annotations: {}
accessModes:
- ReadWriteOnce
size: "5Gi"
## If defined, storageClassName: <storageClass>
## If set to "-", storageClassName: "", which disables dynamic provisioning
## If undefined (the default) or set to null, no storageClassName spec is
## set, choosing the default provisioner. (gp2 on AWS, standard on
## GKE, AWS & OpenStack)
##
# storageClass: "-"
# default security context
securityContext:
enabled: false
allowPrivilegeEscalation: false
runAsUser: 1000
fsGroup: 2000
extraConfigMapMounts:
- name: logtrail
configMap: logtrail
mountPath: /usr/share/kibana/plugins/logtrail/logtrail.json
subPath: logtrail.json
And the last thing you should do is add this ConfigMap resource to Kubernetes:
apiVersion: v1
kind: ConfigMap
metadata:
name: logtrail
namespace: logging
data:
logtrail.json: |
{
"version" : 1,
"index_patterns" : [
{
"es": {
"default_index": "logstash-*"
},
"tail_interval_in_seconds": 10,
"es_index_time_offset_in_seconds": 0,
"display_timezone": "local",
"display_timestamp_format": "MMM DD HH:mm:ss",
"max_buckets": 500,
"default_time_range_in_days" : 0,
"max_hosts": 100,
"max_events_to_keep_in_viewer": 5000,
"fields" : {
"mapping" : {
"timestamp" : "#timestamp",
"hostname" : "kubernetes.host",
"program": "kubernetes.pod_name",
"message": "log"
},
"message_format": "{{{log}}}"
},
"color_mapping" : {
}
}]
}
After that you're ready to helm install with the values file specified via the -f flag.
Getting input with --set that matches to what the example in the values file has is a bit tricky. Following the example we want the values to be:
plugins:
enabled: true
values:
- logtrail,0.1.30,https://github.com/sivasamyk/logtrail/releases/download/v0.1.30/logtrail-6.4.2-0.1.30.zip
The plugin.values here is tricky because it is an array, which means you need to enclose with {}. And the relevant entry contains commas, which have to be escaped with backslash. To get it to match you can use:
helm install --name kibana --namespace logging stable/kibana --set plugins.enabled=true,plugins.values={"logtrail\,0.1.30\,https://github.com/sivasamyk/logtrail/releases/download/v0.1.30/logtrail-6.5.4-0.1.30.zip"}
If you add --dry-run --debug then you can see what the computed values are for any command you run, including with --set, so this can help check the match. This kind of value is easier to set with a custom values file referenced with -f as it avoids having to work out how the --set evaluates to values.

How to configure prometheus with alertmanager?

docker-compose.yml:
This is the docker-compose to run the prometheus, node-exporter and alert-manager service. All the services are running great. Even the health status in target menu of prometheus shows ok.
version: '2'
services:
prometheus:
image: prom/prometheus
privileged: true
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- ./alertmanger/alert.rules:/alert.rules
command:
- '--config.file=/etc/prometheus/prometheus.yml'
ports:
- '9090:9090'
node-exporter:
image: prom/node-exporter
ports:
- '9100:9100'
alertmanager:
image: prom/alertmanager
privileged: true
volumes:
- ./alertmanager/alertmanager.yml:/alertmanager.yml
command:
- '--config.file=/alertmanager.yml'
ports:
- '9093:9093'
prometheus.yml
This is the prometheus config file with targets and alerts target sets. The alertmanager target url is working fine.
global:
scrape_interval: 5s
external_labels:
monitor: 'my-monitor'
# this is where I have simple alert rules
rule_files:
- ./alertmanager/alert.rules
scrape_configs:
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
- job_name: 'node-exporter'
static_configs:
- targets: ['node-exporter:9100']
alerting:
alertmanagers:
- static_configs:
- targets: ['some-ip:9093']
alert.rules:
Just a simple alert rules to show alert when service is down
ALERT service_down
IF up == 0
alertmanager.yml
This is to send the message on slack when alerting occurs.
global:
slack_api_url: 'https://api.slack.com/apps/A90S3Q753'
route:
receiver: 'slack'
receivers:
- name: 'slack'
slack_configs:
- send_resolved: true
username: 'tara gurung'
channel: '#general'
api_url: 'https://hooks.slack.com/services/T52GRFN3F/B90NMV1U2/QKj1pZu3ZVY0QONyI5sfsdf'
Problems:
All the containers are working fine I am not able to figure out the exact problem.What am I really missing. Checking the alerts in prometheus shows.
Alerts
No alerting rules defined
Your ./alertmanager/alert.rules file is not included in your docker config, so it is not available in the container. You need to add it to the prometheus service:
prometheus:
image: prom/prometheus
privileged: true
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- ./alertmanager/alert.rules:/alertmanager/alert.rules
command:
- '--config.file=/etc/prometheus/prometheus.yml'
ports:
- '9090:9090'
And probably give an absolute path inside prometheus.yml:
rule_files:
- "/alertmanager/alert.rules"
You also need to make sure you alerting rules are valid. Please see the prometheus docs for details and examples. You alert.rules file should look something like this:
groups:
- name: example
rules:
# Alert for any instance that is unreachable for >5 minutes.
- alert: InstanceDown
expr: up == 0
for: 5m
Once you have multiple files, it may be better to add the entire directory as a volume rather than individual files.
If you need answers to this question see the explanation on this link
How to make alert rules visible on Prometheus User Interface?
Your alert rules inside the prometheus.yml should look like this
rule_files:
- "/etc/prometheus/alert.rules.yml"
You need to stop the alertmanager and prometheus containers and run this
docker run -d --name prometheus_ops -p 9191:9090 -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml -v $(pwd)/alert.rules.yml:/etc/prometheus/alert.rules.yml prom/prometheus
Verify if you can see the alert.rule config path : Prometheus container ID and go to cd /etc/prometheus
docker exec -it fa99f733f69b sh