Prometheus Alertmanager Slack notification newlines issue - Kubernetes

I've defined alerts for my Kubernetes pods, as described below, to notify through Slack.
I used the example from the official documentation for ranging over all received alerts, in order to loop over multiple alerts and render them in my Slack channel.
I do get notifications, but the newlines do not get rendered correctly somehow.
I'm new to Prometheus; any help is greatly appreciated.
Thanks.
detection:
  # Alert If:
  # 1. Pod is not in a running state.
  # 2. Container is killed because it's out of memory.
  # 3. Container is evicted.
  rules:
    groups:
    - name: not-running
      rules:
      - alert: PodNotRunning
        expr: kube_pod_status_phase{phase!="Running"} > 0
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is not running."
          description: 'Kubernetes pod {{ $labels.pod }} is not running.'
      - alert: KubernetesContainerOOMKilledOrEvicted
        expr: kube_pod_container_status_last_terminated_reason{reason=~"OOMKilled|Evicted"} > 0
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "kubernetes container killed/evicted (instance {{ $labels.instance }})"
          description: "Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} has been OOMKilled/Evicted."
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 3m
  repeat_interval: 4h
  receiver: slack-channel
  routes:
  - match:
      alertname: PodNotRunning
  - match:
      alertname: KubernetesContainerOOMKilledOrEvicted
notifications:
  receivers:
  - name: slack-channel
    slack_configs:
    - channel: kube-alerts
      title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
      text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"
How it gets rendered in the actual Slack channel:
Title: inst-1 down.\ninst-2 down.\ninst-3 down.\ninst-4 down.
Text: inst-1 down.\ninst-2 down.\ninst-3 down.\ninst-4 down
How I thought it would render:
Title: inst-1 down.
Text: inst-1 down.
Title: inst-2 down.
Text: inst-2 down.
Title: inst-3 down.
Text: inst-3 down.
Title: inst-4 down.
Text: inst-4 down.

Use {{ "\n" }} instead of a plain \n, and wrap the whole value in single quotes so that the inner double quotes don't terminate the YAML string.
Example:
...
slack_configs:
- channel: kube-alerts
  title: '{{ range .Alerts }}{{ .Annotations.summary }}{{ "\n" }}{{ end }}'
  text: '{{ range .Alerts }}{{ .Annotations.description }}{{ "\n" }}{{ end }}'

Related

Helm chart not picking up correct values

I'm trying to assign static IPs for Load Balancers in GKE to services by storing them in the values.yaml file as:
ip:
  sandbox:
    service1: xxx.xxx.201.74
    service2: xxx.xxx.80.114
  dev:
    service1: xxx.xxx.249.203
    service2: xxx.xxx.197.77
  test:
    service1: xxx.xxx.123.212
    service2: xxx.xxx.194.133
  prod:
    service1: xxx.xx.244.211
    service2: xxx.xxx.207.177
All works fine until I want to deploy to prod, which fails with:
Error: UPGRADE FAILED: template: chart-v1/templates/service2-service.yaml:24:28: executing "chart-v1/templates/service2-service.yaml" at <.Values.ip.prod.service2>: nil pointer evaluating interface {}.service2
helm.go:94: [debug] template: chart-v1/templates/service2-service.yaml:24:28: executing "chart-v1/templates/service2-service.yaml" at <.Values.ip.prod.service2>: nil pointer evaluating interface {}.service2
and the part for service2-service.yaml looks like:
apiVersion: v1
kind: Service
metadata:
  annotations:
    appName: {{ include "common.fullname" . }}
    componentName: service2
  labels:
    io.kompose.service: service2
  name: service2
spec:
  ports:
    - name: "{{ .Values.service.service2.ports.name }}"
      port: {{ .Values.service.service2.ports.port }}
      protocol: {{ .Values.service.service2.ports.protocol }}
      targetPort: {{ .Values.service.service2.ports.port }}
  type: LoadBalancer
  {{ if eq .Values.target.deployment.namespace "sandbox" }}
  loadBalancerIP: {{ .Values.ip.sandbox.service2 }}
  {{ else if eq .Values.target.deployment.namespace "dev" }}
  loadBalancerIP: {{ .Values.ip.dev.service2 }}
  {{ else if eq .Values.target.deployment.namespace "test" }}
  loadBalancerIP: {{ .Values.ip.test.service2 }}
  {{ else if eq .Values.target.deployment.namespace "prod" }}
  loadBalancerIP: {{ .Values.ip.prod.service2 }}
  {{ else }}
  {{ end }}
  selector:
    io.kompose.service: service2
status:
  loadBalancer: {}
Any clue why it is complaining that it is nil (empty)?
It could be due to a function changing the context (the dot) away from what is defined in values.yaml.
Normally, inside a range we can use $ for the global scope, e.g. appName: {{ include "common.fullname" $ }}.
When I tested the same template with a static value for appName it worked for me, so access from values.yaml is not the issue unless nil is actually being set at .Values.ip.prod.service2.
Otherwise, as you mentioned, guarding the multi-level nesting with {{ (.Values.ip.prod).service2 }} will solve the issue.
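For illustration, a hedged sketch of the loadBalancerIP block using a guarded lookup plus Helm's required function, so a missing value fails with a clear message instead of a nil pointer (target.deployment.namespace is taken from the question; everything else is illustrative):

  {{- $env := .Values.target.deployment.namespace }}
  {{- with (index (.Values.ip | default dict) $env) }}
  loadBalancerIP: {{ required (printf "ip.%s.service2 must be set in values" $env) .service2 }}
  {{- end }}

If the per-environment map itself is absent, the with block simply skips the field instead of failing the render.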

How to add a PersistentVolumeClaim to a deployment running GitLab AutoDevops?

What am I trying to achieve?
We are using a self-hosted GitLab instance and use GitLab AutoDevops to deploy our projects to a Kubernetes cluster. At the time of writing, we are only using one node within the cluster. For one of our projects it is important that the built application (i.e. the pod(s)) is able to access (read only) files stored on the Kubernetes cluster's node itself.
What have I tried?
Created a (hostPath) PersistentVolume (PV) on our cluster
Created a PersistentVolumeClaim (PVC) on our cluster (named "test-api-claim")
Now GitLab AutoDevops uses a default Helm chart to deploy the applications. In order to modify its behavior, I've added this chart to the project's repository (GitLab AutoDevops automatically uses the chart in a project's ./chart directory if found). So my line of thinking was to modify the chart so that the deployed pods use the PV and PVC which I created manually on the cluster.
Therefore I modified the deployment.yaml file from the auto-deploy-app chart. As you can see in the following code snippet, I have added the volumeMounts & volumes keys (not present in the default/original file). Scroll to the end of the snippet to see the added keys.
{{- if not .Values.application.initializeCommand -}}
apiVersion: {{ default "extensions/v1beta1" .Values.deploymentApiVersion }}
kind: Deployment
metadata:
  name: {{ template "trackableappname" . }}
  annotations:
    {{ if .Values.gitlab.app }}app.gitlab.com/app: {{ .Values.gitlab.app | quote }}{{ end }}
    {{ if .Values.gitlab.env }}app.gitlab.com/env: {{ .Values.gitlab.env | quote }}{{ end }}
  labels:
    app: {{ template "appname" . }}
    track: "{{ .Values.application.track }}"
    tier: "{{ .Values.application.tier }}"
    chart: "{{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}"
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
{{- if or .Values.enableSelector (eq (default "extensions/v1beta1" .Values.deploymentApiVersion) "apps/v1") }}
  selector:
    matchLabels:
      app: {{ template "appname" . }}
      track: "{{ .Values.application.track }}"
      tier: "{{ .Values.application.tier }}"
      release: {{ .Release.Name }}
{{- end }}
  replicas: {{ .Values.replicaCount }}
{{- if .Values.strategyType }}
  strategy:
    type: {{ .Values.strategyType | quote }}
{{- end }}
  template:
    metadata:
      annotations:
        checksum/application-secrets: "{{ .Values.application.secretChecksum }}"
        {{ if .Values.gitlab.app }}app.gitlab.com/app: {{ .Values.gitlab.app | quote }}{{ end }}
        {{ if .Values.gitlab.env }}app.gitlab.com/env: {{ .Values.gitlab.env | quote }}{{ end }}
{{- if .Values.podAnnotations }}
{{ toYaml .Values.podAnnotations | indent 8 }}
{{- end }}
      labels:
        app: {{ template "appname" . }}
        track: "{{ .Values.application.track }}"
        tier: "{{ .Values.application.tier }}"
        release: {{ .Release.Name }}
    spec:
      imagePullSecrets:
{{ toYaml .Values.image.secrets | indent 10 }}
      containers:
      - name: {{ .Chart.Name }}
        image: {{ template "imagename" . }}
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        {{- if .Values.application.secretName }}
        envFrom:
        - secretRef:
            name: {{ .Values.application.secretName }}
        {{- end }}
        env:
{{- if .Values.postgresql.managed }}
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: app-postgres
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-postgres
              key: password
        - name: POSTGRES_HOST
          valueFrom:
            secretKeyRef:
              name: app-postgres
              key: privateIP
{{- end }}
        - name: DATABASE_URL
          value: {{ .Values.application.database_url | quote }}
        - name: GITLAB_ENVIRONMENT_NAME
          value: {{ .Values.gitlab.envName | quote }}
        - name: GITLAB_ENVIRONMENT_URL
          value: {{ .Values.gitlab.envURL | quote }}
        ports:
        - name: "{{ .Values.service.name }}"
          containerPort: {{ .Values.service.internalPort }}
        livenessProbe:
          {{- if eq .Values.livenessProbe.probeType "httpGet" }}
          httpGet:
            path: {{ .Values.livenessProbe.path }}
            scheme: {{ .Values.livenessProbe.scheme }}
            port: {{ .Values.service.internalPort }}
          {{- else if eq .Values.livenessProbe.probeType "tcpSocket" }}
          tcpSocket:
            port: {{ .Values.service.internalPort }}
          {{- else if eq .Values.livenessProbe.probeType "exec" }}
          exec:
            command:
{{ toYaml .Values.livenessProbe.command | indent 14 }}
          {{- end }}
          initialDelaySeconds: {{ .Values.livenessProbe.initialDelaySeconds }}
          timeoutSeconds: {{ .Values.livenessProbe.timeoutSeconds }}
        readinessProbe:
          {{- if eq .Values.readinessProbe.probeType "httpGet" }}
          httpGet:
            path: {{ .Values.readinessProbe.path }}
            scheme: {{ .Values.readinessProbe.scheme }}
            port: {{ .Values.service.internalPort }}
          {{- else if eq .Values.readinessProbe.probeType "tcpSocket" }}
          tcpSocket:
            port: {{ .Values.service.internalPort }}
          {{- else if eq .Values.readinessProbe.probeType "exec" }}
          exec:
            command:
{{ toYaml .Values.readinessProbe.command | indent 14 }}
          {{- end }}
          initialDelaySeconds: {{ .Values.readinessProbe.initialDelaySeconds }}
          timeoutSeconds: {{ .Values.readinessProbe.timeoutSeconds }}
        resources:
{{ toYaml .Values.resources | indent 12 }}
{{- end -}}
        volumeMounts:
        - mountPath: /data
          name: test-pvc
      volumes:
      - name: test-pvc
        persistentVolumeClaim:
          claimName: test-api-claim
What is the problem?
Now when I trigger the Pipeline to deploy the application (using AutoDevops with my modified helm chart), I am getting this error:
Error: YAML parse error on auto-deploy-app/templates/deployment.yaml: error converting YAML to JSON: yaml: line 71: did not find expected key
Line 71 in the script refers to the valueFrom.secretKeyRef.name in the yaml:
- name: POSTGRES_HOST
  valueFrom:
    secretKeyRef:
      name: app-postgres
      key: privateIP
The weird thing is that when I delete the volumes and volumeMounts keys, it works as expected (and the valueFrom.secretKeyRef.name is still present and causes no trouble).
I am not using tabs in the yaml file and I double checked the indentation.
Two questions:
Could there be something wrong with my YAML?
Does anyone know of another solution to achieve my desired behavior (adding a PVC to the deployment so that the pods actually use it)?
General information
We use GitLab EE 13.12.11
For auto-deploy-image (which provides the helm chart I am referring to) we use version 1.0.7
Thanks in advance and have a nice day!
It seems that adding persistence is now supported in the default Helm chart.
Check the chart's pvc.yaml and deployment.yaml templates.
Given that, it should be enough to set the corresponding values in .gitlab/auto-deploy-values.yaml to meet your needs; check the defaults in the chart's values.yaml for more context.
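For example, something along these lines in .gitlab/auto-deploy-values.yaml might be enough. This is only a sketch; the exact keys (persistence.enabled, persistence.volumes, mount.path, claim.size here) must be verified against the values.yaml of the auto-deploy-app chart version in use:

# .gitlab/auto-deploy-values.yaml -- illustrative only, verify keys against the chart's values.yaml
persistence:
  enabled: true
  volumes:
  - name: data
    mount:
      path: /data           # where the pod should see the node's files
    claim:
      accessMode: ReadOnlyMany
      size: 1Gi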

Is it possible to avoid sending repeated Slack notifications for already fired alert?

Disclaimer: this is the first time I am using Prometheus.
I am trying to send a Slack notification every time a Job ends successfully.
To achieve this, I installed kube-state-metrics, Prometheus and AlertManager.
Then I created the following rule:
rules:
- alert: KubeJobCompleted
  annotations:
    identifier: '{{ $labels.instance }}'
    summary: Job Completed Successfully
    description: Job *{{ $labels.namespace }}/{{ $labels.job_name }}* is completed successfully.
  expr: |
    kube_job_spec_completions{job="kube-state-metrics"} - kube_job_status_succeeded{job="kube-state-metrics"} == 0
  labels:
    severity: information
And I added the Alertmanager receiver text (template):
{{ define "custom_slack_message" }}
{{ range .Alerts }}
{{ .Annotations.description }}
{{ end }}
{{ end }}
My current result: every time a new job completes successfully, I receive a Slack notification with the list of all jobs that completed successfully.
I don't mind receiving the whole list at first but after that I would like to receive notifications that contain only the newly completed job(s) in the specified group interval.
Is it possible?
Just add an extra line, for: 10m, to the rule; it will then list just the job(s) completed within the last 10 minutes:
rules:
- alert: KubeJobCompleted
  annotations:
    identifier: '{{ $labels.instance }}'
    summary: Job Completed Successfully
    description: Job *{{ $labels.namespace }}/{{ $labels.job_name }}* is completed successfully.
  expr: |
    kube_job_spec_completions{job="kube-state-metrics"} - kube_job_status_succeeded{job="kube-state-metrics"} == 0
  for: 10m
  labels:
    severity: information
I ended up using kube_job_status_completion_time and time() to dismiss past events (avoiding the event re-firing at each repeat interval).
rules:
- alert: KubeJobCompleted
  annotations:
    identifier: '{{ $labels.instance }}'
    summary: Job Completed Successfully
    description: Job *{{ $labels.namespace }}/{{ $labels.job_name }}* is completed successfully.
  expr: |
    time() - kube_job_status_completion_time < 60 and kube_job_spec_completions{job="kube-state-metrics"} - kube_job_status_succeeded{job="kube-state-metrics"} == 0
  labels:
    severity: information
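Independent of the rule expression, the Alertmanager route also controls how often an unchanged alert group is re-sent to Slack. A minimal sketch (the receiver name and intervals below are illustrative, not from the question):

route:
  receiver: slack-notifications   # illustrative receiver name
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m     # new alerts joining an existing group trigger a notification after this
  repeat_interval: 12h   # a group with no changes is re-sent only after this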

How to store the status of an expr in alert rules to use that in annotations?

I am setting up Prometheus alerts for whenever a node goes into "NotReady" in my Kubernetes cluster. I get notified on Slack whenever that happens. The problem is that I get notified with the same description "Node xxxx is in NotReady" even when the node comes back up. I am trying to use a variable for the Ready status of the node and use it in the annotations part.
I have tried using "vars" and "when" to assign the status to a variable and use it in the annotations.
- name: NodeNotReady
  rules:
  - alert: K8SNodeNotReadyAlert
    expr: kube_node_status_condition{condition="Ready",status="true"} == 0
    for: 3m
    vars:
    - ready_status: "Ready"
      when: kube_node_status_condition{condition="Ready",status="true"} == 1
    - ready_status: "Not Ready"
      when: kube_node_status_condition{condition="Ready",status="true"} == 0
    labels:
      severity: warning
    annotations:
      description: Node {{ $labels.node }} status is in {{ ready_status }}.
      summary: Node status {{ ready_status }} Alert!
I want to get these alerts:
1. When the node is NotReady: "Node prom-node status is in NotReady."
2. When the node is Ready: "Node prom-node status is in Ready."
The thing you're looking for here is the $value template variable. You should end up with something like this in the description:
Node {{ $labels.node }} status is in {{ if eq $value 1.0 }}Ready{{ else }}Not Ready{{ end }} status.
It is also worth reading up on alerting best practices before creating more alerts.
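As an alternative at the notification layer (a sketch, not part of the original answer): Alertmanager knows whether each alert is firing or resolved, so with send_resolved enabled the Slack text can branch on .Status instead of on $value. The receiver and channel names below are illustrative:

receivers:
- name: slack-channel
  slack_configs:
  - channel: node-alerts
    send_resolved: true
    text: >-
      {{ range .Alerts }}
      Node {{ .Labels.node }} is {{ if eq .Status "firing" }}NotReady{{ else }}Ready again{{ end }}.
      {{ end }}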

Custom alert for pod memory utilisation in Prometheus

I created alert rules for pod memory utilisation in Prometheus. Alerts show up perfectly in my Slack channel, but they do not contain the name of the pod, so it is difficult to understand which pod is having the issue.
It just shows [FIRING:35] (POD_MEMORY_HIGH_UTILIZATION default/k8s warning). But when I look into the "Alerts" section of the Prometheus UI, I can see the fired rules with their pod names. Can anyone help?
My alert notification template is as follows:
alertname: TargetDown
alertname: POD_CPU_HIGH_UTILIZATION
alertname: POD_MEMORY_HIGH_UTILIZATION
receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#devops'
    title: '{{ .CommonAnnotations.summary }}'
    text: '{{ .CommonAnnotations.description }}'
    send_resolved: true
I have added the options title: '{{ .CommonAnnotations.summary }}' and text: '{{ .CommonAnnotations.description }}' to my alert notification template, and now it shows the description. My description is description: pod {{$labels.pod}} is using high memory, but it only shows "is using high memory", without the pod name.
As mentioned in the linked article, you should check your alert rules and update them if necessary so that the labels you need end up in the annotations. See an example:
ALERT ElasticacheCPUUtilisation
  IF aws_elasticache_cpuutilization_average > 80
  FOR 10m
  LABELS { severity = "warning" }
  ANNOTATIONS {
    summary = "ElastiCache CPU Utilisation Alert",
    description = "ElastiCache CPU usage has breached the threshold set (80%) on cluster id {{ $labels.cache_cluster_id }}, now at {{ $value }}%",
    runbook = "https://mywiki.com/ElasticacheCPUUtilisation",
  }
To provide an external URL for your Prometheus GUI, apply this CLI argument to your Prometheus server and restart it:
-web.external-url=http://externally-available-url:9090/
After that, you can put the values into your Alertmanager configuration. See an example:
receivers:
- name: 'iw-team-slack'
  slack_configs:
  - channel: alert-events
    send_resolved: true
    api_url: https://hooks.slack.com/services/<your_token>
    title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] Monitoring Event Notification'
    text: >-
      {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      *Description:* {{ .Annotations.description }}
      *Graph:* <{{ .GeneratorURL }}|:chart_with_upwards_trend:> *Runbook:* <{{ .Annotations.runbook }}|:spiral_note_pad:>
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{ end }}
      {{ end }}
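Regarding the follow-up about the missing pod name: .CommonAnnotations only contains annotations whose values are identical across every alert in the group, and the pod label must actually be present on the alert (i.e. not aggregated away by the expression) for {{ $labels.pod }} to expand. A hedged sketch of a rule that keeps the label; the metric names and the 90% threshold are illustrative and depend on your kube-state-metrics/cAdvisor setup:

groups:
- name: pod-memory
  rules:
  - alert: POD_MEMORY_HIGH_UTILIZATION
    # keep namespace/pod in the result so they can be used in annotations
    expr: |
      sum by (namespace, pod) (container_memory_working_set_bytes{container!=""})
        /
      sum by (namespace, pod) (kube_pod_container_resource_limits{resource="memory"}) > 0.9
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} memory usage is above 90%"
      description: "pod {{ $labels.pod }} is using high memory"

On the Alertmanager side, range over .Alerts (as in the answer above) rather than relying on .CommonAnnotations when the grouped alerts have differing descriptions.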