Using Elastic Search Filters in Grafana Alert Labels - grafana

I have configured an Elasticsearch data source in Grafana, and I am filtering the error count in the logs of a Kubernetes deployment. It works as expected except for the labels in the message template.
I want to print the value of kubernetes.deployment.name, which I get from the Elasticsearch data source.
It shows up in the labels as follows:
[ var='A' labels={kubernetes.deployment.name=api-controller} value=271 ], [ var='B' labels={kubernetes.deployment.name=api-controller} value=0 ]
Following is the message I am printing in the description:
Error Count for {{ $labels.kubernetes.deployment.name }} has crossed the threshold of 5000 errors in 15 minutes
But when it is rendered in the description, it gives me:
Error Count for <no value> has crossed the threshold of 5000 errors in 15 minutes
Another way I tried was
{{ $labels["kubernetes.deployment.name"] }}
But it prints the whole expression as-is:
Error Count for {{ $labels["kubernetes.deployment.name"] }} has crossed the threshold of 5000 errors in 15 minutes

Try to use:
{{ index $labels "kubernetes.deployment.name" }}
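The dot-notation form fails because the Go templating engine parses {{ $labels.kubernetes.deployment.name }} as a chain of nested fields (kubernetes → deployment → name) rather than as one label whose name contains dots, hence the <no value>. The index function looks the label up by its full literal name, so a description such as
Error Count for {{ index $labels "kubernetes.deployment.name" }} has crossed the threshold of 5000 errors in 15 minutes
renders as
Error Count for api-controller has crossed the threshold of 5000 errors in 15 minutes
(using the api-controller value from the labels shown in the question).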

Related

Error while checking for service account using Lookup Function

{{- if not (lookup "v1" "ServiceAccount" "{{.Release.Namespace}}" "{{ .Release.preinstall }}" ) }}
<< another service account >>
{{- end }}
I am using the lookup function to check whether the service account already exists, so that another service account with the same functionality is not created (previously they were created simultaneously). But even after adding the lookup function they are still created simultaneously, and when I use the index function, or assign the names to variables ($namespace := .Release.Namespace and $service := .Release.preinstall), I get a nil pointer error.
Can anyone please help with the use of the lookup function and explain what I am doing wrong?
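For reference, lookup is normally called with plain template expressions as its arguments rather than quoted strings containing nested {{ }} blocks. A minimal sketch (here .Values.serviceAccountName is a hypothetical value standing in for the name the question abbreviates as .Release.preinstall):

{{- /* only render the ServiceAccount if it does not already exist in the release namespace */ -}}
{{- if not (lookup "v1" "ServiceAccount" .Release.Namespace .Values.serviceAccountName) }}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ .Values.serviceAccountName }}
  namespace: {{ .Release.Namespace }}
{{- end }}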

Filebeat : drop fields kubernetes again again

I'm trying to remove some fields. I am using Filebeat 7.14 on Kubernetes.
I tried what is described in the docs:
processors:
  - drop_fields:
      when:
        contains
      fields: ["host.os.name", "host.os.codename", "host.os.family"]
      ignore_missing: false
The container fails with:
ERROR instance/beat.go:989
Exiting: Failed to start crawler: starting input failed: Error while initializing input: missing or invalid condition
failed to initialize condition
If I drop the when condition and ignore_missing:
  - drop_fields:
      fields: ["host.os.name", "host.os.codename", "host.os.family"]
The fields are still present.
You don't seem to have a condition set under the when. Take a look at https://www.elastic.co/guide/en/beats/filebeat/7.14/defining-processors.html#conditions and make sure you've got something for it to match.
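For example, a minimal sketch of the processor with a complete contains condition (the kubernetes.namespace value here is just a placeholder to illustrate the syntax):

processors:
  - drop_fields:
      when:
        contains:
          kubernetes.namespace: "default"
      fields: ["host.os.name", "host.os.codename", "host.os.family"]
      ignore_missing: false

The when block is optional for drop_fields, but if it is present it must contain a complete condition.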

Visualize Jobber tasks on ELK (via Filebeat)

A Jobber Docker container (running periodic tasks) outputs on stdout, which is captured by Filebeat (with Docker containers autodiscovery flag on) and then sent to Logstash (within an ELK stack) or to Elasticsearch directly.
Now on Kibana, the document looks as such:
@timestamp Jan 20, 2020 @ 20:15:07.752
...
agent.type filebeat
container.image.name jobber_jobber
...
message {
  "job": {
    "command": "curl http://my.service/run",
    "name": "myperiodictask",
    "status": "Good",
    "time": "0 */5 * * * *"
  },
  "startTime": 1579540500,
  "stdout": "{\"startDate\":\"2020-01-20T16:35:00.000Z\",\"endDate\":\"2020-01-20T17:00:00.000Z\",\"zipped\":true,\"size\":3397}",
  "succeeded": true,
  "user": "jobberuser",
  "version": "1.4"
}
...
Note: the 'message' field above is a plain string containing a JSON object; it is shown pretty-printed here for readability.
My goal is to be able to query Elasticsearch on the message fields, so I can filter by Jobber task, for instance.
How can I make that happen?
I know Filebeat uses plugins and container tags to apply this or that filter: are there any for Jobber? If not, how can I do this?
Even better would be to be able to exploit the fields of the Jobber task result (under the 'stdout' field)! Could you please point me to ways to implement that?
Filebeat provides processors to handle such tasks.
Below is a configuration that decodes the JSON in the 'message' field and then the JSON in the nested 'stdout' field (both via the decode_json_fields processor), plus a few other Jobber-related steps.
Note that this example filters the events going through Filebeat by a 'custom-tag' label set on the Docker container hosting the Jobber process; the docker.container.labels.custom-tag: jobber condition should be adjusted to your use case.
filebeat.yml:
processors:
  # === Jobber events processing ===
  - if:
      equals:
        docker.container.labels.custom-tag: jobber
    then:
      # Drop Jobber events which are not job results
      - drop_event:
          when:
            not:
              regexp:
                message: "{.*"
      # Json-decode event's message part
      - decode_json_fields:
          when:
            regexp:
              message: "{.*"
          fields: ["message"]
          target: "jobbertask"
      # Json-decode message's stdout part
      - decode_json_fields:
          when:
            has_fields: ["jobbertask.stdout"]
          fields: ["jobbertask.stdout"]
          target: "jobbertask.result"
      # Drop event's decoded fields
      - drop_fields:
          fields: ["message"]
      - drop_fields:
          when:
            has_fields: ["jobbertask.stdout"]
          fields: ["jobbertask.stdout"]
The decoded fields are placed under the "jobbertask" field. This is to avoid index-mapping collisions on the root fields. Feel free to replace "jobbertask" with any other field name, taking care to avoid mapping collisions.
In my case, this works whether Filebeat sends the events to Logstash or to Elasticsearch directly.
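For completeness, a sketch of how the 'custom-tag' label used in the condition above might be set on the Jobber container, assuming it is started with Docker Compose (the service and image names are taken from the question; adjust to your setup):

services:
  jobber:
    image: jobber_jobber
    labels:
      # exposed to Filebeat as docker.container.labels.custom-tag
      custom-tag: jobber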

Prometheus statsd-exporter - how to tag status code in request duration metric (histogram)

I have set up statsd-exporter to collect metrics from a gunicorn web server. My goal is to filter the request duration metric to successful requests only (non-5xx), but in statsd-exporter there is no way to tag the status code on the duration metric. Can anyone suggest a way to add the status code to the request duration metric, or a way to filter only successful request durations in Prometheus?
In particular, I want to expose a successful-request duration histogram from statsd-exporter to Prometheus.
To export successful-request duration histogram metrics from the gunicorn web server to Prometheus, you would need to add this functionality in the gunicorn source code.
First take a look at the code that exports statsd metrics here.
You should see this piece of code:
status = resp.status
...
self.histogram("gunicorn.request.duration", duration_in_ms)
By changing that line to something like this:
self.histogram("gunicorn.request.duration.%d" % status, duration_in_ms)
you will from then on have metric names exported with the status code appended, like gunicorn_request_duration_200 or gunicorn_request_duration_404.
You can also go a step further and move the status code into a label by adding a mapping like the one below to your statsd_exporter configuration:
mappings:
  - match: gunicorn.request.duration.*
    name: "gunicorn_http_request_duration"
    labels:
      status: "$1"
      job: "gunicorn_request_duration"
So your metrics will now look like this:
# HELP gunicorn_http_request_duration Metric autogenerated by statsd_exporter.
# TYPE gunicorn_http_request_duration summary
gunicorn_http_request_duration{job="gunicorn_request_duration",status="200",quantile="0.5"} 2.4610000000000002e-06
gunicorn_http_request_duration{job="gunicorn_request_duration",status="200",quantile="0.9"} 2.4610000000000002e-06
gunicorn_http_request_duration{job="gunicorn_request_duration",status="200",quantile="0.99"} 2.4610000000000002e-06
gunicorn_http_request_duration_sum{job="gunicorn_request_duration",status="200"} 2.4610000000000002e-06
gunicorn_http_request_duration_count{job="gunicorn_request_duration",status="200"} 1
gunicorn_http_request_duration{job="gunicorn_request_duration",status="404",quantile="0.5"} 3.056e-06
gunicorn_http_request_duration{job="gunicorn_request_duration",status="404",quantile="0.9"} 3.056e-06
gunicorn_http_request_duration{job="gunicorn_request_duration",status="404",quantile="0.99"} 3.056e-06
gunicorn_http_request_duration_sum{job="gunicorn_request_duration",status="404"} 3.056e-06
gunicorn_http_request_duration_count{job="gunicorn_request_duration",status="404"} 1
And now, to query all metrics except those with a 5xx status, you can run the following in Prometheus:
gunicorn_http_request_duration{status=~"[^5].*"}
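Since the question specifically asks for a histogram rather than the summary shown above, the mapping can also declare a histogram timer type. A sketch, assuming a statsd_exporter version that supports timer_type with per-mapping buckets (newer releases call this observer_type; the bucket boundaries here are arbitrary):

mappings:
  - match: gunicorn.request.duration.*
    timer_type: histogram
    buckets: [0.01, 0.05, 0.1, 0.5, 1, 2.5, 5, 10]
    name: "gunicorn_http_request_duration"
    labels:
      status: "$1"
      job: "gunicorn_request_duration"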
Let me know if it was helpful.

How to write Prometheus alert rules for Mesos and HAProxy process down?

I am working on a task where I need to configure and validate Prometheus Alertmanager. Users should get an alert when the Mesos process or the HAProxy process is down. I tried to find alert rules for these on the internet but did not find anything suitable. Can anyone tell me how to write the alert rules for these? Basically I need the condition clause.
This depends on how you are monitoring things. Let's use HAProxy as an example and say you are using the HAProxy Exporter (https://github.com/prometheus/haproxy_exporter) to monitor it. The HAProxy Exporter includes a metric named haproxy_up, which indicates whether it successfully scraped HAProxy (when Prometheus in turn scraped the exporter). If HAProxy couldn't be scraped, haproxy_up will have a value of 0 and you can alert on that. Let's say your HAProxy Exporter has a Prometheus job name of haproxy-exporter. You could then write an alerting rule like this:
ALERT HAProxyDown
  IF haproxy_up{job="haproxy-exporter"} == 0
  FOR 5m
  LABELS {
    severity = "page"
  }
  ANNOTATIONS {
    summary = "HAProxy {{ $labels.instance }} down",
    description = "HAProxy {{ $labels.instance }} could not be scraped."
  }
This will send an alert if any HAProxy instance could not be scraped for more than 5 minutes.
If you wanted to know whether the exporter (instead of HAProxy itself) was down, you could instead use the expression up{job="haproxy-exporter"} == 0 to find any down HAProxy Exporter instances. Probably you'll want to check both actually.
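Note that the ALERT ... IF ... syntax above is the old Prometheus 1.x rule format; on Prometheus 2.x the same rule goes into a YAML rule file. A sketch of the equivalent:

groups:
  - name: haproxy
    rules:
      - alert: HAProxyDown
        expr: haproxy_up{job="haproxy-exporter"} == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "HAProxy {{ $labels.instance }} down"
          description: "HAProxy {{ $labels.instance }} could not be scraped."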
I can't say much about Mesos and its exporter since I don't have any experience with them, but I imagine it would be something similar.
Also, to export Mesos metrics you should use mesos-exporter: https://github.com/prometheus-junkyard/mesos_exporter
https://hub.docker.com/r/prom/mesos-exporter/
It also has a mesos_up metric. Your alert should look much like the HAProxy alert:
ALERT MesosMasterDown
  IF mesos_up{job="mesos-master-exporter"} == 0
  FOR 5m
  LABELS {
    severity = "page"
  }
  ANNOTATIONS {
    summary = "Mesos master {{ $labels.instance }} down",
    description = "Mesos master {{ $labels.instance }} could not be scraped."
  }

ALERT MesosSlaveDown
  IF mesos_up{job="mesos-slave-exporter"} == 0
  FOR 5m
  LABELS {
    severity = "page"
  }
  ANNOTATIONS {
    summary = "Mesos slave {{ $labels.instance }} down",
    description = "Mesos slave {{ $labels.instance }} could not be scraped."
  }