How can I make Grafana Loki parse HAProxy logs and assign labels to specific fields in each log line?

I'm trying to get Grafana Loki and Promtail working in my environment. Our goal is to pull information from /var/logs/haproxy.log to track the traffic hitting each of our servers, specifically client IP addresses, so we can graph it over time. HAProxy has an exporter service that historically works with Prometheus; however, we're unable to set up the exporter due to specific security requirements on our end, and doing so would involve a reboot we don't want to do at the moment. So we've discovered Loki by Grafana, which can pull the raw log, but it's up to us to design a proper regular-expression configuration that extracts the information we want.
Long story long, I've managed to set up Loki without much of an issue, and the same goes for Promtail. However, I ran into trouble configuring Promtail to grab the information we want from our log files. I found a regular expression somebody else had written, along with some labels, but the labeling does not work properly in Grafana, so I'm kind of stuck. Below is the Promtail config file with two stages: one to parse the log data and one to label it.
I'm not sure this is the best way to approach this, but I'm stuck and don't know what to do. Is there a better way to grab the specific information I want from the HAProxy logs? Or is anyone able to help me make a regular expression/label for the information I want?
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  # This file records Promtail's log offsets and is updated on every collection,
  # so even if the service goes down, the next restart resumes from the offset
  # recorded here.
  filename: /tmp/positions.yaml

clients:
  - url: http://10.10.140.53:3100/loki/api/v1/push

scrape_configs:
  # The following section is very similar to a Prometheus scrape configuration.
  - job_name: system
    static_configs:
      - targets:
          - 10.10.140.53
        labels:
          # Static labels: every log under this job carries at least these two.
          job: haproxy
          # The HAProxy log line itself does not carry a standard log level, so
          # Loki cannot work one out. If your logs include a standard level
          # marker, this label is unnecessary.
          level: info
          __path__: /var/log/*log.1
    pipeline_stages:
      - match:
          # This part processes the logs; the selector filters the log streams
          # that meet the criteria.
          selector: '{job="haproxy"}'
          stages:
            - regex:
                # An RE2-format regular expression; (?P<name>...) captures the
                # matching part into a variable.
                expression: '^[^[]+\s+(\w+)\[(\d+)\]:([^:]+):(\d+)\s+\[([^\]]+)\]\s+[^\s]+\s+(\w+)\/(\w+)\s+(\d+)\/(\d+)\/(\d+)\/(\d+)\/(\d*)\s+(\d+)\s+(\d+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+)\s(\d+)\/(\d+)\/(\d+)\/(\d+)\/(\d+)\s+(\d+)\/(\d+)\s+\{([^}]*)\}\s\{([^}]*)\}\s+\"([^"]+)\"$'
            - output:
                # Drop the timestamp and other leading fields by emitting the
                # output variable as the new log line.
                source: output
      - match:
          # The same selector as above, but the log stream has already been
          # processed once, so the unwanted information no longer exists.
          selector: '{job="haproxy"}'
          stages:
            - regex:
                expression: 'frontend:(?P<frontend>\S+) haproxy:(?P<haproxy>\S+) client:(?P<client>\S+) method:(?P<method>\S+) code:(?P<code>\S+) url:(?P<url>\S+) destination:(?P<destination>\S+)}?$'
            - labels:
                # Labels generated dynamically from the captured groups.
                frontend:
                method:
                code:
                destination:
Here's a log file example:
Dec 13 11:35:50 haproxy haproxy[8733]: frontend:app_https_frontend/haproxy/10.10.150.53:443 client:111.222.333.444:38034 GMT:13/Dec/2022:11:35:50 +0000 body:- request:GET /bower_components/modernizr/modernizr.js?v=3.30.0 HTTP/1.1
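Note that the first-stage expression above appears to target HAProxy's default httplog format, which does not match this line: the log output has already been customized into frontend:/client:/GMT:/body:/request: fields. A single-stage pipeline that matches the example line directly might look like the sketch below (the capture names are my own choice, and the GMT and body fields are skipped rather than labeled):

```yaml
pipeline_stages:
  - match:
      selector: '{job="haproxy"}'
      stages:
        - regex:
            # One pass over the raw line; (?P<name>...) exposes each capture
            # to the later stages.
            expression: 'frontend:(?P<frontend>\S+) client:(?P<client_ip>[\d.]+):\d+ GMT:\S+ \S+ body:\S+ request:(?P<method>\S+) (?P<url>\S+)'
        - labels:
            # Promote only low-cardinality captures to labels; extract
            # client_ip at query time in LogQL instead of turning every
            # client IP into its own label/stream.
            frontend:
            method:
```

The important caveat is cardinality: making the client IP a label would create one stream per client, so it is usually better left in the log line and parsed in the query.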

Related

Grafana dashboard variable from Loki logs

I'm a beginner with Grafana Loki, and we have a running instance which works without issue (we can see the logs themselves). Now we want to define some variables and monitor them in the dashboard.
Below is one of our logs, forwarded through promtail -> loki -> grafana, belonging to the job "mqtt_log".
We want to extract the "534654234" and the "1" from the log as two variables and monitor them in the dashboard.
2022-11-02 12:16:23 mqtt_log 2022-11-02 12:16:23,428 - AliyunMqtt - INFO - elevator/534654234/cabin/position/: b'{"Name":"Group.Elevators{EquipmentNumber=534654234}.Cabins:0.Position","Value":"{"Group":"1"}","Representation":"Live","TimeStamp":1667362583365}'
The problem is we don't know how to define the variables. Can anyone share some comments? Thanks.
You can't create dynamic dashboard variables from parsed logs (only hardcoded ones); you can do that only from existing labels.
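You can still surface the parsed values in panel queries at query time, though. A LogQL sketch (assuming the stream is selectable as {job="mqtt_log"}; the regexp and the extracted field names are my own):

```
{job="mqtt_log"}
  | regexp `EquipmentNumber=(?P<equipment>\d+).+"Group":"(?P<group>\d+)"`
  | line_format "equipment={{.equipment}} group={{.group}}"
```

A panel can then filter or aggregate on the extracted fields (e.g. count_over_time per equipment), but for dashboard variables only labels work, as noted above.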

Request URI too large for Grafana - kubernetes dashboard

We are running nearly 100 instances in production for a Kubernetes cluster and using a Prometheus server to build Grafana dashboards. To monitor disk usage, the query below is used:
(sum(node_filesystem_size_bytes{instance=~"$Instance"}) - sum(node_filesystem_free_bytes{instance=~"$Instance"})) / sum(node_filesystem_size_bytes{instance=~"$Instance"})
As the instance IPs get substituted in and we are using nearly 80 instances, I am getting the error "Request URI too large". Can someone help fix this issue?
You only need to specify the instances once and use the on matching operator to get their matching series:
(sum(node_filesystem_size_bytes{instance=~"$Instance"})
- on(instance) sum(node_filesystem_free_bytes))
/ on(instance) sum(node_filesystem_size_bytes)
Consider also adding a unifying label to your time series so you can do something like ...{instance_type="group-A"} instead of explicitly specifying instances.

Prometheus using multiple target

We need to monitor several targets with Prometheus. When we had a short list of targets it was not a problem to modify the config, but now we need to add many targets (50-70 new ones) from different clusters.
My question is whether there is a more elegant way to achieve this
instead of using it like this
- job_name: blackbox-http # To get metrics about the exporter's targets
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - http://clusterA
        - https://clusterA
        - http://clusterB
        - http://clusterC
        - http://clusterC
  ...
Maybe mount additional files for each cluster? I mean, provide a file with targets for clusterA only, another file for clusterB only, etc. Is that possible?
And the same for jobs: mount each job from a file.
When you have a growing or variable list of targets, the best way of managing the job definition is to use SRV records instead of static_configs.
With SRV records you only need to define a dns_sd_config with a single name that is resolved via a DNS query; then you don't need to change the configuration every time you add a new target, you only add it to the DNS record.
An example from the documentation here adapted to your question:
- job_name: 'myjob'
  metrics_path: /probe
  params:
    module: [http_2xx]
  dns_sd_configs:
    - names:
        - 'telemetry.http.srv.example.org'
        - 'telemetry.https.api.srv.example.org'
You can use an internal DNS service to generate those records, and if you have targets with http and https mixed you probably need to have two records because the SRV record defines the port to use.
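If DNS isn't an option, the file-per-cluster layout described in the question is also supported directly via file_sd_configs: Prometheus watches the listed files and picks up changes without a restart. A sketch (the file paths are placeholders):

```yaml
- job_name: blackbox-http
  metrics_path: /probe
  params:
    module: [http_2xx]
  file_sd_configs:
    # Prometheus re-reads these files on change; no reload or restart needed
    # when a cluster's target list is edited.
    - files:
        - /etc/prometheus/targets/clusterA.yml
        - /etc/prometheus/targets/clusterB.yml
```

Each file then holds just that cluster's targets, e.g. `- targets: ['http://clusterA', 'https://clusterA']`.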

How To Reduce Prometheus(Federation) Scrape Duration

I have a Prometheus federation with two Prometheus servers: one per Kubernetes cluster and a central one to rule them all.
Over time the scrape durations increase. At some point the scrape duration exceeds the timeout duration, and then metrics get lost and alerts fire.
I'm trying to reduce the scrape duration by dropping metrics, but this is an uphill battle, more like Sisyphus than Prometheus.
Does anyone know a way to reduce the scrape time without losing metrics and without having to drop more and more as time progresses?
Thanks in advance!
Per Prometheus' documentation, these settings determine the global timeout and alerting rules evaluation frequency:
global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]
  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]
  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]
...and for each scrape job the configuration allows setting job-specific values:
# The job name assigned to scraped metrics by default.
job_name: <job_name>
# How frequently to scrape targets from this job.
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]
# Per-scrape timeout when scraping this job.
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]
Without knowing more about the number of targets and the number of metrics per target, I can only suggest configuring an appropriate scrape_timeout per job and adjusting the global evaluation_interval accordingly.
Another option, in combination with the suggestion above or on its own, is to have Prometheus instances dedicated to scraping non-overlapping sets of targets. That makes it possible to scale Prometheus and to have a different evaluation_interval per set of targets: for example, a longer scrape_timeout and a less frequent evaluation_interval (higher value) for jobs that take longer, so that they don't affect other jobs.
Also, check if an exporter isn't misbehaving by accumulating metrics over time instead of just providing current readings at the time of scraping - otherwise, the list of what's returned to prometheus will keep on growing over time.
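As a concrete sketch of the per-job override (the job name and timing values here are illustrative placeholders, not recommendations):

```yaml
scrape_configs:
  - job_name: federate-slow-cluster
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{job=~".+"}'
    # Give this slow federation scrape more headroom without touching
    # other jobs; scrape_timeout must stay <= scrape_interval.
    scrape_interval: 2m
    scrape_timeout: 100s
```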
It isn't recommended to build data replication on top of Prometheus federation, since it doesn't scale with the number of active time series, as can be seen in the described case. It is better to set up data replication via the Prometheus remote_write protocol. For example, add the following lines to the Prometheus config in order to enable data replication to a VictoriaMetrics remote storage located at the given url:
remote_write:
  - url: http://victoriametrics-host:8428/api/v1/write
The following docs may be useful for further reading:
remote_write config docs
supported remote storage systems in Prometheus
remote_write tuning docs

Filebeat kafka output use filename as key

I want to use Filebeat 5.4.0 to ship logs to Kafka. My logs are all Docker container logs, in /var/lib/docker/containers/*/${container_name}.log, or soft links in /var/log/containers/${appname}-${container_name}.log.
I want to send all app logs to one topic in Kafka, and my requirements are:
Logs from the same container must go to the same partition, in order.
Each message must contain the appname and the container_name it came from.
And I'm facing two problems:
How do I get logs from a soft link?
How do I get the appname and container_name from the filename and set them as the key of output.kafka?
Beats are supposed to be lightweight; if you want to do more filtering, that is what Logstash is for. You can use Filebeat + Logstash + Kafka, and do the field extraction in a Logstash filter before sending to Kafka.
You can also use the 'type' property in Filebeat to map the log paths, like below:
...
paths:
  - "/var/log/container/${appname}-${container_name}"
document_type: log

output.kafka:
  ...
  key: '%{[type]}'
...
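For the second question, one approach is to let Logstash parse the file path itself. A sketch, assuming Filebeat 5.x delivers the path in the event's source field and the /var/log/containers/${appname}-${container_name}.log naming from the question (the topic name is a placeholder, and the pattern assumes appname contains no dash):

```
filter {
  grok {
    # Pull appname and container_name out of the symlink path shipped by Filebeat.
    match => { "source" => "/var/log/containers/%{DATA:appname}-%{GREEDYDATA:container_name}\.log" }
  }
}

output {
  kafka {
    topic_id => "app-logs"
    # Keying by container_name keeps each container's messages on one
    # partition, preserving their order.
    message_key => "%{container_name}"
  }
}
```

For the first question, the Filebeat 5.x prospector config also has a symlinks option (symlinks: true) to follow soft links; check whether your exact 5.4.0 release supports it.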