How Does Prometheus Scrape a Kafka Topic? - apache-kafka

I’m a network guy trying to build my first Kafka --> Prometheus --> Grafana pipeline. My Kafka broker has a topic which is being populated by an external producer. That’s great. But I can’t figure out how to configure my Prometheus server to scrape data from that topic as a Consumer.
I should also say that my Kafka node is running on my host Ubuntu machine (not in a Docker container). I also am running an instance of JMX Exporter when I run Kafka. Here’s how I start up Kafka on the Ubuntu command line:
KAFKA_OPTS="$KAFKA_OPTS -javaagent:/home/me/kafka_2.11-2.1.1/jmx_prometheus_javaagent-0.6.jar=7071:/home/Me/kafka_2.11-2.1.1/kafka-0-8-2.yml" \
./bin/kafka-server-start.sh config/server.properties &
Okay. My Prometheus server (also a host process, not the Docker container version) can successfully pull a lot of metrics off of my Kafka broker. So I just need to figure out how to get Prometheus to read the messages within my topic. And I wonder whether those messages are already visible? My topic is called “vflow.sflow,” and when I look at the “scrapeable” metrics that are available from Kafka (TCP 7071), I do see these metrics:
From http://localhost:7071/metrics:
kafka_cluster_partition_replicascount{partition="0",topic="vflow.sflow",} 1.0
kafka_cluster_partition_insyncreplicascount{partition="0",topic="vflow.sflow",} 1.0
kafka_log_logendoffset{partition="0",topic="vflow.sflow",} 1.5357405E7
kafka_cluster_partition_laststableoffsetlag{partition="0",topic="vflow.sflow",} 0.0
kafka_log_numlogsegments{partition="0",topic="vflow.sflow",} 11.0
kafka_cluster_partition_underminisr{partition="0",topic="vflow.sflow",} 0.0
kafka_cluster_partition_underreplicated{partition="0",topic="vflow.sflow",} 0.0
kafka_log_size{partition="0",topic="vflow.sflow",} 1.147821017E10
kafka_log_logstartoffset{partition="0",topic="vflow.sflow",} 0.0
“Partition 0,” “Log Size,” “Log End Offset”… all those things look promising… I guess?
But please bear in mind that I’m completely new to the Kafka/JMX/Prometheus ecosystem. Question: do the above metrics describe my “vflow.sflow” topic? Can I use them to configure Prometheus to actually read the messages within the topic?
If so, can someone recommend a good tutorial for this? I’ve been playing around with my Prometheus YAML config files, but all I manage to do is crash the Prometheus process. Yes, I have been reading the large amount of online documentation and forum posts out there. It’s a lot of information to digest, and it’s very, very easy to invest hours in documentation that proves to be a dead end.
Any advice for a newbie like me? General advice like “you’re on the right track, next look at X” or “you obviously don’t understand Y, spend more time looking at Z” will definitely be appreciated. Thanks!

When you add that -javaagent argument to the Kafka process, it exposes the MBeans of the JMX metrics, not any actual topic data; Prometheus isn't a Kafka consumer.
From that JMX information, you'd see metrics such as message rates and replica counts.
If you'd like to read topic data, the Kafka Connect framework could be used, and there are sink connectors for InfluxDB, MongoDB, and Elasticsearch, which are all good Grafana sources. I'm not sure there's a direct Kafka-to-Prometheus importer, but I think it would require using the PushGateway.
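If you did go the PushGateway route, a minimal sketch of a standalone consumer that counts messages from the topic and pushes that count to a PushGateway could look like the following; this assumes the kafka-python and prometheus_client packages, and the broker address, PushGateway address, metric name, and job name are all placeholders rather than anything Kafka or Prometheus provides out of the box:

# Hypothetical sketch: consume vflow.sflow and push a message counter to a Prometheus PushGateway.
from kafka import KafkaConsumer
from prometheus_client import CollectorRegistry, Counter, push_to_gateway

registry = CollectorRegistry()
consumed = Counter('vflow_sflow_messages_consumed_total',
                   'Messages consumed from the vflow.sflow topic',
                   registry=registry)

consumer = KafkaConsumer('vflow.sflow',
                         bootstrap_servers='localhost:9092',
                         auto_offset_reset='latest')

for count, _message in enumerate(consumer, start=1):
    consumed.inc()
    if count % 1000 == 0:  # push periodically rather than on every message
        push_to_gateway('localhost:9091', job='vflow_sflow_consumer', registry=registry)

Note that even this only gets counts and rates into Prometheus; Prometheus stores numeric time series, not the message payloads themselves.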

Related

Kafka consumer lag monitoring visualization

I'm new to Kafka. While studying Kafka, I learned that monitoring consumer lag is needed. When I searched Google and the docs, I found a few approaches:
Kafka - Prometheus - Grafana
Kafka - Burrow - some DB - Grafana
Kafka - burrow_stat? (I can't understand what it is..)
Kafka - Datadog
What I want to ask is:
The documentation says that Burrow is for monitoring; can I visualize it as a graph (dashboard) without other tools like Grafana, Kibana, or Datadog?
I'm just trying to keep the pipeline short. What would be the simplest way to visualize consumer lag?
If you are doing the setup in an organisation, Datadog or Prometheus is probably the way to go. You can capture other Kafka-related metrics as well. These agents also have integrations with many other tools besides Kafka and are a good common choice for monitoring.
If you are just doing it for a personal POC-type project and you just want to view the lag, I find CMAK very useful (https://github.com/yahoo/CMAK). It does not keep historical data, but it provides a good current visual state of the Kafka cluster, including lag.
For cluster-wide metrics you can use kafka_exporter (https://github.com/danielqsj/kafka_exporter), which exposes some very useful cluster metrics (including consumer lag) and is easy to integrate with Prometheus and visualize with Grafana.
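As a rough sketch of the Prometheus side, the exporter simply becomes another scrape target; the address below assumes kafka_exporter's default port on the same host:

# prometheus.yml (fragment) - hypothetical job name and target address
scrape_configs:
  - job_name: 'kafka-exporter'
    static_configs:
      - targets: ['localhost:9308']

The consumer-group lag metrics it exposes are then what you would graph in Grafana.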
Burrow is extremely effective and specialised in monitoring consumer lag. Burrow is good at calibrating consumer offsets and, more importantly, at validating whether the reported lag is a real problem or not. It has an integration with PagerDuty so that alerts are pushed to the necessary parties.
https://community.cloudera.com/t5/Community-Articles/Monitoring-Kafka-with-Burrow-Part-1/ta-p/245987
What Burrow has:
A non-threshold-based lag monitoring algorithm capable of evaluating potential slowdowns.
Integration with PagerDuty.
Exporters to Prometheus, AppDynamics, etc. for historical metrics.
A pluggable UI.
If you are looking for a quick solution, you can deploy Burrow followed by the Burrow front end, https://github.com/GeneralMills/BurrowUI

Kafka Producer jmx metrics missing

We have Spring Boot applications deployed on OKD (The Origin Community Distribution of Kubernetes that powers Red Hat OpenShift). Without much tweaking by the devops team, Prometheus scraped Kafka consumer metrics from the kubernetes-service-endpoints exporter job, as well as some producer metrics, but only for the Kafka Connect API, not for the standard Kafka producer API. This, I guess, is the configuration for that job:
https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml
What is needed to change in scrape config in order to collect what's been missing?
This issue with Micrometer is the source of the problem.
So, we could add the JMX exporter, or wait for the issue to be resolved.
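If you go the JMX exporter route, one hedged option (a sketch, not the original team's setup) is to attach the jmx_prometheus_javaagent to the Spring Boot application itself, the same way the broker is instrumented in the first question above; the jar path, port, and rules file here are placeholders:

java -javaagent:/opt/jmx_prometheus_javaagent.jar=9404:/opt/kafka-client-rules.yml \
     -jar my-spring-boot-app.jar

The rules file would then need patterns matching the kafka.producer MBeans, analogous to the kafka.consumer patterns discussed further down.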

Ingest Streaming Data to Kafka via http

I am very new to Kafka and streaming data in general. What I am trying to do is ingest data that is sent via HTTP into Kafka. My research has brought me to the Confluent REST Proxy, but I can't get it to work.
What I currently have is Kafka running with a single node and a single broker, plus Kafka Manager, in Docker containers.
Unfortunately I can't run the full Confluent Platform with Docker since I don't have enough memory available on my machine.
In essence my question is: how do I set up a development environment where data is ingested into Kafka over HTTP?
Any help is highly appreciated!
You don't need the "full Confluent Platform" (KSQL, Control Center, etc. included).
Zookeeper, Kafka, the REST Proxy, and optionally the Schema Registry should only take up to 4 GB of RAM total. If you don't even have that, then you'll need to buy more RAM.
Note that Zookeeper and Kafka do not need to run on the same machines as the Schema Registry or REST Proxy, so if you have multiple machines, you can save some resources that way as well.
To run one Kafka broker, ZooKeeper, and the Schema Registry, 1 GB is usually enough (in dev).
If for some reason you do not want to use the Confluent REST Proxy, you can write your own. It's quite straightforward: on each request, parse the incoming JSON, validate the data, construct your message (in Avro?), and produce it to Kafka.
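As a minimal sketch of that hand-rolled approach (plain JSON rather than Avro, using Flask and kafka-python, with the broker address, endpoint, port, and topic name all made up for illustration):

# Hypothetical HTTP-to-Kafka bridge: parse JSON, do a basic validity check, produce to Kafka.
import json
from flask import Flask, request, jsonify
from kafka import KafkaProducer

app = Flask(__name__)
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)

@app.route('/ingest', methods=['POST'])
def ingest():
    payload = request.get_json(silent=True)   # parse the incoming JSON
    if payload is None:
        return jsonify(error='invalid JSON'), 400
    producer.send('my-topic', value=payload)   # produce it to Kafka
    return jsonify(status='queued'), 202

if __name__ == '__main__':
    app.run(port=8080)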
In this article, you'll find some configuration to constrain Kafka and ZooKeeper heap memory: https://medium.com/@saabeilin/kafka-hands-on-part-i-development-environment-fc1b70955152
Here you can read how to produce/consume messages with Python:
https://medium.com/@saabeilin/kafka-hands-on-part-ii-producing-and-consuming-messages-in-python-44d5416f582e
Hope these help!

Gathering `kafka.producer` metrics using JMX

I have a Kafka broker running, which I am monitoring with JMX.
This broker is a Docker container running a process started with kafka-server-start.sh. JMX port 9999 is exposed and passed in as an environment variable.
When I connect to the JMX port and try to list all the domains, I get the following:
kafka
kafka.cluster
kafka.controller
kafka.coordinator.group
kafka.coordinator.transaction
kafka.log
kafka.network
kafka.server
kafka.utils
I don't see kafka.producer, which is understandable because the producers for this Kafka broker are N different applications, but at this point I am confused.
How do I get the kafka.producer metrics as well?
Do I have to expose the kafka.producer metrics in each of the N applications acting as producers, OR is there some configuration that starts gathering kafka.producer metrics on the broker only?
What is the correct way of doing this? Please help.
Yes, you are correct: to capture the producer JMX metrics, you need to enable JMX in all the processes that are running a Kafka producer instance.
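For example (a hedged sketch, not part of the original answer), the standard JVM system properties for exposing an unauthenticated JMX port on one of those producer applications look roughly like this; the port and jar name are placeholders, and outside a lab you would want authentication and SSL enabled:

java \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9998 \
  -Dcom.sun.management.jmxremote.rmi.port=9998 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -jar producer-app.jar

Alternatively, each producer application could be started with the jmx_prometheus_javaagent, the same way the broker is started in the first question above, so Prometheus can scrape it directly.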
It might be helpful to rephrase producing as writing over an unreliable network in this context.
From this perspective, the most reasonable place to measure writing characteristics seems to be the client itself (i.e. in each "application" as you call it).
If messages between the producer and the broker are lost, you can still send stats to a local "metric store" (e.g. you might see a spike in record-retry-rate or some other relevant metric).
Additionally, pairing Kafka producer metrics with additional local metrics can be extremely useful (JVM stats, detailed business metrics, and so on). Keep in mind that the client will almost certainly run on a different machine in a production environment, and might be affected by different factors than the broker itself.
If you intend to monitor your client application (which will most likely happen anyway), then I'd simply do it there (i.e. the standard way).

Monitor Kafka Consumer Metrics with JMX

I'm trying to export Kafka metrics via JMX to Prometheus and display them with Grafana, but I'm struggling to get the consumer metrics (to be more precise, this one:
kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+) )
Every time I try to fetch this MBean, it doesn't even show up. I read all the time that I have to "look into the client", or "I'm looking at the broker metrics, but I need the consumer metrics", but nobody explains how to do this, so I'm asking you guys if you could help me. Is there some kind of configuration, or a special JMX port, to get consumer metrics, or something like that?
The pattern in my config file to look for MBeans:
- pattern: kafka.consumer<type=(.+), name=(.+), client-id=(.+)><>(Count|Value)
  name: kafka_consumer_$1_$2
  labels:
    clientId: "$3"
Also, I need to fetch the metrics with JMX, because I don't have access to the Kafka server.
I'm using this project as an example: https://github.com/rama-nallamilli/kafka-prometheus-monitoring
The following two things are possible:
A. Maybe the given client has already disconnected from Kafka.
B. Maybe this metric is not present on the broker. It might be visible in the JVM application that is running the consumer code. I am not sure, but here is how you can check:
Restart your consumer application with JMX enabled.
Use VisualVM to connect to that JVM.
It should show all the available JMX metrics.
If they include the metrics you are looking for, then you were looking in the wrong place (the broker). If not, then I am wrong.
I do not have the exact configuration, but one mistake I can point out in your configuration is that name cannot be part of the matching pattern for consumer metrics.
Try replacing the pattern with this:
- pattern: kafka.consumer<type=(.+), client-id=(.+)><>(Count|Value)
For more reference you can check the Apache Kafka docs.
I am also having trouble creating a generic pattern for consumer and producer metrics.
I will post here as soon as I get this figured out.
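For what it's worth, a hedged guess at how the full rule might look once name= is dropped from the pattern suggested above; the metric name and capture-group numbering are my own completion, not a confirmed working config:

- pattern: kafka.consumer<type=(.+), client-id=(.+)><>(Count|Value)
  name: kafka_consumer_$1_$3
  labels:
    clientId: "$2"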