Monitor Kafka Consumer Metrics with JMX

Monitor Kafka Consumer Metrics with JMX - apache-kafka

I try to export Kafka metrics per JMX to Prometheus and display them with Grafana, but I´m struggling to get the Consumer metrics (to be more precise this one:
kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+) )
Everytime I try to fetch this Mbean, it doesn´t even show up. I read all the time that I have to "look into the client", or "I´m looking in the broker metrics, but I need the consumer metrics", but nobody does explain how to do this, so I´m asking you guys if you could help me. Is there some kind of configuration, or special JMX Port to get Consumer metrics or something like that?
The pattern for my config file to look for MBeans:
- pattern : kafka.consumer<type=(.+), name=(.+), client-id=(.+)><>(Count|Value)
name: kafka_consumer_$1_$2
Labels:
clientId: "$3"
Also, i need to fetch the Metrics with JMX, because i dont have access to the Kafka server.
I´m using this project as an example: https://github.com/rama-nallamilli/kafka-prometheus-monitoring

The following two things are possible:
A. May be given client already disconnected from Kafka
B. May be this metric is not present on broker. It might be visible in the JVM application which is running the consumer code. I am not sure but here is how you can check:
Restart your consumer application with JMX enabled
Use visual vm to connect to the above jvm
It should show all the available JMX metrics.
If the metrics contain metrics of your choice then you were looking at wrong place (broker). If not then I am wrong.

I do not have exact configuration but 1 mistake that i can point out in your configuration is that, name can not be the matching pattern for consumer metrics.
Try to remove the pattern with this:
- pattern : kafka.consumer<type=(.+), client-id=(.+)><>(Count|Value)
For more reference you can check the
Apache kafka docs
I am also having the problem for creating a generic pattern for consumer and producer.
Will post here as soon as i get this figured out.
#xBoLLo

Related

Why are there kafka and brokers in the scdf trace shown in zipkin?

I have a spring cloud data flow environment created in kubernetes and a zipkin environment created as well. But when I look at the Dependencies in zipkin, I see that in addition to the application that exists in the stream, there is also a broker and kafka.
Is there anyone who can tell me why this is? And is there any way I can get broker and kafka to not show up.
It's like this image shows

That's because one of the brokers got resolved as being kafka (for example via special message headers) and the other didn't. It's either a bug in Sleuth or you're using an uninstrumented library.

How to add health check for topics in KafkaStreams api

I have a critical Kafka application that needs to be up and running all the time. The source topics are created by debezium kafka connect for mysql binlog. Unfortunately, many things can go wrong with this setup. A lot of times debezium connectors fail and need to be restarted, so does my apps then (because without throwing any exception it just hangs up and stops consuming). My manual way of testing and discovering the failure is checking kibana log, then consume the suspicious topic through terminal. I can mimic this in code but obviously no way the best practice. I wonder if there is the ability in KafkaStream api that allows me to do such health check, and check other parts of kafka cluster?
Another point that bothers me is if I can keep the stream alive and rejoin the topics when connectors are up again.

You can check the Kafka Streams State to see if it is rebalancing/running, which would indicate healthy operations. Although, if no data is getting into the Topology, I would assume there would be no errors happening, so you need to then lookup the health of your upstream dependencies.
Overall, sounds like you might want to invest some time into using monitoring tools like Consul or Sensu which can run local service health checks and send out alerts when services go down. Or at the very least Elasticseach alerting
As far as Kafka health checking goes, you can do that in several ways
Is the broker and zookeeper process running? (SSH to the node, check processes)
Is the broker and zookeeper ports open? (use Socket connection)
Are there important JMX metrics you can track? (Metricbeat)
Can you find an active Controller broker (use AdminClient#describeCluster)
Are there a required minimum number of brokers you would like to respond as part of the Controller metadata (which can be obtained from AdminClient)
Are the topics that you use having the proper configuration? (retention, min-isr, replication-factor, partition count, etc)? (again, use AdminClient)

How Does Prometheus Scrape a Kafka Topic?

I’m a network guy trying to build my first Kafka --> Prometheus --> Grafana pipeline. My Kafka broker has a topic which is being populated by an external producer. That’s great. But I can’t figure out how to configure my Prometheus server to scrape data from that topic as a Consumer.
I should also say that my Kafka node is running on my host Ubuntu machine (not in a Docker container). I also am running an instance of JMX Exporter when I run Kafka. Here’s how I start up Kafka on the Ubuntu command line:
KAFKA_OPTS="$KAFKA_OPTS -javaagent:/home/me/kafka_2.11-2.1.1/jmx_prometheus_javaagent-0.6.jar=7071:/home/Me/kafka_2.11-2.1.1/kafka-0-8-2.yml" \
./bin/kafka-server-start.sh config/server.properties &
Okay. My Prometheus (also a host process, not the Docker container version) can successfully pull a lot of metrics off of my Kafka. So I just need to figure out how to get Prometheus to read the messages within my topic. And I wonder is those messages are already visible? My topic is called “vflow.sflow,” and when I look at “scrapeable” metrics that is available on Kafka (TCP 7071), I do see these metrics:
From http://localhost:7071/metrics:
kafka_cluster_partition_replicascount{partition="0",topic="vflow.sflow",} 1.0
kafka_cluster_partition_insyncreplicascount{partition="0",topic="vflow.sflow",} 1.0
kafka_log_logendoffset{partition="0",topic="vflow.sflow",} 1.5357405E7
kafka_cluster_partition_laststableoffsetlag{partition="0",topic="vflow.sflow",} 0.0
kafka_log_numlogsegments{partition="0",topic="vflow.sflow",} 11.0
kafka_cluster_partition_underminisr{partition="0",topic="vflow.sflow",} 0.0
kafka_cluster_partition_underreplicated{partition="0",topic="vflow.sflow",} 0.0
kafka_log_size{partition="0",topic="vflow.sflow",} 1.147821017E10
kafka_log_logstartoffset{partition="0",topic="vflow.sflow",} 0.0
“Partition 0,” “Log Size,” “Log End Offset”… all those things look promising… I guess?
But please bear in mind that I’m completely new to the Kafka/JMX/Prometheus ecosystem. Question: do the above metrics describe my “vflow.sflow” topic? Can I use them to configure Prometheus to actually read the messages within the topic?
If so, can someone recommend a good tutorial for this? I’ve been playing around with my Prometheus YAML config files, but all I manage to do is crash the Prometheus process when I do so. Yes, I have been reading the large amount of online documentation and forum posts out there. Its a lot of information to digest, and its very, very easy to invest hours in documentation which proves to be a dead end.
Any advice for a newbie like me? General advice like “you’re on the right track, next look at X” or “you obviously don’t understand Y, spend more time looking at Z” will be def appreciated. Thanks!

When you add that argument from the Kafka container, it scrapes the MBeans of the JMX metrics, not any actual topic data, since Prometheus isn't a Kafka consumer
From that JMX information, you'd see metrics such as message rate and replica counts
If you'd like to read topic data, the Kafka Connect framework could be used, and there's a plugin for Influx, Mongo, and Elasticsearch, which are all good Grafana sources. I'm not sure if there's a direct Kafka to Prometheus importer, but I think it would require using the PushGateway

Kafka Connect - metrics through JMX

I am using confluent HDFS sink connector and would like to know how to get consumer properties to expose through either JMX or REST API.
I checked the following two properties, however, I don't know how to expose metrics to jmx port
connect-standalone.properties
consumer.properties

Set JMX_PORT when you launch Kafka Connect. e.g.
export JMX_PORT=4242
./bin/connect-distributed ./etc/kafka/connect-distributed.properties
You can then connect to JMX using JConsole, JMXTerm, etc.

Had the same issue a few days back - this stackoverflow link has one way how the issue was resolved. It was able to expose the metrics using jmx exporter, which got scraped by Prometheus.

Gathering `kafka.producer` metrics using JMX

I have a Kakfa broker running, which I am monitoring with JMX.
This broker is a docker container running as a process started with kafka-server-start.sh JMX port 9999 is exposed as and used as an environment variables.
When I connect to the JMX port and try to list all the domains, I get the following;
kafka
kafka.cluster
kafka.controller
kafka.coordinator.group
kafka.coordinator.transaction
kafka.log
kafka.network
kafka.server
kafka.utils
I dont see kafka.producer which is understandable because the producer for this Kafka broker are N numbers of different applications, but at this point I am confused.
How do I get the kafka.producer metrics as well.
Do I have to expose the kafka.producer metrics in each of N application that is acting as producer OR is there some configuration that start gathering kafka.producer metrics on the broker only.
What is the correct way of doing this. Please help.

Yes you are correct , to capture the producer JMX metrics , you need to enable JMX in all the processes which are running the kafka producer instance.

It might be helpful to rephrase producing as writing over an unreliable network in this context.
From this perspective, the most reasonable place to measure writing characteristics seems to be the client itself (i.e. in each "application" as you call it).
If messages between the producer and the broker are lost, you can still send stats to a local "metric store" for example (e.g. you could see a "spike" in record-retry-rate or some other relevant metric).
Additionally, pairing Kafka producer metrics with additional, local metrics might be extremely useful (JVM stats, detailed business metrics and so on). Keep in mind, that the client will almost definitely run on a different machine in a production environment, and might be affected by different factors, than the broker itself.
If you intend to monitor your client application (which will most likely happen anyway), then I'd simply do it there (i.e. the standard way).