How to monitor Apache Kafka using Nagios? - apache-kafka

Is there a way to monitor my Kafka cluster using Nagios? Is there any working plugin, API, or other tool to check broker status, partition status, memory status, current offsets, and other valuable metrics from my cluster?

We use Nagios to monitor Kafka JMX metrics (we use JMXeval, but any JMX monitoring script for Nagios will do). JMX exposes many useful metrics, such as memory usage, lag, and the number of offline partitions.
I highly recommend this article about Kafka monitoring, which has many useful tips on what to monitor: https://blog.serverdensity.com/how-to-monitor-kafka/
Because JMX is disabled by default, you need to enable it first. You can follow the instructions in "Enable JMX on Kafka Brokers".
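Whatever script reads the JMX value, the Nagios side of the check follows the same pattern: compare the value against warning and critical thresholds, print one status line, and exit with the standard plugin code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Below is a minimal sketch of that part in Python. The function names are hypothetical, the sketch assumes that higher values are worse, and it does not fetch anything over JMX itself; it only turns an already collected number (for example an offline partition count) into a Nagios result.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a metric value to a Nagios status (assumes higher is worse)."""
    if value is None:
        # The metric could not be read at all.
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_result(name, value, warn, crit):
    """Return (exit_code, status_line) in the usual 'text | perfdata' layout."""
    code = evaluate(value, warn, crit)
    shown = "n/a" if value is None else value
    line = f"{STATUS_NAMES[code]} - {name}={shown}"
    if value is not None:
        # Performance data: label=value;warn;crit
        line += f" | {name}={value};{warn};{crit}"
    return code, line


# Hypothetical reading: zero offline partitions, alert as soon as one is offline.
exit_code, status_line = format_result("OfflinePartitionsCount", 0, 1, 1)
print(status_line)
```

A real plugin would end with `sys.exit(exit_code)` so that Nagios picks up the status from the process exit code.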

Related

Kafka consumer lag monitoring visualization

I'm new to Kafka. While studying it, I realized that monitoring consumer lag is necessary. Searching Google and the docs, I found a few approaches:
Kafka - Prometheus - Grafana
Kafka - Burrow - some DB - Grafana
Kafka - burrow_stat? (I can't understand what it is.)
Kafka - Datadog
What I want to ask is:
The documentation says that Burrow is for monitoring. Can I use it to visualize lag as a graph (dashboard), without other tools like Grafana, Kibana, or Datadog?
I'm just trying to have fewer pipeline steps. What is the simplest way to visualize consumer lag?
If you are setting this up in an organisation, Datadog or Prometheus is probably the way to go. You can capture other Kafka-related metrics as well. These agents also integrate with many tools besides Kafka, so they make a good common choice for monitoring.
If this is just a personal proof-of-concept project and you only want to view the lag, I find CMAK very useful (https://github.com/yahoo/CMAK). It does not keep historical data, but it gives a good visual view of the current state of the Kafka cluster, including lag.
For cluster-wide metrics you can use kafka_exporter (https://github.com/danielqsj/kafka_exporter), which exposes some very useful cluster metrics (including consumer lag) and is easy to integrate with Prometheus and visualize using Grafana.
Burrow is extremely effective and specialises in monitoring consumer lag. It is good at calibrating consumer offsets and, more importantly, at judging whether the lag is harmful or not. It integrates with PagerDuty so that alerts are pushed to the necessary parties.
https://community.cloudera.com/t5/Community-Articles/Monitoring-Kafka-with-Burrow-Part-1/ta-p/245987
What Burrow has:
A non-threshold-based lag monitoring algorithm capable of detecting potential slowdowns.
Integration with PagerDuty.
Exporters for Prometheus, AppD, etc. for historical metrics.
A pluggable UI.
If you are looking for a quick solution, you can deploy Burrow followed by the Burrow front end: https://github.com/GeneralMills/BurrowUI
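All of the tools above ultimately report the same quantity: for each partition, the difference between the log end offset and the offset the consumer group has committed. The sketch below shows only that arithmetic. The function names and the sample offsets are hypothetical, and treating a partition without a committed offset as lagging from offset 0 is a simplification made for this example, not how any particular tool behaves.

```python
def partition_lag(log_end_offset, committed_offset):
    """Lag for one partition: messages written but not yet committed by the group."""
    # Clamp at zero in case the two offsets were sampled at slightly different times.
    return max(log_end_offset - committed_offset, 0)


def group_lag(end_offsets, committed_offsets):
    """Return (per-partition lag, total lag) for one consumer group.

    Both arguments map (topic, partition) -> offset. A partition with no
    committed offset is treated as lagging from offset 0 (a simplification).
    """
    per_partition = {
        tp: partition_lag(end, committed_offsets.get(tp, 0))
        for tp, end in end_offsets.items()
    }
    return per_partition, sum(per_partition.values())


# Hypothetical offsets for a two-partition topic.
end_offsets = {("orders", 0): 120, ("orders", 1): 95}
committed = {("orders", 0): 100, ("orders", 1): 95}
per_partition, total = group_lag(end_offsets, committed)
print(per_partition, total)
```

Plotting the total over time is what a dashboard adds on top; a single snapshot like this one only tells you the current state.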

Is it possible to enable or disable specific JMX metrics to monitor for a Kafka cluster using JConsole?

On my Kafka cluster, I am able to view and monitor certain MBean JMX metrics such as RequestsPerSec. However, in JConsole I see only a few of the metrics mentioned in the Apache Kafka documentation. Is there a way to enable the others? In particular, is there a way to enable a few specific ones explicitly?
Kafka has different metric sets for particular components, and some versions of Kafka have more or fewer metrics than others.
The brokers, producers, and consumers each have different JMX metrics.
There's no way to disable or enable MBeans.

Gathering `kafka.producer` metrics using JMX

I have a Kafka broker running, which I am monitoring with JMX.
The broker runs in a Docker container as a process started with kafka-server-start.sh. JMX port 9999 is exposed and set through an environment variable.
When I connect to the JMX port and try to list all the domains, I get the following:
kafka
kafka.cluster
kafka.controller
kafka.coordinator.group
kafka.coordinator.transaction
kafka.log
kafka.network
kafka.server
kafka.utils
I don't see kafka.producer, which is understandable because the producers for this Kafka broker are N different applications, but at this point I am confused.
How do I get the kafka.producer metrics as well?
Do I have to expose the kafka.producer metrics in each of the N applications acting as a producer, or is there some configuration that makes the broker gather kafka.producer metrics itself?
What is the correct way of doing this? Please help.
Yes, you are correct. To capture the producer JMX metrics, you need to enable JMX in every process that runs a Kafka producer instance.
It might be helpful to rephrase producing as writing over an unreliable network in this context.
From this perspective, the most reasonable place to measure writing characteristics seems to be the client itself (i.e. in each "application" as you call it).
If messages between the producer and the broker are lost, you can still send stats to a local "metric store" for example (e.g. you could see a "spike" in record-retry-rate or some other relevant metric).
Additionally, pairing Kafka producer metrics with other local metrics (JVM stats, detailed business metrics, and so on) might be extremely useful. Keep in mind that in a production environment the client will almost certainly run on a different machine and might be affected by different factors than the broker itself.
If you intend to monitor your client application (which will most likely happen anyway), then I'd simply do it there (i.e. the standard way).
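A quick way to confirm which side exposes what is to list the MBean names of a JVM and group them by domain: a broker shows domains such as kafka.server, while kafka.producer only appears in a JVM that runs a producer. The sketch below parses object-name strings that have already been collected by some JMX tool. The function names are hypothetical, the client-id value is made up, and the parser is deliberately simple: it does not handle quoted property values that contain commas.

```python
def parse_object_name(name):
    """Split 'domain:key=value,key=value' into (domain, properties dict).

    Simplified: quoted values containing ',' or '=' are not supported.
    """
    domain, _, props = name.partition(":")
    properties = dict(pair.split("=", 1) for pair in props.split(",") if pair)
    return domain, properties


def domains(object_names):
    """Return the sorted set of JMX domains present in a list of object names."""
    return sorted({parse_object_name(name)[0] for name in object_names})


# Names as a client application's JVM might report them (client-id is made up).
client_mbeans = [
    "kafka.producer:type=producer-metrics,client-id=app-1",
    "java.lang:type=Memory",
]
# Names as a broker JVM might report them.
broker_mbeans = [
    "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
    "kafka.network:type=RequestMetrics,name=RequestsPerSec",
]
print(domains(client_mbeans))
print(domains(broker_mbeans))
```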

Kafka network IO metrics

I am using JMX to monitor Kafka metrics, and I want to get all network IO for each broker (node). With the MBeans kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec and kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec I only see network IO when data is produced to or consumed from a broker, but there is also network IO between brokers for replication, metadata, connecting to ZooKeeper, and so on. In my Kafka cluster, each node's network IO is about 6kb while no data is being consumed or produced. Are there any Kafka metrics for monitoring network IO other than data produced or consumed?
Under kafka.network:type=RequestMetrics,name=RequestsPerSec you will find counters for all request types, including Fetch and FetchFollower, which are issued even when there is no produce/consume traffic to the cluster.
You can check the produce or consume rate by enabling JMX either on the producer and consumer or on the broker; both are possible.
On the broker there are several metrics for network and request rates, for example:
BrokerTopicMetrics.topic.{topic}.BytesInPerSec
BrokerTopicMetrics.topic.{topic}.BytesOutPerSec
You can find the exposed JMX metrics in the Kafka documentation below, although the list is not exhaustive. If you want to see all the metrics, you can enable JMX on the broker, producer, or consumer and browse them with VisualVM or any other tool.
https://docs.confluent.io/current/kafka/monitoring.html
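The rate MBeans mentioned above also expose a cumulative Count attribute, so if you prefer to compute a rate over your own polling interval you can sample the count twice and divide the difference by the elapsed time. The sketch below shows only that calculation. The function name is hypothetical, the sample numbers are made up, and treating a decreasing count as a counter reset (for example after a broker restart) is an assumption of this example.

```python
def rate_per_second(count_before, count_after, seconds_elapsed):
    """Average rate between two samples of a cumulative counter."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    delta = count_after - count_before
    if delta < 0:
        # Assumed counter reset: count only what accumulated since the restart.
        delta = count_after
    return delta / seconds_elapsed


# Hypothetical samples of a byte counter taken 60 seconds apart.
print(rate_per_second(1000, 1600, 60))
```

Running the same calculation on the counts for each request type would show how much of the idle traffic comes from follower fetches as opposed to client requests.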

Getting monitoring metrics from Kafka application

Besides basic monitoring metrics like CPU, memory, and network usage, is there any way to monitor the running Kafka application itself, such as the number of messages in/out, stream throughput, stream size, and so on?
Thank you.
Kafka offers various metrics reporting in both the server and the client. See the Monitoring document for details.