I am a newbie to Kafka. A few days back, a few topics were created automatically by the producer, i.e. with partition count 1, replication factor 1, and ISR 1.
It worked fine; the consumer used to read messages from all topics without problems.
Today, i.e. after two days, I ran my producer and consumer programs (and in the reverse order too), but my consumer is not able to consume/read messages from the topic.
I checked all the logs but found no clue as to what went wrong.
What is going wrong?
Will the topics become stale after some time?
Is there any property value I need to check in the Kafka server properties?
Please help me.
Thank you.
~Shyam
There are several ways you can check the health of a Kafka cluster with the various tools provided.
Use the ConsumerOffsetChecker class to check whether the consumer is lagging behind the producer.
bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect <zk host/ip>:<zk port> --group <consumer group name>
Use JMX metrics such as the ones below to verify whether messages are being produced at the cluster level; additional metrics are also available.
kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}
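As an illustration, assuming the broker was started with JMX enabled (e.g. JMX_PORT=9999), a minimal Java sketch to read the one-minute rate of MessagesInPerSec could look like this; the host, port, and the OneMinuteRate attribute are assumptions based on the Yammer Meter metrics Kafka exposes:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class MessagesInRate {
    public static void main(String[] args) throws Exception {
        // Connect to the broker's JMX endpoint (assumes JMX_PORT=9999).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName(
                    "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec");
            // Meter beans expose Count, OneMinuteRate, FiveMinuteRate, etc.
            Object rate = mbsc.getAttribute(name, "OneMinuteRate");
            System.out.println("MessagesInPerSec (1-min rate): " + rate);
        }
    }
}

If the rate stays at zero while your producer runs, the messages are not reaching the broker at all.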
Use the Console Consumer to validate that the messages are present on the topic:
bin/kafka-console-consumer.sh --zookeeper <zk host/ip>:<zk port> --topic test --from-beginning
Verify the log.retention.* values (e.g. log.retention.hours, log.retention.bytes) in the Kafka configuration (server.properties file).
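If retention has expired, the older messages have been deleted and even --from-beginning will return nothing. The relevant broker defaults look like this (the values shown are illustrative, not recommendations):

# server.properties
log.retention.hours=168                  # delete log segments older than 7 days
log.retention.bytes=-1                   # no size-based limit per partition
log.retention.check.interval.ms=300000   # how often expired segments are checked for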
Additional JMX monitoring details and configurations are available in the documentation.
The last point is a little complicated to explain, but I will try. Look at the linked blog post on confluent.io; in the section "More Partitions May Require More Memory In the Client" it talks about producers buffering messages before sending them to the broker. Not sure if your problem is related.
I'm writing a Go service that works with Kafka. I have a problem with bad offset commits when the broker rebalances. I want to run an experiment that forces Kafka to rebalance and see how the service behaves.
What I do:
running Kafka in Docker locally (broker, zookeeper, schema registry and control center)
created a topic with 2 partitions
running producer that sends messages to both partitions
Then I run two consumers with the same groupID, after which I close one of them. It seems to me that the broker should start rebalancing at that moment. Or no? Whose logs should I check for it?
You can check that by running the following commands:
bin/kafka-consumer-groups --bootstrap-server host:9092 --list
and to describe:
bin/kafka-consumer-groups --bootstrap-server host:9092 --describe --group foo
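Running --describe before and after you stop one of the consumers should show the partition assignment changing; the output looks roughly like this (columns vary slightly by version, and the names here are made up):

GROUP  TOPIC    PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID     HOST        CLIENT-ID
foo    mytopic  0          42              42              0    consumer-1-...  /127.0.0.1  consumer-1
foo    mytopic  1          40              40              0    consumer-2-...  /127.0.0.1  consumer-2

After the rebalance completes, both partitions appear under the surviving consumer's CONSUMER-ID.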
Full documentation can be found here: Kafka consumer group
Whose logs should I check for it?
The running consumer's log should be checked, depending on whether the library you're using actually logs such information.
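If you control the consumer code, you can also surface the rebalance yourself. Your service is in Go, but the same hook exists in most client libraries; here is a sketch using the Java client's ConsumerRebalanceListener, with made-up topic and group names:

import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RebalanceWatcher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "rebalance-test");  // same groupID as the other consumer
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("two-partition-topic"),
                    new ConsumerRebalanceListener() {
                        @Override
                        public void onPartitionsRevoked(Collection<TopicPartition> parts) {
                            // Fired when a rebalance starts and this consumer gives up partitions.
                            System.out.println("Revoked: " + parts);
                        }
                        @Override
                        public void onPartitionsAssigned(Collection<TopicPartition> parts) {
                            // Fired when the rebalance finishes with the new assignment.
                            System.out.println("Assigned: " + parts);
                        }
                    });
            while (true) consumer.poll(Duration.ofMillis(500));  // keeps group membership alive
        }
    }
}

Closing the other consumer should trigger a Revoked/Assigned pair in this one's output.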
On a couple of my clusters I'm seeing a discrepancy between the list of topics returned by zookeeper and the list returned by the broker, i.e. the following commands return different results (fewer in the case of the broker):
kafka-topics.sh --zookeeper $zookeeper --list
kafka-topics.sh --bootstrap-server $broker --command-config $clientProperties --list
I've seen this behaviour with multiple client versions which leads me to assume that the issue is on the server side, but I have no idea what the root cause is or how to fix it.
It causes an issue for me because I'm using some code that uses the brokers for GET operations like listing topics, and zookeeper for SET operations (creating/updating topics). If the broker doesn't return a topic in a listing, the code path leads to a CREATE action against zookeeper, and that will be rejected (it will fail). Unfortunately, I don't control the code, so I can't apply a fix there.
Nonetheless, surely the list of topics in zookeeper should be identical to the list in the broker?
I'm using Kafka (Amazon MSK) version 2.2.1
Thanks for the suggestions in this post. This is the explanation and solution:
The commands "kafka-topics.sh --zookeeper" and "kafka-topics.sh --bootstrap-server" return two different outputs because the latter takes into account the configured ACLs which, in this case, prevent access to the topic metadata. Hence, the command through zookeeper provides the full list of topics, whereas the command through the broker provides only the topics for which ACLs are not configured.
To make the second command work as expected, you need to explicitly grant the "DESCRIBE" operation in the ACLs of the affected topics.
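For example, reusing the client properties from the question, something along these lines should work (the principal and topic name are placeholders):

bin/kafka-acls.sh --bootstrap-server $broker --command-config $clientProperties --add --allow-principal "User:<your principal>" --operation Describe --topic <affected topic>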
(^^ kudos to AWS Support for figuring this out)
For the consumer:
bin/windows/kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test --from-beginning
and for the producer:
bin/windows/kafka-console-producer.bat --broker-list localhost:9092 --topic test
Why do the two tools use a different flag for the same thing? I think the producer could also take just one broker: from that broker the client can find all the partition information and locate the leader, so I don't think a broker list is really needed.
This is for historical reasons; in either case, the provided brokers are only used to bootstrap and discover the full cluster.
Before Kafka 0.9, the consumer was still using Zookeeper to bootstrap. At that time, the producer was already using --broker-list.
In 0.9, when the "new" consumer was added, the flag to specify the broker was named --bootstrap-server for good reason, as that is exactly what it is. Since then, the tools have used different flag names even though they mean the same thing.
This was annoying and finally in 2.5.0, released just a few weeks ago, all tools have been updated to use --bootstrap-server!
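So with 2.5.0 and later, the producer example above can be written the same way as the consumer one (the old --broker-list flag still works but is deprecated):

bin/windows/kafka-console-producer.bat --bootstrap-server localhost:9092 --topic test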
I have a Kafka cluster consisting of 3 servers, all connected through Zookeeper. But when I delete a topic that has some data in it and create a topic with the same name again, the offsets do not start from zero.
I tried restarting both Kafka and Zookeeper and deleting the topics directly from Zookeeper.
What I expect is to have a clean topic when I create it again.
I found the problem: a consumer was still consuming from the topic, so the topic was never actually deleted. I used this tool, which provides a GUI that made it easy to see the topics: https://github.com/tchiotludo/kafkahq. Anyway, the consumers can be seen by running this:
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
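For reference, a sketch of the full sequence, assuming delete.topic.enable=true on the brokers and placeholder group/topic names:

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group <group>
# stop any active consumers shown above, then delete and recreate the topic:
bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic <topic>
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic <topic> --partitions 1 --replication-factor 3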
If I have multiple brokers, which broker should my producer use? Do I need to manually switch brokers to balance the load? Also, why does the consumer only need a zookeeper endpoint instead of a broker endpoint?
A quick example from the tutorial:
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
which broker should my producer use? Do I need to manually switch the broker to balance the load?
Kafka runs as a cluster, i.e. a set of nodes, so when producing anything you need to give the client the LIST of brokers that you've configured for your application. Below is a small note taken from their documentation.
“metadata.broker.list” defines where the Producer can find one or more Brokers to determine the Leader for each topic. This does not need to be the full set of Brokers in your cluster, but should include at least two in case the first Broker is not available. No need to worry about figuring out which Broker is the leader for the topic (and partition); the Producer knows how to connect to the Broker, ask for the metadata, and then connect to the correct Broker.
Hope this clears up some of your confusion.
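As a concrete illustration, the old (pre-0.9) producer's configuration might contain something like this (host names made up):

metadata.broker.list=broker1:9092,broker2:9092
# any one reachable broker is enough to fetch metadata; two are listed for redundancy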
Also why does the consumer only need a zookeeper endpoint instead of a broker endpoint?
This is not technically correct, as there are two types of consumer APIs available: the high-level and the low-level consumer.
The high-level consumer takes care of most things, like leader detection, threading issues, etc., but it does not provide much control over the messages; that is exactly the purpose of the alternative, the Simple (low-level) consumer, where you have to provide the brokers and partition-related details yourself.
So the consumer needs a zookeeper endpoint only when you go with the high-level API; when using the Simple consumer you have to provide the broker and partition information instead.
Kafka sets a single broker as the leader for each partition of each topic. The leader is responsible for handling both reads and writes to that partition. You cannot decide to read or write from a non-Leader broker.
So, what does it mean to provide a broker or list of brokers to the kafka-console-producer? Well, the broker or brokers you provide on the command line are just the first contact point for your producer. If the broker you list is not the leader for the topic/partition you need, your producer will get the current leader info (called "topic metadata" in kafka-speak) and reconnect to other brokers as necessary before sending writes. In fact, if your topic has multiple partitions it may even connect to several brokers in parallel (if the partition leaders are different brokers).
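To make that concrete, here is a sketch using the modern Java producer, which is handed only one bootstrap broker and does all the metadata discovery and leader connections internally (the broker address and topic name are assumptions):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // One bootstrap broker is enough to discover the whole cluster,
        // though listing several protects against that one being down.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The client fetches the topic metadata and routes each record
            // to the current leader of its partition.
            producer.send(new ProducerRecord<>("test", "key", "hello"));
        }
    }
}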
Second question: why does the consumer require a zookeeper list for connections instead of a broker list? The answer is that kafka consumers can operate in "groups", and zookeeper is used to coordinate those groups (how groups work is a larger issue, beyond the scope of this question). Zookeeper also stores broker lists for topics, so the consumer can pull broker lists directly from zookeeper, making an additional --broker-list a bit redundant.
The Kafka Producer API does not interact directly with Zookeeper. However, the high-level consumer API connects to Zookeeper to fetch/update the partition offset information for each consumer, so the consumer API will fail if it cannot connect to Zookeeper.
All of the above answers are correct for older versions of Kafka, but things changed with the arrival of Kafka 0.9.
Now there is no longer any direct interaction with zookeeper from either the producer or the consumer. Another interesting thing is that with 0.9, Kafka removed the dissimilarity between the high-level and low-level APIs, since both now follow a unified consumer API.
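A minimal sketch of the 0.9+ Java consumer illustrates this: only broker addresses are configured, no zookeeper (the topic and group names are made up, and the poll(Duration) overload shown is the modern one):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NewConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Only brokers are needed for bootstrapping; no zookeeper connection.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "example-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records)
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
            }
        }
    }
}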