kafka messages monitoring to show the actual messages published or consumed - apache-kafka

I have kafka installed on my local server, and through some other application running in the server produers are publishing messages to the brokers inside of my kafka server, through the zookeeper I can easily see the health of my kafka which shows all the topics created inside my kafka server a, offsets inside topics etc etc, so only thing zookeeper is not able to show is the messages that are inside the individual topics, so someone recommended kafka-manager tool, I installed and ran it, it worked fine, it showed lot of information from my kafka server, but still it was not able to show real messages that are published or consumed by respective consumers inside my kafka server, so my question is , is there a way/tool/code to find out the messages published or consumed, I mean in addition to this kafka-manager or I have Install some plugins inside of the same kafka-manager so that it will also show the respective messages.Thanks in advance!!

A Kafka broker cannot tell you how many messages are have been consumed for a given consumer on a given topic. The only thing that a Kafka broker knows about is the current log offset of the consumer, and the current max offset of the log. It cannot however, tell you how many messages before the current offset the consumer actually received, as it keeps no counters around this, and the consumer defines its own initial position (as well as being able to seek to various places in the log).
You can get both of these numbers using the $KAFKA_HOME/bin/kafka-consumer-offset-checker.sh script.

Related

Why my Kafka connect sink cluster only has one worker processing messages?

I've recently setup a local Kafka on my computer for testing and development purposes:
3 brokers
One input topic
Kafka connect sink between the topic and elastic search
I managed to configure it in standalone mode, so everything is localhost, and the Kafka connect was started using ./connect-standalone.sh script.
What I'm trying to do now is to run my connectors in distributed mode, so the Kafka messages can be separated into both workers.
I've started the two workers (still everything on the same machine), but when I send message to my Kafka topic, only one worker (the last started) is processing messages.
So my question is: Why only one worker is processing Kafka messages instead of both ?
When I kill one of the worker, the other one takes the message flow back, so I think the cluster is well setup.
What I think:
I don't put Keys inside my Kafka messages, can it be related to this ?
I'm running everything in localhost, does distributed mode can work this way ? (I've correctly configure specific unique field such as ret.port)
Resolved:
From Kafka documentation:
The division of work between tasks is shown by the partitions that each task is assigned
If you don't use partition (push all messages in same partition), workers won't be able to divide messages.
You don't need to use message keys, you can just push your messages to different partition in a cyclic way.
See: https://docs.confluent.io/current/connect/concepts.html#distributed-workers

Spring Cloud Stream Kafka Binder autoCommitOnError=false get unexpected behavior

I am using Spring Boot 2.1.1.RELEASE and Spring Cloud Greenwich.RC2, and the managed version for spring-cloud-stream-binder-kafka is 2.1.0RC4. The Kafka version is 1.1.0. I have set the following properties as the messages should not be consumed if there is an error.
spring.cloud.stream.bindings.input.group=consumer-gp-1
...
spring.cloud.stream.kafka.bindings.input.consumer.autoCommitOnError=false
spring.cloud.stream.kafka.bindings.input.consumer.enableDlq=false
spring.cloud.stream.bindings.input.consumer.max-attempts=3
spring.cloud.stream.bindings.input.consumer.back-off-initial-interval=1000
spring.cloud.stream.bindings.input.consumer.back-off-max-interval=3000
spring.cloud.stream.bindings.input.consumer.back-off-multiplier=2.0
....
There are 20 partitions in the Kafka topic and Kerberos is used for authentication (not sure if this is relevant).
The Kafka consumer is calling a web service for every message it processes, and if the web service is unavailable then I expect that the consumer will then try to process the message for 3 times before it moves on to the next message. So for my test, I disabled the webservice, and therefore none of the message could be processed correctly. From the logs I can see that this is happening.
After a while I stopped and then restarted the Kafka consumer (webservice is still disabled). I was expecting that after the restart of the Kafka consumer, it would attempt to process the messages that was not successfully processed the first time around. From the logs (I printed out each message with its fields) after the restart of the Kafka Consumer I couldn't see this happening. I thought the partition might be influencing something, but I check the logs and all 20 partitions were assigned to this single consumer.
Is there a property I have missed? I thought the expected behavior when I restart the consumer the second time, is that Kafka broker would pass the records that were not successfully processed to the consumer again.
Thanks
Parameters working as expected. See comment.

Kafka Topics not receiving messages and not being deleted

I'm having some troubles with kafka topics.
We have on our development environment kafka version 2.0,
and I've checked for topic deletion to be enabled.
Infact I've been able to delete some topics, but there are few that won't be deleted no matter what I do. Brokers are running fine by the way.
Not only that, but it seems that those topics don't receive any message from producers, even though those are correctly configured. We tried change the topic where to send messages and it was fine.
I didn't find anything regard these 2 problems, any of you know anything?

When does Kafka delete a topic?

I am very new to Kafka and I am dabbling about with it.
Say I have Kafka running on a Debian machine and I have managed to create a topic with a 100 messages on it.
After that initial burst of activity (i.e. placing a 100 messages onto the topic via some Kafka Producer) the Topic is just sat there idle with nothing happening (no consumers consuming and no producers producing)
I am aware of a Message Retention Policy setting, which I believe has a default value of 7 days. Let's say those 7 days pass, and the messages are indeed removed from the Topic, but what about the Topic itself?
Will Kafka eventually kill that Topic?
Also, what happens when I manually go and pull out the power cord for the machine that Kafka is running on? Will the Topic be discarded? Or will I still have my topic after I start up the machine, run ZooKeeper and create a Kafka Broker?
Any light on this matter would be appreciated.
Thank you
No, Kafka will keep the topic. It sounds like a bad idea that Kafka deletes topics by itself.
Before version 1.0.0 the topic deletion option (delete.topic.enable) was set to false by default. So it wasn't even possible to delete it without changing the config.
So the answer for you question would be Kafka never deletes topics.

Kafka Connect: Connectors Disappear when Worker shutsdown [duplicate]

I am facing the below issue on changing some properties related to kafka and re-starting the cluster.
In kafka Consumer, there were 5 consumer jobs are running .
If we make some important property change , and on restarting cluster some/all the existing consumer jobs are not able to start.
Ideally all the consumer jobs should start ,
since it will take the meta-data info from the below System-topics .
config.storage.topic
offset.storage.topic
status.storage.topic
First, a bit of background. Kafka stores all of its data in topics, but those topics (or rather the partitions that make up a topic) are append-only logs that would grow forever unless something is done. To prevent this, Kafka has the ability to clean up topics in two ways: retention and compaction. Topics configured to use retention will retain data for a configurable length of time: the broker is free to remove any log messages that are older than this. Topics configured to use compaction require every message have a key, and the broker will always retain the last known message for every distinct key. Compaction is extremely handy when each message (i.e., key/value pair) represents the last known state for the key; since consumers are reading the topic to get the last known state for each key, they will eventually get to that last state a bit faster if older states are removed.
Which cleanup policy a broker will use for a topic depends on several things. Every topic created implicitly or explicitly will use retention by default, though you can change a couple of ways:
change the globally log.cleanup.policy broker setting, affecting only topics created after that point; or
specify the cleanup.policy topic-specific setting when you create or modify a topic
Now, Kafka Connect uses several internal topics to store connector configurations, offsets, and status information. These internal topics must be compacted topics so that (at least) the last configuration, offset, and status for each connector are always available. Since Kafka Connect never uses older configurations, offsets, and status, it's actually a good thing for the broker to remove them from the internal topics.
Before Kafka 0.11.0.0, the recommended process is to manually create these internal topics using the correct topic-specific settings. You could rely upon the broker to auto-create them, but that is problematic for several reasons, not the least of which is that the three internal topics should have different numbers of partitions.
If these internal topics are not compacted, the configurations, offsets, and status info will be cleaned up and removed after the retention period has elapsed. By default this retention period is 24 hours! That means that if you restart Kafka Connect more than 24 hours after deploying / updating a connector configuration, that connector's configuration may have been purged and it will appear as if the connector configuration never existed.
So, if you didn't create these internal topics correctly, simply use the topic admin tool to update the topic's settings as described in the documentation.
BTW, not properly creating these internal topics is a very common problem, so much so that Kafka Connect 0.11.0.0 will be able to automatically create these internal topics using the correct settings without relying upon broker auto-creation of topics.
In 0.11.0 you will still have to rely upon manual creation or broker auto-creation for topics that source connectors write to. This is not ideal, and so there's a proposal to change Kafka Connect to automatically create the topics for the source connectors while giving the source connectors control over the settings. Hopefully that improvement makes it into 0.11.1.0 so that Kafka Connect is even easier to use.