Best practice for sending messages to many applications in Kafka [closed] - apache-kafka

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a topic with many partitions. I want many applications to read all messages from this topic. Some applications read frequently and others only at midnight.
I couldn't find any help for this problem on Stack Overflow or in a book.
How can I implement that in Kafka?
Thanks

You'd need Kafka consumers in order to let your applications consume messages from Kafka topics. If each of your applications needs to consume all messages, you'd have to assign a distinct group.id to every application.
Kafka assigns the partitions of a topic to the consumers in a group, and it guarantees that a message is only ever read by a single consumer in that group. Partitions are the unit of parallelism in Kafka (if you have 2 partitions, you can have up to 2 consumers in the same consumer group).
Example:
Say you have topic example-topic with 5 partitions and 2 applications, each of which must consume all the messages from example-topic. You will need two distinct consumer groups (one per application), say group.id=app-group-1 for the first app and group.id=app-group-2 for the second. Within each consumer group you can have at most 5 consumers actively consuming messages from your topic (one per partition). Therefore, up to 5 consumers will subscribe to example-topic with group.id=app-group-1, and up to another 5 consumers will subscribe to example-topic with group.id=app-group-2.
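For illustration, here is a minimal sketch of what the first application's consumer could look like (the broker address, serializers and class name are assumptions, not part of the question); the second application would run the same code with group.id=app-group-2:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AppGroupOneConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "app-group-1");               // one distinct group.id per application
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}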

Related

How does Kafka Consumer ensure fairness in polling when subscribed to multiple topics with multiple partitions?

We use confluent-kafka. Currently we have a consumer which is subscribed to a single topic with multiple partitions. Now we are looking into updating the consumer to subscribe to multiple topics, each with multiple partitions.
I wonder how polling will work in this case. How will the consumer ensure fairness among topics? I have read various sources but didn't get any concrete answers.
Also, is there a way I can prioritize one topic over the other?

Consumer doesn't get messages from topics created after subscription [duplicate]

This question already has answers here:
kafka consumer to dynamically detect topics added
(4 answers)
Closed 1 year ago.
I created a KafkaConsumer, then called subscribe on it (using a Pattern), and then poll.
I noticed that if my consumer is running and I later create a new topic (matching the pattern), the consumer does not consume its data! If I restart my app, the consumer does get data from the newly created topics.
Is that expected? How can I solve it?
The KafkaConsumer refreshes the metadata of its subscriptions based on the KafkaConsumer configuration metadata.max.age.ms, which defaults to 5 minutes.
You could reduce this configuration to have your consumer pick up newly created topics matching your pattern more quickly.
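As a rough sketch (broker address, group id, pattern and refresh interval are assumptions; calling subscribe(Pattern) without a rebalance listener requires a reasonably recent kafka-clients version):

import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PatternSubscriber {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "pattern-consumer");          // hypothetical group id
        props.put("metadata.max.age.ms", "30000");           // refresh metadata every 30 s instead of the 5-minute default
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Topics created later that match the pattern are picked up on the next metadata refresh
        consumer.subscribe(Pattern.compile("example-.*"));
        // ... poll in a loop as usual
    }
}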

Why doesn't a Kafka topic get empty when messages were taken by a consumer? [duplicate]

This question already has answers here:
Delete message after consuming it in KAFKA
(5 answers)
Closed 2 years ago.
I'm learning Kafka, and I'd appreciate it if someone could help me understand one thing.
The "producer" sends a message to a Kafka topic. It stays there for some time (7 days by default, right?).
But the "consumer" receives that message, and there is not much sense in keeping it there forever.
I expected these messages to disappear once a consumer gets them.
Otherwise, when I connect to Kafka again, I will download the same messages again. So I have to manage duplicate avoidance.
What's the logic behind it?
Regards
"Producer" send message to Kafka topic. It stays there some time (7 days by default, right?).
Yes, a Producer send the data to a Kafka topic. Each topic has its own configurable cleanup.policy. By default it is set to a retention period of 7 days. You can also configure the retention of the topic based on byte size.
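If you ever need to change a topic's retention, one way to do it is with the Java Admin client; this is only a sketch, and the topic name, broker address and 3-day value are examples:

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            Map<ConfigResource, Collection<AlterConfigOp>> update = Map.of(
                    topic,
                    List.of(new AlterConfigOp(
                            new ConfigEntry("retention.ms", "259200000"),  // 3 days in milliseconds
                            AlterConfigOp.OpType.SET)));
            admin.incrementalAlterConfigs(update).all().get();
        }
    }
}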
But "consumer" receives such message and there is not much sense to keep it there eternally.
Kafka can be seen as a Publisher/Subscribe messaging system (although mainly being a streaming platform). It has the great benefit that more than one single Consumer can read the same messages of a topic. Compared to other messaging systems the data is not deleted after acknowledged by a consumer.
Otherwise, when I connect to Kafka again, I will download the same messages again. So I have to manage duplicate avoidance.
Kafka has the concepts of "offsets" and "consumer groups", and I highly recommend getting familiar with them, as they are essential when working with Kafka. Each consumer is part of a consumer group, and each message in a topic has a unique identifier called an "offset"; the offset stays with the same message for its lifetime.
Each consumer group keeps track of the messages (offsets) it has already consumed. If you do not want to read the same messages again, your consumer group just has to commit those offsets, and it will not read them again.
That way you will not consume duplicates, while other consumers (with a different consumer group) are still able to read all messages again.
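A minimal sketch of that idea (broker, group and topic names are assumptions): the consumer below commits the offsets it has processed, so the next run with the same group.id continues where it left off instead of re-reading the same messages:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CommittingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "my-app");                     // the consumer group whose offsets are tracked
        props.put("enable.auto.commit", "false");            // we commit explicitly below
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
            consumer.commitSync();  // records the consumed offsets for group.id=my-app
        }
    }
}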

Kafka: Can a Topic Be Consumed by Multiple Independent Consumer Groups?

Oddly, I could not easily find this information online. So, my question: I have one Kafka topic, and I want 2 consumer groups to consume messages from it independently of each other. In other words, I want both consumer groups to be able to see and consume all messages, independent of each other.
Just to be even more clear: I don't want some messages to be consumed by one consumer group and some by the other group; I want the 2 groups to consume messages as if the other group doesn't even exist.
Yes, separate consumer groups are completely independent, so they all see all messages.
Partitioning of resources (topics/partitions) only happens within a group.

How to integrate Storm and Kafka [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have worked with Storm and developed a basic program that uses a local text file as its input source. But now I have to work on streaming data coming continuously from external systems. For this purpose, Kafka is the best choice.
The problem is how to make my spout get streaming data from Kafka, or how to integrate Storm with Kafka. How can I do that so I can process the data coming from Kafka?
Look for KafkaSpout.
It's a normal Storm spout implementation that reads from a Kafka cluster. All you need to do is configure that spout with parameters such as the list of brokers, the topic name, etc. You can then simply chain the output to the corresponding bolts for further processing.
From the same doc mentioned above, the configuration goes like this:
SpoutConfig spoutConfig = new SpoutConfig(
        ImmutableList.of("kafkahost1", "kafkahost2"),  // List of Kafka brokers
        8,              // Number of partitions per host
        "clicks",       // Topic to read from
        "/kafkastorm",  // The root path in Zookeeper for the spout to store the consumer offsets
        "discovery");   // An id for this consumer for storing the consumer offsets in Zookeeper
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);