How to achieve leadership notion using consumer group of kafka? - apache-kafka

Requirement:
Module1 publishes data and module2 consumes it.
Here I can have multiple instances of module2, of which one node should act as a leader: it consumes the data from the topic, processes it, and adds it to its in-memory store. This node has the responsibility to replicate its in-memory state to the rest of the module2 instances, which act as passive nodes. One requirement here is that the processing order should remain the same.
How do I design this in Kafka?
My thoughts are: Module1 publishes to sample_topic (having a single partition) and each instance of module2 uses the same consumer group name and subscribes to sample_topic. Since any instance of the same consumer group can receive a message, the concept of a leader is not available.
Is there any way to achieve the leadership concept, similar to how brokers work in Kafka?

As you pointed out in the question - this is not the default behavior of a consumer group.
A consumer group will distribute the partitions across its consumers, so no two consumers in the group receive the same messages.
What you seem to need is a way to manage global state, i.e. you want all consumers to be aware of and have a reference to the same data.
There might be a way to hack this with the consumer API - but what I would suggest you look into is the Kafka Streams API.
More specifically, within the Kafka Streams API there is an interface called GlobalKTable:
A KTable distributes the data between all running Kafka Streams instances, while a GlobalKTable has a full copy of all data on each instance.
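As a minimal sketch (the application id, topic name, and String serdes are my assumptions), every instance of the application below materializes a full local copy of sample_topic, so no single module2 instance needs to be the leader:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.GlobalKTable;
import java.util.Properties;

public class Module2App {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "module2-app"); // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Unlike a KTable, a GlobalKTable is fully replicated: every instance
        // keeps a complete local copy of the topic's data.
        GlobalKTable<String, String> state = builder.globalTable("sample_topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}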
You can also just have each consumer subscribe to the same topic from individual consumer groups, unless it is a requirement that the consumer group must be the same for scaling purposes.

Related

Kafkajs - multiple consumers reading from same topic and partition

I'm planning to use Kafkajs https://kafka.js.org/ and implement it in a NodeJs server.
I would like to know what is the expected behavior in case I have 2 (or more) instances of the server running, each of them having a consumer which is configured with the same group id and topic?
Does this mean that they might read the same messages?
Should I specify a unique consumer group for each server instance?
I read this - Multiple consumers consuming from same topic - but I'm not sure it applies to Kafkajs.
It's not possible for a single consumer group to have multiple consumers reading from overlapping partitions in any Kafka library. If your topic only has one partition, only one instance of your application will be consuming it. If that instance dies, the group will rebalance and the other instance will take over, potentially re-reading some of the same data due to the nature of at-least-once delivery, but never at the same time as the other instance.
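The semantics are the same in every client; as a sketch with the Java client (topic and group names are made up), you can watch this hand-off happen with a ConsumerRebalanceListener: run two copies, kill one, and the survivor logs the partition it gains.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

public class RebalanceDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "server-group"); // same id on both instances
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("my-topic"), new ConsumerRebalanceListener() {
            public void onPartitionsRevoked(Collection<TopicPartition> parts) {
                System.out.println("revoked: " + parts);
            }
            public void onPartitionsAssigned(Collection<TopicPartition> parts) {
                // With a single-partition topic, only one instance is assigned the
                // partition; when it dies, the other instance logs the take-over here.
                System.out.println("assigned: " + parts);
            }
        });
        while (true) {
            consumer.poll(Duration.ofSeconds(1)).forEach(r -> System.out.println(r.value()));
        }
    }
}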

One to One and Group Messaging using Kafka

As Kafka has a topic-based pub-sub architecture, how can I handle the one-to-one and group messaging parts of a web application using Kafka?
I am using SpringBoot+Angular stack and Docker Kafka server.
I'll write another answer here.
Based on my experience with a chat service: you only need one topic for all the messages, using a well-designed Message body.
public class Message {
    private String from;    // sender user id
    private String to;      // recipient user id or group id
    private String content; // message payload (field assumed; getters/setters omitted)
}
Then you can create, say, 100 partitions for this topic and start with two consumers (50 partitions per consumer in the beginning).
If your system later hits a bottleneck, you can easily scale out by adding more consumers to handle the load.
As for how to distribute the messages from the consumer: I used to send the messages to a mobile app, so every app holds a long-lived connection to the server, and the server pushes messages to the app over that channel. For group chat, I created a Redis cache storing all the active users in each group, so I can easily look up the users who belong to a group and send them the messages.
One more thing: Kafka should stay stateless, meaning it is decoupled from the business logic and only acts as a messaging system that transfers messages. If you couple your business logic to Kafka, e.g. creating a topic per one-to-one conversation and deleting topics when conversations finish, Kafka will become very messy.
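One detail glossed over above is how messages map onto those 100 partitions: keying each record by the to field makes the default partitioner hash all messages for a given user or group into the same partition, which also preserves their order. A minimal producer sketch, assuming a topic named chat-messages and JSON strings as values (both my assumptions):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class ChatProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key = the "to" field, so one conversation always lands in one partition.
            producer.send(new ProducerRecord<>("chat-messages",
                    "user-42", "{\"from\":\"user-7\",\"to\":\"user-42\",\"content\":\"hi\"}"));
        }
    }
}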
By One-to-One, I suppose you mean one producer and one consumer, i.e. using it as a queue.
This is certainly possible with Kafka. You can have one consumer subscribe to a topic and restrict others by not giving them authorization. See Authorization in Kafka.
Note that once a message is consumed, it is not deleted; rather, its offset is committed so that the same consumer will not consume it again.
By Group Messaging, I suppose you mean one producer > multiple consumers, or multiple producers > multiple consumers.
This is also possible, a producer can produce messages to a topic and multiple consumers can consume them.
If all the consumers have the same group id, then each consumer in the group gets only a subset of messages.
If they have different group ids then each consumer will get all messages.
Multiple producers also can produce to the same topic.
A consumer can also subscribe to multiple topics.
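A sketch of these combinations with the Java consumer (topic and group names are made up): instances sharing a group.id split the messages between them, each distinct group.id receives the full stream, and a single consumer can subscribe to several topics.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class GroupMessagingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Same id on every instance -> queue semantics (messages are split).
        // A unique id per instance -> pub-sub semantics (everyone gets everything).
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "chat-consumers");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("chat-messages", "notifications")); // multiple topics
        while (true) {
            consumer.poll(Duration.ofSeconds(1))
                    .forEach(r -> System.out.println(r.topic() + ": " + r.value()));
        }
    }
}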
OK, it's a fairly broad question, so I'll try to cover some of the basics.
Kafka topics are divided into a number of partitions. Partitions allow you to parallelize a topic by splitting the data in a particular topic across multiple brokers — each partition can be placed on a separate machine to allow for multiple consumers to read from a topic in parallel.
So if you are using multiple partitions, you can have multiple consumers consuming them in parallel.
Within a consumer group for a given topic, each consumer reads from a unique partition, and the group as a whole consumes all messages from the entire topic.
Basically, if you have only one group, a message will not be processed by two consumers within that group. (Note that this is still at-least-once delivery by default, so a message can be reprocessed after a failure; it is not exactly-once.)
If you need two consumer groups, think about why you need two: are the consumers in the two groups handling different logic?
There is more to it; please check the official documentation, or ask a more specific question.

Kafka Stream - How Elastic Scaling works

I was reading about the Kafka Streams elastic scaling features.
This means Kafka Streams can hand a task over to another instance, and the task's state will be recreated from the changelog. It is mentioned that instances coordinate with each other to achieve the rebalance.
But no detail is given on how exactly the rebalance works.
Is it the same as how a consumer group works, or a different mechanism, given that Kafka Streams instances are not exactly like consumers in a consumer group?
Visit this article for a more thorough explanation.
..."In a nutshell, running instances of your application will automatically become aware of new instances joining the group, and will split the work with them; and vice versa, if any running instances are leaving the group (e.g. because they were stopped or they failed), then the remaining instances will become aware of that, too, and will take over their work. More specifically, when you are launching instances of your Streams API based application, these instances will share the same Kafka consumer group id. The group.id is a setting of Kafka’s consumer configuration, and for a Streams API based application this consumer group id is derived from the application.id setting in the Kafka Streams configuration."...

Join multiple Kafka topics by key

How can I write a consumer that joins multiple Kafka topics in a scalable way?
I have a topic that publishes events with a key, and a second topic that publishes other events, related to a subset of the first, with the same key. I would like to write a consumer that subscribes to both topics and performs some additional actions for the subset that appears in both topics.
I can do this easily with a single consumer: read everything from both topics, maintaining state locally and perform the actions when both events have been read for a given key. But I need the solution to scale.
Ideally I need to tie the topics together so that they are partitioned the same way and the partitions are assigned to consumers in sync. How can I do this?
I know Kafka Streams joins topics together such that keys are allocated to the same nodes. How do they do it? P.S. I can't use Kafka Streams because I'm using Python.
Too bad you are on Python -- Kafka Streams would be a perfect fit :)
If you want to do this manually, you will need to implement your own PartitionAssignor -- this implementation must ensure that partitions are co-located in the assignment: assume you have two topics (call them A and B) with 4 partitions each; then partitions A_0 and B_0 must be assigned to the same consumer (likewise A_1 and B_1, ...).
Hopefully the Python consumer allows you to specify a custom partition assignor via the config parameter partition.assignment.strategy.
This is the PartitionAssignor Kafka Streams uses: https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamPartitionAssignor.java
Streams uses the concept of tasks -- a task gets partitions of different topics with the same partition number assigned. Streams also tries to do a "sticky assignment" -- i.e., it avoids moving tasks (and thus partitions) during a rebalance if possible. To achieve this, each consumer encodes its "old assignment" in the rebalance metadata.
Basically, the method #subscription() is called on each consumer that is alive. It sends the subscription information of the consumer (i.e., which topics the consumer wants to subscribe to) plus optional metadata to the brokers.
In a second step, the leader of the consumer group computes the actual assignment within #assign(). The responsible broker collects all the information given by #subscription() in the first phase of the rebalance and hands it to #assign(). Thus, the leader gets a global overview of the whole group and can ensure that partitions are assigned in a co-located manner.
In the last step, the broker receives the computed assignment from the leader and broadcasts it to all consumers of the group. This results in a call to #onAssignment() on each consumer.
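If implementing a full PartitionAssignor is too heavy, a simpler (but less elastic) workaround is to bypass the group protocol entirely with manual assignment: give each process an instance index and have it assign() itself the same partition number in both topics. This is my own sketch of the co-location idea, not what Streams does; you give up automatic rebalancing and group-managed offsets, and it assumes both topics are keyed with the same partitioner and have the same partition count.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class CoLocatedConsumer {
    public static void main(String[] args) {
        int instance = Integer.parseInt(args[0]); // 0..3 for 4 partitions per topic

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // no group, no commits
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Co-location: this process owns partition `instance` of BOTH topics, so
        // records with the same key (and hence partition number) meet in one process.
        consumer.assign(List.of(
                new TopicPartition("A", instance),
                new TopicPartition("B", instance)));

        while (true) {
            consumer.poll(Duration.ofSeconds(1)).forEach(r ->
                    System.out.printf("%s-%d %s=%s%n", r.topic(), r.partition(), r.key(), r.value()));
        }
    }
}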
This might also help:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Architecture
http://docs.confluent.io/current/streams/architecture.html

How to create a Kafka non-persistent topic

Is there a way I can make a Kafka topic non-persistent? I plan to use multiple consumers on a single topic, but I don't want all my consumers picking up the same messages.
In Kafka, to simulate the behaviour of a queue, all your consumers would be in the same consumer group.
See the Kafka docs for more information:
Consumers
Messaging traditionally has two models: queuing and publish-subscribe. In a queue, a pool of consumers may read from a server and each message goes to one of them; in publish-subscribe the message is broadcast to all consumers. Kafka offers a single consumer abstraction that generalizes both of these—the consumer group. Consumers label themselves with a consumer group name, and each message published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
If all the consumer instances have the same consumer group, then this works just like a traditional queue balancing load over the consumers.
If you want to control when messages are deleted from the log, you can set retention.ms or retention.bytes in the topic configuration. Be aware that these parameters will delete messages regardless of whether they were consumed or not.
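As a sketch with the Java AdminClient (the topic name, partition count, and retention value are made up), you can set a short retention when creating the topic; the broker will then delete old log data whether or not anyone has consumed it:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateShortLivedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("transient-topic", 3, (short) 1)
                    .configs(Map.of("retention.ms", "60000")); // delete data after ~1 minute
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}

Note that retention is enforced at log-segment granularity, so deletion is approximate rather than exact to the millisecond.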