Use Case: Kafka Single Topic with a large volume of subscribers

Scenario: A third-party system produces messages for specific users and sends them to a middleware system. The middleware system processes the messages and stores them in a Kafka topic named "MessageTopic". At the other end, there are 1,000 to 10,000+ users (and growing) who have subscribed to the topic to receive the messages.
Each message is intended for a specific user, so the expectation is that each user receives only the messages addressed to them.
Can you please let me know how this can be achieved using a Kafka topic? [Obviously, creating thousands of topics would not make sense in this case.]

Related

Kafka Best Way to Filter Messages

One application (the producer) publishes messages, and these messages are consumed by another application (with multiple consumers). The producer sends data with a country field, and our application will have multiple consumers, each subscribing to a specific country.
From what I have read so far, there are two approaches to filtering messages:
1. Filter data on the consumer side: The producer adds the country to the message header. Each consumer receives all the data and keeps only the country it needs by checking the message header. I am not sure whether we can/should have multiple consumers with different filters for different countries, or just one consumer that filters for the list of countries and aggregates by country on our own.
2. One topic with a separate partition per country: We use a custom partitioner on the producer so that it sends each message to a specific partition. Consumers are directed to the right partition to consume country-specific messages.
My question is: should we choose option 1 or 2? We expect to receive hundreds of messages every few seconds.
In my experience typically the first approach is used.
The second option is problematic. What if you add a new country? You will need to add a partition to the topic, which is possible but not straightforward. You will also need to change the logic on the producer and consumer sides. If consumers simply subscribe to the topic, then on failure the partitions are automatically reassigned to the live consumers in the consumer group. In your case, you would have to handle failures in your own code.
Another approach is to have a topic per country.
One more approach is to publish all the data to one topic and then distribute it to other topics (one per consumer) with a Kafka Streams application. If the requirements change, you change the implementation of the Kafka Streams app.
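To make the first approach (consumer-side filtering on a header) concrete, here is a minimal sketch in plain Java. It does not use the Kafka client: the Incoming class is a hypothetical stand-in for a consumed record with string headers, and the "country" header name is taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CountryFilterSketch {

    // Hypothetical stand-in for a consumed record: string headers plus a payload.
    static final class Incoming {
        final Map<String, String> headers;
        final String payload;

        Incoming(Map<String, String> headers, String payload) {
            this.headers = headers;
            this.payload = payload;
        }
    }

    // Keeps only the payloads whose "country" header equals the wanted country.
    // Records without a "country" header are dropped.
    static List<String> filterByCountry(List<Incoming> polled, String wantedCountry) {
        List<String> kept = new ArrayList<>();
        for (Incoming r : polled) {
            if (wantedCountry.equals(r.headers.get("country"))) {
                kept.add(r.payload);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Incoming> polled = List.of(
                new Incoming(Map.of("country", "US"), "order-1"),
                new Incoming(Map.of("country", "DE"), "order-2"),
                new Incoming(Map.of("country", "US"), "order-3"));
        System.out.println(filterByCountry(polled, "US")); // prints [order-1, order-3]
    }
}
```

With a real consumer, the same check would run on each polled record before any country-specific processing. Every consumer still reads the whole topic, which is the cost of this option.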

Why components involved in request response flow should not consume messages on a kafka topic?

As part of a design decision at my client site, the components (microservices) involved in the HTTP request-response flow are allowed to produce messages on a Kafka topic, but not allowed to consume messages from a Kafka topic.
Such components can read and write the database, talk to other components, and produce messages on a topic, but cannot consume messages from a Kafka topic.
Instead, the design suggests writing separate utilities that consume messages from Kafka topics and store them in the database. Components involved in the request-response flow then read that information from the database.
What are the design flaws if such components consume Kafka topics directly? Why does the design suggest writing separate utilities to consume the Kafka topics and store the data in the database, so that components read it from there?
Kafka topics are divided into partitions, and for each consumer group, the partitions are distributed among the consumers in that group. Each consumer is responsible for consuming the messages in the partitions it gets assigned.
Presumably, your request handling machines are clustered and load balanced. There are two ways you might have those machines subscribe to Kafka topics, and both of those ways are broken:
You could put your request handling machines in different consumer groups. In this case, each one will have to consume all of the messages. That is probably not what you want, which is typically to have each consumer pull from the queue and have each message processed only once. Also, the consumers will be out of sync and will process the messages at different rates.
You could put your request handling machines in the same consumer group. In this case, each one will only have access to the partitions that it is assigned. Every machine will see a different message stream. This, of course, is also not what you want. Clients would get different results depending on which machine the load balancer directed them to.
If you want all of your request handling machines to pull from the same queue of messages across the whole topic, then they need to communicate with a single consumer that is assigned all the partitions.
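The two cases can be illustrated with a small simulation. This is only a sketch: the round-robin assign method below is a simplified, hypothetical stand-in for Kafka's partition assignment, and the handler names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupAssignmentSketch {

    // Spreads partitions 0..numPartitions-1 over the consumers of ONE group, round-robin.
    // Simplified illustration only; not Kafka's actual assignment strategy.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Same group: the two request handlers split the partitions between them.
        System.out.println(assign(List.of("handler-a", "handler-b"), 4));
        // prints {handler-a=[0, 2], handler-b=[1, 3]}

        // Different groups (one member each): every handler owns all partitions.
        System.out.println(assign(List.of("handler-a"), 4)); // prints {handler-a=[0, 1, 2, 3]}
        System.out.println(assign(List.of("handler-b"), 4)); // prints {handler-b=[0, 1, 2, 3]}
    }
}
```

In the same-group case each handler sees only part of the topic. In the separate-group case each handler must read everything.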

One to One and Group Messaging using Kafka

As Kafka has a topic based pub-sub architecture how can I handle One-to-One and Group Messaging part of web application using Kafka?
I am using SpringBoot+Angular stack and Docker Kafka server.
I'll write another answer here.
Based on my experience with a chat service, you only need one topic for all the messages, together with a well-designed message body:
public class Message {
    private String from; // user id
    private String to;   // user id or group id
}
Then you can create, say, 100 partitions for this topic and start two consumers to consume them (50 partitions per consumer in the beginning).
If your system hits a bottleneck, you can easily scale out by adding more consumers to handle the load.
How do you distribute the messages in the consumer? I used to send the messages to a mobile app: every app instance keeps a long-lived connection to the server, and the server sends the messages to the app over that channel. For group chat, I created a Redis cache that stores all the active users in each group, so I can easily look up the users who belong to a group and send them the messages.
One more thing: keep Kafka decoupled from your business logic. It should act only as a messaging system that transfers the messages. If you tie your business logic to Kafka, for example by creating a topic per "One-to-One" conversation and deleting it when the conversation is finished, Kafka will become very messy.
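As a rough sketch of the consumer-side routing described in this answer: the recipients method below is hypothetical, and a plain in-memory Map stands in for the Redis cache of active group members. It treats the "to" field as a group id when it appears in that map, and as a user id otherwise.

```java
import java.util.List;
import java.util.Map;

final class ChatRoutingSketch {

    // Same shape as the Message body above, with a constructor added for the example.
    static final class Message {
        final String from; // user id
        final String to;   // user id or group id

        Message(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    // Resolves the user ids that should receive the message.
    // activeGroupMembers is an in-memory stand-in for the Redis cache (group id -> active user ids).
    static List<String> recipients(Message m, Map<String, List<String>> activeGroupMembers) {
        List<String> members = activeGroupMembers.get(m.to);
        if (members != null) {
            return members;       // "to" is a group id: fan out to the active members
        }
        return List.of(m.to);     // otherwise treat "to" as a single user id
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = Map.of("team-1", List.of("alice", "bob"));
        System.out.println(recipients(new Message("carol", "team-1"), groups)); // prints [alice, bob]
        System.out.println(recipients(new Message("carol", "dave"), groups));   // prints [dave]
    }
}
```

The server would then push the message over the long-lived connection of each resolved user. The topic layout stays the same no matter how many conversations or groups exist.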
By One-to-One, I suppose you mean one producer and one consumer, i.e. using it as a queue.
This is certainly possible with Kafka. You can have one consumer subscribe to a topic and restrict others by not giving them authorization. See Authorization in Kafka.
Note that once a message is consumed, it is not deleted; rather, its offset is committed so that the same consumer group will not consume it again.
By Group Messaging, I suppose you mean one producer > multiple consumers, or multiple producers > multiple consumers.
This is also possible, a producer can produce messages to a topic and multiple consumers can consume them.
If all the consumers have the same group id, then each consumer in the group gets only a subset of messages.
If they have different group ids then each consumer will get all messages.
Multiple producers also can produce to the same topic.
A consumer can also subscribe to multiple topics.
OK, this is a very broad question, so I will try to give some simple, basic information.
Kafka topics are divided into a number of partitions. Partitions allow you to parallelize a topic by splitting its data across multiple brokers: each partition can be placed on a separate machine, which allows multiple consumers to read from a topic in parallel.
So if you are using partitions, you can have multiple consumers consuming the topic in parallel.
Consumers can be organized into consumer groups for a given topic: each consumer within the group reads from its own set of partitions, and the group as a whole consumes all messages from the entire topic.
Basically, if you have only one group, each message is handled by only one consumer in that group. Note that this alone is not an exactly-once guarantee: with default settings a message can be redelivered after a consumer failure or rebalance.
If you need two consumer groups, think about why you need two. Are the consumers in the two groups handling different logic?
There is more to it; please check the official documentation, or ask a smaller question.

Kafka very large number of topics?

I am considering Kafka to stream updates from the back-end to the front-end applications.
- Data streams are specific to a user request, so each request will generate a stream in the back-end.
- Each user will have multiple concurrent requests, i.e. a one-to-many relationship between user and streams.
I first thought I would set up a topic "per user request", but learnt that hundreds of thousands of topics is bad for multiple reasons.
Reading online, I came across posts that suggest one topic partitioned on userid. How is that any better than multiple topics?
If partitioning on userid is the way to go, the consumer will receive updates for different requests (from that user), and that will cause issues. I need to be able to not process a stream until I choose to, and if each request had its own topic this would work out great.
Thoughts?
I don't think Kafka is a good option for your use case, because it is somewhat "synchronous" and "dynamic" in nature. A user request is submitted and the client waits for the stream of response events; the client also needs to know when the response for a particular user request ends. Multiple user requests may end up in the same Kafka partition, as we cannot afford an exclusive partition for each user when the number of users is high.
I think Redis may be a better fit for this use case. Every request can have a unique id, and response events are appended to a Redis list with some reasonable expiry time. The Redis list is given the same key name as the request id.
The Redis list will look like this (the key is the request id):
request id --> response event1, response event2, ..., response end event
The process that relays the events to the client deletes the list after it has successfully sent all the response events to the client and has encountered the "last response event marker". If the relaying process dies before it can delete the list, Redis will delete it once the list's expiry time has passed.
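The list-per-request idea can be sketched without a Redis client. In the illustration below an in-memory map of queues stands in for the Redis lists. The END_MARKER value and all method names are hypothetical, and the expiry-time fallback is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ResponseBufferSketch {

    // Hypothetical "last response event marker".
    static final String END_MARKER = "__END__";

    // request id -> ordered response events (in-memory stand-in for one Redis list per request).
    private final Map<String, Deque<String>> lists = new HashMap<>();

    // Back-end side: append a response event to the list keyed by the request id.
    void append(String requestId, String event) {
        lists.computeIfAbsent(requestId, k -> new ArrayDeque<>()).addLast(event);
    }

    // Relay side: returns the buffered events in order and deletes the list
    // once the end marker has been reached.
    List<String> relay(String requestId) {
        List<String> sent = new ArrayList<>();
        Deque<String> queue = lists.get(requestId);
        if (queue == null) {
            return sent;
        }
        while (!queue.isEmpty()) {
            String event = queue.pollFirst();
            if (END_MARKER.equals(event)) {
                lists.remove(requestId); // all events relayed: delete the list
                break;
            }
            sent.add(event);
        }
        return sent;
    }

    boolean exists(String requestId) {
        return lists.containsKey(requestId);
    }

    public static void main(String[] args) {
        ResponseBufferSketch buffer = new ResponseBufferSketch();
        buffer.append("req-1", "event1");
        buffer.append("req-2", "other");
        buffer.append("req-1", "event2");
        buffer.append("req-1", END_MARKER);
        System.out.println(buffer.relay("req-1"));  // prints [event1, event2]
        System.out.println(buffer.exists("req-1")); // prints false
    }
}
```

Because each request has its own key, events from concurrent requests by the same user never mix, and a request's events can sit untouched until the relay chooses to read them.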
Although it is possible (I guess) to have a Kafka cluster with several thousand topics, I'm not sure it is the way to go in your particular case.
Usually you design your Kafka app around streams of data, such as click-streams, page-views, etc. Then, if you want some kind of "sticky" processors, you need a partition key. In your case, if you select the user id as the key, Kafka will store all events from a user in the same partition.
A Kafka consumer, on the other hand, reads messages from one, several, or all partitions of a topic. That means that if, say, you have a topic with 10 partitions, you can start your Kafka consumers in a consumer group so that every consumer has a distinct set of partitions assigned.
For the user id example, this means each user will be processed by exactly one consumer, depending on the key. For example, userid A goes to partition 1, but userid B goes to partition 10.
Again, you can use message key in order to map your data stream to Kafka partitions. All events with the same key will be stored to the same partition and will be consumed/processed by the same consumer instance.
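A minimal sketch of the key-to-partition mapping follows. For keyed records, Kafka's default partitioner hashes the serialized key (murmur2) modulo the number of partitions. Here String.hashCode() is used as a simple stand-in, so the actual partition numbers will differ from what a real cluster would choose.

```java
final class KeyPartitionSketch {

    // Maps a message key to a partition index in [0, numPartitions).
    // String.hashCode() is a stand-in for the hash Kafka applies to the key bytes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 10;
        // The same user id always maps to the same partition ...
        System.out.println(partitionFor("user-A", partitions) == partitionFor("user-A", partitions)); // prints true
        // ... while different users may or may not share one.
        System.out.println("user-A -> partition " + partitionFor("user-A", partitions));
        System.out.println("user-B -> partition " + partitionFor("user-B", partitions));
    }
}
```

Since a partition is read by one consumer of a group at a time, a stable key-to-partition mapping is what makes processing "sticky" per user.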

Kafka instead of Rest for communication between microservices

I want to change the communication between (micro)-services from REST to Kafka.
I'm not sure about the topics and wanted to hear some opinions about that.
Consider the following setup:
I have an API-Gateway that provides CRUD functions via REST for web applications, so I have 4 endpoints that users can call.
The API-Gateway will produce the requests and consume the responses from the second service.
The second service consumes the requests, accesses the database to execute the CRUD operations, and produces the results.
How many topics should I create?
Do I have to create 8 (2 per endpoint (request/response)) or is there a better way to do it?
Would like to hear some experience or links to talks / documentation on that.
The short answer to this question is: it depends on your design.
You can use a single topic for all your operations, or you can use several topics for different operations. However, you must be aware of the following.
You have to produce messages to Kafka in the order in which they were created, and you must consume them in the same order to maintain consistency. Messages sent to Kafka are ordered within a topic partition; messages in different topic partitions are not ordered by Kafka. Let's say you created an item and then deleted it. If you try to consume the message for the delete operation before the message for the create operation, you get an error. In this scenario, you must send these two messages to the same topic partition to ensure that the delete message is consumed after the create message.
Please note that there is always a trade-off between consistency and throughput. If you use a single topic partition and send all your messages to it, you get consistency but cannot consume messages quickly, because you read messages from that partition one by one and only get the next message once the previous one has been consumed. To increase throughput, you can use multiple topics or divide the topic into partitions. For both of these solutions you must implement some logic on the producer side to maintain consistency: related messages must be sent to the same topic partition. For instance, you can give the topic one partition per entity type and send the CRUD messages for each entity type to the same partition. I don't know whether this ensures consistency in your scenario, but it is an alternative. You need to find the logic that provides consistency with multiple topics or topic partitions; it depends on your case. If you can find that logic, you get both consistency and throughput.
For your case, I would use a single topic with multiple partitions, and on the producer side I would send related messages to the same topic partition.
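The create/delete example can be simulated to show why sending related messages to the same partition preserves their order. This is only a sketch: the per-partition logs are plain lists, the entity id is assumed to be the routing key, and the hash is a stand-in for whatever partitioning logic the producer uses.

```java
import java.util.ArrayList;
import java.util.List;

final class EntityOrderingSketch {

    // Appends each {entityId, operation} event to the log of the partition chosen
    // from its entity id, so all events of one entity share a single ordered log.
    static List<List<String>> produce(String[][] events, int numPartitions) {
        List<List<String>> logs = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            logs.add(new ArrayList<>());
        }
        for (String[] event : events) {
            String entityId = event[0];
            String operation = event[1];
            int partition = Math.floorMod(entityId.hashCode(), numPartitions);
            logs.get(partition).add(operation + ":" + entityId);
        }
        return logs;
    }

    public static void main(String[] args) {
        String[][] events = {
                {"item-1", "create"},
                {"item-2", "create"},
                {"item-1", "delete"}
        };
        List<List<String>> logs = produce(events, 3);
        for (int p = 0; p < logs.size(); p++) {
            System.out.println("partition " + p + ": " + logs.get(p));
        }
    }
}
```

Whichever partition "item-1" lands in, its create event sits before its delete event in that log, while events for other items can be consumed in parallel from other partitions.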
--regards