Why should components involved in a request-response flow not consume messages from a Kafka topic?

As part of a design decision at my client's site, the components (microservices) involved in the HTTP request-response flow are allowed to produce messages to a Kafka topic, but not allowed to consume messages from one.
Such components can read and write the database, talk to other components, and produce messages to a topic, but they cannot consume messages from a Kafka topic.
Instead, the design suggests writing separate utilities that consume messages from Kafka topics and store them in the database. Components involved in the request-response flow then read that information from the database.
What are the design flaws if such components consume Kafka topics directly? Why does the design suggest writing separate utilities to consume Kafka topics and store the results in the database for the request-response components to read?

Kafka topics are divided into partitions, and for each consumer group, the partitions are distributed among the various consumers in that group. Each consumer is responsible for consuming the messages in the partitions it gets assigned.
Presumably, your request-handling machines are clustered and load balanced. There are two ways you might have those machines subscribe to Kafka topics, and both of them are broken:
You could put your request-handling machines in different consumer groups. In this case, each one will have to consume all of the messages. That is probably not what you want: typically you want each consumer to pull from the queue and each message to be processed only once. Also, the consumers will be out of sync and will process the messages at different rates.
You could put your request-handling machines in the same consumer group. In this case, each one will only have access to the partitions it is assigned, so every machine will see a different message stream. This, of course, is also not what you want: clients would get different results depending on which machine the load balancer directed them to.
If you want all of your request handling machines to pull from the same queue of messages across the whole topic, then they need to communicate with a single consumer that is assigned all the partitions.
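To make this concrete, here is a minimal sketch of a consumer joining a group (the broker address, group id, and topic name are placeholders). Every request-handling instance running this code with the same group.id would be assigned only a subset of the topic's partitions:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RequestHandlerConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "request-handlers");        // shared by all instances
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events")); // placeholder topic
            consumer.poll(Duration.ofSeconds(5)); // poll once so the group rebalance completes
            // Each instance sees only the partitions the group coordinator assigned to it:
            System.out.println("Assigned partitions: " + consumer.assignment());
        }
    }
}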

Related

how to publish messages intended for different consumers on same topic of Kafka server?

We have multiple consumers (separate microservices) for our topic, and each event we publish on the topic is intended for a specific microservice, i.e. only one consumer at a time.
Can someone suggest the best approach to implement this?
E.g., I have partitions 0 and 1 in my Kafka topic, which is consumed by CG-A and CG-B.
I am publishing something like this:
record-1 for CG-A, then record-2 for CG-B, then record-3 for CG-A again.
How do I make sure that CG-A consumes record-1 from its offset?
Producers and consumers are completely decoupled. Your producer cannot send records "to a consumer".
Consumers always read all records from the topic partitions they've been assigned, regardless of what processes produced into them.
If only certain records are meant for certain consumer groups, then that's processing logic unique to your own applications, applied after consumption from Kafka, i.e. add conditional statements to filter those events.
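For example, assuming the producer tags each record with a target header (the header name "target-group" is an application convention invented here, not a Kafka feature), a consumer in CG-A could skip records addressed elsewhere. A minimal sketch:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

public class GroupFilter {
    // Returns true if this record is addressed to the given consumer group.
    static boolean isForGroup(ConsumerRecord<String, String> record, String group) {
        Header target = record.headers().lastHeader("target-group");
        return target != null && group.equals(new String(target.value()));
    }
}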

Publisher which subscribes to its own topic

I'm currently designing an application which will have hundreds of log-compacted topics. Each topic is related to a failover group and should have a dynamic (i.e., changeable on demand) set of producers and consumers.
For example, let's say I have 3 failover instances related to topic T1. Each of those failover instances should hold the same data/state (eventually consistent), and each of the instances may consume and produce messages on that topic.
As I understand, I need to assign different group IDs for each consumer/producer in order to have every instance read the topic entirely.
Though, given that the number of readers and writers for a topic is not fixed, how is it possible to avoid reading one's own messages on that topic?
Sure, I could add a source ID to each message and have a consumer dismiss any message it previously produced itself. But I'd rather avoid the data transfer entirely.
Producers and consumers are independent processes. If you subscribe to the same topic you're producing to without some extra processing logic, you'll end up with an infinite loop.
Also, within a single group, you cannot have more active consumers than partitions, so the dynamic consumer count will be limited by the partition count.
"need to assign different group IDs for each consumer/producer in order to have every instance read the topic entirely"
Not necessarily. You've mentioned you have compacted topics, so I assume you are using Kafka Streams. In the Streams API, you can set num.standby.replicas to copy state-store data across instances of the same application.id.
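A minimal sketch of that configuration, assuming the instances materialize the compacted topic into a local state store (the application id, store name, and broker address are placeholders):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;

public class FailoverGroupApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "failover-group-t1"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Keep one warm copy of each state store on another instance of the same application.id:
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);

        StreamsBuilder builder = new StreamsBuilder();
        // Materialize the compacted topic T1 into a local key-value store.
        builder.table("T1", Materialized.as("t1-store")); // store name is a placeholder

        new KafkaStreams(builder.build(), props).start();
    }
}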

One to One and Group Messaging using Kafka

Since Kafka has a topic-based pub-sub architecture, how can I handle the one-to-one and group messaging parts of a web application using Kafka?
I am using a Spring Boot + Angular stack and a Dockerized Kafka server.
I'll write another answer here.
Based on my experience building a chat service: you only need one topic for all the messages, with a well-designed message body.
public class Message {
    private String from; // sender's user id
    private String to;   // recipient's user id or group id (payload fields omitted here)
}
Then you can create, say, 100 partitions for this topic and start with two consumers (50 partitions each).
If your system later hits a bottleneck, you can easily scale out by adding more consumers to handle the load.
How do you distribute the messages on the consumer side? I was sending messages to a mobile app, so every app kept a long-lived connection to the server, and the server pushed messages to the app over that channel. For group chat, I kept a Redis cache of all the active users in each group, so I could easily look up the users belonging to a group and send them the messages.
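A minimal sketch of creating such a topic up front with the Admin client (topic name, partition count, and replication factor are illustrative):

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateChatTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 100 partitions leaves headroom to grow from 2 consumers up to 100.
            NewTopic chat = new NewTopic("chat-messages", 100, (short) 3);
            admin.createTopics(List.of(chat)).all().get();
        }
    }
}

Keying each record by its "to" field would additionally keep all messages for a given user or group in one partition, preserving their order.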
One more thing: Kafka is stateless with respect to your business logic; it stays decoupled from it and acts only as a messaging system that transfers messages. If you couple your business logic to Kafka, e.g. by creating a topic per one-to-one conversation and deleting topics once conversations finish, Kafka will become very messy.
By one-to-one, I suppose you mean one producer and one consumer, i.e. using the topic as a queue.
This is certainly possible with Kafka. You can have one consumer subscribe to a topic and restrict all others by not granting them authorization. See Authorization in Kafka.
Note that once a message is consumed it is not deleted; rather, the consumer's offset is committed so that the same consumer will not consume it again.
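If ACLs are enabled on the cluster, that restriction can also be expressed with the Admin client; a minimal sketch (the principal, topic name, and broker address are placeholders):

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class AllowSingleConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Grant Read on the topic to exactly one principal; everyone else stays unauthorized.
            AclBinding readTopic = new AclBinding(
                new ResourcePattern(ResourceType.TOPIC, "one-to-one-topic", PatternType.LITERAL),
                new AccessControlEntry("User:alice", "*", AclOperation.READ, AclPermissionType.ALLOW));
            admin.createAcls(List.of(readTopic)).all().get();
        }
    }
}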
By group messaging, I suppose you mean one producer > multiple consumers, or multiple producers > multiple consumers.
This is also possible, a producer can produce messages to a topic and multiple consumers can consume them.
If all the consumers have the same group id, then each consumer in the group gets only a subset of the messages, and each message is processed by just one of them.
If they have different group ids, then every consumer group receives all the messages.
Multiple producers also can produce to the same topic.
A consumer can also subscribe to multiple topics.
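To make those group-id semantics concrete, a minimal sketch (broker address, topic, and group names are placeholders):

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupSemantics {
    static KafkaConsumer<String, String> newConsumer(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return new KafkaConsumer<>(props);
    }

    public static void main(String[] args) {
        // Same group id: these two consumers split the topic's partitions (queue behavior).
        KafkaConsumer<String, String> a1 = newConsumer("group-a");
        KafkaConsumer<String, String> a2 = newConsumer("group-a");
        a1.subscribe(List.of("orders"));
        a2.subscribe(List.of("orders"));

        // Different group id: this consumer independently receives every message (broadcast behavior).
        // It also subscribes to multiple topics at once.
        KafkaConsumer<String, String> b = newConsumer("group-b");
        b.subscribe(List.of("orders", "payments"));
    }
}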
OK, it's a fairly broad question, so I'll lay out some basic information.
Kafka topics are divided into a number of partitions. Partitions allow you to parallelize a topic by splitting the data in a particular topic across multiple brokers — each partition can be placed on a separate machine to allow for multiple consumers to read from a topic in parallel.
So if you are using partitions, it means you can have multiple consumers reading in parallel.
Consumer groups tie this together for a given topic: each consumer within the group reads from a unique partition, and the group as a whole consumes all messages from the entire topic.
Basically, with a single consumer group a message will not be processed twice, because each record is handled by exactly one consumer in that group.
If you need two consumer groups, think about why you need two: are the consumers in the two groups handling different logic?
There is more to it; please check the official documentation, or ask a more specific question.

Kafka instead of Rest for communication between microservices

I want to change the communication between (micro)-services from REST to Kafka.
I'm not sure about the topics and wanted to hear some opinions about that.
Consider the following setup:
I have an API gateway that provides CRUD functions via REST for web applications, so there are four endpoints users can call.
The API gateway produces the requests and consumes the responses from the second service.
The second service consumes the requests, accesses the database to execute the CRUD operations, and produces the results.
How many topics should I create?
Do I have to create 8 (2 per endpoint, one for requests and one for responses), or is there a better way to do it?
Would like to hear some experience or links to talks / documentation on that.
The short answer to this question is: it depends on your design.
You can use a single topic for all your operations, or several topics for different operations. However, you must know the following:
You have to produce messages to Kafka in the order they were created, and you must consume them in the same order to preserve consistency. Messages sent to Kafka are ordered within a topic partition; messages in different topic partitions are not ordered by Kafka. Say you created an item and then deleted it: if you consume the message for the delete operation before the message for the create operation, you get an error. In that scenario, you must send the two messages to the same topic partition to ensure the delete is consumed after the create.

Please note that there is always a trade-off between consistency and throughput. If you use a single topic partition and send all your messages to it, you get consistency but cannot consume messages quickly: you read from that one partition message by message, getting the next one only after the previous one has been consumed.

To increase throughput, you can use multiple topics or divide the topic into partitions. For both of those solutions you must implement some producer-side logic to preserve consistency: related messages must go to the same topic partition. For instance, you could partition the topic by entity type and send all CRUD operations for the same entity type to the same partition. I don't know whether that guarantees consistency in your scenario, but it is one alternative; you need to find a scheme that preserves consistency across multiple topics or partitions, and that depends on your case. If you find such a scheme, you get both consistency and throughput.
For your case, I would use a single topic with multiple partitions and, on the producer side, send related messages to the same partition.
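A minimal sketch of that producer-side logic (broker address, topic, and entity id are placeholders): keying each record by the entity's id makes the default partitioner hash all of that entity's events to the same partition, so a delete can never be consumed before the create that preceded it.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CrudEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String entityId = "item-42"; // placeholder entity id
            // Same key => same partition => create and delete stay in order for this entity.
            producer.send(new ProducerRecord<>("crud-events", entityId, "CREATE item-42"));
            producer.send(new ProducerRecord<>("crud-events", entityId, "DELETE item-42"));
        }
    }
}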

How does Kafka message processing scale in publish-subscribe mode?

All, forgive me, I'm just a beginner with Kafka. I was reading the Kafka documentation about the difference between traditional message systems, like ActiveMQ, and Kafka.
As the documentation puts it, traditional message systems cannot scale message processing:

"Publish-subscribe allows you to broadcast data to multiple processes, but has no way of scaling processing since every message goes to every subscriber."
This makes sense to me. But the documentation says Kafka can scale message processing even in publish-subscribe mode (please correct me if I'm wrong):
"The consumer group concept in Kafka generalizes these two concepts. As with a queue, the consumer group allows you to divide up processing over a collection of processes (the members of the consumer group). As with publish-subscribe, Kafka allows you to broadcast messages to multiple consumer groups.

The advantage of Kafka's model is that every topic has both these properties—it can scale processing and is also multi-subscriber—there is no need to choose one or the other."
So my question is: how does Kafka achieve this, i.e. scale the processing in publish-subscribe mode? Thanks.
The main unique features in Kafka that enable scalable pub/sub are:
1. Partitioning individual topics and spreading the active partitions across multiple brokers in the cluster to take advantage of more machines, disks, and cache memory. Producers and consumers often connect to many or all nodes in the cluster, not just a single master node for a given topic/queue.
2. Storing all messages in a sequential commit log and not deleting them when consumed. This leads to more sequential reads and writes, and it offloads the broker from keeping track of different copies of messages, deleting individual messages, handling fragmentation, and tracking which consumer has acknowledged consuming which messages.
3. Enabling smart parallel processing of individual consumers and consumer groups in a way that each parallel message stream can come from the distributed partitions mentioned in #1, while offloading the offset management and partition assignment logic onto the clients themselves. Kafka scales with more consumers because the consumers do some of the work (unlike most other pub/sub brokers, where the bulk of the work is done in the broker).
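Point 3 is visible directly in the consumer API: a client can take over partition assignment and offset tracking itself, with no per-message bookkeeping on the broker. A minimal sketch (broker address, topic, partition, and offset are illustrative):

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ClientManagedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // The client, not the broker, decides which partition it reads...
            TopicPartition tp = new TopicPartition("events", 0);
            consumer.assign(List.of(tp));
            // ...and from which offset; consumption never modifies the log itself.
            consumer.seek(tp, 42L);
        }
    }
}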