I have a User service that listens on a request topic and returns the User object. I need to call this service synchronously from two different services, and I want to confirm whether it's OK for both of them to use the same request/response topic names to request the User object.
See the documentation.
There are two options when using the same reply topic:
Discard unexpected replies:
When configuring with a single reply topic, each instance must use a different group.id. In this case, all instances receive each reply, but only the instance that sent the request finds the correlation ID. This may be useful for auto-scaling, but with the overhead of additional network traffic and the small cost of discarding each unwanted reply. When you use this setting, we recommend that you set the template’s sharedReplyTopic to true, which reduces the logging level of unexpected replies to DEBUG instead of the default ERROR.
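The discard behaviour described above can be sketched in plain Java, without Spring or a broker. This is only an illustration of the idea, not how ReplyingKafkaTemplate is implemented: the class `ReplyCorrelator` and its methods are hypothetical names, and payloads are reduced to strings.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: tracks this instance's in-flight requests by correlation ID. */
final class ReplyCorrelator {
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Registers an outgoing request and returns the future that its reply will complete. */
    CompletableFuture<String> register(String correlationId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(correlationId, future);
        return future;
    }

    /**
     * Called for every record read from the shared reply topic.
     * Returns true if the reply belonged to this instance, false if it was discarded.
     */
    boolean onReply(String correlationId, String payload) {
        CompletableFuture<String> future = pending.remove(correlationId);
        if (future == null) {
            return false; // requested by another instance: ignore it (log at DEBUG, not ERROR)
        }
        future.complete(payload);
        return true;
    }

    public static void main(String[] args) {
        ReplyCorrelator instanceA = new ReplyCorrelator();
        CompletableFuture<String> reply = instanceA.register("req-1");
        System.out.println(instanceA.onReply("req-2", "reply for another instance")); // false
        System.out.println(instanceA.onReply("req-1", "user-42"));                    // true
        System.out.println(reply.join());                                             // user-42
    }
}
```

Because every instance has its own group.id, each one sees every reply and runs this lookup; only the instance that registered the correlation ID completes a future.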
Use dedicated partitions:
If you have multiple client instances and you do not configure them as discussed in the preceding paragraph, each instance needs a dedicated reply topic. An alternative is to set the KafkaHeaders.REPLY_PARTITION and use a dedicated partition for each instance. The Header contains a four-byte int (big-endian). The server must use this header to route the reply to the correct partition (@KafkaListener does this). In this case, though, the reply container must not use Kafka’s group management feature and must be configured to listen on a fixed partition (by using a TopicPartitionOffset in its ContainerProperties constructor).
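As an illustration of the header format mentioned above, a four-byte big-endian int can be produced and read back with the JDK's ByteBuffer, whose default byte order is big-endian. The class name `ReplyPartitionHeader` is hypothetical; only the encoding itself comes from the quoted documentation.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper for the value carried in the reply-partition header. */
final class ReplyPartitionHeader {

    /** Encodes a partition number as four bytes, most significant byte first. */
    static byte[] encode(int partition) {
        // ByteBuffer uses big-endian byte order unless told otherwise.
        return ByteBuffer.allocate(4).putInt(partition).array();
    }

    /** Decodes the four-byte header value back into a partition number. */
    static int decode(byte[] headerValue) {
        if (headerValue.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + headerValue.length);
        }
        return ByteBuffer.wrap(headerValue).getInt();
    }
}
```

For example, `encode(1)` yields the bytes `0, 0, 0, 1`, and `decode` reverses it on the server side before the reply is routed.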
Multiple consumers can read and process the same message from the same topic as long as they are in different consumer groups.
This is more of a design/architecture question.
We have a microservice A (MSA) with multiple instances (say 2) running behind a load balancer (LB).
The purpose of this microservice is to read messages from a Kafka topic and send them to end users/clients. Both instances use the same consumer group ID for a particular client/user so that messages are not duplicated, and the Kafka topic has 2 partitions (one per instance).
End users/clients connect to the LB to fetch messages from MSA. Long polling is used here.
A request from a client can land on any instance. If it lands on MSA1, it pulls data from Kafka partition1, and if it lands on MSA2, it pulls data from partition2.
Now, a producer is producing messages, and we don't have a high message count. So, let's say the producer produces msg1 and it goes to partition1. The end user/client will not get this message unless its request lands on MSA1, which might not always happen because other requests are also coming to the LB.
We want to solve this issue so that the client gets the message in near real time.
One solution could be a distributed persistent queue (e.g. ActiveMQ) into which both MSA1 and MSA2 keep putting messages after reading them from Kafka, with the client just fetching messages from the queue. But this would require a separate queue for every end user/client/group ID.
Is this a good solution, and can we go ahead with it? Is there anything we should change here? We are deploying our system on AWS, so could an AWS managed service help here, e.g. an SNS+SQS combination?
Some statistics:
~1000 users, one group id per user
2-4 instances of microservice
long polling every few seconds (~20s)
average message size ~10KB
Broadly you have three possible approaches:
You can dispense with using Kafka's consumer group functionality and allow each instance to consume from all partitions.
You can make the instances of each service aware of each other. For example, an instance that receives a request that another instance can fulfill forwards the request there. This is most effective if the messages can be partitioned by client on the producer end (so that a request from a given client only needs to be routed to one instance). Even then, the consumer group functionality introduces some extra difficulty (rebalances mean that the consumer currently responsible for a given partition might not have seen all the messages in the partition). You may want to implement your own variant of the consumer group coordination protocol, with the difference that on rebalance the instance starts from some suitably early point regardless of where the previous consumer got to.
If you can't reliably partition by client in the producer (e.g. the client is requesting a stream of all messages matching arbitrary criteria) then Kafka is really not going to be a fit and you probably want a database (with all the expense and complexity that implies).
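To make the second approach (instances forwarding requests to each other) a little more concrete, here is a minimal sketch of the bookkeeping an instance would need. It assumes messages are already partitioned by client, so each client maps to one partition. The class `PartitionOwnership`, the instance IDs, and the way the table is kept up to date are all hypothetical, and the actual forwarding call is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: which service instance currently owns which partition. */
final class PartitionOwnership {
    private final Map<Integer, String> ownerByPartition = new ConcurrentHashMap<>();

    /** Records ownership, e.g. whenever a rebalance or a static assignment changes it. */
    void assign(int partition, String instanceId) {
        ownerByPartition.put(partition, instanceId);
    }

    /**
     * Returns the instance that should serve a client whose messages live in
     * the given partition, or selfId if no owner is known yet.
     */
    String targetFor(int clientPartition, String selfId) {
        return ownerByPartition.getOrDefault(clientPartition, selfId);
    }

    /** True if the request should be forwarded to another instance. */
    boolean mustForward(int clientPartition, String selfId) {
        return !targetFor(clientPartition, selfId).equals(selfId);
    }
}
```

With partition 0 assigned to "msa-1" and partition 1 to "msa-2", a long-poll request for a client in partition 1 that lands on "msa-1" would be forwarded to "msa-2" rather than answered with an empty result.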
I am trying to understand how Kafka can be used for real time notification. Let's say I have a kafka topic for alerting purposes. This topic is used by various services to send updates to the users.
There are 10 instances of notification service running and consuming messages from the topic.
Online users would be distributed among the 10 instances. For example, User1 might be connected to Instance8 over a websocket connection.
So how do I ensure that users are notified correctly? That is, how do I ensure that only Instance8 processes the messages for User1?
This problem needs to be addressed through multiple angles - let's look at each one...
First - the consumer side...
You'll need as many partitions as there are consumer application instances i.e. the notification service - in your case you've got 10 instances so 10 partitions (or a multiple of 10) to the topic. This will ensure none of the service instances are left idle. Also, they'll need to be a part of the same consumer group. Now, there are a few different partition assignment approaches available and you might need to look into these to find out the one that suits your situation - here's a good reference article.
An example - If you've got 100 users and user-1 to user-10 must be handled by notification-service-1, then StickyAssignor might suit you best.
Alternatively, you could even write your own custom partition assignor; the reference article mentioned above provides some information on this as well.
Second - the producer side...
The producer applications writing data to the given Kafka topic should ensure that they send data related to a particular user to a certain partition.
As Kafka messages are made up of key-value pairs, you'll need to make sure that the keys are NOT null. The best would be to use some user-related-information as the key - this way you can make sure that messages in any partition are consumed by the designated consumer instance.
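The effect of keying by user can be illustrated with a simplified stand-in for the partitioner. Note the assumption: Kafka's default partitioner hashes the serialized key bytes with murmur2, whereas this sketch uses `String.hashCode()` purely to show the principle that a fixed key and a fixed partition count always give the same partition. The class name `KeyedPartitioning` is hypothetical.

```java
/** Hypothetical sketch of key-based partition selection (not Kafka's actual hash). */
final class KeyedPartitioning {

    /** Maps a non-null key to a partition in the range [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        if (key == null) {
            throw new IllegalArgumentException("key must not be null");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int first = partitionFor("user-1", 10);
        int second = partitionFor("user-1", 10);
        System.out.println(first == second); // true: same key, same partition
    }
}
```

Different keys can still land on the same partition, so a consumer instance will generally see messages for several users.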
Lastly, please note that I've left out the part on which users (socket connections) are mapped to which notification service instance as it is beyond Kafka and I'm not sure if that part is designed to be strict or not.
This is in continuation of another Question which I posted.
Below is the link to the previous question: Aggregation of messages from multiple topics
I have a requirement for which I have tried to create a diagram.
My requirement is that node1 should receive the responses R1M1 & R1M2 and node2 should receive R2M1 & R2M2.
Things which are implemented:
Setting the KafkaHeaders.CORRELATION_ID in the Producer Record from both the nodes.
KafkaMessageListenerContainer bean created with the two response topics in Container Properties.
ContainerProperties containerProperties = new ContainerProperties("output-message-topic1", "output-message-topic2");
aggregatingReplyingKafkaTemplate.setSharedReplyTopic(true);
consumerFactory is configured with single groupId.
consumerConfigProps.put(ConsumerConfig.GROUP_ID_CONFIG, "xxxx.yyyyy.zzzz");
Note :
I am able to aggregate multiple response messages with one node.
Creating a separate topic for each consumer (node) is not possible because of an infra limitation.
I need assistance with the changes that need to be implemented to get the associated aggregated response for each request.
See the documentation.
You MUST configure each instance to use a discrete reply topic partition and the server must route the reply to the requested partition; or you must use a different group.id for each instance so that replies are sent to both (and discarded by the instance that did not emit the request).
When you configure with a single reply TopicPartitionOffset, you can use the same reply topic for multiple templates, as long as each instance listens on a different partition. When configuring with a single reply topic, each instance must use a different group.id. In this case, all instances receive each reply, but only the instance that sent the request finds the correlation ID. This may be useful for auto-scaling, but with the overhead of additional network traffic and the small cost of discarding each unwanted reply. When you use this setting, we recommend that you set the template’s sharedReplyTopic to true, which reduces the logging level of unexpected replies to DEBUG instead of the default ERROR.
If you have multiple client instances and you do not configure them as discussed in the preceding paragraph, each instance needs a dedicated reply topic. An alternative is to set the KafkaHeaders.REPLY_PARTITION and use a dedicated partition for each instance. The Header contains a four-byte int (big-endian). The server must use this header to route the reply to the correct partition (@KafkaListener does this). In this case, though, the reply container must not use Kafka’s group management feature and must be configured to listen on a fixed partition (by using a TopicPartitionOffset in its ContainerProperties constructor).
I'm trying to implement a request/reply pattern with Kafka. I am working with named services and unnamed clients that send messages to those services, and clients may expect a reply. Many clients (tens to hundreds) may interact with a single service, or with a consumer group of services.
Strategy one: filtering messages
The first thought was to have two topics per service - the "HelloWorld" service would consume the "HelloWorld" topic, and produce replies back to the "HelloWorld-Reply" topic. Clients would consume that reply topic and filter on unique message IDs to know what replies are relevant to them.
The drawback there is it seems like it might create unnecessary work for clients to filter out a potentially large amount of irrelevant messages when many clients are interacting with one service.
Strategy two: ephemeral topics
The second idea was to create a unique ID per client, and send that ID along with messages. Clients would consume their own unique topic "[ClientID]" and services would send to that topic when they have a reply. Clients would thus not have to filter irrelevant messages.
The drawback there is clients may have a short lifespan, e.g. they may be single use scripts, and they would have to create their topic beforehand and delete it afterward. There might have to be some extra process to purge unused client topics if a client dies during processing.
Which of these seems like a better idea?
We are using Kafka in production to handle both event-based messages and request/response messages. Our approach to request/response is your first strategy because, with the second strategy, the number of topics grows with the number of clients, and many of those topics end up completely unused. Another reason for choosing the first strategy was our topic naming guideline, under which each service should belong to only one topic for tracking. Kafka is not designed for request/response messaging, but I recommend the first strategy because it gives you:
few numbers of topics
better service tracking
better topic naming
However, you have to be careful with your consumer groups: clients that share a group.id split the reply partitions between them, so a reply can be delivered to a client that did not send the request, which looks like data loss.
A better approach is to use the first strategy with many partitions in one topic per service, where each client sends and receives its messages with a unique key. With the default partitioner, Kafka sends all messages with the same key to the same partition, as long as the number of partitions does not change. This approach doesn't need filtering of irrelevant messages and is perhaps a combination of your two strategies.
Update:
As @ValBonn said, with the suggested approach you always have to make sure that the number of partitions >= the number of clients.
I have a validation service which takes in validation requests and publishes them to an SQS queue. Now, based on the type of validation request, I want to forward the message to a specific service.
So basically, I have one producer and multiple consumers, but each message is to be consumed by only one consumer.
What approach should I use? Should I have a different SQS queue for each service, or can I do this using a single queue based on message type?
As I see it, you have three options:
The first option, like you say, is to have a unique consumer for each message type. This is the approach we use, and we have thousands of queues and many different message types.
The second option would be to decorate the message being pushed onto SQS with something that indicates its desired consumer, then have a generic consumer in your application that forwards the message on to the right consumer. This approach is generally seen as an anti-pattern, though, and I would personally agree.
Thirdly, you could take advantage of SNS filtering, but that's only worthwhile if you use SNS right now; otherwise you'd have to invest some time to set it up and make it work.
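For reference, the generic consumer from the second option could look roughly like the sketch below. It is deliberately independent of the AWS SDK: the class `MessageTypeDispatcher`, the idea that the message type arrives as a plain string (for example from a message attribute), and the string message body are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch: one generic consumer that forwards by message type. */
final class MessageTypeDispatcher {
    private final Map<String, Consumer<String>> handlers = new HashMap<>();

    /** Registers the handler responsible for one message type. */
    void register(String messageType, Consumer<String> handler) {
        handlers.put(messageType, handler);
    }

    /**
     * Forwards the body to the handler for its type.
     * Returns false if no handler is registered, so the caller can decide
     * whether to leave the message on the queue or dead-letter it.
     */
    boolean dispatch(String messageType, String body) {
        Consumer<String> handler = handlers.get(messageType);
        if (handler == null) {
            return false;
        }
        handler.accept(body);
        return true;
    }
}
```

The single dispatcher becomes a shared dependency for every message type, which is part of why this option tends to be regarded as an anti-pattern compared with one queue per type.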
Hope that helps!