I will give an abstract example of the issue that I encountered.
The user makes an HTTPS request to our server (request proxy / load balancer), and the load balancer establishes a socket connection with one of the endpoint nodes (it is a multi-node service). This service, in turn, performs some logic, creates a message and sends it to the request topic. The payload of this message also contains the partitions assigned to this instance (e.g. [1, 3, 5]). The system (a black box) then processes the request and replies on another topic (the response topic), choosing which partition to send the reply to (e.g. picked randomly from [1, 3, 5]). The endpoint service (pod) that holds the connection to the user receives this message and replies to the user via HTTP.
Now imagine that a rebalance happens, but the endpoint service managed to send its request just before it. As a result, another pod of the endpoint service may receive the response and be unable to reply to the user, because it has no connection to them.
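For illustration, here is a minimal sketch in Java with the plain kafka-clients API of how the endpoint service could keep track of its currently assigned partitions via a ConsumerRebalanceListener and advertise them in the request payload. The topic names, the JSON layout, and the requestId parameter are illustrative assumptions, not part of the original setup.

```java
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical sketch: the endpoint service keeps its current assignment up to date
// and advertises it in every request, so the black-box system can reply to a
// partition that this very pod is consuming.
public class EndpointService {

    // Partitions of the response topic currently assigned to this pod.
    private final Set<Integer> assignedPartitions = new CopyOnWriteArraySet<>();

    public void subscribe(KafkaConsumer<String, String> responseConsumer) {
        responseConsumer.subscribe(Collections.singletonList("response-topic"),
                new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsRevoked(Collection<TopicPartition> revoked) {
                        revoked.forEach(tp -> assignedPartitions.remove(tp.partition()));
                    }

                    @Override
                    public void onPartitionsAssigned(Collection<TopicPartition> assigned) {
                        assigned.forEach(tp -> assignedPartitions.add(tp.partition()));
                    }
                });
    }

    public void sendRequest(KafkaProducer<String, String> producer, String requestId, String body) {
        // Hypothetical JSON payload that advertises the currently assigned partitions, e.g. [1, 3, 5].
        String payload = "{\"requestId\":\"" + requestId + "\","
                + "\"replyPartitions\":" + assignedPartitions + ","
                + "\"body\":\"" + body + "\"}";
        producer.send(new ProducerRecord<>("request-topic", requestId, payload));
    }
}
```

Note that this sketch is exactly where the race lives: the assignment advertised in the payload can already be stale by the time the black box produces the response.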
Note:
Consumer group segregation (a different consumer group for each pod) is not the way to go, because the messages are relatively large. I don't want every pod to receive messages that don't belong to it and thereby increase the network load.
I see no point in key-based partitioning (hashing a key to choose the partition).
I currently work around this problem, but I would really like to know what practices exist when using Kafka. Thanks.
Related
This is more of a design/architecture question.
We have a microservice A (MSA) with multiple instances (say 2) running behind a load balancer (LB).
The purpose of this microservice is to get messages from a Kafka topic and send them to end users/clients. Both instances use the same consumer group id for a particular client/user so that messages are not duplicated, and the Kafka topic has 2 partitions (or as many partitions as instances).
End users/clients connect to the LB to fetch messages from MSA. Long polling is used here.
A request from a client can land on any instance. If it lands on MSA1, MSA1 pulls data from Kafka partition 1; if it lands on MSA2, MSA2 pulls data from partition 2.
Now, a producer is producing messages, and our message volume is not high. Say the producer produces msg1 and it goes to partition 1. The end user/client will not get this message unless their request lands on MSA1, which will not always happen, since other requests are also coming to the LB.
We want to solve this issue so that the client gets the message in near real time.
One possible solution is a distributed persistent queue (e.g. ActiveMQ) into which both MSA1 and MSA2 keep putting the messages they read from Kafka, and the client just fetches messages from that queue. But this requires a separate queue for every end user/client/group id.
Is this a good solution, and can we go ahead with it? Is there anything we should change? We are deploying our system on AWS, so could any AWS managed service help here, e.g. an SNS+SQS combination?
Some statistics:
~1000 users, one group id per user
2-4 instances of microservice
long polling every few seconds (~20s)
average message size ~10KB
Broadly you have three possible approaches:
You can dispense with Kafka's consumer group functionality and allow each instance to consume from all partitions (a minimal sketch of this follows after the three approaches).
You can make the instances of each service aware of each other. For example, an instance that gets a request which can be fulfilled by another instance forwards the request there. This is most effective if the messages can be partitioned by client on the producer end (so that a request from a given client only ever needs to be routed to one particular instance). Even then, the consumer group functionality introduces some extra difficulty (rebalances mean that the consumer currently responsible for a given partition might not have seen all the messages in that partition). You may want to implement your own variant of the consumer group coordination protocol, where, on rebalance, an instance starts from some suitably early point regardless of where the previous consumer got to.
If you can't reliably partition by client in the producer (e.g. the client is requesting a stream of all messages matching arbitrary criteria) then Kafka is really not going to be a fit and you probably want a database (with all the expense and complexity that implies).
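To make the first approach concrete, here is a minimal sketch (Java, plain kafka-clients; the bootstrap server, topic name, and the choice to start from the latest offsets are assumptions) of an instance that assigns itself every partition of the topic instead of joining a consumer group:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

public class ConsumeAllPartitions {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // No group.id / subscribe(): assign() bypasses the consumer group protocol entirely,
        // so every instance sees every partition and no rebalancing takes place.

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> allPartitions = consumer.partitionsFor("messages-topic").stream()
                    .map(pi -> new TopicPartition(pi.topic(), pi.partition()))
                    .collect(Collectors.toList());
            consumer.assign(allPartitions);
            consumer.seekToEnd(allPartitions); // start from the latest messages (assumption)

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Deliver to whichever connected client the message belongs to.
                    System.out.printf("partition=%d key=%s value=%s%n",
                            record.partition(), record.key(), record.value());
                }
            }
        }
    }
}
```

Because assign() bypasses the group protocol, adding or removing instances triggers no rebalances; the trade-off is that every instance receives every message, so filtering has to happen in the service itself.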
How can I implement the request/reply pattern with Apache Kafka? The implementation should also work when service instances are scaled (e.g. pods in Kubernetes).
In RabbitMQ, I can create a temporary, non-durable, unique queue per instance that receives responses from other services. This queue is removed automatically when the connection is lost (i.e. when the service instance goes down).
How can I do this with Kafka? How to scale this solution?
I use Node.js.
Given that your Rabbit example only talks about the channel for receiving the response (ignoring sending the request), the most practical approach is a single topic for responses to that service, with however many partitions you need to meet your throughput goal (Kafka doesn't handle dynamic topic creation/deletion particularly well). A requestor instance will choose a partition to consume at random (multiple instances could consume the same partition) and communicate that partition and a unique correlation ID with the request. The response is then produced to the selected partition and keyed with the correlation ID. Requestors track the set of correlation IDs they're waiting for and ignore responses with keys not in that set.
The risk of collisions in correlation IDs can be mitigated by having the requestors coordinate among themselves (possibly using something like etcd/zookeeper/consul).
This isn't a messaging pattern for which Kafka is that well-suited (it's definitely not best of breed for this), but it's workable.
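A minimal sketch of the requestor side of that pattern (in Java with plain kafka-clients, to match the other examples here; the question mentions Node.js, and the same shape applies with a client such as kafkajs — topic names and the reply-partition header name are assumptions):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Collections;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of a requestor instance: it consumes one randomly chosen partition of the
// shared response topic and matches replies to outstanding requests by correlation ID.
public class Requestor {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final int replyPartition;

    Requestor(KafkaConsumer<String, String> replyConsumer, String responsesTopic) {
        int partitionCount = replyConsumer.partitionsFor(responsesTopic).size();
        this.replyPartition = ThreadLocalRandom.current().nextInt(partitionCount);
        // assign(), not subscribe(): several requestor instances may read the same partition.
        replyConsumer.assign(Collections.singletonList(new TopicPartition(responsesTopic, replyPartition)));
    }

    String sendRequest(KafkaProducer<String, String> producer, String requestsTopic, String body) {
        String correlationId = UUID.randomUUID().toString();
        pending.add(correlationId);
        ProducerRecord<String, String> request = new ProducerRecord<>(requestsTopic, correlationId, body);
        // Tell the responder where to send the reply (header name is an assumption).
        request.headers().add("reply-partition",
                Integer.toString(replyPartition).getBytes(StandardCharsets.UTF_8));
        producer.send(request);
        return correlationId;
    }

    void pollReplies(KafkaConsumer<String, String> replyConsumer) {
        ConsumerRecords<String, String> records = replyConsumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            // Replies are keyed by correlation ID; ignore anything we are not waiting for.
            if (pending.remove(record.key())) {
                System.out.println("got reply for " + record.key() + ": " + record.value());
            }
        }
    }
}
```

The responder side would read the reply-partition header and the key from the request and produce its response to that partition of the response topic, keyed with the same correlation ID.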
I have a User service which listens on a request topic and returns the User object. I need to call this service synchronously from two different services and wanted to confirm whether it's OK for them both to use the same request/response topic names to request the User object?
See the documentation.
There are two options when using the same reply topic:
Discard unexpected replies:
When configuring with a single reply topic, each instance must use a different group.id. In this case, all instances receive each reply, but only the instance that sent the request finds the correlation ID. This may be useful for auto-scaling, but with the overhead of additional network traffic and the small cost of discarding each unwanted reply. When you use this setting, we recommend that you set the template’s sharedReplyTopic to true, which reduces the logging level of unexpected replies to DEBUG instead of the default ERROR.
Use dedicated partitions:
If you have multiple client instances and you do not configure them as discussed in the preceding paragraph, each instance needs a dedicated reply topic. An alternative is to set the KafkaHeaders.REPLY_PARTITION header and use a dedicated partition for each instance. The header contains a four-byte int (big-endian). The server must use this header to route the reply to the correct partition (@KafkaListener does this). In this case, though, the reply container must not use Kafka’s group management feature and must be configured to listen on a fixed partition (by using a TopicPartitionOffset in its ContainerProperties constructor).
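For the dedicated-partition option, the responding side only has to echo the partition carried in that header. Here is a sketch of what a non-Spring responder could do with plain kafka-clients; the literal header key strings are, to the best of my knowledge, Spring Kafka's defaults, but treat them as assumptions to verify against your Spring version:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: a non-Spring responder that routes its reply to the partition requested
// in the reply-partition header (a four-byte big-endian int, as described above).
public class ReplyRouter {

    // Header keys are assumptions: they are believed to match Spring Kafka's defaults.
    static final String REPLY_TOPIC_HEADER = "kafka_replyTopic";
    static final String REPLY_PARTITION_HEADER = "kafka_replyPartition";
    static final String CORRELATION_ID_HEADER = "kafka_correlationId";

    public static void reply(KafkaProducer<byte[], byte[]> producer,
                             ConsumerRecord<byte[], byte[]> request,
                             byte[] responseBody) {
        Header topicHeader = request.headers().lastHeader(REPLY_TOPIC_HEADER);
        Header partitionHeader = request.headers().lastHeader(REPLY_PARTITION_HEADER);
        Header correlationHeader = request.headers().lastHeader(CORRELATION_ID_HEADER);

        String replyTopic = new String(topicHeader.value(), StandardCharsets.UTF_8);
        // ByteBuffer defaults to big-endian, matching the header layout described above.
        int replyPartition = ByteBuffer.wrap(partitionHeader.value()).getInt();

        ProducerRecord<byte[], byte[]> reply =
                new ProducerRecord<>(replyTopic, replyPartition, request.key(), responseBody);
        reply.headers().add(CORRELATION_ID_HEADER, correlationHeader.value());
        producer.send(reply);
    }
}
```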
Multiple consumers can read and process the same message from the same topic if the consumers are in different consumer groups.
I have a microservice architecture where each service's producer writes to the same topic. I have two instances of Kafka REST Proxy, each listening on that topic, but the problem is this:
Suppose a request comes to instance-1 of the REST proxy, which redirects it to the microservice. That service finishes the job and writes the response to the topic, but the response is consumed by the second instance of the REST proxy, say instance-2.
What should I do to solve this? Is there any kind of application_id we can attach to the request, so that when the microservice finishes the job and another instance of the REST proxy consumes the response, we can redirect the response back to the REST proxy instance that received the original request?
Your proxies form a Kafka consumer group, just like any other application. When you request records, you give both the consumer group and the consumer instance name (such as the host of the HTTP client): GET /consumers/(string:group_name)/instances/(string:instance)/records
You should generally not try to strictly control which consumers get which information beyond assigning a unique instance to each request, to allow for parallel consumption (assuming this is what you want).
Also, the REST proxy isn't consuming anything unless another application requests that information, e.g. via the GET request above.
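For reference, a sketch of issuing that records request from Java with java.net.http. The path comes from the endpoint quoted above; the host/port, group and instance names, and the v2 JSON Accept header are assumptions, and the consumer instance is assumed to have already been created and subscribed through the proxy:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyPoll {
    public static void main(String[] args) throws Exception {
        // Host/port, consumer group and instance name are assumptions; the consumer
        // instance must already exist and be subscribed via the REST Proxy.
        String url = "http://localhost:8082/consumers/proxy-group/instances/instance-1/records";

        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/vnd.kafka.json.v2+json") // assumed v2 JSON format
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Each returned record includes topic, partition, offset, key and value.
        System.out.println(response.body());
    }
}
```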
I'm trying to implement an RPC architecture using Kafka as the message broker. The decision to use Kafka instead of another message broker is dictated by the current context.
The actual implementation consists of two different types of service:
The receiver: this service consumes messages from a Kafka request topic, processes them, and then publishes the response message to a response topic;
The caller: this service receives HTTP requests, publishes messages to the receiver's request topic, consumes the receiver's response topic looking for the response message, and then returns it as an HTTP response.
The request/response messages published in the topics are related by the message key.
The receiver implementation was fairly simple: at startup, it creates the "request" and "response" topics, then starts consuming the request topic with the service group id (many instances of the receiver share the same group id in order to balance requests properly). When a request arrives, the service processes it and then publishes the response to the response topic.
My problem is with the caller implementation, in particular with consuming the response from the response topic.
With the following assumptions:
1. The HTTP requests must be managed concurrently;
2. There could be more than one instance of this caller service;
3. Every single thread/service must receive all the messages in the response topic, in order to find the message with the corresponding request key.
As an example, imagine that two caller services produce two request messages with keys 1 and 2 respectively. These messages are published to the receiver topic and processed, and the responses are then published to the receiver-responses topic. If the two caller services share the same group id, it could be that response 1 arrives at the service that published request 2 and vice versa, resulting in an HTTP timeout.
To avoid this problem, I've come up with these possible solutions:
Creating a new consumer group for every request (EDIT: but a group cannot be deleted via code, so another service would be needed to clean these groups out of ZooKeeper);
Creating a new topic for every request, then deleting it afterwards.
Hoping that I have made myself sufficiently clear - I must admit I am a beginner with Kafka - my question would be:
Which solution is more costly than the other? Or is there another topic/group configuration that could satisfy assumption 3?
Thanks.
I think I've found a possible solution. A group is automatically deleted when its offsets are not updated for a period of time, determined by the broker configuration offsets.retention.minutes.
How often expired offsets are checked for can be tuned with the configuration offsets.retention.check.interval.ms.
This way, when a consumer connects to the response topic searching for the reply message, the created group can simply be abandoned, and it will be cleaned up by the broker later on.
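As a sketch of what that looks like in the caller (Java kafka-clients; the bootstrap server, response topic name, and timeout handling are assumptions): each HTTP request spins up a short-lived consumer with a unique group.id, waits for the reply whose key matches the request, then simply closes and abandons the group, leaving its offsets to expire per the retention settings above.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.time.Instant;
import java.util.Collections;
import java.util.Properties;
import java.util.UUID;

public class PerRequestReplyConsumer {

    // Waits for the reply whose key matches the request key, using a throwaway group.
    public static String awaitReply(String requestKey, Duration timeout) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "caller-" + UUID.randomUUID()); // unique per request
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("receiver-responses")); // assumed topic name
            Instant deadline = Instant.now().plus(timeout);
            while (Instant.now().isBefore(deadline)) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(200))) {
                    if (requestKey.equals(record.key())) {
                        return record.value();
                    }
                }
            }
        }
        // The group is simply abandoned here; its offsets expire per offsets.retention.minutes.
        return null; // reply not received in time -> HTTP timeout upstream
    }
}
```

One caveat worth noting: with auto.offset.reset=latest, a reply produced before the new group has joined and been assigned partitions can be missed, so the request should only be published after the consumer has subscribed and polled at least once.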