I am using ActiveMQ.
The situation is this: we have a request queue where we leave a message for the consumer, which performs a process and then returns the response on a response queue to which the producers are subscribed.
It works fine, but now there is a second producer that sends a message for the same consumer to perform the operation and return the response on the same response queue.
How does ActiveMQ determine which of the consumers of the response queue corresponds to the response of each process? Here is a diagram for better detail:
In a request/response messaging pattern, a CorrelationId is used to relate request and response. For example, if producer A puts a request with message id 1000, the consumer processes that request and sends a response message with CorrelationId set to the message id of the request (1000 in this example). Then, when it wants to get the response message, the producer sets a filter condition such as "give me messages from the response queue where responseMessage.CorrelationId = requestMessage.messageId". This is how requests and responses are correlated.
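A minimal JMS sketch of this correlation (assuming an existing Session and the two queues; the 30-second receive timeout is just for illustration):

import javax.jms.Destination;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

// Producer A side: send the request, then wait only for the reply correlated to it.
static Message sendAndWaitForReply(Session session, Destination requestQueue,
                                   Destination responseQueue, String body) throws JMSException {
    MessageProducer producer = session.createProducer(requestQueue);
    TextMessage request = session.createTextMessage(body);
    producer.send(request);                                   // JMSMessageID is assigned during send
    String selector = "JMSCorrelationID = '" + request.getJMSMessageID() + "'";
    MessageConsumer replyConsumer = session.createConsumer(responseQueue, selector);
    return replyConsumer.receive(30_000);                     // only the matching reply is delivered
}

// Consumer side: copy the request's message id into the reply's correlation id.
static void reply(Session session, Destination responseQueue,
                  Message request, String body) throws JMSException {
    TextMessage response = session.createTextMessage(body);
    response.setJMSCorrelationID(request.getJMSMessageID());
    session.createProducer(responseQueue).send(response);
}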
This is my understanding of sending messages to Kafka asynchronously with a callback function:
Scenario: Producer is going to receive 4 batches of messages to send (to same partition, for the sake of simplicity).
Producer sends batch A.
Producer sends batch B.
Producer receives the replies from the broker and invokes the callback: batch A was unsuccessful but retriable, batch B was successful, so the producer sends batch A again.
Won't this disturb the message ordering as now A is received by Kafka after B?
If you need message ordering within a partition, you can use the idempotent producer:
enable.idempotence=true
acks=all
max.in.flight.requests.per.connection<=5
retries>0
This will resolve potential duplicates from producer and maintain the ordering.
If you don't want to use the idempotent producer, then it is enough to set max.in.flight.requests.per.connection=1. This is the maximum number of unacknowledged batches on the producer side; it means that batch B will not be sent before the acknowledgement for A is received.
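For reference, a sketch of these settings with the Java producer (constants from the Apache Kafka client; the bootstrap address and serializers are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");                 // enable.idempotence=true
props.put(ProducerConfig.ACKS_CONFIG, "all");                                // acks=all
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5");        // must be <= 5
props.put(ProducerConfig.RETRIES_CONFIG, String.valueOf(Integer.MAX_VALUE)); // retries > 0
KafkaProducer<String, String> producer = new KafkaProducer<>(props);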
I have a use case where I want to implement synchronous request / response on top of kafka. For example when the user sends an HTTP request, I want to produce a message on a specific kafka input topic that triggers a dataflow eventually resulting in a response produced on an output topic. I want then to consume the message from the output topic and return the response to the caller.
The workflow is:
HTTP Request -> produce message on input topic -> (consume message from input topic -> app logic -> produce message on output topic) -> consume message from output topic -> HTTP Response.
To implement this case, upon receiving the first HTTP request I want to be able to create on the fly a consumer that will consume from the output topic, before producing a message on the input topic. Otherwise there is a possibility that messages on the output topic are "lost". Consumers in my case have a random group.id and have auto.offset.reset = latest for application reasons.
My question is how I can make sure that the consumer is ready before producing messages. I make sure that I call SubscribeTopics before producing messages, but in my tests so far, when there are no committed offsets and Kafka resets offsets to latest, there is a possibility that messages are lost and never read by my consumer, because Kafka sometimes considers that the consumer registered after the messages were produced.
My workaround so far is to sleep for a bit after I create the consumer, to allow Kafka to complete the offset reset workflow before I produce messages.
I have also tried to implement logic in a rebalance callback (triggered when a consumer subscribes to a topic), in which I call assign with offset = latest for the topic partitions, but this doesn't seem to have fixed my issue.
Hopefully there is a better solution out there than sleep.
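For clarity, the rebalance-callback idea looks roughly like this when sketched with the Java client (my actual code uses a SubscribeTopics-style API; the names here are illustrative):

import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Block until the consumer really has its partitions assigned and is positioned at the end
// of them, instead of sleeping for an arbitrary amount of time.
static void waitUntilAssigned(KafkaConsumer<String, String> consumer, String outputTopic) {
    CountDownLatch assigned = new CountDownLatch(1);
    consumer.subscribe(List.of(outputTopic), new ConsumerRebalanceListener() {
        @Override public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
        @Override public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
            consumer.seekToEnd(partitions);   // explicit "latest", independent of committed offsets
            assigned.countDown();
        }
    });
    while (assigned.getCount() > 0) {
        // poll() drives the group join; the callback above fires inside this call.
        // Nothing has been produced yet, so any records returned here can be ignored.
        consumer.poll(Duration.ofMillis(100));
    }
}
// Only after waitUntilAssigned(...) returns is the request message produced to the input topic.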
Most HTTP client libraries have an implicit timeout. There's no guarantee your consumer will ever consume an event or that a downstream producer will send data to the "response topic".
Instead, have your initial request immediately return a 202 Accepted status (or 400, for example, if you do request validation) with some tracking ID. Then require polling GET requests by id for status updates, returning either a 404 status or a 200 with some status field in the response body.
You'll need a database to store intermediate state.
I have a Kafka consumer which fetches from the broker at a given fetch interval. It fetches at the given time interval and that is fine when there are messages in the topic. But I really don't know why the consumer sends more fetch requests when there are no messages in the Kafka topic.
In general, consumers send two types of requests to the broker:
Heartbeat
Poll request
Heartbeats are sent from a separate thread and their interval is configured with heartbeat.interval.ms (3 seconds by default).
For poll requests there is no specific interval; it's up to your code (there is just an upper bound for it, max.poll.interval.ms).
It is absolutely reasonable for the consumer to send more frequent poll requests when there is no data in the partition(s) it is assigned to. Suppose you have code like this:
void consumeLoop() {
    while (true) {
        // poll() returns as soon as records are available, or after the timeout if there are none
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        if (!records.isEmpty()) {
            processMessages(records);
        }
    }
}
As you can see, if no records are returned from poll, your consumer will immediately send another poll request. But if there is data, it first has to process those records before sending the next poll request.
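If the sheer rate of fetch requests on an empty topic is a concern, you can also make the broker hold each fetch until data arrives or a wait limit passes. A sketch of the relevant consumer properties (illustrative values; the defaults are fetch.min.bytes=1 and fetch.max.wait.ms=500):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

Properties props = new Properties();
props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1024");   // broker replies once ~1 KB is available...
props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500");  // ...or after 500 ms, whichever comes first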
Let's say you have a POST request with some product as the payload. Traditionally, your HttpRequest lifecycle should end with an HttpResponse carrying the requested action's result, in our case a response saying "Product created" might be enough.
But with a message broker, things might go like this:
The request handler creates the appropriate message, CreateProduct(...), and produces it to a topic in the message broker.
Then what ???
A consumer retrieves and processes the message by actually creating the product in a persistent database.
Then what ???
What should happen at step 2?
If we send a response saying "Your product should be created very soon, keep waiting, we'll keep you posted":
How can the client be notified after a response has already been sent?
Are we forced to use WebSockets so we can keep the connection open?
What should happen at step 4?
I have my opinion but I would like to know how you handle it in production.
The app that actually created the product can produce a message saying "Product created" to a status topic in the message broker, so the original message's producer can consume it and then notify the client somehow. The only way I see this being possible is through a WebSocket connection.
So I would like to know: is WebSocket the only way to do HTTP request/response involving a message broker? And is it reasonable to use a message broker for HTTP request/response at all?
You could think of this in a fully asynchronous fashion (no WebSocket needed then).
You do a POST HTTP request and this creates a unique ID associated with your job. This ID is stored in a database as well, with a status like 'processing'.
The ID is also returned to your client.
Your job ID (and its payload parameters) travels through Kafka and finally reaches a consumer. This consumer processes the job and commits the result to an external DB (or whatever).
When the job is done, you update the job status to 'done' or something like that.
In the meantime, on the client side, you poll an endpoint that asks your job DB whether the job is done or not.
This is a very common way to cover your needs.
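A rough, self-contained sketch of that flow (all names are made up; the Kafka producer/consumer and the real database are replaced with placeholders so the shape stays visible):

import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class JobStatusSketch {
    // stand-in for the job database mentioned above
    private final ConcurrentHashMap<String, String> jobStore = new ConcurrentHashMap<>();

    // HTTP POST handler: register the job, publish it, return the ID immediately.
    public String handleCreateProduct(String productJson) {
        String jobId = UUID.randomUUID().toString();
        jobStore.put(jobId, "processing");
        sendToKafka("create-product", jobId, productJson);   // placeholder for producer.send(...)
        return jobId;                                        // the client polls GET /jobs/{id} with this
    }

    // Kafka consumer callback: do the real work, then flip the status.
    public void onCreateProductMessage(String jobId, String productJson) {
        // ... insert the product into the persistent database here ...
        jobStore.put(jobId, "done");
    }

    // HTTP GET /jobs/{id} handler: report the current state.
    public String handleJobStatus(String jobId) {
        return jobStore.getOrDefault(jobId, "unknown");
    }

    private void sendToKafka(String topic, String key, String value) {
        // in a real service: producer.send(new ProducerRecord<>(topic, key, value));
    }
}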
Yannick
I'm trying to implement an RPC architecture using Kafka as a message broker. The decision of using Kafka instead of another message broker solution is dictated by the current context.
The actual implementation consists of two different types of service:
The receiver: this service consumes messages from a Kafka request topic, processes them, and then publishes the response messages to a response topic;
The caller: this service receives HTTP requests, publishes messages to the receiver topic, consumes the receiver service's response topic looking for the response message, and then returns it as an HTTP response.
The request/response messages published in the topics are related by the message key.
The receiver implementation was fairly simple: at startup, it creates the "request" and "response" topics, then starts consuming the request topic with the service group id (many instances of the receiver share the same group id in order to balance the requests properly). When a request arrives, the service processes it and then publishes the response to the response topic.
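In code, the receiver loop looks roughly like this (a simplified sketch, not the actual implementation; the topic names are the ones used in this question and the processing step is passed in as a function):

import java.time.Duration;
import java.util.List;
import java.util.function.Function;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

static void receiverLoop(Consumer<String, String> consumer,
                         Producer<String, String> producer,
                         Function<String, String> process) {
    consumer.subscribe(List.of("receiver"));                 // shared group.id balances the requests
    while (true) {
        for (ConsumerRecord<String, String> request : consumer.poll(Duration.ofMillis(500))) {
            String responsePayload = process.apply(request.value());
            // the response is correlated to the request by reusing the message key
            producer.send(new ProducerRecord<>("receiver-responses", request.key(), responsePayload));
        }
    }
}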
My problem is with the caller implementation, in particular while consuming the response from the response topic.
With the following assumptions:
The HTTP requests must be managed concurrently;
There could be more than one instance of this caller service.
Every single thread/service must receive all the messages in the response topic, in order to find the message with the corresponding request key.
As an example, imagine that two caller services produce two messages with keys 1 and 2 respectively. These messages will be published in the receiver topic and processed, and the responses will then be published in the topic receiver-responses. If the two caller services share the same group id, it could be that response 1 arrives at the service that published message 2 and vice versa, resulting in an HTTP timeout.
To avoid this problem, I've come up with these possible solutions:
Creating a new group for every request (EDIT: but a group cannot be deleted via code, hence another service would be needed to clean these groups out of ZooKeeper);
Creating a new topic for every request, then deleting it afterwards.
Hoping that I made myself sufficiently clear - I must admit I am a beginner to Kafka - my question would be:
Which of the two solutions is more costly? Or is there another topic/group configuration that could satisfy assumption 3?
Thanks.
I think I've found a possible solution. A group will be automatically removed when its offsets are not updated for a period of time, determined by the broker configuration offsets.retention.minutes.
How often this expiry check runs can be configured with offsets.retention.check.interval.ms.
This way, when a consumer connects to the response topic searching for the reply message, the group created for it can simply be abandoned, and it will be cleaned up automatically later on.
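For reference, the throwaway consumer for each request then looks roughly like this (Java client; the broker address is a placeholder and the topic name comes from the example above):

import java.util.List;
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "caller-" + UUID.randomUUID());   // one group per request
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("receiver-responses"));
    // poll until the record whose key matches the request key arrives, or a timeout elapses;
    // afterwards the group is simply left to expire under offsets.retention.minutes
}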