I wish to describe the following scenario:
I have a node.js backend application (It uses a single thread event loop).
This is the general architecture of the system:
Producer -> Kafka -> Consumer -> Database
Let's say that the producer sends a message to Kafka, and the purpose of this message is to run a certain query against the database and retrieve the query result.
However, as we all know, Kafka is an asynchronous system. If the producer sends a message to Kafka, it gets a response saying that the message has been accepted by a Kafka broker. The broker doesn't wait until the consumer polls the message and processes it.
In this case, how can the producer get the result of the query that was run on the database?
The flow using Kafka will look like this:
The only way for Producer A to learn what happened to the message consumed by Consumer A is for another message to be produced, which is then handled by any other available consumer (in this case, Consumer B).
As you already mentioned, this flow is asynchronous. That can be useful when the query involves very heavy processing, such as report generation, and the second producer notifies, for example, a user's inbox.
If that is not the case, perhaps you should use HTTP, which is synchronous and you will have the response at the end of processing.
You must create a new flow to communicate the query result:
Consumer (now it's a producer) -> Kafka topic -> Producer (now it's a consumer)
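The reply flow above can be sketched as follows. This is a minimal in-memory illustration, not Kafka client code: the two asyncio.Queue objects stand in for the request and reply topics, and the names database_consumer, reply_listener, request, and the correlation_id field are assumptions made for the example. The idea is that each request carries a unique ID, and the original producer also consumes the reply topic and matches replies to waiting requests by that ID.

```python
import asyncio
import uuid


async def database_consumer(request_topic, reply_topic):
    """Plays the original consumer: runs the query, then produces a reply."""
    while True:
        msg = await request_topic.get()
        if msg is None:  # shutdown signal used only by this sketch
            break
        result = f"rows for {msg['query']}"  # placeholder for the real database call
        await reply_topic.put({"correlation_id": msg["correlation_id"], "result": result})


async def reply_listener(reply_topic, pending):
    """Plays the original producer's consuming side: matches replies to requests."""
    while True:
        msg = await reply_topic.get()
        if msg is None:
            break
        future = pending.pop(msg["correlation_id"], None)
        if future is not None and not future.done():
            future.set_result(msg["result"])


async def request(query, request_topic, pending, timeout=1.0):
    """Send a query and wait for the matching reply; raises on timeout."""
    correlation_id = str(uuid.uuid4())
    future = asyncio.get_running_loop().create_future()
    pending[correlation_id] = future
    await request_topic.put({"correlation_id": correlation_id, "query": query})
    try:
        return await asyncio.wait_for(future, timeout)
    finally:
        pending.pop(correlation_id, None)  # never leave stale entries behind


async def demo():
    request_topic, reply_topic, pending = asyncio.Queue(), asyncio.Queue(), {}
    workers = [
        asyncio.create_task(database_consumer(request_topic, reply_topic)),
        asyncio.create_task(reply_listener(reply_topic, pending)),
    ]
    results = await asyncio.gather(
        request("SELECT 1", request_topic, pending),
        request("SELECT 2", request_topic, pending),
    )
    await request_topic.put(None)
    await reply_topic.put(None)
    await asyncio.gather(*workers)
    return results
```

Calling asyncio.run(demo()) returns the two results in request order. The timeout matters in practice: nothing guarantees that a reply ever arrives, so the caller needs a way to give up.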
You should consider using another synchronous communication mechanism like HTTP.
I have a use case where I want to implement synchronous request / response on top of kafka. For example when the user sends an HTTP request, I want to produce a message on a specific kafka input topic that triggers a dataflow eventually resulting in a response produced on an output topic. I want then to consume the message from the output topic and return the response to the caller.
The workflow is:
HTTP Request -> produce message on input topic -> (consume message from input topic -> app logic -> produce message on output topic) -> consume message from output topic -> HTTP Response.
To implement this case, upon receiving the first HTTP request I want to be able to create on the fly a consumer that will consume from the output topic, before producing a message on the input topic. Otherwise there is a possibility that messages on the output topic are "lost". Consumers in my case have a random group.id and have auto.offset.reset = latest for application reasons.
My question is how I can make sure that the consumer is ready before producing messages. I make sure that I call SubscribeTopics before producing messages, but in my tests so far, when there are no committed offsets and Kafka is resetting offsets to latest, messages can be lost and never read by my consumer, because Kafka sometimes thinks that the consumer registered after the messages were produced.
My workaround so far is to sleep for a bit after I create the consumer to allow kafka to proceed with the commit reset workflow before I produce messages.
I have also tried to implement logic in a rebalance callback (triggered by a consumer subscribing to a topic), in which I call assign with offset = latest for the topic partition, but this doesn't seem to have fixed my issue.
Hopefully there is a better solution out there than sleep.
Most HTTP client libraries have an implicit timeout. There's no guarantee your consumer will ever consume an event or that a downstream producer will send data to the "response topic".
Instead, have your initial request immediately return a 202 Accepted status (or 400, for example, if you do request validation) with some tracking ID. Then require clients to poll with GET requests by ID for status updates, returning either a 404 status or 200 plus a status field in the response body.
You'll need a database to store intermediate state.
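A rough sketch of that accept-then-poll pattern, with plain functions in place of real HTTP handlers. The names submit, complete, get_status, and the JOBS dict are hypothetical; the dict stands in for the database that stores intermediate state, and each function returns a (status code, body) pair.

```python
import uuid

# Stand-in for the database holding intermediate state (assumption: any
# persistent store would do; a dict keeps the sketch self-contained).
JOBS = {}


def submit(payload):
    """POST handler: validate, record the job, and answer immediately."""
    if not payload.get("query"):
        return 400, {"error": "query is required"}
    tracking_id = str(uuid.uuid4())
    JOBS[tracking_id] = {"status": "pending", "result": None}
    # A real handler would produce the message to the input topic here.
    return 202, {"id": tracking_id}


def complete(tracking_id, result):
    """Called by whatever consumes the output topic once the work is done."""
    JOBS[tracking_id] = {"status": "done", "result": result}


def get_status(tracking_id):
    """GET handler: 404 for unknown IDs, otherwise 200 with a status field."""
    job = JOBS.get(tracking_id)
    if job is None:
        return 404, {"error": "unknown id"}
    return 200, job
```

With this shape, the HTTP request never blocks on Kafka. The client keeps calling get_status with its tracking ID until the status field changes from "pending" to "done".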
I have data coming in through RabbitMQ. The data is coming in constantly, multiple messages per second.
I need to forward that data to Kafka.
In my RabbitMQ delivery callback, where I am getting the data from RabbitMQ, I have a Kafka producer that immediately sends the received messages to Kafka.
My question is very simple. Is it better to create a Kafka producer outside of the callback method and use that one producer for all messages or should I create the producer inside the callback method and close it after the message is sent, which means that I am creating a new producer for each message?
It might be a naive question but I am new to Kafka and so far I did not find a definitive answer on the internet.
EDIT : I am using a Java Kafka client.
Creating a Kafka producer is an expensive operation, so using a single shared producer (a singleton) is good practice for both performance and resource utilization.
For Java clients, this is from the docs:
The producer is thread safe and should generally be shared among all threads for best performance.
For librdkafka based clients (confluent-dotnet, confluent-python etc.), I can link this related issue with this quote from the issue:
Yes, creating a singleton service like that is a good pattern. you definitely should not create a producer each time you want to produce a message - it is approximately 500,000 times less efficient.
A Kafka producer is stateful: it holds metadata (periodically synced from the brokers), a buffer of messages to send, and so on. Creating a producer for each message is therefore impractical.
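To illustrate the shape of the shared-producer approach, here is a small sketch. FakeProducer is a hypothetical stand-in that only counts how many times it is constructed; get_producer and on_rabbitmq_delivery are likewise names invented for the example and do not belong to any Kafka or RabbitMQ client.

```python
class FakeProducer:
    """Hypothetical stand-in for a Kafka producer; counts constructions."""

    created = 0

    def __init__(self):
        # A real producer would open connections and fetch metadata here,
        # which is the expensive part worth doing only once.
        FakeProducer.created += 1
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))


_producer = None


def get_producer():
    """Create the producer on first use and return the same instance afterwards."""
    global _producer
    if _producer is None:
        _producer = FakeProducer()
    return _producer


def on_rabbitmq_delivery(body):
    """Delivery callback: reuses the shared producer for every message."""
    get_producer().send("forwarded", body)
```

After any number of deliveries, FakeProducer.created stays at 1. The lazy initialization shown here is not thread-safe; if callbacks run on several threads, creating the producer once at startup avoids that problem.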
In my application there are multiple enterprises. Each enterprise logs in and performs some action, such as uploading data; the Kafka producer then takes the data and sends it to the topic. On the other side, a Kafka consumer consumes the data from the topic, performs business logic, and persists the result to the database.
Everything works perfectly when a single enterprise is logged in, but when multiple enterprises log in, Kafka consumes the messages sequentially.
How can I make the processing run in parallel across multiple client requests?
thanks in advance.
As mentioned in previous answers, you can use multiple partitions.
Another option is to take advantage of threading (ThreadPoolExecutor), so the flow will look like this:
receive message -> hand off to a worker thread to run the required logic -> ack message.
Make sure you throttle the work (for example, with a bounded thread pool) to protect application performance.
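The receive, process, ack flow with throttling might look roughly like this. It is a sketch only: process_all, MAX_WORKERS, and MAX_IN_FLIGHT are illustrative names and values, the messages come from a plain iterable rather than a Kafka consumer, and appending to a list stands in for acknowledging a message.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4      # assumed pool size
MAX_IN_FLIGHT = 8    # assumed cap on messages submitted but not yet finished


def process_all(messages, handle):
    """Run handle(message) on a pool; return the messages that were acknowledged."""
    acked = []
    lock = threading.Lock()
    slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)

    def work(message):
        try:
            handle(message)            # the business logic
            with lock:
                acked.append(message)  # stands in for acknowledging the message
        finally:
            slots.release()            # free a slot even if handle() raised

    # Leaving the with-block waits for all submitted work to finish.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        for message in messages:
            slots.acquire()            # blocks the receive loop when saturated
            pool.submit(work, message)
    return acked
```

The semaphore is what provides the throttling: without it, the receive loop could queue work far faster than the pool drains it. Note that messages handled this way can finish out of order, so this only suits workloads that do not depend on per-partition ordering.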
If that topic has only one partition, consumption is sequential on the consumer side. Multiple producers writing to one partition have no ordering guarantees.
Consumers and producers will batch messages and process them in chunks.
Another side Kafka consumer consumes data from the topic and performs business logic. and persists into the database.
I suggest not using a regular consumer for this. Please research Kafka Connect and see if your database is supported
My app has 5+ consumers consuming from five partitions of a Kafka topic (using Kafka version 11). My consumers each produce a message to another topic, save some state to the database, do a MANUAL_IMMEDIATE acknowledgement, and move on to the next message.
I'm trying to handle the scenario where a consumer emits successfully to the outbound topic and then we have a failure or lose the consumer. When another consumer takes over the partition, it will emit ANOTHER message to the outbound topic. This is bad :(
I discovered that Kafka now has idempotent producers, but from what I read the guarantee only holds within a single producer session.
"When producer restarts, new PID gets assigned. So the idempotency is promised only for a single producer session" - (blog) - https://hevodata.com/blog/kafka-exactly-once
This seems largely useless to me. In my use-case the whole point is when I replay a message on another consumer it does not duplicate the outbound message.
Is there something I'm missing?
When using transactions, you shouldn't use ANY consumer-based mechanism, manual or otherwise, to commit the offsets.
Instead, you use the producer to send the offsets to the transaction so the offset commit is part of the transaction.
If configured with a KafkaTransactionManager or ChainedKafkaTransactionManager, the Spring listener container will send the offsets to the transaction when the listener exits normally.
If you don't use a Kafka transaction manager, you need to use the KafkaTemplate (or Producer if you are using the native APIs) to send the offsets to the transaction.
Using the consumer to commit the offset is not part of the transaction, so things will not work as expected.
When using a transaction manager, the listener container binds the Producer to the thread so any downstream KafkaTemplate operations participate in the transaction that the consumer starts. See the documentation.
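Why committing the offset inside the transaction prevents the duplicate can be shown with a toy model. This is not the Kafka API (in the Java client the corresponding call is the producer's sendOffsetsToTransaction); ToyBroker and run_consumer are hypothetical names, and the "transaction" is simply one method that applies the outbound messages and the new offset together.

```python
class ToyBroker:
    """Toy in-memory model, not the Kafka API: outputs and offsets commit together."""

    def __init__(self, inbound):
        self.inbound = list(inbound)
        self.outbound = []          # committed outbound messages
        self.committed_offset = 0   # next inbound offset to read

    def commit(self, outputs, next_offset):
        # Both changes are applied in one step, mimicking an atomic transaction.
        self.outbound.extend(outputs)
        self.committed_offset = next_offset


def run_consumer(broker, crash_before_commit_at=None):
    """Process from the committed offset; optionally die just before one commit."""
    offset = broker.committed_offset
    while offset < len(broker.inbound):
        pending = [f"processed:{broker.inbound[offset]}"]  # buffered, not yet visible
        if offset == crash_before_commit_at:
            return  # simulated crash: the open transaction is discarded
        broker.commit(pending, offset + 1)
        offset += 1
```

If a first consumer crashes while handling offset 1, neither its output nor its offset is committed. A second call to run_consumer resumes at offset 1 and the outbound log ends up with exactly one message per inbound message.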
I have one Kafka producer and one consumer. The producer publishes to one topic (topic 1), where the data is picked up and processed. The consumer reads from another topic (topic 2), which carries a success or failure message for each item processed from topic 1. Currently I start my consumer and then publish the data to topic 1. I want to make the producer and consumer synchronous: once the producer publishes the data, the consumer should read the success or failure message for that data, and only then should the producer proceed with the next set of data.
Apache Kafka, and publish/subscribe messaging in general, seeks to decouple producers and consumers through streaming asynchronous events. What you are describing is more like a batch job or a synchronous Remote Procedure Call (RPC), where the producer and consumer are explicitly coupled together. The standard Apache Kafka Producer/Consumer APIs do not support this message exchange pattern, but you can always write your own simple wrapper on top of the Kafka APIs that uses correlation IDs, consumption ACKs, and request/response messages to build an interface that behaves as you wish.
Short Answer : You can't do that, Kafka doesn't provide that support.
Long Answer: As Hans explained, the publish/subscribe messaging model keeps publishers and subscribers completely unaware of each other, and I believe that is where the power of this model lies. A producer can produce without worrying about whether there is any consumer, and a consumer can consume without worrying about how many producers there are.
The closest you can get is to make your producer synchronous, which means waiting until your message is received and acknowledged by the broker.
If you want to do that, flush after every send.