Kafka consumer group.id multiple messages - apache-kafka

I have developed a Kafka consumer and there will be multiple instances of this consumer running in production. I know how we can use group.id as to not duplicate the processing of data. Is there a way to have all the consumers receive the message but send one consumer a leader bit?
Is there a way to have a group.id per topic or even per key in a topic?

Looks like this has nothing to with Kafka. You already know that by providing a unique group.id for each consumer, all consumer instances will get all messages from the topic. Now as far as the push to DB is concerned - you can factor out that logic and try using a distributed lock so that the push to DB part of your application can only be executed by one of the consumers. Is this a Java based setup ?

Related

Apache Kafka PubSub

How does the pubsub work in Kafka?
I was reading about Kafka Topic-Partition theory, and it mentioned that In one consumer group, each partition will be processed by one consumer only. Now there are 2 cases:-
If the producer didn't mention the partition key or message key, the message will be evenly distributed across the partitions of a specific topic. ---- If this is the case, and there can be only one consumer(or subscriber in case of PubSub) per partition, how does all the subscribers receive the similar message?
If I producer produced to a specific partition, then how does the other consumers (or subscribers) receive the message?
How does the PubSub works in each of the above cases? if only a single consumer can get attached to a specific partition, how do other consumers receive the same msg?
Kafka prevents more than one consumer in a group from reading a single partition. If you have a use-case where multiple consumers in a consumer group need to process a particular event, then Kafka is probably the wrong tool. Otherwise, you need to write code external to Kafka API to transmit one consumer's events to other services via other protocols. Kafka Streams Interactive Query feature (with an RPC layer) is one example of this.
Or you would need lots of unique consumers groups to read the same event.
Answer doesn't change when producers send data to a specific partitions since "evenly distributed" partitions are still pre-computed, as far as the consumer is concerned. The consumer API is assigned to specific partitions, and does not coordinate the assignment with any producer.

Kafkajs - multiple consumers reading from same topic and partition

I'm planning to use Kafkajs https://kafka.js.org/ and implement it in a NodeJs server.
I would like to know what is the expected behavior in case I have 2 (or more) instances of the server running, each of them having a consumer which is configured with the same group id and topic?
Does this mean that they might read the same messages?
Should I specify a unique consumer group per each server instance ?
I read this - Multiple consumers consuming from same topic but not sure it applies for Kafkajs
It's not possible for a single consumer group to have multiple consumers reading from overlapping partitions in any Kafka library. If your topic only has one partition, only one instance of your application will be consuming it, and if that instance dies, the group will rebalance and the other instance will take over (potentially reading some of the same data, due to the nature of at-least-once delivery, but it's not at the same time as the other instance)

Kafka consumer: how to check if all the messages in the topic partition are completely consumed?

Is there any API or attributes that can be used or compared to determine if all messages in one topic partition are consumed? We are working on a test that will use another consumer in the same consumer group to check if the topic partition still has any message or not. One of our app services is also using Kafka to process internal events. So is there any way to sync the progress of message consumption?
Yes you can use the admin API.
From the admin API you can get the topic offsets for each partition, and a given consumer group offsets. If all messages read, the subtraction of the later from the first will evaluate to 0 for all partitions.

Kafka throttle producer based on consumer lag

Is there any way to pause or throttle a Kafka producer based on consumer lag or other consumer issues? Would the producer need to determine itself if there is consumer lag then perform throttling itself?
Kafka is build on Pub/Sub design. Producer publish the message to centralized topic. Multiple consumers can subscribe to that topic. Since multiple consumers are involve you cannot decide on producer speed. One consumer can be slow other can be fast. Also it is against the design principle otherwise both system will become tightly couple. If you have use case of throttling may be you should evaluate other framework like direct rest call.
Producer and Consumer are decoupled.
Producer push data to Kafka topics (partitions topic), that are stored in Kafka Brokers. Producer doesn't know who and how often consume messages.
Consumer consume data from Brokers. Consumer doesn't know how many producers produce the messages. Even the same messages can be consumed by several consumers that are in different groups. In example some consumer can consume faster than the other.
You can read more about Producer and Consumer in Apache Kafka webpage
It is not possible to throttle the producer/producers weighing on performance of consumers.
In my scenario I don't want to loose events if the disk size is
exceeded before a message is consumed
To tackle your issue, you have to depend on the parallelism offering by the Kafka. Your Kafka topic should have multiple partitions and producers has to use different keys to populate the topic. So your data will be distributed across multiple partitions and bringing a consumer group you can manage load within a group of consumers. All data within a partition can be processed in order, that may be relevant since you are dealing with event processing.

Kafka consumer was not able to poll

If one kafka consumer application is reading message from kafka, another was not able to read and vice-versa.
We are running two independent applications, one will process message and another will read and put into a database.
Message which is been processing in first application is not available in the second application
Without seeing the code I can only guess ... :-)
I think that you have a topic with only one partition and both consumer applications are in the same consumer group. In this case only one consumer gets messages from the only one partition in the topic.
If you want both applications receiving messages from same topic, you need to put them into different consumer groups (group.id parameter for the consumer properties).