Is there a way I can make a kafka topic non persistant? I plan to use multiple consumers in a single topic but I dont want all my consumers picking up the same messages.
In kafka to simulate the behaviour of a queue all your consumers would be in the same consumer group.
See the kafka docs for more information
Consumers
Messaging traditionally has two models: queuing and publish-subscribe.
In a queue, a pool of consumers may read from a server and each
message goes to one of them; in publish-subscribe the message is
broadcast to all consumers. Kafka offers a single consumer abstraction
that generalizes both of these—the consumer group. Consumers label
themselves with a consumer group name, and each message published to a
topic is delivered to one consumer instance within each subscribing
consumer group. Consumer instances can be in separate processes or on
separate machines.
If all the consumer instances have the same consumer group, then this
works just like a traditional queue balancing load over the consumers.
If you want to control when messages are deleted from the log you can set retention.ms or retention.bytes in the topic configuration. Be aware that these parameters will delete a message disregarding if it was consumed or not
Related
I don't understand the practical use case of the consumer group in Kafka.
A partition can only be read by only one consumer in a consumer group, so only a subset of a topic record is read by one consumer.
Can someone help with any practical scenario where the consumer group helps?
It's for parallel processing of event messages from the specific topic.
Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.
If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
Read more here:
https://docs.confluent.io/5.3.3/kafka/introduction.html#consumers
How does the pubsub work in Kafka?
I was reading about Kafka Topic-Partition theory, and it mentioned that In one consumer group, each partition will be processed by one consumer only. Now there are 2 cases:-
If the producer didn't mention the partition key or message key, the message will be evenly distributed across the partitions of a specific topic. ---- If this is the case, and there can be only one consumer(or subscriber in case of PubSub) per partition, how does all the subscribers receive the similar message?
If I producer produced to a specific partition, then how does the other consumers (or subscribers) receive the message?
How does the PubSub works in each of the above cases? if only a single consumer can get attached to a specific partition, how do other consumers receive the same msg?
Kafka prevents more than one consumer in a group from reading a single partition. If you have a use-case where multiple consumers in a consumer group need to process a particular event, then Kafka is probably the wrong tool. Otherwise, you need to write code external to Kafka API to transmit one consumer's events to other services via other protocols. Kafka Streams Interactive Query feature (with an RPC layer) is one example of this.
Or you would need lots of unique consumers groups to read the same event.
Answer doesn't change when producers send data to a specific partitions since "evenly distributed" partitions are still pre-computed, as far as the consumer is concerned. The consumer API is assigned to specific partitions, and does not coordinate the assignment with any producer.
I'm a newbie in Kafka. I had a glance at the Kafka Documentation. It seems that the the message dispatched to a subscribing consumer group is implemented by binding the partition with the consumer instance.
One important thing we should remember when we work with Apache Kafka is the number of consumers in the same consumer group should be less than or equal the number of partitions in the consumed topic. Otherwise, the exceedable consumers will not be received any messages from the topic.
In a non-prod environment, I didn't config the topic partition. In such case, is there only a single partition in Kafka. And If I start multiple consumers sharing the same group and subscribe them to the topic, would the message always dispatched to the same instance in the group? In other words, I have to partition the topic to get the load-balance feature in consumer group?
Thanks!
You are absolutely right. One partitions cannot be processed in paralell (by one consumer group). You can treat partition as atomic and it cannot be split.
If you configure non-prod and prod env with the same amount of partitions per topic, that should help you to find correct number of conumsers and catch problems before moving to prod.
I have two consumer servers with same group id subscribed the same topic.
A kafka server is running with only one partition.
As far as I know, the message should be consumed randomly in those two consumer servers.
But now it seems to be always the same consumer server A consume messages, another one does not consume messages.If I stop consumer server A, another one will work fine.
What I expect that they can consume message randomly.
To be able to use two consumer instances in parallel you need at least two partitions in the topic. A consumer will bind to one or more partitions of a topic and other consumers with the same groupId will not claim partitions which already have consumers bound to them. If a consumer fails/crashes, the partition will be released and then picked up by another consumer instance.
Does Kafka have something analogues to an SQS visibility timeout?
What is the property called?
I can't seem to find anything like this in the docs.
Kafka works a little bit differently than SQS.
Every message resides on a single topic partition. Kafka consumers are organized into consumer groups. A single partition can be assigned only to a single consumer in a group. That means that other consumers from the same CG won't get the same message and the same message will only be re-sent if the same consumer pulls messages from a broker and hasn't committed the offset.
If Kafka broker designated as group coordinator detects that consumer died, rebalance happens and that partition can be assigned to another consumer. But again this will be the only consumer that gets messages from that partition.
So as you can see since Kafka is not using the competing consumer pattern, there's no notion of visibility timeout.