Does Kafka have a visibility timeout? - apache-kafka

Does Kafka have something analogous to an SQS visibility timeout?
What is the property called?
I can't seem to find anything like this in the docs.

Kafka works a little bit differently than SQS.
Every message resides in a single topic partition. Kafka consumers are organized into consumer groups, and within a group a partition is assigned to exactly one consumer at a time. That means other consumers in the same group won't get the same message. A message is delivered again only if its offset was never committed and the consumer that owns the partition resumes reading from the last committed offset, for example after a restart.
If the broker designated as group coordinator detects that a consumer has died, a rebalance happens and the partition can be assigned to another consumer. But again, that will be the only consumer in the group that gets messages from that partition.
So, as you can see, since Kafka does not use the competing consumers pattern, there's no notion of a visibility timeout.
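To make the redelivery behaviour concrete, here is a minimal in-memory sketch in Python. It does not use any Kafka client; the `Partition` class, the group name `billing` and the batch size are all hypothetical, and the model simplifies a real consumer by always reading from the committed offset rather than from an in-memory position.

```python
class Partition:
    """Toy model of one topic partition: an append-only log plus
    one committed offset per consumer group (not per consumer)."""

    def __init__(self, messages):
        self.log = list(messages)
        self.committed = {}  # group id -> next offset to read

    def poll(self, group, max_records=2):
        """Return (offset, message) pairs starting at the group's committed offset."""
        start = self.committed.get(group, 0)
        return list(enumerate(self.log[start:start + max_records], start=start))

    def commit(self, group, next_offset):
        """Record the offset the group should resume from."""
        self.committed[group] = next_offset


partition = Partition(["m0", "m1", "m2", "m3"])

# Consumer A owns the partition, processes one batch and commits it.
batch_a = partition.poll("billing")
partition.commit("billing", batch_a[-1][0] + 1)

# Consumer A fetches the next batch but dies before committing it.
lost_batch = partition.poll("billing")

# After the rebalance, consumer B owns the partition and resumes from the
# last committed offset, so it sees the uncommitted batch again.
batch_b = partition.poll("billing")

print(batch_a)  # [(0, 'm0'), (1, 'm1')]
print(batch_b)  # [(2, 'm2'), (3, 'm3')]
```

Nothing in this model hides a message for a period of time: redelivery is driven purely by where the committed offset points when the next owner starts reading.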

Related

Apache Kafka PubSub

How does the pubsub work in Kafka?
I was reading about Kafka topic-partition theory, and it mentioned that within one consumer group each partition is processed by only one consumer. Now there are 2 cases:
If the producer doesn't specify a partition or message key, messages are distributed evenly across the partitions of the topic. If this is the case, and there can be only one consumer (or subscriber, in pub/sub terms) per partition, how do all the subscribers receive the same message?
If a producer produces to a specific partition, how do the other consumers (or subscribers) receive the message?
How does pub/sub work in each of the above cases? If only a single consumer can be attached to a specific partition, how do other consumers receive the same message?
Kafka prevents more than one consumer in a group from reading a single partition. If you have a use case where multiple consumers in one consumer group need to process a particular event, then Kafka is probably the wrong tool. Otherwise, you need to write code outside the Kafka API to forward one consumer's events to other services via other protocols. The Kafka Streams Interactive Queries feature (with an RPC layer) is one example of this.
Or you would need a unique consumer group for each consumer that has to read the same event.
The answer doesn't change when producers send data to specific partitions, since the "evenly distributed" partitions are still pre-computed as far as the consumer is concerned. The consumer is assigned specific partitions and does not coordinate that assignment with any producer.
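As a rough illustration of the "unique consumer groups" option, the following Python sketch models one topic read by two groups. It is a simulation only: the round-robin `assign` function, the group names and the consumer names are made up for the example and are not Kafka's actual assignor.

```python
def assign(partitions, consumers):
    """Spread partitions round-robin over the consumers of ONE group,
    so each partition has exactly one owner within that group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


def delivered(topic, groups):
    """Return the messages each consumer would read, given that every
    group is assigned the full set of partitions independently."""
    result = {}
    for consumers in groups.values():
        for consumer, parts in assign(sorted(topic), consumers).items():
            result[consumer] = [m for p in parts for m in topic[p]]
    return result


# partition number -> messages stored in that partition
topic = {0: ["a", "c"], 1: ["b", "d"]}

# Two hypothetical groups subscribed to the same topic.
groups = {
    "email-service": ["email-1", "email-2"],
    "audit-service": ["audit-1"],
}

received = delivered(topic, groups)
print(received)
# {'email-1': ['a', 'c'], 'email-2': ['b', 'd'], 'audit-1': ['a', 'c', 'b', 'd']}
```

Within `email-service` the work is split like a queue, while `audit-service` independently sees every message, which is the pub/sub behaviour the question asks about.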

Client rebalancing when leader election takes place

I have a custom Kafka setup where my application and a Kafka broker are placed on a single node.
To make sure that the app instance only consumes the partitions on that node (to reduce network overhead), I have a custom partition assignor assigned to all members of the group.
However, if a broker fails and then rejoins the cluster, will that trigger a consumer rebalance? Similarly, if I add a new broker and run the partition reassignment script, would that also trigger a rebalance?
Typically, a consumer rebalance will happen when:
A consumer joins or leaves the consumer group.
A consumer fails to send a heartbeat request to the broker acting as group coordinator before a timeout is reached (see session.timeout.ms and heartbeat.interval.ms).
A consumer does not invoke the poll() method frequently enough (see max.poll.interval.ms).
A consumer subscription has changed.
Metadata for a topic matching the subscription has changed (e.g. the number of partitions has been increased).
A new topic matching the subscription has been created (when using a pattern).
A topic matching the subscription has been deleted (when using a pattern).
A rebalance is manually triggered using the Java Consumer API (see Consumer#enforceRebalance()).
The broker acting as coordinator of the group fails.
So, to answer your question, adding a new broker will not by itself trigger a consumer rebalance.
Here is a blog post explaining how the rebalance protocol works: "Apache Kafka Rebalance Protocol, or the magic behind your streams applications".
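The three timeout settings named above can be sanity-checked with a small sketch like the one below. The numeric values are illustrative assumptions rather than recommendations, and `check_timeouts` and `poll_loop_is_safe` are hypothetical helpers, not part of any Kafka client. The one-third rule for the heartbeat interval follows the guidance in the Kafka consumer configuration docs.

```python
# Illustrative values only; tune them for your own workload.
consumer_config = {
    "session.timeout.ms": 45_000,
    "heartbeat.interval.ms": 3_000,
    "max.poll.interval.ms": 300_000,
}


def check_timeouts(config):
    """Return warnings for settings that make spurious rebalances more likely."""
    warnings = []
    session = config["session.timeout.ms"]
    heartbeat = config["heartbeat.interval.ms"]
    # Several heartbeats should fit inside one session timeout.
    if heartbeat * 3 > session:
        warnings.append(
            "heartbeat.interval.ms should be at most 1/3 of session.timeout.ms"
        )
    return warnings


def poll_loop_is_safe(config, worst_case_batch_ms):
    """True if processing one batch finishes before max.poll.interval.ms,
    i.e. poll() is called again in time to stay in the group."""
    return worst_case_batch_ms < config["max.poll.interval.ms"]


print(check_timeouts(consumer_config))              # []
print(poll_loop_is_safe(consumer_config, 60_000))   # True
print(poll_loop_is_safe(consumer_config, 400_000))  # False
```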

Does consumer consume from replica partitions if multiple consumers running under same consumer group?

I am writing a Kafka consumer application. I have a topic with 4 partitions - 1 is leader and 3 are followers. The producer uses a key to choose the partition a message is pushed to.
If I write a consumer and run it on different nodes, or start 4 instances of the same consumer, how will message consumption happen? Will all 4 instances get the same messages?
What happens in the case of multiple consumers (same group) consuming a single topic?
Do they get the same data?
How are offsets managed? Is it separate for each consumer?
I would suggest that you read at least the first few chapters of Kafka: The Definitive Guide (available from Confluent) to get a preliminary understanding of how Kafka works.
I've kept my answers brief. Please refer to the book for a detailed explanation.
How are offsets managed? Is it separate for each consumer?
It depends on the group id. Committed offsets are tracked per consumer group (one offset per partition for each group), not per individual consumer.
What happens in the case of multiple consumers (same group) consuming a single topic?
There can be multiple consumers, and they can share a group id or use different ones.
If 2 consumers belong to the same group, neither gets all the messages; each gets only the messages from the partitions assigned to it.
Do they get the same data?
No. Once a message has been read and its offset committed, the committed offset for that group advances past it, so a different consumer in the same group will not receive that message.
Hope that helps :)
What happens in the case of multiple consumers (same group) consuming a single topic?
Answer: Producers send records to a particular partition based on the record's key. The default partitioner for Java uses a hash of the record's key to choose the partition. When there are multiple consumers in the same consumer group, each consumer gets different partitions. So all the messages with a given key are received by a single consumer. When the consumer that is receiving those messages goes down, the group coordinator (one of the brokers in the cluster) triggers a rebalance, and that partition is then assigned to one of the remaining consumers.
Do they get the same data?
Answer: If a consumer commits the offsets of the messages it has consumed and then goes down, a rebalance occurs as stated above, and the consumer that takes over the partition will not get those messages again. But if the consumer goes down before committing, the consumer that takes over the partition will get them again.
How are offsets managed? Is it separate for each consumer?
Answer: No, offsets are not separate for each consumer. A partition is never assigned to more than one consumer in the same consumer group at a time, and whichever consumer has the partition assigned resumes from the group's committed offset for that partition by default.
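The key-to-partition mapping described in this answer can be sketched as follows. Note the assumption: Kafka's Java default partitioner hashes the key bytes with murmur2, whereas this example substitutes `zlib.crc32` from the Python standard library purely to show the idea, so the partition numbers it produces will not match a real cluster. The function name `partition_for` and the sample keys are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition with a deterministic hash.
    crc32 is a stand-in here; it is NOT the hash Kafka itself uses."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = ["user-1", "user-2", "user-1", "user-3", "user-1"]
placement = [(key, partition_for(key, 4)) for key in keys]

# Every record with the same key lands in the same partition, and therefore
# reaches the single consumer in the group that owns that partition.
for key, partition in placement:
    print(key, "-> partition", partition)
```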

Is manipulating the "read offset" as a Kafka consumer bad practice?

We have an ongoing discussion about the correct (or intended) usage of Kafka for events.
The arguing point is the ability of a consumer to not only subscribe (or resubscribe) to a topic but also to modify its own read offset.
Am I right in saying that "A consumer should be designed in such a way that it never modifies its own read offset!"?
Reasoning behind this:
The consumer cannot know what events actually are stored inside a topic (log retention)
... So restoring a complete state from "delta"-events is not possible.
The consumer has consumed an event once and confirmed this to the broker. Why consume it again?
If your consumer instances belong to the same consumer group, the consumers don't need to keep track of their own read state for the topic. The read state is nothing but the offset up to which the group has read each partition so far. If your topic has multiple partitions, consumers belonging to the same consumer group can distribute the workload among themselves. If one of the consumers crashes or fails, the other consumers in the same group know from which partition offset to continue consuming records.

How to create a Kafka non-persistent topic

Is there a way I can make a Kafka topic non-persistent? I plan to use multiple consumers on a single topic, but I don't want all my consumers picking up the same messages.
In Kafka, to simulate the behaviour of a queue, all your consumers would be in the same consumer group.
See the Kafka docs for more information:
Consumers
Messaging traditionally has two models: queuing and publish-subscribe.
In a queue, a pool of consumers may read from a server and each
message goes to one of them; in publish-subscribe the message is
broadcast to all consumers. Kafka offers a single consumer abstraction
that generalizes both of these—the consumer group. Consumers label
themselves with a consumer group name, and each message published to a
topic is delivered to one consumer instance within each subscribing
consumer group. Consumer instances can be in separate processes or on
separate machines.
If all the consumer instances have the same consumer group, then this
works just like a traditional queue balancing load over the consumers.
If you want to control when messages are deleted from the log, you can set retention.ms or retention.bytes in the topic configuration. Be aware that these settings delete a message regardless of whether it has been consumed or not.
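A tiny Python model of time-based retention shows why consumption state is irrelevant to deletion. This is a sketch under simplifying assumptions: real brokers delete whole log segments rather than individual records, and the `apply_retention` function and the record layout are invented for the example.

```python
def apply_retention(log, now_ms, retention_ms):
    """Keep only records no older than retention_ms.
    Whether a record was ever consumed is never consulted."""
    return [r for r in log if now_ms - r["timestamp_ms"] <= retention_ms]


log = [
    {"offset": 0, "timestamp_ms": 1_000, "value": "old, never consumed"},
    {"offset": 1, "timestamp_ms": 9_000, "value": "recent"},
]

# Pretend the clock reads 10 000 ms and retention.ms is 5 000.
remaining = apply_retention(log, now_ms=10_000, retention_ms=5_000)
print([r["offset"] for r in remaining])  # [1]
```

The record at offset 0 is dropped even though no consumer ever read it, so a consumer that falls behind by more than the retention period can lose messages.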