Is there a way to notify consumer about the new events being published to kafka topics which consumer has subscribed to while consumer is not actively listening? I know the question itself seems confusing but i was thinking if it is really necessary to have one process running continuously in order to consume messages. I think it will make consumer process easier if we know when the message is available to read.
Consumers read messages by polling the topic, so fundamentally, you must have a process running continuously. If the consumer does not poll within the value of the property max.poll.interval.ms, the consumer will leave the group. A hallmark feature of event-driven architectures is that consumers and producers are decoupled; the consumer does not know whether the producer even exists. Therefore, there is no way to know when a message is available to read without actively polling.
Related
I'm using regex pattern to subscribe a group of topics, which might be created dynamically. However, there might be quite a while before the consumer discovers the new created topics.
I can set the topic.metadata.refresh.interval.ms property to change the polling intervals, but I'm concerned that short intervals might lead to overhead. So I think a notification approach would be better, i.e, when a new topic is created, the creator will notify the consumer service.
I'm looking for an API forcing the consumer to refresh its topic metadata. Didn't find a after looking through kafka Consumer APIs...any ideas?
The only API for this would be to .close() the consumer and re-subscribe it upon receiving such "notification event"
Is there any way we can detect crash or shut down of consumer?
I want that kafka server publish event when mentioned situation to all kafka clients (publishers, consumers....).
Is it possible?
Kafka keeps track of the consumed offsets per consumer on special internal topics. You could setup a special "monitoring service", have that constantly consuming from those offset internal topics, and trigger any notification/alerting mechanisms as needed so that your publishers and other consumers are notified programatically. This other SO question has more details about that.
Depending on your use case, lag monitoring is also a really good way to know if your consumers are falling behind and/or crashed. There's multiple solutions for that out there, or again, you could build your own to customize alerting/notification behavior.
Currently learning about Kafka architecture and I'm confused as to why the consumer polls the broker. Why wouldn't the consumer simply subscribe to the broker and supply some callback information and wait for the broker to get a record? Then when the broker gets a relevant record, look up who needs to know about it and look at the callback information to dispatch the messages? This would reduce the number of network operations hugely.
Kafka can be used as a messaging service, but it is not the only possible usecase. You could also treat it as a remote file that can have bytes (records) read on demand.
Also, if notification mechanism were to implemented in message-broker fashion as you suggest, you'd need to handle slow consumers. Kafka leaves all control to consumers, allowing them to consume at their own speed.
I am beginner level at kafka and have developed consumer for kafka messages which looks good right now.
Though there is a requirement came along while testing of consumer that may be some throttling of messages will be needed at consumer side.
The consumer (.net core, using confluent), after receiving messages, calls api and api processes the message. As part this process, It has few number of read and write to database.
The scenario is, Consumer may receive millions or atleast few thousand of messages daily. This makes load on DB side as part of processing.
So I am thinking to put some throttling on receiving messages on kafka consumer so the DB will not be overloaded. I have checked the option for poll but seems its not all that I want.
For example, within 10 minutes, consumer can receive 100k messages only. Something like that.
Could anybody please suggest how to implement throttling of messages on kafka consumer or is there any better way that this can be handled?
I investigated more and come to know from expert that "throttling on consumer side is not easy to implement, since kafka consumer is implemented in such way to read and process messages as soon as they are available in kafka topic. So, speed is a benefit in kafka world :)"
Seems I can not do much at kafka consumer side. I am thinking to see on the other side and may be separating reads (to replica) and writes to the database can help.
If there is a Kafka server "over there" somewhere across the network I would assume that there might be two ways that a Consumer could consume the messages:
By first of all 'subscribing' to the Topic and in effect telling the Kafka server where it is listening so that when a new message is Produced, Kafka proactively sends the message to the Consumer, across the network.
The Consumer has to poll the Kafka server asking for any new messages, using the offset of the messages it has currently taken.
Is this how Kafka works, and is the option configurable for which one to use?
I'm expanding my comment into an answer.
Reading through the consumer documentation, Kafka only supports option 2 as you've described it. It is the consumers responsibility to get messages from the Kafka server. In the 0.9.x.x Consumer this is accomplished by the poll() method. The Consumer polls the Kafka Server and returns messages if there are any. There are several reasons I believe they've chosen to avoid supporting option 1.
It limits the complexity needed in the Kafka Server. It's not the Server's responsibility to push messages to a consumer, it just holds the messages and waits till a consumer fetches them.
If the Kafka Server was pushing all messages to the consumers, it could overwhelm a consumer. Say a Producer was pushing messaging into a Kafka Server 10 msg/sec, but a certain Consumer could only process 2 msg/sec. If the Kafka Server attempted to push every message it received to that Consumer, the Consumer would quickly be overwhelmed by the number of messages it receives.
There's probably other reasons, but at the moment those were the two I thought about.