If there is a Kafka server "over there" somewhere across the network I would assume that there might be two ways that a Consumer could consume the messages:
By first of all 'subscribing' to the Topic and in effect telling the Kafka server where it is listening so that when a new message is Produced, Kafka proactively sends the message to the Consumer, across the network.
The Consumer has to poll the Kafka server asking for any new messages, using the offset of the messages it has currently taken.
Is this how Kafka works, and is the option configurable for which one to use?
I'm expanding my comment into an answer.
Reading through the consumer documentation, Kafka only supports option 2 as you've described it. It is the consumers responsibility to get messages from the Kafka server. In the 0.9.x.x Consumer this is accomplished by the poll() method. The Consumer polls the Kafka Server and returns messages if there are any. There are several reasons I believe they've chosen to avoid supporting option 1.
It limits the complexity needed in the Kafka Server. It's not the Server's responsibility to push messages to a consumer, it just holds the messages and waits till a consumer fetches them.
If the Kafka Server was pushing all messages to the consumers, it could overwhelm a consumer. Say a Producer was pushing messaging into a Kafka Server 10 msg/sec, but a certain Consumer could only process 2 msg/sec. If the Kafka Server attempted to push every message it received to that Consumer, the Consumer would quickly be overwhelmed by the number of messages it receives.
There's probably other reasons, but at the moment those were the two I thought about.
Related
My consumers stopped consuming topics from a third party system. But They work with internal topics. The topics from the third party system appear in the kafka web view, but are not consumed.
Skipping fetch for partition because previous request to some-cluster has not been processed
I did some research and increased the heartbeat and max-poll-records, without success.
See: Kafka consumer does not fetch new records when using topic pattern and large messages
and
Kafka Consumer stopped consuming messages from topic. We are using SmallRye Reactive Messaging connector to fetch records
How can I further debug or fix this problem
How can I further debug this problem?
I would take a look at the lowest level, i.e. network connectivity provided by org.apache.kafka.clients.NetworkClient. Logging that on trace level is going to show you outbound requests and received responses (or the timeouts received). This should help you identify whether it's something on your side or some kind of backend mis-configuration.
Is there a way to notify consumer about the new events being published to kafka topics which consumer has subscribed to while consumer is not actively listening? I know the question itself seems confusing but i was thinking if it is really necessary to have one process running continuously in order to consume messages. I think it will make consumer process easier if we know when the message is available to read.
Consumers read messages by polling the topic, so fundamentally, you must have a process running continuously. If the consumer does not poll within the value of the property max.poll.interval.ms, the consumer will leave the group. A hallmark feature of event-driven architectures is that consumers and producers are decoupled; the consumer does not know whether the producer even exists. Therefore, there is no way to know when a message is available to read without actively polling.
Currently learning about Kafka architecture and I'm confused as to why the consumer polls the broker. Why wouldn't the consumer simply subscribe to the broker and supply some callback information and wait for the broker to get a record? Then when the broker gets a relevant record, look up who needs to know about it and look at the callback information to dispatch the messages? This would reduce the number of network operations hugely.
Kafka can be used as a messaging service, but it is not the only possible usecase. You could also treat it as a remote file that can have bytes (records) read on demand.
Also, if notification mechanism were to implemented in message-broker fashion as you suggest, you'd need to handle slow consumers. Kafka leaves all control to consumers, allowing them to consume at their own speed.
I am beginner level at kafka and have developed consumer for kafka messages which looks good right now.
Though there is a requirement came along while testing of consumer that may be some throttling of messages will be needed at consumer side.
The consumer (.net core, using confluent), after receiving messages, calls api and api processes the message. As part this process, It has few number of read and write to database.
The scenario is, Consumer may receive millions or atleast few thousand of messages daily. This makes load on DB side as part of processing.
So I am thinking to put some throttling on receiving messages on kafka consumer so the DB will not be overloaded. I have checked the option for poll but seems its not all that I want.
For example, within 10 minutes, consumer can receive 100k messages only. Something like that.
Could anybody please suggest how to implement throttling of messages on kafka consumer or is there any better way that this can be handled?
I investigated more and come to know from expert that "throttling on consumer side is not easy to implement, since kafka consumer is implemented in such way to read and process messages as soon as they are available in kafka topic. So, speed is a benefit in kafka world :)"
Seems I can not do much at kafka consumer side. I am thinking to see on the other side and may be separating reads (to replica) and writes to the database can help.
We are using JBOSS 5.1.0, we using topic for storing our messages. And our client is making a durable subscription to get those messages.
Everything is working fine, but one issue is we are getting data from TCP client, we are processing and keeping it in topic, it is sending around 10 messages per second, and our client is reading one message at a time. There is a huge gap between that, and after sometime JBOSS Topic have many messages and it crashes saying out of memory.
IS there any workaround for this.
Basically the producer is producing 10x more messages than consumer can handle. If this situation is stable (not only during peak), this will never work.
If you limit the producer to send only one message per second (which is of course possible, e.g. check out RateLimiter), what will you do with extra messages on the producer side? If they are not queueing up in the topic, they will queue up on the producer side.
You have few choices:
somehow tune your consumer to process messages faster, so the topic is never filled up
tune the topic to use persistent storage. This is much better. Not only the topic won't store everything in memory, but you might also get transactional behaviour (messages are durable)
put a queue of messages that you want to set to the topic and process one message per second. That queue must be persistent and must be able to keep more messages than the topic currently can