Kafka consumer stopped consuming topics from third party system - apache-kafka

My consumers stopped consuming topics from a third-party system, but they still work with internal topics. The topics from the third-party system appear in the Kafka web view, but are not consumed. The consumer log shows:
Skipping fetch for partition because previous request to some-cluster has not been processed
I did some research and increased the heartbeat and max-poll-records, without success.
See: Kafka consumer does not fetch new records when using topic pattern and large messages
and
Kafka Consumer stopped consuming messages from topic. We are using SmallRye Reactive Messaging connector to fetch records
How can I further debug or fix this problem?

How can I further debug this problem?
I would take a look at the lowest level, i.e. the network connectivity provided by org.apache.kafka.clients.NetworkClient. Logging that class at TRACE level will show you the outbound requests and the responses received (or the timeouts). This should help you identify whether the problem is on your side or some kind of backend misconfiguration.

Related

Why does kafka consumer poll the broker?

Currently learning about Kafka architecture and I'm confused as to why the consumer polls the broker. Why wouldn't the consumer simply subscribe to the broker and supply some callback information and wait for the broker to get a record? Then when the broker gets a relevant record, look up who needs to know about it and look at the callback information to dispatch the messages? This would reduce the number of network operations hugely.
Kafka can be used as a messaging service, but that is not the only possible use case. You could also treat it as a remote file whose bytes (records) can be read on demand.
Also, if a notification mechanism were to be implemented in message-broker fashion as you suggest, the broker would need to handle slow consumers. Kafka leaves all control to the consumers, allowing them to consume at their own speed.
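To make the "remote file" picture concrete, here is a minimal Python sketch. It is not Kafka client code: `PartitionLog` and `PullConsumer` are hypothetical names invented for the illustration. All it is meant to show is that each reader keeps its own offset and asks for records when it is ready, so a slow reader never holds back a fast one.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Hypothetical stand-in for one topic partition: an append-only list."""
    records: list = field(default_factory=list)

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the new record

    def read(self, offset, max_records):
        """Return up to max_records starting at offset, like reading a file."""
        return self.records[offset:offset + max_records]


@dataclass
class PullConsumer:
    """Tracks its own position; the log never pushes anything to it."""
    log: PartitionLog
    batch_size: int
    offset: int = 0

    def poll(self):
        batch = self.log.read(self.offset, self.batch_size)
        self.offset += len(batch)  # advance only past what was actually read
        return batch


log = PartitionLog()
for i in range(10):
    log.append(f"event-{i}")

fast = PullConsumer(log, batch_size=5)
slow = PullConsumer(log, batch_size=2)

fast_batch = fast.poll()  # reads event-0 .. event-4
slow_batch = slow.poll()  # reads event-0 and event-1 only
```

Both consumers read the same log, and the records stay in the log after being read, so the slow consumer can pick up the rest on later polls.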

Throttling of messages on consumer side

I am a beginner with Kafka and have developed a consumer for Kafka messages, which works well so far.
However, during testing a requirement came up: some throttling of messages may be needed on the consumer side.
The consumer (.NET Core, using the Confluent client) calls an API after receiving messages, and the API processes each message. As part of this processing, it performs a few reads and writes against the database.
The scenario is that the consumer may receive millions, or at least a few thousand, messages daily. Processing them puts load on the DB.
So I am thinking of throttling the intake of messages on the Kafka consumer so that the DB is not overloaded. I have looked at the options for poll, but they do not seem to cover everything I need.
For example: within 10 minutes, the consumer may receive at most 100k messages. Something like that.
Could anybody please suggest how to implement throttling of messages on a Kafka consumer, or is there a better way to handle this?
I investigated further and learned from an expert that "throttling on the consumer side is not easy to implement, since the Kafka consumer is implemented in such a way as to read and process messages as soon as they are available in the Kafka topic. So, speed is a benefit in the Kafka world :)"
It seems I cannot do much on the Kafka consumer side. I am thinking of looking at the other side instead: separating reads (to a replica) from writes to the database may help.
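As a rough illustration of the "100k messages within 10 minutes" idea, a fixed-window counter in front of the processing step might look like the sketch below. It is written in plain Python rather than against the Confluent .NET client, and `WindowThrottle`, `consume_loop` and `FakeClock` are hypothetical names; `messages` stands in for whatever the real consume call returns. One assumption to check against your client's documentation: a consumer that stops polling for too long can be removed from its group, so a real implementation would probably pause the partitions and keep polling rather than just sleep.

```python
import time


class WindowThrottle:
    """Allow at most `limit` messages per `window_seconds` (fixed window)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def seconds_until_allowed(self):
        """Return 0.0 if a message may be processed now, else the wait time."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # A new window begins: reset the budget.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            return 0.0
        return self.window_start + self.window_seconds - now

    def record(self):
        self.count += 1


def consume_loop(messages, handle, throttle, sleep=time.sleep):
    """Hypothetical processing loop that waits whenever the budget is used up."""
    for message in messages:
        while (wait := throttle.seconds_until_allowed()) > 0:
            sleep(wait)  # assumption: a real consumer would pause and keep polling
        handle(message)
        throttle.record()


class FakeClock:
    """Test clock so the demo does not actually wait."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock()
throttle = WindowThrottle(limit=3, window_seconds=600, clock=clock)
processed = []
consume_loop(range(7), processed.append, throttle, sleep=clock.sleep)
```

With a limit of 3 per window, the 7 messages are spread over three windows, so the fake clock ends at 1200 seconds. For the numbers in the question the throttle would be constructed as `WindowThrottle(limit=100_000, window_seconds=600)`.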

Process messages pushed through Kafka

I haven't used Kafka before and wanted to know: if messages are published through Kafka, what are the possible ways to capture that info?
Are Kafka "Consumers" the only way to receive that info, or can REST APIs also be used here?
While reading up, I found that Kafka needs ZooKeeper running too.
I don't need to publish info, just process data received from a Kafka publisher.
Any pointers will help.
Kafka is a distributed streaming platform that allows you to process streams of records in near real-time.
Producers publish records/messages to Topics in the cluster.
Consumers subscribe to Topics and process those messages as they are available.
The Kafka docs are an excellent place to get up to speed on the core concepts: https://kafka.apache.org/intro
Are Kafka "Consumers" the only way to receive that info, or can REST APIs also be used here?
Kafka has its own TCP-based protocol, not a native HTTP API (assuming that's what you actually mean by REST).
Consumers are the only way to get and subsequently process data. However, plenty of external tooling exists, so you don't have to write much code, if any, to work on that data.

Kafka vs JMS for event publishing

In our scenario we have a set of micro services which interact with other services by sending event messages. We anticipate millions of messages per day at the peak. Every message is targeted to one or more listener types.
Our requirements are as follows:
Zero lost messages.
Ability to dynamically register multiple listeners of a specific type in order to increase throughput.
Listeners are not guaranteed to be alive when messages are dispatched.
We consider 2 options:
Send each message to JMS main queue then listeners of that queue will route the messages to specific queues according to message content, and then target services will listen to those specific queues.
Send messages to a Kafka topic by message type then target services will subscribe to the relevant topic and consume the messages.
What are the cons and pros for using either JMS or Kafka for that purpose?
Your first requirement is "zero lost messages". However, if you want publish-subscribe semantics (i.e. topics in JMS), but listeners are not guaranteed to be alive when messages are dispatched then JMS is a non-starter as those messages will simply be discarded (i.e. lost).
I would suggest going with Kafka, as it has fault-tolerance mechanisms: even if a message is lost or not captured by a listener, you can easily retrieve it from the Kafka cluster (as long as it is still within the topic's retention period).
Along with this, you can easily add a new listener, or a new listener within a group, and Kafka along with ZooKeeper will take care of managing it very well.
In summary, Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable. Like many publish-subscribe messaging systems, Kafka maintains feeds of messages in topics. Producers write data to topics and consumers read from topics.
It is also very easy to integrate.

What ways can a Consumer consume message in Kafka?

If there is a Kafka server "over there" somewhere across the network I would assume that there might be two ways that a Consumer could consume the messages:
1. By first of all 'subscribing' to the Topic and in effect telling the Kafka server where it is listening so that when a new message is Produced, Kafka proactively sends the message to the Consumer, across the network.
2. The Consumer has to poll the Kafka server asking for any new messages, using the offset of the messages it has currently taken.
Is this how Kafka works, and is the option configurable for which one to use?
I'm expanding my comment into an answer.
Reading through the consumer documentation, Kafka only supports option 2 as you've described it. It is the consumer's responsibility to get messages from the Kafka server. In the 0.9.x.x Consumer this is accomplished by the poll() method: the Consumer polls the Kafka Server, and the call returns messages if there are any. There are several reasons I believe they've chosen to avoid supporting option 1.
It limits the complexity needed in the Kafka Server. It's not the Server's responsibility to push messages to a consumer; it just holds the messages and waits until a consumer fetches them.
If the Kafka Server were pushing all messages to the consumers, it could overwhelm a consumer. Say a Producer was pushing messages into a Kafka Server at 10 msg/sec, but a certain Consumer could only process 2 msg/sec. If the Kafka Server attempted to push every message it received to that Consumer, the Consumer would quickly be overwhelmed by the number of messages it receives.
There are probably other reasons, but at the moment those were the two I thought of.
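The 10 msg/sec versus 2 msg/sec example can be worked through with a few lines of arithmetic. This is a toy model, assuming constant rates and whole records moving once per second; `backlog_after` is a hypothetical helper written for this answer, not part of any Kafka API.

```python
def backlog_after(seconds, produce_rate=10, consume_rate=2):
    """Return (push_buffer, pull_lag) after `seconds` of steady traffic.

    push_buffer: unprocessed records held by the consumer if the server pushed.
    pull_lag:    records still waiting on the server if the consumer polls.
    """
    push_buffer = 0
    produced = 0
    consumed = 0
    for _ in range(seconds):
        produced += produce_rate
        # Push model: everything produced lands on the consumer immediately,
        # and it works through what it can.
        push_buffer += produce_rate
        push_buffer -= min(consume_rate, push_buffer)
        # Pull model: the consumer fetches only what it can process.
        consumed += min(consume_rate, produced - consumed)
    return push_buffer, produced - consumed


push_buffer, pull_lag = backlog_after(60)
print(push_buffer, pull_lag)  # 480 480
```

After one minute the backlog is 480 records either way. The difference is where it lives: with push it piles up inside the consumer, while with polling it stays on the server until the consumer asks for it.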