Kafka Streams application stops working after no messages have been read for a while - apache-kafka

I have noticed that my Kafka Streams application stops working when it has not read new messages from the Kafka topic for a while. It is the third time that I have seen this happen.
No messages have been produced to the topic for 5 days. My Kafka Streams application, which also hosts a spark-java webserver, is still responsive. However, the messages I produce to the Kafka topic are no longer being read by Kafka Streams. When I restart the application, all messages are fetched from the broker.
How can I make my Kafka Streams application more resilient to this kind of scenario? It feels as if Kafka Streams has an internal "timeout" after which it closes the connection to the Kafka broker when no messages have been received. I could not find such a setting in the documentation.
I use Kafka 1.1.0 and Kafka Streams 1.0.0.

Kafka Streams does not have an internal timeout that controls when to permanently close a connection to the Kafka broker; the broker, on the other hand, does have a timeout after which it closes idle connections from clients. But Streams will keep trying to reconnect once it has processed result data that is ready to be sent to the brokers, so I'd suspect your observed issue has some other cause.
Could you share your application topology sketch and the config properties you used, for me to better understand your issue?
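In the meantime, one thing worth double-checking is the client-side idle-connection setting. Below is a minimal sketch (the application id, bootstrap servers, and the 10-minute value are placeholders, not recommendations) of how connections.max.idle.ms can be raised for the embedded consumer and producer via StreamsConfig:

public class StreamsIdleConnectionConfig {

    // Builds Streams properties that raise the client-side idle-connection timeout.
    public static java.util.Properties build() {
        java.util.Properties props = new java.util.Properties();
        props.put(org.apache.kafka.streams.StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(org.apache.kafka.streams.StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // The embedded consumer and producer close connections they consider idle;
        // the broker does the same independently via its own connections.max.idle.ms.
        // Raising the client-side value keeps connections open longer between messages.
        props.put(org.apache.kafka.streams.StreamsConfig.consumerPrefix(
                org.apache.kafka.clients.consumer.ConsumerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG), 600000);
        props.put(org.apache.kafka.streams.StreamsConfig.producerPrefix(
                org.apache.kafka.clients.producer.ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG), 600000);
        return props;
    }
}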

Related

Node disconnected errors in Kafka Streams API

I have created Kafka consumer code using the Kafka Streams API, and I am able to fetch records from the Kafka topic and process them successfully.
I'm seeing the error below in the application logs ("Node -2 disconnected"), but there is no impact and the Kafka Streams API is still fetching transactions successfully from the Kafka topic.
org.apache.kafka.clients.NetworkClient : [AdminClient clientId=consumer-5322b972-1ef9-4976-b7fa-39a934374757-admin] Node -2 disconnected.
Can someone let me know what this error means, and is there any way to avoid these errors? I created the Kafka consumer code using Spring Cloud Stream and Kafka Streams.
It simply means your client disconnected from a broker. That can happen for many reasons, such as a temporary network blip.
If you have more than one in-sync replica of your data, the consumer will continue to work.

Can Kafka publish messages to AWS Lambda?

I have to publish messages from a Kafka topic to Lambda to process them and store them in a database, using a Spring Boot application. I did some research and found something to consume messages from Kafka:
public Function<KStream<String, String>, KStream<String, String>> process(){}
However, I'm not sure whether this is only used to publish the consumed messages to another Kafka topic, or whether it can be used as an event source for Lambda. I need some guidance on consuming Kafka messages and converting them into an event source.
Brokers do not push. Consumers always poll.
The code shown is for the Kafka Streams API, which primarily writes to new Kafka topics. While you could fire HTTP events to start a Lambda, that's not recommended.
Alternatively, Kafka is already supported as an event source. You don't need to write any consumer code.
https://aws.amazon.com/about-aws/whats-new/2020/12/aws-lambda-now-supports-self-managed-apache-kafka-as-an-event-source/
This is possible from MSK or a self-managed Kafka cluster.
process them and store in a database
Your Lambda could process the data and send it to a new Kafka topic using a producer. You can then use MSK Connect, or run your own Kafka Connect cluster elsewhere, to dump the records into a database. No Spring/Java code would be necessary.
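As a rough sketch of that first step (assuming the aws-lambda-java-core and aws-lambda-java-events dependencies, which provide the KafkaEvent type for the MSK / self-managed Kafka trigger; the topic names, broker address, and "processing" step are placeholders):

import java.util.Base64;
import java.util.List;
import java.util.Properties;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KafkaEvent;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Lambda handler triggered by the Kafka event source; forwards processed values to a new topic.
public class KafkaToKafkaHandler implements RequestHandler<KafkaEvent, Void> {

    // Reused across invocations of the same Lambda container.
    private static final KafkaProducer<String, String> producer = createProducer();

    @Override
    public Void handleRequest(KafkaEvent event, Context context) {
        for (List<KafkaEvent.KafkaEventRecord> records : event.getRecords().values()) {
            for (KafkaEvent.KafkaEventRecord record : records) {
                // Record values arrive base64-encoded in the Lambda event payload.
                String value = new String(Base64.getDecoder().decode(record.getValue()));
                String processed = value.toUpperCase(); // placeholder "processing"
                producer.send(new ProducerRecord<>("processed-topic", processed));
            }
        }
        producer.flush();
        return null;
    }

    private static KafkaProducer<String, String> createProducer() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        return new KafkaProducer<>(props);
    }
}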

Apache Kafka: how to configure message buffering properly

I run a system comprising an InfluxDB, a Kafka broker and data sources (sensors) producing time-series data. The purpose of the broker is to protect the database from inbound event overload and to serve as a format-agnostic platform for ingesting data. The data is transferred from Kafka to InfluxDB via Apache Camel routes.
I would like to use Kafka as an intermediate message buffer in case a Camel route crashes or becomes unavailable, which is the most frequent error in the system. So far, I have not managed to configure Kafka so that inbound messages remain available for later consumption.
How do I configure it properly?
Messages are retained in Kafka topics according to the topic's retention policy (you can choose between time and byte-size limits), as described in the Topic Configurations. With
cleanup.policy=delete
retention.ms=-1
the messages in a Kafka topic will never be deleted.
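For illustration, here is a minimal sketch (Java 9+; the topic name, partition count, replication factor, and broker address are placeholders) of creating a buffer topic with exactly those settings via the AdminClient:

import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateBufferTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic kept indefinitely: delete cleanup policy with unlimited retention time.
            NewTopic topic = new NewTopic("sensor-data", 3, (short) 1)
                    .configs(Map.of(
                            "cleanup.policy", "delete",
                            "retention.ms", "-1"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}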
Then your Camel consumer will be able to re-read all messages (offsets) if you use a new consumer group or reset the offsets of the existing consumer group. Otherwise, your Camel consumer might auto-commit the offsets (check the corresponding consumer configuration), and it will not be possible to re-read them again for the same consumer group.
To limit the consumption rate of the Camel consumer, you can adjust configuration options such as maxPollRecords or fetchMaxBytes, which are described in the docs.
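A sketch of how those options could look on a camel-kafka endpoint (the topic, group id, limits, and the log sink standing in for your InfluxDB endpoint are all placeholders):

import org.apache.camel.builder.RouteBuilder;

// Camel route reading from the buffer topic at a throttled rate.
public class KafkaToInfluxRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("kafka:sensor-data"
                + "?brokers=localhost:9092"
                + "&groupId=influx-writer"
                // Limit how much each poll pulls from the broker.
                + "&maxPollRecords=100"
                + "&fetchMaxBytes=1048576"
                // Start from the earliest retained offset when the group has no commits yet.
                + "&autoOffsetReset=earliest")
            .to("log:influx-writer"); // placeholder for your InfluxDB endpoint
    }
}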

Kafka: What happens when the entire Kafka Cluster is down?

We're testing out the Producer and Consumer using Kafka. A few questions:
What happens when all the brokers are down and they're not responding at all?
Does the Producer need to keep pinging the Kafka brokers to know when it is back up online? Or is there a more elegant way for the Producer application to know?
How does Zookeeper help in all this? What if the ZK is down as well?
If one or more brokers are down, the producer will retry for a certain period of time (based on its settings). During this time, one or more of the consumers will not be able to read anything until the respective brokers are up again.
But if the cluster is down for longer than your total retry period, you will probably need to find a way to resend those failed messages.
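As a rough illustration of that retry window and of capturing sends that failed after it (assuming a client new enough to support delivery.timeout.ms; broker address, topic, and values are placeholders):

import java.util.Properties;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RetryingProducer {

    // Records whose delivery failed after the retry window, kept for a later resend.
    private static final Queue<ProducerRecord<String, String>> failed = new ConcurrentLinkedQueue<>();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Total time the client keeps retrying a send before reporting failure.
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000);
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 1000);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>("events", "payload");
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    // Delivery failed after the retry window, e.g. because the cluster was down.
                    failed.add(record);
                }
            });
        }
    }
}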
This is one scenario where Kafka mirroring (the MirrorMaker tool) comes into the picture.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330
The producer will fail because the cluster is unavailable. This means it will eventually get a timeout error from the Kafka client implementation once its retries are exhausted, and, depending on your client process, messages will buffer in the local send queue of your application.
I'm sure that if ZooKeeper is down, your system will not work anymore. This is one of the weaknesses of Kafka: it needs ZooKeeper to work.

Kafka message monitoring to show the actual messages published or consumed

I have Kafka installed on my local server, and through another application running on the server, producers are publishing messages to the brokers inside my Kafka server. Through ZooKeeper I can easily see the health of my Kafka server: it shows all the topics created inside the server, the offsets inside the topics, etc. The only thing ZooKeeper is not able to show is the actual messages inside the individual topics. Someone recommended the kafka-manager tool, so I installed and ran it; it worked fine and showed a lot of information from my Kafka server, but it was still not able to show the real messages that are published or consumed by the respective consumers. So my question is: is there a way/tool/code to find out the messages published or consumed, either in addition to kafka-manager or via some plugin I can install into kafka-manager so that it also shows the messages? Thanks in advance!
A Kafka broker cannot tell you how many messages have been consumed for a given consumer on a given topic. The only thing that a Kafka broker knows about is the current log offset of the consumer and the current max offset of the log. It cannot, however, tell you how many messages before the current offset the consumer actually received, as it keeps no counters for this, and the consumer defines its own initial position (as well as being able to seek to various places in the log).
You can get both of these numbers using the $KAFKA_HOME/bin/kafka-consumer-offset-checker.sh script.
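If you are on a newer client version than the era of that script, the same two numbers can also be fetched programmatically. A minimal sketch (assuming Kafka clients 2.5+ for AdminClient.listOffsets; the broker address and group id are placeholders):

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConsumerLagChecker {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Committed (consumer) offsets for the group.
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("my-group")
                         .partitionsToOffsetAndMetadata().get();

            // Latest (end) offsets of the log for the same partitions.
            Map<TopicPartition, OffsetSpec> request = new HashMap<>();
            committed.keySet().forEach(tp -> request.put(tp, OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> ends =
                    admin.listOffsets(request).all().get();

            committed.forEach((tp, offset) -> {
                if (offset == null) return; // partition without a committed offset
                long lag = ends.get(tp).offset() - offset.offset();
                System.out.printf("%s committed=%d end=%d lag=%d%n",
                        tp, offset.offset(), ends.get(tp).offset(), lag);
            });
        }
    }
}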