Spring Integration Kafka Partition change detection - apache-kafka

I have a problem detecting Kafka partition changes at runtime. I configured Kafka using Spring Integration, and I could not figure out how to detect changes in the number of partitions while the application is running.
The main problem is that the Kafka topic has 10 partitions and my Kafka config is below (it has 10 threads, a 1-to-1 relationship between partitions and threads). When I increase the number of partitions on the topic (say, to 20), the application cannot read messages that arrive in the newly created partitions without a restart.
Is there any way to configure Spring Integration to be aware of this kind of change?
Thanks in advance.
// 10 listener threads, one per partition of the original 10-partition topic
IntegrationFlow flow = IntegrationFlows
        .from(Kafka.messageDrivenChannelAdapter(kafkaConsumerFactory, topic)
                .configureListenerContainer(c -> c.concurrency(10)))
        .transform(transformer)
        .get();
....

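This happens automatically: the consumer refreshes its topic metadata every metadata.max.age.ms (a consumer property, default 5 minutes), and it picks up the new partitions on that refresh, so no restart is needed.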
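If five minutes is too long to wait, the refresh interval can be lowered in the consumer properties behind the consumer factory. Here is a minimal sketch, assuming a plain property map. The class and method names, the broker address, the group id, and the 30-second value are all illustrative and not taken from the question; deserializer settings are omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds the property map that would be handed to the
// consumer factory (for example Spring Kafka's DefaultKafkaConsumerFactory).
class ConsumerProps {

    static Map<String, Object> build(long metadataMaxAgeMs) {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "localhost:9092"); // assumed address
        props.put("group.id", "my-group");                // assumed group id
        // How long the consumer waits before forcing a metadata refresh.
        // New partitions are noticed on that refresh (Kafka default: 300000 ms).
        props.put("metadata.max.age.ms", metadataMaxAgeMs);
        return props;
    }

    public static void main(String[] args) {
        // Refresh metadata every 30 seconds instead of every 5 minutes.
        System.out.println(build(30_000L));
    }
}
```

With concurrency(10) and 20 partitions, each listener thread would typically end up owning two partitions after the rebalance that follows the refresh.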

Related

Apache flink: Dynamically change the consumer topic

We are building a Flink application which will be deployed to AWS Kinesis Data Analytics (KDA). This application will consume from Kafka and write to S3.
Our setup is as follows:
We have a Kafka bootstrap server (MSK) with several topics.
We are planning to have multiple Flink applications deployed on KDA. All these applications will be part of the same consumer group.
We want to do the following:
Assume we have 10 Kafka topics (topic 1 through topic 10).
Assume we have 5 Flink applications (app 1 through app 5).
Initially we will assign applications to topics (e.g., app 1 will consume from topics 1 and 2, app 2 will consume from topics 3 and 4, and so on).
We will store this in a config system (say, a CRUD application), and each Flink app, when it comes alive, should be able to see which topics it should consume from based on its name. (This part we are able to do.)
Assume there is suddenly a huge surge in the number of messages coming through topic 4, for example. We will update the config system so that app 4, which is consuming from topics 7 and 8, consumes from topics 7 and 4 instead.
We want the Flink app to stop consuming from the old topic and start consuming from the new topic without re-deploying the Flink app. We will have a poller which can inform the Flink app that it should consume from a different topic. The issue is making the Flink app stop consuming from the old topic and start consuming from the new topic without re-deployment.
Is there any way to do this? As far as my research goes, the only way to make the Flink app read from a new topic is to redeploy it. But I want to check whether someone has figured out a way.
Conversely: will this situation be handled automatically if we make all 5 Flink applications listen to all 10 topics? I mean, if there is a sudden surge in one of the topics, will the Flink applications rebalance themselves to dedicate more resources to reading from the hot topic, since they are all part of the same consumer group?
Flink's Kafka consumer does not support stopping consumption from a topic (without a restart), but it does support dynamic topic and partition discovery. See https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/datastream/kafka/#dynamic-partition-discovery for details.
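To make the discovery option concrete, here is a rough, dependency-free sketch of its two ingredients: a topic pattern and a discovery interval. The property key partition.discovery.interval.ms is the one used by Flink's newer KafkaSource as far as I know, so check the linked page for your Flink version. The topic naming scheme, class name, and interval value are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TopicDiscoverySketch {

    // Assumed naming scheme: topics are called topic-1, topic-2, and so on.
    // A pattern like this would be given to the source instead of a fixed list.
    static final Pattern TOPIC_PATTERN = Pattern.compile("topic-\\d+");

    // Properties that would be passed to the Kafka source builder.
    static Properties discoveryProps(long intervalMs) {
        Properties props = new Properties();
        // Discovery interval for new topics/partitions (assumed key, see lead-in).
        props.setProperty("partition.discovery.interval.ms", Long.toString(intervalMs));
        return props;
    }

    // Shows which existing topic names the pattern would pick up.
    static List<String> matching(List<String> topics) {
        return topics.stream()
                .filter(t -> TOPIC_PATTERN.matcher(t).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(discoveryProps(10_000L));
        System.out.println(matching(Arrays.asList("topic-1", "topic-11", "audit-log")));
    }
}
```

Note that discovery only adds newly matching topics and partitions. It does not make the job drop a topic it is already reading, which is the part that still needs a restart.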

Apache Kafka partition offset rewinds during rebalance

We have a Kafka consumer application implemented using Spring Boot and Apache Camel with manual commit. The topic we consume has 30 partitions and a retention period of 7 days. The consumer application is deployed as 2 instances for HA, and parallel processing is implemented using the Apache Camel concurrent consumer configuration. Once we consume the data, we transform the message and send it to a REST endpoint. We have implemented the circuit breaker pattern (Apache Camel throttling route policy), so for any runtime issue with the REST endpoint the circuit breaker kicks in and stops message consumption from the Kafka topic. We are also using max.poll.records = 100 and heartbeat interval = 1 ms instead of the default values to address frequent offset commit failure exceptions. Load on the topic = 200 tps.
Problem statement:
Last week we hit an issue: the REST endpoint was slow in processing, and the broker logs showed consumer group rebalance activity. During this rebalance, the consumer for one of the partitions rewound to an offset about 5 days old (almost 1 million offsets back) and started reprocessing messages, causing massive duplication.
I looked into both the consumer log and the broker log and did not see any exception or error. We are using the offset strategy Latest. As mentioned above, we use manual commit, and since the consumer commits offsets for every batch, I would expect a rebalance to rewind by at most one batch, not to a 5-day-old offset.
We have had this implementation in place for more than a year and this is the first time we have seen the issue. We are using default values for most of the broker and consumer configuration other than max.poll.records, heartbeat interval and session timeout.
Kafka Broker = 2.4
Apache Camel= 3.0

Kafka consumer is not reading from only one partition out of 4

I was using Kafka 0.9 and recently migrated to Kafka 1.0, but the client I am using is still 0.9. Regardless of this, I have been facing a problem where our consumers intermittently stop consuming from one or two of the partitions.
I have 5 consumers reading from 24 partitions. These are consumer JVM threads created by an application deployed on a single server. Frequently, one of the consumers (threads) stops reading from one of the partitions assigned to it.
E.g., one consumer thread is reading from partitions 1, 2, 3 and 4. It stops reading from partition 1, and lag builds up on that partition. I have to restart the consumer for it to start picking up messages from that particular partition again.
I want to understand the issue here.
My consumer configuration
session.timeout.ms=150000
request.timeout.ms=300000
max.partition.fetch.bytes=153600

How exactly Apache Nifi ConsumeKafka_1_0 processor works

I have a NiFi cluster, and Kafka is also installed there.
I created one topic with 5 partitions and started consuming it with a single group id, so that each consumer gets unique messages.
I then created 5 ConsumeKafka_1_0 processors, intending each one to receive unique messages. But only 2 of the ConsumeKafka_1_0 processors are consuming all the messages; the rest are sitting idle.
Next, I started 5 command-line Kafka consumers, and I could see that all the partitions were receiving messages and that the command-line consumers were consuming them in round-robin fashion.
I also tried describing the Kafka group, and what I saw was that only 2 of the NiFi ConsumeKafka_1_0 processors are consuming all 5 partitions and the rest are idle (see the snapshot).
Could you please let me know what I am doing wrong with the NiFi consumer processor?
Note: I am using NiFi 1.5 and Kafka 1.0.
I've written this article which explains how the integration with Kafka works:
https://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka
The Apache Kafka client (used by NiFi) is what assigns partitions to the consumers.
Typically if you had a 5 node NiFi cluster, with 1 ConsumeKafka processor on the canvas with 1 concurrent task, then each node would be consuming 1 partition.
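To illustrate how partitions end up spread over the consumers in a group, here is a simplified model of contiguous-range assignment for a single topic. It is only a sketch: the real Kafka client uses a configurable assignor and orders consumers by member id, and none of this is NiFi code. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class RangeAssignmentSketch {

    // Returns, for each consumer index, the list of partition numbers it owns.
    static List<List<Integer>> assign(int partitions, int consumers) {
        List<List<Integer>> result = new ArrayList<>();
        int perConsumer = partitions / consumers; // base share for everyone
        int extra = partitions % consumers;       // first 'extra' consumers get one more
        int next = 0;
        for (int c = 0; c < consumers; c++) {
            int count = perConsumer + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }

    public static void main(String[] args) {
        // 5 partitions, 5 consumers in the group: one partition each.
        System.out.println(assign(5, 5)); // [[0], [1], [2], [3], [4]]
        // 5 partitions, only 2 consumers in the group: they share all 5.
        System.out.println(assign(5, 2)); // [[0, 1, 2], [3, 4]]
    }
}
```

The first call matches the 5-node, 1-processor, 1-concurrent-task case described above. The second shows what the split would look like if only two consumers had joined the group.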

storm-kafka-client spout consume message at different speed for different partition

I have a Storm cluster of 5 nodes and a Kafka cluster installed on the same nodes.
storm version: 1.2.1
kafka version: 1.1.0
I also have a Kafka topic with 10 partitions.
Now I want to consume this topic's data and process it with Storm, but the message consumption speed is really strange.
For testing, my Storm topology has only one component, a Kafka spout, and I always set the Kafka spout parallelism to 10 so that each partition is read by exactly one thread.
When I run this topology on just 1 worker, all partitions are read quickly and the lag is almost the same for each (very small).
When I run this topology on 2 workers, 5 partitions are read quickly, but the other 5 partitions are read very slowly.
When I run this topology on 3 or 4 workers, 7 partitions are read quickly and the other 3 partitions are read very slowly.
When I run this topology on more than 5 workers, 8 partitions are read quickly and the other 2 partitions are read slowly.
Another strange thing is that when I use a different consumer group id when configuring the Kafka spout, the test result may be different.
For example, when I use a specific group id and run the topology on 5 workers, only 2 partitions are read quickly, which is just the opposite of the test using another group id.
I have written a simple Java app that calls the high-level Kafka Java API. I ran it on each of the 5 Storm nodes and found that it consumes data very quickly from every partition, so a network issue can be ruled out.
Has anyone met the same problem before? Or does anyone have an idea of what may cause such a strange problem?
Thanks!