Higher number of threads created by the Java consumer than the Scala consumer in Kafka

In the Java consumer, a coordinator heartbeat thread and a consumer thread exist for each and every consumer. Because of that, the thread count is much higher than with the previous Scala-based Kafka consumer.
Previously, I used Kafka 0.10.0 for queueing purposes. Now I have decided to upgrade to Kafka 2.8.0. While trying to upgrade, I noticed that the thread count is higher for the 2.8.0 consumer than for the 0.10.0 consumer when consuming. In Kafka 0.10.0 the consumer is written in Scala, but in Kafka 2.8.0 the consumer is written in Java.
I created a consumer group with 15 consumers using the Scala consumer (Kafka 0.10.0). Here, 23 threads are created (ConsumerFetcherThread, LeaderFinderThread, Watcher-Executor, EventThread, etc.). But when I create the same consumer group with 15 consumers using the Java consumer (Kafka 2.8.0), 31 threads are created. As I add more consumer groups, the thread count becomes very high.
With the Java consumer, 2 threads are created for each consumer (the KafkaCoordinatorHeartbeatThread and the consumer thread itself), so the thread count is higher than with the Scala consumer.
Are we missing anything?
If not, how can I reduce the coordinator thread count?
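For what it's worth, here is a minimal sketch (group id, topic and broker address are placeholders, not from the post) showing why the count scales this way and the main lever for reducing it: each Java KafkaConsumer costs the caller thread plus one background heartbeat thread, so fewer consumer instances, each handling more partitions, means fewer threads overall.

// Sketch: one KafkaConsumer instance handling many partitions in a single
// poll loop. Each instance costs the caller thread plus one background
// "kafka-coordinator-heartbeat-thread | <group>", so 1 instance ~= 2 threads
// where 15 single-partition instances ~= 30.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SingleConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "my-group");                  // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // process records from every assigned partition here
                }
            }
        }
    }
}

One instance still receives every partition the group assigns to it, so 15 single-partition consumers can often be collapsed into a handful of instances, provided processing all of an instance's partitions in one poll loop is acceptable.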

Related

Apache Kafka partition offset rewinds during rebalance

We have a Kafka consumer application implemented using Spring Boot and Apache Camel with manual commit. The topic we consume from has 30 partitions and a retention period of 7 days. The consumer application is deployed in 2 instances for HA, and parallel processing is implemented using Apache Camel's concurrent consumer configuration. Once we consume the data, we do message transformation and send it to a REST endpoint. We have implemented the circuit breaker pattern (Apache Camel throttling route policy), and for any runtime issue with REST, the circuit breaker kicks in and stops message consumption from the Kafka topic. Also, we are using
max.poll.records = 100 and heartbeat interval = 1 ms
instead of the default values, to address frequent commit offset failure exceptions. Load on the topic = 200 tps.
Problem statement:
Last week, we saw an issue: the REST endpoint was slow in processing, we saw consumer group rebalance activity in the broker logs, and during this rebalance one of the partition consumers rewound its offset to a 5-day-old offset (almost 1 million offsets back) and started reprocessing messages, causing huge duplication.
I looked into both the consumer log and the broker log and did not see any exception or error, and we are using the offset reset strategy Latest. Also, as I mentioned above, we are using manual commit, and since the consumer commits the offsets for every batch,
I expect that when a rebalance happens it should rewind to at most a one-batch-old offset, not a 5-day-old one.
We have had this implementation for more than a year and this is the first time we have seen this issue. We are using default values for most of the broker and consumer configuration, other than max.poll.records, heartbeat interval and session timeout.
Kafka Broker = 2.4
Apache Camel = 3.0
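For reference, the per-batch manual commit pattern described in the question looks roughly like this in plain Kafka client terms rather than Camel (topic, group and broker address are placeholders):

// Sketch of manual per-batch commits with the plain Java client (the actual
// application uses Spring Boot + Camel; names below are placeholders).
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "rest-forwarder");          // placeholder
        props.put("enable.auto.commit", "false");         // manual commit, as in the post
        props.put("max.poll.records", "100");             // batch size from the post
        props.put("auto.offset.reset", "latest");         // "Latest" strategy from the post
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofMillis(500));
                // ... transform each record and send it to the REST endpoint ...
                if (!batch.isEmpty()) {
                    consumer.commitSync(); // commit after every batch of <= 100 records
                }
            }
        }
    }
}

With this pattern a rebalance should, at worst, replay the last uncommitted batch of up to 100 records, which is what makes the 5-day rewind so surprising.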

Can we update a consumer offset in Kafka 0.10?

I am using an older version of Kafka, 0.10. Is there any way I can update the consumer offset for a topic to an arbitrary number?
In Kafka 0.10, I don't think there was a tool to easily update a consumer offset.
You basically have 2 options:
Use the tools from a more recent Kafka version. Nowadays, consumer offsets can be updated using either the kafka-consumer-groups.sh tool or the AdminClient (only in trunk at the moment; it will be in Kafka 2.5).
Write a small application that starts a consumer and calls commitSync() to update its consumer offsets, like in ConsumerGroupCommand.resetOffsets().
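A minimal sketch of option 2 (group, topic, partition, offset and broker address are all placeholders to adapt): assign the partition manually and commit the arbitrary offset on behalf of the group:

// Sketch: commit an arbitrary offset for a consumer group (Kafka 0.10 API).
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ResetOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("group.id", "my-group"); // the group whose offset you want to move
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder
        long newOffset = 42L;                                  // the arbitrary offset

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // assign() avoids group rebalancing; commitSync(Map) writes the offset
            consumer.assign(Collections.singletonList(tp));
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(newOffset)));
        }
    }
}

The group's consumers must be stopped while running this; otherwise their own commits will immediately overwrite the offset you just set.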

Kafka consumer is not reading from only one partition out of 4

I was using Kafka 0.9 and recently migrated to Kafka 1.0, but the client I am using is still 0.9. Irrespective of this, I was facing a problem where our consumers sometimes intermittently stop consuming from one or two of the partitions.
I have 5 consumers reading from 24 partitions; these are consumer JVM threads created from an application deployed on a single server. Frequently, one of the consumers (threads) stops reading from one of the partitions it is consuming from.
E.g.: one consumer thread is reading from partitions 1, 2, 3 and 4. It stops reading from partition 1 and the lag starts building up. I have to restart the consumer to start picking up messages from that particular partition again.
I want to understand the issue here.
My consumer configuration:
session.timeout.ms=150000
request.timeout.ms=300000
max.partition.fetch.bytes=153600
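For context, here is roughly how those settings map onto the Java client's Properties (broker address, group id and deserializers are assumptions, not from the question). One hedged observation rather than a confirmed diagnosis: max.partition.fetch.bytes=153600 is only 150 KB, well below the 1 MB default, and in pre-0.10.1 clients a single record larger than this per-partition limit could stop that one partition from making progress (behavior later changed by KIP-74), which may be worth ruling out:

// The configuration above expressed as Java client properties.
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("group.id", "my-group");                 // placeholder
        props.put("session.timeout.ms", "150000");         // from the question
        props.put("request.timeout.ms", "300000");         // from the question
        props.put("max.partition.fetch.bytes", "153600");  // 150 KB per partition per fetch
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        // ... subscribe and poll as usual ...
        consumer.close();
    }
}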

How exactly does the Apache NiFi ConsumeKafka_1_0 processor work?

I have a NiFi cluster, and Kafka is also installed there.
I created one topic with 5 partitions and started consuming that topic with one group id, so that each partition gets unique messages.
Then I created 5 ConsumeKafka_1_0 processors with the intent of getting unique messages on each consumer side. But only 2 of the ConsumeKafka_1_0 processors are consuming all the messages; the rest are sitting idle.
Next I started 5 command-line Kafka consumers, and this time I could see all the partitions getting messages and was able to consume them from the command-line consumers in round-robin fashion.
Also, I tried describing the Kafka group, and what I saw was that only 2 of the NiFi ConsumeKafka_1_0 processors are consuming from all 5 partitions and the rest are idle; see the snapshot.
Would you please let me know what I am doing wrong here with the NiFi consumer processor?
Note - I used NiFi version 1.5 and Kafka version 1.0.
I've written this article, which explains how the integration with Kafka works:
https://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka
The Apache Kafka client (used by NiFi) is what assigns partitions to the consumers.
Typically, if you had a 5-node NiFi cluster with 1 ConsumeKafka processor on the canvas and 1 concurrent task, then each node would be consuming 1 partition.
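To illustrate that client-side assignment outside NiFi, here is a minimal sketch (topic, group and broker address are placeholders) in which five consumers join the same group on a 5-partition topic and each ends up owning one partition, chosen by the Kafka client:

// Sketch: five consumers in one group on a 5-partition topic; the Kafka
// client assigns one partition to each. Uses the poll(long) API available
// in the Kafka 1.0 client.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AssignmentSketch {
    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            final int id = i;
            new Thread(() -> {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092"); // placeholder
                props.put("group.id", "my-group");                // same group for all five
                props.put("key.deserializer",
                        "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer",
                        "org.apache.kafka.common.serialization.StringDeserializer");
                try (KafkaConsumer<String, String> c = new KafkaConsumer<>(props)) {
                    c.subscribe(Collections.singletonList("my-topic")); // 5 partitions
                    // joining the group happens inside poll(); a few polls may be
                    // needed before assignment() is populated
                    for (int n = 0; n < 5 && c.assignment().isEmpty(); n++) {
                        c.poll(1000);
                    }
                    System.out.println("consumer " + id + " owns " + c.assignment());
                }
            }).start();
        }
    }
}

Each KafkaConsumer here roughly corresponds to what a ConsumeKafka_1_0 concurrent task creates internally; the group describe output shows which member holds which partition.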

Will there be any data loss while upgrading the Kafka client from 0.8.0 to 0.10.0.1?

We are planning to upgrade the Kafka client from 0.8.0 to 0.10.0.1, but since consumer offsets in version 0.8.0 are stored in ZooKeeper, whereas in version 0.10.0.1 they are stored in the broker, if we start a consumer in 0.10.0.1 with the same group and client id as in version 0.8.0, will the new consumer fetch messages from where the old consumer stopped consuming? If data loss is going to happen, can we try migrating the offsets from ZooKeeper to the broker and then start our new consumer?
You can continue storing offsets in ZooKeeper on 0.10. In fact, if you just upgrade the client binaries, you won't see any change in the offset commit behavior. Where you will have to start thinking about migration of data and offsets is when you move to using the new consumer API in your application. That is where you will need to stop your old application instance based on the old API, check the offsets stored in ZooKeeper, and then start the new consumer API implementation from that offset to avoid data loss or duplication.
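A hedged sketch of that migration step (connection strings, group id, topic and partition are placeholders; run it only while both the old and the new consumers are stopped): read the old offset from the ZooKeeper path the 0.8-style consumer used, then commit it for the same group through the new consumer API so the broker-stored offsets pick up where ZooKeeper left off:

// Sketch: copy one partition's offset from ZooKeeper (old consumer) to the
// broker (new consumer API). All names and addresses are placeholders.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.zookeeper.ZooKeeper;

public class OffsetMigration {
    public static void main(String[] args) throws Exception {
        // 0.8-style consumers stored offsets as strings under
        // /consumers/<group>/offsets/<topic>/<partition>
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
        byte[] data = zk.getData("/consumers/my-group/offsets/my-topic/0", false, null);
        long zkOffset = Long.parseLong(new String(data, "UTF-8"));
        zk.close();

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("group.id", "my-group");  // same group id as the old consumer
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        TopicPartition tp = new TopicPartition("my-topic", 0);
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(zkOffset)));
        }
    }
}

Repeat the read-and-commit for every partition of every topic the group consumes; afterwards the new consumer should start from exactly where the old one stopped.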