How to improve Kafka consumer performance with PHP/RDKafka - apache-kafka

I'm consuming messages from Kafka using php-rdkafka with the high-level consumer approach described here: https://arnaud.le-blanc.net/php-rdkafka/phpdoc/rdkafka.examples-high-level-consumer.html
The issue I have is that it takes roughly 3 seconds for a message to be read by a consumer. The topic has 20 partitions and there are 10 consumers in the same group, but even with only 1 consumer it still takes about 3 seconds to read a message from Kafka.
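For reference, the consumer described above looks roughly like the following, based on the linked high-level consumer example. The broker address, group id and topic name are placeholders, and the two fetch.* settings are only illustrative librdkafka knobs that affect poll latency to experiment with, not a confirmed fix for the 3-second delay:

    <?php
    // Minimal php-rdkafka high-level consumer, following the linked example.
    // Broker list, group id and topic name are placeholders.
    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'myConsumerGroup');
    $conf->set('metadata.broker.list', '127.0.0.1');
    $conf->set('auto.offset.reset', 'earliest');     // where to start when there is no committed offset

    // Illustrative latency-related librdkafka settings (assumptions, not a confirmed fix):
    $conf->set('fetch.wait.max.ms', '100');          // max time the broker waits to fill fetch.min.bytes
    $conf->set('fetch.error.backoff.ms', '100');     // back-off before retrying a failed fetch

    $consumer = new RdKafka\KafkaConsumer($conf);
    $consumer->subscribe(['myTopic']);               // placeholder for the 20-partition topic

    while (true) {
        $message = $consumer->consume(120 * 1000);   // poll with a 120s timeout
        switch ($message->err) {
            case RD_KAFKA_RESP_ERR_NO_ERROR:
                var_dump($message->payload);         // process the message here
                break;
            case RD_KAFKA_RESP_ERR__PARTITION_EOF:
                echo "No more messages; waiting for more\n";
                break;
            case RD_KAFKA_RESP_ERR__TIMED_OUT:
                echo "Timed out\n";
                break;
            default:
                throw new \Exception($message->errstr(), $message->err);
        }
    }

Lowering fetch.wait.max.ms trades a little extra broker and network load for lower end-to-end latency; whether it accounts for the full 3 seconds observed here is not established.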

Related

Apache Kafka partition offset rewinds during rebalance

We have a Kafka consumer application implemented with Spring Boot and Apache Camel, using manual commit. The topic we consume from has 30 partitions and a retention period of 7 days. The consumer application is deployed in 2 instances for HA, and parallel processing is implemented using Apache Camel's concurrent-consumer configuration. Once we consume the data, we do message transformation and send it to a REST endpoint. We have implemented the circuit breaker pattern (Apache Camel throttling route policy), so for any runtime issue with the REST endpoint the circuit breaker kicks in and stops message consumption from the Kafka topic. We are also using
max.poll.records = 100 and heartbeat interval = 1 ms
instead of the default values, to address frequent commit-offset-failure exceptions. Load on the topic is 200 tps.
Problem statement:
Last week we saw an issue: the REST endpoint was slow in processing, we saw consumer group rebalance activity in the broker logs, and during this rebalance the consumer of one partition rewound its offset to a 5-day-old offset (almost 1 million offsets back) and started reprocessing those messages, causing huge duplication.
I looked into both the consumer log and the broker log and did not see any exception or error, and we are using the offset reset strategy "latest". Also, as I mentioned above, we are using manual commit, and since the consumer commits the offsets for every batch, I would expect a rebalance to rewind to at most one batch-old offset, not a 5-day-old offset (see the sketch at the end of this question).
We have had this implementation for more than a year and this is the first time we have seen this issue. We are using default values for most of the broker and consumer configuration, other than max.poll.records, heartbeat interval and session timeout.
Kafka Broker = 2.4
Apache Camel = 3.0
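The expectation above (rewinding by at most one batch) is the normal semantics of per-batch manual commit. The question itself is about Camel/Spring, but since this page's main question concerns php-rdkafka, here is a sketch of the same commit-per-batch pattern in php-rdkafka terms, with placeholder broker, group and topic names; it is not the poster's code:

    <?php
    // Not the poster's Camel/Spring setup: a php-rdkafka analogue of the
    // commit-per-batch pattern described above. Broker, group and topic are placeholders.
    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'manualCommitGroup');
    $conf->set('metadata.broker.list', '127.0.0.1');
    $conf->set('enable.auto.commit', 'false');       // commits are issued explicitly below

    $consumer = new RdKafka\KafkaConsumer($conf);
    $consumer->subscribe(['someTopic']);

    $batch = [];
    while (true) {
        $message = $consumer->consume(1000);
        if ($message->err === RD_KAFKA_RESP_ERR_NO_ERROR) {
            $batch[] = $message;
        }
        if (count($batch) >= 100) {                  // mirrors max.poll.records = 100
            // ... process the batch, then commit synchronously ...
            // Once this commit succeeds, a rebalance can only rewind the group
            // to this committed position, i.e. at most one batch is re-delivered.
            $consumer->commit();                     // commits offsets of the current assignment
            $batch = [];
        }
    }

commit() without arguments commits the current offsets of the assigned partitions; commitAsync() is the non-blocking variant.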

Kafka losing messages under intense stress - normal?

A few details about the context. I have an application with the following flow:
Put 10k messages in the input topic.
Kafka App 1 will consume messages from the input topic and write to the mid topic (exactly-once, with an idempotent producer writing the consumer offsets for the input topic together with the message to the mid topic).
Kafka App 2 will consume messages from the mid topic and write to the output topic (exactly-once, with an idempotent producer writing the consumer offsets for the mid topic together with the message to the output topic).
Expect 10k messages in the output topic.
So I've configured both Kafka App 1 and Kafka App 2 for exactly-once processing, and everything went well when processing messages while:
killing ZooKeeper instances with -9 and starting them again
killing Kafka brokers with -9 and starting them again
killing Kafka App 1 with -9 and starting it again
killing Kafka App 2 with -9 and starting it again
In the above 4 cases exactly-once is achieved: I don't lose messages and I don't get duplicates.
However, when processing messages and randomly killing ZooKeeper instances and Kafka brokers with -9 in parallel, I saw that I lose messages.
Is this expected?

Kafka consumer is not reading from one partition out of 4

I was using Kafka 0.9 and recently migrated to Kafka 1.0, but the client I am using is still 0.9. Irrespective of this, I was facing a problem where our consumers sometimes intermittently stop consuming from one or two of the partitions.
I have 5 consumers reading from 24 partitions; these are consumer JVM threads created by an application deployed on a single server. Frequently one of the consumer threads will stop reading from one of the partitions it is assigned to.
E.g.: one consumer thread is reading from partitions 1, 2, 3 and 4. It will stop reading from partition 1 and end up building lag, and I have to restart the consumer to get it picking up messages from that particular partition again.
I want to understand the issue here.
My consumer configuration
session.timeout.ms=150000
request.timeout.ms=300000
max.partition.fetch.bytes=153600
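Two of these properties, session.timeout.ms and max.partition.fetch.bytes, also exist in librdkafka. Purely as an illustration tied to this page's main php-rdkafka question (not the 0.9 Java client used by this poster), a consumer can set them and periodically log its live partition assignment, which is one way to notice a partition silently dropping out; broker, group and topic names are placeholders:

    <?php
    // Illustration only: php-rdkafka consumer with the two librdkafka-equivalent
    // settings from above, logging its live assignment once a minute.
    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'diagGroup');             // placeholder group
    $conf->set('metadata.broker.list', '127.0.0.1'); // placeholder broker list
    $conf->set('session.timeout.ms', '150000');
    $conf->set('max.partition.fetch.bytes', '153600');

    $consumer = new RdKafka\KafkaConsumer($conf);
    $consumer->subscribe(['someTopic']);             // placeholder topic

    $lastLog = 0;
    while (true) {
        $message = $consumer->consume(1000);
        // ... handle $message as in the examples above ...

        if (time() - $lastLog >= 60) {
            foreach ($consumer->getAssignment() as $tp) {
                printf("assigned %s[%d]\n", $tp->getTopic(), $tp->getPartition());
            }
            $lastLog = time();
        }
    }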

storm-kafka-client spout consumes messages at different speeds for different partitions

I have a Storm cluster of 5 nodes and a Kafka cluster installed on the same nodes.
Storm version: 1.2.1
Kafka version: 1.1.0
I also have a Kafka topic with 10 partitions.
Now I want to consume this topic's data and process it with Storm, but the message consumption speed is really strange.
For testing purposes, my Storm topology has only one component, the Kafka spout, and I always set the Kafka spout parallelism to 10, so that each partition is read by exactly one thread.
When I run this topology on just 1 worker, all partitions are read quickly and the lag is almost the same (very small).
When I run this topology on 2 workers, 5 partitions are read quickly, but the other 5 partitions are read very slowly.
When I run this topology on 3 or 4 workers, 7 partitions are read quickly and the other 3 partitions are read very slowly.
When I run this topology on more than 5 workers, 8 partitions are read quickly and the other 2 partitions are read slowly.
Another strange thing is that when I use a different consumer group id when configuring the Kafka spout, the test results may differ.
For example, when I use a specific group id and run the topology on 5 workers, only 2 partitions can be read quickly, just the opposite of the test using another group id.
I have written a simple Java app that calls the high-level Kafka Java API. I ran it on each of the 5 Storm nodes and found it can consume data very quickly from every partition, so network issues can be excluded.
Has anyone met the same problem before, or does anyone have an idea of what may cause such strange behavior?
Thanks!

Kafka Stream reprocessing old messages on rebalancing

I have a Kafka Streams application which reads data from a few topics, joins the data and writes it to another topic.
This is the configuration of my Kafka cluster:
5 Kafka brokers
Kafka topics - 15 partitions and replication factor 3.
My Kafka Streams applications are running on the same machines as my Kafka brokers.
A few million records are consumed/produced per hour. Whenever I take a broker down, the application goes into a rebalancing state, and after rebalancing many times it starts consuming very old messages.
Note: when the Kafka Streams application was running fine, its consumer lag was almost 0. But after rebalancing, its lag went from 0 to 10 million.
Could this be because of offset.retention.minutes?
This is the log and offset retention policy configuration of my Kafka broker:
log retention policy: 3 days
offset.retention.minutes: 1 day
In the below link I read that this could be the cause:
Offset Retention Minutes reference
Any help in this would be appreciated.
Offset retention can have an impact. Cf. this FAQ: https://docs.confluent.io/current/streams/faq.html#why-is-my-application-re-processing-data-from-the-beginning
Also cf. How to commit manually with Kafka Stream? about how commits work.