Kafka stream consumer skipping a few offset no log compaction enabled - apache-kafka

kafka server version: 3.2.0
kstreams version : 2.7.2
I have a producer, which is producing to topic foo, I can see the offset from the producer in the logs.
We have kafka stream application reading from the same topic foo. What I am observing is that the consumer skips reading offset. Sometime the skip is over 30 to 40 offsets. I am printing the offset in process method using ProcessorContext.offset() method.
Skipping of offset seems to be very common, will using ProcessorContext.offset() result in every offset being printed ?.
Some points
No kafka rebalance has occurred.
No restarts of the container
We have 3 state store defined in the streams application, and the change log topic has replication factor of 1.
We have kafka broker outage where few broker were down some extend period of time, about 3 weeks back. I dont know how things impact the message i should consumer today.
We have NOT set processing.guarantee, so default should be AT_LEAST_ONCE. We do not have transactions enabled, so it cant be transactional messages. which are skipped
The log to print offset if the first line in the process method.
Question:
What internal kafka stream logs can I see to see if messages are consumed.
Any reason why the messages could be skipped

Related

Kafka consumer restart when it is reading from beginning of topic

I am new to Kafka .
Lets say I have one kafka topic topoic1(replicationfactor=1,partitions=1) and one consumer(java process) reading(readfrombegining/earliest) from kafka topic1 . Consumer is running fine for some time and later for some reason it got hung and killed by admin.
So if I Restart the consumer it will read from beginning again leading to data duplication So how to handle this usecase ?
NOTE: I am aware that if the consumer code written as to read from latest then we will not get duplicated data. Other than this is there in solution ?
Consumers will only reset from the beginning when auto.offset.reset=earliest, and
you have auto commits disabled + don't manually commit offsets
or don't manually seek the consumer upon startup; i.e. you can track offsets externally from Kafka

New Kafka consumer ignoring earliest offset

I have a kafka topic(XYZ) with just one partition and one consumer(C1) running on RHEL6 machine. I have copied the same setup on to a RHEL7 machine and stopped the C1 and started the new consumer(C2) using the same group id as the C1. C2 is able to connect, exchange heartbeat messages and print following debug statements repeatedly but not able to consume messages. Received zero records. It uses "earliest" offset reset but seems to ignore it. note: all consumer share the same code version.
Logs:
Fetcher: [Consumer clientId=testclient-0, groupId=mcu] Fetch READ_UNCOMMITTED request at offset 1001 for partition XYZ-0 returned fetch data (error=none, highWaterMark=1001, lastStableOffset=1001, logStartOffset=10 ....
Adding READ_UNCOMMITTED request for partiion...
Sending READ_UNCOMMITTED request for partition...
Resetting offset for partition XYZ-0 to the committed offset 1001
My question is why it is resetting offset to the latest offset value and not consume message from beginning.
Following is what i have tried to resolve the issue.
I am able to consume message from C2 consumer from another topic on same cluster.
I am able to consume message from same topic XYZ from another consumer C3 on a different RHEL 6 machine with exactly same configuration.
I have used a new Group id but no success
Produced message after staring the consumer first but no success.
Note: Kafka client vaersion 1.0
Any help would be highly appreciated. Thank you.
I was able to fix it after setting the following jvm argument.
-Dorg.xerial.snappy.tempdir=/some/other/path/with/execpermissions/
The new host where I set up the consumer didn't have write permissions to /tmp dir(default) which is needed to unpack snappy library. Also make sure /tmp has space left for these libraries. The RHEL6 hosts with old consumers had write permissions. The compression type was not set on topic configuration so it used snappy by default. Other topics where I could consume from had compression type set to producer, so it didn't need snappy libraries and hence worked. The issue had nothing to do with RHEL 6 or RHEL7 version.
More info: Kafka Broker throws error for clients that produce snappy compressed messages

Apache Kafka partition offset rewinds during rebalance

We have a Kafka consumer application implemented using SpringBoot and Apache camel with manual commit. Topic which we consume has 30 partitions and retention period of 7 days. Consumer application deployed in 2 instances for HA and parallel processing is implemented using Apache Camel concurrent consumer configuration. Once we consume the data, we do message transformation and send to a REST endpoint. We have implemented the Circuit breaker pattern(Apache Camel Throttling route policy ) and for any runtime time issue with REST, Circuit Breaker kicks in and stops message consumption from Kafka Topic. Also we have are using the
max.poll.records = 100 and heartbeat interval = 1 ms
instead of default values to address the frequent commit offset failure exceptions. Load on the topic = 200 tps.
Problem statement:
Last week, we saw an issue - REST endpoint was slow in processing, and we saw the consumer group rebalance activity in broker logs and during this rebalance one of the partition consumer rewind the offset to 5 days old offset(almost 1 million back, offset id) and started the reprocessing of the messages causing huge duplication.
I looked in to both consumer log and broker log and not seen any exception or error and we are using the offset strategy as Latest. Also as I mentioned above we are using the manual commit and I believe, since consumer commits the offsets for every batch,
I expect when rebalance happens it should have rewind to at most one batch old offset, not 5 days old offset.
We have this implementation more than a year and saw this issue first time. We are using default values for most of the broker and consumer configuration other than max.poll.records, heartbeat interval and session timeout.
Kafka Broker = 2.4
Apache Camel= 3.0

Kafka consumer is not reading from only one partition out of 4

I was using Kafka 0.9 and recently migrated to Kafka 1.0, but the client I am using is still 0.9. Irrespective of this I was facing a problem where our consumers sometimes intermittently stop consuming from one or two of the partitions.
I have 5 consumers reading from 24 partitions, these are consumer JVM threads created from an application deployed in the single server. Frequently one of the consumer (thread) will stop reading from one of the partitions it would be consuming from.
Eg: One consumer thread would be reading from partition 1,2,3,and 4. It will stop reading from partition 1 and end up in building the lag. I have to restart the consumer to start picking those messages from that particular partition.
I want to understand the issue here.
My consumer configuration
session.timeout.ms=150000
request.timeout.ms=300000
max.partition.fetch.bytes=153600

Kafka Stream reprocessing old messages on rebalancing

I have a Kafka Streams application which reads data from a few topics, joins the data and writes it to another topic.
This is the configuration of my Kafka cluster:
5 Kafka brokers
Kafka topics - 15 partitions and replication factor 3.
My Kafka Streams applications are running on the same machines as my Kafka broker.
A few million records are consumed/produced per hour. Whenever I take a broker down, the application goes into rebalancing state and after rebalancing many times it starts consuming very old messages.
Note: When the Kafka Streams application was running fine, its consumer lag was almost 0. But after rebalancing, its lag went from 0 to 10million.
Can this be because of offset.retention.minutes.
This is the log and offset retention policy configuration of my Kafka broker:
log retention policy : 3 days
offset.retention.minutes : 1 day
In the below link I read that this could be the cause:
Offset Retention Minutes reference
Any help in this would be appreciated.
Offset retention can have an impact. Cf this FAQ: https://docs.confluent.io/current/streams/faq.html#why-is-my-application-re-processing-data-from-the-beginning
Also cf How to commit manually with Kafka Stream? and How to commit manually with Kafka Stream? about how commits work.