Kafka messages are reprocessed

We have a microservice that produces and consumes messages from Kafka using spring-boot and spring-cloud-stream.
versions:
spring-boot: 1.5.8.RELEASE
spring-cloud-stream: Ditmars.RELEASE
Kafka server: kafka_2.11-1.0.0
EDIT:
We are working in a Kubernetes environment, using a StatefulSet cluster of 3 Kafka nodes and a cluster of 3 Zookeeper nodes.
We experienced several occurrences of old messages being reprocessed, even though those messages had already been processed a few days earlier.
Several notes:
Before that happens, the following logs are printed (there are more similar lines; this is just a summary):
Revoking previously assigned partitions [] for group enrollment-service
Discovered coordinator dev-kafka-1.kube1.iaas.watercorp.com:9092 (id: 2147483646 rack: null)
Successfully joined group enrollment-service with generation 320
The above-mentioned revoking and reassigning of partitions happens every few hours. Only in a few of those incidents are old messages re-consumed; in most cases the reassignment doesn't trigger message consumption.
The messages are from different partitions.
More than one message per partition is reprocessed.
application.yml:
spring:
  cloud:
    stream:
      kafka:
        binder:
          brokers: kafka
          defaultBrokerPort: 9092
          zkNodes: zookeeper
          defaultZkPort: 2181
          minPartitionCount: 2
          replicationFactor: 1
          autoCreateTopics: true
          autoAddPartitions: true
          headers: type,message_id
          requiredAcks: 1
          configuration:
            "[security.protocol]": PLAINTEXT #TODO: This is a workaround. Should be security.protocol
        bindings:
          user-enrollment-input:
            consumer:
              autoRebalanceEnabled: true
              autoCommitOnError: true
              enableDlq: true
          user-input:
            consumer:
              autoRebalanceEnabled: true
              autoCommitOnError: true
              enableDlq: true
          enrollment-mail-output:
            producer:
              sync: true
              configuration:
                retries: 10000
          enroll-users-output:
            producer:
              sync: true
              configuration:
                retries: 10000
      default:
        binder: kafka
        contentType: application/json
        group: enrollment-service
        consumer:
          maxAttempts: 1
        producer:
          partitionKeyExtractorClass: com.watercorp.messaging.PartitionKeyExtractor
      bindings:
        user-enrollment-input:
          destination: enroll-users
          consumer:
            concurrency: 10
            partitioned: true
        user-input:
          destination: user
          consumer:
            concurrency: 5
            partitioned: true
        enrollment-mail-output:
          destination: send-enrollment-mail
          producer:
            partitionCount: 10
        enroll-users-output:
          destination: enroll-users
          producer:
            partitionCount: 10
Is there any configuration that I might be missing? What can cause this behavior?

So the actual problem is the one that is described in the following ticket: https://issues.apache.org/jira/browse/KAFKA-3806.
Using the suggested workaround fixed it.
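For context, KAFKA-3806 is about the mismatch between two broker settings: in Kafka 1.0.0, offsets.retention.minutes defaults to 1440 (one day) while log.retention.hours defaults to 168 (seven days). Committed offsets can therefore expire while the messages themselves are still in the log, and after a rebalance the consumer may fall back to its reset policy and re-read old messages. As far as the ticket describes it, the workaround is to raise offsets.retention.minutes on the brokers to at least the log retention. The following is only a minimal sketch of that comparison; the class and method names are hypothetical, and only the two default values come from Kafka.

```java
import java.time.Duration;

// Hypothetical helper, not part of Kafka: compares the two broker retention settings.
class RetentionCheck {
    // Broker defaults in Kafka 1.0.0.
    static final Duration DEFAULT_OFFSETS_RETENTION = Duration.ofMinutes(1440); // offsets.retention.minutes
    static final Duration DEFAULT_LOG_RETENTION = Duration.ofHours(168);        // log.retention.hours

    // True when committed offsets can expire while their messages are still retained.
    static boolean offsetsCanExpireBeforeLog(Duration offsetsRetention, Duration logRetention) {
        return offsetsRetention.compareTo(logRetention) < 0;
    }

    public static void main(String[] args) {
        // Defaults: offsets live for 1 day, messages for 7 days.
        System.out.println(offsetsCanExpireBeforeLog(DEFAULT_OFFSETS_RETENTION, DEFAULT_LOG_RETENTION));
        // Assumed workaround value: keep offsets for 14 days (20160 minutes).
        System.out.println(offsetsCanExpireBeforeLog(Duration.ofMinutes(20160), DEFAULT_LOG_RETENTION));
    }
}
```

With the defaults the check reports a risk; with the assumed 14-day value it does not. The 14-day figure is only an illustration, any value at or above the log retention satisfies the check.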

Related

Create topic with partitions from Spring-Cloud Binder

I have a Kafka configuration in my YAML file, and for one input I'm adding multiple topics with different names. I want three of them to have 5 partitions and one of them to have 1 partition. How can I set this separately in my configuration file? The Kafka version is old and it can't create partitions automatically, so I need to create them manually.
spring:
  cloud:
    stream:
      default:
        group: xxxx
        consumer:
          partitioned: true
          concurrency: 5
      kafka:
        binder:
          configuration:
            max.poll.interval.ms: 100000
            max.poll.records: 100
          brokers: xx.xx.xx.xx
          defaultBrokerPort: 8080
          replicationFactor: 1
      function:
        definition: methodName
      bindings:
        methodName-in-0:
          destination: topic1, topic2, topic3, topic4

I solved this issue by decreasing the default partition count from 5 to 1. Somehow, because of the Kafka version, it can't decrease the partition count of a topic, but it can increase it.

Kafka producer threads keep increasing

We are using the Spring Cloud Stream Kafka Binder and we are facing a problem with our application, which consumes one topic, processes the messages and then outputs them to different topics.
These topics are also consumed within the same application, and the results are output to a final topic.
We noticed that a huge number of producer threads are created whenever new messages are consumed by the first consumer, and these threads remain alive.
Here is my simplified config:
cloud:
  stream:
    function:
      definition: schedulingConsumer;consumerSearch1;consumerSearch2
    default:
      group: ${kafka.group}
      contentType: application/json
      consumer:
        maxAttempts: 1
        backOffMaxInterval: 30
        retryableExceptions:
          org.springframework.messaging.converter.MessageConversionException: false
    kafka:
      binder:
        brokers: ${kafka.brokers}
        headerMapperBeanName: kafkaHeaderMapper
        producerProperties:
          linger.ms: 500
          batch.size: ${kafka.batchs.size}
          compression.type: gzip
        consumerProperties:
          session.timeout.ms: ${kafka.session.timeout.ms}
          max.poll.interval.ms: ${kafka.poll.interval}
          max.poll.records: ${kafka.poll.records}
          commit.interval.ms: 500
          allow.auto.create.topics: false
    bindings:
      schedulingConsumer-in-0:
        destination: ${kafka.topics.schedules}
        consumer.concurrency: 5
      search1-out:
        destination: ${kafka.topics.groups.search1}
      search2-out:
        destination: ${kafka.topics.groups.search2}
      consumerSearch1-in-0:
        destination: ${kafka.topics.groups.search1}
      consumerSearch2-in-0:
        destination: ${kafka.topics.groups.search2}
      datasource-out:
        destination: ${kafka.topics.search.output}
Here is a screenshot of the thread activity: (screenshot not included)
We have tried to separate the first consumer schedulingConsumer from others : consumerSearch1 and consumerSearch2 and the problem seems to be resolved.
The problem occurs when we have all these consumers running in the same instance.

It seems to be a bug in Spring Cloud Stream. I have reported it to the team as issue #2452: "Kafka producer threads keep increasing when 'spring.cloud.stream.dynamic-destination-cache-size' is exceeded".
So the solution was to override the property spring.cloud.stream.dynamic-destination-cache-size and set it to a value greater than the number of your output bindings.
For my case I had 14 output bindings.
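To illustrate why exceeding the cache size matters, here is a minimal sketch that models the destination cache as a bounded map which evicts its eldest entry once the limit is exceeded. This is an assumption about how the cache behaves, not the binder's actual code; the class name, the string stand-ins for producers, and the assumed default limit of 10 are all hypothetical. Each cache miss stands for a producer being created again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a bounded destination cache; producers are stood in for by strings.
class DestinationCacheSketch {
    // Makes `rounds` passes over `destinations` distinct output destinations and
    // returns how many times a producer had to be created.
    static int producersCreated(int cacheSize, int destinations, int rounds) {
        Map<String, String> cache = new LinkedHashMap<String, String>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > cacheSize; // evict once the bound is exceeded
            }
        };
        int created = 0;
        for (int r = 0; r < rounds; r++) {
            for (int d = 0; d < destinations; d++) {
                String name = "destination-" + d;
                if (!cache.containsKey(name)) {
                    created++; // cache miss: a new producer is built
                    cache.put(name, "producer-for-" + name);
                }
            }
        }
        return created;
    }

    public static void main(String[] args) {
        // 14 output bindings with an assumed limit of 10: every send misses the cache.
        System.out.println(producersCreated(10, 14, 3)); // 42
        // Limit raised above the number of bindings: one producer per binding.
        System.out.println(producersCreated(15, 14, 3)); // 14
    }
}
```

In this model the number of created producers grows with every pass while the limit is too small, and stays at 14 once the limit is larger than the number of output bindings.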

Spring Cloud @StreamListener consumer not registering CONSUMER-ID, HOST and CLIENT-ID in consumer group

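We have a Spring Cloud consumer that reads messages from one Kafka topic. The following is the interface for the channels:
import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.SubscribableChannel;
import org.springframework.stereotype.Component;

@Component
public interface CollectionStreams {
    // Binding names; they match the keys under "bindings" in the YAML configuration.
    String INPUT_REPORT = "report-in";
    String OUTPUT_REPORT = "report-out";

    @Input(INPUT_REPORT)
    SubscribableChannel inboundReport();

    @Output(OUTPUT_REPORT)
    MessageChannel outboundReportToJM();
}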
The problem we are facing is that while listing in the consumer group “report” we are not able to see CONSUMER-ID, HOST and CLIENT-ID as expected.
[root@innolx131112 templates]# kubectl -n tmo-ccm exec kafka-test-client -- /usr/bin/kafka-consumer-groups --bootstrap-server kafka:9092 --describe -group report
Note: This will not show information about old Zookeeper-based consumers.
Consumer group 'report' has no active members.
TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
report  3          2               3               1    -            -     -
report  1          4               4               0    -            -     -
report  2          1               1               0    -            -     -
report  4          2               2               0    -            -     -
report  0          2               2               0    -            -     -
By the way we are running our application as well as Kafka in Kubernetes.
Due to this issue we are not able to multiple POD of our application as all PODs
And we have defined the method as follows to read messages from the topic:
@StreamListener(CollectionStreams.INPUT_REPORT)
//public void handleMessage(@Payload MessageT message) {
public void handleMessage(@Payload MessageT message, @Headers MessageHeaders msg) {
Following is configuration yaml
cloud:
  stream:
    kafka:
      binder:
        brokers: kafka
        autoCreateTopics: false
      bindings:
        report-in:
          consumer:
            autoCommitOffset: false
            autoCommit: false
            auto-offset-reset: earliest
            autoCommitOnError: false
            resetOffsets: false
            autoRebalanceEnabled: false
            ackEachRecord: false
    bindings:
      report-in:
        destination: report
        contentType: application/json
        group: report
        consumer:
          concurrency: 5
          partitioned: true
      report-out:
        destination: jobmanager
        contentType: application/json
        group: jobmanager
        producer:
          autoAddPartitions: true
We also have another consumer for which we have not set any kafka related consumer props and surprisingly those consumers are registering themselves properly.
Config:
cloud:
  stream:
    kafka:
      binder:
        autoCreateTopics: false
        brokers: kafka
    bindings:
      parse-in:
        destination: parser
        contentType: application/json
        group: parser
        consumer:
          concurrency: 3
          partitioned: true
      parse-out:
        destination: jobmanager
        contentType: application/json
        group: jobmanager
        producer:
          partitionKeyExpression: headers['contentType']
          autoAddPartitions: true
And describe command output as below
[root@innolx131112 shyama]# kubectl -n tmo-ccm exec kafka-test-client -- /usr/bin/kafka-consumer-groups --bootstrap-server kafka:9092 --describe -group parser
Note: This will not show information about old Zookeeper-based consumers.
TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                      HOST           CLIENT-ID
parser  0          14              14              0    consumer-3-cb99e45e-21b3-4efb-ac7b-4f642a9486be  /192.168.0.26  consumer-3
parser  1          13              13              0    consumer-3-cb99e45e-21b3-4efb-ac7b-4f642a9486be  /192.168.0.26  consumer-3
parser  2          15              15              0    consumer-4-bec1e4af-d771-47fe-83ad-3440f3a6d4bd  /192.168.0.26  consumer-4
parser  3          15              15              0    consumer-4-bec1e4af-d771-47fe-83ad-3440f3a6d4bd  /192.168.0.26  consumer-4
parser  4          12              12              0    consumer-5-b9ac7e36-58cf-40cb-b37d-a0fa092a0d56  /192.168.0.26  consumer-5
Is it because we are providing Kafka-related properties for the first consumer (the report consumer) that it is not able to register?

Consumer group 'report' has no active members.
This simply means your app wasn't running when you entered the command.
That info is transient and not retained when the app stops.
The partitions might be assigned to a different instance next time.
EDIT
whenever we are instantiating more consumers using same group they are processing the same old already processed by old consumers
Well, looking at your configuration more carefully, that's exactly what you asked for...
autoRebalanceEnabled: false
... means that Kafka will not use group management and the partitions will be allocated by Spring Cloud stream.
autoCommitOffset: false
means that Spring will not commit any offsets and that is the responsibility of your application. If you don't commit the offset, you will get the behavior you observe.
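To make the consequence of autoRebalanceEnabled: false more concrete, here is a minimal sketch of static partition allocation. It assumes the binder gives an instance every partition for which partition % instanceCount equals instanceIndex, and that instanceCount and instanceIndex default to 1 and 0; the class and method names are hypothetical and this is not the binder's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of static partition allocation (no Kafka group management).
class StaticAllocationSketch {
    // Partitions an instance listens to, assuming partition % instanceCount == instanceIndex.
    static List<Integer> assigned(int partitions, int instanceCount, int instanceIndex) {
        List<Integer> result = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            if (p % instanceCount == instanceIndex) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Every pod that keeps the assumed defaults (instanceCount=1, instanceIndex=0)
        // listens to all five partitions, so pods duplicate each other's work.
        System.out.println(assigned(5, 1, 0)); // [0, 1, 2, 3, 4]
        // Two pods configured as instance 0 and instance 1 of 2 split the partitions.
        System.out.println(assigned(5, 2, 0)); // [0, 2, 4]
        System.out.println(assigned(5, 2, 1)); // [1, 3]
    }
}
```

Under these assumptions, adding pods without setting instanceCount and instanceIndex per pod does not spread the load; either configure them on each pod or leave autoRebalanceEnabled at true so that Kafka's group management assigns the partitions.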

Producer Partition Count override not effective

Interpreting this - https://docs.spring.io/spring-cloud-stream/docs/current/reference/htmlsingle/#_producer_properties
my understanding is that, if the partitionCount override is less than the actual number of partitions on an existing kafka topic, then the producer should use the actual number of partitions rather than the override value. My experience is that the producer uses the partitionCount value regardless of how many partitions ( > partitionCount ) are actually configured on the kafka topic.
Ideally, I would like the producer to read the number of partitions on a pre-configured topic from kafka and write messages across all available partitions.
Spring-Cloud version: Finchley.RELEASE
Kafka Broker version : 1.0.0
application.yml:
spring:
  application:
    name: my-app
  cloud:
    stream:
      default:
        contentType: application/json
      kafka:
        binder:
          brokers:
            - ${KAFKA_HOST}:${KAFKA_PORT}
          auto-create-topics: false
      bindings:
        input-channel:
          destination: input-topic
          contentType: application/json
          group: input-group
        output-channel:
          destination: output-topic
          contentType: application/json
          producer:
            partition-count: 2
            partition-key-expression: payload['Id']
So, I am expecting that if the output topic is already configured with 6 partitions, the producer will recognise this and write to all of those. Could someone please verify my interpretation above? Or point out what I am missing to get the desired functionality?
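The interpretation in the question can be written down as a small sketch: treat the configured partitionCount as a hint, use the larger of it and the topic's real partition count, and pick a partition from the key's hash. This models only the rule as the question reads the documentation; it is not the binder's code and does not show what the binder does in the reported case. The class and method names are hypothetical, and floorMod is my own choice to keep the result non-negative.

```java
// Hypothetical model of the documented rule; not the binder's actual code.
class PartitionCountSketch {
    // The larger of the configured hint and the topic's real partition count.
    static int effectivePartitionCount(int configuredPartitionCount, int topicPartitionCount) {
        return Math.max(configuredPartitionCount, topicPartitionCount);
    }

    // Key-hash partition selection; floorMod keeps the result in [0, partitionCount).
    static int selectPartition(Object key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        // partition-count: 2 in the YAML, topic pre-created with 6 partitions.
        int count = effectivePartitionCount(2, 6);
        System.out.println(count);
        for (int id = 0; id < 6; id++) {
            System.out.println("Id " + id + " -> partition " + selectPartition(id, count));
        }
    }
}
```

If the producer really used only the configured value of 2, the same keys would land on partitions 0 and 1 only, which matches the behavior reported above.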

spring cloud stream kafka

I've built a producer Spring Cloud Stream app with Kafka as the binder. Here is the application.yml:
spring:
  cloud:
    stream:
      instanceCount: 1
      bindings:
        output:
          destination: topic-sink
          producer:
            partitionSelectorClass: com.partition.CustomPartition
            partitionCount: 1
...
I have two instances (same app running on a single jvm) as consumers. Here is the application.yml:
spring:
  cloud:
    stream:
      bindings:
        input:
          destination: topic-sink
          group: hdfs-sink
          consumer:
            partitioned: true
...
My understanding of Kafka groups is that each message is consumed only once by the consumers in the same group. Let's say the producer app produces messages A, B and C, and there are two consumer apps in the same group: message A will be read by consumer 1 and messages B and C will be read by consumer 2. However, my consumers are consuming the same messages. Are my assumptions wrong?

I got the solution, thanks Arek. It is for 1 partition and 1 consumer.
Here is the solution for the producer and the consumer in a Spring Cloud Stream app.
Producer:
spring:
  cloud:
    stream:
      instanceCount: 1
      bindings:
        output:
          destination: topic-sink
          producer:
            partitionSelectorClass: com.partition.CustomPartition
            partitionCount: 1
Consumer:
spring:
  cloud:
    stream:
      instanceIndex: 0 #between 0 and instanceCount - 1
      instanceCount: 1
      bindings:
        input:
          destination: topic-sink
          group: hdfs-sink
          consumer:
            partitioned: true
      kafka:
        binder:
          autoAddPartitions: true
