Create topic with partitions from Spring-Cloud Binder - apache-kafka

I have a Kafka configuration in my YAML file, and for one input binding I'm adding multiple topics with different names. I want three of them to have 5 partitions and one of them to have 1 partition. How can I configure this separately in my configuration file? Our Kafka version is old and can't create partitions automatically, so I need to create them manually.
spring:
  cloud:
    stream:
      default:
        group: xxxx
        consumer:
          partitioned: true
          concurrency: 5
      kafka:
        binder:
          configuration:
            max.poll.interval.ms: 100000
            max.poll.records: 100
          brokers: xx.xx.xx.xx
          defaultBrokerPort: 8080
          replicationFactor: 1
      function:
        definition: methodName
      bindings:
        methodName-in-0:
          destination: topic1, topic2, topic3, topic4

I solved this issue by decreasing the default partition count from 5 to 1. Kafka can increase a topic's partition count, but it cannot decrease it.
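To make the rule behind that answer concrete: partitions can be added to an existing Kafka topic but never removed, so creating a topic with 1 partition and growing it to 5 later works, while the reverse does not. The class below is a hypothetical helper (TopicPartitionPlan is not part of Spring Cloud Stream or Kafka) that encodes that rule for a set of topics. On a real cluster the resulting actions would correspond to Kafka Admin API calls such as createTopics with NewTopic and createPartitions with NewPartitions.increaseTo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spring Cloud Stream or Kafka.
// Decides what a topic needs given its current partition count
// (0 = topic does not exist yet) and the desired count, following
// Kafka's rule that partitions can be added but never removed.
final class TopicPartitionPlan {

    enum Action { CREATE, ADD_PARTITIONS, NONE }

    static Action plan(int currentPartitions, int desiredPartitions) {
        if (desiredPartitions < 1) {
            throw new IllegalArgumentException("a topic needs at least 1 partition");
        }
        if (currentPartitions == 0) {
            return Action.CREATE;              // missing topic: create it with the desired count
        }
        if (desiredPartitions > currentPartitions) {
            return Action.ADD_PARTITIONS;      // increasing is allowed
        }
        if (desiredPartitions < currentPartitions) {
            // Kafka cannot shrink a topic; it would have to be deleted and recreated.
            throw new IllegalStateException(
                "cannot decrease partitions from " + currentPartitions + " to " + desiredPartitions);
        }
        return Action.NONE;                    // already at the desired count
    }

    // Plans every topic in one pass, e.g. three topics at 5 partitions and one at 1.
    static Map<String, Action> planAll(Map<String, Integer> current, Map<String, Integer> desired) {
        Map<String, Action> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : desired.entrySet()) {
            int now = current.getOrDefault(e.getKey(), 0);
            result.put(e.getKey(), plan(now, e.getValue()));
        }
        return result;
    }
}
```

For the question's case, if all four topics start with 1 partition, planning topic1 to topic3 at 5 yields ADD_PARTITIONS for those three and NONE for topic4; starting all four at 5 and asking for 1 on topic4 fails instead.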

Related

Kafka producer threads keep increasing

We are using the Spring Cloud Stream Kafka Binder, and we are facing a problem with our application, which consumes one topic, processes the messages, and then outputs them to different topics.
These topics are also consumed within the same application and output to a final topic.
We noticed that a huge number of producer threads are created whenever new messages are consumed by the first consumer, and these threads remain alive.
Here is my simplified config:
cloud:
  stream:
    function:
      definition: schedulingConsumer;consumerSearch1;consumerSearch2
    default:
      group: ${kafka.group}
      contentType: application/json
      consumer:
        maxAttempts: 1
        backOffMaxInterval: 30
        retryableExceptions:
          org.springframework.messaging.converter.MessageConversionException: false
    kafka:
      binder:
        brokers: ${kafka.brokers}
        headerMapperBeanName: kafkaHeaderMapper
        producerProperties:
          linger.ms: 500
          batch.size: ${kafka.batchs.size}
          compression.type: gzip
        consumerProperties:
          session.timeout.ms: ${kafka.session.timeout.ms}
          max.poll.interval.ms: ${kafka.poll.interval}
          max.poll.records: ${kafka.poll.records}
          commit.interval.ms: 500
          allow.auto.create.topics: false
    bindings:
      schedulingConsumer-in-0:
        destination: ${kafka.topics.schedules}
        consumer.concurrency: 5
      search1-out:
        destination: ${kafka.topics.groups.search1}
      search2-out:
        destination: ${kafka.topics.groups.search2}
      consumerSearch1-in-0:
        destination: ${kafka.topics.groups.search1}
      consumerSearch2-in-0:
        destination: ${kafka.topics.groups.search2}
      datasource-out:
        destination: ${kafka.topics.search.output}
Here is a screenshot of the thread activity: [image not available]
We tried separating the first consumer, schedulingConsumer, from the others (consumerSearch1 and consumerSearch2), and the problem seems to be resolved.
The problem occurs when we have all these consumers running in the same instance.
It seems to be a bug in Spring Cloud Stream. I have reported it to the team: "Kafka producer threads keep increasing when 'spring.cloud.stream.dynamic-destination-cache-size' is exceeded" (#2452).
So, the solution was to override the property spring.cloud.stream.dynamic-destination-cache-size and set it to a value greater than the number of your output bindings.
In my case, I had 14 output bindings.
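Why exceeding the cache size keeps creating producers can be seen with a small model. The sketch below is a simplified, hypothetical stand-in for a bounded cache of per-destination producers, not the binder's actual code: an LRU map whose evicted entries are forgotten, so a producer is created again the next time its destination is used. Our reading of the bug report is that evicted producers are not cleaned up, which is why their threads accumulate. The cache size of 10 used in the usage note is an assumption for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model (not the binder's real implementation) of a bounded
// cache of producers keyed by destination. When more distinct destinations
// are used than the cache holds, the least recently used entry is evicted
// and a new producer has to be created on the next use of that destination.
final class DestinationProducerCache {

    private final Map<String, String> producers;
    private int producersCreated = 0;

    DestinationProducerCache(int maxSize) {
        // accessOrder = true makes LinkedHashMap behave as an LRU structure
        this.producers = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxSize;       // evict once the limit is exceeded
            }
        };
    }

    // Returns the cached producer for a destination, creating one on a miss.
    String producerFor(String destination) {
        return producers.computeIfAbsent(destination, d -> {
            producersCreated++;
            return "producer-" + d + "-" + producersCreated;
        });
    }

    int producersCreated() {
        return producersCreated;
    }

    // Sends one message to each destination in turn, `rounds` times,
    // and reports how many producers had to be created in total.
    static int simulate(int cacheSize, int destinations, int rounds) {
        DestinationProducerCache cache = new DestinationProducerCache(cacheSize);
        for (int r = 0; r < rounds; r++) {
            for (int d = 0; d < destinations; d++) {
                cache.producerFor("topic-" + d);
            }
        }
        return cache.producersCreated();
    }
}
```

With 14 output destinations used round-robin, simulate(10, 14, 3) returns 42: cycling through more keys than an LRU cache can hold makes every lookup a miss. With simulate(15, 14, 3) only 14 producers are ever created, which matches the fix of raising the cache size above the number of output bindings.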

Batch Consumer with a given period doesn't work with multiple partitions in Spring Cloud Stream (StreamListener)?

@StreamListener(value = PersonStream.INPUT)
private void personBulkReceiver(List<Person> person) {
    //....
}
spring:
cloud:
stream:
kafka:
binders:
bulkKafka:
type: kafka
environment:
spring:
cloud:
stream:
kafka:
binder:
brokers: localhost:9092
configuration:
max.poll.records: 1500
fetch.min.bytes: 10000000
fetch.max.wait.ms: 10000
value.deserializer: tr.cloud.stream.examples.PersonDeserializer
bindings:
person-topic-in:
binder: bulkKafka
destination: person-topic
contentType: application/person
group : person-group
consumer:
batch-mode: true
I'm using Spring Cloud Stream with Kafka. In a StreamListener, when the partition count is 1, I can consume records in batch mode every 5000 ms.
My .yml configuration sets fetch.min.bytes = 10000000, fetch.max.wait.ms = 50000, and max.poll.records = 1500, as stated above.
I receive a batch every 5000 ms, since the batch size doesn't exceed 10000000 bytes.
But when the partition count is more than 1, the StreamListener consumes records sooner than every 5000 ms.
Is there any configuration for this case?
Or is this the natural result of independent threads working on each partition?
When the partition count is more than 1, what is different in how it works?
According to your readme...
And there is always a lot of data on the topic.
So that doesn't match your question where you said...
I can receive batch records in every 5000 ms. since batch record size doesn't exceed 10000000 bytes.
When there is more data than that, it will always be pushed to the client.
Consider using a Polled Consumer instead, to receive data at your desired rate.
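The answer's point follows from how the two fetch settings interact. The model below only encodes the documented consumer rule that the broker answers a fetch once fetch.min.bytes are available or fetch.max.wait.ms has elapsed, whichever comes first. It assumes, for simplicity, a single fetch in isolation and data arriving at a constant rate; FetchTiming is a hypothetical name.

```java
// Simplified model of when a single fetch request completes:
// as soon as at least fetch.min.bytes are available, or when
// fetch.max.wait.ms has elapsed, whichever happens first.
// Assumes data arrives at a constant rate (bytes per millisecond).
final class FetchTiming {

    static long fetchCompletesAfterMs(long bytesAlreadyAvailable,
                                      double incomingBytesPerMs,
                                      long fetchMinBytes,
                                      long fetchMaxWaitMs) {
        if (bytesAlreadyAvailable >= fetchMinBytes) {
            return 0;                          // enough data already: respond immediately
        }
        if (incomingBytesPerMs <= 0) {
            return fetchMaxWaitMs;             // nothing arriving: wait for the full timeout
        }
        long missing = fetchMinBytes - bytesAlreadyAvailable;
        long msToFill = (long) Math.ceil(missing / incomingBytesPerMs);
        return Math.min(msToFill, fetchMaxWaitMs);
    }
}
```

With fetch.min.bytes = 10000000 and fetch.max.wait.ms = 10000, a backlog that already exceeds the minimum is returned at once (0 ms), while an idle topic waits the full 10000 ms. When "there is always a lot of data on the topic", the first case applies and no fixed interval can be expected, which is why a polled consumer is the better fit for a steady rate.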

Spring Cloud @StreamListener consumer not registering CONSUMER-ID, HOST and CLIENT-ID in consumer group

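We have a Spring Cloud consumer that reads messages from one Kafka topic. Following is the interface for the channels:
import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.SubscribableChannel;
import org.springframework.stereotype.Component;

@Component
public interface CollectionStreams {
    String INPUT_REPORT = "report-in";
    String OUTPUT_REPORT = "report-out";

    @Input(INPUT_REPORT)
    SubscribableChannel inboundReport();

    @Output(OUTPUT_REPORT)
    MessageChannel outboundReportToJM();
}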
The problem is that when we describe the consumer group “report”, we do not see CONSUMER-ID, HOST, and CLIENT-ID as expected.
[root@innolx131112 templates]# kubectl -n tmo-ccm exec kafka-test-client -- /usr/bin/kafka-consumer-groups --bootstrap-server kafka:9092 --describe -group report
Note: This will not show information about old Zookeeper-based consumers.
Consumer group 'report' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
report 3 2 3 1 - - -
report 1 4 4 0 - - -
report 2 1 1 0 - - -
report 4 2 2 0 - - -
report 0 2 2 0 - - -
By the way, we are running both our application and Kafka in Kubernetes.
Due to this issue, we are not able to run multiple pods of our application, as all pods
And we have defined the following method to read messages from the topic:
@StreamListener(CollectionStreams.INPUT_REPORT)
//public void handleMessage(@Payload MessageT message) {
public void handleMessage(@Payload MessageT message, @Headers MessageHeaders msg) {
    // ...
}
Following is the configuration YAML:
cloud:
  stream:
    kafka:
      binder:
        brokers: kafka
        autoCreateTopics: false
      bindings:
        report-in:
          consumer:
            autoCommitOffset: false
            autoCommit: false
            auto-offset-reset: earliest
            autoCommitOnError: false
            resetOffsets: false
            autoRebalanceEnabled: false
            ackEachRecord: false
    bindings:
      report-in:
        destination: report
        contentType: application/json
        group: report
        consumer:
          concurrency: 5
          partitioned: true
      report-out:
        destination: jobmanager
        contentType: application/json
        group: jobmanager
        producer:
          autoAddPartitions: true
We also have another consumer for which we have not set any Kafka-specific consumer properties, and, surprisingly, those consumers register themselves properly.
Config:
cloud:
  stream:
    kafka:
      binder:
        autoCreateTopics: false
        brokers: kafka
    bindings:
      parse-in:
        destination: parser
        contentType: application/json
        group: parser
        consumer:
          concurrency: 3
          partitioned: true
      parse-out:
        destination: jobmanager
        contentType: application/json
        group: jobmanager
        producer:
          partitionKeyExpression: headers['contentType']
          autoAddPartitions: true
And the describe command outputs the following:
[root@innolx131112 shyama]# kubectl -n tmo-ccm exec kafka-test-client -- /usr/bin/kafka-consumer-groups --bootstrap-server kafka:9092 --describe -group parser
Note: This will not show information about old Zookeeper-based consumers.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
parser 0 14 14 0 consumer-3-cb99e45e-21b3-4efb-ac7b-4f642a9486be /192.168.0.26 consumer-3
parser 1 13 13 0 consumer-3-cb99e45e-21b3-4efb-ac7b-4f642a9486be /192.168.0.26 consumer-3
parser 2 15 15 0 consumer-4-bec1e4af-d771-47fe-83ad-3440f3a6d4bd /192.168.0.26 consumer-4
parser 3 15 15 0 consumer-4-bec1e4af-d771-47fe-83ad-3440f3a6d4bd /192.168.0.26 consumer-4
parser 4 12 12 0 consumer-5-b9ac7e36-58cf-40cb-b37d-a0fa092a0d56 /192.168.0.26 consumer-5
Is it because we are providing Kafka-specific properties for the first consumer (the report consumer) that it is not able to register?
Consumer group 'report' has no active members.
This simply means your app wasn't running when you entered the command.
That info is transient and not retained when the app stops.
The partitions might be assigned to a different instance next time.
EDIT
whenever we instantiate more consumers using the same group, they process the same old messages already processed by the old consumers
Well, looking at your configuration more carefully, that's exactly what you asked for...
autoRebalanceEnabled: false
... means that Kafka will not use group management, and the partitions will be allocated by Spring Cloud Stream.
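To see why extra pods re-read the same data in this mode, the sketch below models static assignment. The binder documentation says that with autoRebalanceEnabled: false the partitions are fixed using spring.cloud.stream.instanceCount and spring.cloud.stream.instanceIndex; the simple modulo split used here is an assumption for illustration, and StaticAssignment is a hypothetical name. With the defaults (instanceCount 1, instanceIndex 0), every pod that starts claims all five partitions of report.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of static partition assignment when group management
// is off. ASSUMPTION: each instance takes the partitions whose number
// modulo instanceCount equals its instanceIndex. The point is that the
// split depends only on these two settings, not on how many pods run.
final class StaticAssignment {

    static List<Integer> partitionsFor(int partitionCount, int instanceCount, int instanceIndex) {
        if (instanceIndex < 0 || instanceIndex >= instanceCount) {
            throw new IllegalArgumentException("instanceIndex must be in [0, instanceCount)");
        }
        List<Integer> assigned = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            if (p % instanceCount == instanceIndex) {
                assigned.add(p);
            }
        }
        return assigned;
    }
}
```

Under this model, two pods both left at the defaults each get [0, 1, 2, 3, 4]; setting instanceCount to 2 and giving the pods distinct indexes would split the partitions into [0, 2, 4] and [1, 3] instead.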
autoCommitOffset: false
means that Spring will not commit any offsets and that is the responsibility of your application. If you don't commit the offset, you will get the behavior you observe.
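The offset side can be modelled the same way. This hypothetical ResumePosition helper encodes standard Kafka consumer behavior only: a consumer resumes from the committed offset when one exists and otherwise falls back to auto.offset.reset. With autoCommitOffset: false, committing is the application's job; in the Kafka binder this is done by calling acknowledge() on the Acknowledgment object carried in the KafkaHeaders.ACKNOWLEDGMENT message header. If that never happens, a restarted or newly added consumer falls back to earliest and re-reads the partition.

```java
// Simplified model of where a consumer resumes reading a partition after a
// restart or after another instance picks it up: from the last committed
// offset if there is one, otherwise from the position chosen by
// auto.offset.reset ("earliest" or "latest"; anything else is an error here).
final class ResumePosition {

    // committedOffset < 0 means "no offset committed for this partition".
    static long resumeFrom(long committedOffset, String autoOffsetReset,
                           long logStartOffset, long logEndOffset) {
        if (committedOffset >= 0) {
            return committedOffset;            // normal case: continue after the last commit
        }
        if ("earliest".equals(autoOffsetReset)) {
            return logStartOffset;             // re-read everything still retained
        }
        if ("latest".equals(autoOffsetReset)) {
            return logEndOffset;               // skip to new messages only
        }
        throw new IllegalStateException(
            "no committed offset and auto.offset.reset=" + autoOffsetReset);
    }
}
```

With auto-offset-reset: earliest and no commit, resumeFrom(-1, "earliest", 0, 4) returns 0, so messages 0 to 3 are delivered again; once offset 4 has been committed, the same call with committedOffset 4 returns 4.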

Producer Partition Count override not effective

Interpreting this - https://docs.spring.io/spring-cloud-stream/docs/current/reference/htmlsingle/#_producer_properties
My understanding is that if the partitionCount override is less than the actual number of partitions on an existing Kafka topic, the producer should use the actual number of partitions rather than the override value. My experience, however, is that the producer uses the partitionCount value regardless of how many partitions (> partitionCount) are actually configured on the Kafka topic.
Ideally, I would like the producer to read the number of partitions on a pre-configured topic from kafka and write messages across all available partitions.
Spring-Cloud version: Finchley.RELEASE
Kafka Broker version : 1.0.0
application.yml:
spring:
  application:
    name: my-app
  cloud:
    stream:
      default:
        contentType: application/json
      kafka:
        binder:
          brokers:
            - ${KAFKA_HOST}:${KAFKA_PORT}
          auto-create-topics: false
      bindings:
        input-channel:
          destination: input-topic
          contentType: application/json
          group: input-group
        output-channel:
          destination: output-topic
          contentType: application/json
          producer:
            partition-count: 2
            partition-key-expression: payload['Id']
So, I am expecting that if the output topic is already configured with 6 partitions, the producer will recognise this and write to all of those. Could someone please verify my interpretation above? Or point out what I am missing to get the desired functionality?
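Under the interpretation in the question, the producer should end up using the larger of partitionCount and the topic's real partition count; the reference documentation describes partitionCount on Kafka as a hint where the larger value wins. The sketch below encodes that rule plus a simplified hash-then-modulo key selection. ProducerPartitioning is a hypothetical class, not the binder's implementation, and is only meant as a reference for what the expected distribution would look like.

```java
// Sketch of the documented "larger value wins" rule for partitionCount,
// plus a simplified key-to-partition step (hash, then modulo).
// Hypothetical helper, not the binder's actual partition selector.
final class ProducerPartitioning {

    static int effectivePartitionCount(int configuredPartitionCount, int actualTopicPartitions) {
        return Math.max(configuredPartitionCount, actualTopicPartitions);
    }

    static int selectPartition(Object key, int partitionCount) {
        // floorMod keeps the result in [0, partitionCount) even for negative hash codes
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}
```

Here effectivePartitionCount(2, 6) returns 6, so keys derived from payload['Id'] would spread over all six partitions of output-topic. If the messages only ever land on partitions 0 and 1, the configured value of 2 is being used instead, which is the behavior the question reports.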

Kafka messages are reprocessed

We have a microservice that produces and consumes messages from Kafka using spring-boot and spring-cloud-stream.
versions:
spring-boot: 1.5.8.RELEASE
spring-cloud-stream: Ditmars.RELEASE
Kafka server: kafka_2.11-1.0.0
EDIT:
We are working in a Kubernetes environment, using a StatefulSet cluster of 3 Kafka nodes and a cluster of 3 Zookeeper nodes.
We experienced several occurrences of old messages being reprocessed even though they had already been processed a few days earlier.
Several notes:
Before this happens, the following logs are printed (there are more similar lines; this is just a summary):
Revoking previously assigned partitions [] for group enrollment-service
Discovered coordinator dev-kafka-1.kube1.iaas.watercorp.com:9092 (id: 2147483646 rack: null)
Successfully joined group enrollment-service with generation 320
The revoking and reassigning of partitions shown above happens every few hours, and old messages are re-consumed in only a few of those incidents. In most cases, the reassignment doesn't trigger message consumption.
The messages are from different partitions.
There are more than 1 message per partition that is being reprocessed.
application.yml:
spring:
  cloud:
    stream:
      kafka:
        binder:
          brokers: kafka
          defaultBrokerPort: 9092
          zkNodes: zookeeper
          defaultZkPort: 2181
          minPartitionCount: 2
          replicationFactor: 1
          autoCreateTopics: true
          autoAddPartitions: true
          headers: type,message_id
          requiredAcks: 1
          configuration:
            "[security.protocol]": PLAINTEXT #TODO: This is a workaround. Should be security.protocol
        bindings:
          user-enrollment-input:
            consumer:
              autoRebalanceEnabled: true
              autoCommitOnError: true
              enableDlq: true
          user-input:
            consumer:
              autoRebalanceEnabled: true
              autoCommitOnError: true
              enableDlq: true
          enrollment-mail-output:
            producer:
              sync: true
              configuration:
                retries: 10000
          enroll-users-output:
            producer:
              sync: true
              configuration:
                retries: 10000
      default:
        binder: kafka
        contentType: application/json
        group: enrollment-service
        consumer:
          maxAttempts: 1
        producer:
          partitionKeyExtractorClass: com.watercorp.messaging.PartitionKeyExtractor
      bindings:
        user-enrollment-input:
          destination: enroll-users
          consumer:
            concurrency: 10
            partitioned: true
        user-input:
          destination: user
          consumer:
            concurrency: 5
            partitioned: true
        enrollment-mail-output:
          destination: send-enrollment-mail
          producer:
            partitionCount: 10
        enroll-users-output:
          destination: enroll-users
          producer:
            partitionCount: 10
Is there any configuration that I might be missing? What can cause this behavior?
The actual problem turned out to be the one described in this ticket: https://issues.apache.org/jira/browse/KAFKA-3806.
Using the suggested workaround fixed it.
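As we understand KAFKA-3806, it concerns committed offsets expiring sooner than the messages they point into: on Kafka versions before 2.0 the broker default for offsets.retention.minutes was 1440 (one day), while log.retention.hours defaults to 168 (seven days). A partition that receives no traffic never gets a fresh commit, so after a rebalance its offset may be gone and the consumer falls back to auto.offset.reset, re-reading messages that are still retained. The sketch below models only that mechanism; OffsetExpiry is a hypothetical name, and it assumes auto.offset.reset is earliest.

```java
// Simplified model of offset expiry: a committed offset older than the
// broker's offset retention is treated as deleted, and the consumer then
// starts from the log start (ASSUMPTION: auto.offset.reset=earliest),
// re-reading everything still retained in the log.
final class OffsetExpiry {

    static long positionAfterRebalance(long committedOffset, long lastCommitMs, long nowMs,
                                       long offsetsRetentionMs, long logStartOffset) {
        boolean expired = nowMs - lastCommitMs > offsetsRetentionMs;
        return expired ? logStartOffset : committedOffset;
    }
}
```

With a one-day offset retention, a partition last committed half a day ago resumes at its committed offset, while one idle for three days restarts from the beginning of the retained log. That matches the symptoms above: only some rebalances cause reprocessing, and the affected messages come from several partitions and are days old.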