Producer Partition Count override not effective - apache-kafka

Based on my reading of https://docs.spring.io/spring-cloud-stream/docs/current/reference/htmlsingle/#_producer_properties
my understanding is that if the partitionCount override is less than the actual number of partitions on an existing Kafka topic, the producer should use the actual number of partitions rather than the override value. In practice, however, the producer uses the partitionCount value regardless of how many partitions (> partitionCount) are actually configured on the Kafka topic.
Ideally, I would like the producer to read the number of partitions of a pre-configured topic from Kafka and write messages across all available partitions.
Spring Cloud version: Finchley.RELEASE
Kafka broker version: 1.0.0
application.yml:
spring:
  application:
    name: my-app
  cloud:
    stream:
      default:
        contentType: application/json
      kafka:
        binder:
          brokers:
            - ${KAFKA_HOST}:${KAFKA_PORT}
          auto-create-topics: false
      bindings:
        input-channel:
          destination: input-topic
          contentType: application/json
          group: input-group
        output-channel:
          destination: output-topic
          contentType: application/json
          producer:
            partition-count: 2
            partition-key-expression: payload['Id']
So I expect that, if the output topic is already configured with 6 partitions, the producer will recognise this and write to all of them. Could someone please verify my interpretation above, or point out what I am missing to get the desired behaviour?
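To see why a configured count of 2 would confine writes to two partitions, here is a minimal Java sketch of key-based, hash-modulo partition selection. It assumes the producer picks a partition as hash(key) modulo the partition count it was configured with; the class and method names are hypothetical and this is not the binder's actual code. Under that assumption, every key maps to partition 0 or 1 when the count is 2, even if the topic really has 6 partitions.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of key-based partition selection: hash the key, then take
 * it modulo the partition count the producer was configured with.
 * Illustration only, not Spring Cloud Stream's PartitionHandler.
 */
final class PartitionSelectionSketch {

    private PartitionSelectionSketch() {}

    /** Maps a key to a partition in [0, partitionCount). */
    static int selectPartition(Object key, int partitionCount) {
        if (partitionCount < 1) {
            throw new IllegalArgumentException("partitionCount must be >= 1");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Counts how many of the given keys land on each partition. */
    static Map<Integer, Integer> distribution(Iterable<?> keys, int partitionCount) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Object key : keys) {
            counts.merge(selectPartition(key, partitionCount), 1, Integer::sum);
        }
        return counts;
    }
}
```

Calling `distribution(ids, 2)` only ever reports partitions 0 and 1, while `distribution(ids, 6)` spreads the same keys over all six; the only thing that changed is the count fed into the modulo.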

Related

Create topic with partitions from Spring-Cloud Binder

I have a Kafka configuration in my YAML file, and for one input I'm adding multiple topics with different names. I want three of them to have 5 partitions and one of them to have 1 partition. How can I configure this separately for each topic in my configuration file? Our Kafka version is old and can't create partitions automatically, so I need to create them manually.
spring:
  cloud:
    stream:
      default:
        group: xxxx
        consumer:
          partitioned: true
          concurrency: 5
      kafka:
        binder:
          configuration:
            max.poll.interval.ms: 100000
            max.poll.records: 100
          brokers: xx.xx.xx.xx
          defaultBrokerPort: 8080
          replicationFactor: 1
      function:
        definition: methodName
      bindings:
        methodName-in-0:
          destination: topic1, topic2, topic3, topic4
I solved this issue by decreasing the default partition count from 5 to 1. Kafka can increase a topic's partition count, but it cannot decrease it.
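The one-way nature of partition changes can be sketched as follows. This is a hypothetical helper, not part of Spring Cloud Stream or the Kafka admin API; it only encodes the rule that a topic's partition count may be raised but never lowered. Under that rule a low default can never leave a topic with more partitions than intended, because a shortfall can be fixed later by adding partitions, whereas an excess cannot be undone.

```java
/**
 * Hypothetical model of reconciling a requested partition count with an
 * existing topic. Partitions can be added to a Kafka topic but never removed,
 * so a request below the existing count simply keeps the existing count.
 */
final class PartitionReconcileSketch {

    private PartitionReconcileSketch() {}

    /** Partitions that would have to be added to reach the requested count (never negative). */
    static int partitionsToAdd(int existing, int requested) {
        if (existing < 1 || requested < 1) {
            throw new IllegalArgumentException("partition counts must be >= 1");
        }
        return Math.max(0, requested - existing);
    }

    /** The count the topic ends up with: it can grow but never shrink. */
    static int resultingCount(int existing, int requested) {
        return existing + partitionsToAdd(existing, requested);
    }
}
```

For example, a topic created with 1 partition can be grown to 5 (`partitionsToAdd(1, 5)` is 4), but a topic that already has 5 stays at 5 when only 1 is requested.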

Kafka producer threads keep increasing

We are using the Spring Cloud Stream Kafka binder and are facing a problem with our application, which consumes one topic, processes the messages, and then outputs them to different topics.
These topics are also consumed within the same application and output to a final topic.
We noticed that a huge number of producer threads are created whenever the first consumer consumes new messages, and these threads stay alive.
Here is my simplified config:
cloud:
  stream:
    function:
      definition: schedulingConsumer;consumerSearch1;consumerSearch2
    default:
      group: ${kafka.group}
      contentType: application/json
      consumer:
        maxAttempts: 1
        backOffMaxInterval: 30
        retryableExceptions:
          org.springframework.messaging.converter.MessageConversionException: false
    kafka:
      binder:
        brokers: ${kafka.brokers}
        headerMapperBeanName: kafkaHeaderMapper
        producerProperties:
          linger.ms: 500
          batch.size: ${kafka.batchs.size}
          compression.type: gzip
        consumerProperties:
          session.timeout.ms: ${kafka.session.timeout.ms}
          max.poll.interval.ms: ${kafka.poll.interval}
          max.poll.records: ${kafka.poll.records}
          commit.interval.ms: 500
          allow.auto.create.topics: false
    bindings:
      schedulingConsumer-in-0:
        destination: ${kafka.topics.schedules}
        consumer.concurrency: 5
      search1-out:
        destination: ${kafka.topics.groups.search1}
      search2-out:
        destination: ${kafka.topics.groups.search2}
      consumerSearch1-in-0:
        destination: ${kafka.topics.groups.search1}
      consumerSearch2-in-0:
        destination: ${kafka.topics.groups.search2}
      datasource-out:
        destination: ${kafka.topics.search.output}
Here is a screenshot of the thread activity:
We tried separating the first consumer, schedulingConsumer, from the others (consumerSearch1 and consumerSearch2), and the problem seemed to be resolved.
The problem occurs when all these consumers run in the same instance.
This appears to be a bug in Spring Cloud Stream. I have reported it to the team: "Kafka producer threads keep increasing when 'spring.cloud.stream.dynamic-destination-cache-size' is exceeded" #2452
So the solution was to override the property spring.cloud.stream.dynamic-destination-cache-size and set it to a value greater than the number of your output bindings.
In my case I had 14 output bindings.
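The effect described in the answer can be modelled with a size-bounded, least-recently-used cache. The sketch below is a hypothetical toy (class and method names are invented for illustration, and it is not Spring's implementation); it assumes one producer is cached per output destination and evicted once the cache is full. With 14 destinations used round-robin and room for only 10, every lookup after the first pass misses and creates a new producer, whereas a capacity above 14 creates each producer exactly once.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model of a size-bounded LRU cache of per-destination producers.
 * Evicted entries are re-created on the next lookup, which is the kind of
 * churn the issue describes. Hypothetical names, illustration only.
 */
final class ProducerCacheSketch {

    private final Map<String, Object> cache;
    private int created;

    ProducerCacheSketch(int maxSize) {
        // accessOrder = true: iteration order is least-recently-used first.
        this.cache = new LinkedHashMap<String, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                // Evict the least-recently-used producer once the bound is exceeded.
                return size() > maxSize;
            }
        };
    }

    /** Returns the cached producer for a destination, creating one on a miss. */
    Object producerFor(String destination) {
        return cache.computeIfAbsent(destination, d -> {
            created++;
            return new Object(); // stands in for a real producer and its threads
        });
    }

    /** How many producers have been created so far. */
    int createdCount() {
        return created;
    }
}
```

Sizing the cache above the number of output bindings, as the answer suggests, turns the second and later passes into pure cache hits.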

Multiple instances with Spring Cloud Bus Kafka

My question is how to manage multiple instances with Spring Cloud Stream Kafka.
Let me explain: in a Spring Cloud Stream microservices context (Eureka, Config Server, Kafka), I want to run 2 instances of the same microservice. When I change a configuration in my Git repository, the config server (via a webhook) pushes a message to the Kafka topic.
If I use the same group id in my microservice, only one of the two instances receives the notification and reloads its Spring context.
But I need to refresh all instances...
So, to do that, I have configured a unique group id: ${spring.application.name}.bus.${hostname}
It works well, but the problem is that each time I start a new instance of my service, it creates a new consumer group in Kafka. Now I have a lot of unused consumer groups.
[Image: consumers for a microservice]
Here is the Spring Cloud Stream configuration of my service:
spring:
  cloud:
    bus:
      destination: sys.spring-cloud-bus.refresh
      enabled: true
      refresh:
        enabled: true
      env:
        enabled: true
      trace:
        enabled: false
    stream:
      bindings:
        # Override spring cloud bus configuration with a specific binder named "bus"
        springCloudBusInput:
          binder: bus
          destination: sys.spring-cloud-bus.refresh
          content-type: application/json
          group: ${spring.application.name}.bus.${hostname}
        springCloudBusOutput:
          binder: bus
          destination: sys.spring-cloud-bus.refresh
          content-type: application/json
          group: ${spring.application.name}.bus.${hostname}
      binders:
        bus:
          type: kafka
          defaultCandidate: false
          environment:
            spring:
              cloud:
                stream:
                  kafka:
                    binder:
                      brokers: kafka-dev.hcuge.ch:9092
      kafka:
        streams:
          bindings:
            springCloudBusInput:
              consumer:
                startOffset: latest # Reset offset to the latest value to avoid consume configserver notifications on startup
                resetOffsets: true
How can I avoid creating so many consumer groups? Should I remove the old consumer groups in Kafka?
I don't think my solution is the best way to do it, so if you have a better option, I'm interested ;)
Thank you
If you don't provide a group, the bus uses a random group anyway.
The broker will eventually remove the unused groups according to its offsets.retention.minutes property (currently 7 days by default).

Spring cloud stream with kafka

I need some help integrating Kafka with Spring Cloud Stream. The application is very simple, with 2 parts (run as separate Java processes):
A consumer: puts requests into RequestTopic and gets responses from ResponseTopic.
A producer: gets requests from RequestTopic and puts the responses back into ResponseTopic.
I have created the RequestSenderChannel and ResponseReceiverChannel interfaces for the consumer application, and RequestReceiverChannel and ResponseSenderChannel for the producer application. Both of them share the same YAML file.
According to the documentation, spring.cloud.stream.bindings.<binding-name>.destination should specify the topic to which messages are sent or from which they are received.
But when I run the application, it creates topics named 'RequestSender', 'RequestReceiver', 'ResponseSender' and 'ResponseReceiver' in Kafka.
My assumption was that, since destination in the YAML file specifies only two topics, 'RequestTopic' and 'ResponseTopic', it should have created only those topics.
Instead, it creates Kafka topics named after the binding names listed under 'spring.cloud.stream.bindings' in the YAML file.
Can someone please point out the issue in the configuration/code?
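// Each public interface lives in its own source file; these imports are shared by all four.
import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.SubscribableChannel;

// Producer application: receives requests.
public interface RequestReceiverChannel
{
    String requestReceiver = "RequestReceiver";
    @Input(requestReceiver)
    SubscribableChannel pathQueryRequest();
}
// Consumer application: sends requests.
public interface RequestSenderChannel
{
    String RequestSender = "RequestSender";
    @Output(RequestSender)
    MessageChannel pathQueryRequestSender();
}
// Consumer application: receives responses.
public interface ResponseReceiverChannel
{
    String ResponseReceiver = "ResponseReceiver";
    @Input(ResponseReceiver)
    SubscribableChannel pceResponseServiceReceiver();
}
// Producer application: sends responses.
public interface ResponseSenderChannel
{
    String ResponseSender = "ResponseSender";
    @Output(ResponseSender)
    MessageChannel pceResponseService();
}
The YAML configuration file: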
spring:
  cloud:
    stream:
      defaultBinder: kafka
      bindings:
        RequestSender:
          binder: kafka
          destination: RequestTopic
          content-type: application/protobuf
          group: consumergroup
        ResponseSender:
          binder: kafka
          destination: ResponseTopic
          content-type: application/protobuf
          group: consumergroup
        RequestReceiver:
          binder: kafka
          destination: RequestTopic
          content-type: application/protobuf
          group: consumergroup
        ResponseReceiver:
          binder: kafka
          destination: ResponseTopic
          content-type: application/protobuf
          group: consumergroup
      kafka:
        bindings:
          RequestTopic:
            consumer:
              autoCommitOffset: false
          ResponseTopic:
            consumer:
              autoCommitOffset: false
        binder:
          brokers: ${SERVICE_KAFKA_HOST:localhost}
          zkNodes: ${SERVICE_ZOOKEEPER_HOST:127.0.0.1}
          defaultZkPort: ${SERVICE_ZOOKEEPER_PORT:2181}
          defaultBrokerPort: ${SERVICE_KAFKA_PORT:9092}
By setting spring.cloud.stream.bindings.<binding-name>.destination=foo, you are expressing the desire to map the binding specified by <binding-name> (e.g., RequestSender) to a broker destination named foo. If such a destination does not exist, it will be auto-provisioned.
So there are no issues.
That said, we've just released Horsham.RELEASE (part of Spring Cloud Hoxton.RELEASE), and we are moving away from the annotation-based model you are currently using in favor of a significantly simpler functional model. You can read more about it in our release blog, which also links to 4 posts where we elaborate on the functional programming paradigm and provide more examples.
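A related detail from the Spring Cloud Stream reference documentation: when a binding has no destination configured, the binding name itself is used as the destination. The hypothetical helper below models only that resolution rule (it is not the framework's code). If a binding's destination setting is not picked up, for example because of a name mismatch or an indentation error in the YAML, this rule would produce topics named RequestSender, ResponseReceiver and so on, instead of RequestTopic and ResponseTopic.

```java
import java.util.Map;

/**
 * Hypothetical model of binding-name -> destination resolution:
 * an explicitly configured destination wins, otherwise the binding
 * name itself is used as the destination.
 */
final class DestinationResolverSketch {

    private DestinationResolverSketch() {}

    /**
     * @param bindingName  a binding name such as "RequestSender"
     * @param destinations configured binding-name -> destination entries
     * @return the destination the binding would be mapped to
     */
    static String resolve(String bindingName, Map<String, String> destinations) {
        String configured = destinations.get(bindingName);
        // Fall back to the binding name when no usable destination is configured.
        return (configured == null || configured.trim().isEmpty()) ? bindingName : configured;
    }
}
```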

Kafka messages are reprocessed

We have a microservice that produces and consumes messages from Kafka using Spring Boot and Spring Cloud Stream.
versions:
spring-boot: 1.5.8.RELEASE
spring-cloud-stream: Ditmars.RELEASE
Kafka server: kafka_2.11-1.0.0
EDIT:
We are working in a Kubernetes environment using a StatefulSet cluster of 3 Kafka nodes and a cluster of 3 Zookeeper nodes.
We have experienced several occurrences of old messages being reprocessed, even though those messages had already been processed a few days earlier.
Several notes:
Before this happens, the following logs are printed (there are more similar lines; this is just a summary):
Revoking previously assigned partitions [] for group enrollment-service
Discovered coordinator dev-kafka-1.kube1.iaas.watercorp.com:9092 (id: 2147483646 rack: null)
Successfully joined group enrollment-service with generation 320
The revoking and reassigning of partitions shown above happens every few hours, and old messages are re-consumed in only a few of those incidents. In most cases, the reassignment doesn't trigger message consumption.
The messages are from different partitions.
There are more than 1 message per partition that is being reprocessed.
application.yml:
spring:
  cloud:
    stream:
      kafka:
        binder:
          brokers: kafka
          defaultBrokerPort: 9092
          zkNodes: zookeeper
          defaultZkPort: 2181
          minPartitionCount: 2
          replicationFactor: 1
          autoCreateTopics: true
          autoAddPartitions: true
          headers: type,message_id
          requiredAcks: 1
          configuration:
            "[security.protocol]": PLAINTEXT #TODO: This is a workaround. Should be security.protocol
        bindings:
          user-enrollment-input:
            consumer:
              autoRebalanceEnabled: true
              autoCommitOnError: true
              enableDlq: true
          user-input:
            consumer:
              autoRebalanceEnabled: true
              autoCommitOnError: true
              enableDlq: true
          enrollment-mail-output:
            producer:
              sync: true
              configuration:
                retries: 10000
          enroll-users-output:
            producer:
              sync: true
              configuration:
                retries: 10000
      default:
        binder: kafka
        contentType: application/json
        group: enrollment-service
        consumer:
          maxAttempts: 1
        producer:
          partitionKeyExtractorClass: com.watercorp.messaging.PartitionKeyExtractor
      bindings:
        user-enrollment-input:
          destination: enroll-users
          consumer:
            concurrency: 10
            partitioned: true
        user-input:
          destination: user
          consumer:
            concurrency: 5
            partitioned: true
        enrollment-mail-output:
          destination: send-enrollment-mail
          producer:
            partitionCount: 10
        enroll-users-output:
          destination: enroll-users
          producer:
            partitionCount: 10
Is there any configuration that I might be missing? What can cause this behavior?
So the actual problem is the one that is described in the following ticket: https://issues.apache.org/jira/browse/KAFKA-3806.
Using the suggested workaround fixed it.
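As a hedged illustration of the condition that ticket is about: on Kafka 1.0.x the broker defaults were offsets.retention.minutes=1440 (one day) and log.retention.hours=168 (seven days), so a group's committed offsets could expire while the messages they point at were still in the log. A consumer that then finds no committed offset falls back to its auto.offset.reset policy, and if that policy is earliest, messages still in the log are read again. The sketch below assumes the workaround is to keep offset retention at least as long as log retention; the helper is hypothetical and not part of any Kafka API.

```java
/**
 * Hypothetical check for the situation discussed in KAFKA-3806: committed
 * offsets that can expire before the log data they refer to.
 */
final class OffsetRetentionSketch {

    private OffsetRetentionSketch() {}

    /**
     * @param offsetsRetentionMinutes broker setting offsets.retention.minutes
     * @param logRetentionHours       broker setting log.retention.hours
     * @return true when committed offsets may expire while messages are still retained
     */
    static boolean offsetsMayExpireBeforeLog(long offsetsRetentionMinutes, long logRetentionHours) {
        long logRetentionMinutes = logRetentionHours * 60; // convert hours to minutes
        return offsetsRetentionMinutes < logRetentionMinutes;
    }
}
```

With the old defaults, `offsetsMayExpireBeforeLog(1440, 168)` is true; raising offsets.retention.minutes to 10080 (168 × 60) makes it false.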