Kafka producer threads keep increasing - apache-kafka

We are using the Spring Cloud Stream Kafka binder, and we are facing a problem with an application that consumes one topic, processes the messages, and then outputs them to different topics.
These topics are also consumed within the same application, and the results are output to a final topic.
We noticed that a huge number of producer threads are created whenever new messages are consumed by the first consumer, and these threads stay alive.
Here is my simplified config:
    cloud:
      stream:
        function:
          definition: schedulingConsumer;consumerSearch1;consumerSearch2
        default:
          group: ${kafka.group}
          contentType: application/json
          consumer:
            maxAttempts: 1
            backOffMaxInterval: 30
            retryableExceptions:
              org.springframework.messaging.converter.MessageConversionException: false
        kafka:
          binder:
            brokers: ${kafka.brokers}
            headerMapperBeanName: kafkaHeaderMapper
            producerProperties:
              linger.ms: 500
              batch.size: ${kafka.batchs.size}
              compression.type: gzip
            consumerProperties:
              session.timeout.ms: ${kafka.session.timeout.ms}
              max.poll.interval.ms: ${kafka.poll.interval}
              max.poll.records: ${kafka.poll.records}
              commit.interval.ms: 500
              allow.auto.create.topics: false
        bindings:
          schedulingConsumer-in-0:
            destination: ${kafka.topics.schedules}
            consumer.concurrency: 5
          search1-out:
            destination: ${kafka.topics.groups.search1}
          search2-out:
            destination: ${kafka.topics.groups.search2}
          consumerSearch1-in-0:
            destination: ${kafka.topics.groups.search1}
          consumerSearch2-in-0:
            destination: ${kafka.topics.groups.search2}
          datasource-out:
            destination: ${kafka.topics.search.output}
Here is a screenshot of the thread activity:
We tried separating the first consumer, schedulingConsumer, from the others (consumerSearch1 and consumerSearch2), and the problem seems to be resolved.
The problem occurs when all of these consumers run in the same instance.

It seems to be a bug in Spring Cloud Stream. I have reported it to the team: "Kafka producer threads keep increasing when 'spring.cloud.stream.dynamic-destination-cache-size' is exceeded" (#2452).
The solution was to override the property spring.cloud.stream.dynamic-destination-cache-size and set it to a value greater than the number of output bindings.
In my case I had 14 output bindings.
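
As a minimal sketch of that workaround, assuming 14 output bindings as in my case, the property could be set in application.yml like this. The value 20 is only an illustrative choice; any value above the number of output bindings should have the same effect.

```yaml
spring:
  cloud:
    stream:
      # Illustrative value (an assumption): pick anything greater than the
      # number of output bindings, which was 14 in this application.
      dynamic-destination-cache-size: 20
```

If more output bindings are added later, this value would need to be raised along with them.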

Related

Create topic with partitions from Spring-Cloud Binder

I have a Kafka configuration in my YAML file, and for one input I am adding multiple topics with different names. I want three of them to have 5 partitions and one of them to have 1 partition. How can I set this separately for each topic in my configuration file? The Kafka version is old and cannot create partitions automatically, so I need to create them manually.
    spring:
      cloud:
        stream:
          default:
            group: xxxx
            consumer:
              partitioned: true
              concurrency: 5
          kafka:
            binder:
              configuration:
                max.poll.interval.ms: 100000
                max.poll.records: 100
              brokers: xx.xx.xx.xx
              defaultBrokerPort: 8080
              replicationFactor: 1
          function:
            definition: methodName
          bindings:
            methodName-in-0:
              destination: topic1, topic2, topic3, topic4
I solved this issue by decreasing the default partition count from 5 to 1. Apparently, because of the Kafka version, it can increase the partition count but cannot decrease it.

Batch Consumer with a given period doesn't work with multiple partition in Spring Cloud Stream(StreamListener)?

    @StreamListener(value = PersonStream.INPUT)
    private void personBulkReceiver(List<Person> person) {
        //....
    }
    spring:
      cloud:
        stream:
          kafka:
            binders:
              bulkKafka:
                type: kafka
                environment:
                  spring:
                    cloud:
                      stream:
                        kafka:
                          binder:
                            brokers: localhost:9092
                            configuration:
                              max.poll.records: 1500
                              fetch.min.bytes: 10000000
                              fetch.max.wait.ms: 10000
                              value.deserializer: tr.cloud.stream.examples.PersonDeserializer
          bindings:
            person-topic-in:
              binder: bulkKafka
              destination: person-topic
              contentType: application/person
              group: person-group
              consumer:
                batch-mode: true
I am using Spring Cloud Stream with Kafka. In a StreamListener, when the partition count is 1, I can consume records in batch mode every 5000 ms.
My .yml configuration is fetch.min.bytes = 10000000, fetch.max.wait.ms = 50000, and max.poll.records = 1500, as stated above.
I receive batches every 5000 ms because the batch size doesn't exceed 10000000 bytes.
But when the partition count is more than 1, the StreamListener consumes records earlier than 5000 ms.
Is there any configuration for this case?
Or is this the natural result of independent threads working on each partition?
What is different in the working logic when the partition count is more than 1?
According to your readme:
"And there is always a lot of data on the topic."
That doesn't match your question, where you said:
"I can receive batch records in every 5000 ms. since batch record size doesn't exceed 10000000 bytes."
When there is more data than that, it will always be pushed to the client.
Consider using a Polled Consumer instead, to receive data at your desired rate.

Multiple instance with Spring Cloud Bus Kafka

My question is how to manage multiple instances with Spring Cloud Stream Kafka.
Let me explain: in a Spring Cloud Stream microservices context (Eureka, Config Server, Kafka), I want to run 2 instances of the same microservice. When I change a configuration in my Git repository, the Config Server (via a webhook) pushes a message to the Kafka topic.
If I use the same group ID in my microservice, only one of the two instances receives the notification and reloads its Spring context.
But I need to refresh all instances.
To do that, I configured a unique group ID: ${spring.application.name}.bus.${hostname}
It works well, but each time I start a new instance of my service, it creates a new consumer group in Kafka. Now I have a lot of unused consumer groups.
[![consumers for a microservice][1]][1]
[1]: https://i.stack.imgur.com/6jIzx.png
Here is the Spring Cloud Stream configuration of my service :
    spring:
      cloud:
        bus:
          destination: sys.spring-cloud-bus.refresh
          enabled: true
          refresh:
            enabled: true
          env:
            enabled: true
          trace:
            enabled: false
        stream:
          bindings:
            # Override spring cloud bus configuration with a specific binder named "bus"
            springCloudBusInput:
              binder: bus
              destination: sys.spring-cloud-bus.refresh
              content-type: application/json
              group: ${spring.application.name}.bus.${hostname}
            springCloudBusOutput:
              binder: bus
              destination: sys.spring-cloud-bus.refresh
              content-type: application/json
              group: ${spring.application.name}.bus.${hostname}
          binders:
            bus:
              type: kafka
              defaultCandidate: false
              environment:
                spring:
                  cloud:
                    stream:
                      kafka:
                        binder:
                          brokers: kafka-dev.hcuge.ch:9092
          kafka:
            streams:
              bindings:
                springCloudBusInput:
                  consumer:
                    startOffset: latest # Reset offset to the latest value to avoid consume configserver notifications on startup
                    resetOffsets: true
How can I avoid creating so many consumer groups? Should I remove the old consumer groups in Kafka?
I don't think my solution is the best way to do it, so if you have a better option, I'm interested ;)
Thank you
If you don't provide a group, bus will use a random group anyway.
The broker will eventually remove the unused groups according to its offsets.retention.minutes property (currently 7 days by default).
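
A sketch of the first point, assuming the binding names from the question: the `group` entries are simply dropped, so each instance falls back to its own random (anonymous) group and every instance still receives the refresh event. Only the bindings section is shown; the rest of the configuration is assumed to stay as it is.

```yaml
spring:
  cloud:
    stream:
      bindings:
        springCloudBusInput:
          binder: bus
          destination: sys.spring-cloud-bus.refresh
          content-type: application/json
          # No explicit group here: each instance gets a random group.
        springCloudBusOutput:
          binder: bus
          destination: sys.spring-cloud-bus.refresh
          content-type: application/json
```

This does not stop new groups from appearing on every start; it only removes the custom naming and leaves the cleanup of stale groups to the broker.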

Spring Cloud Stream Binding Consumer properties are not working

I want to override the default consumer properties with the values listed below, but the changes don't seem to take effect. I expected an exception, since the service cannot process 500 messages in 10 seconds. I'm not sure whether this is the right way to configure it.
    spring:
      cloud:
        stream:
          kafka:
            bindings:
              test-topic:
                consumer:
                  configuration:
                    max.poll.records: 500
                    max.poll.interval.ms: 10000

Producer Partition Count override not effective

Interpreting this - https://docs.spring.io/spring-cloud-stream/docs/current/reference/htmlsingle/#_producer_properties
my understanding is that, if the partitionCount override is less than the actual number of partitions on an existing kafka topic, then the producer should use the actual number of partitions rather than the override value. My experience is that the producer uses the partitionCount value regardless of how many partitions ( > partitionCount ) are actually configured on the kafka topic.
Ideally, I would like the producer to read the number of partitions on a pre-configured topic from kafka and write messages across all available partitions.
Spring-Cloud version: Finchley.RELEASE
Kafka Broker version : 1.0.0
application.yml:
    spring:
      application:
        name: my-app
      cloud:
        stream:
          default:
            contentType: application/json
          kafka:
            binder:
              brokers:
                - ${KAFKA_HOST}:${KAFKA_PORT}
              auto-create-topics: false
          bindings:
            input-channel:
              destination: input-topic
              contentType: application/json
              group: input-group
            output-channel:
              destination: output-topic
              contentType: application/json
              producer:
                partition-count: 2
                partition-key-expression: payload['Id']
So, I am expecting that if the output topic is already configured with 6 partitions, the producer will recognise this and write to all of those. Could someone please verify my interpretation above? Or point out what I am missing to get the desired functionality?