My question is how to manage multiple instances with Spring Cloud Stream Kafka.
Let me explain: in a Spring Cloud Stream microservices context (Eureka, Config Server, Kafka), I want to run 2 instances of the same microservice. When I change a configuration in my Git repository, the Config Server (via a webhook) pushes a message to the Kafka topic.
If I use the same group ID in my microservice, only one of the two instances receives the notification and reloads its Spring context.
But I need to refresh all instances.
So, to do that, I have configured a unique group ID: ${spring.application.name}.bus.${hostname}
It works well, but the problem is that each time I start a new instance of my service, it creates a new consumer group in Kafka. Now I have a lot of unused consumer groups.
[![consumers for a microservice][1]][1]
[1]: https://i.stack.imgur.com/6jIzx.png
Here is the Spring Cloud Stream configuration of my service :
spring:
  cloud:
    bus:
      destination: sys.spring-cloud-bus.refresh
      enabled: true
      refresh:
        enabled: true
      env:
        enabled: true
      trace:
        enabled: false
    stream:
      bindings:
        # Override spring cloud bus configuration with a specific binder named "bus"
        springCloudBusInput:
          binder: bus
          destination: sys.spring-cloud-bus.refresh
          content-type: application/json
          group: ${spring.application.name}.bus.${hostname}
        springCloudBusOutput:
          binder: bus
          destination: sys.spring-cloud-bus.refresh
          content-type: application/json
          group: ${spring.application.name}.bus.${hostname}
      binders:
        bus:
          type: kafka
          defaultCandidate: false
          environment:
            spring:
              cloud:
                stream:
                  kafka:
                    binder:
                      brokers: kafka-dev.hcuge.ch:9092
      kafka:
        streams:
          bindings:
            springCloudBusInput:
              consumer:
                startOffset: latest # Reset offset to the latest value to avoid consuming configserver notifications on startup
                resetOffsets: true
How can I avoid creating so many consumer groups? Should I remove the old consumer groups in Kafka?
I think my solution is not the best way to do it, so if you have a better option, I'm interested ;)
Thank you
If you don't provide a group, bus will use a random group anyway.
The broker will eventually remove the unused groups according to its offsets.retention.minutes property (currently 7 days by default).
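To make the group semantics discussed in the question concrete: Kafka delivers each message to one member of every consumer group. The sketch below is a toy model of that rule in plain Java, not Kafka client code; the instance names and group IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of Kafka group delivery: one copy of each message per group. */
final class ConsumerGroupDemo {

    /**
     * Returns the instances that receive a single message.
     * groupByInstance maps an instance name to its group ID (both hypothetical).
     */
    static List<String> receivers(Map<String, String> groupByInstance) {
        // Keep the first instance seen for each group. Real Kafka picks the
        // member that owns the partition, but it is still one member per group.
        Map<String, String> chosenPerGroup = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : groupByInstance.entrySet()) {
            chosenPerGroup.putIfAbsent(e.getValue(), e.getKey());
        }
        return new ArrayList<>(chosenPerGroup.values());
    }

    public static void main(String[] args) {
        Map<String, String> shared = new LinkedHashMap<>();
        shared.put("instance-1", "my-service.bus");
        shared.put("instance-2", "my-service.bus");
        System.out.println("shared group: " + receivers(shared));

        Map<String, String> unique = new LinkedHashMap<>();
        unique.put("instance-1", "my-service.bus.host-a");
        unique.put("instance-2", "my-service.bus.host-b");
        System.out.println("unique groups: " + receivers(unique));
    }
}
```

With a shared group only one instance is returned; with one group per instance both are, which is why a refresh event needs a distinct (or anonymous) group per instance.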
Related
We are using the Spring Cloud Stream Kafka binder, and we are facing a problem with our application, which consumes one topic, processes the messages, and then outputs them to different topics.
These topics are also consumed within the same application, which outputs to a final topic.
We noticed that a huge number of producer threads are created whenever new messages are consumed by the first consumer, and these threads stay alive.
Here is my simplified config:
cloud:
  stream:
    function:
      definition: schedulingConsumer;consumerSearch1;consumerSearch2
    default:
      group: ${kafka.group}
      contentType: application/json
      consumer:
        maxAttempts: 1
        backOffMaxInterval: 30
        retryableExceptions:
          org.springframework.messaging.converter.MessageConversionException: false
    kafka:
      binder:
        brokers: ${kafka.brokers}
        headerMapperBeanName: kafkaHeaderMapper
        producerProperties:
          linger.ms: 500
          batch.size: ${kafka.batchs.size}
          compression.type: gzip
        consumerProperties:
          session.timeout.ms: ${kafka.session.timeout.ms}
          max.poll.interval.ms: ${kafka.poll.interval}
          max.poll.records: ${kafka.poll.records}
          commit.interval.ms: 500
          allow.auto.create.topics: false
    bindings:
      schedulingConsumer-in-0:
        destination: ${kafka.topics.schedules}
        consumer.concurrency: 5
      search1-out:
        destination: ${kafka.topics.groups.search1}
      search2-out:
        destination: ${kafka.topics.groups.search2}
      consumerSearch1-in-0:
        destination: ${kafka.topics.groups.search1}
      consumerSearch2-in-0:
        destination: ${kafka.topics.groups.search2}
      datasource-out:
        destination: ${kafka.topics.search.output}
Here is a screenshot of the thread activity:
We tried separating the first consumer, schedulingConsumer, from the others (consumerSearch1 and consumerSearch2), and the problem seems to be resolved.
The problem occurs when all these consumers run in the same instance.
It seems to be a bug in Spring Cloud Stream. I have reported it to the team: "Kafka producer threads keep increasing when 'spring.cloud.stream.dynamic-destination-cache-size' is exceeded" (#2452).
So, the solution was to override the property spring.cloud.stream.dynamic-destination-cache-size and set it to a value greater than the number of your output bindings.
In my case, I had 14 output bindings.
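One way to see why a cache smaller than the number of output bindings can be costly is the sketch below. It models a bounded LRU cache with a plain LinkedHashMap and counts how often an entry has to be re-created. This is an illustration only, not Spring Cloud Stream's implementation; the cache size of 10 and the destination names are assumptions chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative bounded LRU cache; not the framework's code. */
final class DestinationCacheDemo {

    /** Counts how many times a (hypothetical) producer must be created. */
    static int producerCreations(int cacheSize, int destinations, int rounds) {
        // accessOrder=true makes LinkedHashMap behave as an LRU structure.
        Map<String, Object> cache = new LinkedHashMap<String, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                return size() > cacheSize; // evict once the limit is exceeded
            }
        };
        int creations = 0;
        for (int round = 0; round < rounds; round++) {
            for (int d = 0; d < destinations; d++) {
                String name = "destination-" + d;
                if (cache.get(name) == null) {
                    cache.put(name, new Object()); // stands in for a new producer
                    creations++;
                }
            }
        }
        return creations;
    }

    public static void main(String[] args) {
        System.out.println(producerCreations(10, 14, 3)); // cache too small
        System.out.println(producerCreations(15, 14, 3)); // cache large enough
    }
}
```

Cycling through 14 destinations three times with a cache of 10 misses on every access (42 creations), while a cache of 15 creates each entry once (14 creations). If evicted producers are not closed promptly, that churn would be consistent with the growing thread count described above.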
I want to override the default values with the values listed below in the consumer properties, but I don't see the changes reflected. I was expecting an exception, since the service won't be able to process 500 messages in 10 seconds. I'm not sure if this is the right way to configure it.
spring:
  cloud:
    stream:
      kafka:
        bindings:
          test-topic:
            consumer:
              configuration:
                max.poll.records: 500
                max.poll.interval.ms: 10000
I installed and configured Suricata 5.0.2 according to the documentation at https://suricata.readthedocs.io/.
I then tried to change the configuration in suricata.yaml by adding:
- alert-json-log:
    enabled: yes
    filetype: kafka
    kafka:
      brokers: >
        xxx-kafka-online003:9092,
        xxx-kafka-online004:9092,
        xxx-kafka-online005:9092,
        xxx-kafka-online006:9092,
        xxx-kafka-online007:9092
      topic: nsm_event
      partitions: 5
    http: yes
Next, I ran Suricata and received this error:
Invalid entry for alert-json-log.filetype. Expected "regular" (default), "unix_stream", "pcie" or "unix_dgram"
I don't know how to configure Suricata to send logs to Kafka topics.
Please help.
I don't see Kafka listed as an output type, therefore "no, there is not"
Refer docs: https://suricata.readthedocs.io/en/suricata-5.0.2/output/index.html
Plus, I'm not sure I understand what you expect http: yes to do since Kafka is not an HTTP service
What you could do is set filetype: unix_stream (I assume that is Syslog) and add another service such as Kafka Connect, Fluentd, or Logstash to route that data to Kafka.
In other words, services don't need to integrate with Kafka directly. Plenty of alternatives exist to read files or stdout/stderr/syslog streams.
I need some help integrating Kafka with Spring Cloud Stream. The application is very simple, with 2 parts (run as separate Java processes):
A consumer, which puts a request into RequestTopic and gets the response from ResponseTopic.
A producer, which gets the request from RequestTopic and puts the response back in ResponseTopic.
I have created the RequestSenderChannel and ResponseReceiverChannel interfaces for the consumer, and RequestReceiverChannel and ResponseSenderChannel for the producer application. Both of them share the same YAML file.
As per the documentation, spring.cloud.stream.bindings.<binding-name>.destination should specify the topic to which the message is sent or from which it is received.
But when I run the application, it creates topics named 'RequestSender', 'RequestReceiver', 'ResponseSender' and 'ResponseReceiver' in Kafka.
My assumption was that, since destination in the YAML file specifies only two topics, 'RequestTopic' and 'ResponseTopic', it should have created those topics.
Instead, it creates Kafka topics for the attributes specified under 'spring.cloud.stream.bindings' in the YAML file.
Can someone please point out the issue in the configuration/code?
import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.SubscribableChannel;

// Each public interface lives in its own source file.
public interface RequestReceiverChannel
{
    String requestReceiver = "RequestReceiver";

    @Input(requestReceiver)
    SubscribableChannel pathQueryRequest();
}

public interface RequestSenderChannel
{
    String RequestSender = "RequestSender";

    @Output(RequestSender)
    MessageChannel pathQueryRequestSender();
}

public interface ResponseReceiverChannel
{
    String ResponseReceiver = "ResponseReceiver";

    @Input(ResponseReceiver)
    SubscribableChannel pceResponseServiceReceiver();
}

public interface ResponseSenderChannel
{
    String ResponseSender = "ResponseSender";

    @Output(ResponseSender)
    MessageChannel pceResponseService();
}
The YAML configuration file
spring:
  cloud:
    stream:
      defaultBinder: kafka
      bindings:
        RequestSender:
          binder: kafka
          destination: RequestTopic
          content-type: application/protobuf
          group: consumergroup
        ResponseSender:
          binder: kafka
          destination: ResponseTopic
          content-type: application/protobuf
          group: consumergroup
        RequestReceiver:
          binder: kafka
          destination: RequestTopic
          content-type: application/protobuf
          group: consumergroup
        ResponseReceiver:
          binder: kafka
          destination: ResponseTopic
          content-type: application/protobuf
          group: consumergroup
      kafka:
        bindings:
          RequestTopic:
            consumer:
              autoCommitOffset: false
          ResponseTopic:
            consumer:
              autoCommitOffset: false
        binder:
          brokers: ${SERVICE_KAFKA_HOST:localhost}
          zkNodes: ${SERVICE_ZOOKEEPER_HOST:127.0.0.1}
          defaultZkPort: ${SERVICE_ZOOKEEPER_PORT:2181}
          defaultBrokerPort: ${SERVICE_KAFKA_PORT:9092}
By setting spring.cloud.stream.bindings.<binding-name>.destination=foo, you are expressing the desire to map the binding specified by <binding-name> (e.g., RequestSender) to a broker destination named foo. If such a destination does not exist, it will be auto-provisioned.
So there are no issues.
That said, we've just released Horsham.RELEASE (part of Spring Cloud Hoxton.RELEASE), and we are moving away from the annotation-based model you are currently using in favor of a significantly simpler functional model. You can read more about it in our release blog, which also provides links to 4 posts where we elaborate and provide more examples of the functional programming paradigm.
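The binding-to-destination mapping described in this answer can be sketched in plain Java. This is not the framework's code; it only models the lookup, and it assumes that a binding with no destination property falls back to its own name, which would match the topic names the question reports if the destination entries were not picked up.

```java
import java.util.Properties;

/** Illustrates binding-name to destination mapping; not the framework's code. */
final class BindingDestinationDemo {

    static String resolveDestination(Properties props, String bindingName) {
        String key = "spring.cloud.stream.bindings." + bindingName + ".destination";
        // Assumption: when no destination is configured, the binding name is used.
        return props.getProperty(key, bindingName);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("spring.cloud.stream.bindings.RequestSender.destination", "RequestTopic");

        // Destination configured: the binding maps to the topic name.
        System.out.println(resolveDestination(props, "RequestSender"));
        // No destination configured: the binding name itself is used.
        System.out.println(resolveDestination(props, "ResponseSender"));
    }
}
```

If topics named after the bindings appear, it is worth checking that the YAML nesting places each destination under spring.cloud.stream.bindings.<binding-name> and that the file is actually loaded by both processes.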
Interpreting https://docs.spring.io/spring-cloud-stream/docs/current/reference/htmlsingle/#_producer_properties,
my understanding is that if the partitionCount override is less than the actual number of partitions on an existing Kafka topic, the producer should use the actual number of partitions rather than the override value. My experience is that the producer uses the partitionCount value regardless of how many partitions (> partitionCount) are actually configured on the Kafka topic.
Ideally, I would like the producer to read the number of partitions of a pre-configured topic from Kafka and write messages across all available partitions.
Spring-Cloud version: Finchley.RELEASE
Kafka Broker version : 1.0.0
application.yml:
spring:
  application:
    name: my-app
  cloud:
    stream:
      default:
        contentType: application/json
      kafka:
        binder:
          brokers:
            - ${KAFKA_HOST}:${KAFKA_PORT}
          auto-create-topics: false
      bindings:
        input-channel:
          destination: input-topic
          contentType: application/json
          group: input-group
        output-channel:
          destination: output-topic
          contentType: application/json
          producer:
            partition-count: 2
            partition-key-expression: payload['Id']
So, I am expecting that if the output topic is already configured with 6 partitions, the producer will recognise this and write to all of those. Could someone please verify my interpretation above? Or point out what I am missing to get the desired functionality?
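The interpretation described above can be written down as a small sketch: take the larger of the configured partitionCount and the topic's actual partition count, then map each key onto that range. This is only an illustration of that reading of the documentation, not the binder's code; the method names and sample keys are hypothetical, and the hash-modulo selection is an assumed strategy.

```java
/** Sketch of the "larger of the two" reading of partitionCount; not binder code. */
final class PartitionCountDemo {

    /** Effective partition count under that reading. */
    static int effectivePartitionCount(int configuredPartitionCount, int actualTopicPartitions) {
        return Math.max(configuredPartitionCount, actualTopicPartitions);
    }

    /** Maps a key to a partition index in [0, partitionCount). */
    static int selectPartition(Object key, int partitionCount) {
        // floorMod avoids negative results for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        // partition-count: 2 in the config, 6 partitions on the topic.
        int count = effectivePartitionCount(2, 6);
        System.out.println("effective partitions: " + count);
        for (String id : new String[] {"order-1", "order-2", "order-3"}) {
            System.out.println(id + " -> partition " + selectPartition(id, count));
        }
    }
}
```

Under this reading, a topic with 6 partitions and partition-count: 2 would spread keys over all 6 partitions. If messages only ever land on partitions 0 and 1, the running version is not behaving this way.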