I have an application already implemented with Spring Cloud Stream (SCS), consisting of 3 components: a source (@EnableBinding(Source.class)), a processor (@EnableBinding(Processor.class)) and a sink (@EnableBinding(Sink.class)), which communicate through Apache Kafka binders.
As part of the configuration for these components, I'm using several properties from Spring Cloud Stream, such as the topics to use, the number of partitions, the serializers, the max poll, etc.:
spring:
  application:
    name: myapp
  cloud:
    stream:
      bindings:
        output:
          destination: topic1
          producer:
            partitionCount: 5
            partitionKeyExpression: headers.kafka_messageKey
      kafka:
        binder:
          brokers: 10.138.128.62
          defaultBrokerPort: 9092
          zkNodes: 10.138.128.62
          defaultZkPort: 2181
          requiredAcks: -1
          replicationFactor: 1
          autoCreateTopics: true
          autoAddPartitions: true
        bindings:
          output:
            producer:
              configuration:
                key.serializer: org.apache.kafka.common.serialization.StringSerializer
                value.serializer: org.apache.kafka.common.serialization.ByteArraySerializer
All these properties are defined in an external 'application.yml' file that I point to when launching each component:
java -jar mycomponent.jar --spring.config.location=/conf/application.yml
Currently, I orchestrate those 3 components "manually", but I would like to use Spring Cloud Data Flow (SCDF) to create a stream and operate them much more easily.
Based on the SCDF documentation, any SCS application can be used straightforwardly as an application in a stream definition. Besides that, properties for the application can be provided through an external properties file. However, when I provide my 'application.yml' properties file it's not working:
stream deploy --name mystream --definition "mysource | myprocessor | mysink" --deploy --propertiesFile /conf/application.yml
After some research, I realized that the documentation states that any property for any application must be passed in this format:
app.<app-name>.<property-name>=<value>
So I have some questions:
Do I have to add that "app." to all my existing properties?
Is there any way I can provide something like "--spring.config.location" to my application in SCDF?
If I already provide a "spring.application.name" property in the application.yml, how does it impact SCDF, as I also provide an application name when defining the stream?
If I already provide a "server.port" property in the application.yml, how does it impact SCDF? Will SCDF pick it as the port to use for the application or will it just ignore it?
Thanks in advance for your support.
Do I have to add that "app." to all my existing properties?
Yes. You can have something like this:
app:
  app-name:
    spring:
      cloud:
        ...
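Equivalently, if you pass a flat .properties file via --propertiesFile, each of your existing keys just gets the app.<app-name>. prefix. A short sketch, reusing the mysource app name from the stream definition above:
app.mysource.spring.cloud.stream.bindings.output.destination=topic1
app.mysource.spring.cloud.stream.bindings.output.producer.partitionCount=5
app.mysource.spring.cloud.stream.kafka.binder.brokers=10.138.128.62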
Is there any way I can provide something like "--spring.config.location" to my application in SCDF?
For the stream that is deployed, only --propertiesFile can provide properties at runtime. But you can still use application-specific properties like:
stream deploy mystream --properties "app.*.spring.config.location=configFile"
or use a different config file for each app by replacing the wildcard with the app.<app-name> prefix (see the sketch below).
With the wildcard form, all the apps that are deployed would get these properties.
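A sketch of the per-app variant (the config file locations here are hypothetical):
stream deploy mystream --properties "app.mysource.spring.config.location=file:/conf/source.yml,app.myprocessor.spring.config.location=file:/conf/processor.yml,app.mysink.spring.config.location=file:/conf/sink.yml"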
If I already provide a "spring.application.name" property in the application.yml, how does it impact SCDF, as I also provide an application name when defining the stream?
Do you use spring.application.name explicitly in your application for some reason? I would guess there is some impact on the metrics collector if you change spring.application.name.
If I already provide a "server.port" property in the application.yml, how does it impact SCDF? Will SCDF pick it as the port to use for the application or will it just ignore it?
It works the same way as Spring Boot's property source precedence. The server.port in your application's application.yml has lower precedence than the other property sources that can be set via stream definition/deployment properties.
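For instance, a deployment property like the one below (the port value is only an example) would take precedence over whatever server.port is set in the application.yml shipped with or referenced by the app:
stream deploy mystream --properties "app.mysink.server.port=8085"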
Related
I am using the following configurations to define the topic name
spring:
  cloud:
    stream:
      bindings:
        output:
          destination: topicA
I want to know how we can do this programmatically at runtime so that we don't have to define it here. I am using the Spring Cloud Stream Kafka binder. TIA
You can just use StreamBridge and send to any topic. If the topic doesn't exist, the binder will provision it for you - https://docs.spring.io/spring-cloud-stream/docs/3.1.5/reference/html/spring-cloud-stream.html#_sending_arbitrary_data_to_an_output_e_g_foreign_event_driven_sources
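A minimal sketch of that approach, assuming Spring Cloud Stream 3.x with the Kafka binder on the classpath (the class and method names here are only illustrative):
import org.springframework.cloud.stream.function.StreamBridge;
import org.springframework.stereotype.Component;

@Component
public class DynamicTopicSender {

    private final StreamBridge streamBridge;

    public DynamicTopicSender(StreamBridge streamBridge) {
        this.streamBridge = streamBridge;
    }

    // Sends the payload to the given binding name; with the Kafka binder the
    // binding's destination maps to a topic, which is provisioned if it does
    // not exist yet (subject to broker permissions).
    public void send(String topic, Object payload) {
        streamBridge.send(topic, payload);
    }
}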
We have a Spring Boot microservice that, as well as having HTTP endpoints, uses Spring Cloud Bus to pick up refresh events (from RabbitMQ) and also has a Spring Cloud Stream sink that picks up custom messages from another RabbitMQ topic.
After updating to Spring Boot 2.4.1 and Spring Cloud 2020.0.0 everything seemed to be working until we discovered Spring Cloud Bus was no longer picking up events.
Looking into this, it turned out that some of the Spring Cloud Bus internal channels were not getting created.
This wasn't happening in another service that didn't have the stream functionality, so we tested disabling that, and the bus functionality then started working.
So it was obviously some sort of interference between the old-style stream model and the newer Spring Cloud Bus.
After updating our sink to use the new functional model I still had issues, and I eventually got both to work by including the following lines in our application.yml:
spring:
  cloud:
    stream:
      bindings.mySink-in-0.destination: mytopic
      function.definition: busConsumer;mySink
So I have the following questions
Did I miss something or should there be better documentation on how stream / bus can affect each other and the migration to 2020.0.0?
Does my current configuration look correct?
It doesn't seem right to have to include busConsumer here - should the auto configuration for it not be able to 'combine it in' with any other stream config?
What's the difference between spring.cloud.stream.function.definition and spring.cloud.function.definition? I've seen both in documentation and Spring Cloud Bus seems to be also setting spring.cloud.function.definition=busConsumer
In org.springframework.cloud.stream.function.FunctionConfiguration, it does a search for @EnableBinding:
if (ObjectUtils.isEmpty(applicationContext.getBeanNamesForAnnotation(EnableBinding.class)))
If any are found, functional binding is disabled. See this log message:
logger.info("Functional binding is disabled due to the presense of @EnableBinding annotation in your configuration");
After the upgrade, we needed to convert our listener classes to the functional style in order to activate functional binding. After that, the cloud bus consumer binding gets created too.
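As a rough sketch, the converted sink can look like this under the functional model (String is used as the payload type only for illustration; substitute your own message type):
import java.util.function.Consumer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SinkConfiguration {

    // Bound as "mySink-in-0", i.e. picked up by the
    // bindings.mySink-in-0.destination entry shown above.
    @Bean
    public Consumer<String> mySink() {
        return message -> {
            // handle the incoming message here
        };
    }
}
Together with the function.definition: busConsumer;mySink entry from the configuration above, both the bus consumer and this sink get bound.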
In a Spring Boot application which has been deployed as part of a stream, I use these properties in a YAML file:
spring:
  kafka:
    producer:
      retries: 8
Is it possible to show and manage these properties in the UI of Spring Cloud Data Flow?
The binder specific properties are not currently exposed as Spring Cloud Data Flow isn't aware of (doesn't need to be aware of) the underlying middleware in use.
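That said, you can still pass such properties at deployment time as ordinary application properties; they simply aren't surfaced as known options in the UI. A sketch, assuming the app in your stream is named myapp:
stream deploy mystream --properties "app.myapp.spring.kafka.producer.retries=8"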
NiFi and Kafka are now both available in Cloudera Data Platform (CDP) Public Cloud. NiFi is great at talking to everything and Kafka is a mainstream message bus, so I just wondered:
What are the minimal steps needed to produce/consume data to/from Kafka with Apache NiFi within CDP Public Cloud?
I would ideally look for steps that work in any cloud, for instance Amazon AWS and Microsoft Azure.
I am satisfied with answers that follow best practices and work with the default configuration of the platform, but if there are common alternatives these are welcome as well.
There will be multiple form factors available in the future; for now I will assume you have an environment that contains one Data Hub with NiFi and one Data Hub with Kafka. (The answer still works if both are on the same Data Hub.)
Prerequisites
Data Hub(s) with NiFi and Kafka
Permission to access these (e.g. add processor, create Kafka topic)
Know your Workload User Name (CDP Management Console > click your name (bottom left) > click Profile)
You should have set your Workload Password in the same location
These steps allow you to Produce data from NiFi to Kafka in CDP Public Cloud
Unless mentioned otherwise, I have kept everything to its default settings.
In Kafka Data Hub Cluster:
Gather the FQDN links of the brokers, and the used ports.
If you have Streams Messaging Manager: Go to the brokers tab to see the FQDN and port already together
If you cannot use Streams Messaging Manager: Go to the hardware tab of your Data Hub with Kafka and get the FQDN of the relevant nodes. (Currently these are called broker). Then add :portnumber behind each one. The default port is 9093.
Combine the links together in this format: FQDN:port,FQDN:port,FQDN:port. It should now look something like this:
broker1.abc:9093,broker2.abc:9093,broker3.abc:9093
In NiFi GUI:
Make sure you have some data in NiFi to produce, for example by using the GenerateFlowFile processor
Select the relevant processor for writing to Kafka, for example PublishKafka_2_0, and configure it as follows:
Settings
Automatically terminate relationships: tick both success and failure
Properties
Kafka Brokers: The combined list we created earlier
Security Protocol: SASL_SSL
SASL Mechanism: PLAIN
SSL Context Service: Default NiFi SSL Context Service
Username: your Workload User Name (see prerequisites above)
Password: your Workload Password
Topic Name: dennis
Use Transactions: false
Max Metadata Wait Time: 30 sec
Connect your GenerateFlowFile processor to your PublishKafka_2_0 processor and start the flow
These are the minimal steps; a more extensive explanation can be found in the Cloudera documentation. Note that it is best practice to create topics explicitly (this example leverages the Kafka feature that automatically creates topics when they are produced to).
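For explicit topic creation, one option is the stock Apache Kafka CLI run from a cluster host; a rough sketch, assuming a client.properties file that carries your SASL_SSL credentials (the exact tool name, broker address and partition/replication values here are illustrative and will vary per setup):
kafka-topics.sh --create --topic dennis \
  --partitions 3 --replication-factor 3 \
  --bootstrap-server broker1.abc:9093 \
  --command-config client.properties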
These steps allow you to Consume data with NiFi from Kafka in CDP Public Cloud
A good check to see if data was written to Kafka, is consuming it again.
In NiFi GUI:
Create a Kafka consumption processor, for instance ConsumeKafka_2_0, and configure its Properties as follows:
Kafka Brokers, Security Protocol, SASL Mechanism, SSL Context Service, Username, Password, Topic Name: All the same as in our producer example above
Consumer Group: 1
Offset Reset: earliest
Create another processor, or a funnel to send the messages to, and start the consumption processor.
And that is it, within 30 seconds you should see that the data that you published to Kafka is now flowing into NiFi again.
Full disclosure: I am an employee of Cloudera, the driving force behind NiFi.
Is there a way to allow (Spring Cloud Stream) applications to auto-create the topics they need in Confluent Cloud?
So far I've had to create them manually, which is error-prone when you consider that you also have to set up changelog topics.
The binder property autoCreateTopics is true by default, so topics should be created automatically (unless the broker permissions don't allow it, but I would expect to see errors in the logs in that case).
There is another property, autoAddPartitions, which is false by default.
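For reference, a minimal sketch of setting these binder properties explicitly in application.yml (the replication factor of 3 is an assumption reflecting Confluent Cloud's usual minimum, not something your cluster necessarily requires):
spring:
  cloud:
    stream:
      kafka:
        binder:
          autoCreateTopics: true
          autoAddPartitions: true
          replicationFactor: 3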