We have an existing microservice environment with logstash, config and eureka servers. We are now setting up a Spring Cloud Dataflow (Kubernetes) environment (primarily intially to run tasks/batch jobs).
Ideally we would like the tasks to use the existing logstash, config and eureka servers via the standard spring boot configuration (annotations etc) to support the following scenarios:
Logstash: When a task runs its logs are output to logstash and viewable from Kibana
Config Server: To support changing configuration properties for tasks. eg a periodic task's configuration can be tweaked by altering the values on the configuration server and next time the task runs it will use the new values.
My understanding is that config server properties will override properties in the task definition which override properties in the internal application.properties.
Eureka: Each task would register itself in Eureka. The main reason for this is that our tasks have web actuator endpoints exposed and we can then can use Spring Boot Admin (which can discover services via eureka) to access the actuator endpoints and information while a task is running.
(Some of our tasks can take hours to run and this would enable us to monitor them, adjust logging etc)
Is this a sensible approach - or are there any potential issues to look out for here (eg short lived tasks with eureka). I can’t find any discussion of this in the existing spring cloud data flow or spring cloud task documentation.
You may try logstash-logback-encoder for SCDF integration with ELK stack. It works fine for our SCDF on Yarn stream application.
Config Server should work for any Spring Boot application.
Related
Is it possible to make auto-refresh properties for Spring Cloud clients in a multi-pod environment (Google Kubernetes Engine)?
I found several work arounds:
Using Spring Cloud Bus (too heavy solution).
Running refresh inside code using RefreshEvent and #Schedule it (not recommended by Spring).
Creating a new endpoint in Config Server to perform a refresh on all Spring Cloud clients.
Platform used : Kubernetes.
I have an issue with Spring cloud stream url. I am launching my spring cloud tasks using spring cloud stream. Streams are deployed in kubernetes platform. Stream contains http-kafka as source and taskLauncerKafka as sink. I used http-kafka kubernetes service url to launch tasks. Kubernetes service url changes after each deployment which causes problem.The changes in the service name after each stream deployment is difficult to manage. I have tried enabling loadbalacer also. In that case also external ip-address changed after each stream roll-out.
I am using skipper for managing the deployments. Every time stream is deployed stream version changes which also changes stream url.
In my case , I have multiple instances from where I can launch spring-cloud task. If the stream url changes I need to make changes in the configmap of the deployment project for all instance and need to redeploy all instance.
Any solution ? I am thinking of centralised configuration management using spring-cloud-config server or zookeeper. In this case also I need to update the url. I can avoid deploying multiple instances using centralised configuration management.
Skipper server version : 2.4.1.RELEASE
Dataflow server version : 2.5.1.RELEASE
Which version of SCDF/Skipper you are running?
This looks similar to the issue https://github.com/spring-cloud/spring-cloud-skipper/issues/953 which was subsequently addressed in Skipper 2.6.0.
I am building a data pipeline for batch processing. And I find that Spring Cloud Data Flow is a quite attractive framework to use. Without much knowledge in SCDF and Kubernetes, I am not sure whether it is possible to conditionally launch a Spring Cloud Task on a specific machine.
Suppose I have two physical servers that are for running the batch process (Server A and Server B). By default, I would like my Spring cloud task to be launched on Server A. If the Server A is shut down, the task should be deployed on server B. Can Kubernetes / SCDF handle this kind of mechanism? I am wondering whether the nodeselector is the thing that I should look into.
Yes, you can pass deployment.nodeSelector as a deployment property when launching the task.
The deployment.nodeSelector is a Kubernetes deployment property and hence, you need to pass something like this:
task launch mytask --properties "deployer.<taskAppName>.kubernetes.deployment.nodeSelector=foo1:bar1,foo2:bar2"
You can check the list of supported Kubernetes deployer properties here
I use spring cloud dataflow deployed to pivotal cloud foundry, to run spring batch jobs as spring cloud tasks, and the jobs require aws credentials to access an s3 bucket.
I've tried passing the aws credentials as task properties, but the credentials are showing up in the task's log files as arguments or properties. (https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#spring-cloud-dataflow-global-properties)
For now, I am manually setting the credentials as env variables in pcf after each deployment, but I'm trying to automate this. The tasks aren't deployed until the tasks are actually launched, so on a deployment I have to launch the task, then wait for it to fail due to missing credentials, then set the env variable credentials with the cf cli. How so I provide these credentials, without them showing in the pcf app's logs?
I've also explored using vault and spring cloud config, but again, I would need to pass credentials to the task to access spring cloud config.
Thanks!
Here's a Task/Batch-Job example.
This App uses spring-cloud-starter-aws. And this starter already provides the Boot autoconfiguration and the ability to override AWS creds as Boot properties.
You'd override the properties while launching from SCDF like:
task launch --name S3LoaderJob --arguments "--cloud.aws.credentials.accessKey= --cloud.aws.credentials.secretKey= --cloud.aws.region.static= --cloud.aws.region.auto=false"
You can decide to control the log-level of the Task, so it doesn't log them in plain text.
Secure credentials for tasks should be configured either via environment variables in your task definition or by using something like Spring Cloud Config Server to provide them (and store them encrypted at rest). Spring Cloud Task stores all command line arguments in the database in clear text which is why they should not be passed that way.
After considering the approaches included in the provided answers, I continued testing and researching and concluded that the best approach is to use a Cloud Foundry "User Provided Service" to supply AWS credentials to the task.
https://docs.cloudfoundry.org/devguide/services/user-provided.html
Spring Boot auto-processes the VCAP_SERVICES environment variable included in each app's container.
http://engineering.pivotal.io/post/spring-boot-injecting-credentials/
I then used properties placeholders in the application-cloud.properties to map the processed properties into spring-cloud-aws properties:
cloud.aws.credentials.accessKey=${vcap.services.aws-s3.credentials.aws_access_key_id}
cloud.aws.credentials.secretKey=${vcap.services.aws-s3.credentials.aws_secret_access_key}
I wanna deploy the Spring-cloud-data-flow on several hosts.
I will deploy the server of Spring-cloud-data-flow on one host-A, and deploy the agents on the other hosts(These hosts are in charge of executing the tasks).
Except the host-A, all the other hosts run the same tasks.
Shall I modify on the basis of the Spring Cloud Data Flow Local Server or on the Spring Cloud Data Flow Apache Yarn Server or other better choice?
Do you mean how the apps are deployed on several hosts? If so, the apps are deployed using the underlying deployer implementation. For instance, if it is local deployer then, each app is deployed by spawning a new process. You can scale out the number apps deployment using the count property during the stream deploy. I am not sure what do you mean by the agents here.