I am building a data pipeline for batch processing, and I find Spring Cloud Data Flow (SCDF) to be quite an attractive framework to use. Without much knowledge of SCDF and Kubernetes, I am not sure whether it is possible to conditionally launch a Spring Cloud Task on a specific machine.
Suppose I have two physical servers dedicated to running the batch process (Server A and Server B). By default, I would like my Spring Cloud Task to be launched on Server A. If Server A is shut down, the task should be deployed on Server B. Can Kubernetes / SCDF handle this kind of mechanism? I am wondering whether the nodeSelector is the thing that I should look into.
Yes, you can pass deployment.nodeSelector as a deployment property when launching the task.
The deployment.nodeSelector is a Kubernetes deployment property, and hence you need to pass something like this:
task launch mytask --properties "deployer.<taskAppName>.kubernetes.deployment.nodeSelector=foo1:bar1,foo2:bar2"
You can check the list of supported Kubernetes deployer properties here
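Note that the selector keys and values must match labels on the target nodes. As a hypothetical example (the node name and label below are made up), you would label the node and then reference that label in the deployer property:

kubectl label nodes server-a node-role=batch
task launch mytask --properties "deployer.<taskAppName>.kubernetes.deployment.nodeSelector=node-role:batch"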
I have developed a backend server using multiple microservices, using Spring Cloud.
I have a discovery service, a config service, and several other services.
Right now, for testing purposes, I use docker-compose to run them in the right order. Now I have decided to deploy my application on AWS.
I thought of running them on ECS using Fargate, but I am not able to understand how I can define dependencies among my tasks.
I found this article https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_dependson
It defines dependencies among containers within the same task.
But I do not think I can run all my services in just one task, as there will be complications in assigning vCPUs. Even if I use 4 vCPUs and a lot of memory, I am not sure how well my containers will run, and scaling them will be another issue. Having that many vCPUs and that much memory will also incur a lot of cost.
Is there any way to define dependencies among ECS tasks?
CloudFormation supports the DependsOn attribute, which allows you to control the sequence of deployment (you basically trade the speed of parallel deployment for ordered deployments when you need them).
Assuming your tasks are started as part of ECS services, you can set DependsOn on a service so that it waits for the service that needs to start first.
E.g.
Resources:
  WebService:
    DependsOn:
      - AppService
    Properties:
      ....
      ....
Out of curiosity, how did you move from Compose to CloudFormation? FYI, we have been working with Docker to add capabilities to the Docker toolset to deploy directly to ECS (basically converting Docker Compose files into CloudFormation IaC). See here for more background. BTW, this mechanism honors the Compose dependency chain; that is, if you declare one service as dependent on another in Compose, the resulting CFN template uses the DependsOn attribute I described above.
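For illustration, here is a minimal Compose sketch (the service and image names are placeholders) of the kind of dependency that gets mapped to DependsOn:

services:
  app:
    image: myorg/app:latest   # placeholder image
  web:
    image: myorg/web:latest   # placeholder image
    depends_on:
      - app                   # web is brought up after app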
I'm setting up new Spring Batch jobs and want to deploy them using SCDF. However, I have found that SCDF does not support the scheduling feature on the local platform.
I have 3 questions to ask you:
Can someone explain how the SCDF scheduler works?
Are there any ways to schedule a single job using SCDF?
Can I use my local server as a Cloud Foundry instance? And how?
That is correct: Spring Cloud Data Flow does not support scheduling on the local platform. Please note that the local SCDF server is for development purposes only and, by design, scheduling support is intended to rely on the platform. Hence, the SCDF scheduling feature is supported on Cloud Foundry and Kubernetes using the CF and K8s schedulers.
1) Can someone explain how the SCDF scheduler works?
Sure. Similar to how the deployer is used for launching tasks and deploying streams, there is an SPI for scheduling tasks under the spring-cloud-deployer project, which the underlying scheduler implementations can implement. Currently, there are CF and K8s scheduler implementations in spring-cloud-deployer-cloudfoundry and spring-cloud-deployer-kubernetes.
As a user, you can configure a schedule for a task (batch) application (via the SCDF Dashboard, shell, etc.) by specifying a cron expression. Once configured, SCDF delegates the schedule request to the platform scheduler using the above-mentioned scheduler implementations. From then on, it is the platform (the PCF scheduler on CF, the K8s scheduler on K8s) that launches the task according to the schedule.
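For example, from the SCDF shell a schedule can be created with something like the following (the task name, schedule name, and cron expression are placeholders; check the scheduling documentation for your SCDF version):

task schedule create --definitionName mytask --name mytask-schedule --expression "0 2 * * *"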
2) Are there any ways to schedule a single job using SCDF?
Yes, based on the answer to 1) above.
3) Can I use my local server as a Cloud Foundry instance? And how?
To run SCDF locally pointing at a CF instance, you can set the necessary CF deployer properties and start the SCDF server instance. It is similar to how you configure multiple platforms in the SCDF server. You can find more documentation on this here.
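As a rough sketch, assuming the SCDF 2.x multi-platform property names (the API endpoint and credentials below are placeholders; verify the exact property names against the docs for your SCDF version):

java -jar spring-cloud-dataflow-server.jar \
  --spring.cloud.dataflow.task.platform.cloudfoundry.accounts[default].connection.url=https://api.my-cf.example.com \
  --spring.cloud.dataflow.task.platform.cloudfoundry.accounts[default].connection.org=my-org \
  --spring.cloud.dataflow.task.platform.cloudfoundry.accounts[default].connection.space=my-space \
  --spring.cloud.dataflow.task.platform.cloudfoundry.accounts[default].connection.username=admin \
  --spring.cloud.dataflow.task.platform.cloudfoundry.accounts[default].connection.password=secret \
  --spring.cloud.dataflow.task.platform.cloudfoundry.accounts[default].connection.skipSslValidation=true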
I want to deploy Spring Cloud Data Flow on several hosts.
I will deploy the Spring Cloud Data Flow server on one host (host-A) and deploy the agents on the other hosts (these hosts are in charge of executing the tasks).
Except for host-A, all the other hosts run the same tasks.
Should I build on the Spring Cloud Data Flow Local Server, the Spring Cloud Data Flow Apache YARN Server, or is there a better choice?
Do you mean how the apps are deployed on several hosts? If so, the apps are deployed using the underlying deployer implementation. For instance, with the local deployer, each app is deployed by spawning a new process. You can scale out the number of app instances using the count property during stream deployment. I am not sure what you mean by the agents here.
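For illustration, from the SCDF shell (the stream and app names here are placeholders):

stream deploy mystream --properties "deployer.myapp.count=3"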
We have an existing microservice environment with Logstash, Config, and Eureka servers. We are now setting up a Spring Cloud Data Flow (Kubernetes) environment (primarily, initially, to run tasks/batch jobs).
Ideally we would like the tasks to use the existing Logstash, Config, and Eureka servers via the standard Spring Boot configuration (annotations, etc.) to support the following scenarios:
Logstash: when a task runs, its logs are output to Logstash and viewable from Kibana.
Config Server: to support changing configuration properties for tasks, e.g. a periodic task's configuration can be tweaked by altering the values on the configuration server, and the next time the task runs it will use the new values.
My understanding is that Config Server properties will override properties in the task definition, which in turn override properties in the internal application.properties.
Eureka: each task would register itself in Eureka. The main reason for this is that our tasks expose web actuator endpoints, and we can then use Spring Boot Admin (which can discover services via Eureka) to access the actuator endpoints and information while a task is running.
(Some of our tasks can take hours to run, and this would enable us to monitor them, adjust logging, etc.)
Is this a sensible approach, or are there any potential issues to look out for here (e.g. short-lived tasks with Eureka)? I can't find any discussion of this in the existing Spring Cloud Data Flow or Spring Cloud Task documentation.
You may try logstash-logback-encoder for SCDF integration with the ELK stack. It works fine for our SCDF on YARN stream application.
Config Server should work for any Spring Boot application.
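For example, the Config Server location can be supplied to a task application at launch time as a regular Boot property (the app name and URL below are placeholders):

task launch mytask --properties "app.mytask.spring.cloud.config.uri=http://my-config-server:8888"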
I have multiple Quartz cron jobs in a load-balanced environment. Currently these jobs run on every node, which is not desirable. I want only one node to run a particular scheduler, and if that node crashes, another node should run the scheduler intended for the node that crashed.
How can this be done with Spring 2.5.6 and a Tomcat load balancer?
I think there are a few aspects to this question.
Firstly, Quartz has API methods for pausing and resuming the Scheduler, or even individual triggers and jobs
e.g.
http://www.jarvana.com/jarvana/view/opensymphony/quartz/1.6.1/quartz-1.6.1-javadoc.jar!/org/quartz/Scheduler.html#standby()
I would create a Spring bean with a reference to the Quartz Scheduler or trigger, and a simple isMasterNode boolean member for storing state. I'd then expose two [restricted-access] web service calls, makeMaster and makeSlave, which call Scheduler.start() (to resume from standby) or Scheduler.standby(), respectively.
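A minimal sketch of such a bean, assuming the Scheduler is injected and the makeMaster/makeSlave web services delegate to it (the class and method names follow the description above and are not a definitive implementation):

import org.quartz.Scheduler;
import org.quartz.SchedulerException;

// Tracks whether this node is the master and pauses/resumes the local Quartz scheduler accordingly.
public class SchedulerRoleManager {

    private final Scheduler scheduler;
    private volatile boolean masterNode;

    public SchedulerRoleManager(Scheduler scheduler) {
        this.scheduler = scheduler;
    }

    // Called by the makeMaster web service: this node should fire the jobs.
    public void makeMaster() throws SchedulerException {
        scheduler.start();     // starts the scheduler, or resumes it from standby
        masterNode = true;
    }

    // Called by the makeSlave web service: this node should stop firing triggers.
    public void makeSlave() throws SchedulerException {
        scheduler.standby();   // stops trigger firing without shutting the scheduler down
        masterNode = false;
    }

    public boolean isMasterNode() {
        return masterNode;
    }
}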
Finally, the big question is how and with what you determine that another node has 'crashed'.
If you're using a hardware load balancer to manage this, you could configure it to call the 'makeMaster' web service on the new 'primary' node, which in turn calls Scheduler.start() or similar.
hth