Can Spring XD be used as a platform for comprehensive Spring batch workflows? - spring-batch

Spring XD provides a platform for launching batch jobs. Does that cover comprehensive workflows for all batch job use cases, or is it meant to be used only within the context of Spring XD use cases?
For example, if someone wants to use just spring-batch, and not necessarily all of the data ingestion/real-time analytics features, will they still benefit from setting up the Spring XD DIRT just for executing batch workflows? In that case, are there any limitations preventing them from using all the batch workflows spring-batch supports?

In short, yes, it can be used as a comprehensive batch platform. Spring XD currently provides a number of compelling features, with more coming in the future. Features Spring XD provides for batch solutions:
Job orchestration - Spring Batch explicitly avoids the problem of job orchestration so that the developer can use whatever tool they want. Spring XD brings orchestration in a distributed environment via scheduling of jobs, executing ad hoc jobs, and executing jobs on the result of some form of logic (polling a directory for a file for example).
Abstraction of Spring Batch and Spring Integration - Spring Batch and Spring Integration are commonly used together to address more complex scenarios. For example, if you need to FTP a file to a server and then kick off a batch job once it's there, you'd use Spring Integration for the FTP piece and to kick off the job, with Spring Batch handling the processing of the job. Spring XD provides an elegant abstraction of those components to allow easy assembly into more robust solutions.
Simplification of remote partitioning - Spring XD provides facilities to simplify the wiring of the communication aspects of remote partitioning within Spring Batch.
Interaction with jobs via UI, shell, or REST - Spring XD exposes a number of metrics and functions for consumption via its web-based UI, the interactive shell, or REST-based endpoints.
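To make the orchestration point above concrete, here is a sketch of a file-triggered batch job in the Spring XD shell. The job name (myjob), module (filejdbc), and directory are hypothetical; the queue:job:<name> sink is XD's mechanism for launching a deployed job when a message (here, a detected file) arrives:

```
xd:> job create myjob --definition "filejdbc" --deploy
xd:> stream create --name filePoller --definition "file --dir=/data/in > queue:job:myjob" --deploy
```

With both deployed, dropping a file into /data/in would trigger an execution of myjob.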
The main limit as of Spring XD 1.0 for batch processing is the inability to execute nested jobs (using a JobStep). I believe this will be part of Spring XD 1.1 (https://jira.spring.io/browse/XD-1972).
Looking forward, other features that I would expect in future versions of Spring XD are around high availability for jobs. Currently, if a job is deployed on a node and the node goes down, the job will be redeployed automatically. In future releases, the job could also be restarted automatically upon redeployment.

Related

Is there an integration between Spring Batch and Spring Cloud Stream?

My project has a lot of Spring Batch jobs.
I have a requirement to create externalized configuration for message brokers (e.g., Kafka, RabbitMQ, etc.).
I want to use Spring Cloud Stream since it has various binders to solve this problem.
Hence I wanted to understand whether there is an integration between the two frameworks; please explain.

setting up stand alone spring batch job admin portal for existing jobs

Currently, I have Spring Batch jobs developed in spring-batch 2.1.
As there are so many jobs, and they have been running fine for a long time, upgrading to the latest version will take some time.
Until then, I want to set up a Spring Batch admin portal, but I have not found any firm solution so far.
Posting this for information: we decided to go ahead with Spring Cloud Data Flow.
We needed minimal changes in our existing Spring Batch jobs, so this is what we did:
upgrade Spring Batch to 3.0.1, and then use spring-boot-starter-batch 1.1.4.
This helped us build a batch admin tool with Spring Cloud Data Flow for our Spring Batch jobs.
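For reference, the version bump described above might look like this in a Maven POM. The .RELEASE suffix is an assumption, following the naming convention Spring artifacts used at the time:

```xml
<!-- versions as described above; the .RELEASE suffix is assumed -->
<dependency>
    <groupId>org.springframework.batch</groupId>
    <artifactId>spring-batch-core</artifactId>
    <version>3.0.1.RELEASE</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
    <version>1.1.4.RELEASE</version>
</dependency>
```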
We referred to this blog post from Baeldung: Batch Processing with Spring Cloud Data Flow.

how to write task information in the spring data flow UI manually

I integrated Spring Batch into a RESTful controller of a Spring Boot application, which means we now operate the Spring Batch program by sending a REST call. In this case, we cannot build a jar and register it on the Spring Cloud Data Flow server. So my question is: how do we register a task if we don't have a jar?
You've asked a few similar questions today.
My recommendation is that you could consider referring to the ref. guide of Spring Cloud Task and Spring Cloud Data Flow. Specifically, pay attention to the Spring Batch section.
Once you understand what to do, you can build a batch job as a Spring Cloud Task application and run it standalone successfully.
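As a sketch of that step, a minimal batch job packaged as a Spring Cloud Task application could look like the following. This assumes spring-cloud-starter-task and spring-boot-starter-batch are on the classpath; the class and job names are hypothetical:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
@EnableTask                // records each execution in the task repository
@EnableBatchProcessing     // bootstraps the Spring Batch infrastructure
public class BatchTaskApplication {

    // A single-step job; the task/batch integration records its run
    @Bean
    public Job job(JobBuilderFactory jobs, StepBuilderFactory steps) {
        Step step = steps.get("step1")
                .tasklet((contribution, chunkContext) -> {
                    System.out.println("Hello from a batch job running as a task");
                    return RepeatStatus.FINISHED;
                })
                .build();
        return jobs.get("job1").start(step).build();
    }

    public static void main(String[] args) {
        SpringApplication.run(BatchTaskApplication.class, args);
    }
}
```

Running the jar directly (java -jar) executes the job once and exits, which is exactly the "run it standalone" check described above.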
If it runs locally as expected, you can switch to SCDF and register the JAR using the REST API, the shell, or the GUI. You'd need a physical uber-jar of the application for this. With that registered, you can then build a task definition with it and launch it from SCDF.
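Assuming the uber-jar is published to a Maven repository, the registration and launch from the SCDF shell might look like this (the app name, Maven coordinates, and task name below are hypothetical):

```
dataflow:> app register --name my-batch-task --type task --uri maven://com.example:my-batch-task:0.0.1
dataflow:> task create my-task --definition "my-batch-task"
dataflow:> task launch my-task
```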
If you want to do all of the above programmatically, please have a look at the acceptance-test suite for examples.

spring Batch flow job Vs spring composed task

I want to execute my apps using composed tasks, and I have already built complex Spring Batch flow jobs which execute perfectly fine.
Could you please explain the difference between a Spring Batch flow job and a Spring composed task, and which is best among them?
A composed task within Spring Cloud Data Flow is actually built on Spring Batch in that the transition from task to task is managed by a dynamically generated Spring Batch job. This model allows the decomposition of a batch job into reusable parts that can be independently tested, deployed, and orchestrated at a level higher than a job. This allows for things like writing a single step job that is reusable across multiple workflows.
They are really complementary. You can use a composed task within Spring Cloud Data Flow to orchestrate both Spring Cloud Task applications as well as Spring Batch jobs (run as tasks). It really depends on how you want to slice up your process. If you have processes that are tightly coupled, package them as a single job. From there, you can orchestrate them with Spring Cloud Data Flow's composed-task functionality.
In general, there's not one that's "better". It's going to be dependent on your use case and requirements.
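To illustrate, a composed task in SCDF is defined with a small DSL in which && denotes sequential execution; behind the scenes, SCDF generates a Spring Batch job whose steps launch each registered task in order. The app names and Maven coordinates below are hypothetical:

```
dataflow:> app register --name load --type task --uri maven://com.example:load-task:0.0.1
dataflow:> app register --name report --type task --uri maven://com.example:report-task:0.0.1
dataflow:> task create etl-pipeline --definition "load && report"
dataflow:> task launch etl-pipeline
```

Here load and report remain independently testable and deployable apps, while etl-pipeline is the reusable workflow composed from them.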
Spring Batch is a nice framework to run batch processing applications.
Spring Cloud Task is a wrapper that allows you to run short-lived microservices using Spring Cloud along with Spring Boot. Once you set up an application with @EnableTask, it will launch your *Runner (CommandLineRunner or ApplicationRunner) beans. The framework also comes with Spring Batch integration points, and ComposedTaskRunner helps facilitate that integration.
I'd start with the Spring Cloud Task batch documentation and then come back to ask more specific questions.

Spring Cloud components confusion

How do these Spring components relate/differ to/from each other? What does each represent conceptually? Would one use them together or are they competing projects?
Spring Cloud Data Flow
Spring Cloud Stream
Spring Cloud Task
Spring Cloud Task App Starters
Spring Batch
From my understanding, SC Tasks are just "units of work" to execute, a processing unit in the form of a short-lived/task-based microservice. SC Data Flow is orchestration for the tasks. These two I (think I) understand how they relate and what they represent conceptually, but a lot of documentation and examples talk about the other projects in the same context.
I also thought that SC Task was a replacement for Spring Batch, but some examples seem to imply that Spring Batch jobs are executed inside SC Tasks.
Thanks for your interest in Spring Cloud projects! Find below high-level introductions to the primary projects involved in the Spring Cloud Data Flow (SCDF) ecosystem. The launch blog covers the backstory, among other details.
Spring Cloud Stream is a lightweight event-driven microservices framework for quickly building applications that can connect to external systems (e.g., Kafka, Cassandra, MySQL, Hadoop, ...).
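As a sketch of the programming model from that era, a minimal Spring Cloud Stream consumer could look like this. It assumes a binder dependency (e.g., spring-cloud-starter-stream-kafka) on the classpath, and the class name is hypothetical; the annotation-based model shown here was current at the time, though later versions favor a functional style:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Sink;

@SpringBootApplication
@EnableBinding(Sink.class)   // binds the "input" channel to the configured broker
public class LoggingConsumerApplication {

    // Invoked for each message arriving on the bound destination
    @StreamListener(Sink.INPUT)
    public void handle(String payload) {
        System.out.println("Received: " + payload);
    }

    public static void main(String[] args) {
        SpringApplication.run(LoggingConsumerApplication.class, args);
    }
}
```

Swapping brokers is then a matter of swapping the binder dependency and its configuration, not changing the application code.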
Spring Cloud Task is a short-lived microservices framework for quickly building applications that perform finite amounts of data processing (e.g., batch jobs). Its connection with the Spring Batch framework is explained in the launch blog linked above.
Spring Cloud Data Flow provides the orchestration mechanics to deploy applications built with the Spring Cloud Stream and Spring Cloud Task programming models to a variety of runtime platforms, including Cloud Foundry, Apache YARN, Apache Mesos, and Kubernetes. There are community-developed SCDF implementations for OpenShift and Nomad, too. More details here.
The building blocks visual from the project site should cover the high-level interaction between the various projects in SCDF's ecosystem.