spring Batch flow job Vs spring composed task - spring-batch

I want to execute my apps using spring-complex-task and I have already build complex spring-batch Flow Jobs which executes perfectly fine.
could you please explain what is difference between spring Batch flow job Vs spring composed task? and which is best among them?

A composed task within Spring Cloud Data Flow is actually built on Spring Batch in that the transition from task to task is managed by a dynamically generated Spring Batch job. This model allows the decomposition of a batch job into reusable parts that can be independently tested, deployed, and orchestrated at a level higher than a job. This allows for things like writing a single step job that is reusable across multiple workflows.
They are really complimentary. You can use a composed task within Spring Cloud Data Flow to orchestrate both Spring Cloud Tasks as well as Spring Batch jobs (run as tasks). It really depends on how you want to slice up your process. If you have processes that are tightly coupled, package them as a single job. From there, you can orchestrate them with Spring Cloud Data Flow's composed task functionality.

In general, there's not one that's "better". It's going to be dependent on your use case and requirements.
Spring Batch is a nice framework to run batch processing applications.
Spring Cloud Task is a wrapper that allows you to run short lived microservices using Spring Cloud along with Spring Boot. Once you setup a test with #EnableTask it will then launch your *Runner. The framework also comes with Spring Batch integration points and ComposedTaskRunner helps facilitate that integration.
I'd start with the Spring Cloud Task batch documentation and then come back to ask more specific questions.

Related

Is there an integration between Spring Batch and Spring Cloud Stream?

My project has a lot of Spring Batch Jobs.
I have requirement to create externalized configuration for message brokers (example Kafka, rabbitMQ etc.).
I want to use spring cloud stream since it has various binders to solve this problem.
Hence i wanted to understand if there is an integration between the two frameworks, please explain.

how to write task information in the spring data flow UI manually

I integrate spring batch into a restful controller of a spring boot, which means now we operate spring batch program by send a restful call. In this case, we cannot make a jar and register the jar on spring data flow server. So my question is that how to register a task if we don't have jar
You've asked a few similar questions today.
My recommendation is that you could consider referring to the ref. guide of Spring Cloud Task and Spring Cloud Data Flow. Specifically, pay attention to the Spring Batch section.
Once you have the understanding as to what to do, you can build a batch-job as a Spring Cloud Task application, and run it standalone successfully.
If it runs locally as expected, you can switch to SCDF and register the JAR using the REST-API, Shell, or in the GUI. You'd need a physical uber-jar of the application for it. With that registered, you can then build a Task definition with it, and launch it from SCDF.
If you want to do all of the above programmatically, please have a look at the acceptance-test suite for examples.

Spring Cloud components confusion

How do these Spring components relate/differ to/from each other? What does each represent conceptually? Would one use them together or are they competing projects?
Spring Cloud Data Flow
Spring Cloud Stream
Spring Cloud Task
Spring Cloud Task App Starters
Spring Batch
From my understanding, SC Tasks are just "units of work" to execute, a processing unit in the form of a short-lived/task-based microservice. SC Data Flow is orchestration for the tasks. These two I (think I) understand how they relate and what they represent conceptually, but a lot of documentation and examples talk about the other projects in the same context.
I also thought that SC Task was a replacement for Spring Batch but in some examples they seem to imply that Spring Batches are executed inside SC Tasks
Thanks for your interest in Spring Cloud projects! Find below the high-level introductions for the primary projects involved in Spring Cloud Data Flow (SCDF) ecosystem. The launch blog covers the backstory and among other details.
Spring Cloud Stream is a lightweight event-driven microservices framework to quickly build applications that can connect to external systems (eg: Kafka, Cassandra, MySQL, Hadoop, ..).
Spring Cloud Task is a short-lived microservices framework to quickly build applications that perform finite amounts of data processing (eg: batch-jobs, ..). The connection with Spring Batch framework is explained in the launch blog linked above.
Spring Cloud Data Flow provides the orchestration mechanics to deploy applications built with Spring Cloud Stream and Spring Cloud Task programming model to a variety of runtime platforms including Cloud Foundry, Apache Yarn, Apache Mesos and Kubernetes. There's community developed SCDF implementations for OpenShift and Nomad, too. More details here.
The building blocks visual from the project site should cover the high-level interaction between the various projects in SCDF's ecosystem.

Can Spring XD be used as a platform for comprehensive Spring batch workflows?

Spring XD provides platform for launching for the batch jobs. Does that cover comprehensive workflows for all batch job use-cases? Or it is meant to be used within the context of Spring XD use-cases.
For example someone who wants to use just spring-batch not necessarily all the features of data ingestion/real-time analytics, will they still be benefited by setting up the Spring XD DIRT just for executing batch workflows? In that case, are there any limitations not being able to use all batch workflows supported by spring-batch?
In short, yes it can be used as a comprehensive batch platform. Spring XD provides a number of compelling features currently with more coming in the future. Features Spring XD provide for batch solutions:
Job orchestration - Spring Batch explicitly avoids the problem of job orchestration so that the developer can use whatever tool they want. Spring XD brings orchestration in a distributed environment via scheduling of jobs, executing ad hoc jobs, and executing jobs on the result of some form of logic (polling a directory for a file for example).
Abstraction of Spring Batch and Spring Integration - Spring Batch and Spring Integration are commonly used in solutions to address more complex scenarios. For example, if you need to FTP a file to a server, then kick off a batch job once it's there, you'd use Spring Integration for the FTP piece and to kick off the job with Spring Batch handling the processing of the job. Spring XD provides an elegent abstraction of those components to allow for easy assembly of these into more robust solutions.
Simplification of remote partitioning - Spring XD provides facilities to simplify the wiring of the communication aspects of remote partitioning within Spring Batch.
Interaction of jobs via UI, shell, or REST - Spring XD exposes a number of metrics and functionality to be consumable via their web based UI, the interactive shell, or REST based end points.
The main limit as of Spring XD 1.0 for batch processing is the inability to execute nested jobs (using a JobStep). I believe this will be part of Spring XD 1.1 (https://jira.spring.io/browse/XD-1972).
Looking forward, other features that I would expect in future versions of Spring XD are around high availability for jobs. Currently if a job is deployed on a node and the node goes down, it will be redeployed automatically. In future releases, the ability to restart the job automatically upon redeployment would be possible.

Spring batch application integration with spring batch admin

I developed one spring batch application which is deployed as executable jar using batch/shell script. It works fine.
Now recently I read about spring batch admin application release. As per their doc, they say you have to point to job-context.xml and that will allow to manage spring batch app to be started,restarted and stopped from admin app. Now my question is do I have to keep my job-context.xml outside the jar or what are the exact steps, i am confused about this configuration.
Any insight on this is very useful and by the way I am using spring batch 2.1.
Thanks
The Spring Batch admin application is a good reference implementation and is highly customizable. All interface implementations may be replaced via Spring DI using your own classes. UI is also template driven(FreeMarker I think) and therefore may be customized to display relevant information, change skin etc.
I had a similar need like yours - need admin functionality included in an app built as jar. I did not quite like the fact that I had to package my jobs as a .war file. Instead I extracted relevant configurations from Spring Batch Admin source and created a deployment that works off file system and runs on embedded Jetty server.
See screen shots here : https://github.com/regunathb/Trooper/wiki/Trooper-Batch-Web-Console
Source, configurations etc are available here : https://github.com/regunathb/Trooper/tree/master/batch-core . This project actually creates a .jar and not .war
If your application has custom classes and is deployed as a runnable jar and not contained within the spring batch admin, you cannot start jobs. You can only view the status of jobs and "kill" their status in the database.
If you look at http://static.springsource.org/spring-batch-admin/reference/reference.xhtml at the end of the Configuration Upload section it states
You can see a new entry in the job registry ("test-job") which is
launchable in-process because the application has a reference to the
Job. (Jobs which are not launchable were executed out of process, but
used the same database for its JobRepository, so they show up with
their executions in the UI.)
If your jobs are strictly configurable jobs, as-in you use only XML to define them and do not need to do any customized item readers/processors/writers or other custom classes, then you can upload the job XML and it will be runnable from within the admin site. If you have custom classes then, from my experience, you will have to have the spring batch application deployed within your web application and then upload an XML that contains the jobs you want to run separately.
I personally just used the Admin tool to view job status and provide me with statistics through some custom pages. I left the scheduler to run the jobs and I didn't want those with access to the admin site to kick off a job when they knew nothing about it. Basically, used it to give the users a warm fuzzy without allowing them to muck it up. (leave it to a user to find an edge case you didn't account for)