Spring Cloud components confusion - spring-batch

How do these Spring projects relate to and differ from each other? What does each represent conceptually? Would one use them together, or are they competing projects?
Spring Cloud Data Flow
Spring Cloud Stream
Spring Cloud Task
Spring Cloud Task App Starters
Spring Batch
From my understanding, SC Tasks are just "units of work" to execute: a processing unit in the form of a short-lived/task-based microservice. SC Data Flow is the orchestration layer for those tasks. I (think I) understand how these two relate and what they represent conceptually, but a lot of documentation and examples discuss the other projects in the same context.
I also thought that SC Task was a replacement for Spring Batch, but some examples seem to imply that Spring Batch jobs are executed inside SC Tasks.

Thanks for your interest in Spring Cloud projects! Find below high-level introductions to the primary projects in the Spring Cloud Data Flow (SCDF) ecosystem. The launch blog covers the backstory, among other details.
Spring Cloud Stream is a lightweight event-driven microservices framework for quickly building applications that can connect to external systems (e.g., Kafka, Cassandra, MySQL, Hadoop, ...).
Spring Cloud Task is a short-lived microservices framework for quickly building applications that perform finite amounts of data processing (e.g., batch jobs). The connection with the Spring Batch framework is explained in the launch blog linked above.
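As a rough sketch of what such a short-lived application looks like (assuming the spring-cloud-starter-task dependency is on the classpath; the class and bean names here are made up), a task can be as small as a Boot app with @EnableTask and a CommandLineRunner:

    import org.springframework.boot.CommandLineRunner;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.task.configuration.EnableTask;
    import org.springframework.context.annotation.Bean;

    @SpringBootApplication
    @EnableTask // records each execution (start/end time, exit code) in the task repository
    public class SampleTaskApplication {

        public static void main(String[] args) {
            SpringApplication.run(SampleTaskApplication.class, args);
        }

        @Bean
        public CommandLineRunner finiteWorkload() {
            // Do a finite amount of work, then let the JVM exit.
            return args -> System.out.println("Processing complete");
        }
    }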
Spring Cloud Data Flow provides the orchestration mechanics to deploy applications built with the Spring Cloud Stream and Spring Cloud Task programming models to a variety of runtime platforms, including Cloud Foundry, Apache YARN, Apache Mesos, and Kubernetes. There are community-developed SCDF implementations for OpenShift and Nomad, too. More details here.
The building blocks visual from the project site should cover the high-level interaction between the various projects in SCDF's ecosystem.

Related

Is there an integration between Spring Batch and Spring Cloud Stream?

My project has a lot of Spring Batch Jobs.
I have a requirement to create externalized configuration for message brokers (e.g., Kafka, RabbitMQ, etc.).
I want to use Spring Cloud Stream, since it has various binders to solve this problem.
Hence I wanted to understand whether there is an integration between the two frameworks; please explain.

Difference between Spring Cloud Kafka Streams Vs Spring Cloud Stream?

What's the difference between Spring Cloud Kafka Streams vs. Spring Cloud Stream vs. Spring Cloud Function vs. Spring AMQP and Spring for Apache Kafka?
Spring for Apache Kafka and Spring AMQP are foundational libraries for writing Spring friendly applications for Apache Kafka and AMQP respectively. They provide design patterns such as templates, message listener containers, and a wide array of other mechanisms to interact with the middleware systems at a lower level. These libraries do not require Spring Boot, but Spring Framework is the least common denominator. In other words, you can write a traditional Spring application with only Spring Framework contexts using these libraries.
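For illustration, here is a hedged sketch of those lower-level building blocks using Spring for Apache Kafka (the topic and group names are made up; with Spring Boot the template and listener-container factory are auto-configured, otherwise you wire them up yourself):

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Component;

    @Component
    public class OrderMessaging {

        private final KafkaTemplate<String, String> kafkaTemplate;

        public OrderMessaging(KafkaTemplate<String, String> kafkaTemplate) {
            this.kafkaTemplate = kafkaTemplate;
        }

        public void publish(String orderId) {
            // KafkaTemplate is one of the "templates" mentioned above.
            kafkaTemplate.send("orders", orderId);
        }

        // @KafkaListener is backed by a message listener container.
        @KafkaListener(topics = "orders", groupId = "order-audit")
        public void onOrder(String payload) {
            System.out.println("Received: " + payload);
        }
    }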
Spring Cloud Function is a library that is part of the Spring Cloud portfolio projects. This is used as part of Spring Boot applications. It gives a consistent programming model for writing applications that involve various paradigms such as request-response (HTTP), event-driven (pub-sub), stream-processing (pub-sub/streaming), reactive streams, etc. The programming model at the application level is through the Java 8 functional model - for example you can write your business logic as a java.util.function.Function<?, ?>. Spring Cloud Function is not coupled with any middleware or other such technologies.
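For example, a minimal Spring Cloud Function style application might look like the following (the uppercase name is illustrative); the same function could then be exposed over HTTP, bound to a message broker, and so on, without changing the business logic:

    import java.util.function.Function;

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;

    @SpringBootApplication
    public class UppercaseApplication {

        public static void main(String[] args) {
            SpringApplication.run(UppercaseApplication.class, args);
        }

        // Business logic expressed as a plain java.util.function.Function bean.
        @Bean
        public Function<String, String> uppercase() {
            return String::toUpperCase;
        }
    }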
Spring Cloud Stream is another Spring Cloud project, built specifically for event-driven and stream-processing use cases. Because it is a Spring Cloud project, it needs to be used as part of a Spring Boot application. Recent versions of Spring Cloud Stream are built on the foundations that Spring Cloud Function provides. It is essentially a destination-binding framework: you provide a destination, such as a Kafka topic or a RabbitMQ exchange, and Spring Cloud Stream binds those destinations for the application. The core of Spring Cloud Stream does not have any middleware dependencies; that's where the binder implementations come in.
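A sketch of what that binding looks like with the functional model (the function and destination names below are hypothetical; a binder such as Kafka or Rabbit is assumed to be on the classpath):

    import java.util.function.Consumer;

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;

    @SpringBootApplication
    public class AuditStreamApplication {

        public static void main(String[] args) {
            SpringApplication.run(AuditStreamApplication.class, args);
        }

        // Spring Cloud Stream binds this consumer's input to a middleware destination,
        // e.g. via application properties such as:
        //   spring.cloud.function.definition=audit
        //   spring.cloud.stream.bindings.audit-in-0.destination=orders
        @Bean
        public Consumer<String> audit() {
            return payload -> System.out.println("Received: " + payload);
        }
    }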
Spring Cloud Stream provides two kinds of Kafka binders: spring-cloud-stream-binder-kafka and spring-cloud-stream-binder-kafka-streams. The first is a binder implementation that provides programming-model support for writing regular Kafka producers and consumers. For the most part, you can take the same application and provide another binder (such as spring-cloud-stream-binder-rabbit) and it should work, provided the application makes the right configuration changes. This is because the binders, not the app itself, are concerned with the lower-level details of communicating with the middleware; apps can largely focus on the business logic at hand. The Kafka Streams binder in Spring Cloud Stream is a binder implementation specifically built for writing streaming applications using Kafka Streams. Both Kafka binder implementations use Spring for Apache Kafka under the hood.
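And a sketch in the Kafka Streams binder style, where the function operates on KStream types and the binder builds and starts the topology (the function name is made up and the topic bindings again come from configuration):

    import java.util.function.Function;

    import org.apache.kafka.streams.kstream.KStream;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;

    @SpringBootApplication
    public class WordLengthStreamsApplication {

        public static void main(String[] args) {
            SpringApplication.run(WordLengthStreamsApplication.class, args);
        }

        // Requires spring-cloud-stream-binder-kafka-streams on the classpath.
        @Bean
        public Function<KStream<String, String>, KStream<String, Long>> wordLength() {
            return input -> input.mapValues(value -> (long) value.length());
        }
    }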
The rabbit binder in Spring Cloud Stream uses Spring AMQP internally.
To summarize:
Spring for Apache Kafka/Spring AMQP - lower-level foundational libraries, do not require Spring Boot.
Spring Cloud Function - Spring Cloud project providing the Java 8 functional programming model. Used with Spring Boot applications.
Spring Cloud Stream - Framework for event-driven applications using Spring Cloud Function. Used with Spring Boot applications.
Spring Cloud Stream Kafka/Kafka Streams - Spring Cloud Stream binder implementation using Spring for Apache Kafka. Used with Spring Boot applications.

Implementation of Spring Boot microservice using Spring Cloud

I am a beginner in Spring MVC, Spring Boot, and Spring Data JPA. I am trying to create microservices using Spring Boot. I created a sample database CRUD operation as a microservice in Spring Boot. Now I have a requirement to develop a microservice using Spring Cloud.
When I refer to the documentation, I see Spring tools for creating applications in a distributed environment. I am confused about why we use Spring Cloud. What is actually meant by Spring Cloud? Is there any relation to Spring MVC?
Spring Cloud is for developing some of the common patterns in distributed systems.
Spring Cloud provides tools for developers to quickly build some of the common patterns in distributed systems (e.g. configuration management, service discovery, circuit breakers, intelligent routing, micro-proxy, control bus, one-time tokens, global locks, leadership election, distributed sessions, cluster state)
Spring Cloud
For Spring Boot and Spring MVC, see this nice answer: difference-between-spring-mvc-and-spring-boot
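As a small, hedged illustration of one of those patterns (service discovery plus client-side load balancing), a Spring Boot app with a discovery client on the classpath can call another service by its logical name; the "inventory-service" name below is made up and a registry such as Eureka is assumed to be running:

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.client.loadbalancer.LoadBalanced;
    import org.springframework.context.annotation.Bean;
    import org.springframework.web.client.RestTemplate;

    @SpringBootApplication
    public class CatalogApplication {

        public static void main(String[] args) {
            SpringApplication.run(CatalogApplication.class, args);
        }

        @Bean
        @LoadBalanced // resolves logical service names via the discovery client
        public RestTemplate restTemplate() {
            return new RestTemplate();
        }

        // Usage elsewhere:
        //   restTemplate.getForObject("http://inventory-service/stock/42", String.class);
    }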

spring Batch flow job Vs spring composed task

I want to execute my apps using spring-complex-task, and I have already built complex Spring Batch flow jobs which execute perfectly fine.
Could you please explain the difference between a Spring Batch flow job and a Spring composed task, and which is best among them?
A composed task within Spring Cloud Data Flow is actually built on Spring Batch in that the transition from task to task is managed by a dynamically generated Spring Batch job. This model allows the decomposition of a batch job into reusable parts that can be independently tested, deployed, and orchestrated at a level higher than a job. This allows for things like writing a single step job that is reusable across multiple workflows.
They are really complementary. You can use a composed task within Spring Cloud Data Flow to orchestrate both Spring Cloud Tasks and Spring Batch jobs (run as tasks). It really depends on how you want to slice up your process. If you have processes that are tightly coupled, package them as a single job. From there, you can orchestrate them with Spring Cloud Data Flow's composed task functionality.
In general, there's not one that's "better". It's going to be dependent on your use case and requirements.
Spring Batch is a nice framework to run batch processing applications.
Spring Cloud Task is a wrapper that allows you to run short-lived microservices using Spring Cloud along with Spring Boot. Once you set up an application with @EnableTask, it will then launch your *Runner (CommandLineRunner or ApplicationRunner) beans. The framework also comes with Spring Batch integration points, and ComposedTaskRunner helps facilitate that integration.
I'd start with the Spring Cloud Task batch documentation and then come back to ask more specific questions.
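To make that integration concrete, here is a minimal sketch (using the Spring Batch 4.x builder factories) of a Spring Batch job packaged as a Spring Cloud Task; the dependency choices and names are assumptions, and with Spring Cloud Task's batch integration on the classpath the job execution gets associated with the task execution:

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.repeat.RepeatStatus;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.task.configuration.EnableTask;
    import org.springframework.context.annotation.Bean;

    @SpringBootApplication
    @EnableTask             // records the task execution
    @EnableBatchProcessing  // enables the Spring Batch infrastructure
    public class BatchTaskApplication {

        public static void main(String[] args) {
            SpringApplication.run(BatchTaskApplication.class, args);
        }

        @Bean
        public Job sampleJob(JobBuilderFactory jobs, StepBuilderFactory steps) {
            // A single-step job; the tasklet stands in for real processing.
            Step step = steps.get("step1")
                    .tasklet((contribution, chunkContext) -> RepeatStatus.FINISHED)
                    .build();
            return jobs.get("sampleJob").start(step).build();
        }
    }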

Can Spring XD be used as a platform for comprehensive Spring batch workflows?

Spring XD provides a platform for launching batch jobs. Does that cover comprehensive workflows for all batch-job use cases? Or is it meant to be used only within the context of Spring XD use cases?
For example, someone who wants to use just Spring Batch, not necessarily all of the data-ingestion/real-time-analytics features: will they still benefit from setting up the Spring XD DIRT just to execute batch workflows? In that case, are there any limitations, i.e., batch workflows supported by Spring Batch that cannot be used?
In short, yes, it can be used as a comprehensive batch platform. Spring XD currently provides a number of compelling features, with more coming in the future. Features Spring XD provides for batch solutions:
Job orchestration - Spring Batch explicitly avoids the problem of job orchestration so that the developer can use whatever tool they want. Spring XD brings orchestration in a distributed environment via scheduling of jobs, executing ad hoc jobs, and executing jobs on the result of some form of logic (polling a directory for a file for example).
Abstraction of Spring Batch and Spring Integration - Spring Batch and Spring Integration are commonly used together in solutions that address more complex scenarios. For example, if you need to FTP a file to a server and then kick off a batch job once it's there, you'd use Spring Integration for the FTP piece and to kick off the job, with Spring Batch handling the processing of the job. Spring XD provides an elegant abstraction of those components to allow for easy assembly into more robust solutions.
Simplification of remote partitioning - Spring XD provides facilities to simplify the wiring of the communication aspects of remote partitioning within Spring Batch.
Interaction with jobs via UI, shell, or REST - Spring XD exposes a number of metrics and functions that are consumable via its web-based UI, the interactive shell, or REST-based endpoints.
The main limit as of Spring XD 1.0 for batch processing is the inability to execute nested jobs (using a JobStep). I believe this will be part of Spring XD 1.1 (https://jira.spring.io/browse/XD-1972).
Looking forward, other features I would expect in future versions of Spring XD are around high availability for jobs. Currently, if a job is deployed on a node and the node goes down, the job will be redeployed automatically. In future releases, the ability to restart the job automatically upon redeployment should become possible.