External Kafka Stream Source in Spring Cloud Data Flow

I am migrating from StreamSets to Spring Cloud Data Flow. While looking through the module list, I realized that some sources are not listed in Spring Cloud Data Flow - one of them is the Kafka source.
My question is: why is the external Kafka source missing from the standard sources list in Spring Cloud Data Flow?

It is not that it has been removed; rather, it does not exist yet. See https://github.com/spring-cloud/stream-applications/issues/265

Related

Is there an integration between Spring Batch and Spring Cloud Stream?

My project has a lot of Spring Batch jobs.
I have a requirement to create externalized configuration for message brokers (for example Kafka, RabbitMQ, etc.).
I want to use Spring Cloud Stream, since it provides various binders that solve this problem.
Hence I wanted to understand whether there is an integration between the two frameworks; please explain.

Difference between Spring Cloud Kafka Streams and Spring Cloud Stream?

What's the difference between Spring Cloud Kafka Streams, Spring Cloud Stream, Spring Cloud Function, Spring AMQP, and Spring for Apache Kafka?
Spring for Apache Kafka and Spring AMQP are foundational libraries for writing Spring-friendly applications for Apache Kafka and AMQP, respectively. They provide design patterns such as templates, message listener containers, and a wide array of other mechanisms for interacting with the middleware at a lower level. These libraries do not require Spring Boot; Spring Framework is the least common denominator. In other words, you can write a traditional Spring application, using only Spring Framework contexts, with these libraries.
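To make that concrete, here is a minimal sketch of the lower-level programming model in Spring for Apache Kafka. The topic name, group id, and class names are illustrative, and it assumes a KafkaTemplate bean and a listener container factory have been configured (Spring Boot auto-configures both, but plain Spring with @EnableKafka works too):

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Component;

    @Component
    public class OrderMessaging {

        private final KafkaTemplate<String, String> template;

        public OrderMessaging(KafkaTemplate<String, String> template) {
            this.template = template;
        }

        public void publish(String order) {
            // KafkaTemplate is the template-style abstraction mentioned above;
            // "orders" is an illustrative topic name.
            template.send("orders", order);
        }

        @KafkaListener(topics = "orders", groupId = "order-consumers")
        public void listen(String order) {
            // A message listener container invokes this method for each record.
            System.out.println("Received: " + order);
        }
    }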
Spring Cloud Function is a library that is part of the Spring Cloud portfolio of projects. It is used as part of Spring Boot applications. It gives a consistent programming model for writing applications that involve various paradigms such as request-response (HTTP), event-driven (pub-sub), stream-processing (pub-sub/streaming), reactive streams, etc. The programming model at the application level is the Java 8 functional model - for example, you can write your business logic as a java.util.function.Function<?, ?>. Spring Cloud Function is not coupled to any middleware or other such technologies.
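For illustration, business logic in Spring Cloud Function is just a plain function bean; the class and bean names below are invented:

    import java.util.function.Function;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class UppercaseConfiguration {

        @Bean
        public Function<String, String> uppercase() {
            // Depending on the adapter, the same function can be exposed
            // over HTTP, triggered by events, or bound to a message broker.
            return String::toUpperCase;
        }
    }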
Spring Cloud Stream is another Spring Cloud project, specifically built for event-driven and stream-processing use cases. Because it is a Spring Cloud project, it must be used as part of a Spring Boot application. Recent versions of Spring Cloud Stream are built on the foundations that Spring Cloud Function provides. It is essentially a destination-binding framework that allows you to provide a destination - such as a Kafka topic or a RabbitMQ exchange - and Spring Cloud Stream will bind those destinations for the application. The core of Spring Cloud Stream does not have any middleware dependencies; that's where the binder implementations come in.
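As a sketch of the destination-binding idea (the application and bean names are invented; under the functional model, a Consumer bean named log is bound as log-in-0):

    import java.util.function.Consumer;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;

    @SpringBootApplication
    public class LoggingStreamApplication {

        public static void main(String[] args) {
            SpringApplication.run(LoggingStreamApplication.class, args);
        }

        @Bean
        public Consumer<String> log() {
            // The binder on the classpath (Kafka, Rabbit, ...) delivers
            // messages from the bound destination to this consumer, e.g.:
            //   spring.cloud.stream.bindings.log-in-0.destination=my-topic
            return payload -> System.out.println("Received: " + payload);
        }
    }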
Spring Cloud Stream provides two kinds of Kafka binders - spring-cloud-stream-binder-kafka and spring-cloud-stream-binder-kafka-streams. The first is a binder implementation that provides programming-model support for writing regular Kafka producers and consumers. For the most part, you can take the same application and provide another binder (such as spring-cloud-stream-binder-rabbit), and it should work, provided the application makes the right configuration changes. This is because the binders, not the app itself, are concerned with the lower-level details of communicating with the middleware; apps can largely focus on the business logic at hand. The Kafka Streams binder in Spring Cloud Stream is a binder implementation specifically built for writing streaming applications using Kafka Streams. Both Kafka binder implementations use Spring for Apache Kafka under the hood.
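An illustrative sketch of the Kafka Streams binder's programming model (names invented; assumes spring-cloud-stream-binder-kafka-streams is on the classpath) - the function operates on KStream types rather than individual messages:

    import java.util.function.Function;
    import org.apache.kafka.streams.kstream.KStream;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class WordFilterConfiguration {

        @Bean
        public Function<KStream<String, String>, KStream<String, String>> filterLong() {
            // A continuous stream-processing pipeline rather than a
            // per-message handler; joins, windows, and state stores are
            // available through the Kafka Streams DSL.
            return input -> input.filter((key, value) -> value.length() > 10);
        }
    }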
The rabbit binder in Spring Cloud Stream uses Spring AMQP internally.
To summarize:
Spring for Apache Kafka / Spring AMQP - lower-level foundational libraries; do not require Spring Boot.
Spring Cloud Function - Spring Cloud project providing the Java 8 functional programming model; used with Spring Boot applications.
Spring Cloud Stream - framework for event-driven applications built on Spring Cloud Function; used with Spring Boot applications.
Spring Cloud Stream Kafka / Kafka Streams binders - Spring Cloud Stream binder implementations using Spring for Apache Kafka; used with Spring Boot applications.

Configuring Spring Cloud Stream in Camden.SR5 with Spring Boot 1.5.1

First off, thanks to the Spring team for all their work pushing this project forward!
Now that Camden.SR5 is official, I have some questions about how to correctly configure the Spring Cloud Stream Kafka binder when using Spring Boot 1.5.1.
Spring Boot 1.5.1 has auto-configuration for Kafka, and those configuration options seem to be redundant with those in the Spring Cloud Stream Kafka binder.
Do we use the core Spring Boot properties (spring.kafka.*) or the binder properties (spring.cloud.stream.kafka.binder.*)?
I did find this issue, but I am curious whether this work will be included in the next Camden release:
https://github.com/spring-cloud/spring-cloud-stream-binder-kafka/issues/73
Additionally, I saw this issue reported on Stack Overflow, and I believe it will also be an issue with Camden.SR5:
Failed to start bean 'inputBindingLifecycle' when using spring-boot:1.5.1 and spring-cloud-stream
Thanks
Supporting the Boot 1.5 configuration options is an issue in progress. Since dedicated 1.5 support is coming only with the Spring Cloud Stream Chelsea release train (which is included in the Dalston release of Spring Cloud), it will be available only there.
Also, when using Spring Cloud Camden with Boot 1.5, you will need to override the Kafka dependencies as described in "Failed to start bean 'inputBindingLifecycle' when using spring-boot:1.5.1 and spring-cloud-stream". This will be avoided in future versions of Spring Cloud Stream (and Spring Cloud), but only starting with the Chelsea release train of Spring Cloud Stream (and the Dalston release of Spring Cloud) - see https://github.com/spring-cloud/spring-cloud-stream-binder-kafka/issues/88 for details.

Spring Cloud components confusion

How do these Spring components relate to and differ from each other? What does each represent conceptually? Would one use them together, or are they competing projects?
Spring Cloud Data Flow
Spring Cloud Stream
Spring Cloud Task
Spring Cloud Task App Starters
Spring Batch
From my understanding, SC Tasks are just "units of work" to execute - processing units in the form of short-lived, task-based microservices - and SC Data Flow is the orchestration layer for the tasks. These two I (think I) understand, both in how they relate and what they represent conceptually, but a lot of documentation and examples discuss the other projects in the same context.
I also thought that SC Task was a replacement for Spring Batch, but some examples seem to imply that Spring Batch jobs are executed inside SC Tasks.
Thanks for your interest in Spring Cloud projects! Below are high-level introductions to the primary projects in the Spring Cloud Data Flow (SCDF) ecosystem. The launch blog covers the backstory, among other details.
Spring Cloud Stream is a lightweight event-driven microservices framework for quickly building applications that can connect to external systems (e.g. Kafka, Cassandra, MySQL, Hadoop, ...).
Spring Cloud Task is a short-lived microservices framework for quickly building applications that perform finite amounts of data processing (e.g. batch jobs). The connection with the Spring Batch framework is explained in the launch blog linked above.
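As an illustrative sketch (class and bean names invented), a minimal Spring Cloud Task is a Boot application where @EnableTask records the run's start, end, and exit code, and the finite work happens in a CommandLineRunner (a Spring Batch job could run here instead):

    import org.springframework.boot.CommandLineRunner;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.task.configuration.EnableTask;
    import org.springframework.context.annotation.Bean;

    @EnableTask
    @SpringBootApplication
    public class SimpleTaskApplication {

        public static void main(String[] args) {
            SpringApplication.run(SimpleTaskApplication.class, args);
        }

        @Bean
        public CommandLineRunner work() {
            // Runs once and exits - the "short-lived" part.
            return args -> System.out.println("Task completed.");
        }
    }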
Spring Cloud Data Flow provides the orchestration mechanics to deploy applications built with the Spring Cloud Stream and Spring Cloud Task programming models to a variety of runtime platforms, including Cloud Foundry, Apache YARN, Apache Mesos, and Kubernetes. There are community-developed SCDF implementations for OpenShift and Nomad, too. More details here.
The building blocks visual from the project site should cover the high-level interaction between the various projects in SCDF's ecosystem.

Is it possible to use Kafka with Google Cloud Dataflow?

I have two questions.
1) I want to use Kafka with a Google Cloud Dataflow pipeline program. In my pipeline program I want to read data from Kafka; is that possible?
2) I created an instance with BigQuery enabled; now I want to enable Pub/Sub. How can I do that?
(1) As mentioned by Raghu, support for writing to/reading from Kafka was added to Apache Beam in mid-2016 with the KafkaIO package. You can check the package's documentation [1] to see how to use it.
(2) I'm not quite sure what you mean. Can you provide more details?
[1] https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.html
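For example, a minimal (illustrative) Beam pipeline that reads from Kafka with KafkaIO might look like the following; the broker address and topic name are placeholders:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.kafka.common.serialization.LongDeserializer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class KafkaReadPipeline {
        public static void main(String[] args) {
            Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

            pipeline.apply(KafkaIO.<Long, String>read()
                    .withBootstrapServers("broker-1:9092")   // placeholder broker
                    .withTopic("my-topic")                   // placeholder topic
                    .withKeyDeserializer(LongDeserializer.class)
                    .withValueDeserializer(StringDeserializer.class)
                    .withoutMetadata());                     // keep only KV<Long, String>

            pipeline.run();
        }
    }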
Kafka support was added to Dataflow (and Apache Beam) in mid-2016. You can read from and write to Kafka in streaming pipelines. See the JavaDoc for KafkaIO in Apache Beam.
(2) As of April 27, 2015, you can enable the Cloud Pub/Sub API as follows:
Go to your project page on the Developer Console
Click APIs & auth -> APIs
Click More within Google Cloud APIs
Click Cloud Pub/Sub API
Click Enable API