Spring cloud data flow deployment - deployment

I wanna deploy the Spring-cloud-data-flow on several hosts.
I will deploy the server of Spring-cloud-data-flow on one host-A, and deploy the agents on the other hosts(These hosts are in charge of executing the tasks).
Except the host-A, all the other hosts run the same tasks.
Shall I modify on the basis of the Spring Cloud Data Flow Local Server or on the Spring Cloud Data Flow Apache Yarn Server or other better choice?

Do you mean how the apps are deployed on several hosts? If so, the apps are deployed using the underlying deployer implementation. For instance, if it is local deployer then, each app is deployed by spawning a new process. You can scale out the number apps deployment using the count property during the stream deploy. I am not sure what do you mean by the agents here.

Related

How to use static spring cloud stream url for launching spring cloud tasks?

Platform used : Kubernetes.
I have an issue with Spring cloud stream url. I am launching my spring cloud tasks using spring cloud stream. Streams are deployed in kubernetes platform. Stream contains http-kafka as source and taskLauncerKafka as sink. I used http-kafka kubernetes service url to launch tasks. Kubernetes service url changes after each deployment which causes problem.The changes in the service name after each stream deployment is difficult to manage. I have tried enabling loadbalacer also. In that case also external ip-address changed after each stream roll-out.
I am using skipper for managing the deployments. Every time stream is deployed stream version changes which also changes stream url.
In my case , I have multiple instances from where I can launch spring-cloud task. If the stream url changes I need to make changes in the configmap of the deployment project for all instance and need to redeploy all instance.
Any solution ? I am thinking of centralised configuration management using spring-cloud-config server or zookeeper. In this case also I need to update the url. I can avoid deploying multiple instances using centralised configuration management.
Skipper server version : 2.4.1.RELEASE
Dataflow server version : 2.5.1.RELEASE
Which version of SCDF/Skipper you are running?
This looks similar to the issue https://github.com/spring-cloud/spring-cloud-skipper/issues/953 which was subsequently addressed in Skipper 2.6.0.

Possible to deploy or use several containers as one service in Google Cloud Run?

I am testing Google Cloud Run by following the official instruction:
https://cloud.google.com/run/docs/quickstarts/build-and-deploy
Is it possible to deploy or use several containers as one service in Google Cloud Run? For example: DB server container, Web server container, etc.
Short Answer NO. You can't deploy several container on the same service (as you could do with a Pod on K8S).
However, you can run several binaries in parallel on the same container -> This article has been written by a Googler that work on Cloud Run.
In addition, keep in mind
Cloud Run is a serverless product. It scales up and down (to 0) as it wants (but especially according with the traffic). If the startup duration is long and a new instance of your service is created, the query will take time to be served (and your use will wait)
You pay as you use, I means, you are billed only when HTTP requests are processed. Out of processing period, the CPU allocated to the instance is close to 0.
That implies that Cloud Run serves container that handle HTTP requests. You can't run a batch processing out of any HTTP request, in background.
Cloud Run is stateless. You have an ephemeral and in memory writable directory (/tmp) but when the instance goes down, all the data goes down. You can't run a DB server container that store data. You can interact with external services (Cloud SQL, Cloud Storage,...) but store only transient file locally
To answer your question directly, I do not think it is possible to deploy a service that has two different containers: DB server container, and Web server container. This does not include scaling (service is automatically scaled to a certain number of container instances).
However, you can deploy a container (a service) that contains multiple processes, although it might not be considered as best practices, as mentioned in this article.
Cloud Run takes a user's container and executes it on Google infrastructure, and handles the instantiation of instances (scaling) of that container, seamlessly based on parameters specified by the user.
To deploy to Cloud Run, you need to provide a container image. As the documentation points out:
A container image is a packaging format that includes your code, its packages, any needed binary dependencies, the operating system to use, and anything else needed to run your service.
In response to incoming requests, a service is automatically scaled to a certain number of container instances, each of which runs the deployed container image. Services are the main resources of Cloud Run.
Each service has a unique and permanent URL that will not change over time as you deploy new revisions to it. You can refer to the documentation for more details about the container runtime contract.
As a result of the above, Cloud Run is primarily designed to run web applications. If you are after a microservice architecture, which consists of different servers running each in unique containers, you will need to deploy multiple services. I understand that you want to use Cloud Run as database server, but perhaps you may be interested in Google's database solutions, like Cloud SQL, Datastore, BigTable or Spanner.

Spring Cloud Data Flow with Azure Event Hub limitations?

We plan to use Spring Cloud Data Flow on Azure Cloud using Azure EventHub as a messaging binder.
On Azure EventHub, there are hard limits :
100 Namespaces
10 topics per namespaces.
The Spring Cloud Azure Event Hub Stream Binder seems to be able to configure only one namespace, so how can we manage multiple namespaces?
Maybe we should use multiple binders, to have multiple instances of the Spring Cloud Azure Event Hub Stream Binder?
Does anyone have any ideas? or documentation we did not find?
Regards
RĂ©mi
Spring Cloud Data Flow and Spring Cloud Skipper support the concept of "platform accounts". Using that, you can set up multiple accounts, for each namespace or any other K8s clusters even. This opens a lot of flexibility to work around these hard limits in Azure stack.
We have a recipe on multi-platform deployments.
When deploying the streams from SCDF, you'd pick and choose the platform account (aka namespace or other configs), so automatically the deployed stream apps (with Azure binder in the classpath) would be running in different namespaces. Effectively, dodging the limits enforced in Azure.
The provenance tracking of where the apps run and the audit trail is automatically also captured in SCDF, so at any given time, you'd know who did what and in which namespace.

Conditionally launch Spring Cloud Task on a specific node of Kubernetes cluster

I am building a data pipeline for batch processing. And I find that Spring Cloud Data Flow is a quite attractive framework to use. Without much knowledge in SCDF and Kubernetes, I am not sure whether it is possible to conditionally launch a Spring Cloud Task on a specific machine.
Suppose I have two physical servers that are for running the batch process (Server A and Server B). By default, I would like my Spring cloud task to be launched on Server A. If the Server A is shut down, the task should be deployed on server B. Can Kubernetes / SCDF handle this kind of mechanism? I am wondering whether the nodeselector is the thing that I should look into.
Yes, you can pass deployment.nodeSelector as a deployment property when launching the task.
The deployment.nodeSelector is a Kubernetes deployment property and hence, you need to pass something like this:
task launch mytask --properties "deployer.<taskAppName>.kubernetes.deployment.nodeSelector=foo1:bar1,foo2:bar2"
You can check the list of supported Kubernetes deployer properties here

Is Spring Cloud Dataflow local server able to be distributed?

I have been using SCDF for a while, and realise the main diffs between XD and SCDF is XD is born to be distributed, but SCDF seems work like a platform for SC stream apps. At least local server works like so.
So my question is, is it possible that scdf local server being distributed? I see no trends on local server being distributed.
Any idea on this? thanks
As you might have known already, the local server is not intended for production deployments and for the development purpose only.
The local SCDF server is not intended to have distributed use case as there is no co-ordination service for the number of local servers running. But all the apps deployed by the local server run as separate processes.