My current application batches are developed with the Spring Batch framework, and I now need to lift and shift them to a Kubernetes platform to make them cloud native. Please help me with the following queries.
How do I achieve auto scaling (HPA) for the Spring Batch jobs?
Is Spring Batch remote partitioning the recommended approach for auto scaling on the k8s platform, and what are the best practices for this approach, e.g. how to effectively scale up and scale down?
What are the advantages of refactoring the current Spring Batch jobs onto Spring Cloud Task? Is this a best practice for cloud compliance?
Thanks
UPDATE
While choosing Spring Batch remote partitioning, should the worker containers be configured as a k8s Deployment (pods) or as k8s Jobs? Is there a recommended approach?
Do we have HPA/autoscaling for k8s Jobs?
For Spring Batch remote partitioning on the k8s platform, which is better: MessagingPartitionHandler + k8s Jobs (work queue pattern) or DeployerPartitionHandler + KubernetesTaskLauncher?
Do we have HPA/autoscaling for k8s Jobs?
No. There is a concept of Job parallelism, but it is not quite the same as HPA.
If you have a continuous stream of background processing work to run (so that work queue Jobs do not fit your case), then consider running your background workers with a ReplicaSet instead, and consider using a background processing library such as https://github.com/resque/resque.
A ReplicaSet can also be a target for Horizontal Pod Autoscalers (HPA). That is, a ReplicaSet can be auto-scaled by an HPA.
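The answer above covers the generic k8s side. On the Spring Batch side, the DeployerPartitionHandler + KubernetesTaskLauncher option mentioned in the question would look roughly like the manager-side sketch below. This is only an illustration: it assumes spring-cloud-task-batch and spring-cloud-deployer-kubernetes are on the classpath, the worker image and step name (partitioned-batch-worker, workerStep) are placeholders, and the constructor and provider class names can differ between Spring Cloud Task versions.

```java
import java.util.List;

import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.partition.PartitionHandler;
import org.springframework.cloud.deployer.resource.docker.DockerResource;
import org.springframework.cloud.deployer.spi.task.TaskLauncher;
import org.springframework.cloud.task.batch.partition.DeployerPartitionHandler;
import org.springframework.cloud.task.batch.partition.NoOpEnvironmentVariablesProvider;
import org.springframework.cloud.task.batch.partition.PassThroughCommandLineArgsProvider;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ManagerPartitionConfig {

    // Launches each partition as a short-lived worker pod through the Kubernetes
    // task launcher. Image name, step name and worker count are placeholders.
    @Bean
    public PartitionHandler partitionHandler(TaskLauncher taskLauncher, JobExplorer jobExplorer) {
        DockerResource workerImage = new DockerResource("myregistry/partitioned-batch-worker:latest");

        DeployerPartitionHandler handler =
                new DeployerPartitionHandler(taskLauncher, jobExplorer, workerImage, "workerStep");

        // Pass a worker profile to the launched pods; do not copy the manager's env vars.
        handler.setCommandLineArgsProvider(
                new PassThroughCommandLineArgsProvider(List.of("--spring.profiles.active=worker")));
        handler.setEnvironmentVariablesProvider(new NoOpEnvironmentVariablesProvider());

        // Upper bound on concurrent worker pods: this is the scale knob for this approach.
        handler.setMaxWorkers(4);
        handler.setApplicationName("partitioned-batch-worker");
        return handler;
    }
}
```

The manager's partitioned step would combine this handler with a Partitioner that splits the input into per-worker execution contexts; the worker image runs the same application with a profile that executes only the requested workerStep.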
Related
Did anyone use the MuleSoft Batch process on Runtime Fabric on Azure/AWS? How was your experience with that implementation? Any best practices? I am trying to work on an example where we need to push 100 million messages to Cosmos, and the solution is supposed to be deployed on RTF on Azure. The Batch process supports persistent queues, but I don't see any settings that would allow configuring external queues for persistence, as pods may crash and the persistent files will be lost.
Are there any alternatives to a Batch job for this? Parallel For Each works as a splitter and aggregator, but it is not efficient.
Any suggestions are appreciated.
Persistent queues are a feature only of Anypoint Platform CloudHub deployments and are not available for Anypoint Runtime Fabric. Even in CloudHub they are not guaranteed to provide reliability for Mule batch processes (see this KB article for more information). Assume that a crash will restart the worker or pod, respectively, and that the batch queues and stores may be lost.
I am looking into deploying a Flink job on Kubernetes. Looking through the documentation, I'm having a hard time working out the best practices for how to deploy the job, specifically when the job has to maintain state.
There are two main points regarding this job:
It is a streaming job dealing with unbounded data (a never-ending stream).
It keeps and uses state that needs to be maintained across different job versions.
Currently, we are running on Hadoop. There it is quite easy when you want to deploy a new version of the job and keep the state. The steps are: cancel the job with a savepoint, then deploy the new job and point it to that savepoint.
Kubernetes:
Based on the definitions, it seems that for our use case a Job Cluster is the best fit for the requirements. There will only be one job running on this cluster.
The issue with the Kubernetes setup is that the savepoint location needs to be added as an argument to the Deployment. If a pod is taken offline, it will restart the application with the original savepoint from the Deployment. Specifically, this will reset the Kafka offsets to whenever the job was deployed and reprocess a lot of data.
In addition to that, how would I go about cancelling a job with a savepoint when running on a Job cluster from something like CI/CD? Would I need to create another deployer pod and use the REST API?
What is the best practice for deploying a stateful Flink job on Kubernetes and upgrading it without losing the state?
Does anyone here have experience with batch processing (e.g. Spring Batch) on Kubernetes? Is it a good idea? How do we prevent batch processing from processing the same data if we use the Kubernetes auto scaling feature? Thank you.
Does anyone here have experience with batch processing (e.g. Spring Batch) on Kubernetes? Is it a good idea?
For Spring Batch, we (the Spring Batch team) do have some experience on the matter, which we share in the following talks:
Cloud Native Batch Processing on Kubernetes, by Michael Minella
Spring Batch on Kubernetes, by me.
Running batch jobs on Kubernetes can be tricky:
pods may be re-scheduled by k8s on different nodes in the middle of processing
cron jobs might be triggered twice
etc
This requires additional non-trivial work on the developer's side to make sure the batch application is fault-tolerant (resilient to node failure, pod re-scheduling, etc) and safe against duplicate job execution in a clustered environment.
Spring Batch takes care of this additional work for you and can be a good choice to run batch workloads on k8s for several reasons:
Cost efficiency: Spring Batch jobs maintain their state in an external database, which makes it possible to restart them from the last save point in case of job/node failure or pod re-scheduling
Robustness: Safe against duplicate job executions thanks to a centralized job repository
Fault-tolerance: Retry/skip failed items in case of transient errors, such as a call to a web service that is temporarily down or is being re-scheduled in a cloud environment (see the sketch below)
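To give a flavour of the fault-tolerance point, here is a minimal sketch of a chunk-oriented step with retry/skip configured using the standard Spring Batch Java DSL (Spring Batch 4.x style). The reader/writer beans and the exception type are placeholders for whatever your job actually uses.

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.ResourceAccessException;

@Configuration
public class FaultTolerantStepConfig {

    // A chunk-oriented step that retries transient failures (e.g. a temporarily
    // unavailable web service) and skips items that keep failing, instead of
    // failing the whole job.
    @Bean
    public Step itemProcessingStep(StepBuilderFactory steps,
                                   ItemReader<String> reader,
                                   ItemWriter<String> writer) {
        return steps.get("itemProcessingStep")
                .<String, String>chunk(100)
                .reader(reader)
                .writer(writer)
                .faultTolerant()
                .retry(ResourceAccessException.class)   // transient network error: retry the item
                .retryLimit(3)
                .skip(ResourceAccessException.class)    // still failing after retries: skip the item
                .skipLimit(10)
                .build();
    }
}
```

Restart-from-last-checkpoint behaviour is not shown here; it comes from the job repository itself, as described in the first bullet.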
I wrote a blog post in which I explain all these aspects in detail with code examples. You can find it here: Spring Batch on Kubernetes: Efficient batch processing at scale.
How do we prevent batch processing from processing the same data if we use the Kubernetes auto scaling feature?
Making each job process a different data set is the way to go (a job per file, for example). There are different patterns that you might be interested in; see Job Patterns in the k8s docs.
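To make the "a job per file" pattern concrete on the Spring Batch side, here is a minimal sketch: the file path is passed as an identifying job parameter, so each k8s Job (or pod) launches one Spring Batch job instance per file, and the job repository refuses to re-run an instance that has already completed, which is what protects you against processing the same data twice under auto scaling. The parameter name input.file and the launcher wiring are illustrative assumptions, not part of the original answer.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class JobPerFileConfig {

    // Step-scoped reader bound to the 'input.file' job parameter, so every job
    // instance reads exactly one file.
    @Bean
    @StepScope
    public FlatFileItemReader<String> reader(@Value("#{jobParameters['input.file']}") String inputFile) {
        return new FlatFileItemReaderBuilder<String>()
                .name("fileReader")
                .resource(new FileSystemResource(inputFile))
                .lineMapper((line, lineNumber) -> line)   // pass lines through as-is for this sketch
                .build();
    }

    // Launch one job instance per file. 'input.file' is an identifying parameter,
    // so launching it again for an already completed file fails fast instead of
    // re-processing the same data.
    public static void runForFile(JobLauncher jobLauncher, Job job, String file) throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addString("input.file", file)
                .toJobParameters();
        jobLauncher.run(job, params);
    }
}
```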
I'm setting up new Spring Batch jobs and want to deploy them using SCDF. However, I have found that SCDF does not support the scheduling feature on the local platform.
I have 3 questions to ask you:
Can someone explain how the SCDF scheduler works?
Is there any way to schedule a job using SCDF?
Can I use my local server as a Cloud Foundry, and how?
Correct: Spring Cloud Data Flow does not support scheduling on the local platform. Please note that the local SCDF server is for development purposes only, and by design the scheduling support is intended to rely on the platform. Hence, the SCDF scheduling feature is supported on Cloud Foundry and Kubernetes using the CF and K8s schedulers.
1) Can someone explain how the SCDF scheduler works?
Sure. Similar to how the deployer is used for launching tasks and deploying streams, there is an SPI for scheduling tasks under the spring-cloud-deployer project, which the underlying scheduler implementations implement. Currently, we have CF and K8s scheduler implementations in spring-cloud-deployer-cloudfoundry and spring-cloud-deployer-kubernetes.
As a user, you can configure a schedule for a task (batch) application (via the SCDF Dashboard, shell, etc.) by specifying a cron expression. Once configured, SCDF delegates the schedule request to the platform scheduler using the above-mentioned scheduler implementations. From then on, it is the platform (the PCF scheduler on CF, the K8s scheduler on K8s) that takes care of running the task according to the schedule.
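For orientation only, the scheduling SPI mentioned above is a small Java contract that the CF and K8s deployer modules implement and that SCDF delegates to. The sketch below is an approximation of its shape, not the exact API: the real interface lives in the spring-cloud-deployer project (formerly spring-cloud-scheduler-spi), names and payload types vary by version, and the nested records here are simplified placeholders.

```java
import java.util.List;
import java.util.Map;

// Approximate, simplified shape of the scheduling SPI that SCDF delegates to.
// The real contract lives in the spring-cloud-deployer project (formerly
// spring-cloud-scheduler-spi); names and payload types here are illustrative.
public interface Scheduler {

    // Create a schedule for a task; the schedule properties carry the cron expression,
    // and on Kubernetes the implementation materialises it as a platform-level cron resource.
    void schedule(ScheduleRequest scheduleRequest);

    // Remove a previously created schedule by name.
    void unschedule(String scheduleName);

    // List the schedules known for a given task definition.
    List<ScheduleInfo> list(String taskDefinitionName);

    // Simplified placeholder payloads for this sketch only.
    record ScheduleRequest(String scheduleName, String taskDefinitionName, Map<String, String> scheduleProperties) {}

    record ScheduleInfo(String scheduleName, String taskDefinitionName, Map<String, String> scheduleProperties) {}
}
```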
2) Is there any way to schedule a job using SCDF?
Yes, as per the answer to 1) above.
3) Can I use my local server as a Cloud Foundry, and how?
To run SCDF locally pointing to a CF instance, you can set the necessary CF deployer properties and start the SCDF server instance. It is similar to how you configure multiple platforms in the SCDF server. You can find more documentation on this here.
Is it good practice to set up Elasticsearch, Logstash, and Kibana on 3 different servers, with each server having 8 GB of RAM?
Or
Should we set up ELK on a single machine with more memory (16 GB)?
The machine needs to be highly available.
Can anyone suggest or share inputs?
It depends on your task and situation. Normally it is good practice to set up Elasticsearch, Logstash, and Kibana on 3 different servers. If your data volume is higher, you will have to build an Elasticsearch cluster, and you may need more than one Logstash server.
Filebeat will run on all the data (log) servers.
There is an example of handling 25,000 log events per second:
https://engineering.viki.com/blog/2015/log-processing-at-scale-elk-cluster-at-25k-events-per-second/
It's slightly more complicated than explained here.
Any distributed component tries to offer its features in a sharded or partitioned way. Similarly, Elasticsearch in the ELK stack is based on a master/slave model and keeps the data on ES data nodes. This means one needs to set up a cluster of nodes for Elasticsearch itself, with its various node roles such as ES master, ES data, and ES client.
The next level is when the system is required to be production grade, which calls for a multi-master setup with a minimum of 3 master nodes.
And that is only the beginning of an ELK setup.
If one needs to run such a complex system with limited resources, then containerizing the ELK components and running them in a container orchestration framework is the recommended option. Kubernetes and Docker Swarm are the options for running an ELK cluster based on Dockerized instances of ELK. These orchestration frameworks also require a multi-master setup, but that is a fair trade-off, as one would have many more components in a cloud environment and all of them can be managed by the same orchestration framework.