Configuring Quartz Jobs to run on specific machine - quartz-scheduler

I am using quartz 2.2.0. I want to configure IP address with each job and want to check that each job runs on the machine configured to it.
I can check it manually while starting the job on server startup, but do we have any configuration parameter to configure the job like this? so that I can reduce coding.

Related

How to run MirrorMaker 2.0 in production?

From the documentation, I see MirrorMaker 2.0 like this on the command line -
./bin/connect-mirror-maker.sh mm2.properties
In my case I can go to an EC2 instance and enter this command.
But what is the correct practice if this needs to run for many days. It is possible that EC2 instance can get terminated etc.
Therefore, I am trying to find out what are best practices if you need to run MirrorMaker 2.0 for a long period and want to make sure that it is able to stay up and running without any manual intervention.
You have many options, these include:
Add it as a service to systemd. Then you can specify that it should be started automatically and restarted on failure. systemd is very common now, but if you're not running systemd, there are many other process managers. https://superuser.com/a/687188/80826.
Running in a Docker container, where you can specify a restart policy.

Conditionally launch Spring Cloud Task on a specific node of Kubernetes cluster

I am building a data pipeline for batch processing. And I find that Spring Cloud Data Flow is a quite attractive framework to use. Without much knowledge in SCDF and Kubernetes, I am not sure whether it is possible to conditionally launch a Spring Cloud Task on a specific machine.
Suppose I have two physical servers that are for running the batch process (Server A and Server B). By default, I would like my Spring cloud task to be launched on Server A. If the Server A is shut down, the task should be deployed on server B. Can Kubernetes / SCDF handle this kind of mechanism? I am wondering whether the nodeselector is the thing that I should look into.
Yes, you can pass deployment.nodeSelector as a deployment property when launching the task.
The deployment.nodeSelector is a Kubernetes deployment property and hence, you need to pass something like this:
task launch mytask --properties "deployer.<taskAppName>.kubernetes.deployment.nodeSelector=foo1:bar1,foo2:bar2"
You can check the list of supported Kubernetes deployer properties here

How to set scheduler for Spring Batch jobs in Spring Cloud Data Flow?

I’m setting up a new Spring Batch Jobs and want to deploy it using SCDF. However, I have found that SCDF does not support scheduler feature in local framework.
I have 3 questions to ask you:
Can someone explain how scheduler of SCDF work?
Are there any ways to schedule 1 job using SCDF?
Can I use my local server as a Cloud Foundry? and how?
Yes, Spring Cloud Data Flow does not support scheduling on local platform. Please note that the local SCDF server is for development purposes only and by design, the scheduling support is intended to be relying on the platform. Hence, SCDF scheduling feature is supported on Cloud Foundry and Kubernetes using the CF and K8s schedulers.
1) Can s/o explain how scheduler of SCDF work?
sure, Similar to how the deployer is used for launching task/deploying the stream, there is an SPI for scheduling the tasks under spring-cloud-deployer project. The underlying scheduler implementations can implement this. Currently, we have CF and K8s scheduler implementations in spring-cloud-deployer-cloudfoundry and spring-cloud-deployer-kubernetes.
As a user, you can configure a scheduler for a task (batch) application (via SCDF Dashboard, shell etc.,). You can specify a cron expression to schedule the task. Once configured, the SCDF delegates the schedule request to the platform scheduler using the above-mentioned scheduler implementations. Once scheduled, it is the platform (PCF scheduler on CF, K8s scheduler on K8s) that takes care of the task using the schedule.
2) Are there any ways to schedule 1 job using SCDF?
Yes, based on the answer from 1
3) Can I use my local server as a cloud Foundry? and How?
To run SCDF on local pointing to the CF instance, you can set the necessary CF deployer properties and start the SCDF server instance. It is similar to how you configure multi platforms in SCDF server. You can find more documentation on this here.

AWS Fargate vs Batch vs ECS for a once a day batch process

I have a batch process, written in PHP and embedded in a Docker container. Basically, it loads data from several webservices, do some computation on data (during ~1h), and post computed data to an other webservice, then the container exit (with a return code of 0 if OK, 1 if failure somewhere on the process). During the process, some logs are written on STDOUT or STDERR. The batch must be triggered once a day.
I was wondering what is the best AWS service to use to schedule, execute, and monitor my batch process :
at the very begining, I used a EC2 machine with a crontab : no high-availibilty function here, so I decided to switch to a more PaaS approach.
then, I was using Elastic Beanstalk for Docker, with a non-functional Webserver (only to reply to the Healthcheck), and a Crontab inside the container to wake-up my batch command once a day. With autoscalling rule min=1 max=1, I have HA (if the container crash or if the VM crash, it is restarted by AWS)
but now, to be more efficient, I decided to move to some ECS service, and have an approach where I do not need to have EC2 instances awake 23/24 for nothing. So I tried Fargate.
with Fargate I defined my task (Fargate type, not the EC2 type), and configure everything on it.
I create a Cluster to run my task : I can run "by hand, one time" my task, so I know every settings are corrects.
Now, going deeper in Fargate, I want to have my task executed once a day.
It seems to work fine when I used the Scheduled Task feature of ECS : the container start on time, the process run, then the container stop. But CloudWatch is missing some metrics : CPUReservation and CPUUtilization are not reported. Also, there is no way to know if the batch quit with exit code 0 or 1 (all execution stopped with status "STOPPED"). So i Cant send a CloudWatch alarm if the container execution failed.
I use the "Services" feature of Fargate, but it cant handle a batch process, because the container is started every time it stops. This is normal, because the container do not have any daemon. There is no way to schedule a service. I want my container to be active only when it needs to work (once a day during at max 1h). But the missing metrics are correctly reported in CloudWatch.
Here are my questions : what are the best suitable AWS managed services to trigger a container once a day, let it run to do its task, and have reporting facility to track execution (CPU usage, batch duration), including alarm (SNS) when task failed ?
We had the same issue with identifying failed jobs. I propose you take a look into AWS Batch where logs for FAILED jobs are available in CloudWatch Logs; Take a look here.
One more thing you should consider is total cost of ownership of whatever solution you choose eventually. Fargate, in this regard, is quite expensive.
may be too late for your projects but still I thought it could benefit others.
Have you had a look at AWS Step Functions? It is possible to define a workflow and start tasks on ECS/Fargate (or jobs on EKS for that matter), wait for the results and raise alarms/send emails...

Jenkins trigger job by another which are running on offline node

Is there any way to do the following:
I have 2 jobs. One job on offline node has to trigger the second one. Are there any plugins in Jenkins that can do this. I know that TeamCity has a way of achieving this, but I think that Jenkins is more constrictive
When you configure your node, you can set Availability to Take this slave on-line when in demand and off-line when idle.
Set Usage as Leave this machine for tied jobs only
Finally, configure the job to be executed only on that node.
This way, when the job goes to queue and cannot execute (because the node is offline), Jenkins will try to bring this node online. After the job is finished, the node will go back to offline.
This of course relies on the fact that Jenkins is configured to be able to start this node.
One instance will always be turn on, on which the main job can be run. And have created the job which will look in DB and if in the DB no running instances, it will prepare one node. And the third job after running tests will clean up my environment.