How to launch scheduled spark jobs even if previous jobs are still executing on rundeck? - scala

Why is Rundeck not launching scheduled Spark jobs while the previous job is still executing?
Rundeck skips the runs that are scheduled to launch while the previous job is executing, and launches a new job on the schedule only after that execution completes.
But I want to launch a scheduled job even if the previous job is still executing.

Check your workflow strategy; it is explained here:
https://www.rundeck.com/blog/howto-controlling-how-and-where-rundeck-jobs-execute
You can design a workflow based on the "Parallel" strategy to launch the jobs simultaneously on your node.
For example, use the parallel strategy with a parent job.
Example jobs: Job one, Job two, and Parent Job (using the parallel strategy).

Related

ADF Scheduling when existing Job not yet finished

Having read https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-scheduling-and-execution, it is unclear to me:
If a schedule runs a job every hour,
can we stop the next run at hr+1 from executing concurrently if the run for hr+0 is still going?
It looks as if concurrency = 1 means this.
But does that invocation simply not start until the current execution has finished?
Or will it be discarded?
When concurrency is set to 1, only one instance is allowed to run at a time. When the scheduled trigger fires again and tries to run the pipeline while it is already running, the next invocation is queued and starts after the current instance finishes.
So, to answer your question: the following invocation will be queued. After the first run finishes, the next run will start.
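The difference between "queued" and "discarded" can be made concrete with a toy model. The sketch below is not an ADF API; the function name start_times, the policy names, and the use of minutes as a time unit are all assumptions made for illustration. It only models a concurrency limit of 1 with a fixed run duration.

```python
def start_times(trigger_times, duration, policy="queue"):
    """Return the start time of every run that actually executes.

    Toy model of a concurrency limit of 1 (hypothetical, not an ADF API).
    policy="queue":   a trigger that fires during a run waits for it to finish.
    policy="discard": a trigger that fires during a run is dropped.
    """
    if policy not in ("queue", "discard"):
        raise ValueError(f"unknown policy: {policy}")

    starts = []
    busy_until = float("-inf")  # time at which the current run finishes
    for fired_at in sorted(trigger_times):
        if fired_at >= busy_until:
            start = fired_at      # nothing is running, start immediately
        elif policy == "queue":
            start = busy_until    # wait for the running instance to finish
        else:
            continue              # discard this invocation
        starts.append(start)
        busy_until = start + duration
    return starts


if __name__ == "__main__":
    hourly = [0, 60, 120]  # trigger times in minutes
    print(start_times(hourly, duration=90, policy="queue"))
    print(start_times(hourly, duration=90, policy="discard"))
```

With an hourly trigger and 90-minute runs, the queueing model never loses a run but falls further behind schedule each hour, while the discarding model stays on schedule and drops every run that overlaps.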

How to queue multiple runs of same azure pipeline on one agent

My pipeline triggers on resources, schedule and merges. Sometimes these can happen almost at the same time and many pipeline runs can be created. I've noticed that the jobs that run don't always belong to the same run.
Example
one pipeline A includes 2 jobs j.1 and j.2
a resource triggers A.1 and starts j.1
another resource also triggers A.2, which queues its j.1.
A.1 finishes j.1, and instead of A.1 j.2 starting, it is A.2 j.1 that starts.
How do I lock the run so that A.1 j.1 and j.2 run to completion before A.2 starts?
On the agent, the queue is at the job level, not the pipeline level. So normally the agent is allocated to the higher-priority jobs, regardless of whether those jobs belong to the same pipeline run.
Currently, there is no method or setting to manage the order of the queued jobs.

Kubernetes concurrencyPolicy Forbid not preventing concurrent jobs

I have a backup job running, scheduled to run every 24 hours. I have the concurrency policy set to "Forbid." I am testing my backup, and I create jobs manually for testing, but these tests are not forbidding concurrent runs. I use:
kubectl create job --from=cronjob/my-backup manual-backup-(timestamp)
... and when I run them twice in close succession, I find that both begin the work.
Does the concurrency policy only apply to jobs created by the Cron job scheduler? Does it ignore manually-created jobs? If it is ignoring those, are there other ways to manually run the job such that the Cron job scheduler knows they are there?
...Does the concurrency policy only apply to jobs created by the Cron job scheduler?
concurrencyPolicy applies to the CronJob, because it influences how the CronJob starts jobs. It is part of the CronJob spec, not the Job spec.
...Does it ignore manually-created jobs?
Yes.
...ways to manually run the job such that the Cron job scheduler knows they are there?
Beware that when concurrencyPolicy is set to Forbid and the time comes for the CronJob to run a job, but it detects that a job belonging to this CronJob is still running, it counts the current attempt as missed. It is better to temporarily set the CronJob's spec.suspend to true if you manually start a job based on the CronJob and its execution time will span the next scheduled time.
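The suspend-then-run approach could be scripted roughly as follows. This is a sketch under assumptions: the helper names (suspend_patch, manual_run_commands, run_manual_job) and the 1h timeout are made up for illustration, and it relies only on kubectl patch, kubectl create job --from, and kubectl wait. Note that suspending a CronJob does not stop jobs that are already running, and that kubectl wait --for=condition=complete keeps waiting until the timeout if the job fails instead of completing.

```python
import json
import subprocess


def suspend_patch(suspended):
    """JSON patch body that sets spec.suspend on a CronJob."""
    return json.dumps({"spec": {"suspend": suspended}})


def manual_run_commands(cronjob, job_name, timeout="1h"):
    """Build the kubectl invocations for one guarded manual run (hypothetical helper)."""
    return [
        # 1. stop the CronJob from scheduling new jobs
        ["kubectl", "patch", "cronjob", cronjob, "-p", suspend_patch(True)],
        # 2. create the manual job from the CronJob's template
        ["kubectl", "create", "job", f"--from=cronjob/{cronjob}", job_name],
        # 3. block until the manual job completes (or the timeout expires)
        ["kubectl", "wait", "--for=condition=complete",
         f"job/{job_name}", f"--timeout={timeout}"],
        # 4. let the CronJob schedule again
        ["kubectl", "patch", "cronjob", cronjob, "-p", suspend_patch(False)],
    ]


def run_manual_job(cronjob, job_name):
    """Run the four steps, always resuming the CronJob afterwards."""
    suspend, create, wait, resume = manual_run_commands(cronjob, job_name)
    subprocess.run(suspend, check=True)
    try:
        subprocess.run(create, check=True)
        subprocess.run(wait, check=True)
    finally:
        subprocess.run(resume, check=True)


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for argv in manual_run_commands("my-backup", "manual-backup-test"):
        print(" ".join(argv))
```

Building the argument lists separately from executing them makes the sequence easy to review or dry-run before it touches a cluster.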

Should we unschedule Sling Jobs running within AEM after they are completed?

I am creating multiple SlingJobs on the fly using org.apache.sling.commons.scheduler.Scheduler OSGi service in AEM.
i.e. scheduler.schedule(Runnable, ScheduleOptions);
I have a requirement that these Sling Jobs run only once, so I am using Scheduler.AT(Date date, int times, long period) to build the ScheduleOptions,
passing times=1 as a parameter.
(Also, what is the period parameter?)
The Job successfully runs only once.
My question is: am I supposed to keep track of this Job by name and unschedule it using Scheduler.unschedule(String jobName) after it has finished running?
Will completed Sling Jobs that are not unscheduled consume memory in the AEM server?
Will these completed but never unscheduled jobs cause my AEM server to slow down and later require some purge activity as maintenance?
According to https://sling.apache.org/documentation/bundles/apache-sling-eventing-and-job-handling.html#scheduled-jobs
Internally the scheduled Jobs use the Commons Scheduler Service. But in addition they are persisted (by default below /var/eventing/scheduled-jobs) and survive therefore even server restarts. When the scheduled time is reached, the job is automatically added as regular Sling Job through the JobManager.
I had a problem with scheduled jobs before (they were triggered daily). When the server was restarted, the scheduled jobs were not un-persisted, and a new job doing the same action was scheduled (the job was scheduled in the @Activate method). As a result, I got several jobs doing the same action at the scheduled time, so I had to unschedule them in the @Deactivate method.
You can run an experiment to make sure that there are no duplicated jobs under /var/eventing/scheduled-jobs.

Does rundeck support jobs dependencies?

I've been searching for days for how to lay out a Rundeck workflow with job dependencies. What I need is 3 jobs: job-1 and job-2 are scheduled to run in parallel, while job-3 is triggered only after both job-1 and job-2 complete, assuming that job-1 and job-2 have different execution times.
I tried using job state conditionals to do that, but it seems that if the condition is not met, the workflow can only halt or fail. My idea is to pause the execution until all the parent jobs complete and then resume the workflow.
You can achieve this by building a master job that includes 2 steps:
step 1: a sub-job that includes both job-1 and job-2 (run in parallel if node-oriented execution is selected)
step 2: job-3
But not all 3 in the same flow.
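The two-step master job is a fan-in: run the two parents in parallel, wait for both, then run the dependent job. The sketch below only illustrates that ordering in plain Python; it does not call Rundeck, and run_job, master_job, and the sleep durations are stand-ins invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_job(name, seconds, log):
    """Stand-in for a real job: sleep for a while, then record completion."""
    time.sleep(seconds)
    log.append(name)
    return name


def master_job():
    """Step 1: job-1 and job-2 in parallel. Step 2: job-3 after both finish."""
    log = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        parents = [
            pool.submit(run_job, "job-1", 0.2, log),
            pool.submit(run_job, "job-2", 0.1, log),
        ]
        # result() blocks until the job is done and re-raises any failure,
        # so job-3 is never reached if a parent job fails.
        for parent in parents:
            parent.result()
    run_job("job-3", 0.0, log)
    return log


if __name__ == "__main__":
    print(master_job())
```

The parents may finish in either order because their execution times differ, but job-3 always comes last, which is the guarantee the master job provides.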
Right now you can use the Job State Conditional feature for that: https://docs.rundeck.com/2.9.4/plugins-user-guide/bundled-plugins.html#job-state-plugin
Rundeck cannot do this for you automatically. You can schedule job-3 to run after the latest expected finish time of job-1 and job-2. Enable "retry" for job-3 in case the dependencies fail.