is it possible to define Must Start Times for dependent jobs in AutoSys - scheduler

I'm looking for an option in autosys to send an alert if the dependant jobs don't start within 10 min of the avg run time, for example, the last 30 days. The dependant jobs, do not have a fixed starting condition. They might have run at varying times in the last 30 days based on the completion of starting jobs. Would it be possible in autosys, to dynamically set the must start time for the jobs which don't have starting time rather they are dependant on starting conditions?

I use something like this in my critical jobs. You can try them too, if you want :
must_start_times: "19:01"
days_of_week: mo,tu,we,th,fr,sa,su
start_times: "19:00"

Related

Is using "pg_sleep" in plpgsql Procedures/Functions while using multiple worker background processes concurrently bad practice?

I am running multiple background worker processes in my Postgres database. I am doing this using Pg_Cron extension. I cannot unfortunately use Pg_Timetables, as suggested by another user here.
Thus, I have 5 dependent "Jobs" that need 1 other independent Procedure/Function to execute and complete before they can start. I originally having my Cron jobs simply check every 30minutes or-so some "job_log" table I created to see if the independent Job completed (i.e. if yes, execute Procedure, if not, return out of Procedure and check at next Cron interval)
However, I believe I could simplify the way I am triggering/orchestrating all these Jobs/Procedures greatly if I utilize pg_sleep and start all the Jobs at one -time (so no more checking every 30minutes). I would be running these Jobs in the night time concurrently so I believe it shouldn't effect my actual traffic that much.
i.e.
WHILE some_variable != some_condition LOOP
PERFORM pg_sleep(1);
some_variable := some_value; -- update variable here
END LOOP;
My question is
Would starting all these Jobs at one time (i.e. Setting a concrete time in the Cron expression e.g. 15 18 * * *), and utilizing pg_sleep be bad practice/inefficient as I would be idling 5 background workers while 1 Job completes. The 1 job these are dependent on could take any amount of time to finish i.e. 15 min, 30 min, 1hr (should be < 1 hr though).
Or is better to simply just use a Cron expression to check every 5min or so if the main/independent Job is done, so my other Jobs that are dependent can then run?
Running two schedulers, one of them home-built, seems more complex than just running one scheduler that does 2 (or 6, 1+5, however you count it) different things. If your goal is to make things simpler, does your proposal really achieve that?
I wouldn't worry about 5 backends sleeping on pg_sleep at the same time. But you might worry about them holding back the xid horizon while they do so, which would make vacuuming and HOT pruning less effective. But if you already have one long-running task in one snapshot (the thing they are waiting for) then more of them aren't going to make the matter worse.

Airflow limit daily trigger

there is a "natural" ( I mean thought parameter) way to limit the number of triggering a dag (let say every 24 hours).
I don't want to schedule it, but some user can trigger the same dag multiple time, and for resources and others reason, I want it only once .
As I see "depends_on_past" depend only against the previous run, but it could be many time a day.
Thx
Not directly, but you could likely implement task_instance_mutation_hook in the first task of the DAG, it could then immediately fail the task if you check if it's been run several times the same day.
https://airflow.apache.org/docs/apache-airflow/stable/concepts/cluster-policies.html#task-instance-mutation

Run scheduler to execute jobs at an interval from the completion of the previous job

I need to create schedulers to execute jobs(class files) at specified intervals..For Now, I'm using Quartz Scheduler which triggers the jobs at defined intervals from the time of triggering of it.
For Eg: Consider I'm giving a cron expression to run for every one hour starting at morning 9.My first run will be at 9 and my second run will be at 10 and so on.
If my job is taking 20 minutes to execute then in that case this method is not that much efficient.
What I need to do is to schedule a job for every one hour from the completion time of the previously ran job
For Eg: Consider my job to run every one hour is triggered at 9 and for the first run it took 20 minutes to run, so for the next time the job should trigger only at 10:20 instead of 10 (ie., one hour from the completion of previous ran job)
I need to know whether there are any methods in Quartz Scheduling to achieve this or any other logic I need to do.
If anyone could help me out on this,it would be very helpful for me.
You can easily achieve this by job-chaining your job executions. There are various approaches you can choose from:
(1) Implement a Quartz JobListener and in its jobWasExecuted method, that is invoked by Quartz whenever a job finishes executing, re-fire your job.
(2) Look at the Quartz JobChainingJobListener that you can use to implement simple job chaining scenarios. Please note that the functionality of this listener is very limited as it does not allow you to insert delays between job executions, there is no support for conditions that must be met before target jobs are executed etc. But you can use it as a good starting point to implement (1).
(3) Use QuartzDesk (our commercial product) or any other product that allows you to create job chains while externalizing and managing all job dependencies outside of your application. A job chain can have multiple target jobs that can be executed immediately, with a fixed delay or at arbitrary time in the future produced by a JavaScript expression. It also allows you to implement somewhat more sophisticated works flows, such as firing a target job when multiple source jobs complete their execution etc. I am attaching screenshots showing you what a simple job chain that re-executes Job1 with a 1 minute delay upon Job1's completion (with any job execution status) looks like:

Is this an intelligent use case for optaPlanner?

I'm trying to clean up an enterprise BI system that currently is using a prioritized FIFO scheduling algorithm (so a priority 4 report from Tuesday will be executed before priority 4 reports from Thursday and priority 3 reports from Monday.) Additional details:
The queue is never empty, jobs are always being added
Jobs range in execution time from under a minute to upwards of 24 hours
There are 40 some odd identical app servers used to execute jobs
I think I could get optaPlanner up and running for this scenario, with hard rules around priority and some soft rules around average time in the queue. I'm new to scheduling optimization so I guess my question is what should I be looking for in this situation to decide if optaPlanner is going to help me or not?
The problem looks like a form of bin packing (and possibly job shop scheduling), which are NP-complete, so OptaPlanner will do better than a FIFO algorithm.
But is it really NP-complete? If all of these conditions are met, it might not be:
All 40 servers are identical. So running a priority report on server A instead of server B won't deliver a report faster.
All 40 servers are identical. So total duration (for a specific input set) is a constant.
Total makespan doesn't matter. So given 20 small jobs of 1 hour and 1 big job of 20 hours and 2 machines, it's fine that it takes all small jobs are done after 10 hours before the big job starts, given a total makespan of 30 hours. There's no desire to reduce the makespan to 20 hours.
"the average time in the queue" is debatable: do you care about how long the jobs are in the queue until they are started or until they are finished? If the total duration is a constant, this can be done by merely FIFO'ing the small jobs first or last (while still respecting priority of course).
There are no dependencies between jobs.
If all these conditions are met, OptaPlanner won't be able to do better than a correctly written greedy algorithm (which schedules the highest priority job that is the smallest/largest first). If any of these conditions aren't met (for example you buy 10 new servers which are faster), then OptaPlanner can do better. You just have to evaluate if it's worth spending 1 thread to figure that out.
If you use OptaPlanner, definitely take a look at real-time scheduling and daemon mode, to replan as new reports enter the system.

Use beanstalkd, for periodic tasks, how to always make a job replaced by its latest one?

I am trying to use beanstalk for queuing a large number of periodic
tasks (for example, tasks need processed every N minutes), for each
task, if the last queued job is not completed (not reserved, i mean)
when current job to be added, the last queued job should be replaced
with current job, in other words, only the latest queued job of a task
should be processed.
how can i achieve that using beanstalk?
Ideas i have got right now is:
for each task, use memcached store its latest timestamps (set this
when add jobs to queue),
every time the worker reserved a job successfully, it first checks
timestamps for this task in memcached,
if timestamps of the job is same as timestamps in memcached, then
process this job,
otherwise skip this job, and delete it from the queue.
So is there better ways to do such work? please give your suggestions,
thanks.
I found a memcache/beanstalk combination also the best solution for an implementation where I didnt want a newer but identical job entering a queue.
Until 'named jobs' are done and the software released, that may be one of the better solutions.