As expected, an Airflow DAG runs its last missed schedule when it is unpaused. For example, if I have an hourly DAG, pause it at 2:50pm today, and unpause it at 3:44pm, it automatically triggers a run for 3:00pm. Is there a way to prevent this automatic triggering when unpausing a DAG? I am currently on Airflow 2.2.3. Thanks!
We're running Airflow 1.10.12, with KubernetesExecutor and KubernetesPodOperator.
In the past few days, we've been seeing tasks getting stuck in the queued state for a long time (to be honest, unless we restart the scheduler, they remain stuck in that state), while new tasks of the same DAG are scheduled properly.
The only thing that helps is either clearing the stuck task manually or restarting the scheduler service.
We usually see it happen when we run our E2E tests, which spawn ~20 DAG runs for every one of our 3 DAGs; due to limited parallelism, some of them will be queued (which is fine by us).
These are our parallelism params in airflow.cfg
parallelism = 32
dag_concurrency = 16
max_active_runs_per_dag = 16
Two of our DAGs override max_active_runs and set it to 10, roughly as in the sketch below.
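For reference, this is roughly how that per-DAG override looks in the DAG file (a minimal sketch; the DAG id, dates, and schedule are placeholders, not our real definitions):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    # max_active_runs on the DAG object overrides max_active_runs_per_dag
    # from airflow.cfg for this one DAG only.
    dag = DAG(
        dag_id="e2e_test_dag",              # placeholder name
        start_date=datetime(2020, 1, 1),
        schedule_interval=None,             # runs are triggered by the E2E tests
        max_active_runs=10,
    )

    task = DummyOperator(task_id="noop", dag=dag)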
Any idea what could be causing it?
I am creating multiple SlingJobs on the fly using org.apache.sling.commons.scheduler.Scheduler OSGi service in AEM.
i.e. scheduler.schedule(Runnable, ScheduleOptions);
I have a requirement that these Sling jobs run only once, so I am using ScheduleOptions.AT(Date date, int times, long period) (ScheduleOptions Docs)
And passing times=1 as a parameter.
(Also, what is the period parameter?)
The Job successfully runs only once.
My question is: am I supposed to keep track of this job by name and unschedule it using Scheduler.unschedule(String jobName) after it has finished running?
Will completed Sling jobs that are not unscheduled consume memory on the AEM server?
Will these completed but never-unscheduled jobs cause my AEM server to slow down and later require some purge activity as maintenance?
According to https://sling.apache.org/documentation/bundles/apache-sling-eventing-and-job-handling.html#scheduled-jobs
Internally the scheduled Jobs use the Commons Scheduler Service. But in addition they are persisted (by default below /var/eventing/scheduled-jobs) and survive therefore even server restarts. When the scheduled time is reached, the job is automatically added as regular Sling Job through the JobManager.
I had a problem with scheduled jobs before (they were triggered on a daily basis). When the server was restarted, the persisted scheduled jobs were not removed, and a new job doing the same action was scheduled (the job was scheduled in the @Activate method). As a result, I got several jobs doing the same action at the scheduled time, so I had to unschedule them in the @Deactivate method.
You can experiment and make sure that there are no duplicated jobs under /var/eventing/scheduled-jobs.
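As a rough sketch of that pattern (the component and job names are made up, and the scheduling call mirrors the one from the question), scheduling in @Activate and cleaning up in @Deactivate could look like this:

    import java.util.Date;

    import org.apache.sling.commons.scheduler.ScheduleOptions;
    import org.apache.sling.commons.scheduler.Scheduler;
    import org.osgi.service.component.annotations.Activate;
    import org.osgi.service.component.annotations.Component;
    import org.osgi.service.component.annotations.Deactivate;
    import org.osgi.service.component.annotations.Reference;

    @Component(immediate = true)
    public class OneShotJobScheduler {

        private static final String JOB_NAME = "my-one-shot-job"; // hypothetical name

        @Reference
        private Scheduler scheduler;

        @Activate
        protected void activate() {
            // AT(date, times, period) lives on the Scheduler service and returns the
            // ScheduleOptions; with times = 1 (as in the question) the period has no
            // practical effect, since there is no second run.
            ScheduleOptions options = scheduler.AT(new Date(), 1, 1).name(JOB_NAME);
            scheduler.schedule((Runnable) () -> {
                // job logic goes here
            }, options);
        }

        @Deactivate
        protected void deactivate() {
            // Unschedule by name so a bundle restart does not leave duplicates behind.
            scheduler.unschedule(JOB_NAME);
        }
    }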
I've been running into an issue where I can successfully trigger a DAG via Airflow's REST API (https://airflow.apache.org/api.html); however, the DAG instances do not run. I'm calling POST /api/experimental/dags/dag_id/dag_runs, where dag_id is the DAG I'm running. The only thing that happens is that the DAG run immediately returns success. When I trigger the DAG manually, I get running DAG instances (see the picture, 2nd DAG run). Note the 2nd DAG run fails; this should not affect the issue I am trying to fix.
Fixed the issue: I had to deal with the scheduler. I added 'depends_on_past': False, 'start_date': datetime(2019, 6, 1) and it got fixed.
The DAG runs created outside the scheduler must still occur after the start_date; if there are no existing runs yet, you might want to set the schedule to @once and the start_date to a past date for which you want the execution_date to run. This will give you a successful run (once it completes) against which other manual runs can compare themselves for depends_on_past.
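A minimal sketch of the kind of DAG definition this describes (the dag_id and task are placeholders, and it assumes the settings above go into default_args):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    default_args = {
        "depends_on_past": False,
        "start_date": datetime(2019, 6, 1),  # a past date, so triggered runs are allowed
    }

    # "@once" gives one scheduled run against which later manual/API-triggered
    # runs can compare themselves for depends_on_past.
    dag = DAG(
        dag_id="example_dag",
        default_args=default_args,
        schedule_interval="@once",
    )

    task = DummyOperator(task_id="noop", dag=dag)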
Why is Rundeck not launching scheduled Spark jobs even if the previous job is still executing?
Rundeck is skipping the jobs that are set to launch during the execution of the previous job, and only after that execution completes does it launch a new job based on the schedule.
But I want to launch a scheduled job even if the previous job is still executing.
Check your workflow strategy; here is an explanation about that:
https://www.rundeck.com/blog/howto-controlling-how-and-where-rundeck-jobs-execute
You can design a workflow strategy based on "Parallel" to launch the jobs simultaneously on your node.
Example using the parallel strategy with a parent job.
Example jobs:
Job one, Job two and Parent Job (using parallel strategy).
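As a rough illustration only (job names are placeholders, and the exact keys should be checked against your Rundeck version), a parent job in Rundeck's YAML job definition format could look something like this, with the workflow strategy set to parallel and each child job added as a job reference step:

    - name: Parent Job
      description: Runs Job one and Job two at the same time   # hypothetical
      sequence:
        strategy: parallel      # run the job reference steps simultaneously
        keepgoing: true
        commands:
          - jobref:
              name: Job one
          - jobref:
              name: Job two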
When, for whatever reason, I delete the pod running the Job that was started by a CronJob, I immediately see a new pod being created. It is only once I have deleted something like six pods (the backoffLimit) that new ones stop being created.
Of course, if I'm actively monitoring the process, I can delete the CronJob, but what if the Pod inside the job fails when I'm not looking? I would like it not to be recreated.
How can I stop the CronJob from persisting in creating new jobs (or pods?), and wait until the next scheduled time if the current job/pod failed? Is there something similar to Jobs' backoffLimit, but for CronJobs?
Set startingDeadlineSeconds to a large value or leave it unset (the default).
At the same time, set .spec.concurrencyPolicy to Forbid, so the CronJob skips the new job run while the previously created job is still running.
If startingDeadlineSeconds is set to a large value or left unset (the default) and concurrencyPolicy is set to Forbid, the job will not be run again if it failed.
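For example, in the CronJob manifest those fields would sit roughly like this (the name, schedule, and image are placeholders; on Kubernetes 1.21+ use apiVersion batch/v1):

    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: example-cronjob            # placeholder
    spec:
      schedule: "*/10 * * * *"         # placeholder schedule
      concurrencyPolicy: Forbid        # skip the new run while the previous Job is still active
      # startingDeadlineSeconds deliberately left unset (the default), as suggested above
      jobTemplate:
        spec:
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: task
                  image: busybox       # placeholder image
                  command: ["sh", "-c", "echo hello"]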
You can add the concurrency policy field to the spec of your CronJob definition (.spec.concurrencyPolicy), but it is optional.
It specifies how to treat concurrent executions of a job that is created by this CronJob. The spec may specify only one of these three concurrency policies:
Allow (default) - The cron job allows concurrently running jobs
Forbid - The cron job does not allow concurrent runs; if it is time for a new job run and the previous job run hasn’t finished yet, the cron job skips the new job run
Replace - If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run
It is good to know that the concurrency policy applies only to the jobs created by the same CronJob.
If there are multiple CronJobs, their respective jobs are always allowed to run concurrently.
A CronJob is counted as missed if it has failed to be created at its scheduled time. For example, if concurrencyPolicy is set to Forbid and a CronJob was attempted to be scheduled when there was a previous schedule still running, then it would count as missed.
For every CronJob, the CronJob controller checks how many schedules it missed in the duration from its last scheduled time until now. If there are more than 100 missed schedules, then it does not start the job and logs the error.
You can find more information here: CronJobs and Automated Tasks.
I hope it helps.
The CronJob creates Jobs with a backoffLimit whose default value is 6 (your case), and the restart policy by default is Always.
It is better to set backoffLimit to an explicit value greater than 0, set the restart policy to Never, and set startingDeadlineSeconds to a value lower than or equal to your interval (or customize it to your needs) to control when each CronJob run may start.
Additionally, you may set concurrencyPolicy to Forbid.
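Putting those fields together, the relevant part of the CronJob spec would look roughly like this (the values are only illustrative):

    spec:
      startingDeadlineSeconds: 300     # illustrative; at most your schedule interval, per the advice above
      concurrencyPolicy: Forbid
      jobTemplate:
        spec:
          backoffLimit: 1              # illustrative; how many times a failed pod is retried
          template:
            spec:
              restartPolicy: Never     # Job pods must use Never or OnFailure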