In my case, I wrote a DAG file under the dags path. After starting the Airflow scheduler, it successfully loads the DAG file. However, it fails to load the DAG file after I change it. Is there any way to reload the DAG file without restarting the scheduler?
Your DAG should be automatically reloaded on the scheduler heartbeat and this will always be done before a DagRun is started.
It can take a while before changes to the DAG also show up in the web interface; you can manually reload the DAG in the UI by pressing the refresh button in the top right of the DAG view.
As expected, an Airflow DAG runs the last schedule when it is unpaused. For example, I have an hourly DAG that I paused at 2:50pm today and unpaused at 3:44pm, and it automatically triggered a run with a run time of 3:00pm. Is there a way I can prevent this automatic triggering when unpausing a DAG? I am currently on Airflow 2.2.3. Thanks!
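For reference, here is a minimal sketch of the kind of hourly DAG described above (the DAG id and task are made up); note that even with catchup=False the scheduler still creates the single most recent run when the DAG is unpaused, which is exactly the behaviour in question.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical hourly DAG matching the scenario above (Airflow 2.2.x syntax).
    with DAG(
        dag_id="hourly_example",
        schedule_interval="@hourly",
        start_date=datetime(2022, 1, 1),
        catchup=False,  # limits backfill to the latest interval, but does not
                        # suppress that single run when the DAG is unpaused
    ) as dag:
        BashOperator(task_id="do_work", bash_command="echo hourly work")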
I've been running into an issue where I can successfully trigger a DAG from Airflow's REST API (https://airflow.apache.org/api.html); however, the DAG instances do not run. I'm calling POST /api/experimental/dags/<dag_id>/dag_runs, where dag_id is the DAG I'm running. The only thing that happens is that the DAG run immediately returns success. When I trigger the DAG manually, I do get running DAG instances (see the picture, 2nd DAG run). Note that the 2nd DAG run fails; this should not affect the issue I am trying to fix.
[Screenshot: DAG run list]
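For reference, this is roughly how the call above can be made from Python; the host, port, and DAG id are placeholders, and authentication and the request body depend on how the webserver is configured.

    import requests

    # Trigger a DAG run through the experimental endpoint mentioned above.
    # Host, port, and dag_id are placeholders for this sketch.
    response = requests.post(
        "http://localhost:8080/api/experimental/dags/my_dag/dag_runs",
        json={"conf": {}},  # optional run configuration
    )
    response.raise_for_status()
    print(response.json())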
Fixed the issue -> I had to deal with the scheduler. I added 'depends_on_past': False and 'start_date': datetime(2019, 6, 1) and it got fixed.
DAG runs created outside the scheduler must still occur after the start_date; if there are no existing runs yet, you might want to set the schedule to @once and the start_date to a past date that you want as the execution_date of that run. This will give you a successful run (once it completes) against which other manual runs can compare themselves for depends_on_past.
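Putting the two answers above together, a sketch of the fix might look like the following; the DAG id and task are placeholders, putting the settings in default_args is an assumption (they could equally be passed to the DAG directly), and the import paths are the Airflow 1.x ones since the experimental API is being used.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

    # depends_on_past disabled and a past start_date, as in the fix above.
    default_args = {
        "depends_on_past": False,
        "start_date": datetime(2019, 6, 1),
    }

    # An '@once' schedule gives one successful scheduled run against which
    # manual or API-triggered runs can compare themselves.
    with DAG(
        dag_id="api_triggered_dag",
        default_args=default_args,
        schedule_interval="@once",
    ) as dag:
        BashOperator(task_id="do_work", bash_command="echo triggered via the REST API")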
I am using the Quartz scheduler to execute jobs. But when I try to schedule jobs for a future time, the jobs get triggered at the right time and then immediately go to the failed state without anything appearing in the scheduler logs.
I could not find the root cause for this, but the issue was solved by pointing to a freshly created Quartz database.
The reason could be that the original database had become corrupted in some way.
I'm trying to work out how to integrate the Control-M scheduler with batch jobs running within Spring XD.
In our existing environment, Control-M agents run on the host and batch jobs are triggered via a bash script from Control-M.
In the Spring XD architecture a batch job is pushed out into the XD container cluster and will run on an available container. This means, however, that I don't know which XD container the job will run on. I could pin it to a single container with a deployment manifest, but that goes against the whole point of the cluster.
One potential solution:
Run a VM outside the XD container cluster with the Control-M agent and trigger jobs through the XD API via a bash script. The script would need to wait for the job to complete, either by polling for job completion via the XD API or by waiting for an event that signals completion.
Thinking further ahead, this could also be a solution for triggering batch jobs deployed in PCF.
In a previous life, I had the enterprise scheduler use Perl scripts to interact with the old Spring Batch Admin REST API to start jobs and poll for completion.
So, yes, the same technique should work fine with XD.
You can also tap into the job events.
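To make the start-and-poll technique concrete, here is a rough Python sketch in place of the bash/Perl scripts mentioned above; the admin URL, endpoint paths, parameters, and status values are assumptions and need to be checked against the REST API documentation for your XD (or Spring Batch Admin) version.

    import time

    import requests

    XD_ADMIN = "http://xd-admin:9393"  # placeholder admin URL
    JOB_NAME = "myBatchJob"            # placeholder deployed job name

    # Launch the job (endpoint path and parameter are assumptions).
    requests.post(f"{XD_ADMIN}/jobs/executions", params={"jobname": JOB_NAME}).raise_for_status()

    # Poll until the latest execution of this job reaches a terminal state.
    while True:
        executions = requests.get(f"{XD_ADMIN}/jobs/executions").json()
        latest = next((e for e in executions if e.get("name") == JOB_NAME), None)
        status = (latest or {}).get("status")
        if status in ("COMPLETED", "FAILED", "STOPPED"):
            print(f"{JOB_NAME} finished with status: {status}")
            # The exit code tells the calling scheduler (e.g. Control-M) whether the job succeeded.
            raise SystemExit(0 if status == "COMPLETED" else 1)
        time.sleep(10)  # poll interval

The Control-M agent would run this script (or an equivalent bash one) and treat a non-zero exit code as a job failure.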
First, I schedule a job that creates a file under the project (following the example from the "On the Job: The Eclipse Jobs API" article).
The scheduling rule used is job.setRule(ResourcesPlugin.getWorkspace().getRoot()). This means the job acquires a lock on the workspace root itself, and any other operation I perform, such as "Delete" or creating a new project via the File menu, goes into the waiting state.
But why do the Eclipse "Delete" and File "New" operations block my entire UI, whereas the jobs I created only go into the waiting state when I hold the lock on the workspace root?
Can I implement my own "Delete" operation that, like any other job, goes into the waiting state instead of blocking the UI when another job with the same scheduling rule is already running?
The File New and Delete operations don't use the Jobs API but they do wait for access to the workspace so they can block the UI until it is available.
You could write New and Delete operations that use the Jobs API so that the operations run in the background.