Is there any way to configure custom trigger rules in Airflow?
Find below a sample case of my problem statement:
Let's suppose I have 4 tasks with the below lineage (dependencies):
[Task1, Task2, Task3] >> Task4
Here I want to trigger Task4 only when Task1 has failed and Task2 and Task3 have succeeded.
I am aware of the trigger_rule options provided by Airflow, but I did not find any rule that handles this scenario. Is there any workaround or solution?
There is no existing rule to do that, but there are some simple solutions to achieve your goal:
Add an EmptyOperator (DummyOperator in older Airflow versions) task between Task1 and Task4 with the trigger rule all_failed:
Task1 >> Empty
[Empty, Task2, Task3] >> Task4
In this case Empty will not be triggered if Task1 succeeds, so Task4 (which keeps the default all_success rule) only runs when Task1 fails and Task2 and Task3 succeed.
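A minimal sketch of that layout (assuming Airflow 2.4+ so EmptyOperator and the schedule argument are available; the Task1..Task4 operators here are placeholders for your real tasks):
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.utils.trigger_rule import TriggerRule

with DAG("failed_then_succeed_example", start_date=datetime(2023, 1, 1), schedule=None) as dag:
    # Placeholders for your real Task1..Task4 operators.
    task1 = EmptyOperator(task_id="Task1")
    task2 = EmptyOperator(task_id="Task2")
    task3 = EmptyOperator(task_id="Task3")
    task4 = EmptyOperator(task_id="Task4")  # keeps the default all_success rule

    # Runs only when Task1 fails.
    empty = EmptyOperator(task_id="Empty", trigger_rule=TriggerRule.ALL_FAILED)

    task1 >> empty
    [empty, task2, task3] >> task4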
Add a ShortCircuitOperator task before Task4 to check the state of the other tasks:
You can check the state of the three tasks with a small Python callable, then decide whether Task4 should run. (In Airflow 2.x the context is passed to the callable automatically, so provide_context is no longer needed.)
def get_state(task_id, **context):
    return context["dag_run"].get_task_instance(task_id).state

def check_logic(**context):
    return (get_state("Task1", **context) == "failed"
            and get_state("Task2", **context) == "success"
            and get_state("Task3", **context) == "success")

check = ShortCircuitOperator(
    task_id="check_logic",
    python_callable=check_logic,
    trigger_rule="all_done",  # run the check even when Task1 has failed
)
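Assuming check is the operator above and task1..task4 are the operators from the question, one possible wiring is the following sketch; if check_logic returns False, the ShortCircuitOperator skips Task4, otherwise Task4 runs:
[task1, task2, task3] >> check >> task4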
I call an API and perform some actions based on the response.
let test = apiPublisher
    .subscribe(...)
    .receive(...)
    .share()

test
    .sink {
        // do task1
    }.store(...)

test
    .sink {
        // do task2
    }.store(...)

test
    .sink {
        // do task3
    }.store(...)
Now how can I execute task1, task2, and task3 one after another? I know I can put all the code in one sink block, but for code readability I'm using the share() operator.
Code put in sinks needs to be independent. If you want the tasks to depend on each other (one should not start until the other finishes), then you can't put them in sinks.
You will have to put each task in its own Publisher. That way the system will know when each is finished and you can concat them.
test.task1
    .append(test.task2)
    .append(test.task3)
    .sink { }
    .store(...)
I'm assuming that each task needs something from test in order to perform its side effect. Also each task needs to emit a Void event before completing.
I want to launch 10 OS subprocesses with asyncio. I can do that with gather, for example, and then find out the status of each task at the end of the event loop. But I have to wait for the whole thing to finish, even though each task runs concurrently.
Is there a way to know that subprocess 1 has already finished and react to that event, even before the other 9 tasks have completed?
I am working with Python >3.7 (3.8.6 and 3.9.1).
Maybe my question should be: once the event loop is running, is there a way to find out the status of the tasks that are running?
Or is it expected that the task itself does any follow-up work after its await statement completes, but before returning and leaving the event loop?
I'll try that approach. In the meantime, this is the code I am using for my basic testing.
Example of what I want:
import asyncio
import time

async def osrunner(cmd):
    proc = await asyncio.create_subprocess_shell(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)
    stdout, stderr = await proc.communicate()
    if stdout:
        print(f'[stdout]\n{stdout.decode()}')
    if stderr:
        print(f'[stderr]\n{stderr.decode()}')
    return True

async def main():
    cmd00 = 'sleep 35'
    cmd01 = 'sleep 15'
    cmd02 = 'sleep 25'
    cmd03 = 'sleep 5'
    task0 = asyncio.create_task(osrunner(cmd00))
    task1 = asyncio.create_task(osrunner(cmd01))
    task2 = asyncio.create_task(osrunner(cmd02))
    task3 = asyncio.create_task(osrunner(cmd03))
    await task0
    await task1
    await task2
    await task3
print(f"started main at {time.strftime('%X')}")
asyncio.run(main()) #<------------------I want to poll the status of the tasks and do something while the others are still unfinished
print(f"finished main at {time.strftime('%X')}")
For example, I have the following class. How can I prevent execution of the get_entity task if the create_entity task did not run successfully?
class MyTaskSequence(TaskSequence):
    @seq_task(1)
    def create_entity(self):
        self.round += 1
        with self.client.post('/entities', json={}, catch_response=True) as resp:
            if resp.status_code != HTTPStatus.CREATED:
                resp.failure()
                # how to stop other tasks for that run?
            self.entity_id = resp.json()['data']['entity_id']

    @seq_task(2)
    def get_entity(self):
        # It is always executed,
        # but it should not run if the create_entity task failed
        resp = self.client.get(f'/entities/{self.entity_id}')
        ...
I found the TaskSet.interrupt method in the documentation, but it does not allow cancelling the root TaskSet. I tried to make a parent TaskSet for my task sequence so that TaskSet.interrupt works:
class MyTaskSet(TaskSet):
    tasks = {MyTaskSequence: 10}
But now I see that all results in the UI are cleared after I call interrupt! I just need to skip the dependent tasks in this sequence; I need the results.
The easiest way to solve this is to use a single @task with multiple requests inside it. Then, if a request fails, just return after resp.failure(), as sketched below.
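A rough sketch of that approach (this assumes the Locust 1.x+ HttpUser API and the same /entities endpoints from the question; adapt the base class to your Locust version):
from http import HTTPStatus
from locust import HttpUser, between, task

class EntityUser(HttpUser):
    wait_time = between(1, 2)

    @task
    def create_then_get_entity(self):
        with self.client.post('/entities', json={}, catch_response=True) as resp:
            if resp.status_code != HTTPStatus.CREATED:
                resp.failure("entity was not created")
                return  # skip the dependent request for this run
            entity_id = resp.json()['data']['entity_id']
        # only reached when the create call succeeded
        self.client.get(f'/entities/{entity_id}')
The failed create is still recorded as a failure in the statistics, and the dependent GET is simply skipped for that iteration, so no results are cleared.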
Might self.interrupt() be what you are looking for?
See https://docs.locust.io/en/latest/writing-a-locustfile.html#interrupting-a-taskset for reference.
Why not use on_start(self), which runs once whenever a locust is created? It can set a flag that is checked to decide whether the locust executes the tasks:
class MyTaskSequence(TaskSequence):
    entity_created = False

    def on_start(self):
        self.round += 1
        with self.client.post('/entities', json={}, catch_response=True) as resp:
            if resp.status_code != HTTPStatus.CREATED:
                resp.failure()
            else:
                self.entity_created = True
                self.entity_id = resp.json()['data']['entity_id']

    @seq_task(2)
    def get_entity(self):
        if self.entity_created:
            resp = self.client.get(f'/entities/{self.entity_id}')
            ...
My application A calls a Celery task longtask that lives in application B. Since longtask is registered in B but not in A, A calls it using send_task. I want a mechanism in A to check periodically whether longtask is complete. How do I do that?
send_task returns an AsyncResult that contains the task id. You can use this id to periodically check on the result of longtask.
result = my_app.send_task('longtask', kwargs={})
task_id = result.id

# anywhere else in your code you can reuse the
# task_id to check the status
from celery.result import AsyncResult
import time

done = False
while not done:
    result = AsyncResult(task_id)
    current_status = result.status
    if current_status == 'SUCCESS':
        print('yay! we are done')
        done = True
    time.sleep(10)
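If you also want the loop to stop when the task fails, result.ready() is true for any terminal state (SUCCESS, FAILURE, REVOKED). A small variant of the loop above, assuming a result backend is configured for the app:
from celery.result import AsyncResult
import time

result = AsyncResult(task_id)  # task_id from the earlier send_task call
while not result.ready():      # queries the result backend until a terminal state is reached
    time.sleep(10)
print('final state:', result.state)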
I'm using the Jenkins REST API to build and schedule jobs.
The problem is that I schedule one job for the weekend, but it gets executed several times (the same job is executed every minute).
For the rest of the week the job is executed only once. Is there any GUI option to empty the weekend job queue?
You can use the following Groovy script to clean all (or part) of your queue. This example deletes all queued items whose task names start with a specific branch name:
import jenkins.model.*
def branchName = build.environment.get("GIT_BRANCH_NAME")
println "=========before clean the queue ... =="
def q = Jenkins.instance.queue
q.items.each {
println("${it.task.name}:")
}
q.items.findAll { it.task.name.startsWith(branchName) }.each { q.cancel(it.task) }
println "=========after clean the queue ... =="
q = Jenkins.instance.queue
q.items.each {
println("${it.task.name}:")
}