@task_postrun.connect signal in Celery: executing another task results in an infinite loop of executions - celery

I need the following workflow for my Celery tasks:
when taskA finishes with success, I want to execute taskB.
I know there is the task_success signal, but it only returns the task's result, and I need access to the arguments the previous task was called with. So I ended up with code like this:

    @app.task
    def taskA(arg):
        # not cool, but... https://github.com/celery/celery/issues/3797
        from shopify.tasks import taskA
        taskA(arg)

    @task_postrun.connect
    def fetch_taskA_success_handler(sender=None, **kwargs):
        from gcp.tasks import taskB
        if kwargs.get('state') == 'SUCCESS':
            taskB.apply_async((kwargs.get('args')[0], ))

The problem is that taskB seems to be executed in an endless loop, many, many times, instead of only once.

This way it works correctly:

    @app.task
    def taskA(arg):
        # not cool, but... https://github.com/celery/celery/issues/3797
        # otherwise it won't get added to the periodic tasks
        from shopify.tasks import taskA
        return taskA(arg)

    @task_postrun.connect
    def taskA_success_handler(sender=None, state=None, **kwargs):
        resource_name = kwargs.get('kwargs', {}).get('resource_name')
        if resource_name and state == 'SUCCESS':
            if sender.name == 'shopify.tasks.taskA':
                from gcp.tasks import taskB
                taskB.apply_async(kwargs={
                    'resource_name': resource_name
                })
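For context on why the first version loops: task_postrun is dispatched for every task the worker finishes, including taskB itself, so each completed taskB re-enqueues another taskB; the sender.name check above is what breaks that cycle. An alternative that avoids signals entirely is Celery's signature chaining, declaring the follow-up when taskA is queued. A minimal sketch, assuming both tasks accept the same resource_name argument:

    from celery import chain

    from shopify.tasks import taskA
    from gcp.tasks import taskB

    def enqueue_pipeline(resource_name):
        # taskB.si(...) is an immutable signature: it ignores taskA's return
        # value and runs with exactly the arguments given here, and only
        # after taskA has finished successfully.
        chain(
            taskA.s(resource_name),
            taskB.si(resource_name=resource_name),
        ).apply_async()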
just for reference:
celery==4.1.0
Django==2.0
django-celery-beat==1.1.0
django-celery-results==1.0.1
flower==0.9.2
amqp==2.2.2
Python 3.6

Related

Pytest asyncio: validate that asyncio.wait(..., return_when=...) works

Within a class I have an asyncio loop which is created with run_until_complete and the return_when argument. The functions in tasks run for as long as the application is running.
I would like to create a test with pytest that validates the following situation:
the task asyncfoobar or asynctesting finishes for some reason;
the loop should then stop running and the finally statement should be called.
When testing the application manually for this situation, it works as expected.
A test is preferred so it is easy to validate that it keeps working correctly.
Pytest is used for this. How can this be done?
By catching the log line at the finally statement?
Snippets of the code that needs to be tested:
    async def asyncfoobar(config):
        try:
            ...
        finally:
            return

    async def asynctesting(a, b):
        while True:
            ...
            await asyncio.sleep(10)

    class Do_something:
        def start(self):
            try:
                self.loop = asyncio.new_event_loop()
                asyncio.set_event_loop(self.loop)
                self.tasks = [
                    self.loop.create_task(asyncfoobar(config)),
                    asyncio.get_event_loop().create_task(asynctesting(a, b)),
                ]
                self.loop.run_until_complete(
                    asyncio.wait(self.tasks,
                                 return_when=asyncio.FIRST_COMPLETED)
                )
            finally:
                logging.info("We are stopping")
                return
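One way to test this, sketched below under the assumption that asyncfoobar, asynctesting and Do_something live in a module called app (adjust the import and patch targets to your layout): patch one coroutine with a stand-in that finishes almost immediately, run Do_something().start(), and assert on the log line from the finally block via pytest's caplog fixture.

    import asyncio
    import logging
    from unittest import mock

    import app  # hypothetical module holding asyncfoobar, asynctesting, Do_something


    async def fast_finisher(*args, **kwargs):
        # Stand-in coroutine: returns right away, simulating
        # "one task finished for some reason".
        await asyncio.sleep(0)


    def test_loop_stops_when_first_task_completes(caplog):
        caplog.set_level(logging.INFO)
        # With asyncfoobar finishing instantly, FIRST_COMPLETED should make
        # run_until_complete return while asynctesting is still pending.
        with mock.patch.object(app, "asyncfoobar", fast_finisher):
            app.Do_something().start()
        assert "We are stopping" in caplog.text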

Celery: add new tasks dynamically

Is it possible to schedule a task in Celery from another task?
I've got this Python script:

    import logging
    from celery import Celery
    from datetime import datetime

    logger = logging.getLogger(__name__)

    app = Celery('app', backend='amqp://', broker='pyamqp://guest@localhost:5672//')

    @app.task()
    def add(x, y):
        result = x + y
        logger.info(f'Add: {x} + {y} = {result}')
        return result

    @app.task()
    def setPeriodicTask():
        now = datetime.now()

        # option 1
        app.add_periodic_task(10, add.s(30, 1))

        # option 2
        app.conf.beat_schedule = {
            'add-every-5-seconds': {
                'task': 'app.add',
                'schedule': 5.0,
                'args': (now.hour, now.second)
            }
        }
        logger.info('setPeriodicTask succeeded')
        return 1
When I call the add task, it works OK.
If I call the setPeriodicTask task, it does not throw any error, but the add task is not scheduled.
I've tried both options and neither is working:
add_periodic_task
modifying the beat_schedule
If I add this code to my Python script (as I've seen in the Celery docs):

    @app.on_after_configure.connect
    def setup_periodic_tasks(sender, **kwargs):
        sender.add_periodic_task(5.0, add.s(10, 1))

I can see the add task running on schedule as expected, so Celery and Celery beat seem to be working fine.
But I want to enable/disable the task on demand.
Is it possible? And if so, what am I doing wrong?
In case someone else faces this issue:
I ended up using a database, with an approach similar to the one mentioned in the django-celery-beat docs: django-celery-beat - Database-backed Periodic Tasks
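For reference, a minimal sketch of what that database-backed approach can look like with django-celery-beat (the 'app.add' task path is taken from the question; the enable/disable helper names are made up for illustration). Because the schedule lives in the database, beat picks up changes without a restart, so a task or view can turn the entry on and off on demand:

    import json

    from django_celery_beat.models import IntervalSchedule, PeriodicTask


    def enable_add_every_5_seconds():
        # The interval and periodic-task rows are created once and reused.
        schedule, _ = IntervalSchedule.objects.get_or_create(
            every=5, period=IntervalSchedule.SECONDS
        )
        entry, _ = PeriodicTask.objects.get_or_create(
            name='add-every-5-seconds',
            defaults={'interval': schedule, 'task': 'app.add',
                      'args': json.dumps([30, 1])},
        )
        entry.enabled = True
        entry.save()


    def disable_add_every_5_seconds():
        PeriodicTask.objects.filter(name='add-every-5-seconds').update(enabled=False)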
Slightly off-topic question here, but I'm looking to mock the calls to sender.add_periodic_task(5.0, add.s(10, 1)) and I cannot find anything that tells me where it lives.
I'm using @mock.patch("celery.add_periodic_task").
Any ideas would be welcome.
    import pytest
    from ddt import ddt, data, unpack
    from unittest import mock
    from django.test import TestCase
    from myapp.celery import setup_periodic_tasks

    class CeleryTasksTest(TestCase):
        @mock.patch("celery.add_periodic_task")
        @mock.patch("os.environ.get")
        @data(
            {"env_to_test": "local", "number_of_tasks": 0},
            {"env_to_test": "TEST", "number_of_tasks": 0},
            {"env_to_test": "STAGING", "number_of_tasks": 5},
            {"env_to_test": "LIVE", "number_of_tasks": 1},
        )
        @unpack
        def test_setup_periodic_tasks_per_environment(self, mock_os_environ_get, mock_add_periodic_task, env_to_test, number_of_tasks):
            setup_periodic_tasks([])
            mock_os_environ_get.return_value = env_to_test
            self.assertEqual(mock_add_periodic_task.call_count, number_of_tasks)
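On the mocking question: add_periodic_task is a method on the Celery application object (the sender passed into the signal handler), not a module-level function, so patching "celery.add_periodic_task" never intercepts the call. A sketch of patching it where it actually lives, assuming the app instance and the handler are importable from myapp.celery (that module layout is an assumption):

    from unittest import mock

    from django.test import TestCase

    from myapp.celery import app, setup_periodic_tasks  # assumed module layout


    class SetupPeriodicTasksTest(TestCase):
        def test_registers_schedule_on_the_app(self):
            # Patch the bound method on the concrete app instance and call
            # the handler with that app as `sender`, as the signal would.
            with mock.patch.object(app, "add_periodic_task") as mock_add:
                setup_periodic_tasks(sender=app)
            self.assertTrue(mock_add.called)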

If I launch a group of tasks to be executed in an asyncio event loop, can I react to the return of each one separately?

I want to launch 10 OS subprocesses with asyncio. I can do that with gather, for example, and then find out the status of each task at the end of the event loop. But I have to wait for the whole thing to finish, even though the tasks run concurrently.
Is there a way to know that subprocess 1 has already finished, and react to that event, even before the other 9 tasks have completed?
I am working with Python >3.7 (3.8.6 and 3.9.1).
Maybe my question should be: once the event loop is running, is there a way to find out the status of the tasks being run?
Or is the expectation that the task itself does any follow-up work after its await statement completes, but before returning and leaving the event loop?
I'll try that approach. In the meantime, this is the code I am using for my basic testing.
Example of what I want:
    import asyncio
    import time

    async def osrunner(cmd):
        proc = await asyncio.create_subprocess_shell(
            cmd,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE)
        stdout, stderr = await proc.communicate()
        if stdout:
            print(f'[stdout]\n{stdout.decode()}')
        if stderr:
            print(f'[stderr]\n{stderr.decode()}')
        return True

    async def main():
        cmd00 = 'sleep 35'
        cmd01 = 'sleep 15'
        cmd02 = 'sleep 25'
        cmd03 = 'sleep 5'
        task0 = asyncio.create_task(osrunner(cmd00))
        task1 = asyncio.create_task(osrunner(cmd01))
        task2 = asyncio.create_task(osrunner(cmd02))
        task3 = asyncio.create_task(osrunner(cmd03))
        await task0
        await task1
        await task2
        await task3

    print(f"started main at {time.strftime('%X')}")
    asyncio.run(main())  # <------------------ I want to poll the status of the tasks and do something while the others are still unfinished
    print(f"finished main at {time.strftime('%X')}")
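One way to react to each subprocess as soon as it finishes, sketched against the osrunner coroutine above, is asyncio.as_completed, which yields the awaitables in completion order rather than submission order (asyncio.gather, by contrast, only hands back results once everything is done):

    import asyncio
    import time


    async def main():
        cmds = ['sleep 35', 'sleep 15', 'sleep 25', 'sleep 5']
        tasks = [asyncio.create_task(osrunner(cmd)) for cmd in cmds]

        # as_completed yields futures in the order they finish, so we can
        # react to 'sleep 5' long before 'sleep 35' is done.
        for finished in asyncio.as_completed(tasks):
            result = await finished
            print(f"one subprocess finished with {result!r} at {time.strftime('%X')}")
            # ...react here: kick off follow-up work, update state, etc.


    asyncio.run(main())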

Is it possible to prevent execution of further tasks in locust TaskSequence if some task has failed?

For example, I have the following class. How can I prevent execution of the get_entity task if the create_entity task failed?
    class MyTaskSequence(TaskSequence):
        @seq_task(1)
        def create_entity(self):
            self.round += 1
            with self.client.post('/entities', json={}, catch_response=True) as resp:
                if resp.status_code != HTTPStatus.CREATED:
                    resp.failure("entity was not created")
                    # how to stop the other tasks for this run?
                self.entity_id = resp.json()['data']['entity_id']

        @seq_task(2)
        def get_entity(self):
            # This is always executed,
            # but it should not run if the create_entity task failed.
            resp = self.client.get(f'/entities/{self.entity_id}')
            ...
I found the TaskSet.interrupt method in the documentation, but it does not allow cancelling the root TaskSet. I tried to make a parent TaskSet for my task sequence so that TaskSet.interrupt works:

    class MyTaskSet(TaskSet):
        tasks = {MyTaskSequence: 10}

But now I see that all the results in the UI are cleared after I call interrupt!
I just need to skip the dependent tasks in this sequence. I need the results.
The easiest way to solve this is to use a single @task with multiple requests inside it. Then, if a request fails, just return after resp.failure(), as sketched below.
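A minimal sketch of that single-task approach, using the current HttpUser API (the endpoints and the HTTPStatus check are taken from the question; with the older TaskSequence classes the same early return works inside one @task method):

    from http import HTTPStatus

    from locust import HttpUser, between, task


    class EntityUser(HttpUser):
        wait_time = between(1, 2)

        @task
        def create_then_get_entity(self):
            # One task, several requests: if the POST fails we simply
            # return, so the dependent GET is never issued and the stats
            # for everything else are kept.
            with self.client.post('/entities', json={}, catch_response=True) as resp:
                if resp.status_code != HTTPStatus.CREATED:
                    resp.failure("entity was not created")
                    return
                entity_id = resp.json()['data']['entity_id']

            self.client.get(f'/entities/{entity_id}')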
Might self.interrupt() be what you are looking for?
See https://docs.locust.io/en/latest/writing-a-locustfile.html#interrupting-a-taskset for reference.
Why not use on_start(self), which runs once whenever a locust user is created? It can set a flag that the tasks check to decide whether to run:

    class MyTaskSequence(TaskSequence):
        entity_created = False

        def on_start(self):
            self.round += 1
            with self.client.post('/entities', json={}, catch_response=True) as resp:
                if resp.status_code != HTTPStatus.CREATED:
                    resp.failure("entity was not created")
                    return
                self.entity_created = True
                self.entity_id = resp.json()['data']['entity_id']

        @seq_task(2)
        def get_entity(self):
            if self.entity_created:
                resp = self.client.get(f'/entities/{self.entity_id}')
                ...

Can I use async/await to accelerate class initialization?

I'm a Python beginner, and I'm trying to write some data analysis programs. The program is like the one below:
    import asyncio
    import time

    class Test:
        def __init__(self, task):
            self.task = task
            time.sleep(5)  # here's some other jobs...
            print(f'{self.task = }')

    async def main():
        result = []
        tasks = ['task1', 'task2', 'task3', 'task4', 'task5', 'task6', 'task7', 'task8', 'task9']
        print(f"started at {time.strftime('%X')}")
        # I have a program structure like this, can I use async?
        # how to start init tasks at almost the same time?
        for task in tasks:
            result.append(Test(task))
        print(f"finished at {time.strftime('%X')}")

    asyncio.run(main())
I've tried another way, using multiprocessing, and it works; code like below:
    ...
    def main():
        result = []
        tasks = ['task1', 'task2', 'task3', 'task4', 'task5', 'task6', 'task7', 'task8', 'task9']
        print(f"started at {time.strftime('%X')}")
        # I have a program structure like this, can I use async?
        # how to start init tasks at the same time?
        p = Pool()
        result = p.map(operation, [(task,) for task in tasks])
        print(f"finished at {time.strftime('%X')}")
    ...
But I still want to learn a more 'modern' way to do this. I've found a module named 'ray'; it's new.
But could async do this? I'm still wondering...
If someone can give me some advice, thanks a lot.
Your example code won't necessarily benefit from async IO, because __init__ is not "awaitable". You might be able to benefit from async if your code were structured differently and had an appropriate bottleneck. For example, if we had:

    class Task:
        def __init__(self):
            <some not-IO-bound stuff>
            <some IO-bound task>

we could restructure this to:

    class Task:
        def __init__(self):
            <some not-IO-bound stuff>

        async def prime(self):
            await <some IO-bound task>

Then in your main loop you can initialise the tasks as you're doing, then run the slow prime step in your event loop.
My advice here, though, would be to resist doing this unless you know you definitely have a problem. Coroutines can be quite fiddly, so you should only do this if you need to!
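A sketch of that restructuring applied to the Test class from the question, with asyncio.sleep standing in for the IO-bound part of the old __init__ (the original time.sleep(5) blocks the loop and cannot be awaited); the nine objects are constructed cheaply and their slow prime steps then run concurrently via asyncio.gather:

    import asyncio
    import time


    class Test:
        def __init__(self, task):
            # cheap, non-IO setup stays in __init__
            self.task = task

        async def prime(self):
            # stand-in for the slow IO-bound work that used to live in __init__
            await asyncio.sleep(5)
            print(f'{self.task = }')


    async def main():
        tasks = [f'task{i}' for i in range(1, 10)]
        print(f"started at {time.strftime('%X')}")
        result = [Test(task) for task in tasks]
        # run all prime() coroutines concurrently: roughly 5s total instead of ~45s
        await asyncio.gather(*(t.prime() for t in result))
        print(f"finished at {time.strftime('%X')}")


    asyncio.run(main())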