I'm a Python beginner, and I'm trying to write some data analysis programs. The program looks like this:
import asyncio
import time


class Test:
    def __init__(self, task):
        self.task = task
        time.sleep(5)  # here's some other jobs...
        print(f'{self.task = }')


async def main():
    result = []
    tasks = ['task1', 'task2', 'task3', 'task4', 'task5', 'task6', 'task7', 'task8', 'task9']
    print(f"started at {time.strftime('%X')}")
    # I have a program structure like this, can I use async?
    # how to start init tasks at almost the same time?
    for task in tasks:
        result.append(Test(task))
    print(f"finished at {time.strftime('%X')}")


asyncio.run(main())
I've tried another way using multiprocessing, and it works; the code is below:
...

def main():
    result = []
    tasks = ['task1', 'task2', 'task3', 'task4', 'task5', 'task6', 'task7', 'task8', 'task9']
    print(f"started at {time.strftime('%X')}")
    # I have a program structure like this, can I use async?
    # how to start init tasks at the same time?
    p = Pool()
    result = p.map(operation, [(task,) for task in tasks])
    print(f"finished at {time.strftime('%X')}")

...
But I still want to learn a more 'modern' way to do this. I've found a module named 'ray', which is new to me.
But could async do this? I'm still wondering.
If someone can give me some advice, thanks a lot.
Your example code won't necessarily benefit from async IO, because __init__ is not "awaitable". You might be able to benefit from async if your code were structured differently and had an appropriate bottleneck. For example, if we had:
class Task:
    def __init__(self):
        <some not io bound stuff>
        <some io bound task>
We could re-structure this to:
class Task:
    def __init__(self):
        <some not io bound stuff>

    async def prime(self):
        await <some io bound task>
Then in your main loop you can initialise the tasks as you're doing, and then run the slow prime step in your event loop, e.g. with asyncio.gather, as in the sketch below.
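As a rough illustration (not the asker's actual workload; the io-bound part is simulated with asyncio.sleep here), the restructured version could look like:

import asyncio
import time


class Task:
    def __init__(self, name):
        self.name = name  # fast, CPU-only setup stays in __init__

    async def prime(self):
        await asyncio.sleep(5)  # stand-in for the real io-bound job
        print(f'{self.name = }')


async def main():
    names = ['task1', 'task2', 'task3']
    tasks = [Task(name) for name in names]  # plain, synchronous init
    print(f"started at {time.strftime('%X')}")
    await asyncio.gather(*(t.prime() for t in tasks))  # run the slow parts concurrently
    print(f"finished at {time.strftime('%X')}")


asyncio.run(main())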
My advice here though would be to resist doing this unless you know you definitely have a problem. Coroutines can be quite fiddly, so you should only do this if you need to do it!
Within a class I have an asyncio loop which is created with run_until_complete and the argument return_when. The functions within the tasks keep running as long as the application is running.
I would like to create a test with pytest that validates the following situation:
the task asyncfoobar or asynctesting finishes for some reason
the loop should stop running and the finally statement should be called.
When testing the application manually for this situation, it works as expected.
An automated test is preferred to easily validate that it keeps working correctly.
Pytest is used for this. How can this be done? By catching the log line at the finally statement?
Snippets of the code that needs to be tested:
async def asyncfoobar(config):
    try:
        ....
    finally:
        return


async def asynctesting(a, b):
    while True:
        ....
        await asyncio.sleep(10)


class Do_something:
    def start(self):
        try:
            self.loop = asyncio.new_event_loop()
            asyncio.set_event_loop(self.loop)
            self.tasks = [
                self.loop.create_task(asyncfoobar(config)),
                asyncio.get_event_loop().create_task(asynctesting(a, b)),
            ]
            self.loop.run_until_complete(
                asyncio.wait(self.tasks,
                             return_when=asyncio.FIRST_COMPLETED)
            )
        finally:
            logging.info("We are stopping")
            return
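Not a definitive answer, but one way to approach this, following the question's own hint about catching the log line: run start() under pytest and assert on the message logged in the finally block, using the caplog fixture. A minimal sketch, assuming the module is importable as myapp and that asyncfoobar completes quickly under test (both assumptions, since the real code is only shown in snippets):

import logging

from myapp import Do_something  # hypothetical module path


def test_loop_stops_when_first_task_completes(caplog):
    caplog.set_level(logging.INFO)
    # if asyncfoobar (or asynctesting) finishes, run_until_complete should return
    Do_something().start()
    assert "We are stopping" in caplog.text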
Background
I wanted to write a unit test for connect on the code excerpts below, given the following assumptions:
When AsyncClient is instantiated, self.io.connected and self.conn_established are False.
At the first connect call, we assert that self.io.connect is called (ideally, self.io.connect should be mocked here).
When the connect event is emitted, self.conn_established is set to True (since we mock self.io.connect, we'll need to mock the triggering of the connect event).
class AsyncClient:
    def __init__(self):
        self.conn_established = False
        self.io = socketio.AsyncClient()

        @self.io.event
        def connect():
            self.conn_established = True

    async def connect(self):
        while not self.io.connected:
            await self.io.connect(endpoint)
        while not self.conn_established:
            await asyncio.sleep(1)
What I had tried
I was able to write a mock for io.connect, but I'm stuck with triggering the socketio connect event:
@pytest.fixture
def async_client():
    yield AsyncClient()


class AsyncMock(mock.MagicMock):
    async def __call__(self, *args, **kwargs):
        return super(AsyncMock, self).__call__(*args, **kwargs)


@pytest.mark.asyncio
async def test_connect(async_client):
    def mock_successful_conn(*args, **kwargs):
        async_client.io.connected = True
        # how do I trigger the following?
        async_client.io.trigger_event("connect")

    # mock io.connect
    async_client.io.connect = AsyncMock(spec=async_client.io.connect, side_effect=mock_successful_conn)
    await async_client.connect()
Questions
How do I write the unit tests for the above?
Is there a way to trigger socketio events for testing purposes?
Thanks! Your help would be greatly appreciated.
OK, thanks to @Miguel's suggestion, I was able to mock the triggering of the event by calling the event handler function directly.
We can get the handlers by accessing the client's handlers dict, which is indexed first by namespace and then by the event name.
So in my case, I can get the connect event handler by doing this:
connect_handler = tunnel_client.io.handlers["/"]["connect"]
connect_handler()
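Putting it together, a rough version of the test (reusing the AsyncMock and fixture from above; the "/" default namespace and the final assertion are assumptions on my part) could look like:

@pytest.mark.asyncio
async def test_connect(async_client):
    def mock_successful_conn(*args, **kwargs):
        async_client.io.connected = True
        # fire the handler registered with @self.io.event directly
        async_client.io.handlers["/"]["connect"]()

    async_client.io.connect = AsyncMock(spec=async_client.io.connect, side_effect=mock_successful_conn)
    await async_client.connect()
    assert async_client.conn_established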
Is it possible to schedule a task in celery from another task?
I've got this Python script:
import logging

from celery import Celery
from datetime import datetime

logger = logging.getLogger(__name__)

app = Celery('app', backend='amqp://', broker='pyamqp://guest@localhost:5672//')


@app.task()
def add(x, y):
    result = x + y
    logger.info(f'Add: {x} + {y} = {result}')
    return result


@app.task()
def setPeriodicTask():
    now = datetime.now()

    # option 1
    app.add_periodic_task(10, add.s(30, 1))

    # option 2
    app.conf.beat_schedule = {
        'add-every-5-seconds': {
            'task': 'app.add',
            'schedule': 5.0,
            'args': (now.hour, now.second)
        }
    }

    logger.info('setPeriodicTask succeeded')
    return 1
When I call the add task, it works OK.
If I call the setPeriodicTask task, it does not throw any error, but the add task is not scheduled.
I've tried both options, and neither is working:
add_periodic_task
modify the beat_schedule
If I add this code to my Python script (as I've seen in the celery docs):
@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(5.0, add.s(10, 1))
I can see the add task running scheduled as expected. So celery and celery beat seem to be working fine.
But I want to enable/disable the task on demand.
Is it possible? And if so, what am I doing wrong?
In case someone else faces this issue:
I ended up using a database, with a similar approach to the one mentioned in the django-celery-beat docs: django-celery-beat - Database-backed Periodic Tasks
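For anyone landing here, a minimal sketch of that database-backed approach (the interval, task name, and args below are placeholders, not the original code):

import json

from django_celery_beat.models import IntervalSchedule, PeriodicTask

# create (or reuse) a 10-second interval
schedule, _ = IntervalSchedule.objects.get_or_create(
    every=10,
    period=IntervalSchedule.SECONDS,
)

# register the periodic task in the database; celery beat picks it up
PeriodicTask.objects.create(
    interval=schedule,
    name='add-every-10-seconds',
    task='app.add',
    args=json.dumps([30, 1]),
)

# the task can later be disabled (or re-enabled) on demand
PeriodicTask.objects.filter(name='add-every-10-seconds').update(enabled=False)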
Slightly off-topic question here, but I'm looking to mock the calls to sender.add_periodic_task(5.0, add.s(10, 1)) and I cannot find anything that tells me what the right patch target is.
I'm using @mock.patch("celery.add_periodic_task").
Any ideas would be welcome.
import pytest
from ddt import ddt, data, unpack
from unittest import mock
from django.test import TestCase

from myapp.celery import setup_periodic_tasks


class CeleryTasksTest(TestCase):

    @mock.patch("celery.add_periodic_task")
    @mock.patch("os.environ.get")
    @data(
        {"env_to_test": "local", "number_of_tasks": 0},
        {"env_to_test": "TEST", "number_of_tasks": 0},
        {"env_to_test": "STAGING", "number_of_tasks": 5},
        {"env_to_test": "LIVE", "number_of_tasks": 1},
    )
    @unpack
    def test_setup_periodic_tasks_per_environment(self, mock_os_environ_get, mock_add_periodic_task, env_to_test, number_of_tasks):
        setup_periodic_tasks([])
        mock_os_environ_get.return_value = env_to_test
        self.assertEqual(mock_add_periodic_task.call_count, number_of_tasks)
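One idea, offered as an assumption about your setup rather than a definitive answer: add_periodic_task is a method on the Celery app object, not a module-level function in the celery package, so patching "celery.add_periodic_task" won't intercept it. Since setup_periodic_tasks already receives the app as sender, you could pass a mock in as the sender and assert on that. A rough sketch (the myapp path and the expected count are taken from the question, the rest is hypothetical):

from unittest import mock

from myapp.celery import setup_periodic_tasks


@mock.patch("os.environ.get", return_value="STAGING")
def test_setup_periodic_tasks_staging(mock_os_environ_get):
    fake_sender = mock.MagicMock()      # stands in for the Celery app / signal sender
    setup_periodic_tasks(fake_sender)   # the handler calls sender.add_periodic_task(...)
    assert fake_sender.add_periodic_task.call_count == 5  # STAGING count from the question's data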
For example, I have the following class. How can I prevent execution of the get_entity task if the create_entity task was not executed successfully?
class MyTaskSequence(TaskSequence):

    @seq_task(1)
    def create_entity(self):
        self.round += 1
        with self.client.post('/entities', json={}, catch_response=True) as resp:
            if resp.status_code != HTTPStatus.CREATED:
                resp.failure()
                # how to stop other tasks for that run?
            self.entity_id = resp.json()['data']['entity_id']

    @seq_task(2)
    def get_entity(self):
        # It is always executed,
        # but it should not run if the create_entity task failed
        resp = self.client.get(f'/entities/{self.entity_id}')
        ...
I found the TaskSet.interrupt method in the documentation, but it does not allow cancelling the root TaskSet. I tried to make a parent TaskSet for my task sequence so that TaskSet.interrupt works:
class MyTaskSet(TaskSet):
    tasks = {MyTaskSequence: 10}
But now I see that all results in the UI are cleared after I call interrupt!
I just need to skip the dependent tasks in this sequence; I need the results.
The easiest way to solve this is to use a single @task with multiple requests inside it. Then, if a request fails, just do a return after resp.failure(), as in the sketch below.
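A rough sketch of that single-task approach (the endpoints are taken from the question; the class name and the failure message are my own placeholders):

from http import HTTPStatus

from locust import TaskSet, task


class MyTasks(TaskSet):

    @task
    def create_then_get_entity(self):
        with self.client.post('/entities', json={}, catch_response=True) as resp:
            if resp.status_code != HTTPStatus.CREATED:
                resp.failure("entity was not created")
                return  # skip the dependent request for this run
            entity_id = resp.json()['data']['entity_id']

        self.client.get(f'/entities/{entity_id}')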
Might self.interrupt() be what you are looking for?
See https://docs.locust.io/en/latest/writing-a-locustfile.html#interrupting-a-taskset for reference.
Why not use on_start(self), which runs once whenever a locust is created? It can set a flag which the tasks can check before executing:
class MyTaskSequence(TaskSequence):
    entity_created = False

    def on_start(self):
        self.round += 1
        with self.client.post('/entities', json={}, catch_response=True) as resp:
            if resp.status_code != HTTPStatus.CREATED:
                resp.failure()
            else:
                self.entity_created = True
                self.entity_id = resp.json()['data']['entity_id']

    @seq_task(2)
    def get_entity(self):
        if self.entity_created:
            resp = self.client.get(f'/entities/{self.entity_id}')
            ...
I need the following workflow for my celery tasks:
when taskA finishes with success, I want to execute taskB.
I know there is the task_success signal, but it only returns the task's result, and I need access to the previous task's arguments. So I decided on code like this:
@app.task
def taskA(arg):
    # not cool, but... https://github.com/celery/celery/issues/3797
    from shopify.tasks import taskA
    taskA(arg)


@task_postrun.connect
def fetch_taskA_success_handler(sender=None, **kwargs):
    from gcp.tasks import taskB
    if kwargs.get('state') == 'SUCCESS':
        taskB.apply_async((kwargs.get('args')[0], ))
The problem is that taskB seems to be executed in an endless loop, many times instead of only once.
This way it works correctly:
@app.task
def taskA(arg):
    # not cool, but... https://github.com/celery/celery/issues/3797
    # otherwise it won't be added to periodic tasks
    from shopify.tasks import taskA
    return taskA(arg)


@task_postrun.connect
def taskA_success_handler(sender=None, state=None, **kwargs):
    resource_name = kwargs.get('kwargs', {}).get('resource_name')
    if resource_name and state == 'SUCCESS':
        if sender.name == 'shopify.tasks.taskA':
            from gcp.tasks import taskB
            taskB.apply_async(kwargs={
                'resource_name': resource_name
            })
just for reference:
celery==4.1.0
Django==2.0
django-celery-beat==1.1.0
django-celery-results==1.0.1
flower==0.9.2
amqp==2.2.2
Python 3.6