Celery: dynamically adding a periodic task

https://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html#entries mentions add_periodic_task
I don't understand what test.s is, and why the example uses test.s('hello') rather than just test('hello').
from celery import Celery
from celery.schedules import crontab

app = Celery()

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls test('hello') every 10 seconds.
    sender.add_periodic_task(10.0, test.s('hello'), name='add every 10')

    # Calls test('world') every 30 seconds
    sender.add_periodic_task(30.0, test.s('world'), expires=10)

    # Executes every Monday morning at 7:30 a.m.
    sender.add_periodic_task(
        crontab(hour=7, minute=30, day_of_week=1),
        test.s('Happy Mondays!'),
    )

@app.task
def test(arg):
    print(arg)
And what would sender be? I'd like to call add_periodic_task outside of @app.on_after_configure.connect.

.s() creates the task signature - think of it as a placeholder for running the task later. test('hello') would invoke your task immediately, which is not what you want when you just want to instruct Celery to call the task periodically in setup_periodic_tasks.
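As for sender: in the signal handler it is simply the app instance itself, so sender.add_periodic_task(...) and app.add_periodic_task(...) call the same method; the handler is just the recommended place because it runs after the app has been configured. To illustrate the signature vs. direct-call difference, a minimal sketch (the broker URL is an assumption, the task mirrors the example above):

from celery import Celery

app = Celery('tasks', broker='pyamqp://guest@localhost//')

@app.task
def test(arg):
    print(arg)

# test('hello') runs the function right here, in the current process.
test('hello')

# test.s('hello') only builds a Signature (task name + args + options);
# nothing executes until a worker receives it.
sig = test.s('hello')
sig.delay()  # send it to a worker now
app.add_periodic_task(10.0, sig, name='add every 10')  # or let beat send it on a schedule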

Related

How to stop Locust when a specific number of users are spawned, with the -i command line option?

from locust import SequentialTaskSet, HttpUser, constant, task
import locust_plugins

class MySeqTask(SequentialTaskSet):
    @task
    def get_status(self):
        self.client.get("/200")
        print("Status of 200")

    @task
    def get_100_status(self):
        self.client.get("/100")
        print("Status of 100")

class MyLoadTest(HttpUser):
    host = "https://http.cat"
    tasks = [MySeqTask]
    wait_time = constant(1)
Examples for locust-plugins command line options can be found here:
https://github.com/SvenskaSpel/locust-plugins/blob/master/examples/cmd_line_examples.sh
locust -u 5 -t 60 --headless -i 10
# Stop locust after 10 task iterations (this is an upper bound, so you can be sure no more than 10 iterations will be done)
# Note that in a distributed run the parameter needs to be set on the workers, it is (currently) not distributed from master to worker.
You will run your locust file the same way as normal but add -i to each worker you run. Since it's per worker, you'll need to pre-calculate how many iterations you want each worker to run. So if you have 10 workers and you want to stop after a total of 10000 task iterations, you'd probably do -i 1000 on each worker.
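A sketch of what that might look like on the command line (the locustfile name and master host are placeholders; exact flags depend on your setup):

# master: coordinates the run; -i is not needed here
locust -f locustfile.py --master --headless -u 100 -t 600

# each of the 10 workers: caps its own run at 1000 task iterations
locust -f locustfile.py --worker --master-host=192.168.0.10 -i 1000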

Locust: How to run a task n times on n users and then stop the run?

I have a simple Locust script with one task that makes an HTTP request.
I want to run this task 100 times on 10 users and then stop the run.
Is there any simple way to do it? I know the --run-time parameter, but it only stops after the specified amount of time.
Below is my script:
from locust import HttpUser, task, between

class QuickstartUser(HttpUser):
    wait_time = between(1, 2)
    host = "https://allegro.pl"

    @task(1)
    def getHome(self):
        self.client.get("/dzial/dom-i-ogrod", name="Get Home and Garden")
Another option, provided by locust-plugins, is the -i parameter: https://github.com/SvenskaSpel/locust-plugins#command-line-options
It should be a little more reliable, as it explicitly calls runner.quit()
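For example, assuming "100 times on 10 users" means 100 iterations per user (1000 in total; use -i 100 if you mean 100 iterations overall):

locust -f locustfile.py --headless -u 10 -i 1000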
If you are not running distributed, you can have a global counter, increment it in the task, and stop the runner once it reaches the desired count, like so:
from locust import HttpUser, task, between

counter = 0

class QuickstartUser(HttpUser):
    wait_time = between(1, 2)
    host = "https://allegro.pl"

    @task(1)
    def getHome(self):
        global counter
        if counter == 100:
            self.environment.runner.stop()
        self.client.get("/dzial/dom-i-ogrod", name="Get Home and Garden")
        counter = counter + 1
And if you are running distributed, it is best to use some external counter, like Redis, to keep track of requests.
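A minimal sketch of that idea, assuming a Redis server is reachable on localhost and the redis Python package is installed (the key name and the 1000-iteration cap are placeholders):

import redis
from locust import HttpUser, task, between

r = redis.Redis(host="localhost", port=6379)

class QuickstartUser(HttpUser):
    wait_time = between(1, 2)
    host = "https://allegro.pl"

    @task(1)
    def getHome(self):
        # INCR is atomic, so the count stays correct across workers
        count = r.incr("locust:iterations")
        if count > 1000:
            self.environment.runner.stop()
            return
        self.client.get("/dzial/dom-i-ogrod", name="Get Home and Garden")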

Retrieving Celery task kwargs from task-failed / uuid

Main Issue
I'm testing how to handle certain task failures, for example a 'TimeLimitExceeded' exception, which instantly kills the task and is not 'catchable' (yes, I'm aware of 'SoftTimeLimit', but it doesn't fit my needs).
First Approach
This is my tasks.py (The worker runs with a --time-limit flag):
import logging
import time

from celery import Celery

logger = logging.getLogger(__name__)

app = Celery('tasks', broker='pyamqp://guest@localhost//')

def my_fail(task, exc, req_id, req_args, req_kwargs, einfo, *ext_args, **kwargs):
    logger.info("args: %r", req_args)
    logger.info("kw: %r", req_kwargs)

@app.task(on_failure=my_fail)
def sum(x, y, delay=0, **kw):
    result = x + y
    if result == 4:
        raise Exception("Some Error")
    time.sleep(delay)
    return x + y
The main idea is that when a task fails, I want to be able to perform some handling based on the args/kwargs of the task.
For example, if I run sum.delay(3, 1, foo="bar"), the Exception("Some Error") is raised and the following is logged:
[2019-06-30 17:21:45,120: INFO/Worker-1] args: (3, 1)
[2019-06-30 17:21:45,121: INFO/Worker-1] kw: {'foo': 'bar'}
[2019-06-30 17:21:45,122: ERROR/MainProcess] Task tasks.sum[9e9de032-1469-44e7-8932-4c490fcee2e3] raised unexpected: Exception('Some Error',)
Traceback (most recent call last):
File "/home/apernin/.virtualenvs/dr/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/home/apernin/.virtualenvs/dr/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/home/apernin/test/tasks.py", line 89, in sum
raise Exception("Some Error")
Exception: Some Error
Note the args/kwargs are printed by my on-failure handler.
Now if I run sum.delay(3, 2, delay=7), the time limit is triggered:
[2019-06-30 17:23:15,244: INFO/MainProcess] Received task: tasks.sum[8c81398b-4378-401d-a674-a3bd3418ccde]
[2019-06-30 17:23:21,070: ERROR/MainProcess] Task tasks.sum[8c81398b-4378-401d-a674-a3bd3418ccde] raised unexpected: TimeLimitExceeded(5.0,)
Traceback (most recent call last):
File "/home/apernin/.virtualenvs/dr/local/lib/python2.7/site-packages/billiard/pool.py", line 645, in on_hard_timeout
raise TimeLimitExceeded(job._timeout)
TimeLimitExceeded: TimeLimitExceeded(5.0,)
[2019-06-30 17:23:21,071: ERROR/MainProcess] Hard time limit (5.0s) exceeded for tasks.sum[8c81398b-4378-401d-a674-a3bd3418ccde]
[2019-06-30 17:23:21,629: ERROR/MainProcess] Process 'Worker-1' pid:15472 exited with 'signal 15 (SIGTERM)'
Note the args/kwargs are not printed, because the on-failure handler is not executed. This is somewhat to be expected due to the nature of Celery's hard time limit.
Second Approach
My second approach is to use an event listener.
from celery import Celery

def my_monitor(app):
    state = app.events.State()

    def announce_failed_tasks(event):
        state.event(event)
        # task name is sent only with -received event, and state
        # will keep track of this for us.
        task = state.tasks.get(event['uuid'])
        # name/args/kwargs come back as None here (see below)
        print(task.uuid, task.name, task.args, task.kwargs)

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={
            'task-failed': announce_failed_tasks,
        })
        recv.capture(limit=None, timeout=None, wakeup=True)

if __name__ == '__main__':
    app = Celery(broker='amqp://guest@localhost//')
    my_monitor(app)
The only info I was able to retrieve was the task uuid; I wasn't able to retrieve the name, args, or kwargs of the task (the task object contains the attributes, but they are all None).
Question
Is there a way to either:
Make the on_failure handler run in the case of a hard time limit?
Retrieve the args/kwargs of a task with a task-failed event listener?
Thanks in advance
First, the timeout is handled by the worker (the MainProcess), and it is not treated the same as failures that happen INSIDE the task, such as exceptions being thrown. This is why you see it as TimeLimitExceeded raised by the MainProcess in the log. So, unfortunately, you can't rely on the same logic...
However, your second approach will prove useful in tracking down what is going on.
I have developed (in-house) a Celery monitoring tool that grabs all the events and populates a database with them, so that later we can do all sorts of analytics (average and worst running times, frequency of failures, etc.).
In order to grab the details you need from the data given by the task-failed event, you also need to record the task-received event data (store it in some dictionary, for example). That event contains the args, the task name, and all sorts of other useful information you may need. You relate the two by the task UUID.
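A minimal sketch of that idea, building on the event-listener code from the question (the dictionary name and broker URL are assumptions, and a real tool would evict old entries):

from celery import Celery

received = {}  # uuid -> data from the task-received event

def my_monitor(app):
    def on_task_received(event):
        received[event['uuid']] = event  # contains 'name', 'args', 'kwargs', ...

    def on_task_failed(event):
        info = received.get(event['uuid'], {})
        print('FAILED %s name=%s args=%s kwargs=%s exception=%s' % (
            event['uuid'], info.get('name'), info.get('args'),
            info.get('kwargs'), event.get('exception')))

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={
            'task-received': on_task_received,
            'task-failed': on_task_failed,
        })
        recv.capture(limit=None, timeout=None, wakeup=True)

if __name__ == '__main__':
    my_monitor(Celery(broker='amqp://guest@localhost//'))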

How to load objects in memory and share across different executions of Celery worker?

I have set up Celery + RabbitMQ on a 3-machine cluster. I have also created a task which generates a regular expression based on data from a file and uses it to parse text. However, I would like the file to be read only once per worker spawn, not on every execution of a task.
import re

from celery import Celery

celery = Celery('tasks', broker='amqp://localhost//')

@celery.task
def add(x, y):
    return x + y

def get_regular_expression():
    with open("text") as fp:
        data = fp.readlines()
    str_re = "|".join([x.split()[2] for x in data])
    return str_re

@celery.task
def analyse_json(tw):
    str_re = get_regular_expression()
    re.match(str_re, tw.text)
In the above code, I would like to open the file and read the output into the string only once per worker, and then the task analyse_json should just use the string.
Any help will be appreciated,
thanks,
Amit
Put the call to get_regular_expression at the module level:
str_re = get_regular_expression()

@celery.task
def analyse_json(tw):
    re.match(str_re, tw.text)
It will only be called once, when the module is first imported.
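If you'd rather defer the file read until a task first runs in each worker process, an alternative sketch (assuming the same celery app object as in the question) is to memoize the loader; functools.lru_cache keeps the result for the lifetime of the process:

import re
from functools import lru_cache

@lru_cache(maxsize=None)
def get_regular_expression():
    # runs once per worker process; later calls return the cached string
    with open("text") as fp:
        return "|".join(line.split()[2] for line in fp)

@celery.task
def analyse_json(tw):
    re.match(get_regular_expression(), tw.text)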
Additionally, if you must have only one instance of your worker running at a time (for example CUDA), you have to use the -P solo option:
celery worker --pool solo
Works with celery 4.4.2.

Is it possible to use custom routes for celery's canvas primitives?

I have distinct RabbitMQ queues, each dedicated to a special kind of order processing:
# tasks.py
@celery.task
def process_order_for_product_x(order_id):
    pass  # elided ...

@celery.task
def process_order_for_product_y(order_id):
    pass  # elided ...

# settings.py
CELERY_QUEUES = {
    "black_hole": {
        "binding_key": "black_hole",
        "queue_arguments": {"x-ha-policy": "all"}
    },
    "product_x": {
        "binding_key": "product_x",
        "queue_arguments": {"x-ha-policy": "all"}
    },
    "product_y": {
        "binding_key": "product_y",
        "queue_arguments": {"x-ha-policy": "all"}
    },
}
We have a policy of enforcing explicit routing by setting CELERY_DEFAULT_QUEUE = 'black_hole' and then never consuming from black_hole.
Each of these tasks may use celery's canvas primitives, like so:
# tasks.py
@celery.task
def process_order_for_product_x(order_id):
    # These can run in parallel
    stage_1_group = group(do_something.si(order_id),
                          do_something_else.si(order_id))

    # These can run in parallel
    another_group = group(do_something_at_end.si(order_id),
                          do_something_else_at_end.si(order_id))

    # These run in a linear sequence
    process_task = chain(
        stage_1_group,
        do_something_dependent_on_stage_1.si(order_id),
        another_group)

    process_task.apply_async()
Supposing I want specific uses of celery.group, celery.chord, celery.chord_unlock, and other canvas tasks to flow through the queue for its corresponding product, rather than getting trapped in a black_hole, is there a way to invoke each particular canvas task with either a custom task name or custom routing_key?
For reasons I won't go into I would prefer to not send all celery.* tasks to a catch-all celery_canvas queue, which is what I am doing in the meantime.
This method allows you to route Celery canvas tasks to the queue of a callback task.
It is possible to specify a custom class-based task router for Celery as described here.
Let's focus on the celery.chord_unlock task. Its signature is defined here.
def unlock_chord(self, group_id, callback, ...):
The second positional argument is the signature of the chord callback task.
Task signatures in Celery are basically dicts, so that gives us an opportunity to access task options, including the task queue name.
Here is an example:
class CeleryRouter(object):
    def route_for_task(self, task, args=None, kwargs=None):
        if task == 'celery.chord_unlock':
            callback_signature = args[1]
            options = callback_signature.get('options')
            if options:
                queue = options.get('queue')
                if queue:
                    return {'queue': queue}
Add it to the Celery config:
CELERY_ROUTES = (CeleryRouter(),)
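For illustration, a sketch of how this router comes into play, reusing the tasks from the question (process_result is a hypothetical callback): once the callback's queue is set explicitly, the celery.chord_unlock task that Celery generates is routed to the same queue by the router above.

from celery import chord

header = [do_something.si(order_id), do_something_else.si(order_id)]
callback = process_result.si(order_id).set(queue='product_x')  # process_result is a placeholder

# chord_unlock polls the header results; CeleryRouter sends it to 'product_x'
chord(header)(callback)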
I'm currently using Celery in my project. For some scenarios I need tasks to chain through different queues:
chain(get_staff.s(url), save_staff.s(dt, partner_id, url))()
Those two functions are declared like so:
@task(queue='celery_gevent')
def get_staff(source_url):
    ...

@task  # send to the default queue
def save_staff(suggests, dt, partner, url):
    ...
By the way, celery_gevent is handled by a worker with a gevent pool, to make HTTP requests.
This example shows how you can specify the queue implicitly. You can also explicitly put a task in a different queue by passing additional params, like so:
In [1]: add.apply_async([4,5])
Out[1]: <AsyncResult: bda3dedd-c2c4-44db-be8e-6a97e718f8b0>
$ sudo rabbitmqctl list_queues
Listing queues ...
celery 1
...done.
In [2]: add.apply_async([4,5], queue='your_product')
Out[2]: <AsyncResult: 934f6161-298b-468b-9716-3da6fae58fa5>
$ sudo rabbitmqctl list_queues
Listing queues ...
celery 1
your_product 1
...done.
You can run the whole canvas in a custom queue:
process_task.apply_async(queue='your_queue')
Try to specify the queue inside the @task decorator. This should help.
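For example, a sketch along the lines of the question's tasks (every call to this task then defaults to the product_x queue unless overridden at call time):

@celery.task(queue='product_x')
def process_order_for_product_x(order_id):
    pass  # elided ...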
Links:
http://docs.celeryproject.org/en/latest/reference/celery.app.task.html
http://docs.celeryproject.org/en/latest/_modules/celery/app/task.html#Task.apply_async