How to stop Locust when a specific number of users are spawned, with the -i command line option?

from locust import SequentialTaskSet, HttpUser, constant, task
import locust_plugins
class MySeqTask(SequentialTaskSet):
    @task
    def get_status(self):
        self.client.get("/200")
        print("Status of 200")

    @task
    def get_100_status(self):
        self.client.get("/100")
        print("Status of 100")

class MyLoadTest(HttpUser):
    host = "https://http.cat"
    tasks = [MySeqTask]
    wait_time = constant(1)

Examples for locust-plugins command line options can be found here:
https://github.com/SvenskaSpel/locust-plugins/blob/master/examples/cmd_line_examples.sh
locust -u 5 -t 60 --headless -i 10
# Stop locust after 10 task iterations (this is an upper bound, so you can be sure no more than 10 iterations will be done)
# Note that in a distributed run the parameter needs to be set on the workers, it is (currently) not distributed from master to worker.
You run your locustfile the same way as normal but add -i to each worker you run. Since the limit is per worker, you'll need to pre-calculate how many iterations you want each worker to do. So if you have 10 workers and you want to stop after a total of 10000 task iterations, you'd do -i 1000 on each worker.

Related

How to get nodename of running celery worker?

I want to shut down specific celery workers. I was using app.control.broadcast('shutdown'); however, this shuts down all the workers, so I would like to pass the destination parameter.
When I run ps -ef | grep celery, I can see the --hostname on the process.
I know that the format is {CELERYD_NODES}{NODENAME_SEP}{hostname} from the utility function nodename
import socket

destination = ''.join(['celery',  # CELERYD_NODES, defined at /etc/default/newfies-celeryd
                       '@',       # from celery.utils.__init__ import NODENAME_SEP
                       socket.gethostname()])
Is there a helper function which returns the nodename? I don't want to create it myself since I don't want to hardcode the value.
I am not sure if that's what you're looking for, but with control.inspect you can get info about the workers, for example:
app = Celery('app_name', broker=...)
app.control.inspect().stats() # statistics per worker
app.control.inspect().registered() # registered tasks per each worker
app.control.inspect().active() # active workers/tasks
so you can get the list of worker node names from any of these calls:
app.control.inspect().stats().keys()
app.control.inspect().registered().keys()
app.control.inspect().active().keys()
for example:
>>> app.control.inspect().registered().keys()
dict_keys(['worker1@my-host-name', 'worker2@my-host-name', ..])
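Once you have those node names, you can pass them as the destination when broadcasting the shutdown. A minimal sketch (the 'worker1@' filter is only for illustration):

# collect all worker node names, then shut down only the ones you want
workers = app.control.inspect().stats().keys()
targets = [name for name in workers if name.startswith('worker1@')]
app.control.broadcast('shutdown', destination=targets)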

Locust: How to run a task n times on n users and then stop the Locust run?

I have simple Locust script with one task with http request.
I want to run this task 100 times on 10 users and then stop the run.
Is there any simple way to do it? I know the --run-time parameter, but it only stops after the specified amount of time.
Below my script:
from locust import HttpUser, task, between

class QuickstartUser(HttpUser):
    wait_time = between(1, 2)
    host = "https://allegro.pl"

    @task(1)
    def getHome(self):
        self.client.get("/dzial/dom-i-ogrod", name="Get Home and Garden")
Another option, provided by locust-plugins, is the -i parameter: https://github.com/SvenskaSpel/locust-plugins#command-line-options
It should be a little more reliable, as it explicitly calls runner.quit()
If you are not running distributed, you can have a global counter, increment it in the task, and stop the runner once it reaches the desired count, like:

from locust import HttpUser, task, between

counter = 0

class QuickstartUser(HttpUser):
    wait_time = between(1, 2)
    host = "https://allegro.pl"

    @task(1)
    def getHome(self):
        global counter
        if counter >= 100:
            self.environment.runner.stop()
        self.client.get("/dzial/dom-i-ogrod", name="Get Home and Garden")
        counter = counter + 1
And if you are running distributed, it is best to use some external counter, such as Redis, to keep track of requests across workers.
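A rough sketch of that idea using Redis as the shared counter (this assumes a Redis server reachable from every worker; the key name total_iterations is made up for illustration):

import redis
from locust import HttpUser, task, between

r = redis.Redis(host="localhost", port=6379)  # assumed Redis location

class QuickstartUser(HttpUser):
    wait_time = between(1, 2)
    host = "https://allegro.pl"

    @task(1)
    def getHome(self):
        self.client.get("/dzial/dom-i-ogrod", name="Get Home and Garden")
        # INCR is atomic, so all workers see one consistent count
        if r.incr("total_iterations") >= 100:
            self.environment.runner.stop()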

Port 51347 seems to be used by another program

On running the sample code given in the dispy documentation
def compute(n):
    import time, socket
    time.sleep(n)
    host = socket.gethostname()
    return (host, n)

if __name__ == '__main__':
    import dispy, random
    cluster = dispy.JobCluster(compute)
    jobs = []
    for i in range(10):
        # schedule execution of 'compute' on a node (running 'dispynode')
        # with a parameter (random number in this case)
        job = cluster.submit(random.randint(5, 20))
        job.id = i  # optionally associate an ID to job (if needed later)
        jobs.append(job)
    # cluster.wait()  # wait for all scheduled jobs to finish
    for job in jobs:
        host, n = job()  # waits for job to finish and returns results
        print('%s executed job %s at %s with %s' % (host, job.id, job.start_time, n))
        # other fields of 'job' that may be useful:
        # print(job.stdout, job.stderr, job.exception, job.ip_addr, job.start_time, job.end_time)
    cluster.print_status()
I get the following output
2017-03-29 22:39:52 asyncoro - version 4.5.2 with epoll I/O notifier
2017-03-29 22:39:52 dispy - dispy client version: 4.7.3
2017-03-29 22:39:52 dispy - Port 51347 seems to be used by another program
And then nothing happens.
How to free the 51347 port?
If you are under Linux, run sudo netstat -tuanp | grep 51347 and take note of the pid using that port.
Then execute ps ax | grep <pid> to check which service/program is running with that pid.
Then execute kill <pid> to terminate the process using that port.
Please check which process is using the port before killing it just in case it is something that you should not kill.
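If you prefer to do the same lookup from Python, here is a minimal sketch using psutil (assuming it is installed; like netstat, it may need elevated privileges to see other users' processes):

import psutil

PORT = 51347

# find whichever process has a socket bound to the port
for conn in psutil.net_connections(kind="inet"):
    if conn.laddr and conn.laddr.port == PORT and conn.pid:
        proc = psutil.Process(conn.pid)
        print(conn.pid, proc.name(), proc.cmdline())
        # proc.terminate()  # uncomment only after checking what it is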

How to load objects in memory and share across different executions of Celery worker?

I have set up Celery + RabbitMQ on a 3-machine cluster. I have also created a task which generates a regular expression based on data from a file and uses that information to parse text. However, I would like the file to be read only once per worker spawn, and not on every execution of a task.
from celery import Celery
import re

celery = Celery('tasks', broker='amqp://localhost//')

@celery.task
def add(x, y):
    return x + y

def get_regular_expression():
    with open("text") as fp:
        data = fp.readlines()
    str_re = "|".join([x.split()[2] for x in data])
    return str_re

@celery.task
def analyse_json(tw):
    str_re = get_regular_expression()
    re.match(str_re, tw.text)
In the above code, I would like to open the file and read its contents into the string only once per worker; the analyse_json task should then just use that string.
Any help will be appreciated,
thanks,
Amit
Put the call to get_regular_expression at the module level:
str_re = get_regular_expression()

@celery.task
def analyse_json(tw):
    re.match(str_re, tw.text)
It will only be called once, when the module is first imported.
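If you would rather defer the file read until the first task call in each worker process (for example, if the file is not available at import time), a cached lazy variant is another option. This is just a sketch, not part of the original answer; celery here is the app object from the question:

import re
from functools import lru_cache

@lru_cache(maxsize=1)
def get_regular_expression():
    with open("text") as fp:
        data = fp.readlines()
    return "|".join([x.split()[2] for x in data])

@celery.task
def analyse_json(tw):
    # the file is read on the first call in each worker process, then cached
    re.match(get_regular_expression(), tw.text)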
Additionally, if you must have only one instance of your worker running at a time (for example CUDA), you have to use the -P solo option:
celery worker --pool solo
Works with celery 4.4.2.

Is it possible to use custom routes for celery's canvas primitives?

I have distinct Rabbit queues each dedicated to a special kind of order processing:
# tasks.py
@celery.task
def process_order_for_product_x(order_id):
    pass  # elided ...

@celery.task
def process_order_for_product_y(order_id):
    pass  # elided ...
# settings.py
CELERY_QUEUES = {
    "black_hole": {
        "binding_key": "black_hole",
        "queue_arguments": {"x-ha-policy": "all"}
    },
    "product_x": {
        "binding_key": "product_x",
        "queue_arguments": {"x-ha-policy": "all"}
    },
    "product_y": {
        "binding_key": "product_y",
        "queue_arguments": {"x-ha-policy": "all"}
    },
}
We have a policy of enforcing explicit routing by setting CELERY_DEFAULT_QUEUE = 'black_hole' and then never consuming from black_hole.
Each of these tasks may use celery's canvas primitives, like so:
# tasks.py
@celery.task
def process_order_for_product_x(order_id):
    # These can run in parallel
    stage_1_group = group(do_something.si(order_id),
                          do_something_else.si(order_id))

    # These can run in parallel
    another_group = group(do_something_at_end.si(order_id),
                          do_something_else_at_end.si(order_id))

    # These run in a linear sequence
    process_task = chain(
        stage_1_group,
        do_something_dependent_on_stage_1.si(order_id),
        another_group)

    process_task.apply_async()
Supposing I want specific uses of celery.group, celery.chord, celery.chord_unlock, and other canvas tasks to flow through the queue for its corresponding product, rather than getting trapped in a black_hole, is there a way to invoke each particular canvas task with either a custom task name or custom routing_key?
For reasons I won't go into I would prefer to not send all celery.* tasks to a catch-all celery_canvas queue, which is what I am doing in the meantime.
This method allows you to route Celery canvas tasks to the queue of a callback task.
It is possible to specify a custom class-based task router for Celery as described here.
Let's focus on the celery.chord_unlock task. Its signature is defined here.
def unlock_chord(self, group_id, callback, ...):
The second positional argument is the signature of the chord callback task.
Task signatures in Celery are basically dicts, so that gives us an opportunity to access task options, including the task queue name.
Here is an example:
class CeleryRouter(object):
    def route_for_task(self, task, args=None, kwargs=None):
        if task == 'celery.chord_unlock':
            callback_signature = args[1]
            options = callback_signature.get('options')
            if options:
                queue = options.get('queue')
                if queue:
                    return {'queue': queue}
Add it to the Celery config:
CELERY_ROUTES = (CeleryRouter(),)
I'm currently using Celery in my project. For some scenarios I need tasks to chain through different queues:
chain(get_staff.s(url), save_staff.s(dt, partner_id, url))()
Those two functions declared like so:
@task(queue='celery_gevent')
def get_staff(source_url):
    ...

@task  # send to default queue
def save_staff(suggests, dt, partner, url):
    ...
By the way, celery_gevent is handled by a worker with a gevent pool, to make HTTP requests.
This example shows how you can specify the queue implicitly. You can also explicitly put a task on a different queue by passing additional parameters, like so:
In [1]: add.apply_async([4,5])
Out[1]: <AsyncResult: bda3dedd-c2c4-44db-be8e-6a97e718f8b0>
$ sudo rabbitmqctl list_queues
Listing queues ...
celery 1
...done.
In [2]: add.apply_async([4,5], queue='your_product')
Out[2]: <AsyncResult: 934f6161-298b-468b-9716-3da6fae58fa5>
$ sudo rabbitmqctl list_queues
Listing queues ...
celery 1
your_product 1
...done.
You can run the whole canvas in a custom queue:
process_task.apply_async(queue='your_queue')
Try specifying the queue name inside the @task decorator. This should help.
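For instance, following the same pattern as the get_staff example above (the queue name is illustrative, and this is only a sketch of that suggestion):

# hypothetical illustration, reusing the task from the question
@celery.task(queue='product_x')
def process_order_for_product_x(order_id):
    pass  # elided ...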
Links:
http://docs.celeryproject.org/en/latest/reference/celery.app.task.html
http://docs.celeryproject.org/en/latest/_modules/celery/app/task.html#Task.apply_async