How to stop Locust Load Test after all users complete their task?

In the Locust documentation, the only way to stop a task is self.interrupt(), but that just hands control back to the parent class; it does not stop the load test. I want to stop the complete load test after all users have logged in and completed their tasks.
Locust Version: 1.1
from locust import TaskSet, User, task

class RegisteredUser(User):
    @task
    class Forum(TaskSet):
        @task(5)
        def view_thread(self):
            pass

        @task(1)
        def stop(self):
            # Leaves the TaskSet and returns control to RegisteredUser;
            # the load test itself keeps running.
            self.interrupt()

    @task
    def frontpage(self):
        pass

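You can call self.environment.runner.quit() to stop the whole run. From inside a TaskSet, reach the same runner through the user: self.user.environment.runner.quit().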
More info: https://docs.locust.io/en/stable/writing-a-locustfile.html#environment-attribute

My framework builds a list of tuples, one per user, holding that user's credentials and supporting values (tokens, support file names, etc.). The list is generated automatically before Locust starts.
I import that list in the locustfile:
from locust import HttpUser, SequentialTaskSet, task

# creds is created before running the locustfile; it can live in a separate
# module or be defined in the locustfile itself
creds = [('demo_user1', 'pass1', 'lnla'),
         ('demo_user2', 'pass2', 'taam9'),
         ('demo_user3', 'pass3', 'wevee'),
         ('demo_user4', 'pass4', 'avwew')]


class RegisteredUser(SequentialTaskSet):
    def on_start(self):
        # Each simulated user takes one credential tuple
        self.credentials = creds.pop()

    @task
    def task_one_name(self):
        pass  # task one commands (use self.credentials here)

    @task
    def task_two_name(self):
        pass  # task two commands

    @task
    def stop(self):
        # Declared last, so with SequentialTaskSet it runs after the other tasks
        if len(creds) == 0:
            self.user.environment.reached_end = True
            self.user.environment.runner.quit()


class ApiUser(HttpUser):
    tasks = [RegisteredUser]
    host = 'hosturl'

I use self.credentials in the tasks, and the stop task is defined last in the class.
Note that RegisteredUser inherits from SequentialTaskSet, so its tasks run in order and stop only runs after that user's other tasks.
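One caveat, offered as a sketch under stated assumptions: the len(creds) == 0 check becomes true as soon as the last user has taken its credentials in on_start, so the first user that reaches stop after that point may quit the run while other users are still mid-sequence. A stricter variant counts finished users and quits only when all of them have reported in. CompletionTracker and user_finished below are hypothetical names, and on_all_done stands in for environment.runner.quit. In a distributed run each worker process would hold its own counter, so this only covers a single process.

```python
import threading
from typing import Callable


class CompletionTracker:
    """Fire on_all_done exactly once, after `expected` users report completion.

    Hypothetical helper: in a locustfile, on_all_done could be
    lambda: environment.runner.quit(). Counts are per process only.
    """

    def __init__(self, expected: int, on_all_done: Callable[[], None]) -> None:
        self._expected = expected
        self._on_all_done = on_all_done
        self._finished = 0
        self._fired = False
        self._lock = threading.Lock()

    def user_finished(self) -> bool:
        """Record one finished user; return True if this call triggered the stop."""
        with self._lock:
            self._finished += 1
            should_fire = not self._fired and self._finished >= self._expected
            if should_fire:
                self._fired = True
        # Invoke the callback outside the lock so it cannot deadlock on it
        if should_fire:
            self._on_all_done()
        return should_fire
```

In the structure above, each user would call user_finished() once from its stop task (guarded by a flag on self, since a SequentialTaskSet loops), with expected set to the original number of credential tuples.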


How to deploy a Google dataflow worker with a file loaded into memory?

I am trying to deploy a Google Dataflow streaming job for my machine learning pipeline, but I cannot find a way to deploy the workers with a file already loaded into memory. Currently, the job pulls a pickle file from a GCS bucket, loads it into memory, and uses it for model prediction. But this happens on every cycle of the job, i.e. the model is pulled from GCS every time a new object enters the Dataflow pipeline, so the pipeline is much slower than it needs to be.
What I really need is a way to allocate a variable on each worker node when that worker is set up, and then use that variable within the pipeline without reloading it on every execution.
Is there a way to do this step before the job is deployed, with something like
with open('model.pkl', 'rb') as file:
    pickle_model = pickle.load(file)
but within my setup.py file?
##### based on - https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py
"""Setup.py module for the workflow's worker utilities.

All the workflow related code is gathered in a package that will be built as a
source distribution, staged in the staging area for the workflow being run and
then installed in the workers when they start running.

This behavior is triggered by specifying the --setup_file command line option
when running the workflow for remote execution.
"""
# pytype: skip-file
from __future__ import absolute_import
from __future__ import print_function

import subprocess
from distutils.command.build import build as _build  # type: ignore

import setuptools


# This class handles the pip install mechanism.
class build(_build):  # pylint: disable=invalid-name
    """A build command class that will be invoked during package install.

    The package built using the current setup.py will be staged and later
    installed in the worker using `pip install package'. This class will be
    instantiated during install for this specific scenario and will trigger
    running the custom commands specified.
    """
    sub_commands = _build.sub_commands + [('CustomCommands', None)]


# One list holding every command; reassigning CUSTOM_COMMANDS line by line
# would leave only the last command in effect.
CUSTOM_COMMANDS = [
    ['pip', 'install', 'scikit-learn==0.23.1'],
    ['pip', 'install', 'google-cloud-storage'],
    ['pip', 'install', 'mlxtend'],
]


class CustomCommands(setuptools.Command):
    """A setuptools Command class able to run arbitrary commands."""

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def RunCustomCommand(self, command_list):
        print('Running command: %s' % command_list)
        p = subprocess.Popen(
            command_list,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT)
        # Can use communicate(input='y\n'.encode()) if the command run requires
        # some confirmation.
        stdout_data, _ = p.communicate()
        print('Command output: %s' % stdout_data)
        if p.returncode != 0:
            raise RuntimeError(
                'Command %s failed: exit code: %s' % (command_list, p.returncode))

    def run(self):
        for command in CUSTOM_COMMANDS:
            self.RunCustomCommand(command)


REQUIRED_PACKAGES = [
    'google-cloud-storage',
    'mlxtend',
    'scikit-learn==0.23.1',
]

setuptools.setup(
    name='ML pipeline',
    version='0.0.1',
    description='ML set workflow package.',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
    cmdclass={
        'build': build,
        'CustomCommands': CustomCommands,
    })
Snippet of current ML load mechanism:
import apache_beam as beam


class MlModel(beam.DoFn):
    def __init__(self):
        self._model = None
        from google.cloud import storage
        import pandas as pd
        import pickle as pkl
        self._storage = storage
        self._pkl = pkl
        self._pd = pd

    def process(self, element):
        # myBucket and myBlob are defined elsewhere in the pipeline code
        if self._model is None:
            bucket = self._storage.Client().get_bucket(myBucket)
            blob = bucket.get_blob(myBlob)
            self._model = self._pkl.loads(blob.download_as_string())
        new_df = self._pd.read_json(element, orient='records').iloc[:, 3:-1]
        predict = self._model.predict(new_df)
        df = self._pd.DataFrame(data=predict, columns=["A", "B"])
        A = df.iloc[0]['A']
        B = df.iloc[0]['B']
        d = {'A': A, 'B': B}
        return [d]
You can override the setup method of your MlModel DoFn: load your model there and then use it in your process method. Beam calls setup once per DoFn instance when the worker initializes it, not once per element, so the model is no longer reloaded on every execution.
I had written a similar answer here
HTH
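A minimal sketch of the setup approach, under a few assumptions: load_model is a hypothetical zero-argument callable (for example a module-level function wrapping the GCS download from the question) injected so the class can be exercised without GCS, and process is reduced to a bare predict call instead of the question's DataFrame handling. The import guard only exists so the sketch runs where Beam is not installed.

```python
try:
    import apache_beam as beam
    _DoFnBase = beam.DoFn
except ImportError:  # lets the sketch run without Beam installed
    _DoFnBase = object


class MlModel(_DoFnBase):
    """Loads the model once in setup() and reuses it for every element."""

    def __init__(self, load_model):
        # load_model: hypothetical zero-argument callable returning the model,
        # e.g. a module-level function that downloads the pickle from GCS.
        super().__init__()
        self._load_model = load_model
        self._model = None

    def setup(self):
        # Beam calls this once per DoFn instance, before processing bundles.
        self._model = self._load_model()

    def process(self, element):
        yield self._model.predict(element)
```

Keeping the heavy work in setup rather than __init__ also means the model is never pickled and shipped with the DoFn; only the loader is.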

pytest implementing a logfile per test method

I would like to create a separate log file for each test method. I would like to do this in the conftest.py file and pass the logfile instance to the test method. That way, whenever I log something in a test method it goes to a separate log file, which makes the logs very easy to analyse.
I tried the following.
Inside the conftest.py file I added this:
import os

import pkg_resources

import logger  # own helper module providing make_logger (not shown)

logs_dir = pkg_resources.resource_filename("test_results", "logs")


def pytest_runtest_setup(item):
    test_method_name = item.name
    testpath = os.path.splitext(item.parent.name)[0]  # module name without ".py"
    path = '%s/%s' % (logs_dir, testpath)
    if not os.path.exists(path):
        os.makedirs(path)
    log = logger.make_logger(test_method_name, path)  # make_logger creates the logfile and returns the Python logging object.
The problem here is that pytest_runtest_setup has no way to return anything to the test method, at least none that I am aware of.
So I thought of creating a fixture inside the conftest.py file with scope="function" and requesting it from the test methods. But the fixture does not know about the pytest.Item object, whereas pytest_runtest_setup receives the item parameter, from which we can find the test method name and path.
Please help!
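A side note on deriving the per-module directory name: str.strip('.py') does not remove a ".py" suffix. It removes any of the characters '.', 'p' and 'y' from both ends, so a module named test_happy.py becomes test_ha. A small sketch using pathlib instead; module_dir_name is a hypothetical helper name.

```python
from pathlib import Path


def module_dir_name(filename: str) -> str:
    """Return a test module's file name without its .py extension."""
    # Path.stem drops only the final suffix; str.strip('.py') would instead
    # remove any '.', 'p' or 'y' characters from both ends of the string.
    return Path(filename).stem
```

os.path.splitext(filename)[0] gives the same result if you prefer os.path.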
I found this solution by building further on webh's answer. I tried pytest-logger, but its file structure is very rigid and it was not really useful for me. This code works without any plugin. It is based on set_log_path, which is an experimental feature.
Pytest 6.1.1 and Python 3.8.4
# conftest.py
# Required modules
import pytest
from pathlib import Path

# Configure logging
@pytest.hookimpl(hookwrapper=True, tryfirst=True)
def pytest_runtest_setup(item):
    config = item.config
    logging_plugin = config.pluginmanager.get_plugin("logging-plugin")
    filename = Path('pytest-logs', item._request.node.name + ".log")
    logging_plugin.set_log_path(str(filename))
    yield
Note that Path can be replaced by os.path.join. Moreover, different tests can be put in different folders, and a historical record of every run can be kept by adding a timestamp to the filename. For example, one could use the following filename:
# conftest.py
# Required modules
import pytest
import datetime
from pathlib import Path

# Configure logging
@pytest.hookimpl(hookwrapper=True, tryfirst=True)
def pytest_runtest_setup(item):
    ...
    filename = Path(
        'pytest-logs',
        item._request.node.name,
        f"{datetime.datetime.now().strftime('%Y%m%dT%H%M%S')}.log"
    )
    ...
Additionally, the log format can be changed in the pytest configuration file, as described in the documentation:
# pytest.ini
[pytest]
log_file_level = INFO
log_file_format = %(name)s [%(levelname)s]: %(message)s
My first stackoverflow answer!
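The timestamped naming scheme from the answer above can be pulled into a plain function, which makes it easy to check without running pytest. This is a sketch: timestamped_log_path is a hypothetical name, the now parameter exists only to make the output deterministic, and replacing path-unsafe characters (for example a '/' inside a parametrize id) is an extra assumption not in the original answer.

```python
import datetime
import re
from pathlib import Path
from typing import Optional


def timestamped_log_path(test_name: str, root: str = "pytest-logs",
                         now: Optional[datetime.datetime] = None) -> Path:
    """Build <root>/<test_name>/<YYYYmmddTHHMMSS>.log, as in the answer's scheme."""
    now = now or datetime.datetime.now()
    # Assumption: replace path-unsafe characters (e.g. '/' from parametrize ids)
    safe_name = re.sub(r"[^\w.\[\]-]", "_", test_name)
    return Path(root, safe_name, f"{now.strftime('%Y%m%dT%H%M%S')}.log")
```

Inside the hook it would be called as timestamped_log_path(item.name), with the result passed to set_log_path as a string.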
I found the answer I was looking for.
I was able to achieve it using a function-scoped fixture like this:
@pytest.fixture(scope="function")
def log(request):
    test_path = os.path.splitext(request.node.parent.name)[0]  # module name without ".py"
    test_name = request.node.name
    node_id = request.node.nodeid
    log_file_path = '%s/%s' % (logs_dir, test_path)
    if not os.path.exists(log_file_path):
        os.makedirs(log_file_path)
    logger_obj = logger.make_logger(test_name, log_file_path, node_id)
    yield logger_obj
    # Iterate over a copy: removing handlers from the list being looped over skips some
    for handler in list(logger_obj.handlers):
        handler.close()
        logger_obj.removeHandler(handler)
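The fixture relies on a logger.make_logger helper that the answer does not show. A minimal stand-in built on the standard logging module could look like this; make_logger and close_logger are hypothetical, the optional node_id is only used as the logger name, and the format string mirrors the pytest.ini example from the previous answer.

```python
import logging
from pathlib import Path
from typing import Optional


def make_logger(test_name: str, log_dir: str,
                node_id: Optional[str] = None) -> logging.Logger:
    """Hypothetical stand-in for the answer's logger.make_logger helper.

    Returns a logger that writes only to <log_dir>/<test_name>.log.
    """
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(node_id or test_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    handler = logging.FileHandler(Path(log_dir, f"{test_name}.log"), encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(name)s [%(levelname)s]: %(message)s"))
    logger.addHandler(handler)
    return logger


def close_logger(logger: logging.Logger) -> None:
    """Teardown matching the fixture: close and detach every handler."""
    for handler in list(logger.handlers):  # copy, since we mutate the list
        handler.close()
        logger.removeHandler(handler)
```

Closing the handlers in teardown matters because logging.getLogger returns the same logger object for the same name, so leftover handlers would otherwise duplicate output on a rerun.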
In newer pytest versions this can be achieved with set_log_path:
import os

import pytest


@pytest.fixture(autouse=True)
def manage_logs(request):
    """Set log file name same as test name"""
    request.config.pluginmanager.get_plugin("logging-plugin")\
        .set_log_path(os.path.join('log', request.node.name + '.log'))

how to pass custom parameters to a locust test class?

I'm currently passing custom parameters to my load test using environment variables. For example, my test class looks like this:
from locust import HttpLocust, TaskSet, task
import os

class UserBehavior(TaskSet):
    @task(1)
    def login(self):
        test_dir = os.environ['BASE_DIR']
        with open(test_dir + '/PASSWORD') as f:
            auth = tuple(f.read().rstrip().split(':'))
        self.client.request(
            'GET',
            '/myendpoint',
            auth=auth
        )

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
Then I'm running my test with:
locust -H https://myserver --no-web --clients=500 --hatch-rate=500 --num-request=15000 --print-stats --only-summary
Is there a more locust way that I can pass custom parameters to the locust command line application?
You can set an environment variable on the command line with env <parameter>=<value> locust <options> and read <parameter> inside the locust script.
For example:
env IP_ADDRESS=100.0.1.1 locust -f locust-file.py --no-web --clients=5 --hatch-rate=1 --num-request=500
and then read IP_ADDRESS inside the locust script (e.g. os.environ['IP_ADDRESS']) to get its value, which is 100.0.1.1 in this case.
Nowadays it is possible to add custom parameters to Locust (it wasn't possible when this question was originally asked, at which time using env vars was probably the best option).
Since version 2.2, custom parameters are even forwarded to the workers in a distributed run.
https://docs.locust.io/en/stable/extending-locust.html#custom-arguments
from locust import HttpUser, task, events

@events.init_command_line_parser.add_listener
def _(parser):
    parser.add_argument("--my-argument", type=str, env_var="LOCUST_MY_ARGUMENT", default="", help="It's working")
    # Set `include_in_web_ui` to False if you want to hide from the web UI
    parser.add_argument("--my-ui-invisible-argument", include_in_web_ui=False, default="I am invisible")

@events.test_start.add_listener
def _(environment, **kw):
    print("Custom argument supplied: %s" % environment.parsed_options.my_argument)

class WebsiteUser(HttpUser):
    @task
    def my_task(self):
        print(f"my_argument={self.environment.parsed_options.my_argument}")
        print(f"my_ui_invisible_argument={self.environment.parsed_options.my_ui_invisible_argument}")
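Locust's add_argument accepts env_var directly, as in the answer above. To make the resulting precedence concrete, here is a stdlib-only approximation, not Locust's own parser: it assumes the usual rule that a value given on the command line wins over LOCUST_MY_ARGUMENT, which in turn wins over the default. build_parser and its environ parameter are hypothetical, added so the behaviour can be checked without touching the real environment.

```python
import argparse
import os
from typing import Mapping


def build_parser(environ: Mapping[str, str] = os.environ) -> argparse.ArgumentParser:
    """Approximate `--my-argument` with an environment-variable fallback."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--my-argument",
        type=str,
        # Env var supplies the default; an explicit CLI flag overrides it
        default=environ.get("LOCUST_MY_ARGUMENT", ""),
        help="CLI value wins, then LOCUST_MY_ARGUMENT, then empty string",
    )
    return parser
```

With Locust itself, the equivalent run would be LOCUST_MY_ARGUMENT=foo locust -f locustfile.py, or passing --my-argument foo explicitly.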
Running Locust from the command line is not recommended for high-concurrency tests: in --no-web mode you can only use one CPU core, so you cannot make full use of your test machine.
Back to your question: there is no other way to pass custom parameters to Locust on the command line.

How to load objects in memory and share across different executions of Celery worker?

I have set up Celery + RabbitMQ on a 3-machine cluster. I have also created a task which builds a regular expression from the data in a file and uses it to parse text. However, I would like the file to be read only once per worker spawn, not on every execution of a task.
from celery import Celery
celery = Celery('tasks', broker='amqp://localhost//')
import re

@celery.task
def add(x, y):
    return x + y

def get_regular_expression():
    with open("text") as fp:
        data = fp.readlines()
        str_re = "|".join([x.split()[2] for x in data])
        return str_re

@celery.task
def analyse_json(tw):
    str_re = get_regular_expression()
    re.match(str_re, tw.text)
In the above code, I would like to open the file and read the output into the string only once per worker, and then the task analyse_json should just use the string.
Any help will be appreciated,
thanks,
Amit
Put the call to get_regular_expression at the module level:
str_re = get_regular_expression()

@celery.task
def analyse_json(tw):
    re.match(str_re, tw.text)
It will only be called once, when the module is first imported.
Additionally, if you must have only one instance of your worker running at a time (for example, with CUDA), you have to use the -P solo option:
celery worker --pool solo
Works with celery 4.4.2.
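A lazily cached variant of the module-level approach, as a sketch: functools.lru_cache makes the file read and regex compilation happen once per process, on first use, instead of at import time. With the default prefork pool, a module-level call typically runs once before the children are forked and they inherit the result, while this version loads once in each child; either way the file is not reread per task. get_pattern and analyse_text are hypothetical names, and skipping blank lines is an added assumption (the original would raise IndexError on them).

```python
import functools
import re


@functools.lru_cache(maxsize=None)
def get_pattern(path: str = "text") -> "re.Pattern[str]":
    """Read the file and compile the alternation once per process, on first use."""
    with open(path) as fp:
        # Third whitespace-separated field of each non-blank line, as in the question
        alternatives = [line.split()[2] for line in fp if line.strip()]
    return re.compile("|".join(alternatives))


def analyse_text(text: str, path: str = "text"):
    """Body a task like analyse_json could call; returns a match object or None."""
    return get_pattern(path).match(text)
```

Inside the Celery task, analyse_json would simply return analyse_text(tw.text).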

Is it possible to use custom routes for celery's canvas primitives?

I have distinct Rabbit queues each dedicated to a special kind of order processing:
# tasks.py
@celery.task
def process_order_for_product_x(order_id):
    pass  # elided ...

@celery.task
def process_order_for_product_y(order_id):
    pass  # elided ...

# settings.py
CELERY_QUEUES = {
    "black_hole": {
        "binding_key": "black_hole",
        "queue_arguments": {"x-ha-policy": "all"}
    },
    "product_x": {
        "binding_key": "product_x",
        "queue_arguments": {"x-ha-policy": "all"}
    },
    "product_y": {
        "binding_key": "product_y",
        "queue_arguments": {"x-ha-policy": "all"}
    },
}
We have a policy of enforcing explicit routing by setting CELERY_DEFAULT_QUEUE = 'black_hole' and then never consuming from black_hole.
Each of these tasks may use celery's canvas primitives, like so:
# tasks.py
from celery import chain, group

@celery.task
def process_order_for_product_x(order_id):
    # These can run in parallel
    stage_1_group = group(do_something.si(order_id),
                          do_something_else.si(order_id))

    # These can run in parallel
    another_group = group(do_something_at_end.si(order_id),
                          do_something_else_at_end.si(order_id))

    # These run in a linear sequence
    process_task = chain(
        stage_1_group,
        do_something_dependent_on_stage_1.si(order_id),
        another_group)

    process_task.apply_async()
Supposing I want specific uses of celery.group, celery.chord, celery.chord_unlock, and other canvas tasks to flow through the queue for its corresponding product, rather than getting trapped in a black_hole, is there a way to invoke each particular canvas task with either a custom task name or custom routing_key?
For reasons I won't go into I would prefer to not send all celery.* tasks to a catch-all celery_canvas queue, which is what I am doing in the meantime.
This method allows you to route Celery canvas tasks to the queue of a callback task.
It is possible to specify a custom class-based task router for Celery as described here.
Let's focus on the celery.chord_unlock task. Its signature is defined here.
def unlock_chord(self, group_id, callback, ...):
The second positional argument is the signature of the chord callback task.
Task signatures in Celery are basically dicts, so that gives us an opportunity to access task options, including the task queue name.
Here is an example:
class CeleryRouter(object):
    def route_for_task(self, task, args=None, kwargs=None):
        if task == 'celery.chord_unlock':
            # args[1] is the chord callback signature (a dict subclass)
            callback_signature = args[1]
            options = callback_signature.get('options')
            if options:
                queue = options.get('queue')
                if queue:
                    return {'queue': queue}
Add it to the Celery config:
CELERY_ROUTES = (CeleryRouter(),)
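In Celery 4 and later, routers can also be plain functions with the signature (name, args, kwargs, options, task=None, **kw), registered through task_routes. The sketch below ports the same chord_unlock logic to that style, assuming, as the answer does, that args[1] is the callback signature; the extra guards for missing or short args are an added precaution.

```python
def route_task(name, args, kwargs, options, task=None, **kw):
    """Route celery.chord_unlock to the queue of the chord's callback, if it has one."""
    if name != 'celery.chord_unlock' or not args or len(args) < 2:
        return None  # let other routers or the default queue decide
    # args[1] is the callback signature; Celery signatures are dict subclasses
    callback_options = args[1].get('options') or {}
    queue = callback_options.get('queue')
    return {'queue': queue} if queue else None
```

It would be registered with app.conf.task_routes = (route_task,); returning None passes the decision on to the next router.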
I'm currently using Celery in my project. In some scenarios I need a chain to run through different queues:
chain(get_staff.s(url), save_staff.s(dt, partner_id, url))()
Those two functions are declared like so:
@task(queue='celery_gevent')
def get_staff(source_url):
    ...

@task  # send to default queue
def save_staff(suggests, dt, partner, url):
    ...
By the way, celery_gevent is consumed by a worker with a gevent pool that makes the HTTP requests.
This example shows how you can specify the queue implicitly, in the task declaration. You can also explicitly put a task in a different queue by passing additional parameters, like so:
In [1]: add.apply_async([4,5])
Out[1]: <AsyncResult: bda3dedd-c2c4-44db-be8e-6a97e718f8b0>
$ sudo rabbitmqctl list_queues
Listing queues ...
celery 1
...done.
In [2]: add.apply_async([4,5], queue='your_product')
Out[2]: <AsyncResult: 934f6161-298b-468b-9716-3da6fae58fa5>
$ sudo rabbitmqctl list_queues
Listing queues ...
celery 1
your_product 1
...done.
You can run the whole canvas in a custom queue:
process_task.apply_async(queue='your_queue')
Try specifying the queue inside the @task decorator, e.g. @task(queue='your_queue'). This should help.
Links:
http://docs.celeryproject.org/en/latest/reference/celery.app.task.html
http://docs.celeryproject.org/en/latest/_modules/celery/app/task.html#Task.apply_async