I like Locust, but I'm having a problem interpreting the results.
e.g. my use case is that I have a petition site. I expect 10,000 people to sign the petition over a 12 hour period.
I've written a locust file that simulates user behaviour:
Some users load but don't sign petition
Some users load and submit invalid data
Some users (hopefully) successfully submit.
In real life the user now goes away (because the petition is an API not a main website).
Locust shows me things like:
with 50 concurrent users the median time is 11s
with 100 concurent users the median time is 20s
But as one "Locust" just repeats the tasks over and over, it's not really like one user. If I set it up with a swarm of 1 user, then that still represents many real world users, over a period of time; e.g. in 1 minute it might do the task 5 times: that would be 5 users.
Is there a way I can interpret the data ("this means we can handle N people/hour"), or some way I can see how many "tasks" get run per second or minute etc. (ie locust gives me requests per second but not tasks)
Tasks dont really exist on the logging level in locust.
If you want, you could log your own fake samples, and use that as your task counter. This has an unfortunate side effect of inflating your request rate, but it should not impact things like average response times.
Like this:
from locust.events import request_success
...
#task(1)
def mytask(self):
# do your normal requests
request_success.fire(request_type="task", name="completed", response_time=None, response_length=0)
Here's the hacky way that I've got somewhere. I'm not happy with it and would love to hear some other answers.
Create class variables on my HttpLocust (WebsiteUser) class:
WebsiteUser.successfulTasks = 0
Then on the UserBehaviour taskset:
#task(1)
def theTaskThatIsConsideredSuccessful(self):
WebsiteUser.successfulTasks += 1
# ...do the work...
# This runs once regardless how many 'locusts'/users hatch
def setup(self):
WebsiteUser.start_time = time.time();
WebsiteUser.successfulTasks = 0
# This runs for every user when test is stopped.
# I could not find another method that did this (tried various combos)
# It doesn't matter much, you just get N copies of the result!
def on_stop(self):
took = time.time() - WebsiteUser.start_time
total = WebsiteUser.successfulTasks
avg = took/total
hr = 60*60/avg
print("{} successful\nAverage: {}s/success\n{} successful signatures per hour".format(total, avg, hr)
And then set a zero wait_time and run till it settles (or failures emerge) and then stop the test with the stop button in the web UI.
Output is like
188 successful
0.2738157498075607s/success
13147.527132862522 successful signatures per hour
I think this therefore gives me the max conceivable throughput that the server can cope with (determined by changing the No. users hatched until failures emerge, or until the average response time becomes unbearable).
Obviously real users would have pauses, but that makes it harder to test the maximums.
Drawbacks
Can't use distributed Locust instances
Messy; also can't 'reset' - have to quit the process and restart for another test.
Related
Edited Version:
I'm actually modelling an airport check-in terminal. It works fine so far, but additional I'm still trying to implement a function, that allows my pedestrians not to enter the service-queue if the queue time exceeds a preselected value (e.g. already 15 Passengers in the queue) and therefore walks to some kind of backup Service that opens during this busy times.
Here is my approach:
Variable QueueSize returns permanently the actual Number of Passengers in the Queue.
Every time a ped enters the pedservice block CheckInEco, the function waitingTime() starts:
QueueSize = CheckInEco.size();
if (QueueSize > 15) CheckInEco.cancel(ped)
So, as soon as there are more than 15 Agents in the queue, number 16 should bypass and move to an alternate ServiceBlock, which I would connect to the ccl Port of the CheckInEco Service. But when building the model, I get this message: ped cannot be resolved to a variable?
According to Anylogic Help, it should be possible to use this cancel - call, but I'm not really experienced with it.. Maybe, someone can help me out?
You can simply use a select output block to prevent pedestrians from going into the service block if there are more than 16 pedestrians already in.
Your original question had to do with waiting time, you should follow the exact same approach. But with waiting time it gets more complicated since you don't want to take the average waiting time from the start of the simulation.... so you need to decide if you want to take the last 10 minutes, 1 hour etc and do you want to include the current waiting time of agents in the queue. Since this is the the questions anymore I am not going to add it here, perhaps ask a new question if this is still the case.
Actually what I want to do,
I created dashboards to monitor the alert status in grafana.
I created fake data in my system to simulate my alert situations on these boards. The time of this data covers the range now - now + 12h. In fact, it takes a long time to analyze the alert status in real data. For this reason, I cannot be very flexible on my alert rules. I have to wait until the end of this period to see alert status in the system. (I have many states like this actually.) Grafana creates pending, alerting, and ok states according to the records in my database. Is there a method to quickly verify my tests without waiting for this time?
The main problem is that it is fairly expensive to do in a data source agnostic way. The way worked in Bosun is you would select a time range, and then an interval or a number of queries to run.
Setting both From and To enables testing multiple iterations of the selected alert over time. The number of iterations depends on the setting to the two linked fields Intervals and Step Duration at 3 Changing one changes the other. Intervals will be the number of runs to do even spaced out over the duration of From to To and Step Duration is how much time in minutes should be between intervals. Doing a test over time will populate the Timeline tab 5 which draws a clickable graphic of severity states for each item in the set:
It would then run all those queries with a pool limiting simultaneous queries. For an interval of say 5 minutes, it would run adjacent 5 minute queries.
So this would speed up the alert authoring and testing workflow significantly. But it would best be implemented as a job system. This is because with more expensive queries, or range/interval combination that is a fair amount of runs, it may take a minute or so - so having to wait on an open network connection is less ideal.
So I found I generally used in two modes:
To tweak a specific alert that had fired at some time
To get a general overview of how much the alert rule would trigger for the historical data
For the general over, a larger time range is generally wanted, which means more queries if the interval is kept the same. And with a feature like FOR (Pending), you would have to use the same interval it would actually run at.
So possible, has some limitations, and some care needs to be taken to do it right. But extremely useful in my experience.
I would like to run a spike test using Locust.IO where a large of number of requests are made in parallel to my service.
I have experimented with locust and this is type of command I would like to run:
locust -f locustfile.py --headless --host https://example.com --users 1000 --hatch-rate 1000 --run-time 5s
While running this test no requests are made. I have also tried extending the run time to 60 seconds and no requests are made.
Is there a way of running this type of test in locust?
Here is an example of how to make Users wait until they have all been started.
This way you can have a more reasonable hatch rate (maybe 50/s per load gen) but still have all your users to start the same second (more or less).
class MyUser(HttpUser):
#task
def t(self):
while self.environment.runner.user_count < self.environment.runner.target_user_count:
time.sleep(1)
# do your stuff
You may still encounter Python/OS/network related problems related to creating too many outgoing network connections in a short period, so you may need to combine this with multiple processes and maybe even multiple load gens (https://docs.locust.io/en/stable/running-locust-distributed.html)
You might be able to work around this if you first do a dummy request against the server before the sleep so that the connection is already established (although this may reduce the realism of your test a little)
You might also want to consider subclassing FastHttpUser instead of HttpUser: https://docs.locust.io/en/stable/increase-performance.html
I'm running a similation where i would lige to know the total amount of time agents spends in a delay block. I can access the data when running single simulations in the Dataset log under flowchart_stats_time_in_state_log
https://imgur.com/R5DG51a
However i would like to to write the data from block 5 (spraying) to an output in order to store the data when running multiple simulations.
https://imgur.com/MwPBvO8
Im guessing that the value reffence should look something like the expression below. It is not working however so i would aprreciate it alot if anybody could help me out or suggest an alternate solution for getting the data.
flowchart_stats_time_in_state_log.total_seconds.spraying;
Btw. Time measures dose not work for this situation since i need to know the total amount of time spend in a block after a 12 hour shift. with time measures i do not get the data from the agents that are still in the block when the simulation ends.
Based on the goal of summing all processing times, you could solve it mathematically. Set the output equal to block.statsUtilization.mean() * capacity * time() calculated on simulation end.
For example, if you have a capacity of 1 and a run length of 100 minutes, then if you had a utilization of 50%; that means you had an agent in the block for 50 minutes. Utilization = time busy / total time. Because of this relationship, we can calculate how long agents were actually in the block.
Another alternative would be to have a variable to track time in block, incrementing when the agents leave. At end of the run, you would need to call a function to iterate over the agents still in the block to add their time. AnyLogic allows you to pretty easily loop over queues, delays, or anything that holds agents:
for( MyAgent agent : delayBlockName ){
variable += time() - agent.enterBlockTime;
}
To implement this solution, you would need to create your own agent (name it something better than MyAgent) with a variable for when the agent enters the block. You would then need to then mark the time each agent enters the block.
My developer recently disappeared and I need to make a small change to my website.
Here's the code I'll be referring to:
my $process = scalar(#rows);
$process = 500 if $process > 500;
my $pm = Parallel::ForkManager->new($process);
This is code from a Perl script that scrapes data from an API system through a cron job. Every time the cron job is run, it's opening a ton of processes for that file. For example, cron-job.pl will be running 100+ times.
The number of instances it opens is based on the amount of data that needs to be checked so it's different every time, but never seems to exceed 500. Is the code above what would be causing this to happen?
I'm not familiar with using ForkManager, but from the research I've done it looks like it runs the same file multiple times, that way it'll be extracting multiple streams of data from the API system all at the same time.
The problem is that the number of instances being run is significantly slowing down the server. To lower the amount of instances, is it really as simple as changing 500 to a lower number or am I missing something?
To lower the number of instances created, yes, just lower 500 (in both cases) to something else.
Parallel::ForkManager is a way of using fork (spawning new processes) to handle parallel processing. The parameter passed to new() specifies the maximum number of concurrent processes to create.
Your code simplifies to
my $pm = Parallel::ForkManager->new(500);
It means: Limit the the number of children to 500 at any given time.
If you have fewer than 500 jobs, only that many workers will be created.
If you have more than 500 jobs, the manager will start 500 jobs, wait for one to finish, then start the next job.
If you want fewer children executing any given time, lower that number.
my $pm = Parallel::ForkManager->new(50);