Server Throughput definition ambiguity - server

Is throughput the max number of requests a server instance can handle or is it the number of requests that the server instance is currently handling?
Edit: By "currently handling" I mean, the number of requests the server is receiving for a given time interval in recent time. For eg: The server is currently handling 400 reqs every min.
For eg:, I might have a server instance with a lot of hardware which can have high throughput, but I might be only receiving small amount of traffic. What does throughput measure in such a situation. Also, what about the inverse case, i.e if my instance can only handle x requests per min. but is receiving y>>>x requests per min.
If throughput is the max no. of requests a server can handle, how is it measured? Do we do a load/stress test, where we keep increasing the requests per min on the server until it cannot handle them anymore?

No, Throughput is an aggregation that depends on execution time, you can send 1000 requests in the same second and your server won't handle, but when you'll send 1000 requests in an hour and your server will handle it normally.
Throughput is calculated as requests/unit of time. The time is calculated from the start of the first sample to the end of the last sample. This includes any intervals between samples, as it is supposed to represent the load on the server.
The formula is: Throughput = (number of requests) / (total time).
You want to find the number of concurrent users that your server can handle by increasing JMeter threads until server reach his maximum

Throughput is the number of Samplers which JMeter executes within the duration of your test. If you want to see the actual amount of requests which are being sent - consider using i.e. Server Hits Per Second listener (can be installed using JMeter Plugins Manager)
If you see that your server resources consumption doesn't increase as you increase the load in JMeter the reasons are in:
Your application middleware configuration is not suitable for high load (i.e. there is a limit of concurrent sessions and requests are queuing up instead of being processed), check out Web Performance Optimization: Top 3 Server and Client-Side Performance Tips for overall ideas what could be looked at
Your application code doesn't utilize underlying OS resources efficiently, consider using profiler tool to see what's going on under the hood.
JMeter may fail to send requests fast enough, make sure to follow JMeter Best Practices and if JMeter's machine is overloaded - consider going for Distributed Testing

Related

Locust eats CPU after 2-3 hours running

I have a simple HTTP server that I was testing. This server interacts with other HTTP servers and Cassandra DB.
Currently I was using 100 users with 1 request/s, so totally 100 tps was on the server. What I noticed with the Docker stats was that the CPU usage became higher and higher and ~ 2-3 hours later the CPU usage reaches the 90% mark, and even more. After that I got a notice from Locust, stating that the measurement may be inconsistent. But the latencies were not increased, so I do not know why this has been happening.
Can you please suggest possible cause(s) of the problem? I think 100 tps should be handled by one vCPU.
Thanks,
AM
There's no way for us to know exactly what's wrong without at very least seeing some code, and even then other factors like the environment or data or server you're running it on or against could have additional factors we wouldn't know about.
It's possible you have a problem with your code for your Locust users, such as a memory leak or they're just doing too much for a single worker to handle that many users. For users only doing simple HTTP calls, a single CPU typically can handle upwards of thousands of requests per second. Do anything more than that and you'll start to expect to reduce what a worker can handle. It's also possible you may just need a more powerful CPU (or more RAM or bandwidth) to do what you want it to do at the scale you want.
Do some profiling to see if you can find any inefficiencies in your code. Run smaller tests to see if the same behavior is evident with smaller loads. Run the same load but with additional Locust workers on other CPUs.
It's also just as possible your DB can't handle the load. The increasing CPU usage could be due to how your code is handling waiting on the connection from the DB. Perhaps the DB could sustain, say, 80 users at an acceptable rate but any additional users makes it fall further and further behind and your Locust users are then waiting longer and longer for the requested data.
For more suggestions, check out the Locust FAQ https://github.com/locustio/locust/wiki/FAQ#increase-my-request-raterps

determine ideal number of workers and EC2 sizing for master

I have a requirement to use locust to simulate 20,000 (and higher) users in a 10 minute test window.
the locustfile is a tasksquence of 9 API calls. I am trying to determine the ideal number of workers, and how many workers should be attached to an EC2 on AWS. My testing shows with 20 workers, on two EC2 instance, the CPU load is minimal. the master however suffers big time. a 4 CPU 16 GB RAM system as the master ends up thrashing to the point that the workers start printing messages like this:
[2020-06-12 19:10:37,312] ip-172-31-10-171.us-east-2.compute.internal/INFO/locust.util.exception_handler: Retry failed after 3 times.
[2020-06-12 19:10:37,312] ip-172-31-10-171.us-east-2.compute.internal/ERROR/locust.runners: RPCError found when sending heartbeat: ZMQ sent failure
[2020-06-12 19:10:37,312] ip-172-31-10-171.us-east-2.compute.internal/INFO/locust.runners: Reset connection to master
the master seems memory exhausted as each locust master process has grown to 12GB virtual RAM. ok - so the EC2 has a problem. But if I need to test 20,000 users, is there a machine big enough on the planet to handle this? or do i need to take a different approach and if so, what is the recommended direction?
In my specific case, one of the steps is to download a file from CloudFront which is randomly selected in one of the tasks. This means the more open connections to cloudFront trying to download a file, the more congested the available network becomes.
Because the app client is actually a native app on a mobile and there are a lot of factors affecting the download speed for each mobile, I decided to to switch from a GET request to a HEAD request. this allows me to test the response time from CloudFront, where the distribution is protected by a Lambda#Edge function which authenticates the user using data from earlier in the test.
Doing this dramatically improved the load test results and doesn't artificially skew the other testing happening as with bandwidth or system resource exhaustion, every other test will be negatively impacted.
Using this approach I successfully executed a 10,000 user test in a ten minute run-time. I used 4 EC2 T2.xlarge instances with 4 workers per T2. The 9 tasks in test plan resulted in almost 750,000 URL calls.
The answer for the question in the title is: "It depends"
Your post is a little confusing. You say you have 10 master processes? Why?
This problem is most likely not related to the master at all, as it does not care about the size of the downloads (which seems to be the only difference between your test case and most other locust tests)
There are some general tips that might help:
Switch to FastHttpUser (https://docs.locust.io/en/stable/increase-performance.html)
Monitor your network usage (if your load gens are already maxing out their bandwidth or CPU then your test is very unrealistic anyway, and adding more users just adds to the noice. In general, start low and work your way up)
Increase the number of loadgens
In general, the number of users is not an issue for locust, but number of requests per second or bandwidth might be.

How to retry DynamoDb write when throttled?

I am trying to write large amounts of data to dynamo using AmazonDynamoDBAsyncClient and I am trying to understand what the best practice of handling throttling is?
For example, I have a capacity of 3000 writes and at a given moment I have, let's say, 100,000 records I'd like to write. I don't need them all in immediately, but I am trying to figure what the best way to get them in is.
This application is running in a distributed environment so there maybe 5 executors all trying to do this at the same time. Would the best way to handle this be this way? Where I sleep the write process should we hit the throttle? Or should I be doing something to avoid the throttle completely. In fact, is my code even doing what I think it is, which is retrying the data after waiting a second?
try{
amazonDynamoAsyncDb.updateItemAsync(updateRequest)
}catch{
case e: ThrottlingException => {
Thread.sleep(1000)
//retry here, but how?
}
}
The AWS SDK for Java will retry throttled requests 10 times by default, before throwing a ProvisionedThroughputExceededException. If your items are small (1KB or less) and you are performing the writes from EC2 in the same region as your table you can assume each write will take around 10 ms. That means each thread of processing can do about 100 writes per second. To scale your writes to 3000 writes per second, you would need 30 threads and 30 HTTP connections. 3000 small (1kb) writes per second translates to a data throughput of 2.92 MB per second. Thus, for this write load, it does not appear that EC2 hardware could become a bottleneck. I recommend you do some measurements to figure out how long it takes to write each of your items on average, and scale your threads and HTTP connections appropriately.

httperf for bechmarking web-servers

I am using httperf to benchmark web-servers. My configuration, i5 processor and 4GB RAM. How to stress this configuration to get accurate results...? I mean I have to put 100% load on this server(12.04 LTS server).
you can use httperf like this
$httperf --server --port --wsesslog=200,0,urls.log --rate 10
Here the urls.log contains the different uri/path to be requested. Check the documention for details.
Now try to change the rate value or session value, then see how many RPS you can achieve and what is the reply time. Also in mean time monitor the cpu and memory utilization using mpstat or top command to see if it is reaching 100%.
What's tricky about httperf is that it is often saturating the client first, because of 1) the per-process open files limit, 2) TCP port number limit (excluding the reserved 0-1024, there are only 64512 ports available for tcp connections, meaning only 1075 max sustained connections for 1 minute), 3) socket buffer size. You probably need to tune the above limit to avoid saturating the client.
To saturate a server with 4GB memory, you would probably need multiple physical machines. I tried 6 clients, each of which invokes 300 req/s to a 4GB VM, and it saturates it.
However, there are still other factors impacting hte result, e.g., pages deployed in your apache server, workload access patterns. But the general suggestions are:
1. test the request workload that is closest to your target scenarios.
2. add more physical clients to see if the changes of response rate, response time, error number, in order to make sure you are not saturating the clients.

Distributed rate limiting

I have multiple servers/workers going through a task queue doing API requests. (Django with Memcached and Celery for the queue) The API requests are limited to 10 requests a second. How can I rate limit it so that the total number of requests (all servers) don't pass the limit?
I've looked through some of the related rate limit questions I'm guessing they are focused on a more linear, non concurrent scenario. What sort of approach should I take?
Have you looked in Rate Limiter from Guava project? They introduced this class in one of the latest releases and it seems to partially satisfy your needs.
Surely it won't calculate rate limit across multiple nodes in distributed environment but what you coud do is to have rate limit configured dynamically based on number of nodes which are are running (ie for 5 nodes you'd have rate limit of 2 API requests a second)
I have been working on an opensource project to solve this exact problem called Limitd. Although I don't have clients for other technologies than node yet, the protocol and the idea are simple.
Your feedback is very welcomed.
I solved that problem unfortunately not for your technology: bandwidth-throttle/token-bucket
If you want to implement it, here's the idea of the implementation:
It's a token bucket algorithm which converts the containing tokens into a timestamp since when it last was completely empty. Every consumption updates this timestamp (locked) so that each process shares the same state.