Statistics for requests on deployed VPS servers - deployment

I was thinking about different scalability features and suddenly realized that I don't really know how much load a single server (VPS) can handle. This question is for those who run high-load projects.
Imagine a server with:
1 GB RAM
1 Xeon CPU
CentOS
LAMP with FastCGI
PostgreSQL on the same machine
We need to estimate the number of requests it can handle, so I decided to take average parameters for the app:
80% of requests make one indexed DB call
40-50 KB of HTML
Cache hit in 60% of cases
Add any other parameters you need and let's calculate - or tell the story of your own loads.

I would look at Cacti - it can give you plenty of stats to choose from.
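For the "let's calculate" part, here is a rough back-of-envelope sketch in Python. The 60% cache-hit and 80% DB-call ratios come from the question above; the per-request service times are assumptions made up for illustration and should be replaced with timings profiled on your own stack.

# Back-of-envelope throughput estimate for a single-core VPS.
# The service times below are assumed values, not measurements.
CACHE_HIT_RATIO = 0.60   # 60% of requests served from cache (from the question)
DB_CALL_RATIO = 0.80     # 80% of requests make one indexed DB call (from the question)

T_CACHED = 0.002         # assumed: serving a cached 40-50 KB page, seconds
T_DYNAMIC = 0.030        # assumed: PHP via FastCGI rendering the page, seconds
T_DB_CALL = 0.010        # assumed: one indexed PostgreSQL query, seconds

def avg_request_time():
    uncached = T_DYNAMIC + DB_CALL_RATIO * T_DB_CALL
    return CACHE_HIT_RATIO * T_CACHED + (1 - CACHE_HIT_RATIO) * uncached

t = avg_request_time()
print(f"average time per request: {t * 1000:.1f} ms -> roughly {1 / t:.0f} req/s on one core")

With these made-up numbers the box tops out around 60 req/s, which is why profiling your real per-request times matters more than the raw hardware spec.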

Related

Locust eats CPU after 2-3 hours of running

I have a simple HTTP server that I was testing. This server interacts with other HTTP servers and a Cassandra DB.
I was using 100 users at 1 request/s each, so the server saw 100 tps in total. What I noticed in the Docker stats was that the CPU usage kept climbing, and roughly 2-3 hours later it reached the 90% mark and beyond. At that point I got a notice from Locust stating that the measurements may be inconsistent. The latencies did not increase, though, so I do not know why this is happening.
Can you please suggest possible cause(s) of the problem? I think 100 tps should be handled by one vCPU.
Thanks,
AM
There's no way for us to know exactly what's wrong without at the very least seeing some code, and even then other factors we can't see - the environment, the data, or the server you're running on or against - could be involved.
It's possible there is a problem in the code for your Locust users, such as a memory leak, or they may simply be doing too much for a single worker to handle that many of them. For users only doing simple HTTP calls, a single CPU can typically handle upwards of thousands of requests per second; do anything more than that and you should expect each worker to handle fewer users. It's also possible you simply need a more powerful CPU (or more RAM or bandwidth) to do what you want at the scale you want.
Do some profiling to see if you can find any inefficiencies in your code. Run smaller tests to see whether the same behavior shows up at lower loads. Run the same load but with additional Locust workers on other CPUs.
It's also just as possible your DB can't handle the load. The increasing CPU usage could be due to how your code handles waiting on responses from the DB. Perhaps the DB could sustain, say, 80 users at an acceptable rate, but any additional users make it fall further and further behind, and your Locust users then wait longer and longer for the requested data.
For more suggestions, check out the Locust FAQ https://github.com/locustio/locust/wiki/FAQ#increase-my-request-raterps
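As a starting point for the profiling and smaller-test suggestions above, here is a minimal baseline locustfile (a sketch; the "/" path is a placeholder for a cheap endpoint on your server). If CPU stays flat over hours with this user but grows with yours, the problem is in your user code rather than in Locust itself.

# locustfile.py - minimal baseline: one simple GET per second per user,
# matching the "100 users at 1 request/s" setup from the question.
from locust import HttpUser, task, constant

class BaselineUser(HttpUser):
    wait_time = constant(1)   # 1 request/s per user -> 100 users ~ 100 tps

    @task
    def fetch_root(self):
        self.client.get("/")   # placeholder path - point this at your server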

Determine ideal number of workers and EC2 sizing for the master

I have a requirement to use Locust to simulate 20,000 (and more) users in a 10-minute test window.
The locustfile is a task sequence of 9 API calls. I am trying to determine the ideal number of workers and how many workers should run on each EC2 instance on AWS. My testing shows that with 20 workers on two EC2 instances, the CPU load is minimal. The master, however, suffers badly: a 4-CPU, 16 GB RAM system used as the master ends up thrashing to the point that the workers start printing messages like this:
[2020-06-12 19:10:37,312] ip-172-31-10-171.us-east-2.compute.internal/INFO/locust.util.exception_handler: Retry failed after 3 times.
[2020-06-12 19:10:37,312] ip-172-31-10-171.us-east-2.compute.internal/ERROR/locust.runners: RPCError found when sending heartbeat: ZMQ sent failure
[2020-06-12 19:10:37,312] ip-172-31-10-171.us-east-2.compute.internal/INFO/locust.runners: Reset connection to master
The master seems memory-exhausted, as each Locust master process has grown to 12 GB of virtual RAM. OK - so the EC2 instance has a problem. But if I need to test 20,000 users, is there a machine big enough on the planet to handle this? Or do I need to take a different approach, and if so, what is the recommended direction?
In my specific case, one of the steps is to download a file from CloudFront, selected at random in one of the tasks. This means the more open connections to CloudFront trying to download a file, the more congested the available network becomes.
Because the app client is actually a native app on a mobile device, and there are a lot of factors affecting the download speed for each device, I decided to switch from a GET request to a HEAD request. This lets me test the response time from CloudFront, where the distribution is protected by a Lambda@Edge function that authenticates the user using data from earlier in the test.
Doing this dramatically improved the load-test results and doesn't artificially skew the other testing happening at the same time; with bandwidth or system-resource exhaustion, every other test would be negatively impacted.
Using this approach I successfully executed a 10,000-user test in a ten-minute run time. I used 4 EC2 t2.xlarge instances with 4 workers per instance. The 9 tasks in the test plan resulted in almost 750,000 URL calls.
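For illustration, a sketch of what the GET-to-HEAD swap looks like inside a Locust task. The URL list and the Authorization header are hypothetical placeholders; in the real test the CloudFront URL is chosen at random and the token comes from an earlier step in the task sequence.

# Sketch of swapping GET for HEAD in a Locust task (placeholders, not the real test).
import random
from locust import HttpUser, task, between

FILE_URLS = ["/files/a.bin", "/files/b.bin", "/files/c.bin"]   # hypothetical paths

class DownloadUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def check_download(self):
        url = random.choice(FILE_URLS)
        # HEAD returns only headers, so CloudFront and the Lambda@Edge auth
        # path are still exercised without pulling the file body over the wire.
        self.client.head(url, headers={"Authorization": "Bearer <token>"})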
The answer to the question in the title is: "It depends."
Your post is a little confusing. You say you have 10 master processes? Why?
This problem is most likely not related to the master at all, as it does not care about the size of the downloads (which seems to be the only difference between your test case and most other Locust tests).
There are some general tips that might help:
Switch to FastHttpUser (https://docs.locust.io/en/stable/increase-performance.html) - see the sketch after this list
Monitor your network usage (if your load generators are already maxing out their bandwidth or CPU, then your test is very unrealistic anyway, and adding more users just adds to the noise. In general, start low and work your way up)
Increase the number of loadgens
In general, the number of users is not an issue for locust, but number of requests per second or bandwidth might be.
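A minimal sketch of the FastHttpUser switch from the first tip (the "/api/health" endpoint is a placeholder):

# Sketch: same task shape, but backed by geventhttpclient instead of requests.
from locust import task, between
from locust.contrib.fasthttp import FastHttpUser

class FastApiUser(FastHttpUser):
    wait_time = between(1, 2)

    @task
    def call_api(self):
        self.client.get("/api/health")   # placeholder endpoint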

Calculating and improving the number of requests/concurrent users my webserver can handle?

I am building a dynamic website and app using HTML/JavaScript/PHP/MySQL. I have
completed the site, and my main focus now is ensuring that when it launches it is
not taken down by the traffic I am hoping to receive (I predict around 5,000-7,000 unique visits on launch day).
The website is currently live; you can see it here: http://www.nightmapper.com/
My hosting is provided by bhost and I am on their silver VPS package:
1024MB Guaranteed Memory,
1536MB Burst Memory,
4 Virtual Cores,
40GB Disk Space,
750GB Data Transfer,
1 IPv4 Address
I manage the server myself, but I'm fairly new to it.
Anyway, the most computationally expensive page is the index/home page. On this
page I have 10 MySQL queries, which are (mostly) used to get this week's venue
listings. The listing results are each displayed with a thumbnail image.
The size of the home page for a first-time visit is 2.7 MB. I have done
everything I can think of to minimize this, including generating thumbnails to
reduce image size and utilizing browser caching.
I have tried a couple of methods for stress testing the site, including Load Impact (http://imgur.com/4UCGobf)
and ab testing in the terminal. I am worried by the results (mostly
by the figure of 5.26 requests per second, which appears to be quite low):
ab -n 100 -c 10 http://www.nightmapper.com/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking www.nightmapper.com (be patient).....done
Server Software: Apache/2.2.22
Server Hostname: www.nightmapper.com
Server Port: 80
Document Path: /
Document Length: 44808 bytes
Concurrency Level: 10
Time taken for tests: 19.012 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 4519300 bytes
HTML transferred: 4480800 bytes
Requests per second: 5.26 [#/sec] (mean)
Time per request: 1901.199 [ms] (mean)
Time per request: 190.120 [ms] (mean, across all concurrent requests)
Transfer rate: 232.14 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 26 38 17.9 32 107
Processing: 933 1828 510.2 1782 3495
Waiting: 22 116 303.4 28 1601
Total: 967 1867 518.8 1813 3591
Percentage of the requests served within a certain time (ms)
50% 1813
66% 1983
75% 2032
80% 2184
90% 2412
95% 3124
98% 3568
99% 3591
100% 3591 (longest request)
Using these results, how can I calculate the number of unique visitors per day and concurrent users I can handle, and which methods can I use to identify problems and improve on these results?
I should probably take this opportunity to ask for any good resources
where I can learn more about such optimization, load testing, and scalability.
This is a complex problem, as there are many factors involved. Here are some things I would investigate:
Your home page, as you state, is very large; that is going to be a problem. You could look at a caching service for the images, which could help a lot (something like Amazon CloudFront: https://aws.amazon.com/cloudfront/). This type of content-delivery service copies your images to "edge" locations and takes the burden of serving them off your Web server. It could make a very big difference. I would guess the images are the biggest portion of your content, so removing them from your Web server will make things much faster.
The next thing you mention is that you are performing 10 MySQL queries on home-page load; that is a lot of individual queries. If you can restructure your data model or queries to get it down to 1 or 2 queries, it will probably be much faster.
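As an illustration of folding several listing queries into one, here is a sketch in Python; the table/column names and connection details are hypothetical, and the same idea applies directly in PHP.

# Hypothetical example: one ranged query for the whole week's listings
# instead of one query per day or per category.
import mysql.connector   # assumes the MySQL Connector/Python package

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="nightmapper"
)
cur = conn.cursor(dictionary=True)
cur.execute(
    """
    SELECT v.name, l.event_date, l.title, l.thumbnail_url
    FROM listings l
    JOIN venues  v ON v.id = l.venue_id
    WHERE l.event_date BETWEEN CURDATE() AND CURDATE() + INTERVAL 6 DAY
    ORDER BY l.event_date, v.name
    """
)
week = {}
for row in cur.fetchall():
    week.setdefault(row["event_date"], []).append(row)   # group rows by day for rendering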
The other option you could try is some sort of paging scheme on the Web page: as the user scrolls down, you perform individual MySQL queries for each portion as it becomes visible.
It seems like you are running on a single server now; an easy improvement is to run at least 2 servers (one for your Web server, one for MySQL). MySQL consumes a lot of memory and CPU when it gets busy, so isolating it is recommended.
Scaling your application server is easy: you can use a load balancer and run many app server instances.
Scaling the database tier is more challenging. There are several ways to do that, including read-balancing (using MySQL replication to a read-only slave). Beyond simple read-balancing it gets into sharding, but I doubt you will need that, as it does not appear that you have a lot of database writes or a very big data set. If you do get into a situation with high write volumes and very large data (50 GB - 1 TB), then sharding is worth looking into.
Estimating the number of users you can handle should be simple. There is a book I wrote called Software Pipelines which talks about approaches for doing this (http://www.amazon.com/Software-Pipelines-SOA-Multi-Core-Processing/dp/0137137974). The basic idea is to identify how long each step in your processing takes and compute that against the peak traffic you expect. You have crude figures to do that now, even with your current implementation. For example, if you can serve 5 home-page loads per second and you expect 7,000 users/day, just calculate the peak traffic: on average, 7,000 users/day (with 1 home-page load each) is only about 5 page requests per minute. Therefore, even if your peak load is 10x that number, you should be able to handle it.
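The arithmetic behind that estimate, spelled out (using the answer's own assumptions of one home-page load per visitor and a 10x peak factor):

visitors_per_day = 7000
avg_req_per_min = visitors_per_day / (24 * 60)        # ~4.9 home-page requests/minute
peak_req_per_sec = avg_req_per_min * 10 / 60          # ~0.8 requests/second at a 10x peak

measured_capacity = 5.26                              # req/s from the ab run above
print(f"average: {avg_req_per_min:.1f} req/min")
print(f"peak:    {peak_req_per_sec:.2f} req/s vs. measured capacity of {measured_capacity} req/s")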
The key is to understand and profile your application to see where the time is being spent, then apply one or more of the approaches outlined above.
Good luck with your site!

httperf for benchmarking web servers

I am using httperf to benchmark web servers. My configuration: i5 processor and 4 GB RAM. How do I stress this configuration to get accurate results? I mean, I have to put 100% load on this server (a 12.04 LTS server).
You can use httperf like this:
$ httperf --server <server> --port <port> --wsesslog=200,0,urls.log --rate 10
Here urls.log contains the different URIs/paths to be requested. Check the documentation for details.
Now try changing the rate or session value and see how many RPS you can achieve and what the reply time is. In the meantime, monitor the CPU and memory utilization with mpstat or top to see if either is reaching 100%.
What's tricky about httperf is that it often saturates the client first, because of 1) the per-process open-files limit, 2) the TCP port number limit (excluding the reserved ports 0-1024, there are only 64512 ports available for TCP connections, which, with sockets lingering in TIME_WAIT for about a minute, means only around 1075 new connections per second can be sustained), and 3) the socket buffer size. You probably need to tune these limits to avoid saturating the client.
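A quick way to inspect the first two limits from the client side (a Linux-only sketch; the 60-second TIME_WAIT window is an assumption):

# Check client-side limits before blaming the server (Linux only).
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-files limit: soft={soft}, hard={hard}")

with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
    low, high = map(int, f.read().split())
ports = high - low + 1
print(f"ephemeral ports: {low}-{high} ({ports} available)")

# Assuming sockets linger ~60 s in TIME_WAIT, the sustainable connection rate is roughly:
print(f"~{ports // 60} new connections/s before exhausting ports")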
To saturate a server with 4 GB of memory, you would probably need multiple physical client machines. I tried 6 clients, each issuing 300 req/s against a 4 GB VM, and that saturated it.
However, there are still other factors affecting the result, e.g., the pages deployed on your Apache server and the workload access patterns. The general suggestions are:
1. Test the request workload that is closest to your target scenarios.
2. Add more physical clients and watch how the response rate, response time, and error count change, to make sure you are not saturating the clients.

Benefits of multiple memcached instances

Is there any difference between running four 0.5 GB memcached servers and one 2 GB instance?
Does running multiple instances offer any benefits?
If one instance fails, you still get the advantages of using the cache. This is especially true if you are using consistent hashing, which maps the same data to the same instance rather than spreading new reads/writes among the machines that are still up.
You may also elect to run servers on 32-bit operating systems, which cannot address more than around 3 GB of memory.
Check the FAQ: http://www.socialtext.net/memcached/ and http://www.danga.com/memcached/
High availability is nice, and memcached will automatically distribute your cache across the 4 servers. If one of those servers dies for some reason, you can handle that error by either just continuing as if the cache was blank, redirecting to a different server, or any sort of custom error handling you want. If your 1x 2gb server dies, then your options are pretty limited.
The important thing to remember is that you do not have 4 copies of your cache, it is 1 cache, split amongst the 4 servers.
The only downside is that it's easier to run out of memory in one of the 4 x 0.5 GB instances than in a single 2 GB one.
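To make the "one cache split amongst the servers" point concrete, here is a sketch using pymemcache's HashClient (my choice of client library; the question doesn't name one). Keys are hashed to one of the four instances, and a dead instance only costs you the keys that mapped to it.

# One logical cache spread over four 0.5 GB instances (illustrative addresses).
from pymemcache.client.hash import HashClient

servers = [("10.0.0.1", 11211), ("10.0.0.2", 11211),
           ("10.0.0.3", 11211), ("10.0.0.4", 11211)]

# ignore_exc=True treats a dead node as a cache miss instead of raising.
cache = HashClient(servers, ignore_exc=True)

cache.set("venue:42", b"cached page fragment", expire=300)
print(cache.get("venue:42"))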
I would also add that, in theory, several machines might gain you some performance: if you have a lot of frontends doing heavy reads, it is much better to split them across different machines, since the network capacity and processing power of a single machine can become your upper bound.
This advantage is highly dependent on memcached utilization, however (sometimes it can be much faster to fetch everything from one machine).