This is a bit of an open-ended question, but I'm looking for an open-ended answer. I'm looking for a resource that can help explain how to benchmark different systems, but more importantly how to analyze the data and make intelligent choices based on the results.
In my specific case, I have a 4 server setup that includes mongo that serves as the backend for an iOS game. All servers are running Ubuntu 11.10. I've read numerous articles that make suggestions like "if CPU utilization is high, make this change." As a newcomer to backend architecture, I have no concept of what "high CPU utilization" is.
I am using Mongo's monitoring service (MMS), and I am gathering some information about it, but I don't know how to make choices or identify bottlenecks. Other servers serve requests from the game client to mongo and back, but I'm not quite sure how I should be benchmarking or logging important information from them. I'm also using Amazon's EC2 to host all of my instances, which also provides some information.
So, some questions:
What statistics are important to log on a backend setup? (CPU, RAM, etc)
What is a good way to monitor those statistics?
How do I analyze the statistics? (RAM usage is high/read requests are low, etc)
What tips should I know before trying to create a stress-test or benchmarking script for my architecture?
Again, if there is a resource that answers many of these questions, I don't need an explanation here; I was just unable to find one on my own.
If more details regarding my setup are helpful, I can provide those as well.
Thanks!
I like to think of performance testing as a mini-project that is undertaken because there is a real-world need. Start with the problem to be solved: is the concern that users will have a poor gaming experience if the response time is too slow? Or is the concern that too much money will be spent on unnecessary server hardware?
In short, what is driving the need for the performance testing? This exercise is sometimes called "establishing the problem to be solved." It is about the goal to be achieved, because if there is no goal, why go through all the work of testing the performance? Establishing the problem to be solved will eventually drive what to measure and how to measure it.
After the problem is established, the next step is to write down what questions have to be answered to know when the goal is met. For example, if the goal is to ensure the response times are low enough to provide a good gaming experience, some questions that come to mind are:
What is the maximum response time before the gaming experience becomes unacceptably bad?
What is the maximum response time that is indistinguishable from zero? That is, if 200 ms response time feels the same to a user as a 1 ms response time, then the lower bound for response time is 200 ms.
What client hardware must be considered? For example, if the game only runs on iOS 5 devices, then testing an original iPhone is not necessary because the original iPhone cannot run iOS 5.
These are just a few questions I came up with as examples. A full, thoughtful list might look a lot different.
After writing down the questions, the next step is to decide what metrics will provide answers to the questions. You have probably come across a lot of metrics already: response time, transactions per second, RAM usage, CPU utilization, and so on.
After choosing some appropriate metrics, write some test scenarios. These are the plain English descriptions of the tests. For example, a test scenario might involve simulating a certain number of games simultaneously with specific devices or specific versions of iOS for a particular combination of game settings on a particular level of the game.
Once the scenarios are written, write the test scripts for whatever tool will simulate the server workloads. Then run the scripts to establish a baseline for the selected metrics.
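For illustration, here is a minimal baseline-measurement sketch in Python; the endpoint URL is a placeholder and the requests library is assumed, though a dedicated load tool (JMeter, ab, etc.) will take you much further:

```python
# A minimal sketch of a baseline load script. The URL is a placeholder,
# not a real endpoint; "requests" must be installed (pip install requests).
import time
import requests

URL = "http://example.com/api/turn"   # placeholder endpoint
SAMPLES = 100

timings = []
for _ in range(SAMPLES):
    start = time.time()
    requests.get(URL)                 # one simulated client request
    timings.append(time.time() - start)

timings.sort()
print("median: %.0f ms" % (timings[len(timings) // 2] * 1000))
print("95th:   %.0f ms" % (timings[int(len(timings) * 0.95)] * 1000))
```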
After a baseline is established, change parameters and chart the results. For example, if one of the selected metrics is CPU utilization versus the number of TCP packets entering the server per second, make a graph to find out how utilization changes as packets/second goes from 0 to 10,000.
In general, observe what happens to performance as the independent variables of the experiment are adjusted. Use this hard data to answer the questions created earlier in the process.
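As a sketch of that charting step (matplotlib assumed; the data points are placeholders standing in for your own measurements):

```python
# A sketch of plotting a measured metric against the independent variable.
# The numbers below are placeholders, not real measurements.
import matplotlib.pyplot as plt

packets_per_sec = [0, 2000, 4000, 6000, 8000, 10000]  # independent variable
cpu_utilization = [2, 18, 35, 55, 78, 96]             # metric measured per run

plt.plot(packets_per_sec, cpu_utilization, marker="o")
plt.xlabel("TCP packets entering the server per second")
plt.ylabel("CPU utilization (%)")
plt.title("CPU utilization vs. packet rate")
plt.savefig("cpu_vs_packets.png")
```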
I did a Google search on "software performance testing methodology" and found a couple of good links:
Check out this white paper Performance Testing Methodology by Johann du Plessis
Have a look at the Methodology section of this Wikipedia article.
So I've been spending some time developing an iPhone app - it's a simple little game and is similar to "Words With Friends" in that it:
1) is turn based
2) contacts a web service API to store the "game data" (turns, user info, etc).
In my case, I'm using .NET MVC and a SQL Server backend to develop the API. We're not talking an immense amount of data here - small images will be transferred back and forth and stored in the database though. A typical request would see a few records added or changed in the database.
I mostly don't have much concept of when things would start to get overloaded - my concern, of course, is that this thing takes off (obviously wishful thinking) and then my server gets so overwhelmed that it dies. That being said, I don't want to spend time and money on Windows Azure or something when my hosting needs may be totally trivial.
So, my somewhat general question is this - does anyone have any firsthand knowledge of when things start to get overloaded? Like...just a general estimate of number of requests or something for a time period, assuming each request hits the .NET app which then hits the database a reasonable number of times.
Even some anecdotal "My similar API gets hit 10,000 times a minute and is hosted on crappy shared hosting" would be awesome just so I get some concept.
Thanks in advance!
It is very hard to give a good answer to your question, as it greatly depends on what precisely the backend does for each request. Even "trivial" services such as the one you describe can easily differ greatly in performance depending on the actual implementation.
As a rough guideline based on our projects, if your API is a single HTTP request (no HTTPS), hitting a bare-bones controller, being translated into a single, simple SQL statement ("SELECT * FROM foo WHERE bar") returning less than 100 bytes of data, you can serve about 750 requests per minute on a 32-bit, 1 GHz box with 512 MB RAM.
But this number will be reduced to 75 or less if any of those factors go up.
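To put those numbers in perspective: 750 requests per minute is only 12.5 requests per second, i.e. a budget of roughly 80 ms per request if they are handled serially, which is why HTTPS handshakes, heavier queries, or larger payloads eat into the figure so quickly.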
That said:
This is the poster-child case for cloud computing.
If Azure is too much hassle / cost for you (which is not an uncommon complaint from independent developers) you have three main alternatives:
1) Ditch .NET in favor of Python and host within Google App Engine
Python is quick to learn and GAE scales beautifully without you ever needing to care. Best of all, there is a huge free tier, so unless your app really takes off, you won't pay a cent. As you are developing for iOS, I assume you aren't hell-bent on .NET to begin with. (A minimal GAE handler sketch follows this list.)
2) If you need .NET, go with AWS
They also have a rather large free-tier. Either throw everything on top of a Mono stack (completely free for the 1st year) or shell out the money for a Windows EC2 instance. This takes more planning than GAE but with a little work you can make it scale to wherever your app goes.
If cost is a concern, use the same AWS cluster to host several of your Apps' APIs.
3) Go with OpenFeint's Multiplayer API
OpenFeint supports basic multiplayer games. If you can implement the needed functionality using it, then this might be the best solution. If not, look into (1) and (2).
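For a taste of option (1), here is a minimal sketch of a turn-storing handler on GAE's Python runtime (webapp framework); the GameTurn model and its field names are made up for illustration, not part of any real API:

```python
# A minimal sketch of storing a game turn on Google App Engine.
# GameTurn and its fields are hypothetical names.
from google.appengine.ext import db, webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class GameTurn(db.Model):
    game_id = db.StringProperty(required=True)
    player = db.StringProperty()
    payload = db.BlobProperty()        # e.g. a small image or move data

class SubmitTurn(webapp.RequestHandler):
    def post(self):
        turn = GameTurn(game_id=self.request.get('game_id'),
                        player=self.request.get('player'),
                        payload=db.Blob(self.request.body))
        turn.put()                     # datastore write; GAE scales this for you
        self.response.out.write('ok')

application = webapp.WSGIApplication([('/turn', SubmitTurn)])

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()
```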
How long is a piece of string? It all depends on the hosting and connection speeds. .NET is more than capable of handling LARGE amounts of requests. The simplest solution is to monitor the server (or, if you cannot, monitor your web service's performance) and get better hosting if your app starts to suffer.
We’re creating a massively-multiplayer social game. We expect up to 1 million concurrent users. The game is not real-time, instead it’s turn-based. We need reliable messaging between our clients and the server, preferably over HTTP protocol.
Besides the multiplayer functionality, we’ll also need a content delivery service.
Could you please recommend a server-side technology for us, so we’ll start searching for the right people to hire?
Is it correct assumption that no single server will hold that amount of load so it must scale horizontally?
Will Windows Azure do the job?
Thanks in advance.
Hmmm... gaming, concurrency, server?
G-WAN (200 KB, full-ANSI C scripts included).
This is the best candidate, by far. And it lets you grow horizontally with load balancing as load increases (you will not have 1 million users the day you ship the game).
I know they are working on applets (client-side) so you might benefit from asking them the question.
"a million concurrent users IS NOT a real number by any means"
There are games that have this concurrency, and more. Most of the popular Facebook games do, while they have their 15 days in the sun. That being said, having to solve that problem is a nice problem to have :-)
It's probably possible to write such a system on Azure, but you'd probably be piloting in uncharted waters, and you'd also have to pay Microsoft for the hosting. Compare to Amazon EC2 for pricing, for example; perhaps another approach would be better.
Other technologies to consider, depending on what it is you're really trying to do:
- J2EE
- Erlang/OTP
- Python/Twisted
Also, the networking and multiplayer game FAQ on gamedev.net: http://www.gamedev.net/community/forums/showfaq.asp?forum_id=15
Is it [a] correct assumption that no single server will hold that amount of load so it must scale horizontally?
Yes. It depends on how much work the server has to do per person, but I'd say 1 million concurrent users would require more than one server.
Will Windows Azure do the job?
Windows Azure will provide the computers and the storage for a fee. You have to provide the software and make sure the software can scale horizontally.
Is it correct assumption that no single server will hold that amount of load so it must scale horizontally?
No, that is not a valid assumption. There are servers that are HUGE: 1000+ processors (not a cluster). Also, a million concurrent users IS NOT a real number by any means; that would be way too large a slice of the concurrent Facebook users. And it totally depends on what you do in your game. Turn-based could be chess, and I would not have a problem hosting 1,000,000 concurrent chess boards on a high-end server with, let's say, 256 GB of memory.
Realistically, though, you possibly will scale horizontally. First, it makes no sense to have a million people in one game/world (even EVE Online scales horizontally by solar system); second, it is likely cheaper than buying a super big computer.
Will Windows Azure do the job?
Hahaha. Seriously. Scaling horizontally, yes.
Look at the price, calculate out an instance for a month, compare to a dedicated server, and laugh on the way to the shop. Nice for very varying load, bad for base load.
Compare a mid-range server (8-12 cores, 64 GB RAM) to an Azure instance and it is clear ONE Azure instance is not going to compare.
I am wondering if there are clusters available to rent.
Scenario:
We have a program that we estimate will take a week to run (after optimization) on a given file. Quite possibly longer. Unfortunately, we also need to process approximately 300+ different files, resulting in approximately 300 weeks of compute time (roughly 6 wall-clock years of continuously running jobs). For a research job that should be done, at the latest, by December, that's simply unacceptable. While we are exploring other options, I am investigating the option of simply renting a Beowulf cluster. The job is academic and will lead towards the completion of a PhD.
What would be ideal would be a company to which we send the source and the job files, and from which we receive the result files a week or two later. Voila!
Quick googling doesn't turn up anything terribly promising.
Suggested Solutions?
Cloud computing sounds like what you need. Amazon, Microsoft, and Google rent computing resources on a pay-for-what-you-use basis.
Amazon's service is the most mature, and there are several existing questions about Amazon's service, e.g. here and here.
Amazon EC2 (Elastic Compute Cloud) sounds like exactly what you're looking for. You can sign up for one or more virtual machines (up to 20 automatically, more if you request permission), starting at $0.10 an hour per VM, plus bandwidth costs (free between EC2 machines and Amazon's other web services). You can choose between several operating systems (various Linux distributions, OpenSolaris, Windows if you pay extra), and you can use pre-existing machine images or create your own. If you're using all open-source software and don't have much bandwidth cost, it sounds like it would cost you around $5000 to run your job (assuming that your 6 years of compute time was for something comparable to their small instances, with a single virtual CPU).
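As a sanity check on that figure: 6 years × 8,760 hours/year ≈ 52,560 instance-hours, and at $0.10/hour that comes to about $5,256, whether you run one VM for six years or many VMs in parallel for a week each (bandwidth and storage are billed separately).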
Once you sign up for the service and get their tools set up, it's pretty easy to get new virtual machines launched. I've even spent the $0.10 to launch a machine for a few minutes just to verify an answer I was giving someone here on StackOverflow; I wanted to check something on Solaris, so I just booted up an instance and had a Solaris VM at my disposal within 5 minutes.
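As a hedged sketch of how little launching a batch of instances involves, using the boto library (pip install boto); the AMI ID and key pair name are placeholders, and credentials are read from the environment:

```python
# A sketch of launching a batch of EC2 worker instances with boto.
# The AMI ID and key pair name below are placeholders, not real values.
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')
reservation = conn.run_instances(
    'ami-12345678',               # placeholder: your worker machine image
    min_count=20, max_count=20,   # the 20-instance default limit noted above
    instance_type='m1.small',
    key_name='my-keypair')        # placeholder key pair for SSH access
for instance in reservation.instances:
    print("%s %s" % (instance.id, instance.state))
```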
I don't know where you are doing your PhD... Most Asian, European, and North American universities have some clusters. You can
talk directly to the people at the lab in charge of the cluster.
ask your PhD director to arrange it. Maybe he/she has some contacts who can handle that.
Also, the classical trick is to use the unused time of the computers in your lab/university... Basically, each computer runs a client application that crunches numbers when the computer is not used. See http://boinc.berkeley.edu/
This lead may prove helpful:
http://lcic.org/vendors.html
And this is a fantastic resource site on the matter:
http://www.hpcwire.com
The thread has been replete with pointers to Amazon's EC2, and correctly so. They are the most mature in this area. Recently, they've released their Elastic MapReduce platform, which sounds similar (although not identical) to what you are trying to do. Google is not an option for you, as their compute model doesn't support the generic compute model you need.
For academic/scientific use, there are several public centers offering HPC capability. In Europe, there is DEISA (http://www.deisa.eu/) and its member centers. There must be similar possibilities in the US, probably through the NSF.
For commercial use, check IBM Deep Computing On Demand offerings.
http://www-03.ibm.com/systems/deepcomputing/cod/
There are several ways to get time on clusters.
Purchase time on Amazon's Elastic Compute Cloud. Depending on how familiar you are with their service, it may take time to get it configured the way you want it.
Approach a university and see if they have a commercial program to rent out time to companies. I know several do. One that I know of specifically is the private-sector program at NCSA at UIUC. Depending on the institution, they may also offer porting and optimization services for your code.
Or you could rent CPU time from a private provider.
I'm from Slovenia and, for example, here we have a great private provider called Arctur. The guys were helpful and responsive when I contacted them.
You can find them here: hpc.arctur.net
One option is to rent the virtual resources equivalent of whatever number of PCs you need, and set them up as a cluster, using the Amazon Elastic Compute Cloud.
Setting up a Beowulf cluster of those is entirely possible.
Check out this link which provides resources and software to do exactly that.
Go to: http://www.extremefactory.com/index.php
True HPC cluster, up to 200 TFlops.
The reason I ask is that Stack Overflow has been Slashdotted, and Redditted.
First, what kinds of effect does this have on the servers that power a website? Second, what can be done by system administrators to ensure that their sites remain up and running as best as possible?
Unfortunately, if you haven't planned for this before it happens, it's probably too late and your users will have a poor experience.
Scalability is your first immediate concern. You may start getting more hits per second than you were getting per month. Your first line of defense is good programming and design. Make sure you're not doing anything stupid like reloading data from a database multiple times per request instead of caching it. Before the spike happens, you need to do some fairly realistic load tests to see where the bottlenecks are.
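As a tiny sketch of that per-request caching idea in Python (the names are illustrative, not from any particular framework):

```python
# A minimal sketch of per-request caching; get_user_from_db stands in
# for a real database query and is a made-up name.
def get_user_from_db(user_id):
    # imagine a real SELECT here
    return {"id": user_id, "name": "player%d" % user_id}

class RequestContext(object):
    """Caches lookups for the lifetime of a single request."""
    def __init__(self):
        self._cache = {}

    def get_user(self, user_id):
        if user_id not in self._cache:            # first call: one real query
            self._cache[user_id] = get_user_from_db(user_id)
        return self._cache[user_id]               # repeat calls: no query

ctx = RequestContext()
a = ctx.get_user(42)    # hits the database once
b = ctx.get_user(42)    # served from the cache
assert a is b
```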
For absurdly high traffic, consider the ability to switch some dynamic pages over to static pages.
Having a server architecture that can scale also helps. Shared hosts generally don't scale. A single dedicated machine generally doesn't scale. Using something like Amazon's EC2 to host can help, especially if you plan for a cluster of servers from the beginning (even if your cluster is a single computer).
Your next major concern is security. You're suddenly a much bigger target for the bad guys. Make sure you have a good security plan in place. This is something you should always have, but it becomes more important with high usage.
Firstly, ask whether you really want to spend weeks and thousands of dollars planning for something that might not even happen and, if it does happen, lasts about 5 hours.
Easiest solution is to have a good way to switch to a page simply allowing a signup. People will sign up and you can email them when the storm has passed.
More elaborate solutions rely on being able to scale quickly. That's firstly a software issue (can you connect to a db on another server, can you do load balancing). Secondly, your hosting solution needs to support fast expansion. Amazon EC2 comes to mind, or maybe slicehost. With both services you can easily start new instances ("Let's move the database to a different server") and expand your instances ("Let's upgrade the db server to 4GB RAM").
If you keep all data in the db (including sessions), you can easily have multiple front-end servers. For the database I'd usually try a single server with the highest resources available, but only because I haven't worked with DB replication and it used to be quite hard to do, at least with MySQL. Things might have improved.
The app designer needs to think about scaling up (larger machines with more cores and higher performance) and/or scaling out (distributing workload across multiple systems). The IT guy needs to work out how to best support that. The network is what you look at first, because obviously everything rides on top of it. Starting at the border, that usually means network load balancers and redundant routers being served by multiple providers. You can also look at geographic caching services and apps such as cachefly.
You want to reduce your bottlenecks as much as possible. You also want to design the environment such that it can be scaled out as needed without much work. Do the design work up front and it'll mean less headaches when you do get dugg.
Some ideas (of what I used in the past and current projects):
For boosting performance (if needed) you can put a reverse-proxying, caching Squid in front of your server. Of course that only works if you don't have session keys and if the pages are somewhat static (meaning they change only once an hour or so) and not personalised.
With Squid you can boost a bloated and slow CMS like TYPO3, thus getting the performance of static websites with the comfort of a CMS.
You can outsource large files to external services like Amazon S3, saving your server's bandwidth.
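A hedged sketch of that offloading step, using the boto library; the bucket name and file path are placeholders:

```python
# A sketch of pushing a large static asset to S3 with boto so your own
# server no longer serves it. Bucket and file names are placeholders.
from boto.s3.connection import S3Connection
from boto.s3.key import Key

conn = S3Connection()                          # credentials from environment
bucket = conn.get_bucket('my-game-assets')     # placeholder bucket name
key = Key(bucket, 'videos/trailer.mp4')
key.set_contents_from_filename('trailer.mp4')  # upload once...
key.make_public()                              # ...then let S3 serve it
```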
And if you are able to spend some (three figures per month) bucks, you can also use a Content Delivery Network. With that in place you automatically get scaling, high availability, and low latencies for your users. Of course, your pages must be cacheable, so session keys and personalised pages are a no-no. If designed carefully and with CDNs in mind, you can at least cache SOME content, like pics and videos and static stuff.
The load goes up, as other answers have mentioned.
You'll also get an influx of new users/blog comments/votes from bored folks who are only really interested in vandalism. This is mostly a problem for blogs which allow completely anonymous commenting, where some dreadful stuff will be entered. The blog platform might have spam filters sufficient to block it, but manual intervention is frequently required to clean up remaining drivel.
Even a little barrier to entry, like requiring a user name or email address even if no verification is done, will dramatically reduce the volume of the vandalism.