My company is hosting a few separate but related, moderately trafficked web sites. Accordingly, a production database server, staging database server, production web server, staging web server, etc. are needed. My question is: should we invest in physically separate servers for each of our needs, or should we put that money together and invest in a much higher-end server and virtualize all of the aforementioned servers? Which route would you guys decide on, and why?
That depends on a lot of things; here are the main considerations.
If you have a lot of servers with low to moderate usage, virtualization should generally save you money on hardware, power, and floorspace. There is a tipping point, however, based on the overhead of the VM layer itself. Honestly, you will have to experiment to find the right cost/performance balance on this. I am sure the VM vendors would be happy to help you with the math.
The downside is that virtualization creates a single point of failure. If that box fails, you have downtime for all of your servers. Having them separate makes it far less likely for everything to take a dive at once.
You certainly want physical separation between the development and the production servers. You shouldn't ever have to worry that something you do in dev could bring down the machine on which production is hosted. And, there are some problems in development that really require either a hard reset of the physical machine or a ludicrous work-around to avoid a hard reset.
As for production web server and production database, you're not really introducing any new points of failure by virtualizing them on the same machine, particularly if you can colocate a static version of the site on another server. For any modern web site of even moderate complexity, database failure is web site failure anyway.
From my experience, for low or moderate usage a VM is the way to go - if you get just one very powerful server instead of several moderately powerful servers, you save money, power, and space, and you make the application faster at the same time because it's running on faster hardware.
A VM also has some other nice advantages: if the server hardware fails, you can load the same VM on different physical hardware and continue running like nothing happened (you do have full backups, don't you?), and you can take a copy of the actual production server and run it in isolation on a development machine to debug those annoying bugs that only appear in production.
But I would put the development (and maybe testing) servers on a different physical machine than production; you need to be sure that no matter what stupid mistake you make in development, it can't take down the production server.
I have a social networking site which is almost ready. On the site, people will upload images, put information about themselves on their profile, and post messages (which can include images). I am wondering exactly how to proceed (hosting, servers, etc.); I am a relative beginner at all this, so I am not sure exactly what route to take. I am thinking of maybe hosting from home initially on my personal computer, and expanding by acquiring servers to stack (which I am honestly not exactly sure how to do) if we grow. Since the site is aimed at a small proportion of the population, I am not expecting huge growth in traffic, but I want to be prepared for spikes, albeit small ones. I was wondering if it is possible to just host it off my computer and store the database (MySQL) on a removable disk along with the images. I was also thinking about cloud hosting, which seems to be the most common option, but I was wondering if that really is the best thing to do, given this is a social networking site. I know this question is very vague and broad, but since I am a beginner I really have no clue how to proceed. What is the best thing to do? Thank you so much!
Hosting from a personal computer is a bad idea for a few reasons: your internet bandwidth limits the speed of the website, and you need to maintain 24/7 internet connectivity, power, and all the other resources yourself.
I suggest you start with AWS. A free account comes with a basic-level machine free for 12 months; more details here (https://aws.amazon.com/activate/).
Deploy a machine in EC2.
Install the web server and MySQL onto the machine.
Host your files on this machine.
Point your domain at this machine's public IP via your domain service provider (where you bought your domain, e.g. GoDaddy).
Deploying a machine and configuring the server takes a while, but it's worth it, and the best part is that it's FREE for 12 months, so you need not worry about pricing, connectivity, or bandwidth.
Also, when traffic grows, you can upgrade your server with a few clicks and no config changes.
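If it helps, here is a minimal sketch of the "deploy a machine in EC2" step using the boto3 Python SDK; the region, AMI ID, and key pair name below are placeholders, and it assumes your AWS credentials are already configured locally:

    import boto3

    # Placeholders: pick a free-tier-eligible AMI for your region and an
    # existing key pair; credentials come from your local AWS config.
    ec2 = boto3.resource("ec2", region_name="us-east-1")
    instances = ec2.create_instances(
        ImageId="ami-xxxxxxxx",      # placeholder AMI ID
        InstanceType="t2.micro",     # free-tier-eligible size
        KeyName="my-key-pair",       # placeholder key pair for SSH access
        MinCount=1,
        MaxCount=1,
    )
    instance = instances[0]
    instance.wait_until_running()
    instance.reload()                # refresh attributes such as the public IP
    print("Instance running:", instance.id, instance.public_ip_address)

You would then SSH in, install the web server and MySQL, and point your domain's DNS at the printed public IP.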
I'm in the early stages of planning out a virtualised environment for our production system (Moodle). The layers are relatively simple:
Web - Apache 2.2
Database - MySQL 5
PHP - 5.2
My question is this: what is the generally accepted approach for distributing the above layers amongst physical servers? In this case, we are planning to have 2 physical servers. Should I aim to keep my web server cluster on a single physical server and the database cluster on another? Or replicate the full stack on both servers, in case one fails? Any insights into this would be a great help to me.
thanks,
Cathal.
We use separate (virtual) servers, but do maintain separate stacks on each simply because the overheads are small and it allows for flexibility if we want to scale up/down. This is not for fallback however, because if one server is so broken that it's not web accessible, you probably won't be able to get data off it and onto the second server in order for it to be a useful replacement. Use proper backups for fallback and practice restoring from them regularly.
Moodle generally blocks on the PHP side rather than the DB side, and we see roughly 3.5:1 PHP:MySQL CPU loads when they are on separate machines. With that in mind, you need to consider what the maximum capacity of one server is: you will get the best performance if you have no network overhead between the machines at all, so bigger is better. If you can't do it with one, then making two VMs, a larger one for PHP and a smaller one for MySQL, is the best option, but do benchmark the differences under load for your particular setup (use Apache JMeter for this).
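For a quick sanity check before setting up a full JMeter test plan, a throwaway concurrent probe can give you rough latency numbers; this is only a sketch in Python using the requests library, and the URL and counts are placeholders:

    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    URL = "http://staging.example.org/index.php"   # placeholder test URL
    CONCURRENCY = 20                               # simulated simultaneous users
    REQUESTS = 200                                 # total requests to send

    def fetch(_):
        start = time.monotonic()
        resp = requests.get(URL, timeout=30)
        return time.monotonic() - start, resp.status_code

    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        results = list(pool.map(fetch, range(REQUESTS)))

    latencies = sorted(t for t, _ in results)
    errors = sum(1 for _, status in results if status != 200)
    print(f"median {latencies[len(latencies) // 2]:.3f}s, "
          f"p95 {latencies[int(len(latencies) * 0.95)]:.3f}s, errors {errors}")

JMeter is still the better tool for sustained, realistic load; this just tells you quickly whether a given VM split is in the right ballpark.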
Our largest installs involve 70,000 users or so and we have two 4-CPU/8GB VMs, one for PHP and one for MySQL (although the DB one rarely goes above 30% CPU). This allows for about 400 concurrent connections via Apache. However, we are using a large farm of VMs and can scale up and down between 2 and 16 CPUs easily, so you may wish to consider one monster machine if you want flexibility.
For more information on Moodle performance, look here, particularly under 'scalability'.
Although I've much experience writing code, I don't really have much experience deploying things. I am writing a project that uses MongoDB for persistence, Redis for meta-caching, and Play for serving pages. I am deciding whether to buy a dedicated server or multiple small/medium instances from Amazon/Linode (one each for Mongo, Redis, and Play). I have thought through the trade-offs below; I wonder if anyone can add to the list or provide further insights. I am leaning toward (B), buying two sets of instances from Linode and Amazon, so if one provider has an outage it will fail over to the other. Also, if anyone has any tips for deploying a Scala/Maven cluster, or tools to do so, much appreciated.
A. Put everything in one instance
Pros:
faster speed between database and page servlet (same host).
cheaper.
fewer endpoints to secure.
Cons:
harder to manage (in my opinion).
harder to upgrade a single module; if there are installation issues, they might bring down the whole system.
B. Put each module (Mongo, Redis, Play) in a different instance
Pros:
sharding is easier.
easier to create cluster for a single purpose. (i.e. cluster of redis)
easier to allocate resources between modules.
less likely everything will fail at once.
Cons:
bandwidth between modules -> $
need to secure each connection and endpoint.
I can only comment on the technical aspects (not cost, serviceability, etc.).
It is not mentioned whether the dedicated instance is a physical box, or simply a large VM. If the application generates a lot of roundtrips to MongoDB or Redis, then the difference will be quite significant.
With a VM, the cost of I/Os, OS scheduling and system calls is higher. These elements tend to represent an important part in the performance cost of efficient remote data stores like MongoDB or Redis, and the virtualization toll is higher for them.
From a system point of view, I would not put MongoDB and Redis/Play on the same box if the MongoDB database is expected to be larger than the available memory. MongoDB maps data files in memory, and relies on the OS to perform memory swapping. It is designed for this. The other processes are not. Swapping induced by MongoDB will have catastrophic consequences on Redis and Play response time if they are all on the same box. So I would at least separate MongoDB from Redis/Play.
If you plan to use Redis for caching, it makes sense to keep it on the same box as the Play server. Redis will use memory but little CPU; Play will use CPU but not much memory. So it seems a good fit. Also, I'm not sure whether it is possible from Play, but if you use a unix domain socket to connect to Redis instead of the TCP loopback, you can get about 50% more throughput for free.
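As an illustration of the unix-socket point (from Python rather than Play, since I can't speak for the Play client), the redis-py library can connect either way; the socket path is just an example and has to match the "unixsocket" setting in redis.conf:

    import redis

    # TCP loopback connection (the usual default)
    r_tcp = redis.Redis(host="127.0.0.1", port=6379)

    # Unix domain socket connection - skips the TCP stack on the same box;
    # the path must match the "unixsocket" directive in redis.conf.
    r_unix = redis.Redis(unix_socket_path="/var/run/redis/redis.sock")

    # Typical cache usage: store a rendered fragment with a 60-second TTL.
    r_unix.set("cache:home", "<html>...</html>", ex=60)
    print(r_unix.get("cache:home"))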
I work for a small company with a .NET product that was acquired by a medium sized company with "big iron" products. Recently, the medium-sized part of the company acquired another small company with a similar .NET product and management went to have a look at their technology. They make heavy use of virtualization in their production environment and it's been decided that we will too.
Our product was not designed to be run in a virtual environment, but some accommodations can be made. For instance, there are times when we're resource-bound due to customer-initiated processes. This initiation is "bursty" by nature, but the processing can be made asynchronous and throttled. This is something that would need to be done for scalability anyway.
But there is other processing that we do that isn't so easily modified because we're resource bound for extended periods of time.
How do I convince management that heavy use of virtualization is probably not appropriate for us?
If I were your manager and heard your argument (above), I'd assume that you're just resistant to change. I'd challenge you to show me the data. You haven't really made a case against virtualization. You say that your product "was not designed to be run in a virtual environment". You're in good company; very few apps ARE designed that way. It usually just works. And if it's too slow, they just throw more resources at it. If they need to move it, make it fault tolerant, or expand or contract it, it's all transparent. Poorly behaved apps can be firewalled from other environments without needing dedicated hardware. And so on. What's not to like about that?
You should prepare a better argument, backed up with data from testing. Or you should prepare to be steamrolled by an organization with a lot of time, $$$, and momentum invested in (insert favorite technology here).
It sounds like you're confused about how virtualization works.
You still need to provide enough resources for your virtual machines. The real benefit of virtualization is consolidating five machines that only run at 10-15% CPU onto a single machine that will run at 50-75% CPU, which still leaves you 25-50% headroom for those "bursty" times.
If your "bursty" application is slowing down other VM's, then you need to put resource limits in place (e.g. VM#1 can't use more than 3Ghz CPU) and ensure that there are enough resources.
I've seen this in a production environment where 20 machines were virtualized but each was using as much CPU as it could. This caused problems, as a machine would try to use more GHz than a single core could provide while the VM would only show a single core. Once we throttled the CPU usage of each VM to the maximum available from any single core, performance skyrocketed. I've seen the same with overallocation of RAM, where the hypervisor keeps paging to disk and killing performance.
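As a hedged sketch of that kind of throttling - assuming a KVM host managed through libvirt, since the hypervisor in question isn't stated (vSphere and Hyper-V expose equivalent limits in their own tooling) - the CFS quota/period pair caps a guest at one core's worth of CPU; the domain name is a placeholder:

    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("web-vm-01")   # placeholder VM name

    # With the CFS scheduler, vcpu_quota / vcpu_period is the fraction of one
    # physical core the guest may consume; equal values cap it at one core.
    dom.setSchedulerParameters({"vcpu_period": 100000, "vcpu_quota": 100000})
    conn.close()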
Virtualization works, given sufficient resources.
Don't fight the methods; specify requirements.
Do some benchmarks on different-sized platforms and establish a rough requirement guideline. If possible, don't say "this is the minimum needed"; it's better to say "with X resources we do Y work units per hour; with X', we do Y'. A host that costs $Z can hold W virtual machines with X' resources." Then the bean counters will have beans to count. If after all that they decide virtualization is cost-effective, they might be right.
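To make the arithmetic concrete, here is a toy calculation in Python with made-up numbers (every figure below is a hypothetical placeholder, not a measurement):

    # Hypothetical benchmark results: vCPUs per VM -> work units per hour.
    work_per_hour = {4: 12000, 2: 5500}

    host_cost = 8000        # $ for a host with 16 vCPUs (made-up figure)
    host_vcpus = 16
    vm_size = 2             # vCPUs per small VM

    vms_per_host = host_vcpus // vm_size
    capacity = vms_per_host * work_per_hour[vm_size]

    print(f"{vms_per_host} VMs of {vm_size} vCPUs give {capacity} work units/hour; "
          f"that is ${host_cost / capacity:.4f} of hardware per hourly work unit")

Numbers in that form let the people holding the budget compare the virtualized and dedicated options on their own terms.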
Why does it help to know about virtualization from a programmer's perspective? Apart from testing and developing on several different platforms without needing to switch between operating systems, is there a particular reason why virtualization is important for a programmer? Are there any details that must be kept in mind before developing on virtual instances?
I use it for testing our installer, because it is important to check whether the application will work on a clean installation of the operating system.
I used to do these tests by keeping a hard drive with a fresh operating system installation and making a copy of that disk for (almost) every new test run. This was very time consuming, and the virtual machine solution has saved me a lot of time. Note that this even allows you to do remote debugging as easily as when using two non-virtual machines.
Note: If you're interested, I'm using VirtualBox, which is a very good and free virtualization tool.
If you develop a driver or something very close to the hardware, with a high risk of crashing the machine, you will be glad to be working on a virtual machine.
Reverting to an old state is easier than to repair a damaged OS.
One of the main advantages is having your entire development environment as a single image file. I have a perfectly configured version of Windows Server, Visual Studio, ReSharper, etc. I can easily try a new version of something on a copy of this virtual machine without worrying about it causing problems.
I can also back up my entire dev environment to transfer it to another physical machine very easily. I've been through 3 machines in this office alone so that was a lifesaver in itself.
The only real trade-off I see is performance. You generally have to use fewer physical CPU cores than you actually have, and less memory. With a sufficiently powerful machine this is not much of a problem, though.
Edit: As nader said, I/O is obviously important for most projects as well. Although developing on a virtual machine does mean a fairly large I/O penalty compared with a native OS install, in practice I rarely find it to be a problem. The superior random access capabilities of SSDs are helping to mitigate this drawback as well.
Being able to completely reset the state of the system is very useful to debug applications which modify their environment - If the actions are repeated after a reset, and they're constrained to the sandbox environment of the VM, you are pretty much guaranteed to get the same result.
We have a large number of different versions / customer customisations of our software, and it's not possible for two installs of our software to coexist on the same machine. Virtualisation allows us to replace the 50-60 physical machines that we need to maintain for testing and problem reproduction with 2-3 virtual servers. It takes around 10 minutes to make a copy of a VHD template we have and create a new virtual machine, and as long as you allocate 1-2 GB of RAM, the performance is comparable to that of a (slow) physical machine.
Virtual machines are also great for build machines.
Personally I do all of my development on my desktop machine for best performance, and remote debug into VMs. I don't run virtual machines on my desktop as that uses up too much RAM; we have dedicated virtual servers for that.
Good for development, because the virtual machine can have the same server configuration as the production server.
https://stackoverflow.com/questions/905926/developer-software-setup
From a user-space application there should be no difference between developing for a virtualised OS and a normal OS. There may be some gotchas if your code makes explicit assumptions about the machine's memory size and number of processors, and believes what the hypervisor tells you.
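A small illustration of that kind of assumption: code that sizes itself from what the OS reports will see the virtual hardware, not the physical host (psutil is an extra dependency used here purely for illustration):

    import os

    import psutil  # third-party; used only to read total memory

    # Inside a guest, both values reflect what the hypervisor presents, so
    # thread pools or caches sized from them follow the VM's allocation,
    # not the physical box underneath.
    print("CPUs visible to this OS:", os.cpu_count())
    print("RAM visible to this OS: %.1f GiB" % (psutil.virtual_memory().total / 2**30))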
I'm surprised no one has mentioned the ease of deployment. All you need to do is get the build onto the virtual OS, and then you can copy the image to as many new servers (running some kind of virtualization solution, like VMWare) as you want, easily scaling your application.
Record the state of a bug in a program, and send it to the developer (along with the entire "machine").
Testing your code on various OSes, some of which you don't have.
Working in a more protected environment, making sure that the code doesn't harm your system. Useful for understanding dangerous programs such as viruses and developing security against them, for writing potentially destructive hard-drive programs, and for anything that can have catastrophic effects on your system.
Easily write your own OS without needing to write to 'real' boot sectors, a potentially harmful act (hope this is nothing new...).
Quickly use tools and programs not found on your own OS.
Demonstrate a program at various times by restoring a virtual machine; quicker and less prone to failure than trying to recreate the state in the minutes before the demo.
Less directly connected to programming, but surfing via a virtual machine (for example to see documentation) has the added value that your own important system (and code) is less likely to be harmed by malicious programs.
From my experience, in most cases the answer is typically "no" (once testing and targeting multiple platforms are set aside; both are huge reasons to be familiar with "desktop" VM solutions). Others have done an excellent job of listing rarer exceptions like debugging kernel code.
There are some quirks one must be aware of when running on a virtual machine. This is hardly an exhaustive list:
Loss of precision, or even time reversal, in high-resolution timers due to emulation of hardware resources (depends somewhat on the VM platform and operating system); see the sketch after this list.
Virtual network interfaces are usually bridged. We've seen some extremely odd behavior in the host system with an application that sets up its own bridge between virtual interfaces, behavior which logically should not affect the host, in one of the leading VM solutions.
Usage models - if your product has Orwellian licensing code, or records state-dependent behavior when interacting with remote systems, you should account for what would happen if a system were "paused" and "restarted", or restarted from an earlier "state". Normally this kind of thing would be taken into account anyway in a robust implementation.
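Regarding the timer point above, a crude way to check for backward steps is to sample the wall clock in a tight loop; this is only a sketch, and time.monotonic() is the safe choice for measuring intervals in any case:

    import time

    prev = time.time()
    for _ in range(1_000_000):
        now = time.time()
        if now < prev:
            # A negative step means the wall clock went backwards, which
            # some VM / clock-source combinations can produce.
            print(f"wall clock stepped back by {prev - now:.9f}s")
        prev = now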
If you are developing in a virtual environment, make sure you know what specifications were used to create it. If you have, say, a 4 GB machine and create a virtual environment with 1 GB, you will want to make sure nothing in your development grows to the point that it overruns that memory, or you will see performance problems. I personally ran into this and it was a pretty tricky thing to track down. The scenario was that I was fixing a bug and testing it in a virtual environment (which I did not set up, by the way), and the application took a performance hit because of all the memory swapping that was taking place.
A very good use for a virtual environment is when you are developing applications that mess with the Windows GINA. It's much easier to reinstall a virtual environment than an entire PC... (been there, done that too).
I do all of my development on a virtual XP instance under VMWare Fusion so that I can use a Mac for everything and still write .NET code ;-)
Sometimes they are necessary, because the platform you are programming for doesn't support the standard developer environment. One such example is Sharepoint. As of Sharepoint 2007, you still need a server OS to install Sharepoint 2007, WSS, and the Visual Studio Sharepoint Extensions (VseWSS).
Thus, for Sharepoint I have to use a Windows Server VM to do my development work. Sharepoint 2010 supports installation on Vista and Windows 7 x64, but I will still use a VM, because I don't want Sharepoint on my main machine slowing everything down. Rather, I want it in a VM where the services are on when needed and off when they aren't, without having to manually turn each service off and on. This is in addition to the many great answers posted above.