What is the best way to convince management that virtualization isn't always appropriate in production? - virtualization

I work for a small company with a .NET product that was acquired by a medium sized company with "big iron" products. Recently, the medium-sized part of the company acquired another small company with a similar .NET product and management went to have a look at their technology. They make heavy use of virtualization in their production environment and it's been decided that we will too.
Our product was not designed to be run in a virtual environment, but some accommodations can be made. For instance; there are times when we're resource bound due to customer initiated processes. This initiation is "bursty" by nature, but the processing can be made asynchronous and throttled. This is something that would need to be done for scalability anyway.
But there is other processing that we do that isn't so easily modified because we're resource bound for extended periods of time.
How do I convince management that heavy use of virtualization is probably not appropriate for us?

If I were your manager, and heard your argument (above), I'd assume that you're just resistant to change. I'd challenge you to show me the data. You haven't really made a case against virtualization. You say that your product "was not designed to be run in a virtual environment". You're in good company, very few apps ARE designed that way. It usually "just works". And if it's too slow, they just throw more resources at it. If they need to move it, make it fault tolerant, expand or contract, it's all transparent. Poorly-behaved apps can be firewalled from other environments, without having to have dedicated hardware. etc., etc.,. What's not to like about that?
You should prepare a better argument, backed up with data from testing. Or you should prepare to be steamrolled by an organization with a lot of time, $$$, and momentum invested in (insert favorite technology here).

It sounds like you're confused about how virtualization works.
You still need to provide enough resources for your virtual machines, the real benefit of virtualization is consolidating 5 machines that only run at 10-15% CPU onto a single machine that will run at 50-75% CPU and which still leaves you 25-50% overhead for those "bursty" times.
If your "bursty" application is slowing down other VM's, then you need to put resource limits in place (e.g. VM#1 can't use more than 3Ghz CPU) and ensure that there are enough resources.
I've seen this in a production environment, where 20 machines were virtualized but each was using as much CPU as it could. This caused problems as a machine was trying to use more Ghz than a single core could provide but the VM would only show a single core. Once we throttled the CPU usage of each VM to the maximum available from any single core, performance skyrocketed. I've seen the same with overallocation of RAM and where the hypervisor keeps paging to disk and killing performance.
Virtualization works, given sufficient resources.

Don't fight the methods, specify requirements.
Do some benchmarks on different sized platforms and establish a rough requirement guideline. If possible, don't say 'this is the minimum needed'; it's better to say "with X resources, we do Y work units per hour, with X', we do Y'. A host that costs Z$ can hold W virtual machines of X' resources", then the bean counters will have beans to count. If after all they decide that virtualization is cost-effective, they might be right.

Related

Is Virtual Memory in some way related to Virtualization Technology?

I think it is a bit vague question. But I was trying to get a clear understanding on how a hypervisor interacts with operating systems under the hood, and what makes them two so different. Let me just drive you through my thought process.
Why do we need a virtualization manager a.k.a. a hypervisor, if we already have an operating system to manage resources which are shared?
One answer that I got was: suppose the system crashes, and if we have no virtualization manager, then it's a full loss. So, virtualization keeps another system unaffected, by providing isolation.
Okay, then why do we need an operating system? Well, both operating systems and hypervisors have different task to handle: hypervisor handles how to allocate the resources (compute, networking etc.), while OS handles process management, file system, memory (hmm.. We also have a virtual memory. Right?)
I think I haven't asked the question in a trivial manner? But I am confused, so may be I could get a little help to clear my insight.
"Virtual" roughly means "something that is not what it seems". It is a common task in computing to substitute one thing with another.
A "virtual resource" is a common approach for that. It also means that there is an entity in a system that transparently substitutes one portion of resource with another. Memory is one of the most important resources in computing systems, therefore "Virtual Memory" is one of the first terms that historically was introduced.
However, there are other resources that are worth virtualizing. One can virtualize registers, or, more specifically, their values. Input/output devices, time, number of processors, network connections — all these resources can be and are virtualized these days (see: Intel VT-d, Virtual Time papers, Multicore simulators, Virtual switches and network adapters as respective examples). A combination of such things is what roughly constitutes a "Virtualization Technology". It is not a well-defined term, unless you talk about Intel® Virtualization Technology, which is one-vendor trade name.
In this sense, a hypervisor is such an entity that substitutes/manages chosen resources transparently to other controlled entities, which are then said to reside inside "containers", "jails", "virtual machines" — different names exist.
Both operating system and hypervisors have different task to handle
In fact, they don't.
An operating system is just a hypervisor for regular user applications, as it manages resources behind their back and transparently for them. The resources are: virtual memory, because an OS makes it seem that every application has a huge flat memory space for its own needs; virtual time, because each application does not manage context switching points; virtual I/O, because each application uses system calls to access devices instead of directly writing into their registers.
A hypervisor is a fancy way to say a "second level operating system", as it virtualizes resources visible to operating systems. The resources are essentially the same: memory, time, I/O; a new thing are system registers.
It can go on and on, i.e. you can have hypervisors of higher levels that virtualize certain resources for entities of lower level. For Intel systems, it roughly corresponds to the stack SMM -> VMM -> OS -> user application, where SMM (system management mode) is the outermost hypervisor and user application is the inner entity (that actually does useful job of running a web browser and web server you use right now).
Why do we need a virtualization manager aka hypervisor, if we already have an operating system to manage how the resources are shared?
We don't need it if chosen computer architecture supports more than one level of indirection for resource management (e.g. nested virtualization). Thus, it depends on chosen architecture. On certain IBM systems (System/360, years 1960-1970), hypervisors were invented and used much earlier than operating systems had been introduced in a modern sense. More common IBM Personal Computer architecture based on Intel x86 CPUs (around 1975) had deficiencies that did not allow to achieve required level of isolation between multiple OSes without introducing a second layer of abstraction (hypervisors) into the architecture (which happened around 2005).

The benefits of deploying multiple instances for serving/data/cache

although I've much experience writing code. I don't really have much experience deploying things. I am writing a project that uses mongodb for persistence, redis for meta-caching, and play for serving pages. I am deciding whether to buy a dedicated server vs buying multiple small/medium instance from amazon/linode (one for each, mongo, redis, play). I have thought of the trade-offs as below, I wonder if anyone can add to the list or provide further insights. I am leaning toward (b) buying two sets of instances from linode and amazon, so if one of them have an outage it will fail over to the other provider. Also if anyone has any tips for deploying scala/maven cluster or tools to do so, much appreciated.
A. put everything in one instance
Pros:
faster speed between database and page servlet (same host).
cheaper.
less end points to secure.
Cons:
harder to manage. (in my opinion)
harder to upgrade a single module. if there are installation issues, it might bring down the whole system.
B. put each module (mongo,redis,play) in different instances
Pros:
sharding is easier.
easier to create cluster for a single purpose. (i.e. cluster of redis)
easier to allocate resources between module.
less likely everything will fail at once.
Cons:
bandwidth between modules -> $
secure each connection and end point.
I can only comment about the technical aspects (not cost, serviceability, etc ...)
It is not mentioned whether the dedicated instance is a physical box, or simply a large VM. If the application generates a lot of roundtrips to MongoDB or Redis, then the difference will be quite significant.
With a VM, the cost of I/Os, OS scheduling and system calls is higher. These elements tend to represent an important part in the performance cost of efficient remote data stores like MongoDB or Redis, and the virtualization toll is higher for them.
From a system point of view, I would not put MongoDB and Redis/Play on the same box if the MongoDB database is expected to be larger than the available memory. MongoDB maps data files in memory, and relies on the OS to perform memory swapping. It is designed for this. The other processes are not. Swapping induced by MongoDB will have catastrophic consequences on Redis and Play response time if they are all on the same box. So I would at least separate MongoDB from Redis/Play.
If you plan to use Redis for caching, it makes sense to keep it on the same box than the Play server. Redis will use memory, but low CPU. Play will use CPU, but not much memory. So it seems a good fit. Also, I'm not sure it is possible from Play, but if you use a unix domain socket to connect to Redis instead of the TCP loopback, you can achieve about 50% more throughput for free.

Programming considerations for virtualized applications

There are lots of questions on SO asking about the pros and cons of virtualization for both development and testing.
My question is subtly different - in a world in which virtualization is commonplace, what are the things a programmer should consider when it comes to writing software that may be deployed into a virtualized environment? Some of my initial thoughts are:
Detecting if another instance of your application is running
Communicating with hardware (physical/virtual)
Resource throttling (app written for multi-core CPU running on single-CPU VM)
Anything else?
You have most of the basics covered with the three broad points. Watch out for:
Hardware communication related issues. Disk access speeds are vastly different (and may have unusually high extremes - imagine a VM that is shut down for 3 days in the middle of a disk write....). Network access may interrupt with unusual responses
Fancy pointer arithmetic. Try to avoid it
Heavy reliance on unusually uncommon low level/assembly instructions
Reliance on machine clocks. Remember that any calls you're making to the clock, and time intervals, may regularly return unusual values when running on a VM
Single CPU apps may find themselves running on multiple CPU machines, that do funky things like Work Stealing
Corner cases and unusual failure modes are much more common. You might not have to worry as much that the network card will disappear in the middle of your communication on a real machine, as you would on a virtual one
Manual management of resources (memory, disk, etc...). The more automated the work, the better the virtual environment is likely to be at handling it. For example, you might be better off using a memory-managed type of language/environment, instead of writing an application in C.
In my experience there are really only a couple of things you have to care about:
Your application should not fail because of CPU time shortage (i.e. using timeouts too tight)
Don't use low-priority always-running processes to perform tasks on the background
The clock may run unevenly
Don't truss what the OS says about system load
Almost any other issue should not be handled by the application but by the virtualizer, the host OS or your preferred sys-admin :-)

Why can't we build virtualization that runs on several machines? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
You're probably familiar with virtualization which takes a single host and is able to "emulate" many instances by sharing the resources among them all. You probably heard about XEN.
Is it completely insane to imagine the "opposite" of XEN : a layer that would abstract several hosts in a single running instance? I believe this would allow building apps which wouldn't need to really care much about a "clustering" layer themselves.
I wonder what are the technical limits to this, because I'm pretty sure some people are already working on it somewhere :)
The goal is NOT to achieve any kind of failure recovery. I believe this can (and should?) be handled at a higher level. For example, if someone is able to run a MySQL server on a gigantic instance (made of say 50 hosts), then one can easily use MySQL's replication features to replicate the database over a similar virtual instance.
Good question. Microsoft Azure is attempting to address this by allowing you to put applications "in the cloud" and not have to be as concerned with scalability up/down, redundancy, data storage, etc. But this is not accomplished at the hypervisor level.
http://www.microsoft.com/windowsazure/
Hardware-wise, there are some downsides to having everything be one big VM rather than many smaller ones. For one thing, software doesn't always understand how to handle all the resources. For example, some applications still can't handle multiple processor cores. I've seen informal benchmarks showing that IIS performs better spreading the same resources over multiple instances rather than one giant instance.
From a management perspective, it is probably better to have multiple VMs in certain cases. Imagine that a bad deployment corrupts a node. If that were your one and only (albeit giant) node, now your whole application is down.
You're probably talking about the concept Single System Image.
There used to be a Linux implementation, openMosix that since closed down. I don't know of any replacements. openMosix made it very easy to create and use SSI on a standard Linux kernel; too bad it got overtaken by events.
I do not know enough about Xen to know if it is possible but with VMware you can create pools of resources which come from many physical hosts. Then you can assign the resources to your VMs. That could be many VMs or just one VM.
Aggregation: Transform Isolated Resources into Shared Pools
Simulating a single core over multiple physical cores is very inefficient. You can do it, but it'll be slower than a cluster. Two physical cores can talk to each other in near-real-time, if they're on separate machines then you're doing something like say clocking down your motherboard speed by factors of 10 or more if these two physical cores (and RAM) are communicating even over a fibre optic network.
Dual cores can communicate faster than two distinct CPUs on the same motherboard, if they are on separate machines, thats slower again, if there are multiple machines, slower even again.
Basically you can, but there is net performance loss compared to the net performance gain you would be hoping to achieve.
Real life example, I had a bunch of VMs on a dual quad core server (~2.5Ghz/core) performing way, way below what they should have been. On closer inspection, it turned out that the hypervisor was emulating a single 3.5-4Ghz core when the load on an individual VM was more than 2.5Ghz -- after limiting each VM to 2.5Ghz performance went back to what was expected.
I agree with saidimu, you are talking about the Single System Image concept. In addition to the OpenMosix project, there have been several commercial implementations of the same idea (one contemporary example is ScaleMP). It's not a new idea.
I just wanted to elaborate on some of the technical points of SSI.
Basically, the reason it's not done is because the performance is generally absolutely unpredictable or terrible. There is a concept in computer systems known as [NUMA][3], which basically means that the cost of accessing different pieces of memory is not uniform. This can apply to huge systems where CPUs may have some memory accesses routed around to different chips, or in cases where memory is accessed remotely over a network (such as in SSI). Typically, the operating system will attempt to compensate for this by laying out programs and data in memory in such a way that a program can run as quickly as possible. I.e., the code and data will all be placed in the same NUMA "region", and be scheduled on the closest possible CPU.
However, in cases where you are running big applications (attempting to use all the memory in your SSI), there is little the operating system can do to reduce the impact of remote memory fetches. MySQL is not aware that accessing page 0x1f3c will cost 8 nanoseconds, while accessing page 0x7f46 will stall it for hundreds of microseconds, possibly milliseconds while the memory is fetched over the network. This means that non-NUMA aware applications will run like crap (seriously, very bad) in this kind of environment. As far as I know, most comtemporary SSI products rely on the fastest possible interconnects (such as Infiniband) between machines to achieve even a passable performance.
This is also why frameworks that expose the true cost of accessing data to the programmer (such as MPI: message passing interface) have achieved more traction than SSI or DSM (distributed shared memory) approaches. In fact, there is basically no way for a programmer to optimize an application to run in an SSI environment, which just sucks.

Wastage of resources in Virtualization

I am not sure if this is the write place to ask the question. However i hope it is.
When looking for a VPS earlier today, I was trying to understand how each container would work in the background. Keeping in mind the fact that the operating system uses most of the power and power on a system, wouldn't having multiple operating systems in the same machine mean more wastage of resources.
For instance if i was running centOS on a dedicated box and it was running lets say 20 background OS level processes. Then i go and install a virtualization platform and install 5 more centOS virtual machines in the same system which are exactly the same as the host operating system. Doesn't this mean duplication of those 20 processes 6 times? So internally the context switching is happening between 120 processes instead of 20?
Firstly your question seems to touch on two topics, full virtualization and paravirtualization. Most VPS are providing a paravirtualized environment which (to very broadly generalize) only virtualizes parts of the OS, it appears as a fully virtualized system to the user but in terms of processes, I/O, it can be very different depending on the OS and the way this has been implemented.
When dealing with full guest virtualization, the main reason and benefit of Virtualization is reclaiming underutilized resources. Making use of that idle capacity.
For example, 5 machines running at average resource utilization of 15% could be virtualized on a single server and use an average of 75% resources, still leaving 25% overhead to handle peak capacity.
If your processes can co-exist on the same system, all depend on the same libraries, configuration settings, etc. can be brought up/down and restarted without affecting each other - then you may "waste" resources virtualizing them.
However if you need to reboot/restart Server A without affecting Server B and they both have pretty low usage, or the two applications depend on different kernel versions for example - then that's a good candidate for virtualization.
When you move up to enterprise level virtualization and start thinking about computing costs in cents-per-hour and dollars-per-gigabyte then this "overhead" is nothing compared to the savings and other benefits. You don't have disks half empty, CPUs idling, wasted resources, competition for who gets to configure what. Virtual hosts can move between hosts depending on load, fault tolerance, high-availability, automated provisioning.