System aggregation using virtualization - operating-system

Instead of virtualizing a single machine so that it appears to be several machines, it is theoretically possible to make several machines appear as one. ScaleMP appears to use this technique.
Where can I find research or literature about developing such a system?
Are there any open source platforms for this?

I don't know of a system that does this transparently, i.e. one that pretends that multiple computers are one.
If programs need to run on multiple computers, they are usually developed specifically for that purpose, often using a technique such as MPI.
Also, systems running a lot of processes may take a running process and migrate it to another computer to distribute the load more evenly.
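Here is a toy sketch of one load-balancing step of that kind. It moves a single process from the busiest node to the idlest one, but only when the move narrows the gap between them. The node names, per-process load numbers and the `rebalance_once` helper are hypothetical. A real migration system would also weigh the cost of copying the process's memory across the network.

```python
from typing import Dict, Optional, Tuple

# Hypothetical cluster state: node name -> {pid: CPU load of that process}
Cluster = Dict[str, Dict[int, float]]


def rebalance_once(cluster: Cluster) -> Optional[Tuple[int, str, str]]:
    """Move one process from the busiest node to the idlest node if that
    narrows the load gap between them. Returns (pid, src, dst) or None."""
    load = {node: sum(procs.values()) for node, procs in cluster.items()}
    src = max(load, key=load.get)
    dst = min(load, key=load.get)
    gap = load[src] - load[dst]

    best: Optional[int] = None
    for pid, proc_load in cluster[src].items():
        # After moving a process with load L the new gap is |gap - 2L|,
        # which is smaller than the old gap only when 0 < L < gap.
        if 0 < proc_load < gap:
            if best is None or abs(gap - 2 * proc_load) < abs(gap - 2 * cluster[src][best]):
                best = pid
    if best is None:
        return None  # no single migration improves the balance
    cluster[dst][best] = cluster[src].pop(best)
    return best, src, dst


cluster = {"node-a": {101: 6.0, 102: 3.0}, "node-b": {201: 1.0}}
print(rebalance_once(cluster))
print(cluster)
```

In this example, process 102 (load 3) moves from node-a to node-b. The loads go from 9 and 1 to 6 and 4.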

Related

What can a computer do without an operating system?

An operating system is a program that controls all the other programs so that the user can manipulate his work.
What would happen if the OS ceased to exist?
Would the user still be able to perform some tasks or not?
The OS is not just needed to control other programs, but also to do all the heavy lifting involved with dealing with the hardware, so the actual programs don't have to do it or even know about it.
The user can do whatever his programs allow him to do, while the programs need the operating system to function properly (or at all). If there were no OS, all the programs would have to do everything the OS was doing for them before (which is A LOT).
Of the many things the OS does, the most important for the user is that it allows several programs to run simultaneously without interfering with each other.
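Here is a rough illustration of "running several programs simultaneously": a toy cooperative round-robin scheduler in Python, in which generators stand in for programs. This is a deliberate simplification. A real OS preempts programs using timer interrupts and isolates their memory from each other. Here, each task must voluntarily `yield` to give the others a turn.

```python
from collections import deque
from typing import Deque, Generator, List

Task = Generator[None, None, None]


def program(name: str, steps: int, log: List[str]) -> Task:
    """A stand-in 'program' that does one unit of work per turn."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand the CPU back to the scheduler


def run(tasks: List[Task]) -> None:
    """Round-robin: give each ready task one turn, then requeue it."""
    ready: Deque[Task] = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # program finished; drop it
        ready.append(task)


log: List[str] = []
run([program("A", 2, log), program("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```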
Computers do not need operating systems. If the computer does not have an operating system, the application needs to perform the operating system's functions itself. Applications that act as their own operating system used to be more common than they are today. They are most common in real-time systems, where the computer performs only one general function.

How to imitate servers (without loss of computing power)?

I have a production environment running on one server, but I need to run two instances of the same software, each on a "different" server.
Is it possible to imitate several servers on one real server for free, without losing computing power or network throughput in and out of the real server?
EDIT:
In other words: I want to run two instances of the same software on one machine.
Then I need to use a function that transfers a sub-instance from instance1 to instance2. However, this function can only be used when instance1 is on a different server from instance2. So I need to make the two instances, both running locally, appear to be on different servers.
I'm assuming you are using Windows, in which case you could use a hypervisor like Hyper-V. However, if you have only purchased one Windows license, you may be fairly limited in what you can run in a production capacity.
If you mean that the software you need to run has only one license, you are typically not allowed to virtualize it either. So legally, it seems you won't be able to do much with just one license. However, my assumptions may be all wrong; your question wasn't clear enough.

Why can't we build virtualization that runs on several machines? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
You're probably familiar with virtualization, which takes a single host and is able to "emulate" many instances by sharing its resources among them all. You have probably heard of Xen.
Is it completely insane to imagine the "opposite" of Xen: a layer that would abstract several hosts into a single running instance? I believe this would allow building apps that wouldn't really need to care much about a "clustering" layer themselves.
I wonder what the technical limits to this are, because I'm pretty sure some people are already working on it somewhere :)
The goal is NOT to achieve any kind of failure recovery. I believe this can (and should?) be handled at a higher level. For example, if someone is able to run a MySQL server on a gigantic instance (made of, say, 50 hosts), then one can easily use MySQL's replication features to replicate the database over a similar virtual instance.
Good question. Microsoft Azure is attempting to address this by allowing you to put applications "in the cloud" and not have to be as concerned with scalability up/down, redundancy, data storage, etc. But this is not accomplished at the hypervisor level.
http://www.microsoft.com/windowsazure/
Hardware-wise, there are some downsides to having everything be one big VM rather than many smaller ones. For one thing, software doesn't always understand how to handle all the resources. For example, some applications still can't handle multiple processor cores. I've seen informal benchmarks showing that IIS performs better when the same resources are spread over multiple instances rather than one giant instance.
From a management perspective, it is probably better to have multiple VMs in certain cases. Imagine that a bad deployment corrupts a node. If that were your one and only (albeit giant) node, now your whole application is down.
You're probably talking about the concept of a Single System Image (SSI).
There used to be a Linux implementation, openMosix, which has since shut down. I don't know of any replacements. openMosix made it very easy to create and use SSI on a standard Linux kernel; too bad it got overtaken by events.
I do not know enough about Xen to know if it is possible, but with VMware you can create pools of resources that come from many physical hosts. You can then assign the resources to your VMs, whether that is many VMs or just one VM.
Aggregation: Transform Isolated Resources into Shared Pools
Simulating a single core over multiple physical cores is very inefficient. You can do it, but it'll be slower than a cluster. Two physical cores in the same machine can talk to each other in near real time. If they're on separate machines, communicating even over a fibre-optic network, it's like clocking down your motherboard by a factor of 10 or more.
The cores of a dual-core CPU can communicate faster than two distinct CPUs on the same motherboard. If they are on separate machines, that's slower again, and with many machines, slower still.
Basically, you can do it, but the performance you lose outweighs the performance gain you were hoping to achieve.
A real-life example: I had a bunch of VMs on a dual quad-core server (~2.5 GHz/core) performing way, way below what they should have been. On closer inspection, it turned out that the hypervisor was emulating a single 3.5-4 GHz core whenever the load on an individual VM exceeded 2.5 GHz. After limiting each VM to 2.5 GHz, performance went back to what was expected.
I agree with saidimu: you are talking about the Single System Image concept. In addition to the openMosix project, there have been several commercial implementations of the same idea (one contemporary example is ScaleMP). It's not a new idea.
I just wanted to elaborate on some of the technical points of SSI.
Basically, the reason it's not done is that the performance is generally either completely unpredictable or terrible. There is a concept in computer systems known as NUMA (non-uniform memory access), which basically means that the cost of accessing different pieces of memory is not uniform. This can apply to huge systems, where some of a CPU's memory accesses are routed to other chips, or to cases where memory is accessed remotely over a network (as in SSI). Typically, the operating system tries to compensate for this by laying out programs and data in memory so that a program can run as quickly as possible. That is, the code and data are all placed in the same NUMA "region", and the program is scheduled on the closest possible CPU.
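Here is a toy version of the placement decision: run the program on the node that minimizes total access cost, given where its pages live. The distance values follow the ACPI SLIT convention, in which 10 means local and larger numbers mean farther away. The page counts and helper names are hypothetical. Real kernels, such as Linux with automatic NUMA balancing, use far more elaborate heuristics.

```python
from typing import Sequence


def placement_cost(cpu_node: int, pages_per_node: Sequence[int],
                   distance: Sequence[Sequence[int]]) -> int:
    """Total relative access cost if the program runs on cpu_node."""
    return sum(count * distance[cpu_node][mem_node]
               for mem_node, count in enumerate(pages_per_node))


def best_node(pages_per_node: Sequence[int],
              distance: Sequence[Sequence[int]]) -> int:
    """Pick the node whose CPUs are 'closest' to the program's data."""
    return min(range(len(distance)),
               key=lambda n: placement_cost(n, pages_per_node, distance))


# Two nodes: 10 = local, 20 = remote (SLIT-style relative distances).
distance = [[10, 20],
            [20, 10]]
pages = [100, 900]  # most of the program's pages live on node 1
print(best_node(pages, distance))  # 1
```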
However, when you are running big applications (ones that attempt to use all the memory in your SSI), there is little the operating system can do to reduce the impact of remote memory fetches. MySQL is not aware that accessing page 0x1f3c will cost 8 nanoseconds, while accessing page 0x7f46 will stall it for hundreds of microseconds, possibly milliseconds, while the memory is fetched over the network. This means that non-NUMA-aware applications will run like crap (seriously, very badly) in this kind of environment. As far as I know, most contemporary SSI products rely on the fastest possible interconnects between machines (such as InfiniBand) to achieve even passable performance.
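A quick back-of-the-envelope calculation shows why even a small fraction of remote accesses is so damaging. The average cost per access is (1 - p) x local + p x remote, where p is the fraction of accesses that go remote. The 8 ns local figure comes from the example above. The 200 µs remote figure is an assumption picked from "hundreds of microseconds".

```python
def effective_access_ns(remote_fraction: float, local_ns: float,
                        remote_ns: float) -> float:
    """Average cost per memory access when a fraction of accesses go remote."""
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns


LOCAL_NS = 8.0          # local access, from the example above
REMOTE_NS = 200_000.0   # assumed: 200 microseconds for a network fetch

for p in (0.0, 0.0001, 0.001, 0.01):
    avg = effective_access_ns(p, LOCAL_NS, REMOTE_NS)
    print(f"{p:>7.2%} remote -> {avg:10.2f} ns/access ({avg / LOCAL_NS:7.1f}x slower)")
```

With these assumed numbers, just 1% remote accesses push the average from 8 ns to about 2,008 ns, which is roughly 250 times slower.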
This is also why frameworks that expose the true cost of accessing data to the programmer (such as MPI: message passing interface) have achieved more traction than SSI or DSM (distributed shared memory) approaches. In fact, there is basically no way for a programmer to optimize an application to run in an SSI environment, which just sucks.
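The contrast can be sketched with Python's `queue.Queue`, using a thread to stand in for a remote node. This is an analogy only, not MPI; in MPI itself the corresponding calls would be `MPI_Send` and `MPI_Recv`. In the message-passing style, every remote read is a visible request/reply pair, so the programmer can count, batch or avoid them. Under SSI or DSM, the same read would look like an ordinary memory access. The `owner` and `fetch` helpers are hypothetical.

```python
import queue
import threading
from typing import Dict


def owner(inbox: queue.Queue, outbox: queue.Queue, data: Dict[str, int]) -> None:
    """The 'remote node': the only code that can touch `data` directly."""
    while True:
        key = inbox.get()
        if key is None:  # shutdown message
            return
        outbox.put(data[key])


def fetch(key: str, to_owner: queue.Queue, from_owner: queue.Queue,
          stats: Dict[str, int]) -> int:
    """Explicit remote read: one request message plus one reply message."""
    stats["messages"] += 2
    to_owner.put(key)
    return from_owner.get()


requests: queue.Queue = queue.Queue()
replies: queue.Queue = queue.Queue()
worker = threading.Thread(target=owner, args=(requests, replies, {"a": 1, "b": 2}))
worker.start()

stats = {"messages": 0}
total = fetch("a", requests, replies, stats) + fetch("b", requests, replies, stats)
requests.put(None)  # tell the owner to stop
worker.join()
print(total, stats)  # 3 {'messages': 4}
```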

Benefits of JVM atop an OS VM?

I see many deployments where IT groups run effectively nothing but a JVM application stack inside a VM (VMware, etc.) instance.
I guess I consider the JVM to be a formal VM, so what real benefit is there in running your Java application stack inside another VM?
Two JVM instances within the same (real or virtualized) machine wouldn't be completely isolated from each other: they couldn't both have sockets listening on the same well-known numbered port, they might interfere with each other if they both wrote in the same filesystem, and so on, and so forth.
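The port conflict is easy to reproduce. In this example, two sockets in one process stand in for two JVMs that share one OS network stack. Neither sets `SO_REUSEADDR` or `SO_REUSEPORT`, so the second one cannot bind the same address and port. Each OS-level VM has its own network stack and IP address, so the same program running in two VMs would not collide in this way.

```python
import socket


def second_bind_fails() -> bool:
    """Bind two sockets to the same port on one machine without
    SO_REUSEADDR/SO_REUSEPORT and report whether the second bind is refused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as first:
        first.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        first.listen()
        port = first.getsockname()[1]
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as second:
            try:
                second.bind(("127.0.0.1", port))
            except OSError:  # typically EADDRINUSE
                return True
            return False


print(second_bind_fails())
```

On a typical Linux host, this prints `True`.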
Using OS-level VMs (VMware or whatever) gives you as much isolation as you would have on physically separate systems, which is quite a different proposition.
It's an unfortunate terminology collision.
Those are really two different terms that unfortunately use the same English words but have only a rather abstract connection.
IBM used the term "virtual machine" first, so I guess we can't rename that one to "virtual server" or something.
Too bad "software framework" doesn't have VM in its initials. If you think of the JVM that way it will be obvious that you are really just running a framework in a VM, not a thing inside the same kind of thing...
So a real VM can casually give away superuser shell accounts, SSH access, software installation privileges, and so on.
what real benefit is it to run your Java application stack inside another VM?
By doing this, your JVM will run on virtualized hardware that you can modify and run alongside other virtual machines. This is a nice way to slice a big server into "shares" that you can allocate on demand.
(EDIT: I'm answering a comment from the OP directly in this answer)
I get what you're saying, but why would one not be able to do the very same thing as separate processes on the host OS?
I could mention that a guest can run another OS, but that is not the most important part. As pointed out in another answer, the biggest difference is that a virtual machine is isolated from other VMs: it's a truly dedicated environment. The port issue was a good example, but I prefer to illustrate it this way: another process won't eat "your" CPU cycles. This is a very important difference, especially for IT teams, which usually don't like to share resources. Instead, you can size a virtual machine exactly as needed, possibly dynamically, and bill IT teams for what they are really using. This is IMO what makes pooling (mutualisation) of resources actually possible, and thus cuts costs.
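The idea of sizing each VM and guaranteeing it a slice can be sketched as proportional-share allocation with demand caps, sometimes called water-filling. Each VM gets CPU in proportion to its shares but never more than it demands. Capacity left over by satisfied VMs is redistributed to the others. The VM names, share values, MHz units and the `allocate` helper are hypothetical. Real hypervisors layer reservations, limits and scheduler details on top of a scheme like this.

```python
from typing import Dict, Tuple

# Hypothetical VM table: name -> (shares, demand in MHz)
VMs = Dict[str, Tuple[float, float]]


def allocate(capacity: float, vms: VMs) -> Dict[str, float]:
    """Split `capacity` in proportion to shares, never giving a VM more than
    it demands; capacity left over by satisfied VMs goes to the others."""
    if any(shares <= 0 for shares, _ in vms.values()):
        raise ValueError("shares must be positive")
    alloc = {name: 0.0 for name in vms}
    active = {name for name, (_, demand) in vms.items() if demand > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_shares = sum(vms[n][0] for n in active)
        fair = {n: remaining * vms[n][0] / total_shares for n in active}
        satisfied = {n for n in active if vms[n][1] - alloc[n] <= fair[n]}
        if not satisfied:
            # Everyone wants more than their fair slice: hand it all out.
            for n in active:
                alloc[n] += fair[n]
            break
        for n in satisfied:
            need = vms[n][1] - alloc[n]
            alloc[n] += need
            remaining -= need
        active -= satisfied
    return alloc


print(allocate(10_000, {"web": (2, 9_000), "batch": (1, 9_000), "idle": (1, 500)}))
```

Here, the idle VM gets only the 500 MHz it asks for. The remaining 9,500 MHz is split 2:1 between web and batch, and a busy neighbour cannot take more than its share.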

Virtualization and why it is good for programmers

Why does it help to know about virtualization from a programmer's perspective? Apart from testing and developing on several different platforms without needing to switch between operating systems, is there a particular reason why virtualization is important for a programmer? Are there any details that must be kept in mind before developing on virtual instances?
I use it for testing our installer, because it is important to check whether the application will work on a clean installation of the operating system.
I used to do these tests by keeping a hard drive with a fresh operating system installation and making a copy of that disk for (almost) every new test run. This was very time consuming, and the virtual machine solution has saved me a lot of time. Note that this even allows you to do remote debugging as easily as when using two non-virtual machines.
Note: If you're interested, I'm using VirtualBox, which is a very good and free virtualization tool.
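With VirtualBox, the "fresh disk for every test run" routine can be scripted. This is a minimal sketch that assumes `VBoxManage` is on the PATH and that you have already taken a snapshot of the clean install. The VM name "installer-test" and snapshot name "clean-os" are placeholders. By default the script only prints the commands it would run.

```python
import subprocess
from typing import List

# Hypothetical names: a VM called "installer-test" with a snapshot "clean-os"
# taken right after a fresh OS install.
VM_NAME = "installer-test"
SNAPSHOT = "clean-os"


def reset_commands(vm: str, snapshot: str) -> List[List[str]]:
    """VBoxManage calls that power the VM off, roll it back and boot it."""
    return [
        ["VBoxManage", "controlvm", vm, "poweroff"],
        ["VBoxManage", "snapshot", vm, "restore", snapshot],
        ["VBoxManage", "startvm", vm, "--type", "headless"],
    ]


def reset_vm(vm: str, snapshot: str, dry_run: bool = True) -> None:
    for i, cmd in enumerate(reset_commands(vm, snapshot)):
        if dry_run:
            print(" ".join(cmd))
            continue
        # "poweroff" fails harmlessly if the VM is already off, so only
        # the restore and start steps are required to succeed.
        subprocess.run(cmd, check=(i > 0))


reset_vm(VM_NAME, SNAPSHOT)  # prints the commands; pass dry_run=False to run them
```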
If you develop a driver or something very close to the hardware with a high risk of crashing the machine, you will be glad to be working on a virtual machine.
Reverting to an old state is easier than repairing a damaged OS.
One of the main advantages is having your entire development environment as a single image file. I have a perfectly configured version of Windows Server, Visual Studio, ReSharper, etc. I can easily try a new version of something on a copy of this virtual machine without worrying about it causing problems.
I can also back up my entire dev environment to transfer it to another physical machine very easily. I've been through 3 machines in this office alone so that was a lifesaver in itself.
The only real trade-off I see is performance. You generally have to give the VM fewer CPU cores and less memory than the physical machine actually has. With a sufficiently powerful machine, though, this is not much of a problem.
Edit: As nader said, I/O is obviously important for most projects as well. Although developing on a virtual machine does mean a fairly large I/O penalty compared with a native OS install, in practice I rarely find it to be a problem. The superior random access capabilities of SSDs are helping to mitigate this drawback as well.
Being able to completely reset the state of the system is very useful for debugging applications that modify their environment. If the actions are repeated after a reset and are constrained to the sandbox environment of the VM, you are pretty much guaranteed to get the same result.
We have a large number of different versions and customer customisations of our software, and it's not possible for two installs of our software to coexist on the same machine. Virtualisation allows us to replace the 50-60 physical machines that we would need to maintain for testing and problem reproduction with 2-3 virtual servers. It takes around 10 minutes to copy a VHD template we have and create a new virtual machine, and as long as you allocate 1-2 GB of RAM, the performance is comparable to that of a (slow) physical machine.
Virtual machines are also great for build machines.
Personally, I do all of my development on my desktop machine for best performance and remote debug into VMs. I don't run virtual machines on my desktop, as it uses up too much RAM; we have dedicated virtual servers for that.
It's good for development, because the virtual machine can have the same server configuration as the production server.
https://stackoverflow.com/questions/905926/developer-software-setup
For a user-space application, there should be no difference between developing for a virtualised OS and developing for a normal OS. There may be some gotchas if your code makes explicit assumptions about the machine's memory size and number of processors and believes what the hypervisor tells it.
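One way to avoid the processor-count gotcha is to take the count the guest OS reports as a default only, and allow an explicit override. Inside a VM, the number reported is whatever the hypervisor exposes: virtual CPUs, which may be overcommitted on the host. This is a minimal sketch; the `worker_pool_size` helper and its `cap` parameter are hypothetical. `os.sched_getaffinity` is only available on some platforms, such as Linux.

```python
import os
from typing import Optional


def usable_cpus() -> int:
    """CPUs this process may run on, as reported by the (guest) OS."""
    if hasattr(os, "sched_getaffinity"):  # Linux: respects CPU affinity masks
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1  # cpu_count() may return None


def worker_pool_size(per_cpu: int = 1, cap: Optional[int] = None) -> int:
    """Size a pool from the reported CPU count, with an explicit override,
    because in a VM the count is vCPUs that may be shared with other guests."""
    size = usable_cpus() * per_cpu
    return min(size, cap) if cap is not None else size


print(usable_cpus(), worker_pool_size(cap=4))
```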
I'm surprised no one has mentioned the ease of deployment. All you need to do is get the build onto the virtual OS, and then you can copy the image to as many new servers (running some kind of virtualization solution, like VMware) as you want, easily scaling your application.
Record the state of a bug in a program and send it to the developer (along with the entire "machine").
Testing your code on various OSes, some of which you don't have.
Working in a more protected environment, making sure that the code doesn't harm your system. This is useful for understanding dangerous programs, like viruses, and developing defences against them; for writing potentially faulty hard-drive programs; and for anything that can have catastrophic effects on your system.
Easily writing your own OS without needing to write to 'real' boot sectors, a potentially harmful act (hope this is not new...).
Quickly using tools and programs not found on your own OS.
Demonstrating a program at various times by restoring a virtual machine, which is quicker and less prone to failure than trying to recreate the state in the minutes before the demo.
Less directly connected to programming, but surfing via a virtual machine (for example, to read documentation) has the added benefit that your own important system (and code) is less likely to be harmed by malicious programs.
In my experience, the answer in most cases is "no" once testing and targeting multiple platforms are set aside, though both of those are huge reasons to be familiar with "desktop" VM solutions. Others have done an excellent job of listing rare exceptions, like debugging kernel code.
There are some quirks one must be aware of when running on a virtual machine. This is hardly an exhaustive list:
Loss of precision, or even time reversal, in high-resolution timers due to emulation of hardware resources (depends somewhat on the VM platform and operating system)
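One defensive habit is to measure intervals with a monotonic clock rather than the wall clock. This is a minimal Python sketch. It guards against negative intervals caused by wall-clock adjustments, but it does not fix the underlying loss of timer precision described above, which depends on the VM platform and OS. The `timed` helper is hypothetical.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run fn(*args) and return (result, elapsed seconds).

    time.monotonic() cannot go backwards, unlike time.time(), which follows
    wall-clock adjustments (e.g. NTP or host time sync after a VM pause).
    The max(0.0, ...) clamp is extra defence; it does not make a coarse or
    jittery virtualized timer any more precise."""
    start = time.monotonic()
    result = fn(*args)
    elapsed = time.monotonic() - start
    return result, max(0.0, elapsed)


result, seconds = timed(sum, range(1_000_000))
print(result, f"{seconds:.6f}s")
```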
Virtual network interfaces are usually bridged. In one of the leading VM solutions, we've seen some extremely odd behavior in the host system with an application that sets up its own bridge between virtual interfaces. Logically, that behavior should not affect the host.
Usage models: if your product has Orwellian licensing codes or records state-dependent behavior when interacting with remote systems, you should account for what would happen if a system were "paused" and "restarted", or restarted from an earlier "state". Normally, a robust implementation would take this kind of thing into account anyway.
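A remote service can use a per-client sequence number to notice that a client was restarted from an earlier state. The number must strictly increase, so a VM rolled back to an older snapshot will resend numbers the server has already seen. The `SequenceGuard` class and client IDs are hypothetical. Whether rejecting such requests is the right response, rather than resynchronising, depends on your protocol.

```python
from typing import Dict


class SequenceGuard:
    """Remote-side check: each client must send strictly increasing
    sequence numbers. A VM restored from an older snapshot will resend
    numbers the server has already seen, and those requests are rejected."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, int] = {}

    def accept(self, client_id: str, seq: int) -> bool:
        if seq <= self.last_seen.get(client_id, -1):
            return False  # stale state: the client's counter went backwards
        self.last_seen[client_id] = seq
        return True


guard = SequenceGuard()
print(guard.accept("vm-1", 1), guard.accept("vm-1", 2))  # True True
# ...the VM is rolled back to a snapshot taken when its counter was at 1...
print(guard.accept("vm-1", 2))  # False
```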
If you are developing in a virtual environment, you will want to make sure you know what specifications were used to create it. If you have, say, a 4 GB machine and create a virtual environment with 1 GB, you will want to make sure that what you're developing does not grow to the point where it overruns that memory, which will cause performance problems. I personally ran into this, and it was a pretty tricky thing to track down. I was fixing a bug and testing the fix in a virtual environment (which I did not set up, by the way...). The application took a performance hit because of all the memory swapping that was taking place.
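A cheap guard is to compare your expected working set with the RAM the guest actually has before a test run. This minimal sketch uses POSIX `sysconf`, so it returns `None` where that is unavailable, for example on Windows. The 75% headroom and the working-set estimate are assumptions you would tune.

```python
import os
from typing import Optional


def physical_memory_bytes() -> Optional[int]:
    """Total RAM as seen by this OS (inside a VM: the guest's allocation).
    Uses POSIX sysconf; returns None where that is unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def fits_in_ram(expected_bytes: int, total: Optional[int] = None,
                headroom: float = 0.75) -> Optional[bool]:
    """True if the expected working set stays under `headroom` of total RAM,
    leaving room for the OS so the guest doesn't start swapping."""
    if total is None:
        total = physical_memory_bytes()
    if total is None:
        return None  # unknown on this platform
    return expected_bytes <= total * headroom


GiB = 1024 ** 3
print(fits_in_ram(900 * 1024 ** 2, total=1 * GiB))  # False: 900 MiB > 768 MiB
```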
A very good use for a virtual environment is developing applications that mess with the Windows GINA. It's much easier to reinstall a virtual environment than an entire PC... (been there, done that too).
I do all of my development on a virtual XP instance under VMWare Fusion so that I can use a Mac for everything and still write .NET code ;-)
Sometimes they are necessary, because the platform you are programming for doesn't support the standard developer environment. One such example is SharePoint. As of SharePoint 2007, you still need a server OS to install SharePoint 2007, WSS, and the Visual Studio SharePoint Extensions (VseWSS).
Thus, for SharePoint I have to use a Windows Server VM to do my development work. SharePoint 2010 supports installation on Vista and 7 x64, but I will still use a VM, because I don't want SharePoint on my main machine slowing everything down. Rather, I want it in a VM where the services are on when needed and off when not, without having to manually turn each service on and off. This is in addition to the many great answers posted above.