What application domains are CPU bound and will tend to benefit from multi-core technologies?

I hear a lot of people talking about the revolution that is coming in programming due to multi-core processors and parallelism, but I can't shake the feeling that for most of us, CPU cycles aren't the bottleneck. Pretty much all of my programs have been I/O bound in one way or another (database, filesystem, network, user interaction, etc.) for a very long time.
Now I can think of a few areas where CPU cycles are a limiting factor, like code breaking, graphics, sound, some forms of simulation (weather, physics, etc.), and some forms of mathematical research, but they all seem like fairly specialized application domains. My general impression is that most programs are still I/O bound and that for most of our industry CPUs have been plenty fast for quite a while now.
Am I off my rocker? What other application domains are CPU bound today? Do any of them include a large portion of the programming population? In essence, I'm wondering whether multi-core CPUs will impact very many of us, and if so, how?

Visual effects / rendering. (Entertainment industry.)
Artificial Intelligence. (Games and scientific research.)
Biomedical research.
Physical simulations. (Games and scientific research.)
Database applications including SaaS, most webpages, etc.
As the personal computer becomes more and more a browser-based thin client for web applications, this industry will expand, as will the need for more, and more parallel, processing power on the backend. I could also see gaming pushing parallel processing in personal computing.

One way to leverage multiple cores is through the use of remote desktop technologies.
It's much easier to deploy desktop applications to one big Citrix server than to dozens of user desktops.

What is the best way to start programming with Real Time Linux?

Although I have implemented many projects in C, I am completely new to operating systems. I tried real-time Linux on a Discovery board (STM32) and got the correct results for blinking an LED, but I didn't really understand the whole process, since I just followed the steps and could not find a full description of each step on the internet.
I want to implement scheduling on real-time Linux. What is the best way to start? Are there any sites, books, or tutorials available?
A complete RTLinux process description would be appreciated.
Thanks in advance.
The transition from "bare metal" to OS-based programming is something that I experienced in reverse. I started out a complete software guy, totally into the OS side of things, and over time I have moved to the opposite of that (even designing circuits in VHDL!). My advice would be to start simple. Linux is pretty complex, and everywhere you look there are many layers of things all working together to deliver the final product. If you are dead set on a real-time Linux extension, I'd suggest https://xenomai.org/ , which is a real-time extension for Linux.
However, to more specifically address your question about implementing scheduling in Linux: you can, but it will be a large amount of work and can be very complicated. The OS uses the Completely Fair Scheduler ( http://en.wikipedia.org/wiki/Completely_Fair_Scheduler ), and whenever you spin up a thread, it simply gets added to the list to run. This can differ slightly if you implement your code in kernel space as a driver, rely on hardware interrupts, etc., but in general, this is how Linux works. "Real time" generally means the ability to assign threads one of several different priorities and to rely on full thread preemption at any given time, concepts that aren't really a part of vanilla Linux. It has some notion of this, but with limitations that can cause problems when you are looking for real-time behavior from Linux.
What may be helpful to you is an RTOS. If you are looking for a full-on real-time operating system, check out FreeRTOS: http://www.freertos.org/ . It has a large community and supports a lot of different devices out of the box, with a large amount of example code. They even support your specific board with an example package, so you can give it a shot with nothing to lose: http://www.freertos.org/FreeRTOS-for-Cortex-M3-STM32-STM32F100-Discovery.html

It gives you access to many OS-ish constructs like network APIs, memory management, and threading, without the overhead and latency of a huge OS. With an RTOS, you create tasks and assign them priorities, so you become the scheduler and are no longer at the mercy of the OS. You run the OS; the OS doesn't run you (if that makes sense). Plus, the constructs offered within an RTOS will feel much like bare-metal code and thus will be much easier to follow, understand, and fully learn. It is a simpler world in which to learn the basic building blocks of a full-blown OS such as Linux or Windows.

If this option sounds good, I would suggest looking through the supported devices on the FreeRTOS website, picking one you would like to experiment with, and going for it. I would highly recommend this as a way to learn about scheduling and OS constructs in general, as it is as simple as you can get and open source.

Once you have the basics of an RTOS down, buying a book about Linux specifically wouldn't be a bad idea. Although there are many free resources on the web for learning about Linux, they are often contradictory and can be misleading. Pile Linux-specific knowledge on top of OS knowledge in general, and it can feel overwhelming. Starting simpler will help keep you from getting burnt out and minimize the time you spend feeling lost. Linux is definitely a learning process, but like any learning process: start simple, keep your ultimate goal in mind, make a plan, and take small, manageable steps along that plan until you look up and find yourself exactly where you want to be. Then go tackle the next mountain!
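To give a flavor of that "you become the scheduler" model, here is a minimal FreeRTOS sketch (toggle_led() is a placeholder for your board's GPIO code; the stack size and priority are illustrative):

    #include "FreeRTOS.h"
    #include "task.h"

    extern void toggle_led(void);    /* placeholder: your board's GPIO toggle */

    /* A task is just a function with an infinite loop; vTaskDelay blocks it
       and lets lower-priority tasks (or the idle task) run. */
    static void vBlinkTask(void *pvParameters)
    {
        (void)pvParameters;
        for (;;) {
            toggle_led();
            vTaskDelay(pdMS_TO_TICKS(500));   /* block for 500 ms */
        }
    }

    int main(void)
    {
        /* You pick the priority, so you decide who runs. */
        xTaskCreate(vBlinkTask,               /* entry point              */
                    "Blink",                  /* human-readable name      */
                    configMINIMAL_STACK_SIZE, /* stack depth in words     */
                    NULL,                     /* no parameter             */
                    tskIDLE_PRIORITY + 1,     /* just above the idle task */
                    NULL);                    /* handle not needed        */

        vTaskStartScheduler();                /* never returns on success */
        for (;;) {}
    }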
The real-time Linux landscape is quite confusing. 99.99% of the information out there is just plain obsolete.
First, there are lots of "microkernels" that run Linux as one task. (Such as the defunct RTLinux). The problem is that you must write your real-time task to a different API, and can't depend on anything in Linux, because Linux will be frozen in the background while your task runs. So unless your task is dead-simple ("stop the motors when I press this button"), this approach will cause more pain than gain.
Next, there is the realtime Linux patch set. This hasn't been doing so well, because of the next item:
Lastly, the current Linux kernel has gotten rid of the problems that caused people to need realtime in the past. You can even turn off Linux on one of your processors to have full control of the CPU. See also this paper.
To answer your question: I see two different paths you could take:
1) Start with a normal 3.x Linux kernel and explore the various APIs and realtime techniques (i.e., realtime priorities, memory pinning, etc.; see the sketch after this list). This can get you "close enough" for 99% of what people want "realtime" for. If it's good enough for high-frequency trading, it's probably good enough for you.
2) If you have a hard realtime requirement and you are worried that Linux won't cut it, then (as Nick mentioned above), just go buy a processor and write your realtime code with no OS. By splitting up your "realtime" and "non-realtime" code onto different CPUs, you will make the whole system simpler and much more robust.
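A rough sketch of the option 1 techniques on a stock kernel (assumes root or CAP_SYS_NICE; the priority value is arbitrary):

    #include <sched.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Memory pinning: lock all current and future pages into RAM so a
           page fault can never stall the time-critical path. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            perror("mlockall");

        /* Realtime priority: move this thread into the SCHED_FIFO class,
           where it preempts every normal (CFS) task until it blocks. */
        struct sched_param sp;
        memset(&sp, 0, sizeof sp);
        sp.sched_priority = 80;               /* valid range is 1..99 */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
            perror("sched_setscheduler");     /* likely missing privileges */

        /* ... time-critical work goes here ... */
        return 0;
    }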
If you want to learn real-time operating systems, then I suggest that you get an FPGA, for example the Altera DE2, and experiment with your own operating system and µC/OS. You can read a good text about embedded RTOSes here.
You could also get a Raspberry Pi and write your own operating system for that.

Setting up a distributed computing grid on local network using .NET

In our firm we run complex simulations using our own software developed in .NET. These simulations are well-suited to parallel computation and we currently make much use of the various multi-threading features native to .NET. Even so, simulations often take hours or days.
We'd like to explore the potential of distributing computation over our local network of high-performance (24 core) workstations to access more CPU power. However we have no experience in this area.
Searching on Google reveals a few MPI-based options such as Pure MPI, MPI.NET, plus some commercial software such as Frontier.
Which solution should we consider for something that is ideally well-suited to a .NET environment and is relatively easy to set up?
Thanks!
Multithreading != grid computing, so you will need to rewrite some parts of your application regardless of what you choose in the end.
I don't know your network infrastructure, but it sounds to me like you want to use normal desktop workstations to run the distributed code. I wouldn't use MPI for that. MPI was developed for clusters and supercomputers, where the network supports high bandwidth and low latency. Those aren't the properties of a traditional office network (unless I've misunderstood something).
The next thing you have to deal with is the fact that users shouldn't turn off their machines while computations are running on them. No grid computing platform (including MPI) deals with these kinds of issues, as such platforms usually run on server hardware, which has few failures and runs 24/7.
I don't think there is a simple and inexpensive solution to this. You could have a service running on each machine which could execute code from DLLs with predefined parameters and send back responses. Those assemblies could be made downloadable from some Windows share. But you want really big pieces of work to be distributed like this; you would get almost no improvement if the application runs for only a minute or less.
In the end you would also need a service to discover which machines are online: some kind of in-memory DB where every service registers its IP address and the fact that it's online, so that clients know to whom they can distribute work. This could be done using RavenDB (as you said you are working with .NET), Redis, or an application that was actually written for this kind of problem, ZooKeeper.

VM cloud based OS possible?

"A process virtual machine (also, language virtual machine) is
designed to run a single program, which means that it supports a
single process. Such virtual machines are usually closely suited to
one or more programming languages and built with the purpose of
providing program portability and flexibility (amongst other things).
An essential characteristic of a virtual machine is that the software
running inside is limited to the resources and abstractions provided
by the virtual machine—it cannot break out of its virtual
environment. quote from the Wikipedia Article"
I've been studying the usage of virtual machines, especially their importance in cloud computing, and I want to know if it would be possible to develop a VM-based operating system that could be scaled dynamically to use the processing power of connected servers: use its own local hardware for fast processing, but also boost its performance by sending processes that don't need to return immediate responses to a cloud service.
Is this possible, or is that concept flawed?
Basically, the OS would scale with the connected cloud servers. Which processes are OK to send to the cloud servers for a latent response would be up to the developers of each program.
At first, I could see this being effective only for corporations in need of cost-effective massive computation. But as internet speeds increase, even front-end interface animation calculations might be possible, with less local hardware and heavier reliance on cloud services.
If it's possible, it would allow many scientific simulations that would otherwise need supercomputer time to be run from any location in the world, at no more cost than exactly what processing is done at a specific speed. And it would eventually lead to consumer devices that are extremely small, scalably powerful, and very cheap, allowing people to pay for processing the same way they pay for internet service today.
Is this possible, or is that concept flawed?
Both. ;)
What you are talking about seems like what used to be called "grid computing". (Sun even sold it in the early 90's.) The concept was that you put a magic library on all your boxes, and your app would be able to scale out with no further work by the programmer.
This is useful -- but only if your problem is "embarrassingly parallel" (i.e., lots of independent calculations that don't affect each other).
MPI is one such popular way to do this: http://www.linux-mag.com/id/5759/
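For a taste of the model, here is the canonical MPI starting point (compile with mpicc, launch with something like mpirun -np 4 ./hello; the printf stands in for a real per-rank computation):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?  */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes?  */

        /* In a real job, each rank would work on its own independent
           slice of the problem, selected by its rank number. */
        printf("process %d of %d reporting\n", rank, size);

        MPI_Finalize();
        return 0;
    }

Each rank is a separate OS process, possibly on a different machine; there is no shared memory unless you build it with explicit messages.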
Unfortunately, most of the time people have problems that are more lumpy (grab a bunch of data from a database, do some calculations, generate a PDF). In these cases, it's simpler to figure out a good strategy and manually code it up than to try to use a magic library that can be hard to debug, and even harder to work around performance problems in. I know a lot of people using AWS, and none of them use a 'magic grid library' like you are talking about. They communicate between servers using simple protocols like queues or HTTP interfaces.
That's not because your idea won't work. It's just that their needs can be satisfied by something much simpler to debug/run/tune.
Another neat idea in the same vein: http://www.gocircuit.org/

what operating system concepts should every programmer be aware of? [closed]

I am compiling various lists of competencies that self-taught programmers must have.
Among all subjects, operating systems is the trickiest one, because creating even a toy operating system is a rather non-trivial task. At the same time, an application developer (who may not have formally studied CS) must at least be aware of, and ideally have implemented, some key concepts in order to appreciate how an OS works and to be a better developer.
I have a few specific questions:
What key concepts of operating systems are important for a self-taught programmer to understand, so that they can be better software developers (albeit ones working on regular application development)?
Is it even remotely possible to learn such a subject in byte-sized practical pieces? (Even a subject like compiler construction can be learned in a hands-on way, at a rather low level of complexity.)
I would suggest reading Andrew S. Tanenbaum's ( http://en.wikipedia.org/wiki/Andrew_S._Tanenbaum ) book Modern Operating Systems (ISBN 978-0-13-600663-3), as everything is there.
However from the book index we can identify the minimum key topics:
Processes
Memory management
File systems
Input/output
And the easiest way to start playing with these topics is to download MINIX:
http://www.minix3.org/
and study the code. Older versions of this operating system might be easier to understand.
Another useful resource is Mike Saunders' How to write a simple operating system, which shows you how to write and build your first operating system in x86 assembly language:
http://mikeos.sourceforge.net/write-your-own-os.html
Every OS designer must understand the concepts behind Multics. One of its most brilliant ideas is the notion of a vast virtual memory partitioned into directly readable and writable segments with full protections, and multiprocessor support to boot; with 64-bit pointers, we have enough bits to address everything on the planet directly. These ideas are from the 1960s, yet timeless IMHO.
The apparent loss of such knowledge got us "Eunuchs", now instantiated as Unix and then Linux, and an equally poor design from Microsoft, both of which organize the world as a flat process space and flat files. Those who don't know history are doomed to doing something dumber.
Do anything you can to get a copy of Organick's book on Multics, and read it, cover to cover. (Elliott I. Organick, The Multics System: An Examination of Its Structure).
The Wikipedia site has some good information; Corbató's papers are great.
I believe it depends on the type of application you are developing and the OS platform you are developing for. For example, if you are developing a website, you don't need to know too much about the OS; there you need to know more about your web server. There are different things you need to know when you are working on Windows, Linux, or Android or on some embedded system, and sometimes you need to know nothing beyond what your API provides. In general, it is always good for a developer or CS person to know the following.
What lies in the responsibility of the application, what in the toolchain, and what in the OS.
Inter-process communication and the different IPC mechanisms the OS's system calls provide (see the sketch below).
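As a minimal sketch of one such mechanism, an anonymous pipe between a parent and child process (error handling omitted for brevity):

    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        pipe(fds);                        /* fds[0]: read end, fds[1]: write end */

        if (fork() == 0) {                /* child inherits both ends */
            close(fds[0]);
            const char *msg = "hello from the child";
            write(fds[1], msg, strlen(msg) + 1);
            _exit(0);
        }

        close(fds[1]);                    /* parent reads what the child wrote */
        char buf[64];
        read(fds[0], buf, sizeof buf);
        printf("parent received: %s\n", buf);
        wait(NULL);
        return 0;
    }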
The OS is quite an interesting subject, but it mostly consists of theory. This theory comes into action when you are working on embedded systems; for the average desktop application you don't see where all of it fits in.
OK: operating system concepts that a good programmer should be aware of.
Practically speaking, unless you are concerned about performance, and if you are writing in a cross-OS language: none.
If you do care about performance:
The cost of user/system transitions
How the OS handles locking/threads/deadlocks, and how to best use them
Virtual memory/paging/thrashing and the cost thereof
Memory allocation: how the OS does it, when you should use the OS allocator (see point 1), and when to allocate a big block from the OS and sub-allocate it yourself
As mentioned earlier, process creation and inter-process communication
How the OS reads/writes to disk by default, and how to read/write optimally (see why databases use B-trees)
Bonus (below the OS): what cache sizes and cache lines can mean to you in terms of performance (see the sketch below)
But generally it boils down to: what does the OS provide you that isn't generic, what does it cost and why, and what will cost too much (too much CPU, too much disk usage, too much I/O, too much network, etc.).
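To make the cache-line bonus item concrete, here is a sketch of false sharing (assumes 64-byte cache lines; compile with -pthread and time this against the padded variant):

    #include <pthread.h>
    #include <stdio.h>

    enum { ITERS = 100000000 };

    /* Two counters in the same 64-byte cache line: the cores fight over
       the line ("false sharing"). Pad them onto separate lines and the
       slowdown disappears. */
    struct { long a; char pad[64]; long b; } padded;
    struct { long a; long b; } unpadded;

    static void *bump(void *p)
    {
        volatile long *c = p;
        for (long i = 0; i < ITERS; i++)
            (*c)++;
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        /* Swap in &padded.a / &padded.b to compare timings. */
        pthread_create(&t1, NULL, bump, &unpadded.a);
        pthread_create(&t2, NULL, bump, &unpadded.b);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%ld %ld\n", unpadded.a, unpadded.b);
        return 0;
    }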
Well, that depends on the needs of the developer. Consider this classic point/counterpoint on what belongs in an OS:
Point. Applications such as web browsers and email tools perform an increasingly important role in modern desktop computer systems. To fulfill this role, they should be incorporated as part of the operating system. By doing so, they can provide better performance and better integration with the rest of the system. In addition, these important applications can have the same look-and-feel as the operating system software.
Counterpoint. The fundamental role of the operating system is to manage system resources such as the CPU, memory, I/O devices, etc. In addition, its role is to run software applications such as web browsers and email applications. By incorporating such applications into the operating system, we burden the operating system with additional functionality. Such a burden may result in the operating system performing a less-than-satisfactory job at managing system resources. In addition, we increase the size of the operating system, thereby increasing the likelihood of system crashes and security violations.
There are also many other important topics which one must understand to get a better grip on operating systems, such as multithreading, multitasking, virtual memory, demand paging, memory management, processor management, and more.
I would start with What Every Programmer Should Know About Memory. (Not completely OS, but all of it is useful information. And chapter 4 covers virtual memory, which is the first thing that came to mind reading your question.)
To learn the rest piecemeal, pick any system call and learn exactly what it does. This will often mean learning about the kernel objects it manipulates.
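For example, here is a minimal sketch of that exercise for open(2) and write(2) (the path is illustrative):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* open(2): the kernel builds an open-file object and hands back a
           small integer handle (the file descriptor) that refers to it. */
        int fd = open("/tmp/demo.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* write(2): operate on the kernel object through the handle. */
        write(fd, "hello\n", 6);

        /* close(2): drop this reference; the kernel frees the object when
           the last reference (dup'ed fds, inherited fds, ...) disappears. */
        close(fd);
        return 0;
    }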
Of course, the details will differ from OS to OS... But so does the answer to your question.
Simply put:
Threads and processes
Kernel space/threads vs. user space/threads (probably some kernel-level programming)
The fundamental concepts of process deadlocks
Monitors vs. semaphores vs. mutexes (see the sketch below)
How memory works and talks to processes and devices
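As a minimal POSIX sketch of mutex vs. semaphore (compile with -pthread; the thread and slot counts are arbitrary):

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    sem_t slots;                      /* counting semaphore: free "slots"    */
    long counter = 0;

    static void *worker(void *arg)
    {
        (void)arg;
        sem_wait(&slots);             /* signaling: block until a slot frees */

        pthread_mutex_lock(&lock);    /* ownership: one thread at a time     */
        counter++;                    /* the critical section                */
        pthread_mutex_unlock(&lock);

        sem_post(&slots);             /* return the slot                     */
        return NULL;
    }

    int main(void)
    {
        sem_init(&slots, 0, 2);       /* allow at most 2 workers in at once  */
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);   /* always 4 */
        return 0;
    }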
Every self-taught programmer and computer scientist alike should know the OSI model, and know it well. It helps to identify where a problem could lie and whom to contact when there are problems: the scope of an issue is defined there, and many issues can be filtered out there.
This is because there is just too much in an operating system to simply learn it all. As a web developer I usually work at the application level; when an issue goes out of this scope, I know I need help. Also, many people simply do not care about certain components; they want to create things as quickly as possible, and the OSI model is a map on which someone can find their own hot spot.
http://en.wikipedia.org/wiki/OSI_model

The state of programming and compiling for multicore systems

I'm doing some research on multicore processors; specifically I'm looking at writing code for multicore processors and also compiling code for multicore processors.
I'm curious about the major problems in this field that would currently prevent a widespread adoption of programming techniques and practices to fully leverage the power of multicore architectures.
I am aware of the following efforts (some of these don't seem directly related to multicore architectures, but seem to have more to do with parallel-programming models, multi-threading, and concurrency):
Erlang (I know that Erlang includes constructs to facilitate concurrency, but I am not sure how exactly it is being leveraged for multicore architectures)
OpenMP (seems mostly related to multiprocessing and leveraging the power of clusters; see the OpenMP sketch after this list)
Unified Parallel C
Cilk
Intel Threading Building Blocks (this seems to be directly related to multicore systems; makes sense, as it comes from Intel. In addition to defining certain programming constructs, it also seems to have features that tell the compiler to optimize the code for multicore architectures)
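For reference, here is a minimal OpenMP sketch of the sort of construct I mean (compile with -fopenmp; the array and bound are arbitrary):

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N];
        double sum = 0.0;

        /* One pragma splits the iterations across all available cores; the
           reduction clause gives each core a private sum and combines them. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = i * 0.5;
            sum += a[i];
        }

        printf("sum = %f on up to %d threads\n", sum, omp_get_max_threads());
        return 0;
    }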
In general, from what little experience I have with multithreaded programming, I know that programming with concurrency and parallelism in mind is definitely a difficult concept. I am also aware that multithreaded programming and multicore programming are two different things. In multithreaded programming you are ensuring that the CPU does not remain idle on a single-CPU system (as James pointed out, the OS can schedule different threads to run on different cores, but I'm more interested in describing the parallel operations from the language itself, or via the compiler). As far as I know, you cannot do truly parallel operations on a single CPU. On multicore systems, you should be able to perform truly parallel operations.
So it seems to me that currently the problems facing multicore programming are:
Multicore programming is a difficult concept that requires significant skill
There are no native constructs in today's programming languages that provide a good abstraction to program for a multicore environment
Other than Intel's TBB library, I haven't found efforts in other programming languages to leverage the power of multicore architectures during compilation (for example, I don't know whether the Java or C# compiler optimizes the bytecode for multicore systems, or even whether the JIT compiler does that)
I'm interested in knowing what other problems there might be, and if there are any solutions in the works to address these problems. Links to research papers (and things of that nature) would be helpful. Thanks!
EDIT
If I had to condense my question down to one sentence, it would be this: What are the problems that face multicore programming today and what research is going on in the field to solve these problems?
UPDATE
It also seems to me that there are three levels at which multicore needs to be addressed:
Language level: Constructs/concepts/frameworks that abstract parallelization and concurrency and make it easy for programmers to express the same
Compiler level: If the compiler is aware of what architecture it is compiling for, it can optimize the compiled code for that architecture.
OS level: The OS optimizes the running process and perhaps schedules different threads/processes to run on different cores.
I've searched on ACM and IEEE and have found a few papers. Most of them talk about how difficult it is to think concurrently and also how current languages don't have a proper way to express concurrency. Some have gone so far as to claim that the current model of concurrency that we have (threads) is not a good way to handle concurrency (even on multiple cores). I'm interested in hearing other views.
I'm curious about the major problems in this field that would currently prevent a widespread adoption of programming techniques and practices to fully leverage the power of multicore architectures.
Inertia. (BTW: that's pretty much the answer to all "what prevents the widespread adoption of X" questions, whether that be models of parallel programming, garbage collection, type safety, or fuel-efficient automobiles.)
We have known since the 1960s that the threads+locks model is fundamentally broken. By ~1980, we had about a dozen better models. And yet, the vast majority of languages that are in use today (including languages that were newly created from scratch long after 1980), offer only threads+locks.
The major problems with multicore programming are the same as with writing any other concurrent application, but whereas before it was uncommon to have multiple CPUs in a computer, now it is hard to find any modern computer with only one core in it, so taking advantage of multicore, multi-CPU architectures presents new challenges.
But this problem is an old one: whenever computer architectures go beyond what compilers can handle, the fallback solution seems to be to move back toward functional programming, since that paradigm, if strictly followed, can produce very parallelizable programs, as you don't have any global mutable variables, for example.
But not all problems can be solved easily using FP, so the goal then is to make other programming paradigms easy to use on multicores.
The first thing is that many programmers have avoided writing good multithreaded applications, so there isn't a deep pool of well-prepared developers, and many have learned habits that will make this kind of coding harder.
But, as with most changes to the CPU, you can look at how to change the compiler, and for that you can look at Scala, Haskell, Erlang, and F#.
For libraries, you can look at the Parallel Extensions framework from Microsoft as a way to make concurrent programming easier.
At work I recently read, in either IEEE Spectrum or IEEE Computer, articles on multicore programming issues, so look at what IEEE and ACM articles have been written on these issues to get more ideas as to what is being looked at.
I think the biggest impediment will be the difficulty of getting programmers to change their language, as FP is very different from OOP.
One place for research, besides developing languages that will work well this way, is how to handle multiple threads accessing memory; as with much in this area, Haskell seems to be at the forefront in testing ideas here, so you can look at what is going on with Haskell.
Ultimately there will be new languages, and we may get DSLs to help abstract things away from the developer more, but how to educate programmers on all this will be a challenge.
UPDATE:
You may find Chapter 24, "Concurrent and multicore programming", of Real World Haskell of interest: http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html
One of the answers mentioned the Parallel Extensions for the .NET Framework, and since you mentioned C#, it's definitely something I would investigate. Microsoft has done some interesting things there, though I have to think many of their efforts seem better suited as language enhancements in C# than as a separate and distinct library for concurrent programming. But I think their efforts are worth applauding, and they reflect that we're early here. (Disclaimer: I used to be the marketing director for Visual Studio about 3 years ago.)
The Intel Threading Building Blocks are also quite interesting (Intel recently released a new version, and I'm excited to head down to the Intel Developer Forum next week to learn more about how to use it properly).
Lastly, I work for Corensic, a software quality startup in Seattle. We've got a tool called Jinx that is designed to detect concurrency errors in your code. A 30-day trial edition is available for Windows and Linux, so you might want to check it out. (www.corensic.com)
In a nutshell, Jinx is a very thin hypervisor that, when activated, slips in between the processor and operating system. Jinx then intelligently takes slices of execution and runs simulations of various thread timings to look for bugs. When we find a particular thread timing that will cause a bug to happen, we make that timing "reality" on your machine (e.g., if you're using Visual Studio, the debugger will stop at that point). We then point out the area in your code where the bug was caused. There are no false positives with Jinx. When it detects a bug, it's definitely a bug.
Jinx works on Linux and Windows, and in both native and managed code. It is language and application platform agnostic and can work with all your existing tools.
If you check it out, please send us feedback on what works and doesn't work. We've been running Jinx on some big open source projects and already are seeing situations where Jinx can find bugs 50-100 times faster than simply stress testing code.
The bottleneck of any high-performance application (written in C or C++) designed to make efficient use of more than one processor/core is the memory system (caches and RAM). A single core usually saturates the memory system with its reads and writes, so it is easy to see why adding extra cores and threads can cause an application to run slower. If a queue of people can pass through a door one at a time, adding extra queues will not only clog the door but also make the passage of any one individual through the door less efficient.
The key to any multi-core application is optimizing and economizing on memory accesses. This means structuring data and code to work as much as possible inside their own caches, where they don't disturb the other cores with accesses to the common cache (L3) or RAM. Once in a while a core needs to venture there, but the trick is to reduce those situations as much as possible. In particular, data needs to be structured around and adapted to cache lines and their sizes (currently 64 bytes), and code needs to be compact and not call and jump all over the place, which also disrupts pipelines.
My experience is that efficient solutions are unique to the application in question. The generic guidelines (above) are a basis on which to construct code, but the changes that result from profiling conclusions will not be obvious to those who were not themselves involved in the optimization work.
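As a small illustration of the access-pattern point (sizes arbitrary; assumes the usual 64-byte lines): summing a matrix along rows touches each cache line once, while summing along columns strides across lines:

    #include <stdio.h>

    #define N 2048
    static double m[N][N];        /* row-major: each row is contiguous */

    double sum_rows(void)         /* cache-friendly: walks along lines */
    {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];
        return s;
    }

    double sum_cols(void)         /* cache-hostile: jumps N*8 bytes per step */
    {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];
        return s;
    }

    int main(void)                /* time the two calls to see the gap */
    {
        printf("%f %f\n", sum_rows(), sum_cols());
        return 0;
    }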
Look up fork/join frameworks and work-stealing runtimes. These are two names for the same, or at least related, approaches: recursively subdivide large tasks into lightweight units, such that all available parallelism is exploited, without having to know in advance how much parallelism there is. The idea is that it should run at serial speed on a uniprocessor, but get a linear speedup with multiple cores.
Sort of a horizontal analogue of cache-oblivious algorithms if you look at it right.
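A minimal sketch of the recursive-subdivision idea using OpenMP tasks, one common runtime with work stealing (compile with -fopenmp; the cutoff of 10000 is arbitrary):

    #include <stdio.h>

    /* Recursively split the range until it is small enough to do serially;
       each left half is forked as a task, and idle cores steal pending tasks. */
    static long sum(const long *a, long lo, long hi)
    {
        if (hi - lo < 10000) {                 /* serial cutoff */
            long s = 0;
            for (long i = lo; i < hi; i++) s += a[i];
            return s;
        }
        long mid = lo + (hi - lo) / 2, left, right;
        #pragma omp task shared(left)          /* fork the left half */
        left = sum(a, lo, mid);
        right = sum(a, mid, hi);               /* do the right half ourselves */
        #pragma omp taskwait                   /* join */
        return left + right;
    }

    int main(void)
    {
        enum { N = 1000000 };
        static long a[N];
        for (long i = 0; i < N; i++) a[i] = 1;

        long total = 0;
        #pragma omp parallel
        #pragma omp single                     /* one thread seeds the task tree */
        total = sum(a, 0, N);

        printf("total = %ld\n", total);        /* prints 1000000 */
        return 0;
    }

On one core this runs at roughly serial speed; with more cores, the idle ones steal the forked halves.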
But I'd say the main problem facing multicore programming is that the great majority of computations remain stubbornly serial. There's just no way to throw multiple cores at those computations and make them stick.