Netlogo High performance Computing

Are there any high performance computing facilities available for running NetLogo BehaviorSpace, similar to R servers?
Thanks.

You can use headless mode to run batches of experiments on a cluster or cloud computing platform. This involves simply running an executable, so it should be compatible with most setups. If you don't have access to a cluster through an institution, I know people use AWS and Google Compute Engine. You probably want an instance with many cores, since that allows a single instance of BehaviorSpace to automatically distribute the runs in an experiment across them. Higher processing power of course helps too. You shouldn't need much memory. The n1-highcpu-16 or n1-standard-16 instance types in Google Compute Engine look pretty ideal to me.
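As a rough illustration of what "running an executable" looks like, here is a small Python sketch that only builds the command line for a headless BehaviorSpace run. The launcher name and the `--model`, `--experiment`, `--table` and `--threads` flags are as I recall them from the BehaviorSpace documentation, so check them against your NetLogo version. The model, experiment and output names are hypothetical.

```python
import os


def headless_command(model_path, experiment, table_path, threads=None,
                     launcher="netlogo-headless.sh"):
    """Build the argument list for one headless BehaviorSpace experiment."""
    if threads is None:
        # Assumption: one parallel run per available core; fall back to 1.
        threads = os.cpu_count() or 1
    return [
        launcher,
        "--model", model_path,
        "--experiment", experiment,
        "--table", table_path,
        "--threads", str(threads),
    ]


# Hypothetical model and experiment names, for illustration only.
cmd = headless_command("Fire.nlogo", "experiment1", "results.csv", threads=16)
print(" ".join(cmd))
```

On the instance itself you would hand the list to something like `subprocess.run(cmd, check=True)` from the NetLogo install directory, or just type the printed command into a shell or job script.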

Related

Define the minimal configuration for a program

So I have my .exe ready to deploy, and for distribution I need to know the minimum requirements for my program to run on a machine... and I really don't know how to work that out.
Is there a way to find out? Some kind of benchmark? Or should I just state whatever I think will work?
Maybe I should just buy every existing component until I find the minimum? :')
Well, thanks for your answers.

Start by determining the earliest Windows version you can deploy on (Windows XP? Vista?).
If your program is CPU or GPU intensive and has a fixed time loop (e.g. a game), then you'll have to run benchmarks.
You should test several old and new CPUs/GPUs and try to "guess", based on specs posted online, what the minimal requirement is. For example, if your program can't run on an old CPU but runs blazingly fast on a new one, try to find the model that barely runs it, which will be somewhere in between.
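One way to make "barely runs it" concrete for a fixed time loop is to record frame times on each test machine and see how often they exceed the frame budget. The Python sketch below is only an illustration: the function name, the sample timings and the 60 FPS target are all made up.

```python
def fraction_over_budget(frame_times_s, target_fps):
    """Return the share of frames that took longer than the frame budget."""
    if not frame_times_s:
        raise ValueError("need at least one frame time")
    budget_s = 1.0 / target_fps  # time available per frame at the target rate
    slow_frames = sum(1 for t in frame_times_s if t > budget_s)
    return slow_frames / len(frame_times_s)


# Made-up frame times (seconds) recorded on a hypothetical test machine.
sample = [0.010, 0.020, 0.015, 0.030]
print(fraction_over_budget(sample, target_fps=60))
```

A machine where only a small share of frames miss the budget is a candidate for the minimum spec. Where exactly to draw that line is a judgement call.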
If your program requires other special things, specify them (eg. USB 3.0, controllers supported...).
Otherwise, if your program loads slowly on weaker machines but doesn't have runtime issues, the minimum specs should correspond to a reasonable loading time (a minute seems to be the standard now, sadly).
Additionally, if your program is memory hungry (in disk space or RAM), you must indicate this.
For disk space, simply state your program's size, including the files shipped with it.
For RAM, use a profiler - it will tell you how much memory your program is using.
I've completely skipped over the fact that, in some computers, the bottleneck might be the cpu, and it might be the gpu in others. You need to know which is the bottleneck to make your judgement.
Finding out is fairly simple: remove expensive GPU operations (lower texture resolutions, turn off shaders). If the program still runs slowly, then the bottleneck is the CPU.
edit: this is a simplification of the problem and hardware is a little more complicated than this (slower multi-core cpus vs faster one-core cpus vary their performance depending on how many cores a program uses and how / a program may require a gpu to have less memory but more processing power, or the opposite... even heat dissipation can affect component efficiency: your program might run fine for 20 minutes but start to slow down if the cpu isn't cooled down properly), but "minimal hardware requirements" aren't exactly precise so this method is appropriate.
tl;dr of the spoiler:
In short, there are so many factors that affect performance that you can't measure it precisely, so a rough estimate is good enough.

Difference between MATLAB parallel computing terminologies

I want to know the differences between
1. labs
2. workers
3. cores
4. processes
Is it just semantics, or are they all different?

labs and workers are MathWorks terminologies, and they mean roughly the same thing.
A lab or a worker is essentially an instance of MATLAB (without a front-end). You run several of them, and you can run them either on your own machine (requires only Parallel Computing Toolbox) or remotely on a cluster (requires Distributed Computing Server). When you execute parallel code (such as a parfor loop, an spmd block, or a parfeval command), the code is executed in parallel by the workers, rather than by your main MATLAB.
Parallel Computing Toolbox has changed and developed its functionality quite a lot over recent releases, and has also changed and developed the terminologies it uses to describe the way it works. At some point it was convenient to refer to them as labs when running an spmd block, but workers when running a parfor loop, or working on jobs and tasks. I believe they are moving now toward always calling them workers (although there's a legacy in the commands labSend, labReceive, labBroadcast, labindex and numlabs).
cores and processes are different, and are not themselves anything to do with MATLAB.
A core is a physical part of your processor - you might have a dual-core or quad-core processor in your desktop computer, or you might have access to a really big computer with many more than that. By having multiple cores, your processor can do multiple things at once.
A process is (roughly) a program that your operating system is running. Although the OS runs multiple programs simultaneously, it typically does this by interleaving operations from each process. But if you have access to a multiple-core machine, those operations can be done in parallel.
So you would typically want to tell MATLAB to start one worker for each of the cores you have on your machine. Each of those workers will be run as a process by the OS, and will end up being run one worker per core in parallel.
The above is quite simplified, but I hope gives a roughly accurate picture.
Edit: moved description of threads from a comment to the answer.
Threads are something different again. Threads are also not in themselves anything to do with MATLAB.
Let's go back to processes for a moment. One thing I didn't mention above is that the OS allocates each process a specific block of memory which other processes shouldn't be able to touch, so that it's difficult for them to interact with each other and mess things up.
A thread is like a process within a process - it's a stream of operations that the process runs. Typically, operations from each thread would be interleaved, but if you have multiple cores, they can also be parallelized across the cores.
However, unlike processes, they all share a memory block, which is OK because they're all managed by the same program so it should matter less if they're allowed to interact.
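The shared memory block can be seen in a few lines of Python (used here purely as an illustration; it has nothing to do with how MATLAB is implemented). Four threads in one process update the same counter, and a lock keeps their interleaved updates from stepping on each other.

```python
import threading

counter = {"value": 0}   # lives in the process's memory, visible to every thread
lock = threading.Lock()  # serialises updates so interleaved threads don't clash


def add_many(n):
    """Increment the shared counter n times."""
    for _ in range(n):
        with lock:
            counter["value"] += 1


workers = [threading.Thread(target=add_many, args=(1000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(counter["value"])  # 4000: all four threads wrote to the same object
```

Separate processes would each get their own copy of `counter`, so they would have to pass messages to combine their results.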
Regular MATLAB automatically uses multiple threads to parallelize many built-in operations (such as matrix multiplication, svd, eig, linear algebra etc) - that's without you doing anything, and whether or not you have Parallel Computing Toolbox.
However, MATLAB workers are each run as a single process with a single thread, so you have full control over how to parallelize.

I think workers are synonyms for processes. The term "cores" is related to the hardware. Labs is a mechanism which allows workers to communicate with each other. Each worker has at least one lab but can own more.
This piece of a discussion may be useful
http://www.mathworks.com/matlabcentral/answers/5529-mysterious-behavior-in-parfor-i-know-sounds-basic-but
I hope someone here will deliver more information in a more rigorous way

Is it possible to build a Large Memory System for matlab?

I am suffering from out-of-memory problems in MATLAB.
Is it possible to build a large memory system for MATLAB (e.g. 64GB RAM)?
If yes, what do I need?

@Itamar gives good advice about how MATLAB requires contiguous memory to store arrays, and about good practices in memory management such as chunking your data. In particular, the technical note on memory management that he links to is a great resource. However much memory your machine has, these are always sensible things to do.
Nevertheless, there are many applications of MATLAB that will never be solved by these tips, as the datasets are just too large; and it is also clearly true that having a machine with much more RAM can address these issues.
(By the way, it's also sometimes the case that it's cheaper to just buy a new machine with more RAM than it is to pay the MATLAB developer to make all the memory optimizations they could - but that's for you to decide).
It's not difficult to access large amounts of memory with MATLAB. If you have a Windows or Linux machine with 64GB (or more) - it will obviously need to be running a 64-bit OS - MATLAB will be able to access it. I've come across plenty of MATLAB users who are doing this. If you know what you're doing you can build your own machine, or nowadays you can just buy a machine that size off the shelf from Dell.
Another option (depending on your application) would be to look into getting a small cluster, and using Parallel Computing Toolbox together with MATLAB Distributed Computing Server.

When you try to allocate an array in Matlab, Matlab must have a contiguous block of memory at least the size of the array. If no sufficiently large contiguous block is available, you will get an out-of-memory error, no matter how much RAM you have on your computer.
From my experience, the solution will not come from dealing directly with memory-related properties of your hardware, but from writing your code in a way that prevents allocation of overly large arrays (cutting data into chunks, etc.). If you can describe your code and the task you are trying to solve, it might be possible to guide you in that direction.
You can read more here: http://www.mathworks.com/support/tech-notes/1100/1106.html
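To show what "cutting data into chunks" means in practice, here is a minimal sketch in Python rather than Matlab (the idea carries over, the syntax does not). It computes a mean while holding only one chunk in memory at a time. The function name and chunk size are made up for the example.

```python
from itertools import islice


def chunked_mean(values, chunk_size):
    """Mean of an iterable, holding at most chunk_size items in memory at once."""
    iterator = iter(values)
    total = 0.0
    count = 0
    while True:
        chunk = list(islice(iterator, chunk_size))  # next piece of the data
        if not chunk:
            break
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data")
    return total / count


print(chunked_mean(range(1, 101), chunk_size=10))  # 50.5
```

The same pattern applies when the chunks come from a file or a database query instead of a range: only the running totals need to persist between chunks.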

How to utilise parallel processing in Matlab

I am working on a time series based calculation. Each iteration of the calculation is independent. Could anyone share some tips / online primers on utilising parallel processing in Matlab? How can this be specified inside the actual code?

Since you have access to the Parallel toolbox, I suggest that you first check whether you can do it the easy way.
Basically, instead of writing

    for i=1:lots
        out(:,i)=do(something);
    end

you write

    parfor i=1:lots
        out(:,i)=do(something);
    end

Then, you use matlabpool to create a number of workers (you can have a maximum of 8 on your local machine with the toolbox, and tons on a remote cluster if you also have a Distributed Computing Server license), and you run the code, and see nice speed gains when your iterations are run by 8 cores instead of one.
Even though the parfor route is the easiest, it may not work right out of the box, since you might do your indexing wrong, or you may be referencing an array in a problematic way etc. Look at the mlint warnings in the editor, read the documentation, and rely on good old trial and error, and you should figure it out reasonably fast. If you have nested loops, it's often best to parallelize only the innermost one and ensure it does tons of iterations - this is not only good design, it also reduces the amount of code that could give you trouble.
Note that especially if you run the code on a local machine, you may run into memory issues (which might manifest as really slow execution in parallel mode because you're paging): every worker gets a copy of the workspace, so if your calculation involves creating a 500MB array, 8 workers will need a total of 4GB of RAM - and then you haven't even started counting the RAM of the parent process! In addition, it can be good to only use N-1 cores on your machine, so that there is still one core left for other processes that may run on the computer (such as a mandatory antivirus...).
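The memory arithmetic and the N-1 rule are easy to check before you open a pool. The sketch below uses Python only as a calculator; the function names are made up and nothing here calls MATLAB.

```python
def total_worker_memory_mb(per_worker_mb, workers):
    """Memory needed if every worker holds its own copy of the data."""
    return per_worker_mb * workers


def suggested_workers(cores):
    """Leave one core free for other processes, but always use at least one."""
    return max(1, cores - 1)


print(total_worker_memory_mb(500, 8))  # 4000 MB, i.e. about 4GB
print(suggested_workers(8))            # 7
```

Remember to add the parent process's own memory on top of the first number.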

MathWorks offers its own Parallel Computing Toolbox. If you do not want to purchase that, there are a few options:
1. You could write your own MEX file and use pthreads or OpenMP. However, make sure you do not call any MEX API functions in the parallel part of the code, because they aren't thread safe.
2. If you want coarser-grained parallelism via MPI, you can try pmatlab.
3. The same goes for parmatlab.
Edit: Adding link: Parallel MATLAB with openmp mex files
I have only tried the first.

Don't forget that many Matlab functions are already multithreaded. By careful programming you may be able to take advantage of them -- check the documentation for your version as the Mathworks seem to be increasing the range and number of multithreaded functions with each new release. For example, it seems that 2010a has multithreaded ffts which may be useful for time series processing.
If the intrinsic multithreading is not what you need, then as @srean suggests, the Parallel Computing Toolbox is available. For my money (or rather, my employers' money) it's the way to go, allowing you to program in parallel in Matlab, rather than having to bolt things on. I have to admit, too, that I'm quite impressed by the toolbox and the facilities it offers.

Parallel programming on a Quad-Core and a VM?

I'm thinking of slowly picking up Parallel Programming. I've seen people use clusters with OpenMPI installed to learn this stuff. I do not have access to a cluster but have a Quad-Core machine. Will I be able to experience any benefit here? Also, if I'm running Linux inside a virtual machine, does it make sense to use OpenMPI inside a VM?

If your target is to learn, you don't need a cluster at all. Your quad-core (or any dual-core or even single-core) computer will be more than enough. The main point is to learn how to think "in parallel" and how to design your application.
Some important points are to:
Exploit different parallelism paradigms like divide-and-conquer, master-worker, SPMD, ... depending on the data and task dependencies of what you want to do.
Choose different data division granularities to check the computation/communication ratio (in the case of message passing), or to check the amount of serial execution caused by mutual exclusion on memory regions.
With a quad-core you can measure the speedup of your approach (the gain in performance attained through parallelization), which is normally given by the time of the non-parallelized execution divided by the time of the parallel execution.
The closer you get to 4 (four cores meaning 1/4th of the execution time), the better your parallelization strategy was (it means you managed to distribute work and data evenly).
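The speedup ratio described above is simple enough to compute by hand, but a small Python sketch makes the definition explicit. The timings are made up, and "efficiency" (speedup divided by core count) is an extra, commonly used measure that the text above does not mention.

```python
def speedup(t_serial, t_parallel):
    """Serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, cores):
    """Speedup per core; 1.0 means the cores were used perfectly."""
    return speedup(t_serial, t_parallel) / cores


# Made-up timings in seconds: 8 s serial, 2 s on four cores.
print(speedup(8.0, 2.0))        # 4.0
print(efficiency(8.0, 2.0, 4))  # 1.0
```

In practice the measured speedup on four cores will usually land below 4, because of communication, synchronization and whatever part of the program stays serial.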
