Matlab "Out of Memory" error when running "Stacked Auto-encoder example" program (deep learning toolbox) using MNIST dataset - matlab

I'm new to deep learning and I was using Matlab's deep learning toolbox.
I wanted to run "test_example_SAE.m", which builds a stacked auto-encoder and trains and tests it on the MNIST dataset, but I couldn't because of this error:
Error using horzcat
Out of memory. Type HELP MEMORY for your options.
How much memory does this job need? I mean, can I run the deep learning toolbox code on an average PC with 4 GB of RAM, or should I learn to run the code on a GPU?

It happened to me as well. If you can, decrease the number of samples.
Also, if you are running 'for' loops, try to vectorize them.
One more hint is to divide your data into pieces (if possible) and run it piece by piece, as the sketch below illustrates.
Those are just my suggestions; it will make no difference if you run it on the GPU.
You can also try cluster computing if it is available at your job or university.
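As an illustration of the piece-by-piece idea, here is a minimal sketch (X and processChunk are hypothetical names for your data matrix and per-chunk routine; chunkSize is something you would tune to your RAM):

% Process a large matrix in fixed-size chunks to limit peak memory use.
chunkSize = 5000;                                  % rows per chunk
numRows   = size(X, 1);
pieces    = cell(ceil(numRows / chunkSize), 1);
for k = 1:numel(pieces)
    rows      = (k-1)*chunkSize + 1 : min(k*chunkSize, numRows);
    pieces{k} = processChunk(X(rows, :));          % your own per-piece work
end
result = vertcat(pieces{:});                       % combine at the end

The same pattern applies to training: feed the network one batch of samples at a time instead of the whole set at once.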

Related

Out of memory error with Matlab's NN toolbox using GPU

I'm receiving an out-of-memory error when training on the GPU with Matlab's NN toolbox, and it appears that subdivision is not helping.
I have tried:
net2 = train(net1,x,t,'reduction',N);
using various reduction values; however, I am unsure whether it has any effect on the GPU side? It may not. Would using the GPU matrix setup directly, as the Matlab documentation describes, be the way to go, or are there other options?
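For reference, a minimal sketch of both approaches mentioned above (net1, x, and t as in the snippet; the option values are placeholders, and whether 'reduction' is honoured on the GPU path may depend on your release):

% Approach 1: let train() handle the GPU, with memory reduction requested.
net2 = train(net1, x, t, 'useGPU', 'yes', 'reduction', 4);

% Approach 2: move the data onto the GPU explicitly before training.
xg   = nndata2gpu(x);
tg   = nndata2gpu(t);
net2 = train(net1, xg, tg, 'showResources', 'yes');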

One-vs-One Classification and Out of Memory Error in MATLAB

I'm trying to classify 5 kinds of data, each of which has about 10,000 samples.
Using the one-vs-one voting method, I have to run the classification 5(5-1)/2 = 10 times,
for which I have written a loop. But I get the message:
Error using svmclassify (line 114)
An error was encountered during classification.
Out of memory. Type HELP MEMORY for your options.
What should I do?
How can I rewrite the code?
Compress data to reduce memory fragmentation.
If possible, break large matrices into several smaller matrices so that less memory is used at any one time (see the sketch after this list).
If possible, reduce the size of your data.
Add more memory to the system.
With reference: http://www.mathworks.com/help/matlab/matlab_prog/resolving-out-of-memory-errors.html
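To make the "smaller matrices" suggestion concrete, here is a minimal sketch that classifies the test set in blocks instead of all at once (svmStruct, testData, and blockSize are hypothetical names):

% Classify the test data in blocks so svmclassify never sees the full matrix.
blockSize = 1000;                                   % tune to available memory
numTest   = size(testData, 1);
blocks    = cell(ceil(numTest / blockSize), 1);
for b = 1:numel(blocks)
    rows      = (b-1)*blockSize + 1 : min(b*blockSize, numTest);
    blocks{b} = svmclassify(svmStruct, testData(rows, :));
end
predicted = vertcat(blocks{:});                     % full label vector

The same blocking can be applied inside each of the 10 one-vs-one runs.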

Simulink running on multiple cores (or not) for a given model

I have a huge Simulink model that takes about ~1 hour to execute. On my computer (HP Z210), execution pushes all CPU cores to 100%. What intrigues me is that the same model running on my colleague's computer (Dell Precision T3600) uses ~50% of the CPU "power" (some cores at 100% while other cores remain idle).
My questions are:
I always thought Simulink ran on a single core. I'm not using the Parallel Computing Toolbox or any other toolbox; I'm only using Matlab and Simulink licenses.
Why is the execution on my computer different from my colleague's? Does it have anything to do with hyper-threading?
While Simulink itself is single-threaded when executing a single model, an individual block may be multi-threaded. For example, a Simulink block that performs matrix multiplication will use the same multi-threaded implementation that Matlab uses.
Simulink is definitely a single-threaded application. The exception to this is if you are using Rapid Accelerator mode and have multiple cores available; then the standalone executable runs on a separate core. See How Acceleration Modes Work for more details.
If you are running multiple simulations, then you can distribute these across multiple cores with the Parallel Computing Toolbox, or even across multiple workers (machines) with the MATLAB Distributed Computing Server. However, this is for multiple simulations of a model (e.g. a Monte-Carlo simulation), not for breaking up a large model into several chunks (currently not possible as far as I know). See Run Parallel Simulations for more details.
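As a rough illustration of that "multiple simulations" case (not from the original answer; 'mymodel' and the option values are placeholders, and newer releases offer parsim as a cleaner alternative):

% Distribute N independent simulations of one model across local workers.
matlabpool open                                     % parpool in newer releases
N = 20;
simOut = cell(N, 1);
parfor i = 1:N
    % Each worker simulates its own copy of the model; per-run parameters
    % would normally be passed as extra sim() name-value pairs here.
    simOut{i} = sim('mymodel', 'StopTime', '10', 'SaveOutput', 'on');
end
matlabpool close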
Not sure why the execution would be different from one machine to the other. Are you both using the same release of MATLAB? The same OS? There are so many things that could differ. With regard to speeding up execution of the model, you could try running the model in accelerated mode, using the Simulink profiler to see where the bottlenecks are, changing some of the solver settings (e.g. variable-step vs. fixed-step), etc.
If your model can be built with Simulink Coder, you can use xMOD software (www.xmodsoftware.com) to execute your model on multiple cores (one subsystem per thread, with a dedicated solver and step size for each subsystem).

svmtrain function in matlab never exits ... do alternatives exist?

I am trying to learn how to use support vector machines in matlab. I have the bioinformatics toolbox, which has SVM functions svmtrain and svmclassify.
I managed to use it successfully on some reference data sets, with nice accuracy. When I try to use the SVM on my actual data, the training never stops. My data set is 400 instances in 25 dimensions, so it should not take very long?!
Can I use other solvers in Matlab? I don't want to buy a new toolbox, please...
There are several things that may cause problems for training, but it should not run infinitely. Do you get any errors when using the solver?
With regard to alternatives: LIBSVM has an interface to matlab. This is a state-of-the-art library with thousands of users. I highly recommend it, because it is easy to install/use and offers additional functionality for parameter tuning and more.
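For orientation, a minimal sketch of the LIBSVM Matlab interface (note that its svmtrain shadows the toolbox function of the same name; the '-t 2 -c 1 -g 0.04' options are just example values):

% LIBSVM's Matlab interface: train an RBF-kernel SVM, then classify test data.
model = svmtrain(trainLabels, trainFeatures, '-t 2 -c 1 -g 0.04');
[predictedLabels, accuracy, decisionValues] = ...
    svmpredict(testLabels, testFeatures, model);

Here trainLabels/trainFeatures and testLabels/testFeatures are your own numeric label vectors and feature matrices.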

Matlab and GPU/CUDA programming

I need to run several independent analyses on the same data set.
Specifically, I need to run batches of 100 GLM (generalized linear model) analyses and was thinking of taking advantage of my video card (GTX 580).
As I have access to Matlab and the Parallel Computing Toolbox (and I'm not good with C++), I decided to give it a try.
I understand that a single GLM is not ideal for parallel computing, but as I need to run 100-200 in parallel, I thought that using parfor could be a solution.
My problem is that it is not clear to me which approach I should follow. I wrote a gpuArray version of the matlab function glmfit, but using parfor doesn't have any advantage over a standard "for" loop.
Does this have anything to do with the matlabpool setting? It is not even clear to me how to set it so that it "sees" the GPU card. By default, it is set to the number of cores in the CPU (4 in my case), if I'm not mistaken.
Am I completely wrong on the approach?
Any suggestion would be highly appreciated.
Edit
Thanks. I'm aware of GPUmat and Jacket, and I could start writing in C without too much effort, but I'm testing the GPU computing possibilities for a department where everybody uses Matlab or R. The final goal would be a cluster based on C2050 cards and the Matlab Distributed Computing Server (or at least that was the first project).
Reading the ads from MathWorks, I was under the impression that parallel computing was possible even without C skills. It is impossible to ask the researchers in my department to learn C, so I'm guessing that GPUmat and Jacket are the better solutions, even though the limitations are quite big and support for several commonly used routines like glm is non-existent.
How can they be interfaced with a cluster? Do they work with some job distribution system?
I would recommend you try either GPUMat (free) or AccelerEyes Jacket (buy, but has free trial) rather than the Parallel Computing Toolbox. The toolbox doesn't have as much functionality.
To get the most performance, you may want to learn some C (no need for C++) and code in raw CUDA yourself. Many of these high level tools may not be smart enough about how they manage memory transfers (you could lose all your computational benefits from needlessly shuffling data across the PCI-E bus).
Parfor will help you utilize multiple GPUs, but not a single one. The thing is that a single GPU can only do one thing at a time, so parfor on a single GPU and for on a single GPU will achieve exactly the same effect (as you are seeing).
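A rough sketch of the multi-GPU case (one worker per GPU; runOneGlm is a hypothetical per-analysis function, and the pool size is an assumption about your setup):

% Bind each pool worker to its own GPU, then farm out the analyses.
nGpus = gpuDeviceCount;
matlabpool('open', nGpus)                 % parpool(nGpus) in newer releases
spmd
    gpuDevice(labindex);                  % worker k uses GPU number k
end
results = cell(100, 1);
parfor i = 1:100
    results{i} = runOneGlm(i);            % hypothetical GLM analysis on gpuArray data
end
matlabpool close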
Jacket tends to be more efficient because it can combine multiple operations and run them more efficiently, and it has more features, but most departments already have the Parallel Computing Toolbox and not Jacket, so that can be an issue. You can try the demo to check.
I have no experience with GPUmat.
The Parallel Computing Toolbox is getting better; what you need is some large matrix operations. GPUs are good at doing the same thing many times over, so you need to either combine your code somehow into one operation or make each operation big enough. We are talking about needing roughly 10,000 things in parallel at least, and not as a set of 1e4 small matrices but rather as one large matrix with at least 1e4 elements.
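For instance, a minimal sketch in that spirit, batching the least-squares step of many small models into one gpuArray operation (the dimensions and the least-squares simplification are mine, not the answer's):

% Stack 200 small fitting problems into one large GPU matrix operation.
nModels = 200;  nObs = 500;  nPredictors = 10;
X = gpuArray.rand(nObs, nPredictors);     % shared design matrix on the GPU
Y = gpuArray.rand(nObs, nModels);         % one response column per model
B = X \ Y;                                % all 200 least-squares fits at once
B = gather(B);                            % bring coefficients back to the CPU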
I do find that with the parallel computing toolbox you still need quite a bit of inline CUDA code to be effective (it's still pretty limited). It does better allow you to inline kernels and transform matlab code into kernels though, something that