SOM out of memory in MATLAB

I am trying to use a SOM to learn 80000x10 samples (each sample is a vector of size 10), but I can't even configure an 8x8 net with 10000x1 samples: it throws an "out of memory" error.
Here is my code (data is an 80000x10 matrix):
net=selforgmap([8 8])
net=configure(net,data(1:10000,1))
Matlab help: "Unconfigured networks are automatically configured and initialized the first time train is called."
Even for an 8000x1 dataset, it takes a lot of time. I noticed a huge numWeightElements: 512000 in the net variable (8*8*8000=512000). The weights should be 8*8. The SOM training algorithm shouldn't use this much memory. What is wrong?
The output of memory command:
>> memory
Maximum possible array: 3014 MB (3.160e+009 bytes)
Memory available for all arrays: 3014 MB (3.160e+009 bytes)
Memory used by MATLAB: 1154 MB (1.210e+009 bytes)
Physical Memory (RAM): 4040 MB (4.236e+009 bytes)

I think you are configuring the input structure incorrectly. Each input vector must be a column, not a row. Quoting from "Clustering Data - MATLAB & Simulink":
To define a clustering problem, simply arrange Q input vectors to be
clustered as columns in an input matrix (see "Data Structures"
for a detailed description of data formatting for static and time
series data). For instance, you might want to cluster this set of 10
two-element vectors:
inputs = [7 0 6 2 6 5 6 1 0 1; 6 2 5 0 7 5 5 1 2 2]
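As you can see, each input vector is a column: you have 10 two-element input vectors as a 2x10 array. Your data(1:10000,1) is therefore read as a single input vector with 10000 elements, which is why the weight count explodes. Transpose your matrix so that it is 10x80000, with one sample per column.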
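A minimal sketch of that layout fix, assuming the asker's matrix is called data and holds one sample per row. The random matrix below is only a stand-in, and the selforgmap/configure calls are guarded because they need the Deep Learning Toolbox; the weight counts are plain arithmetic.

```matlab
% Stand-in for the asker's 80000x10 matrix (one sample per row).
data = rand(80000, 10);

% The toolbox expects one sample per COLUMN, so transpose first.
inputs = data';                          % 10x80000
[numFeatures, numSamples] = size(inputs);

numNeurons = 8 * 8;
expectedWeights = numNeurons * numFeatures;   % 64 neurons x 10 features
mistakenWeights = numNeurons * 8000;          % what an 8000x1 input produces

fprintf('samples: %d, features: %d\n', numSamples, numFeatures);
fprintf('weights with column layout: %d\n', expectedWeights);
fprintf('weights with 8000x1 input:  %d\n', mistakenWeights);

% Only run the toolbox part if selforgmap is available on this machine.
if exist('selforgmap', 'file')
    net = selforgmap([8 8]);
    net = configure(net, inputs);
    disp(net.numWeightElements);
end
```

With the column layout the net needs 640 weight elements instead of 512000, which matches the 8*8*8000 figure reported in the question.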

Related

efficiently use the memory of GPU in matlab

I am using the GPU for computation in MATLAB, and I keep getting out-of-memory errors.
So I thought I could convert some of my variables from double, which is the default type in MATLAB, to single. I then ran the following experiment:
A = gpuArray([1,2,3])
A =
1 2 3
whos A
Name Size Bytes Class
A 1*3 4 gpuArray
B = gpuArray(single([1,2,3]))
B =
1*3 gpuArray single row vector
1 2 3
whos B
Name Size Bytes Class
B 1*3 4 gpuArray
Now I am a little bit confused. On one hand, it does show me that B is a 1*3 gpuArray single row vector. On the other hand, the whos command shows no difference between A and B.
I am wondering whether this double-to-single conversion will indeed help me reduce the memory usage of my GPU in MATLAB. Basically, my question is: when I move 2 variables from the CPU to the GPU, one double and the other single, do they consume the same amount of GPU memory? The whos command shows no difference.
Note the following:
A = gpuArray([1:1000])
whos A
Name Size Bytes Class Attributes
A 1x1000 4 gpuArray
Interesting! Only 4 bytes!
But this has an easy explanation: whos only gives you the size of the variable in CPU RAM. It's 4 bytes because it's just a memory address, not the data itself. The data is on the GPU, and it cannot "easily" be accessed by the CPU.
Answering your question: yes, single will take half the memory of double on the GPU.
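As a rough host-side illustration of the element sizes involved (a sketch only, with the variable names a_double and a_single made up for the example), whos does report the real footprint for ordinary CPU arrays. The per-element sizes of 8 bytes for double and 4 bytes for single are the same on the GPU. On a machine with a supported GPU, the AvailableMemory property of gpuDevice before and after a transfer should show a corresponding difference, though that part is not exercised here.

```matlab
% Host-side comparison: the same 1000 values stored as double and as single.
n = 1000;
a_double = zeros(1, n);          % default class is double (8 bytes/element)
a_single = single(a_double);     % single precision (4 bytes/element)

info_d = whos('a_double');
info_s = whos('a_single');

fprintf('double: %d bytes\n', info_d.bytes);
fprintf('single: %d bytes\n', info_s.bytes);
fprintf('ratio:  %.1f\n', info_d.bytes / info_s.bytes);
```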

Interpreting time series dimension?

I am wondering if anyone can explain how to interpret the size (number of features) of a time series. For example, consider this simple script in MATLAB:
X= randn(2,5,2)
X(:,:,1) =
-0.5530 0.4291 0.3937 -1.2534 0.2811
-1.4926 -0.7019 -0.8305 -1.4034 1.9545
X(:,:,2) =
0.2004 0.1438 2.3655 -0.1589 0.7140
0.4905 0.2301 -0.7813 -0.6737 0.2552
Assume X is a time series with the output shown above.
This generates 2 vectors of length 5, each with 2 rows. Can anyone tell me exactly what the first 2 and the 5 mean?
Some websites describe it as creating 5 vectors of length 5 and size 2. What does size mean here?
Is 2 the number of features and 5 the number of time series? I am confused because I do not understand how to interpret the following sentence:
"Generate 2 vector-valued sequences of length 5; each vector has size
2."
What do size 2 and length 5 mean here?
This depends entirely on your data and how you want to store it. If you have some 2D data over time, I find it convenient to use a data array whose 1st and 2nd dimensions hold the 2D data for one time step and whose 3rd dimension is time.
Say I have a movie of 1920 by 1080 pixels with 100 frames: I'd store this as mov = rand(1080,1920,100) (1080 and 1920 are swapped because indexing is in row, column order). Now mov(:,:,1) gives me the first frame, and so on.
BTW, your X is a normal array, not to be confused with the timeseries object.
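A small sketch of this storage convention, using a tiny made-up "movie" instead of a 1080x1920x100 one so that it runs instantly. The sizes and variable names are illustrative assumptions only.

```matlab
% Tiny stand-in for a movie: rows x columns x time.
n_rows = 4;
n_cols = 6;
n_frames = 3;
mov = rand(n_rows, n_cols, n_frames);

% One full frame (a 2D slice at a fixed time step).
first_frame = mov(:, :, 1);

% One pixel followed over time (a 1D slice along the 3rd dimension).
pixel_over_time = squeeze(mov(2, 3, :));

disp(size(first_frame));        % rows and columns of one frame
disp(size(pixel_over_time));    % one value per frame

% The array from the question, read the same way: X(:,:,k) is a 2x5 slice.
X = randn(2, 5, 2);
disp(size(X(:, :, 1)));
```

Under this reading, the 2x5x2 array in the question is simply two 2x5 slices stacked along the third dimension. Which dimension counts as "features" and which as "time" is a convention you choose, not something the array itself encodes.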

Memory error in Matlab while solving a linear equation

I am getting an Out of Memory error while trying to solve a certain linear equation (I will put the code below). Since I am used to coding in C, where you have full control over the objects you create, I am wondering if I am using MATLAB inefficiently. Here is the relevant part of the code:
myData(n).AMatrix = sparse(fscanf(fid2, '%f', [2*M, 2*M]));
myData(n).AMatrix = transpose(myData(n).AMatrix);
%Read the covariance^2 matrix
myData(n).CovMatrix = sparse(fscanf(fid2, '%f', [2*M,2*M]));
myData(n).CovMatrix = reshape(myData(n).CovMatrix, [4*M*M,1]);
%Kronecker sum of A with itself
I=sparse(eye(2*M));
myData(n).AA=kron( I, myData(n).AMatrix)+kron( myData(n).AMatrix,I);
myData(n).AMatrix=[];
I=[];
%Solve (A+A)x = Vec(CovMatrix)
x=myData(n).CovMatrix\myData(n).AA;
Trying to use this code I get the error
Error using \
Out of memory. Type HELP MEMORY for your options.
Error in COV (line 62)
x=myData(n).CovMatrix\myData(n).AA;
Before this piece of code I only open some files (which contain two 100x100 arrays of floats), so I don't think they contribute to this error. The element AMatrix is a 100 x 100 array, so the linear equation in question has dimensions 10000 x 10000. Also, AA has a one-dimensional kernel; I don't know if this affects the numerical computations. Later I project the obtained solution onto the orthogonal complement of the kernel to get the "good" solution, but that comes after the error. For people who are familiar with it, this is just a solution to the Lyapunov equation AX + XA = Cov. The matrix A is sparse: it has four 50x50 sub-blocks, of which one is all zeros, one is the identity, one is diagonal, and one has fewer than 1000 non-zero elements. The matrix CovMatrix is diagonal with 50 non-zero elements on the diagonal.
The problem is that at the moment I can only do the calculations on a small personal computer with 2GB RAM and 2.5-6GB of virtual memory. When I run memory in MATLAB it gives
>> memory
Maximum possible array: 311 MB (3.256e+08 bytes) *
Memory available for all arrays: 930 MB (9.749e+08 bytes) **
Memory used by MATLAB: 677 MB (7.102e+08 bytes)
Physical Memory (RAM): 1931 MB (2.025e+09 bytes)
I am not very knowledgeable when it comes to memory, so I am open to even simple advice. Thanks.
Complex functions usually allocate temporary memory during computation. A 10000x10000 system is quite large if a temporary dense matrix of that size is allocated along the way: 10000 * 10000 doubles take 8e8 bytes (about 763 MB), which is more than the 311 MB "Maximum possible array" reported above. You could try a few smaller problem sizes to find the upper limit of your current computer.
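One more thing that may be worth checking, offered as a hedged sketch rather than a diagnosis. In MATLAB, A\b solves A*x = b, so the question's CovMatrix\AA asks for the solution of CovMatrix*x = AA, not of the system AA*x = vec(CovMatrix) stated in the code comment. The small example below assumes the intent is the latter and uses a made-up, nonsingular test matrix (the real AA has a kernel, which this sketch does not handle). It also builds the identity with speye, which avoids creating a dense eye first.

```matlab
% Small made-up problem: solve A*X + X*A = C through the Kronecker form.
M = 3;
n = 2 * M;

% Upper-triangular sparse test matrix with eigenvalues 1..n (an assumption,
% chosen so that the Kronecker sum below is nonsingular).
A = sparse(diag(1:n)) + sparse(1, n, 0.5, n, n);
C = speye(n);                     % right-hand side, kept sparse

% vec(A*X + X*A) = (kron(I, A) + kron(A.', I)) * vec(X)
I = speye(n);                     % sparse identity, never dense
AA = kron(I, A) + kron(A.', I);

% Coefficient matrix on the LEFT of the backslash, right-hand side on the right.
x = AA \ C(:);
X = reshape(x, n, n);

residual = norm(full(A * X + X * A - C), 'fro');
fprintf('nnz(AA) = %d of %d entries\n', nnz(AA), numel(AA));
fprintf('residual = %.2e\n', residual);
```

Whether this changes the memory behaviour on the asker's machine is not something the sketch can show; it only illustrates the operand order and keeping every factor sparse.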

Matlab out of memory error behaves differently in one and two dimensional arrays

Today I have the need to allocate a vector with size 100000 in Matlab. I try to do it simply using:
a=ones(100000);
which my Matlab angrily answered with:
Out of memory. Type HELP MEMORY for your options.
Which is strange, since I have 64-bit Matlab running on a 64-bit machine with 8 GB RAM. I tried many of the "resolving out of memory errors in Matlab" recipes on SO and other places, but no luck so far.
Now I'm more confused when something like:
a=ones(10000,10000);
runs without problem on my machine.
Does this mean that Matlab has some mechanism to limit the number of elements of a vector in a single-dimensional space?
"Today I have the need to allocate a vector with size 100000 in Matlab."
Now, as noted in the comments and such, the method you tried (a=ones(100000);) creates a 100000x100000 matrix, which is not what you want.
I would suggest you try:
a = ones(1, 100000);
Since that creates a vector rather than a matrix.
Arguments Matter
Calling Matlab's ones() or zeros() or magic() with a single argument n, creates a square matrix with size n-by-n:
>> a = ones(5)
a = 1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
Calling the same functions with 2 arguments (r, c) instead creates a matrix of size r-by-c:
>> a = ones(2, 5)
a = 1 1 1 1 1
1 1 1 1 1
This is all well documented in Matlab's documentation.
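A quick way to see the difference for yourself is to check the sizes directly (a small sketch; the variable names are just for illustration):

```matlab
% One argument: square matrix. Two arguments: rows-by-columns.
square_matrix = ones(5);          % 5x5
rect_matrix = ones(2, 5);         % 2x5
row_vector = ones(1, 100000);     % 1x100000, the vector the question wanted

disp(size(square_matrix));
disp(size(rect_matrix));
disp(size(row_vector));
disp(numel(row_vector));          % number of elements actually allocated
```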
Size Matters Too
Doubles
Having said this, when you do a = zeros(1e6) you are creating a square matrix with 1e6 * 1e6 = 1e12 elements. Since these are doubles, the total allocated size would be 8 * 1e12 bytes, which is circa (8 * 1e12) / 1024^3 = 7450.6GB. Do you have this much RAM on your machine? (With the 100000 from the question, the same calculation gives 8 * 1e10 bytes, about 74.5GB, which still does not fit in 8 GB.)
Compare this with a = zeros(1, 1e6), which creates a row vector of 1 * 1e6 = 1e6 elements, for a total allocated size of (8 * 1e6) / 1024^2 = 7.63MB.
Logicals
Logical values, on the other hand, are boolean values, which can be set to either 0 or 1, representing False or True. With this in mind, you can allocate matrices of logicals using either false() or true(). Here the same single-argument rule applies, hence a = false(1e6) creates a square matrix with 1e6 * 1e6 = 1e12 elements. Matlab, like many other programming languages, stores each boolean value in a single byte rather than a single bit. Even though there is a clear cost in terms of memory usage, this provides significant performance improvements, because accessing single bits is a slow operation.
The total allocated size of our a = false(1e6) matrix would therefore be 1e12 Bytes which is circa 1e12 / 1024^3 = 931.32GB.
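The arithmetic above can be reproduced without allocating anything large. This is a sketch with made-up variable names: the sizes are computed from element counts, and the bytes-per-element figures are checked on small arrays with whos.

```matlab
% Bytes per element, measured on small arrays so nothing big is allocated.
small_double = zeros(1, 1000);
small_logical = false(1, 1000);
info_double = whos('small_double');
info_logical = whos('small_logical');
bytes_per_double = info_double.bytes / 1000;     % 8
bytes_per_logical = info_logical.bytes / 1000;   % 1

n = 1e6;
gb_double_square = bytes_per_double * n^2 / 1024^3;    % zeros(1e6)
mb_double_row = bytes_per_double * n / 1024^2;         % zeros(1, 1e6)
gb_logical_square = bytes_per_logical * n^2 / 1024^3;  % false(1e6)

fprintf('zeros(1e6):    %.1f GB\n', gb_double_square);
fprintf('zeros(1, 1e6): %.2f MB\n', mb_double_row);
fprintf('false(1e6):    %.2f GB\n', gb_logical_square);
```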
Well, the first declaration tries to build a matrix of 1000000x1000000 ones. That would be ~931 GB.
The second tries to declare a matrix of 10000 x 10000. That would be ~95MB.
I assumed each one is stored in a byte. If they are stored as single-precision floats, then the requested memory size will be 4 times larger; ones() returns doubles by default, which makes it 8 times larger.

How to find values of the input data in plotsomhits

I have used the SOM toolbox in MATLAB on the iris data set. Following the example below and using plotsomhits, I can see how many data values are in each neuron of the SOM's neuron grid. However, I wish to know the actual data values that are grouped in every neuron of the given SOM configuration. Is there any way to do it? This is the example I used:
net = selforgmap([8 8]);
view(net)
[net,tr] = train(net,x);
nntraintool
plotsomhits(net,x)
It's not that hard. plotsomhits just plots "fancily" the results of the simulation of the net.
So if you simulate it and add up the "hits", you have the result!
Basically:
hits=sum(sim(net,x)');
For your net, these were my results, which coincide with the numbers in plotsomhits:
hits= 6 5 0 7 15 7 4 0 8 20 3 3 9 3 0 8 6 3 11 4 5 5 7 10 1
PS: you can learn a lot from this amazing SO answer:
MATLAB: help needed with Self-Organizing Map (SOM) clustering
You need to convert the output vectors to indices first; then you can see which input values a neuron corresponds to.
>> input_neuron_mapping = vec2ind(net(x))';
Now, look into the neuron's inputs.
For example, say you want to see the input values for neuron 2. Because each input vector is a column of x, select whole columns:
>> neuron_2_input_indices = find(input_neuron_mapping == 2)
>> neuron_2_input_values = x(:, neuron_2_input_indices)
It will display all the input values from your data that are mapped to that neuron.
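To collect the inputs of every neuron at once, the same idea can be looped over all neurons. The sketch below is toolbox-free: it uses a small made-up x and a hand-written input_neuron_mapping as a stand-in for vec2ind(net(x))', so the numbers are illustrative only.

```matlab
% Made-up data: 2 features x 5 samples, one sample per column.
x = [5.1 4.9 6.3 5.8 7.1;
     3.5 3.0 3.3 2.7 3.0];

% Stand-in for vec2ind(net(x))': the winning neuron of each sample.
input_neuron_mapping = [1 1 3 2 3]';
num_neurons = 4;                 % e.g. a 2x2 map (assumption for the sketch)

% Hit counts per neuron, the same numbers plotsomhits would draw.
hits = accumarray(input_neuron_mapping, 1, [num_neurons 1])';

% Group the actual input vectors by neuron.
members = cell(1, num_neurons);
for k = 1:num_neurons
    members{k} = x(:, input_neuron_mapping == k);
end

disp(hits);
disp(members{3});                % all samples mapped to neuron 3
```

A neuron with no hits ends up with an empty 2x0 entry in members, which keeps the cell array aligned with the neuron numbering.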
Read here for more details:
https://bioinformaticsreview.com/20220603/how-to-get-input-values-from-som-sample-hits-plot-in-matlab/