I have to calculate fourth-order cumulants with large matrices in my current research project.
The following is part of my code.
N=zeros((Kse*Ksf*Ksr)^2);
for k=1:Ke % Ke=3
for i=1:Kr % Kr=3
for j=1:Kf % Kf=1;
Mat=Xb(k:k+Kse-1,i:i+Ksr-1,j:j+Ksf-1); % size of Xb: (Ke+Kse-1)
% *(Kr+Ksr-1)*(Kf+Ksf-1)
Mat_vec=Mat(:);
snapshot_number=70;
Mat_kr=kron(Mat_vec,conj(Mat_vec));
N=N+Mat_kr*Mat_kr'/snapshot_number;
N=N-Mat_kr*Mat_kr'/snapshot_number.^2;
temp=Mat_vec*Mat_vec'/snapshot_number;
N=N-kron(temp,conj(temp));
end
end
end
Under the conditions Kse=Ksr=5 and Ksf=9, the running memory is about 200 GB. The size of N is 50625x50625, which alone is roughly 20 GB as real doubles (41 GB as complex doubles), and expressions like Mat_kr*Mat_kr' materialize further full-size temporaries on top of that. My available server has a maximum memory of 250 GB, so memory is the key limit and I cannot increase Kse, Ksr and Ksf any further. My target is to reach Kse=Ksr=Ksf=11.
So I have thought of MATLAB distributed computing.
First, I shrank the parameters to Kse=3, Ksr=5 and Ksf=8 just for testing, and modified the code to the following version:
myPool=parpool();
Xb_dis=distributed(Xb);
N_dis=distributed(zeros((Kse*Ksf*Ksr)^2));
for k=1:Ke % Ke=3
for i=1:Kr % Kr=3
for j=1:Kf % Kf=1;
Mat_dis=Xb_dis(k:k+Kse-1,i:i+Ksr-1,j:j+Ksf-1);
Mat_vec_dis=Mat_dis(:);
snapshot_number=70;
Mat_kr_dis=kron(Mat_vec_dis,conj(Mat_vec_dis));
N_dis=N_dis+Mat_kr_dis*Mat_kr_dis'/snapshot_number;
N_dis=N_dis-Mat_kr_dis*Mat_kr_dis'/snapshot_number.^2;
temp_dis=Mat_vec_dis*Mat_vec_dis'/snapshot_number;
N_dis=N_dis-kron(temp_dis,conj(temp_dis));
end
end
end
N=gather(N_dis);
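(One note on the construction above, going by the documented distributed-array constructors: distributed(zeros(n)) first builds the full n-by-n matrix in client memory and only then distributes it, so the client still sees the full allocation. A sketch of constructing the array directly on the workers instead, assuming the documented zeros(...,'distributed') form:)
n = (Kse*Ksf*Ksr)^2;
N_dis = zeros(n,'distributed'); % allocated directly across the workers, no client-side copy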
The experimental distributed computing cluster consists of two computers. I found that there is little data communication between the two computers during the triple loop. So I thought the calculation could be done at a smaller scale, in sequence, instead of loading the whole matrices into memory. In other words, the large-scale matrix calculation might be expressible as a combination of smaller ones. But after several days of searching the Internet, I still have no idea how; the kind of block-wise scheme I am imagining is sketched below. Does anybody have any advice or suggestions?
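A minimal sketch of that idea (assuming Xb is complex; the block width blk and the on-disk matfile are placeholders, not part of my real code). It uses the fact that column c of kron(A,B), for m-by-m A and B, equals kron(A(:,ja),B(:,jb)) with ja=floor((c-1)/m)+1 and jb=mod(c-1,m)+1, so N can be accumulated one column block at a time:
m = Kse*Ksr*Ksf;                      % length of Mat_vec
n = m^2;                              % side length of N
blk = 1024;                           % assumed column-block width
T = 70;                               % snapshot_number
mf = matfile('N_ondisk.mat','Writable',true);
mf.N(n,n) = complex(0);               % preallocate (complex) N on disk instead of in RAM
for c = 1:blk:n
    cols = c:min(c+blk-1,n);
    Nblk = zeros(n,numel(cols));
    for k = 1:Ke
        for i = 1:Kr
            for j = 1:Kf
                Mat = Xb(k:k+Kse-1,i:i+Ksr-1,j:j+Ksf-1);
                v = Mat(:);
                w = kron(v,conj(v));  % only a length-n vector, cheap to hold
                Nblk = Nblk + (1/T - 1/T^2)*(w*w(cols)');
                temp = v*v'/T;
                ja = floor((cols-1)/m)+1;
                jb = mod(cols-1,m)+1;
                for p = 1:numel(cols) % columns of kron(temp,conj(temp)), formed one at a time
                    Nblk(:,p) = Nblk(:,p) - kron(temp(:,ja(p)),conj(temp(:,jb(p))));
                end
            end
        end
    end
    mf.N(1:n,cols) = Nblk;            % flush the finished block to disk
end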
Your matrices seem to have a lot of zeros. Have you tried using sparse matrices? They help a lot when you want to save memory.
This is a cross-post from here:
Link to post in the Mathworks community
Currently I'm working with large data sets; I've saved those data sets as MATLAB files, with the two biggest files being 9.5 GB and 5.9 GB.
These files each contain a 1x8 cell array (this is done for addressability and to prevent mixing up data from the 8 cells; I specifically wanted to avoid eval).
Each cell contains a 3D double matrix; for one file it's 1001x2002x201, and for the other it's 2003x1001x201 (when processing it I chop off 1 row at the end to get it to 2002).
I'm already running my script and processing the data on a server (64 cores and plenty of RAM; MATLAB crashed on my laptop, as I need more than 12 GB of RAM on Windows). Nonetheless it still takes several hours to finish running my script, and I still need to do some extra operations on the data, which is why I'm asking for advice.
For some of the large cell arrays, I need to find the maximum value over the entire set of all 8 cells. Normally I would run a for loop to get the maximum of each cell, store each value in a temporary numeric array, and then apply max again. This will work for sure; I'm just wondering if there's a better, more efficient way.
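To make this concrete, the loop version condenses to something like this sketch (C stands for whichever 1x8 cell array is being reduced; note that max ignores NaN entries by default):
maxvaluefound = max(cellfun(@(c) max(c(:)), C)); % global maximum over all 8 cells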
After I find the maximum I need to do a manipulation over all this data as well, normally I would do something like this for an array:
B=A./maxvaluefound;
A(B > a) = A(B > a)*constant;
Now I could put this in a for loop, address each cell, and run this; however, I'm not sure how efficient that would be. Do you think there's a better way than a for loop that's not extremely complicated/difficult to implement?
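What I have in mind for the loop is something like this sketch (assuming A is now the 1x8 cell array and a and constant are scalars); with only 8 cells the loop overhead should be negligible next to the element-wise work:
for ee = 1:8
    B = A{ee}./maxvaluefound;           % normalize by the global maximum
    mask = B > a;                       % logical index of the values to scale
    A{ee}(mask) = A{ee}(mask)*constant; % scale only the selected elements
end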
There's one more thing I need to do which is really important. Each cell, as I said before, is a slice (consider it time), while inside each slice are the values for a 3D matrix/plot. Now I need to interpolate the data so that I get more slices; the reason is that I need to create slices/frames/plots for a movie/GIF. I'm planning on plotting the 3D data using scatter3, where the data is represented by color, and I plan on using alpha values to make it see-through, so that one can actually see the intensity in this 3D plot. I understand how to use griddata, but apparently it's quite slow, and some of the other methods were hard to understand. So what would be the best way to interpolate these (time) slices efficiently over the different cells in the cell array? Please explain it if you can, preferably with an example.
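Something along these lines is what I have in mind for the time interpolation (a sketch only; tOld, tNew and the ny-by-nx-by-nt shape per cell are placeholders), since interp1 can process every pixel at once if time is moved to the first dimension:
tOld = 1:nt;                     % existing slice times (placeholder)
tNew = linspace(1,nt,4*nt);      % e.g. four times as many slices (placeholder)
interped = cell(1,8);
for ee = 1:8
    V  = permute(E_field_650nm_intAll{ee},[3 1 2]);  % move time to dimension 1
    Vi = interp1(tOld,V(:,:),tNew,'linear');         % interpolate every pixel column at once
    interped{ee} = permute(reshape(Vi,[numel(tNew),size(V,2),size(V,3)]),[2 3 1]);
end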
I've added a picture of the Linux server I'm running it on below; note that I unfortunately cannot update the MATLAB version, it's R2016a:
I've also attached part of my code to give a better idea of what I'm doing:
if (or(L03==2,L04==2)) % check if this section needs to be executed based on parameters set at top of file
load('../loadfilewithpathnameonmypc.mat')
E_field_650nm_intAll=cell(1,8); %create empty cell array
parfor ee=1:8 %run for loop for cell array, changed this to a parfor to increase speed by approximately 8x
E_field_650nm_intAll{ee}=nan(szxit(1),szxit(2),xres); %create nan-filled matrix in cell 1-8
for qq=1:2:xres
tt=(qq+1)/2; %consecutive number instead of spacing 2
T1=griddata(Xsall{ee},Ysall{ee},EfieldsAll{ee}(:,:,qq)',XIT,ZIT,'natural'); %change data on non-uniform grid to uniform gridded data
E_field_650nm_intAll{ee}(:,:,tt)=T1; %fill up each cell with uniform data
end
end
clear T1
clear qq tt
clear ee
save('../savelargefile.mat', 'E_field_650nm_intAll', '-v7.3')
end
if (L05==2) % check if this section needs to be executed based on parameters set at top of file
if ~exist('E_field_650nm_intAll','var') % if variable not in workspace load it
load('../loadanotherfilewithpathnameonmypc.mat');
end
parfor tt=1:8 %run for loop for cell array, changed this to a parfor to increase speed by approximately 8x
CFxLight{tt}=nan(szxit(1),szxit(2),xres); %create nan-filled matrix in cells 1 to 8
for qq=1:xres
CFs=Cafluo3D{tt}(1:lxq2,:,qq)'; %get matrix slice and transpose matrix for point-wise multiplication
CFxLight{tt}(:,:,qq)=CFs.*E_field_650nm_intAll{tt}(:,:,qq); %point-wise multiple the two large matrices for each cell and put in new cell array
end
end
clear CFs
clear qq tt
save('../saveanotherlargefile.mat', 'CFxLight', '-v7.3')
end
I am working on an algorithm (in MATLAB) that needs to find the Hu moments of overlapping blocks in an image. I am converting the image into a column matrix (im2col(...,'sliding')) and then calculating the Hu moments for each column individually. For a 512x512 image, my system takes 14-15 minutes. The code is given below:
d=im2col(A,[m n],'sliding');
[mm nn]=size(d);
for j=1:nn
d_temp=d(:,j);
d_pass_temp=col2im(d_temp,[m n], [m n], 'distinct');
[mn_t vr_t]=new_hu_moment(d_pass_temp);
[mn]=[mn mn_t];
[vr]=[vr vr_t];
end
new_hu_moment is my own function; it returns the mean and variance of the Hu moments for the respective block.
My system configuration is an i3 processor with 6 GB of RAM.
Please suggest how to improve the performance of this code.
Is there any function in MATLAB that can calculate the 7 Hu moments for overlapping blocks?
1. Preallocate mn and vr before the loop, something like:
% I suppose that new_hu_moment returns row of known length nvals
mn=zeros(nn,nvals);
vr=zeros(nn,nvals);
for k=1:nn
...
[mn(k,:) vr(k,:)]=new_hu_moment(d_pass_temp);
...
end
I strongly recommend not using i and j as variable names, as they are synonyms for the imaginary unit; in older MATLAB versions this can cause subtle errors.
2. If you have enough memory and the for-loop processes independent data chunks, use parfor or cellfun/arrayfun (see the sketch after this list).
3. If the code is still not as fast as you want it to be, use the profiler to find the performance bottleneck.
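For instance, a hedged sketch combining points 1 and 2 for the loop in the question (it assumes new_hu_moment returns two 1-by-nvals rows, as above, and that the blocks are [m n] as in the im2col call):
mn = zeros(nn,nvals);
vr = zeros(nn,nvals);
parfor k = 1:nn                  % iterations are independent, so parfor applies
    block = col2im(d(:,k),[m n],[m n],'distinct');  % d is sliced by column
    [mn(k,:),vr(k,:)] = new_hu_moment(block);
end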
X, Y and z are coordinates representing a surface. In order to calculate some quantity at point (i,j) of the surface, let's call it flow, I need to calculate the contribution from all other points (i0,j0). To do so I need, for example, the cosine of the angle between point (i0,j0) and every other point (alpha). All contributions from (i0,j0) must then be multiplied by some constants and added; zv0 at every point (i,j) is the final result I need.
I came up with the code below, and it seems completely inadequate. First of all, it slows down the rest of the program and seems to use all of the available memory: my system has 4 GB of physical memory and a 12 GB swap file, yet it always runs out of memory, even though none of the variables is bigger than 10 KB. Please help with speeding up/vectorizing the code and with the memory problems.
parfor i0=2:length(x00)
for j0=2:length(y00)
zv=red3dfunc(X0,Y0,f,z0,i0,j0,st,ang,nx,ny,nz);
zv0=zv0+zv;
end
end
function[X,Y,z,zv]=red3dfunc(X,Y,f,z,i0,j0,st,ang,Nx,Ny,Nz)
x1=X(i0,j0);
y1=Y(i0,j0);
z1=z(i0,j0);
alpha=zeros(size(X));
betha=zeros(size(X));
r=zeros(size(X));
XXa=X-x1;
YYa=Y-y1;
ZZa=z-z1;
VEC=((XXa).^2+(YYa).^2+(ZZa).^2).^(1/2);
VEC(i0,j0)=VEC(i0-1,j0-1); % the distance from the point to itself is zero; reuse a neighbour to avoid dividing by zero
XXa=XXa./VEC;
YYa=YYa./VEC;
ZZa=ZZa./VEC;
alpha=-(Nx(i0,j0).*XXa+Ny(i0,j0).*YYa+Nz(i0,j0).*ZZa);
betha=Nx.*XXa+Ny.*YYa+Nz.*ZZa;
r=VEC;
zv=(1/pi)*st^2*ang.*f.*(alpha).*betha./r.^2;
The obvious way to do this is to use the Kronecker product. The MATLAB function is kron(A,B) for matrices of dimensions nA-by-mA and nB-by-mB. It returns a matrix of dimension (nA*nB)-by-(mA*mB), which looks something like
[a11*B a12*B ... a1mA*B;
.......................;
anA1*B ........ anAmA*B]
So your problem may be solved by introducing the matrix of ones, I = ones(size(X)). You can then define your XXa, YYa, ZZa and VEC matrices without any loop as
XXa = kron(I,X)-kron(X,I);
YYa = kron(I,Y)-kron(Y,I);
ZZa = kron(I,Z)-kron(Z,I);
VEC=((XXa).^2+(YYa).^2+(ZZa).^2).^(1/2);
You will then find VEC for any (i0,j0) as (if you define n and m as the size components of X)
VEC((1+n*(i0-1)):(n*i0),(1+m*(j0-1)):(m*j0))
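As a quick sanity check of this layout on a tiny example (purely illustrative), the (i0,j0) block of XXa should contain X - X(i0,j0), i.e. the differences of every point relative to (i0,j0):
n = 3; m = 4;                          % small test sizes (assumption)
X = rand(n,m); I = ones(n,m);
XXa = kron(I,X) - kron(X,I);
i0 = 2; j0 = 3;
blk = XXa((1+n*(i0-1)):(n*i0),(1+m*(j0-1)):(m*j0));
max(max(abs(blk - (X - X(i0,j0)))))    % should be 0 up to round-off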
I need to use a 4-dimensional matrix as an accumulator for voting on 4 parameters. Every parameter varies in the range 1~300. For that, I define Acc = zeros(300,300,300,300) in MATLAB, and somewhere, for example, I use:
Acc(4,10,120,78)=Acc(4,10,120,78)+1
However, MATLAB reports an error because of the memory limitation:
??? Error using ==> zeros
Out of memory. Type HELP MEMORY for your options.
Below, you can see part of my code:
I = imread('image.bmp'); %I is logical 300x300 image.
Acc = zeros(100,100,100,100);
for i = 1:300
for j = 1:300
if I(i,j)==1
for x0 = 3:3:300
for y0 = 3:3:300
for a = 3:3:300
b = abs(j-y0)/sqrt(1-((i-x0)^2) / (a^2));
b1=floor(b/3);
if b1==0
b1=1;
end
a1=ceil(a/3);
Acc(x0/3,y0/3,a1,b1) = Acc(x0/3,y0/3,a1,b1)+1;
end
end
end
end
end
end
As @Rasman mentioned, you probably want to use a sparse representation of the matrix Acc.
Unfortunately, the sparse function is geared toward 2D matrices, not arbitrary n-D.
But that's ok, because we can take advantage of sub2ind and linear indexing to go back and forth to 4D.
Dims = [300, 300, 300, 300]; % it will be a 300 by 300 by 300 by 300 matrix
Acc = sparse([], [], [], prod(Dims), 1, ExpectedNumElts);
Here ExpectedNumElts should be some number like 30 or 9000 or however many non-zero elements you expect for the matrix Acc to have. We notionally think of Acc as a matrix, but actually it will be a vector. But that's okay, we can use sub2ind to convert 4D coordinates into linear indices into the vector:
ind = sub2ind(Dims, 4, 10, 120, 78);
Acc(ind) = Acc(ind) + 1;
You may also find the functions find, nnz, spy, and spfun helpful.
edit: see lambdageek for the exact same answer with a bit more elegance.
The other answers are guiding you to use a sparse matrix instead of your current dense solution. This is made a little more difficult because current MATLAB doesn't support N-dimensional sparse arrays. One way to implement it is to
replace
zeros(100,100,100,100)
with
sparse(100*100*100*100,1)
this will store all your counts in a sparse array; as long as most entries remain zero, you will be fine for memory.
Then, to access this data, instead of:
Acc(h,i,j,k)=Acc(h,i,j,k)+1
use:
index = h + 100*(i-1) + 100^2*(j-1) + 100^3*(k-1); % the same mapping sub2ind computes
Acc(index,1)=Acc(index,1)+1
See Avoiding 'Out of Memory' Errors
Your statement would require about 65 GB of RAM (300^4 elements at 8 bytes per double, to be specific).
Solutions to 'Out of Memory' problems fall into two main categories:
1. Maximizing the memory available to MATLAB (i.e., removing or increasing limits) on your system via operating system selection and system configuration. These usually have the greatest overall applicability but are potentially the most disruptive (e.g. using a different operating system). These techniques are covered in the first two sections of this document.
2. Minimizing the memory used by MATLAB by making your code more memory efficient. These are all algorithm and application specific and therefore are less broadly applicable. These techniques are covered in later sections of this document.
In your case the latter seems to be the solution: try reducing the amount of memory used/required.
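One concrete way to minimize the memory used, assuming your vote counts never exceed 65535 (an assumption about your data): allocate the accumulator as uint16 instead of double, which cuts the footprint by a factor of four. Keep in mind that MATLAB integer arithmetic saturates at the type's maximum rather than wrapping.
Acc = zeros(100,100,100,100,'uint16'); % 10^8 elements x 2 bytes ~ 200 MB (vs ~800 MB as double)
% the voting line works unchanged:
Acc(x0/3,y0/3,a1,b1) = Acc(x0/3,y0/3,a1,b1) + 1;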
I have approximately 5,000 matrices with the same number of rows and varying numbers of columns (20 x ~200). Each of these matrices must be compared against every other in a dynamic programming algorithm.
In this question, I asked how to perform the comparison quickly and was given an excellent answer involving a 2D convolution. Serially, iteratively applying that method, like so
list = who('data_matrix_prefix*')
H = cell(numel(list),numel(list));
for i=1:numel(list)
for j=1:numel(list)
if i ~= j
eval([ 'H{i,j} = compare(' char(list(i)) ',' char(list(j)) ');']);
end
end
end
is fast for small subsets of the data (e.g. for 9 matrices, 9*9 - 9 = 72 calls are made in ~1 s, 870 calls in ~2.5 s).
However, operating on all the data requires almost 25 million calls.
I have also tried using deal() to make a cell array composed entirely of the next element in data, so I could use cellfun() in a single loop:
% who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
nextData = cell(k,1);
for i=1:k
[nextData{:}] = deal(data{i});
H(:,i) = cellfun(@compare,data,nextData,'UniformOutput',false);
end
Unfortunately, this is not really any faster, because all the time is in compare(). Both of these code examples seem ill-suited for parallelization. I'm having trouble figuring out how to make my variables sliced.
compare() is totally vectorized; it uses matrix multiplication and conv2() exclusively (I am under the impression that all of these operations, including the cellfun(), should be multithreaded in MATLAB?).
Does anyone see an (explicitly) parallelized solution or a better vectorization of the problem?
Note
I realize both my examples are inefficient - the first would be twice as fast if it calculated only a triangular cell array, and the second still calculates the self-comparisons as well. But the time savings for a good parallelization are more like a factor of 16 (or 72 if I install MATLAB on everyone's machines).
Aside
There is also a memory issue. I used a couple of evals to append each column of H into a file, with names like H1, H2, etc. and then clear Hi. Unfortunately, the saves are very slow...
Does compare(a,b) == compare(b,a), and does compare(a,a) == 1?
If so, change your loop
for i=1:numel(list)
for j=1:numel(list)
...
end
end
to
for i=1:numel(list)
for j= i+1 : numel(list)
...
end
end
and deal with the symmetry and identity case. This will cut your calculation time by half.
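A minimal sketch of the symmetric version (assuming the matrices have first been gathered into a cell array, here called dataList, which also avoids the eval):
H = cell(numel(dataList));
for i = 1:numel(dataList)
    H{i,i} = 1;                              % identity case, if compare(a,a)==1
    for j = i+1:numel(dataList)
        H{i,j} = compare(dataList{i},dataList{j});
        H{j,i} = H{i,j};                     % mirror instead of recomputing
    end
end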
The second example can easily be sliced for use with the Parallel Computing Toolbox. This toolbox distributes iterations of your code among up to 8 local workers. If you want to run the code on a cluster, you also need the MATLAB Distributed Computing Server.
% who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
parfor i=1:k-1 % this will run the loop in parallel with the Parallel Computing Toolbox
% only make the necessary comparisons
H(i+1:k,i) = cellfun(@compare,data(i+1:k),repmat(data(i),k-i,1),'UniformOutput',false);
% if the above doesn't work, try this
hSlice = cell(k,1);
hSlice(i+1:k) = cellfun(@compare,data(i+1:k),repmat(data(i),k-i,1),'UniformOutput',false);
H(:,i) = hSlice;
end
If I understand correctly, you have to perform 5000^2 matrix comparisons? Rather than trying to parallelise the compare function, perhaps you should think of your problem as being composed of 5000^2 tasks. The MATLAB Parallel Computing Toolbox supports task-based parallelism. Unfortunately my experience with PCT is with the parallelisation of large linear-algebra-type problems, so I can't tell you much more than that. The documentation will undoubtedly help you more.
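For illustration, a rough sketch of the task-based approach using the jobs API as it existed in older PCT releases (findResource/createJob/createTask etc.; treat the exact calls as an assumption to check against your PCT version, and data/k are from your second example):
sched = findResource('scheduler','type','local');
job = createJob(sched);
for i = 1:k-1
    for j = i+1:k                          % one task per matrix pair
        createTask(job,@compare,1,{data{i},data{j}});
    end
end
submit(job);
waitForState(job,'finished');
out = getAllOutputArguments(job);          % one row per task, in creation order
destroy(job);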