Matlab for loop vectorization and memory - matlab

X,Y and z are coordinates representing surface. In order to calculate some quantity, lets call it flow, at point i,j of the surface, i need to calculate contibution from all other points (i0,j0). To do so i need for example to know cos of angles between point i0,j0 and all other points (alpha). Then all contirbutions from i0,j0 must be multiplied on some constants and added. zv0 at every point i,j is final needed result.
I came up with some code written below and it seems to be extremely unappropriate. First of all it slows down rest of the program and seems to use all of the available memory. My system has 4gb physical memory and 12gb swap file and it always runs out of memory, though all of variables sizes are not bigger then 10kb. Please help up with speed up/vectorization and memory problems.
parfor i0=2:1:length(x00);
for j0=2:1:length(y00);
zv=red3dfunc(X0,Y0,f,z0,i0,j0,st,ang,nx,ny,nz);
zv0=zv0+zv;
end
end
function[X,Y,z,zv]=red3dfunc(X,Y,f,z,i0,j0,st,ang,Nx,Ny,Nz)
x1=X(i0,j0);
y1=Y(i0,j0);
z1=z(i0,j0);
alpha=zeros(size(X));
betha=zeros(size(X));
r=zeros(size(X));
XXa=X-x1;
YYa=Y-y1;
ZZa=z-z1;
VEC=((XXa).^2+(YYa).^2+(ZZa).^2).^(1/2);
VEC(i0,j0)=VEC(i0-1,j0-1);
XXa=XXa./VEC;
YYa=YYa./VEC;
ZZa=ZZa./VEC;
alpha=-(Nx(i0,j0).*XXa+Ny(i0,j0).*YYa+Nz(i0,j0).*ZZa);
betha=Nx.*XXa+Ny.*YYa+Nz.*ZZb;
r=VEC;
zv=(1/pi)*st^2*ang.*f.*(alpha).*betha./r.^2;

The obvious thing to do this is to use Kroneker product. The matlab function is kron(A,B) for matricies of dimensions nAxmA and nBxmB. This function will return matrix of dimension (nA*nB)x(mA*mB), which will look something like
[a11*B a12*B ... a1mA*B;
.......................;
anA1*B ........ anAmA*B]
So your problem may be solved by introducing the matrix of ones I=ones(size(X)). You will then define your XXa, YYa, ZZa and VEC matricies without any loop as
XXa = kron(I,X)-kron(X,I);
YYa = kron(I,Y)-kron(Y,I);
ZZa = kron(I,Z)-kron(Z,I);
VEC=((XXa).^2+(YYa).^2+(ZZa).^2).^(1/2);
You will then find VEC for any i0,j0 as (if you define n and m as size components of X)
VEC((1+n*(i0-1)):(n*i0),(1+m*(j0-1)):(m*j0))

Related

Performance issues in finding Hu Moments of Overlapping Blocks in image

I am working on an algorithm (in MATLAB) that needs to find Hu-Moments of overlapping blocks in an image. I am converting image in column matrix (im2col(...,'sliding')) and then calculating Hu-Moments for each column individually. For calculating the Hu Moments for the blocks of an Image of 512X512 my system is taking 14-15 minutes. Code is as given below:
d=im2col(A,[m n],'sliding');
[mm nn]=size(d);
for j=1:nn
d_temp=d(:,j);
d_pass_temp=col2im(d_temp,[n n], [n n], 'distinct');
[mn_t vr_t]=new_hu_moment(d_pass_temp);
[mn]=[mn mn_t];
[vr]=[vr vr_t];
end
'new_hu_moment' is my own made function returning mean and variance of hu moments for the respective block.
My sys configuration is I3 processor with 6GB RAM.
please suggest for performance up-gradation of this code.
is there any function in matlab that can calculate 7 Hu moments for overlapping blocks.
1.preallocate mn and vr before the loop, something like
% I suppose that new_hu_moment returns row of known length nvals
mn=zeros(nn,nvals);
vr=zeros(nn,nvals);
for k=1:nn,
...
[mn(k,:) vr(k,:)]=new_hu_moment(d_pass_temp);
...
end
I strongly recomend not using i and j as variable names as they are synonyms for imaginary unit. In older MATLAB it can cause errors
2.If you have enough memory and for-loop processes independent data chunks, use parfor or cellfun/arrayfun
3.If code is not as fast you want it to be, use profiler to find the performance bottleneck

Limiting large array to 1D in MATLAB

I'm working with XShooter data and for galactic corrections, I'm using ccm_unred in MATLAB. The problem is
funred = flux*10.^(0.4*A_lambda);
this line of code generates a 29686 X 29686 double array. I want only one side of it, I can do it by reassigning funred as funred = funred(:,1) but this piece of code also takes 57 seconds to be executed and uses up my CPU and RAM too much for my laptop to stay stable. Is there any method by which I can limit the generation of funred to only (:,1) from the beginning?
You say that your code generates a 29686 X 29686 matrix, however you are doing element-wise operations in your equation. That means that either flux or A_lambda bust be 29686 X 29686. Just slice the ones that are that size!
Assuming one of them is 29686 X 29686
funred = flux(:,1)*10.^(0.4*A_lambda(:,1));
Just remove the (:,1) of the one that is is not a matrix.
If both of them are a matices, then you can not do it, as flux*... would need the whole matrix to operate.

Diffusion outer bounds

I'm attempting to run this simple diffusion case (I understand that it isn't ideal generally), and I'm doing fine with getting the inside of the solid, but need some help with the outer edges.
global M
size=100
M=zeros(size,size);
M(25,25)=50;
for diffusive_steps=1:500
oldM=M;
newM=zeros(size,size);
for i=2:size-1;
for j=2:size-1;
%we're considering the ij-th pixel
pixel_conc=oldM(i,j);
newM(i,j+1)=newM(i,j+1)+pixel_conc/4;
newM(i,j-1)=newM(i,j-1)+pixel_conc/4;
newM(i+1,j)=newM(i+1,j)+pixel_conc/4;
newM(i-1,j)=newM(i-1,j)+pixel_conc/4;
end
end
M=newM;
end
It's a pretty simple piece of code, and I know that. I'm not very good at using Octave yet (chemist by trade), so I'd appreciate any help!
If you have concerns about the border of your simulation you could pad your matrix with NaN values, and then remove the border after the simulation has completed. NaN stands for not a number and is often used to denote blank data. There are many MATLAB functions work in a useful way with these values.
e.g. finding the mean of an array which has blanks:
nanmean([0 nan 5 nan 10])
ans =
5
In your case, I would start by adding a border of NaNs to your M matrix. I'm using 'n' instead of 'size', since size is an important function in MATLAB, and using it as a variable can lead to confusing errors.
n=100;
blankM=zeros(n+2,n+2);
blankM([1,end],:) = nan;
blankM(:, [1,end]) = nan;
Now we can define 'M'. N.B that the first column and row will be NaNs so we need to add an offset (25+1):
M = blankM;
M(26,26)=50;
Run the simulation through,
m = size(blankM, 1);
n = size(blankM, 2);
for diffusive_steps=1:500
oldM = M;
newM = blankM;
for i=2:m-1;
for j=2:n-1;
pixel_conc=oldM(i,j);
newM(i,j+1)=newM(i,j+1)+pixel_conc/4;
newM(i,j-1)=newM(i,j-1)+pixel_conc/4;
newM(i+1,j)=newM(i+1,j)+pixel_conc/4;
newM(i-1,j)=newM(i-1,j)+pixel_conc/4;
end
end
M=newM;
end
and then extract the area of interest
finalResult = M(2:end-1, 2:end-1);
One simple change you might make is to add a boundary of ghost cells, or halo, around the domain of interest. Rather than mis-use the name size I've used a variable called sz. Replace:
M=zeros(sz,sz)
with
M=zeros(sz+2,sz+2)
and then compute your diffusion over the interior of this augmented matrix, ie over cells (2:sz+1,2:sz+1). When it comes to considering the results, discard or just ignore the halo.
Even simpler would be to simply take what you already have and ignore the cells in your existing matrix which are on the N,S,E,W edges.
This technique is widely used in problems such as, and similar to, yours and avoids the need to write code which deals with the computations on cells which don't have a full complement of neighbours. Setting the appropriate value for the contents of the halo cells is a problem-dependent matter, 0 isn't always the right value.

4 dimensional matrix

I need to use 4 dimensional matrix as an accumulator for voting 4 parameters. every parameters vary in the range of 1~300. for that, I define Acc = zeros(300,300,300,300) in MATLAB. and somewhere for example, I used:
Acc(4,10,120,78)=Acc(4,10,120,78)+1
however, MATLAB says some error happened because of memory limitation.
??? Error using ==> zeros
Out of memory. Type HELP MEMORY for your options.
in the below, you can see a part of my code:
I = imread('image.bmp'); %I is logical 300x300 image.
Acc = zeros(100,100,100,100);
for i = 1:300
for j = 1:300
if I(i,j)==1
for x0 = 3:3:300
for y0 = 3:3:300
for a = 3:3:300
b = abs(j-y0)/sqrt(1-((i-x0)^2) / (a^2));
b1=floor(b/3);
if b1==0
b1=1;
end
a1=ceil(a/3);
Acc(x0/3,y0/3,a1,b1) = Acc(x0/3,y0/3,a1,b1)+1;
end
end
end
end
end
end
As #Rasman mentioned, you probably want to use a sparse representation of the matrix Acc.
Unfortunately, the sparse function is geared toward 2D matrices, not arbitrary n-D.
But that's ok, because we can take advantage of sub2ind and linear indexing to go back and forth to 4D.
Dims = [300, 300, 300, 300]; % it will be a 300 by 300 by 300 by 300 matrix
Acc = sparse([], [], [], prod(Dims), 1, ExpectedNumElts);
Here ExpectedNumElts should be some number like 30 or 9000 or however many non-zero elements you expect for the matrix Acc to have. We notionally think of Acc as a matrix, but actually it will be a vector. But that's okay, we can use sub2ind to convert 4D coordinates into linear indices into the vector:
ind = sub2ind(Dims, 4, 10, 120, 78);
Acc(ind) = Acc(ind) + 1;
You may also find the functions find, nnz, spy, and spfun helpful.
edit: see lambdageek for the exact same answer with a bit more elegance.
The other answers are helping to guide you to use a sparse mat instead of your current dense solution. This is made a little more difficult since current matlab doesn't support N-dimensional sparse arrays. One implementation to do this is
replace
zeros(100,100,100,100)
with
sparse(100*100*100*100,1)
this will store all your counts in a sparse array, as long as most remain zero, you will be ok for memory.
then to access this data, instead of:
Acc(h,i,j,k)=Acc(h,i,j,k)+1
use:
index = h+100*i+100*100*j+100*100*100*k
Acc(index,1)=Acc(index,1)+1
See Avoiding 'Out of Memory' Errors
Your statement would require more than 4 GB of RAM (Around 16 Gigs, to be specific).
Solutions to 'Out of Memory' problems
fall into two main categories:
Maximizing the memory available to
MATLAB (i.e., removing or increasing
limits) on your system via operating
system selection and system
configuration. These usually have the
greatest overall applicability but are
potentially the most disruptive (e.g.
using a different operating system).
These techniques are covered in the
first two sections of this document.
Minimizing the memory used by MATLAB
by making your code more memory
efficient. These are all algorithm
and application specific and therefore
are less broadly applicable. These
techniques are covered in later
sections of this document.
In your case later seems to be the solution - try reducing the amount of memory used / required.

Parallelize or vectorize all-against-all operation on a large number of matrices?

I have approximately 5,000 matrices with the same number of rows and varying numbers of columns (20 x ~200). Each of these matrices must be compared against every other in a dynamic programming algorithm.
In this question, I asked how to perform the comparison quickly and was given an excellent answer involving a 2D convolution. Serially, iteratively applying that method, like so
list = who('data_matrix_prefix*')
H = cell(numel(list),numel(list));
for i=1:numel(list)
for j=1:numel(list)
if i ~= j
eval([ 'H{i,j} = compare(' char(list(i)) ',' char(list(j)) ');']);
end
end
end
is fast for small subsets of the data (e.g. for 9 matrices, 9*9 - 9 = 72 calls are made in ~1 s, 870 calls in ~2.5 s).
However, operating on all the data requires almost 25 million calls.
I have also tried using deal() to make a cell array composed entirely of the next element in data, so I could use cellfun() in a single loop:
# who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
nextData = cell(k,1);
for i=1:k
[nextData{:}] = deal(data{i});
H{:,i} = cellfun(#compare,data,nextData,'UniformOutput',false);
end
Unfortunately, this is not really any faster, because all the time is in compare(). Both of these code examples seem ill-suited for parallelization. I'm having trouble figuring out how to make my variables sliced.
compare() is totally vectorized; it uses matrix multiplication and conv2() exclusively (I am under the impression that all of these operations, including the cellfun(), should be multithreaded in MATLAB?).
Does anyone see a (explicitly) parallelized solution or better vectorization of the problem?
Note
I realize both my examples are inefficient - the first would be twice as fast if it calculated a triangular cell array, and the second is still calculating the self comparisons, as well. But the time savings for a good parallelization are more like a factor of 16 (or 72 if I install MATLAB on everyone's machines).
Aside
There is also a memory issue. I used a couple of evals to append each column of H into a file, with names like H1, H2, etc. and then clear Hi. Unfortunately, the saves are very slow...
Does
compare(a,b) == compare(b,a)
and
compare(a,a) == 1
If so, change your loop
for i=1:numel(list)
for j=1:numel(list)
...
end
end
to
for i=1:numel(list)
for j= i+1 : numel(list)
...
end
end
and deal with the symmetry and identity case. This will cut your calculation time by half.
The second example can be easily sliced for use with the Parallel Processing Toolbox. This toolbox distributes iterations of your code among up to 8 different local processors. If you want to run the code on a cluster, you also need the Distributed Computing Toolbox.
%# who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
parfor i=1:k-1 %# this will run the loop in parallel with the parallel processing toolbox
%# only make the necessary comparisons
H{i+1:k,i} = cellfun(#compare,data(i+1:k),repmat(data(i),k-i,1),'UniformOutput',false);
%# if the above doesn't work, try this
hSlice = cell(k,1);
hSlice{i+1:k} = cellfun(#compare,data(i+1:k),repmat(data(i),k-i,1),'UniformOutput',false);
H{:,i} = hSlice;
end
If I understand correctly you have to perform 5000^2 matrix comparisons ? Rather than try to parallelise the compare function, perhaps you should think of your problem being composed of 5000^2 tasks ? The Matlab Parallel Compute Toolbox supports task-based parallelism. Unfortunately my experience with PCT is with parallelisation of large linear algebra type problems so I can't really tell you much more than that. The documentation will undoubtedly help you more.