Performance issues in finding Hu Moments of Overlapping Blocks in image - matlab

I am working on an algorithm (in MATLAB) that needs to find the Hu moments of overlapping blocks in an image. I convert the image into a column matrix (im2col(...,'sliding')) and then calculate the Hu moments for each column individually. For a 512x512 image, calculating the Hu moments of all blocks takes my system 14-15 minutes. The code is given below:
d = im2col(A, [m n], 'sliding');          % each column is one m-by-n block
[mm, nn] = size(d);
for j = 1:nn
    d_temp = d(:,j);
    d_pass_temp = reshape(d_temp, m, n);  % rebuild the block; the original col2im(...,[n n],[n n],'distinct') only works when m == n
    [mn_t, vr_t] = new_hu_moment(d_pass_temp);
    mn = [mn mn_t];                       % growing arrays inside the loop is slow
    vr = [vr vr_t];
end
'new_hu_moment' is my own function, returning the mean and variance of the Hu moments for the given block.
My system is an i3 processor with 6 GB of RAM.
Please suggest ways to improve the performance of this code.
Is there any function in MATLAB that can calculate the 7 Hu moments for overlapping blocks?

1. Preallocate mn and vr before the loop, something like:
% I suppose that new_hu_moment returns a row of known length nvals
mn = zeros(nn, nvals);
vr = zeros(nn, nvals);
for k = 1:nn
    ...
    [mn(k,:), vr(k,:)] = new_hu_moment(d_pass_temp);
    ...
end
I strongly recommend not using i and j as variable names, as they are synonyms for the imaginary unit. In older versions of MATLAB this can cause errors.
2. If you have enough memory and the for-loop processes independent data chunks, use parfor or cellfun/arrayfun (see the sketch after this list).
3. If the code is still not as fast as you want it to be, use the profiler to find the performance bottleneck.
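As a minimal sketch of point 2 (assuming, as above, that new_hu_moment returns 1-by-nvals rows; the reshape is the same block rebuild as in the question):
% each column of d is independent, so the iterations can run in parallel;
% note that d is broadcast to all workers
mn = zeros(nn, nvals);
vr = zeros(nn, nvals);
parfor k = 1:nn
    block = reshape(d(:,k), m, n);              % rebuild the m-by-n block
    [mn(k,:), vr(k,:)] = new_hu_moment(block);
end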


Manipulating large sets of data in Matlab, asking for advice on a few things, cells and numeric array operations, with performance in mind

This is a cross-post from here:
Link to post in the Mathworks community
Currently I'm working with large data sets; I've saved them as MATLAB files, with the two biggest files being 9.5 GB and 5.9 GB.
These files each contain a 1x8 cell array (done for addressability and to avoid mixing up data from the 8 cells; I specifically wanted to avoid eval).
Each cell contains a 3D double matrix; in one file it's 1001x2002x201 and in the other 2003x1001x201 (when processing it I chop off 1 row at the end to get it to 2002).
I'm already running my script on a server (64 cores and plenty of RAM; MATLAB crashed on my laptop, as I need more than 12 GB of RAM on Windows). Nonetheless it still takes several hours to finish, and I still need to do some extra operations on the data, which is why I'm asking for advice.
For some of the large cell arrays, I need to find the maximum value over the entire set of all 8 cells. Normally I would run a for loop to get the maximum of each cell, store each value in a temporary numeric array, and then call max again. This will work for sure; I'm just wondering if there's a better, more efficient way.
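For what it's worth, a minimal sketch of that reduction without the temporary array (A is my placeholder name for the 1x8 cell array; not from the original post):
% max of each cell, then the max of those per-cell maxima
maxvaluefound = max(cellfun(@(c) max(c(:)), A));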
After I find the maximum, I need to do a manipulation over all this data as well; normally I would do something like this for an array:
B=A./maxvaluefound;
A(B > a) = A(B > a)*constant;
Now I could put this in a for loop, address each cell and run this; however, I'm not sure how efficient that would be. Do you think there's a better way than a for loop that's not extremely complicated/difficult to implement?
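A sketch of the same manipulation done per cell with cellfun (again with A as my placeholder for the cell array; f is a hypothetical helper, and whether this beats a plain for loop is not a given, since the work per cell dominates):
% scale the entries whose normalized value exceeds a, cell by cell;
% where the mask is true, M + M*(constant-1) equals M*constant
f = @(M) M + M .* ((M./maxvaluefound) > a) * (constant - 1);
A = cellfun(f, A, 'UniformOutput', false);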
There's one more thing I need to do which is really important: each cell, as I said before, is a slice (consider it time), while inside each slice are the values for a 3D matrix/plot. Now I need to interpolate the data so that I get more slices. The reason I need to do this is that I need to create slices/frames/plots to make a movie/gif. I'm planning on plotting the 3D data using scatter3, where the data is represented by color, and I plan on using alpha values to make it see-through so that one can actually see the intensity in this 3D plot. I understand how to use griddata, but apparently it's quite slow, and some of the other methods were hard to understand. So what would be the best way to interpolate these (time) slices efficiently over the different cells in the cell array? Please explain it if you can, preferably with an example.
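As a concrete starting point, here is a minimal sketch of linear interpolation along the slice dimension (my own illustration, assuming time runs along the third dimension of each cell's 3D matrix, as in the code further down; the doubling factor is arbitrary):
% double the number of time slices in each cell by linear interpolation;
% memory-hungry for matrices this size, so process in chunks if needed
for ee = 1:8
    E = E_field_650nm_intAll{ee};
    [nx, ny, T] = size(E);
    tNew = linspace(1, T, 2*T - 1);            % twice as many frames
    V  = reshape(permute(E, [3 1 2]), T, []);  % put time along dim 1
    Vi = interp1(1:T, V, tNew, 'linear');      % interpolate every pixel
    E_field_650nm_intAll{ee} = permute(reshape(Vi, numel(tNew), nx, ny), [2 3 1]);
end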
I've added a picture of the Linux server info I'm running it on below; note that I can't update the MATLAB version unfortunately, it's R2016a:
I've also attached part of my code to give a better idea of what I'm doing:
if (or(L03==2, L04==2)) % check if this section needs to be executed based on parameters set at top of file
    load('../loadfilewithpathnameonmypc.mat')
    E_field_650nm_intAll = cell(1,8); % create empty cell array
    parfor ee = 1:8 % run loop over cell array; changed this to a parfor to increase speed by approximately 8x
        E_field_650nm_intAll{ee} = nan(szxit(1), szxit(2), xres); % create NaN-filled matrix in cells 1-8
        for qq = 1:2:xres
            tt = (qq+1)/2; % consecutive number instead of spacing 2
            T1 = griddata(Xsall{ee}, Ysall{ee}, EfieldsAll{ee}(:,:,qq)', XIT, ZIT, 'natural'); % change data on non-uniform grid to uniform gridded data
            E_field_650nm_intAll{ee}(:,:,tt) = T1; % fill up each cell with uniform data
        end
    end
    clear T1 qq tt ee
    save('../savelargefile.mat', 'E_field_650nm_intAll', '-v7.3')
end
if (L05==2) % check if this section needs to be executed based on parameters set at top of file
    if ~exist('E_field_650nm_intAll','var') % if the variable is not in the workspace, load it
        load('../loadanotherfilewithpathnameonmypc.mat');
    end
    parfor tt = 1:8 % run loop over cell array; changed this to a parfor to increase speed by approximately 8x
        CFxLight{tt} = nan(szxit(1), szxit(2), xres); % create NaN-filled matrix in cells 1 to 8
        for qq = 1:xres
            CFs = Cafluo3D{tt}(1:lxq2,:,qq)'; % get matrix slice and transpose it for point-wise multiplication
            CFxLight{tt}(:,:,qq) = CFs .* E_field_650nm_intAll{tt}(:,:,qq); % point-wise multiply the two large matrices for each cell and store in the new cell array
        end
    end
    clear CFs qq tt
    save('../saveanotherlargefile.mat', 'CFxLight', '-v7.3')
end

How to transform large matrix calculations into smaller scale in matlab

I have to calculate fourth-order cumulants with large matrices in my current research project.
The following is part of my code.
N = zeros((Kse*Ksf*Ksr)^2);
for k = 1:Ke % Ke = 3
    for i = 1:Kr % Kr = 3
        for j = 1:Kf % Kf = 1
            Mat = Xb(k:k+Kse-1, i:i+Ksr-1, j:j+Ksf-1); % size of Xb: (Ke+Kse-1)*(Kr+Ksr-1)*(Kf+Ksf-1)
            Mat_vec = Mat(:);
            snapshot_number = 70;
            Mat_kr = kron(Mat_vec, conj(Mat_vec));
            N = N + Mat_kr*Mat_kr'/snapshot_number;
            N = N - Mat_kr*Mat_kr'/snapshot_number^2;
            temp = Mat_vec*Mat_vec'/snapshot_number;
            N = N - kron(temp, conj(temp));
        end
    end
end
Under the conditions Kse=Ksr=5 and Ksf=9, the memory usage is about 200 GB; the size of N is 50625x50625. My server has a maximum of 250 GB of memory, so memory is the key limit and I cannot increase the parameters Kse, Ksr and Ksf any further. My target is to set Kse=Ksr=Ksf=11.
So I have thought of MATLAB distributed computing.
First, I shrank the parameters to Kse=3, Ksr=5 and Ksf=8 just for testing. I modified the code to the following version:
myPool = parpool();
Xb_dis = distributed(Xb);
N_dis = distributed(zeros((Kse*Ksf*Ksr)^2));
for k = 1:Ke % Ke = 3
    for i = 1:Kr % Kr = 3
        for j = 1:Kf % Kf = 1
            Mat_dis = Xb_dis(k:k+Kse-1, i:i+Ksr-1, j:j+Ksf-1);
            Mat_vec_dis = Mat_dis(:);
            snapshot_number = 70;
            Mat_kr_dis = kron(Mat_vec_dis, conj(Mat_vec_dis));
            N_dis = N_dis + Mat_kr_dis*Mat_kr_dis'/snapshot_number;
            N_dis = N_dis - Mat_kr_dis*Mat_kr_dis'/snapshot_number^2;
            temp_dis = Mat_vec_dis*Mat_vec_dis'/snapshot_number;
            N_dis = N_dis - kron(temp_dis, conj(temp_dis));
        end
    end
end
N = gather(N_dis);
The experimental distributed computing cluster consists of two computers. I found that there is little data communication between the two computers during the triple loop. So I thought the calculation could be done at a smaller scale, piece by piece, instead of loading the whole matrices into memory; in other words, the large-scale matrix calculation might be equivalent to a combination of smaller ones. But after several days of searching the Internet, I still have no idea.
Does anybody have advice or suggestions for this question?
Your matrices seem to have a lot of zeros; have you tried using sparse matrices? It helps a lot when you want to save memory.
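A minimal sketch of that suggestion (my illustration, not the answerer's code, and it only pays off if Mat_vec really has few nonzeros):
% sparse variant of the original loop: every intermediate stays sparse,
% so N never has to be allocated as a dense 50625x50625 matrix
n2 = (Kse*Ksf*Ksr)^2;
N = sparse(n2, n2);
snapshot_number = 70;
for k = 1:Ke
    for i = 1:Kr
        for j = 1:Kf
            Mat_vec = sparse(reshape(Xb(k:k+Kse-1, i:i+Ksr-1, j:j+Ksf-1), [], 1));
            Mat_kr = kron(Mat_vec, conj(Mat_vec)); % kron of sparse inputs stays sparse
            N = N + Mat_kr*Mat_kr'/snapshot_number ...
                  - Mat_kr*Mat_kr'/snapshot_number^2;
            temp = Mat_vec*Mat_vec'/snapshot_number;
            N = N - kron(temp, conj(temp));
        end
    end
end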

I want advice about how to optimize my code. It takes too long for execution

I wrote MATLAB code for finding a seismic signal (e.g. a P wave) in a SAC (seismic) file (which is read via another piece of code). The algorithm is called the STA/LTA trigger algorithm (not actually that important for my question).
The important thing is that the code works well, but since my seismic file is big (1 GB, covering two months), executing it takes almost 40 minutes before I see the result. Thus, I feel the need to optimize the code.
I heard that replacing loops with advanced functions would help, but since I am a novice in MATLAB, I can't see how to do that here, since the purpose of the code is to scan through every time series.
I also heard that preallocation might help, but I have little idea of how to actually do it.
Since this code is about seismology, it might be hard to understand, but my notes at the top should help. I hope I can get useful advice here.
The following is my code.
function [pstime] = classic_LR(tseries, ltw, stw, thresh, dt)
% This is the code for the "Classic LR" algorithm
% 'ns' is the number of measurements in the STW - used to calculate STA
% 'nl' is the number of measurements in the LTW - used to calculate LTA
% 'dt' is the time gap between measurements, i.e. 0.008 s for HHZ and 0.02 s for BHZ
% 'ltw' and 'stw' are the long and short time windows respectively
% 'lta' and 'sta' are the long and short time window averages respectively
% 'sra' is the ratio between 'sta' and 'lta', giving one component of a
% vector containing the ratio for each measurement point 'i'
% Index 'i' denotes each measurement point and is converted to actual time
nl = fix(ltw/dt);
ns = fix(stw/dt);
nt = length(tseries);
aseries = abs(detrend(tseries));
sra = zeros(1, nt);
for i = 1:nt-ns
    if i > nl
        lta = mean(aseries(i-nl:i));
        sta = mean(aseries(i:i+ns));
        sra(i) = sta/lta;
    else
        sra(i) = 0;
    end
end
k = find(sra > thresh);
if ~isempty(k)
    pstime = k*dt;
else
    pstime = 0;
end
return;
If you have MATLAB R2016a or later, you can use movmean instead of your loop (this also means you don't need to preallocate anything):
lta = movmean(aseries(1:nt-ns),nl+1,'Endpoints','discard');
sta = movmean(aseries(nl+1:end),ns+1,'Endpoints','discard');
sra = sta./lta;
The only difference here is that sra will have no leading or trailing zeros. This is most likely the fastest way: if, for instance, aseries is 'only' 8 MB, this method takes less than 0.02 seconds, while the original method takes almost 6 seconds!
However, even if you don't have MATLAB R2016a, you can still improve your loop:
Remove the else statement - sra(i) is already zero from the preallocation.
Start the loop from nl+1 instead of checking whether i is greater than nl.
So your new loop will be:
for i = nl+1:nt-ns
    lta = mean(aseries(i-nl:i));
    sta = mean(aseries(i:i+ns));
    sra(i) = sta/lta;
end
But it won't be that much faster.
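For pre-R2016a versions, the moving averages can also be built from cumulative sums; here is a minimal sketch of that (my own addition, not from the original answer; it assumes aseries is a row vector and reproduces the zero padding of the loop version):
% cs(k+1) holds sum(aseries(1:k)), so each window sum becomes two lookups
cs = cumsum([0, aseries]);
lta = (cs(nl+2:nt-ns+1) - cs(1:nt-ns-nl)) / (nl+1);  % mean(aseries(i-nl:i))
sta = (cs(nl+ns+2:nt+1) - cs(nl+1:nt-ns)) / (ns+1);  % mean(aseries(i:i+ns))
sra = [zeros(1,nl), sta./lta, zeros(1,ns)];          % same padding as the loop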

Matlab for loop vectorization and memory

X, Y and z are coordinates representing a surface. In order to calculate some quantity, let's call it flow, at point (i,j) of the surface, I need to calculate the contribution from all other points (i0,j0). To do so I need, for example, to know the cosines of the angles between point (i0,j0) and all other points (alpha). All contributions from (i0,j0) must then be multiplied by some constants and added; zv0 at every point (i,j) is the final result I need.
I came up with the code written below, and it seems extremely inadequate. First of all, it slows down the rest of the program and seems to use all of the available memory. My system has 4 GB of physical memory and a 12 GB swap file, yet it always runs out of memory, even though none of the variables is bigger than 10 KB. Please help with speeding up/vectorizing this code and with the memory problems.
parfor i0 = 2:length(x00)
    for j0 = 2:length(y00)
        zv = red3dfunc(X0, Y0, f, z0, i0, j0, st, ang, nx, ny, nz);
        zv0 = zv0 + zv;
    end
end
function [X, Y, z, zv] = red3dfunc(X, Y, f, z, i0, j0, st, ang, Nx, Ny, Nz)
x1 = X(i0,j0);
y1 = Y(i0,j0);
z1 = z(i0,j0);
XXa = X - x1;
YYa = Y - y1;
ZZa = z - z1;
VEC = (XXa.^2 + YYa.^2 + ZZa.^2).^(1/2); % distance from (i0,j0) to every point
VEC(i0,j0) = VEC(i0-1,j0-1);             % avoid dividing by zero at the point itself
XXa = XXa./VEC;
YYa = YYa./VEC;
ZZa = ZZa./VEC;
alpha = -(Nx(i0,j0).*XXa + Ny(i0,j0).*YYa + Nz(i0,j0).*ZZa);
betha = Nx.*XXa + Ny.*YYa + Nz.*ZZa;     % the original had ZZb here, presumably a typo for ZZa
r = VEC;
zv = (1/pi)*st^2*ang.*f.*alpha.*betha./r.^2;
The obvious way to do this is to use the Kronecker product. The MATLAB function is kron(A,B) for matrices of dimensions nAxmA and nBxmB. It returns a matrix of dimension (nA*nB)x(mA*mB), which looks like
[a11*B a12*B ... a1mA*B;
 .......................;
 anA1*B ........ anAmA*B]
So your problem may be solved by introducing the matrix of ones I = ones(size(X)). You can then define your XXa, YYa and ZZa matrices and VEC without any loop as
XXa = kron(I,X) - kron(X,I);
YYa = kron(I,Y) - kron(Y,I);
ZZa = kron(I,z) - kron(z,I); % z is the surface height matrix (written Z in the original answer)
VEC = (XXa.^2 + YYa.^2 + ZZa.^2).^(1/2);
You will then find VEC for any (i0,j0) as (if you define n and m as the size components of X):
VEC((1+n*(i0-1)):(n*i0),(1+m*(j0-1)):(m*j0))

Parallelize or vectorize all-against-all operation on a large number of matrices?

I have approximately 5,000 matrices with the same number of rows and varying numbers of columns (20 x ~200). Each of these matrices must be compared against every other in a dynamic programming algorithm.
In this question, I asked how to perform the comparison quickly and was given an excellent answer involving a 2D convolution. Serially, iteratively applying that method, like so
list = who('data_matrix_prefix*');
H = cell(numel(list), numel(list));
for i = 1:numel(list)
    for j = 1:numel(list)
        if i ~= j
            eval(['H{i,j} = compare(' char(list(i)) ',' char(list(j)) ');']);
        end
    end
end
is fast for small subsets of the data (e.g. for 9 matrices, 9*9 - 9 = 72 calls are made in ~1 s, 870 calls in ~2.5 s).
However, operating on all the data requires almost 25 million calls.
I have also tried using deal() to make a cell array composed entirely of the next element in data, so I could use cellfun() in a single loop:
% who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
nextData = cell(k,1);
for i = 1:k
    [nextData{:}] = deal(data{i});
    H(:,i) = cellfun(@compare, data, nextData, 'UniformOutput', false);
end
Unfortunately, this is not really any faster, because all the time is spent in compare(). Both of these code examples seem ill-suited for parallelization, and I'm having trouble figuring out how to make my variables sliced.
compare() is totally vectorized; it uses matrix multiplication and conv2() exclusively (I am under the impression that all of these operations, including the cellfun(), should be multithreaded in MATLAB?).
Does anyone see an (explicitly) parallelized solution or a better vectorization of the problem?
Note
I realize both my examples are inefficient - the first would be twice as fast if it computed only a triangular cell array, and the second still calculates the self-comparisons as well. But the time savings from a good parallelization are more like a factor of 16 (or 72 if I install MATLAB on everyone's machines).
Aside
There is also a memory issue. I used a couple of evals to append each column of H into a file, with names like H1, H2, etc. and then clear Hi. Unfortunately, the saves are very slow...
Do compare(a,b) == compare(b,a) and compare(a,a) == 1 hold?
If so, change your loop
for i = 1:numel(list)
    for j = 1:numel(list)
        ...
    end
end
to
for i = 1:numel(list)
    for j = i+1:numel(list)
        ...
    end
end
and deal with the symmetry and identity cases. This will cut your calculation time in half.
The second example can easily be sliced for use with the Parallel Computing Toolbox, which distributes iterations of your code among up to 8 local workers. If you want to run the code on a cluster, you also need the MATLAB Distributed Computing Server.
% who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
parfor i = 1:k-1 % this will run the loop in parallel with the Parallel Computing Toolbox
    % only make the necessary comparisons
    H(i+1:k, i) = cellfun(@compare, data(i+1:k), repmat(data(i), k-i, 1), 'UniformOutput', false);
    % if the above doesn't work, try this
    hSlice = cell(k,1);
    hSlice(i+1:k) = cellfun(@compare, data(i+1:k), repmat(data(i), k-i, 1), 'UniformOutput', false);
    H(:,i) = hSlice;
end
If I understand correctly, you have to perform 5000^2 matrix comparisons? Rather than trying to parallelise the compare function, perhaps you should think of your problem as being composed of 5000^2 tasks. The MATLAB Parallel Computing Toolbox supports task-based parallelism. Unfortunately my experience with PCT is with the parallelisation of large linear-algebra-type problems, so I can't really tell you much more than that. The documentation will undoubtedly help you more.
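As a minimal sketch of that task-based view (my own illustration, assuming data is the k-by-1 cell array of matrices from above and that compare is symmetric, as the first answer asks):
% enumerate the k*(k-1)/2 unique (i,j) pairs, compare each pair once in
% parallel, then scatter the results back using the symmetry
k = numel(data);
[I, J] = find(tril(true(k), -1)); % all pairs with I > J
results = cell(numel(I), 1);
parfor p = 1:numel(I)
    results{p} = compare(data{I(p)}, data{J(p)});
end
H = cell(k, k);
for p = 1:numel(I)
    H{I(p), J(p)} = results{p};
    H{J(p), I(p)} = results{p};
end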