Matlab parfor taking longer than normal for - matlab

I have MATLAB code in which I use parfor to reduce the time a plain for loop takes on some image processing tasks. Basically, it takes two images and, after some mathematical calculations, produces a scalar quantity called EucDist. One image is kept fixed and the other is generated by a FORTRAN code, which takes around 20 seconds per image. Below is the outline of my code:
matlabpool open
gray1 = some_image(8192,200);
dep = 0.04:0.01:0.40; % Parameter 1
vel = 1.47:0.01:1.72; % Parameter 2
dist = zeros(length(dep),length(vel));
tic
parfor i = 1:length(dep)
ans = zeros(1,length(vel));
for j = 1:length(vel)
% Updating the Input.txt file
fname = sprintf('Input_%.2d%s',i,'.txt');
fid=fopen(fname,'w');
fprintf(fid,'%-5.2f\n%-5.2f\n%.2d',dep(i),vel(j),i);
fclose(fid);
% Running my fortran code to generate another .dat file (Note that I have already compiled this code outside these loops)
system(['./editcrfl ' fname]);
% Calling IMAGE_GEN script incorporating the above .dat file
system('IMAGE_GEN');
system(sprintf('IMAGE_GEN %d',i));
gray2 = some_image(8192,200);
% Doing some mathematical calculations and getting a value say 'EucDist'
- - - - - - -
- - - - - - -
ans(j) = EucDist;
end
dist(i,:) = ans;
fclose('all');
end
fprintf('Total time taken: %f\n',toc);
matlabpool close
There are two major problems that I am facing with the above code.
First, the dist matrix does not store all the EucDist values that are generated. Ideally the dist matrix should be of size 37 X 26, but it is only 37 X 1 and all its values are zero. I have checked that all 37 X 26 values are being calculated, but I don't know why they are not being stored in dist.
Second, the total time taken with parfor is around 9.5 hours, whereas the normal for loop takes only 5.5 hours.
Can someone please help me to get rid of the above two problems?
Thanks in advance.
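One thing that may be worth trying for the first problem: ans is the name MATLAB itself assigns to the most recent unassigned result, so using it as a temporary inside parfor may not behave as expected. The sketch below uses an ordinary temporary (row, a hypothetical name) and writes one full row of dist per parfor iteration. The hypot call is only a placeholder for the real EucDist calculation, and the external FORTRAN and IMAGE_GEN calls are left out, so this is an illustration of the storage pattern and not a drop-in fix.

```matlab
dep = 0.04:0.01:0.40;            % Parameter 1 (37 values)
vel = 1.47:0.01:1.72;            % Parameter 2 (26 values)
nVel = numel(vel);
dist = zeros(numel(dep), nVel);  % preallocated result matrix

parfor i = 1:numel(dep)
    row = zeros(1, nVel);        % per-iteration temporary (not named ans)
    for j = 1:nVel
        % Placeholder for the real EucDist computation (assumption)
        row(j) = hypot(dep(i), vel(j));
    end
    dist(i, :) = row;            % sliced assignment: one row per iteration
end

disp(size(dist))
```

If the Parallel Computing Toolbox is not available, parfor simply runs the iterations serially, so the same pattern can be checked on any installation.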

Related

Speed-up mex files in Matlab

I have a big loop that I need to compute many times in my code, and I thought it would be possible to speed it up using MATLAB Coder. However, the mex file version of my code (the code for the function is attached below) ends up being slower than the m-file. I compiled the code using the MinGW64 compiler.
I tried different compiler settings (played with coder.config) and different memory specifications but without any success.
function v_d = test_code(q,v_exp,wealth,b_grid_choice,k_grid_choice,nz,nk,nb)
%#codegen
% Inputs
% q is an array of dimension [nz nk nb]
% v_exp is an array of dimension [nz nk nb]
% wealth is an array of dimension [nz nk nb]
% b_grid_choice is an array of dimension [nk nb]
% k_grid_choice is an array of dimension [nk nb]
% Typically, nz/nk/nb is an integer (currently up to 200)
v_d = coder.nullcopy(zeros(nz,nk,nb));
parfor ii = 1:nz
q_ini_ii = reshape(q(ii,:,:),[],nz);
v_ini_exp_ii = reshape(v_exp(ii,:,:),[],nz);
choice = q_ini_ii.*b_grid_choice - k_grid_choice;
for jj = 1:nk
for kk = 1:nb
% dividends at time t
d = wealth(ii,jj,kk) + choice;
% choosing optimal consumption
vec_d = d + v_ini_exp_ii.*(d>0);
v_d(ii,jj,kk) = max(vec_d,[],'all');
end
end
end
When I run this code with nz=nk=nb=100, the m-file takes 2.5s while the generated mex file takes 7.5s. I tried using both the MATLAB Coder app and the codegen command
codegen -O enable:OpenMP test_code -args {q,v_exp,wealth,b_grid_choice,k_grid_choice,nz,nk,nb}
I also played with coder.config('mex') but with little impact on the performance.
When I test the speed of the above code, I simply generate these matrices using the randn command and then pass them into the code. Interestingly, when I generate these matrices inside the function, this has no effect on the speed of the m-file but speeds up the mex file almost 10 times (so it is three times faster than the m-file)! This suggests to me that there is a way to speed up that code, but despite my efforts I couldn't figure it out. I thought that memory allocation could be behind it, but my attempts at playing with dynamic memory specifications did not lead to any gains either...

matlab: running fft on short time intervals in a for-loop for the length of data

I have some EEG data that I would like to break down into 30-second windows and run a fast Fourier transform on each window of data. I've tried to implement a for-loop and increment the index value by the number of samples in the time window. When I run this, I can see that (1) this works for the first window of data, but not the rest of them because (I think) the "number of samples minus one" leads to fewer elements than necessary for data_fft and thus doesn't have the same dimensions as f, which are both being plotted in a figure. (2) I tried to update the index value by adding the number of samples in a window, but after i = 1, it goes to i = 2 in my workspace and not to i = 7681 as I'd hoped. I've spent an embarrassingly long time on trying to figure out how to change this so it works correctly, so any advice is appreciated! Code is below. Let me know if I can clarify anything.
data_ch6 = data(:,6); % looking at just 1 electrode
tmax = 2*60; % total time in sec I want to analyze; just keeping it to 2 minutes for this exercise
tmax_window = 30; %30 sec window
times = tmax/tmax_window; % number of times fft should be run
Nsamps = tmax*hdr.SPR; % total # samples in tmax; sample rate is 256 hz
Nsamps_window = tmax_window*hdr.SPR; % # samples in time window
f = hdr.SPR*(0:((Nsamps_window-1)/2))/Nsamps_window; % frequency for plotting
for i = 1:Nsamps; % need to loop through data in 30 second windows in tmax
data_fft = abs(fft(data_ch6(i:i+Nsamps_window-1))); %run fft on data window
data_fft = data_fft(i:((i+Nsamps_window-1)/2)); %discard half the points
figure
plot(f, data_fft)
i = i+Nsamps_window;
end
Well, there are a few things that are wrong in your code. First, let me start by saying that i is a very poor choice for a variable name, since in MATLAB it stands for sqrt(-1) by default.
As for your code, I assume that you intend to perform a windowed FFT without overlapping.
1) Your loop goes from 1 to Nsamps with an increment of 1, which means that each iteration advances by only 1 sample. In other words, consecutive windows overlap by Nsamps_window-1 samples. If you do not want overlap, you can use i = 1:Nsamps_window:Nsamps-Nsamps_window+1 instead, so that the last window still fits inside the data.
2) The length of data_fft is Nsamps_window, so I think what you wanted to do is data_fft = data_fft(1:round(Nsamps_window/2));
3) When plotting FFT results, I suggest using dB: plot(20*log10(abs(data_fft)));
4) The line i = i+Nsamps_window; has no effect on the iteration, since i is your loop variable and MATLAB overwrites it with the next value from the loop range at the start of each pass.
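Putting the four points together, a minimal sketch of a non-overlapping windowed FFT could look like the following. It runs on synthetic data (a 10 Hz sine plus noise) in place of the EEG recording, and fs, starts, spectra and peakFreq are illustrative names that do not appear in the original code. It also assumes the window length is even.

```matlab
fs = 256;                          % sample rate in Hz (hdr.SPR in the question)
tmax = 2*60;                       % total time to analyse, in seconds
tmax_window = 30;                  % window length, in seconds
Nsamps = tmax*fs;                  % total number of samples
Nsamps_window = tmax_window*fs;    % samples per window (assumed even)

% Synthetic stand-in for data(:,6): 10 Hz sine plus a little noise
t = (0:Nsamps-1)'/fs;
data_ch6 = sin(2*pi*10*t) + 0.1*randn(Nsamps, 1);

% Frequency axis for the first half of the spectrum
f = fs*(0:(Nsamps_window/2 - 1))/Nsamps_window;

% Start index of each non-overlapping window
starts = 1:Nsamps_window:(Nsamps - Nsamps_window + 1);
spectra = zeros(Nsamps_window/2, numel(starts));

for w = 1:numel(starts)
    idx = starts(w):(starts(w) + Nsamps_window - 1);
    mag = abs(fft(data_ch6(idx)));          % magnitude spectrum of this window
    spectra(:, w) = mag(1:Nsamps_window/2); % keep the first half only
end

% Location of the strongest component in the first window
[~, peakBin] = max(spectra(:, 1));
peakFreq = f(peakBin);
fprintf('%d windows, peak at %.2f Hz\n', numel(starts), peakFreq);
```

Each column of spectra can then be plotted against f, for example with plot(f, 20*log10(spectra(:, w))) inside the loop.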

Lorenz System in MATLAB- making of simulation and movie

I am trying to simulate trajectories in the Lorenz System in MATLAB, and I am currently using the following code:
clear all
clf;
clc;
% Solution
[t1,x1] = ode45('g',[0 30],[0;2;0]);
[t2,x2] = ode45('g2',[0 30],[0;2.001;0]);
[C,h] = size(x2);
ang = 0;
for j = 1:C
p1(j,:)= x1(j,:);
p2(j,:)= x2(j,:); % Plot
plot3(p1(:,1),p1(:,2),p1(:,3),'k', p2(:,1),p2(:,2),p2(:,3),'r'); hold on;
plot3(p1(j,1),p1(j,2),p1(j,3),'ko','markerfacecolor','k');
plot3(p2(j,1),p2(j,2),p2(j,3),'rd','markerfacecolor','r'); hold off
axis([-20 20 -40 40 0 50])
axis off
set(gca,'color','none') % Rotation
camorbit(ang,0,[p1(1,1),p1(1,2),p1(1,3)])
ang = ang + (360/C); % Record
set(gcf, 'units','normalized','outerposition',[0 0 1 1])
F(j)= getframe(gcf);
end
movie(F)
clf;
close;
With the functions g, g2 defined in the same way:
function xdot = g(t,x)
xdot = zeros(3,1);
sig = 10;
rho = 28;
bet = 8/3;
xdot(1) = sig*(x(2)-x(1));
xdot(2) = rho*x(1)-x(2)-x(1)*x(3);
xdot(3) = x(1)*x(2)-bet*x(3);
Which is the Lorenz System. The purpose of this whole code is to make a movie of the trajectory of two initial states that vary very slightly, in order to demonstrate the chaotic behaviour of this system. The code itself does in fact work, but takes all of my computer's memory, and in an attempt to make a .avi file of the trajectory, it complained about exceeding 7.5 GB - which is of course way too much for this simulation.
My question consists of two parts:
(1) How do I manage this code in order to make it run more smoothly?
(2) How can I make a .avi file of the trajectory? I tried to find a way on the internet for a long time, but either MATLAB or my computer gave up every time.
Thanks in advance!
As already mentioned in my comment above: your code runs quite smoothly on my Laptop machine (an "old" i5 processor, 8 GB memory). Approximately 102 % CPU load is generated and about 55 % of my memory is used during the frame generation process.
To write your frames to a video file, I used the following commands:
v = VideoWriter('LorenzAnimation.avi');
open(v);
writeVideo(v,F);
close(v);
This outputs a file of 47 seconds (C=1421 frames, 30 frames per second) duration and frames of size 1364 × 661 pixels each. The file is about 38 MB. Both generating the frames and writing the video took about 3 minutes on my machine (using tic/toc).
I cannot tell you much about the CPU load during the video writing process (it varied between 5 and 400 %). It took up to about 82 % of my memory. It is better not to touch your machine during this process.
Note: make sure that you do not change the size of the figure window as all frames must be the same size, else MATLAB will return with an error message.
Things that might influence the "smoothness":
you are using a bigger frame size than me
you are not using compressed video, what was your approach to write the video file?
the scheduler of your operating system does a bad/good job
your machine is even slower than mine (unlikely)
Edit: initializing the variables you are operating on (e.g. vectors and matrices) often speeds things up, because you are pre-allocating memory. I have tried this for the frame generation process (where 540, 436, 3 should be replaced by your frame dimensions, manually or automatically):
G = struct('cdata', uint8( zeros(540, 436, 3) ), 'colormap', []);
G = repmat( G, 1, C );
This gave me a little speed-up, though I am not sure if that's the perfect way to initialize a struct array.
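If memory is the main concern, another option is to avoid keeping all frames in F at all and to write each frame to the video file as soon as it is captured. The sketch below shows that pattern on a simple stand-in animation (a circle being drawn) and not on the Lorenz trajectories themselves. The file name, figure size and frame count are assumptions chosen for illustration.

```matlab
% Fixed-size, invisible figure so that every frame has the same dimensions
fig = figure('Visible', 'off', 'Position', [100 100 560 420]);

outFile = fullfile(tempdir, 'lorenz_sketch.avi');  % hypothetical output file
v = VideoWriter(outFile);
v.FrameRate = 30;
open(v);

theta = linspace(0, 2*pi, 60);     % stand-in for the trajectory samples
for k = 1:numel(theta)
    plot(cos(theta(1:k)), sin(theta(1:k)), 'k');
    axis([-1 1 -1 1]);
    axis off
    % Write the frame immediately; nothing accumulates in memory
    writeVideo(v, getframe(fig));
end

close(v);
close(fig);
```

With this approach only one frame is held in memory at a time, at the cost of not being able to replay the animation with movie(F) afterwards.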

Using Euler's Method in Matlab

First time post here. Pretty frustrated right now working on this assignment for class.
Basically, the idea is to use Euler's method to simulate and graph an equation of motion. The equation of motion is in the form of an ODE.
My professor has already put down some code for a slightly similar system and would like us to derive the equation of motion using Lagrange. I believe that I have derived the EOM correctly; however, I am running into problems on the Matlab side of things.
What's weird is that using a similar technique on another, separate EOM, I have no issues. So I am unsure what I am doing wrong.
Here's the code for the part that is working correctly:
close all; clear all; clc;
% System parameters
w = 2*pi;
c = 0.02;
% Time vectors
dt = 1e-5;
t = 0:dt:4;
theta = zeros(size(t));
thetadot = zeros(size(t));
% Initial conditions
theta(1)=pi/2; %theta(0)
thetadot(1)=0; %thetadot(0)
for I = 1 : length(t)-1;
thetaddot = -c*thetadot(I)-w^2*sin(theta(I));
thetadot(I+1)=thetadot(I)+thetaddot*dt;
theta(I+1)=theta(I)+thetadot(I)*dt ;
end
figure(1);
plot(t,theta,'b');
xlabel('time(s)');
ylabel('theta');
title('Figure 1');
zoom on;
% Output the plot to a pdf file, and make it 6 inches by 4 inches
printFigureToPdf('fig1.pdf', [6,4],'in');
% Open the pdf for viewing
open fig1.pdf
Everything runs fine, except Matlab complains about the printFigureToPdf command.
Now, here is the code for the problem that I am having issues with.
close all; clear all; clc; clf
% System parameters
m=0.2;
g=9.81;
c=.2;
d=0.075;
L=0.001; %L is used for Gamma
B=0.001; %B is used for Beta
W=210*pi; %W is used for Omega
%Time vectors
dt = 1e-6; %Time Step
t=0:dt:10; %Range of times that simulation goes through
x=zeros(size(t));
xdot=zeros(size(t));
%Initialconditions
x(1)=0;%x(0)
xdot(1)=0; %xdot(0)
for I = 1 : length(t)-1;
xddot =-1/m*(c*xdot(I)-c*L*W*cos(W)+m*g-3*B*((d+x-L*W*sin(W*t)).^(-4)-(d-x-L*W*sin(W*t)).^(-4)));
xdot(I+1)=xdot(I)+xddot*dt;
x(I+1)=x(I)+xdot(I+1)*dt ;
end
figure(1);
plot(t,x,'b');
xlabel('time(s)');
ylabel('distance(m)');
title('Figure 2');
zoom on;
% Output the plot to a pdf file, and make it 6 inches by 4 inches
printFigureToPdf('fig1.pdf', [6,4],'in');
% Open the pdf for viewing
open fig1.pdf
With this code, I followed the same procedure, but it gives an error on line 23: "In an assignment A(I) = B, the number of elements in B and I must be the same."
Like I said, I am confused because the other code worked okay, and this second set of code gives an error.
If anyone could give me a hand with this, I would greatly appreciate it.
Thanks in advance,
Dave
Edit: As suggested, I changed x(I+1)=x(I)+xdot(I+1)*dt to x(I+1)=x(I)+xdot(I)*dt. However, I am still getting an error for line 23: "In an assignment A(I) = B, the number of elements in B and I must be the same."
Line 23 is: xdot(I+1)=xdot(I)+xddot*dt;
So, I tried adjusting the code as suggested for the other line to xdot(I+1)=xdot(I)+xddot(I)*dt;
After making this change, Matlab gets stuck, I tried letting it run for a few minutes but won't execute. I ended up having to close and reopen the application.
The error "In an assignment A(I) = B, the number of elements in B and I must be the same." is something you should understand, because it may pop up frequently in Matlab if you are not careful.
In your case, you are trying to assign to a single element, xdot(I+1), something which has more than one element, xdot(I)+xddot*dt.
Indeed, if you step through the code line by line and observe your workspace, you will notice that xddot is not a scalar value as intended, but a full-blown vector the size of t. This is because in the preceding line, where you define xddot:
xddot =-1/m*(c*xdot(I)-c*L*W*cos(W)+m*g-3*B*((d+x-L*W*sin(W*t)).^(-4)-(d-x-L*W*sin(W*t)).^(-4)));
you still have many references to x (the full vector) and t (the full vector). You have to replace each of these references with a single indexed element, i.e. use x(I) and t(I). The line becomes:
xddot =-1/m*(c*xdot(I)-c*L*W*cos(W)+m*g-3*B*((d+x(I)-L*W*sin(W*t(I))).^(-4)-(d-x(I)-L*W*sin(W*t(I))).^(-4)));
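The difference between the two forms is easy to see on a tiny example. In the sketch below, the expression is a simplified stand-in for the real xddot formula, and the names vecResult, scalarResult and failed are illustrative only.

```matlab
t = 0:0.1:0.4;            % short time vector (5 elements)
x = zeros(size(t));       % state vector, same size as t
I = 2;                    % some loop index

% Using the full vectors yields one value per time sample
vecResult = 1 + x - sin(t);
% Indexing with I yields the single value needed at this step
scalarResult = 1 + x(I) - sin(t(I));

disp(size(vecResult))     % 1 5
disp(size(scalarResult))  % 1 1

% Assigning the vector to a single element reproduces the error
failed = false;
try
    x(I+1) = vecResult;
catch err
    failed = true;
    disp(err.message)
end
```

Only the indexed version can be stored in a single element such as xdot(I+1).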
With that, your code runs just fine. However, it is far from optimized and it runs relatively slowly. I have a powerful machine and it still takes a long time to run for me. I suggest you use a coarser (larger) time step, at least while you are still testing your code. If you really need that kind of precision, first make sure your code runs fine, then, when it is ready, let it run at full precision and go have a coffee while your computer is doing the work.
The snippet below is the loop part of your code with the correct assignment for xddot. I also added a simple progress bar so you can see that your code is doing something.
hw = waitbar(0,'Please wait...') ;
npt = length(t)-1 ;
for I = 1 : npt
xddot =-1/m*(c*xdot(I)-c*L*W*cos(W)+m*g-3*B*((d+x(I)-L*W*sin(W*t(I))).^(-4)-(d-x(I)-L*W*sin(W*t(I))).^(-4)));
xdot(I+1) = xdot(I)+xddot*dt;
x(I+1) = x(I)+xdot(I+1)*dt ;
pcdone = I / npt ;
waitbar(pcdone,hw,[num2str(pcdone*100,'%5.2f') '% done'])
end
close(hw)
I strongly suggest you coarsen your time step to dt = 1e-3; (fewer steps, so a much shorter run) until you are satisfied with everything else.
In the final version, you can remove or comment the calls to the waitbar as it slows down things too.

MATLAB Error:Out of Memory

So I'm trying to perform STFT on a piano recording using matlab, but I get the following error.
Warning: Input arguments must be scalar.
In test3 at 35
??? Error using ==> zeros
Out of memory. Type HELP MEMORY for your options.
Error in ==> test3 at 35
song = cat(1,song,zeros(n_of_padding,1));
The code I've used is taken from sample code found on the net.
clc;
clear all;
[song,FS] = wavread('c scale fast.wav');
song = sum(song,2);
song = song/max(abs(song));
wTime = 0.05;
ZP_exp = 1;
P_OL = 50;
% Number of STFT samples per STFT slice
N_window = floor(wTime*FS);
% Number of overlapping points
window_overlap = floor(N_window*(P_OL/100));
wTime = N_window/FS;
%size checking
%make sure there are integer number of windows if not zero pad until they are
L = size(song);
%determine the number of times-1 the overlapping window will fit the song length
N_of_windows = floor(L - N_window/(N_window - window_overlap));
%determine the remainder
N_of_points_left = L - (N_window + N_of_windows*(N_window - window_overlap));
%Calculate the number of points to zero pad
n_of_padding = (N_window - window_overlap) - N_of_points_left;
%append the zeros to the end of the song
song = cat(1,song,zeros(n_of_padding,1));
clear n_of_windows n_of_points_left n_of_padding
n_of_windows = floor((L - N_window)/(N_window - window_overlap))+1;
windowing = hamming(N_window);
N_padding = 2^(nextpow2(N_window)+ZP_exp);
parfor k = 1:N_of_windows
starting = (k-1)*(N_window -window_overlap) +1;
ending = starting+N_window-1;
%Define the Time of the window, i.e., the center of window
times(k) = (starting + ceil(N_window/2))/Fs;
%apply windowing function
frame_sample = music(starting:ending).*windowing;
%take FFT of sample and apply zero padding
F_trans = fft(frame_sample,N_padding);
%store FFT data for later
STFT_out(:,k) = F_trans;
end
Based on some assumptions I would reason that:
- n_of_padding should be smaller than N_window
- N_window is much smaller than FS
- FS is not too high (it is the sampling rate of your sound, typically a few tens of thousands of samples per second for audio)
- Your zeros matrix will not be huge
This should mean that the problem is not that you are creating too large a matrix, but that you had already filled up the memory before this call.
How to deal with this?
First type dbstop if error
Run your code
When it stops check all variable sizes to see where the space has gone.
If you don't see anything strange (and the big storage is really needed) then you may be able to process your song in parts.
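It may also be worth checking n_of_padding itself when the debugger stops. The warning "Input arguments must be scalar" hints that it is not a scalar: in the posted code, L = size(song) returns a two-element vector, and in the N_of_windows line only N_window is divided by the hop size because of operator precedence, which would make the computed padding far larger than intended. The sketch below is one possible way to compute the padding, using length(song) and explicit parentheses. It runs on a random stand-in signal with an assumed sampling rate, introduces the name hop for illustration, and uses mod so that no padding is added when the windows already fit exactly (a small deviation from the original formula).

```matlab
FS = 8000;                         % assumed sampling rate for this sketch
song = randn(12345, 1);            % stand-in for the recording (column vector)
wTime = 0.05;                      % window length in seconds
P_OL = 50;                         % overlap in percent

N_window = floor(wTime*FS);                    % samples per window
window_overlap = floor(N_window*(P_OL/100));   % overlapping samples
hop = N_window - window_overlap;               % step between windows

L = length(song);                  % scalar length, not size(song)
% Parentheses ensure (L - N_window) is divided by hop as a whole
N_of_windows = floor((L - N_window)/hop);
N_of_points_left = L - (N_window + N_of_windows*hop);
% Zeros needed to complete the last hop (0 if nothing is left over)
n_of_padding = mod(hop - N_of_points_left, hop);

song = [song; zeros(n_of_padding, 1)];
fprintf('padding %d samples, new length %d\n', n_of_padding, numel(song));
```

With these numbers the padding stays below one hop, so the zeros array is tiny compared with the song itself.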
In line 35 you are trying to make an array that exceeds your available memory. Note that an n-by-1 array of zeros alone is n*8 bytes in size. This means that if you make such an array, call it x, and check it with whos('x'), like:
x = zeros(10000,1);
whos('x');
You will likely find that x is 80000 bytes. Adding such an array to your song variable may be the last straw that breaks the memory-camel's back. Using whos('variableName'), take the size of song before line 35, add the size of zeros(n_of_padding,1), convert that to MB, and see if it exceeds your maximum possible memory given by help memory.
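The same check can be done programmatically by capturing the output of whos. A minimal sketch (info is an illustrative variable name):

```matlab
x = zeros(10000, 1);         % 10000 doubles, 8 bytes each
info = whos('x');            % struct with name, size, bytes, class, ...
fprintf('%s: %d bytes (%.3f MB)\n', info.name, info.bytes, info.bytes/2^20);
```

The same call with 'song' in place of 'x' reports how much memory the recording occupies before the padding is appended.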
The most common implication of Out of memory errors on Matlab is that it is unable to allocate memory due to the lack of a contiguous block. This article explains the various reasons that can cause an Out of memory error on MATLAB.
The Out of memory error often points to a faulty implementation of code that expands matrices on the fly (concatenating, out-of-range indexing). In such scenarios, MATLAB creates a copy in memory, i.e. memory twice the size of the matrix is consumed with each such occurrence.
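As an illustration of the two patterns, the sketch below fills the same vector once by growing it on the fly and once into a preallocated array. The names grown and pre and the size n are chosen for this example only.

```matlab
n = 1e4;

% Growing on the fly: the array is resized on every iteration
grown = [];
for k = 1:n
    grown(end+1, 1) = k^2;
end

% Preallocating: memory is reserved once, then filled in place
pre = zeros(n, 1);
for k = 1:n
    pre(k) = k^2;
end

disp(isequal(grown, pre))    % both approaches give the same values
```

The results are identical, but the preallocated version avoids the repeated resizing and copying described above.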
On Windows this problem can be alleviated to some extent by passing the /3GB /USERVA=3030 switch during boot as explained here. This enables additional virtual memory to be addressed by the application(MATLAB in this case).