how to improve the running speed of this Matlab code? - matlab

I am trying to simulate a motion of some particles.The program seems to be very slow.
I just started lean programming so I dont know where is the problem exactly that make it slow, I think that plotting is take much time.
could some sone suggest me how to improve it?
function folks(N , T)
if nargin < 1
N = 50;
T = 100;
end
A=20;R=50;a=100;r=2;
x0=10*randn(3,N);
v0=0*randn(3,N);
clear c
% Initilazing plot
color = hsv(N);
xh=zeros(1,N);
f=2*max(max(x0));ff=f/1000000;
figure(2),clf
set(gcf,'doublebuffer','on')
hold on, grid on, axis([-1 1 -1 1 -.5 .5]*f)
for j = 1:N
xh(j) = line(x0(1,j),x0(2,j),x0(3,j),'color',color(j,:), ...
'marker','.','markersize',15);
end
title('t = 0','fontsize',18)
rotate3d;
view([1.8,-1.8,1])
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Solving the ode system
tol = 3e-14;
opts = odeset('reltol',tol,'abstol',tol);
X0=[x0(:);v0(:)];
dt=T/1000;
for tnext = dt:dt:10*T
tspan = [tnext-dt tnext];
[T1,X1] = ode45(#odefolk,tspan,X0,opts,r,R,a,A);
X0 = X1(end,:);
if max(abs(X0(1:3*N)))>f
f=1.1*f;
axis([-1 1 -1 1 -1 1]*f)
end
for j=0:N-1
set(xh(j+1),'xdata',X0(3*j+1),'ydata',X0(3*j+2),'zdata',X0(3*j+3))
end
title(sprintf('t = %3.0f %5.0e',tnext),'fontsize',18),
drawnow
end
I tried to save the data in a matrix and then plot every thing in new loop but it make it much slower.
And also its not related to the solver ode45 or the system odefolk
function dx = fkt(~,x,r,R,a,A)
n = length(x); % n = 90 = 6 *N
M = n/6; % M= 15
dx = zeros(n,1);
for i = 1 : 3 : n/2
vx = 0 ; vy = 0 ; vz = 0;
for j = 1 : 3 : n/2
if (i~=j)
vx = x(i) - x(j) ;
vy = x(i+1) - x(j+1);
vz = x(i+2) - x(j+2);
% ex = expo (vx , vy , vz ,r , a);
%
% val = (ex(1) * R - ex(2) * A);
leng = sqrt(vx^ 2 +vy^ 2 + vz^ 2);
expo1 = exp(-1 * leng / r) / r;
expo2 = exp(-1 * leng / a) / a;
val = (expo1 * R - expo2 * A);
vx = vx * val;
vy = vy * val;
vz = vz * val;
end
end
vx = vx/M ;
vy = vy/M;
vz = vz/M;
dx(i) = x(i + 3 * M );
dx(i + 1) = x(i + 3 * M + 1);
dx(i + 2) = x(i + 3 * M + 2);
dx(i + 3 * M ) = vx;
dx(i + 3 * M + 1 ) = vy;
dx(i + 3 * M + 2 ) = vz;
end
end

To perform a code, it's a good idea to analyze it with the Profiler:
https://es.mathworks.com/help/matlab/ref/profile.html
Anyway, I see some things you could change. First, avoid to draw in each iteration. Do it in the end or draw every M interations.
Secondly, avoid using a loop to define the graphic. Plot it with vector type input data.
Thirdly, ode45 can do the iterations for you and can return you a vector with all the interations. Use it to plot.

Some tips:
use tic and toc to measure the execution time of your code (lines, functions, etc.)
preassign arrays and avoid letting them grow in loops
do not use plot in a loop. Try to use low-level commands and change the value of exisiting lines by refering to their handles (like for example handle = line([0 1][2 3][4 5]) and then set(handle,'XDATA',[4 7],'YDATA',[2 6],'ZDATA',[2 4]))
MATLAB is an interpreted language, so your code will run slower in most cases compared to compiled programs

Related

Speeding up code which integrates a 2D gpuArray matrix using Simpson's Rule

I have a (real) 2D gpuArray, which I am using as part of a larger code, and now am trying to also integrate the array using the Composite Simpson Rule inside my main loop (several 10000 iterations at least). A MWE looks like the following:
%%%%%%%%%%%%%%%%%% MAIN CODE %%%%%%%%%%%%%%%%%%
Ny = 501; % Dimensions of matrix M
Nx = 503; %
dx = 0.1; % Grid spacings
dy = 0.2; %
M = rand(Ny, Nx, 'gpuArray'); % Initialise a matrix
for k = 1:10000
% M = function1(M) % Apply some other functions to M
% ... etc ...
I = simpsons_integration_2D(M, dx, dy, Nx, Ny); % Now integrate M
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%% Integrator %%%%%%%%%%%%%%%%%
function I = simpsons_integration_2D(F, dx, dy, Nx, Ny)
% Integrate the 2D function F with Nx columns and Ny rows, and grid spacings
% dx and dy using Simpson's rule.
% Integrate along x direction (vertically) --> IX is a vector afterwards
sX = sum( F(:,1:2:Nx-2) + 4*F(:,2:2:(Nx-1)) + F(:,3:2:Nx) , 2);
IX = dx/3 * sX;
% Integrate along y direction --> I is a scalar afterwards
sY = sum( IX(1:2:Ny-2) + 4*IX(2:2:(Ny-1)) + IX(3:2:Ny) , 1);
I = dy/3 * sY;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The operation of performing the integration is around 850 µs, which is currently a significant part of my code. This was measured using
f = #() simpsons_integration_2D(M, dx, dy, Nx, Ny);
t = gputimeit(f)
Is there a way to reduce the execution time for integrating the gpuArray matrix?
(The graphics card is the Nvidia Quadro P4000)
Many thanks
Assuming that the matrix has odd dimensions here is a way to optimize the function:
function I = simpsons_integration_2D(F, dx, dy, Nx, Ny)
sX = 2 * sum(F,2) + 2 * sum (F(:,2:2:(Nx-1)),2) - F(:,1) - F(:,Nx);
sY = dx/3 * (2 * sum(sX) + 2 * sum (sX(2:2:(Ny-1))) - sX(1) - sX(Ny));
I = dy/3 * sY;
end
EDIT
A more optimized solution using matrix multiplication:
function I = simpsons_integration_2D2(F, dx, dy, Nx, Ny)
mx = repmat (2, Nx, 1);
mx(2:2:(Nx-1)) = 4;
mx(1) = 1;
mx(Nx) = 1;
my = repmat (2, 1, Ny);
my(2:2:(Ny-1)) = 4;
my(1) = 1;
my(Ny) = 1;
I = (dx*dy/9) * (my * (F * mx));
end
If Nx and Ny are the same you need to compute only one of them mx or my:
function I = simpsons_integration_2D2(F, dx, dy, Nx, Ny)
mx = repmat (2, Nx, 1);
mx(2:2:(Nx-1)) = 4;
mx(1) = 1;
mx(Nx) = 1;
I = (dx*dy/9) * (mx.' * (F * mx));
end
If Nx and Ny are constant you can precompute mx outside the function and pass it as a function argument:
function I = simpsons_integration_2D2(F, dx, dy, mx)
I = (dx*dy/9) * (mx.' * (F * mx));
end
EDIT:
If both mx and my can be precomputed the problem is reduced to a dot product:
m = reshape (my.' .* mx.', 1, []);
function I = simpsons_integration_2D3(F, dx, dy, m)
I = (dx*dy/9) * (m * F(:));
end
Well I cannot test this for you but there are a few things that may help.
First the axis 1 and then the the axis 2 may make some difference in terms of locality of the modified terms (I don't know if to better or to worse).
function I = variation1(F, dx, dy, Nx, Ny)
% Sum each term separately, prevents the creation of a big intermediate matrix
% Multiply outside the summation does only Ny multiplications by 4 instead of Ny*Nx/2
sX = sum(F(:,1:2:Nx-2), 2) + 4*sum(F(:,2:2:(Nx-1)), 2) + sum(F(:,3:2:Nx), 2);
IX = dx/3 * sX;
sY = sum(IX(1:2:Ny-2), 1) + 4*sum(IX(2:2:(Ny-1)), 1) + sum(IX(3:2:Ny) , 1);
I = dy/3 * sY;
end
function I = variation2(F, dx, dy, Nx, Ny)
% a.
% Sum each term separately, prevents the creation of a big intermediate matrix
% Multiply outside the summation does only Ny multiplications by 4 instead of Ny*Nx/2
% b.
% Notice that the terms 2:3:NX-2 appear in two summations
% Saves Nx*Ny/2 additions at the expense of Ny multiplications by 2
sX = 2*sum(F(:,3:2:Nx-2), 2) + 4*sum(F(:,2:2:(Nx-1)), 2) + F(:,1) + F(:,Nx);
% saves Ny multiplications by moving the constant factor after the next sum
sY = 2*sum(sX(3:2:Ny-2), 1) + 4*sum(sX(2:2:(Ny-1)), 1) + sX(1) + sX(Ny);
I = (dy*dy/9) * sY;
end
function I = alternate_simpsons_integration_2D(F, dx, dy, Nx, Ny)
% Integrate the 2D function F with Nx columns and Ny rows, and grid spacings
% dx and dy using Simpson's rule.
% Notice that sum(F(:,1:2:Nx-2) + F(:,3:2:Nx)) have all but the end poitns repeated.
IX = 4*sum(F(:,2:2:Nx-1), 2) + 2 * sum(F(:,3:2:Nx-2) , 2) + F(:,1) + F(:,Nx);
disp(size(IX))
% Integrate along y direction --> I is a scalar afterwards
sY = 4*sum(IX(2:2:Ny-1)) + 2*sum(IX(3:2:Ny-2)) + IX(1) + IX(Ny);
I = dy*dy/9 * sY;
end
If you think it is better to make a single summation then you can do using the formula 2*(sum(2*F(2:2:end-1) + F(1:2:end-2)) + F(end) - F(1) that gives the same result but has Nx*Ny/2 less additions on the first integration. But these options have to be tested in your environment.
Transposed implementation
function I = transposed_simpsons_integration_2D(F, dx, dy, Nx, Ny)
sY = 2*sum(2*F(2:2:end-1, :) + F(1:2:end-2, :), 1) + F(end, :) - F(1, :);
sX = 2*sum(2*sY(2:2:end-1) + sY(1:2:end-2)) + sY(end) - sY(1);
I = dy*dy/9 * sX;
end
Using octave (usually slower than matlab) I get a run time of ~400us per iteration with. This is not the type of workload that will be interesting to run on the GPU. For comparison, randn about 10 times slower than this function.

solving integral in rectangular method in matlab

This is my code to make an integration in the rectangular method in the matlab
f=#(x) (x^(1/2))
a = 1
b = 10
% step size
h = 0.25
n = 0 % the counter
xn= a + (n * h)
%%
%Rectangle Method:
s=0
for i =0:n-1
s = s + f(xn)
end
Rectangle = h * s
the answer should be around 20, but i'm getting 29.5
what's the problem?
Two errors:
1) xn is not updated.
2) number of points n is set to zero.
There are other minor issues which I did not fix, e.g. right and left boundary points should both contribute to the sum with weight 1/2.
Quick fix:
f=#(x) (x^(1/2));
a = 1;
b = 10 ;
% step size
h = 0.25;
n = (b-a)/h; % the counter
%%
%Rectangle Method:
s=0;
for i =0:n-1
xn= a + (i * h);
s = s + f(xn);
end
Rectangle = h * s;

Matlab: How to fix Least Mean square algorithm code

I am studying about Least Mean Square algorithm and saw this code. Based on the algorithm steps, the calculation of the the error and weight updates looks alright. However, it fails to give the correct output. Can somebody please help in fixing the problem? The code has been taken from:
http://www.mathworks.com/matlabcentral/fileexchange/35670-lms-algorithm-implementation/content/lms.m
clc
close all
clear all
N=input('length of sequence N = ');
t=[0:N-1];
w0=0.001; phi=0.1;
d=sin(2*pi*[1:N]*w0+phi);
x=d+randn(1,N)*0.5;
w=zeros(1,N);
mu=input('mu = ');
for i=1:N
e(i) = d(i) - w(i)' * x(i);
w(i+1) = w(i) + mu * e(i) * x(i);
end
for i=1:N
yd(i) = sum(w(i)' * x(i));
end
subplot(221),plot(t,d),ylabel('Desired Signal'),
subplot(222),plot(t,x),ylabel('Input Signal+Noise'),
subplot(223),plot(t,e),ylabel('Error'),
subplot(224),plot(t,yd),ylabel('Adaptive Desired output')
EDIT
The code from the answer :
N = 200;
M = 5;
w=zeros(M,N);
mu=0.2;%input('mu = ');
y(1) = 0.0;
y(2) = 0.0;
for j = 3:N
y(j) = 0.95*y(j-1) - 0.195*y(j-2);
end
x = y+randn(1,N)*0.5;
%x= y;
d = y;
for i=(M+1):N
e(i) = d(i) - x((i-(M)+1):i)*w(:,i);
w(:,i+1) = w(:,i) + mu * e(i) * x((i-(M)+1):i)';
end
for i=(M+1):N
yd(i) = x((i-(M)+1):i)*w(:,i);
end
The weight matrix w which stores the coefficients are all zero, meaning that the LMS equations are not working correctly.
I also do not find any mistake in your code. But I doubt that this algorithm is suitable for this kind of noise. You will get better results when using a filter of higher order (M in this case):
M = 5;
w=zeros(M,N);
mu=0.2;%input('mu = ');
for i=(M+1):N
e(i) = d(i) - x((i-(M)+1):i)*w(:,i);
w(:,i+1) = w(:,i) + mu * e(i) * x((i-(M)+1):i)';
end
for i=(M+1):N
yd(i) = x((i-(M)+1):i)*w(:,i);
end
N=input('length of sequence N = ');
t=[0:N-1];
w0=0.001; phi=0.1;
d=sin(2*pi*[1:N]*w0+phi);
x=d+randn(1,N)*0.5;
w=zeros(1,N);
mu=input('mu = ');
for i=1:N
yd(i)=w*x';
e(i) = d(i) - w * x';
for m=1:N
w(m) = w(m) + mu * e(i) * x(m);
end
end
subplot(221),plot(t,d),ylabel('Desired Signal'),
sub plot(222),plot(t,x),ylabel('Input Signal+Noise'),
subplot(223),plot(t,e),ylabel('Error'),
subplot(224),plot(t,yd),ylabel('Adaptive Desired output')
What you were missing was the multiplication of error term in a single iteration with each sample of input and separate update of weights

matlab and matrix dimensions

Got one more problem with matrix multiplication in Matlab. I have to plot Taylor polynomials for the given function. This question is similar to my previous one (but this time, the function is f: R^2 -> R^3) and I can't figure out how to make the matrices in order to make it work...
function example
clf;
M = 40;
N = 20;
% domain of f(x)
x1 = linspace(0,2*pi,M).'*ones(1,N);
x2 = ones(M,1)*linspace(0,2*pi,N);
[y1,y2,y3] = F(x1,x2);
mesh(y1,y2,y3,...
'facecolor','w',...
'edgecolor','k');
axis equal;
axis vis3d;
axis manual;
hold on
% point for our Taylor polynom
xx1 = 3;
xx2 = 0.5;
[yy1,yy2,yy3] = F(xx1,xx2);
% plots one discrete point
plot3(yy1,yy2,yy3,'ro');
[y1,y2,y3] = T1(xx1,xx2,x1,x2);
mesh(y1,y2,y3,...
'facecolor','w',...
'edgecolor','g');
% given function
function [y1,y2,y3] = F(x1,x2)
% constants
R=2; r=1;
y1 = (R+r*cos(x2)).*cos(x1);
y2 = (R+r*cos(x2)).*sin(x1);
y3 = r*sin(x2);
function [y1,y2,y3] = T1(xx1,xx2,x1,x2)
dy = [
-(R + r*cos(xx2))*sin(xx1) -r*cos(xx1)*sin(xx2)
(R + r*cos(xx2))*cos(xx1) -r*sin(xx1)*sin(xx2)
0 r*cos(xx2) ];
y = F(xx1, xx2) + dy.*[x1-xx1; x2-xx2];
function [y1,y2,y3] = T2(xx1,xx2,x1,x2)
% ?
I know that my code is full of mistakes (I just need to fix my T1 function). dy represents Jacobian matrix (total derivation of f(x) - I hope I got it right...). I am not sure how would the Hessian matrix in T2 look, by I hope I will figure it out, I'm just lost in Matlab...
edit: I tried to improve my formatting - here's my Jacobian matrix
[-(R + r*cos(xx2))*sin(xx1), -r*cos(xx1)*sin(xx2)...
(R + r*cos(xx2))*cos(xx1), -r*sin(xx1)*sin(xx2)...
0, r*cos(xx2)];
function [y1,y2,y3]=T1(xx1,xx2,x1,x2)
R=2; r=1;
%derivatives
y1dx1 = -(R + r * cos(xx2)) * sin(xx1);
y1dx2 = -r * cos(xx1) * sin(xx2);
y2dx1 = (R + r * cos(xx2)) * cos(xx1);
y2dx2 = -r * sin(xx1) * sin(xx2);
y3dx1 = 0;
y3dx2 = r * cos(xx2);
%T1
[f1, f2, f3] = F(xx1, xx2);
y1 = f1 + y1dx1*(x1-xx1) + y1dx2*(x2-xx2);
y2 = f2 + y2dx1*(x1-xx1) + y2dx2*(x2-xx2);
y3 = f3 + y3dx1*(x1-xx1) + y3dx2*(x2-xx2);

Recomposing vector input to algorithm from matrix output

I've written some code to implement an algorithm that takes as input a vector q of real numbers, and returns as an output a complex matrix R. The Matlab code below produces a plot showing the input vector q and the output matrix R.
Given only the complex matrix output R, I would like to obtain the input vector q. Can I do this using least-squares optimization? Since there is a recursive running sum in the code (rs_r and rs_i), the calculation for a column of the output matrix is dependent on the calculation of the previous column.
Perhaps a non-linear optimization can be set up to recompose the input vector q from the output matrix R?
Looking at this in another way, I've used an algorithm to compute a matrix R. I want to run the algorithm "in reverse," to get the input vector q from the output matrix R.
If there is no way to recompose the starting values from the output, thereby treating the problem as a "black box," then perhaps the mathematics of the model itself can be used in the optimization? The program evaluates the following equation:
The Utilde(tau, omega) is the output matrix R. The tau (time) variable comprises the columns of the response matrix R, whereas the omega (frequency) variable comprises the rows of the response matrix R. The integration is performed as a recursive running sum from tau = 0 up to the current tau timestep.
Here are the plots created by the program posted below:
Here is the full program code:
N = 1001;
q = zeros(N, 1); % here is the input
q(1:200) = 55;
q(201:300) = 120;
q(301:400) = 70;
q(401:600) = 40;
q(601:800) = 100;
q(801:1001) = 70;
dt = 0.0042;
fs = 1 / dt;
wSize = 101;
Glim = 20;
ginv = 0;
R = get_response(N, q, dt, wSize, Glim, ginv); % R is output matrix
rows = wSize;
cols = N;
figure; plot(q); title('q value input as vector');
ylim([0 200]); xlim([0 1001])
figure; imagesc(abs(R)); title('Matrix output of algorithm')
colorbar
Here is the function that performs the calculation:
function response = get_response(N, Q, dt, wSize, Glim, ginv)
fs = 1 / dt;
Npad = wSize - 1;
N1 = wSize + Npad;
N2 = floor(N1 / 2 + 1);
f = (fs/2)*linspace(0,1,N2);
omega = 2 * pi .* f';
omegah = 2 * pi * f(end);
sigma2 = exp(-(0.23*Glim + 1.63));
sign = 1;
if(ginv == 1)
sign = -1;
end
ratio = omega ./ omegah;
rs_r = zeros(N2, 1);
rs_i = zeros(N2, 1);
termr = zeros(N2, 1);
termi = zeros(N2, 1);
termr_sub1 = zeros(N2, 1);
termi_sub1 = zeros(N2, 1);
response = zeros(N2, N);
% cycle over cols of matrix
for ti = 1:N
term0 = omega ./ (2 .* Q(ti));
gamma = 1 / (pi * Q(ti));
% calculate for the real part
if(ti == 1)
Lambda = ones(N2, 1);
termr_sub1(1) = 0;
termr_sub1(2:end) = term0(2:end) .* (ratio(2:end).^-gamma);
else
termr(1) = 0;
termr(2:end) = term0(2:end) .* (ratio(2:end).^-gamma);
rs_r = rs_r - dt.*(termr + termr_sub1);
termr_sub1 = termr;
Beta = exp( -1 .* -0.5 .* rs_r );
Lambda = (Beta + sigma2) ./ (Beta.^2 + sigma2); % vector
end
% calculate for the complex part
if(ginv == 1)
termi(1) = 0;
termi(2:end) = (ratio(2:end).^(sign .* gamma) - 1) .* omega(2:end);
else
termi = (ratio.^(sign .* gamma) - 1) .* omega;
end
rs_i = rs_i - dt.*(termi + termi_sub1);
termi_sub1 = termi;
integrand = exp( 1i .* -0.5 .* rs_i );
if(ginv == 1)
response(:,ti) = Lambda .* integrand;
else
response(:,ti) = (1 ./ Lambda) .* integrand;
end
end % ti loop
No, you cannot do so unless you know the "model" itself for this process. If you intend to treat the process as a complete black box, then it is impossible in general, although in any specific instance, anything can happen.
Even if you DO know the underlying process, then it may still not work, as any least squares estimator is dependent on the starting values, so if you do not have a good guess there, it may converge to the wrong set of parameters.
It turns out that by using the mathematics of the model, the input can be estimated. This is not true in general, but for my problem it seems to work. The cumulative integral is eliminated by a partial derivative.
N = 1001;
q = zeros(N, 1);
q(1:200) = 55;
q(201:300) = 120;
q(301:400) = 70;
q(401:600) = 40;
q(601:800) = 100;
q(801:1001) = 70;
dt = 0.0042;
fs = 1 / dt;
wSize = 101;
Glim = 20;
ginv = 0;
R = get_response(N, q, dt, wSize, Glim, ginv);
rows = wSize;
cols = N;
cut_val = 200;
imagLogR = imag(log(R));
Mderiv = zeros(rows, cols-2);
for k = 1:rows
val = deriv_3pt(imagLogR(k,:), dt);
val(val > cut_val) = 0;
Mderiv(k,:) = val(1:end-1);
end
disp('Running iteration');
q0 = 10;
q1 = 500;
NN = cols - 2;
qout = zeros(NN, 1);
for k = 1:NN
data = Mderiv(:,k);
qout(k) = fminbnd(#(q) curve_fit_to_get_q(q, dt, rows, data),q0,q1);
end
figure; plot(q); title('q value input as vector');
ylim([0 200]); xlim([0 1001])
figure;
plot(qout); title('Reconstructed q')
ylim([0 200]); xlim([0 1001])
Here are the supporting functions:
function output = deriv_3pt(x, dt)
% Function to compute dx/dt using the 3pt symmetrical rule
% dt is the timestep
N = length(x);
N0 = N - 1;
output = zeros(N0, 1);
denom = 2 * dt;
for k = 2:N0
output(k - 1) = (x(k+1) - x(k-1)) / denom;
end
function sse = curve_fit_to_get_q(q, dt, rows, data)
fs = 1 / dt;
N2 = rows;
f = (fs/2)*linspace(0,1,N2); % vector for frequency along cols
omega = 2 * pi .* f';
omegah = 2 * pi * f(end);
ratio = omega ./ omegah;
gamma = 1 / (pi * q);
termi = ((ratio.^(gamma)) - 1) .* omega;
Error_Vector = termi - data;
sse = sum(Error_Vector.^2);