I need to create a square matrix $V$ iteratively, over 100,000+ times per pack.
Doing it the straightforward way takes about 70 s (over 1 minute), and I need to repeat this process for over 100 packs. That's about 1 hour of extra time.
It turns out that when I fill the matrix $V(x,y)$ with a double for loop, MATLAB only uses a single thread. However, my computer has 12 threads, and there should be a way to use all of them to build the matrix faster.
The function is of the form
$V(x,y) = \exp\left((x - \mathrm{variation}_1)^2 + (y - \mathrm{variation}_2)^2\right)$
I thought about using the GPU. However, it turned out that computing with a gpuArray was much slower than the CPU.
I also thought about using parpool. However, not only does it cost extra time to send the matrix to the parallel pool, the workers are also denied direct access to $V$ itself.
How can I get the CPU to compute the matrix using all of its threads?
You should prefer matrix and vector operations over for loops.
If x and y are constant for all cases, you can use meshgrid to generate x and y once.
For example, consider the following code, which uses a double for loop:
v = zeros(10000,10000);
tic;
for x = 1:10000
    for y = 1:10000
        v(x,y) = exp((x/10000).^2 + (y/10000).^2);
    end
end
toc
On my computer it runs in about 11 seconds.
Now, using meshgrid:
%This is done only once
[x,y] = meshgrid((1:10000)/10000,(1:10000)/10000);
tic;
v = exp(x.^2+y.^2);
toc
This takes about 4 seconds, not including the meshgrid call.
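On newer MATLAB releases (R2016b and later), implicit expansion lets you skip meshgrid entirely; a minimal sketch of the same computation:
x = (1:10000)'/10000;    % column vector
y = (1:10000)/10000;     % row vector
tic;
v = exp(x.^2 + y.^2);    % expands to the full 10000x10000 matrix, v(i,j) = exp(x(i)^2 + y(j)^2)
toc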
I'm writing MATLAB code that needs to calculate distances between vectors, and I execute
X = norm(A(:,i)-B(:,j));
%do something with X
%loop over i and j
quite often. Each computation is relatively small, so it is not really suitable for parfor; I thought the best approach would be to implement it with the GPU functions.
I found that pagefun and arrayfun do something like what I want, but they perform element-wise operations, not operations on whole vectors.
So my question is: is there a cleverer way of calculating norms without for loops? And if I do need to use the GPU, what is the best way to do it?
If you need the norms between all pairs of columns of A and B, the fastest way is probably this:
N = 1000;  % number of vectors (columns)
dim = 3;   % number of dimensions
A = rand(dim, N, 'gpuArray');
B = rand(dim, 1, N, 'gpuArray');
C = sqrt(squeeze(sum(bsxfun(@minus, A, B).^2)));  % C(i,j) = norm(A(:,i) - B(:,j))
I get an execution time of 0.16 seconds on the CPU and 0.13 seconds on the GPU.
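For comparison, if you have the Statistics Toolbox, pdist2 computes the same pairwise norms directly; a quick CPU sketch with plain dim-by-N matrices:
A2 = rand(3, 1000);
B2 = rand(3, 1000);
C2 = pdist2(A2', B2');   % pdist2 takes observations as rows; C2(i,j) = norm(A2(:,i) - B2(:,j))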
I'm trying to code a MATLAB program and I have arrived at a point where I need to do the following. I have this equation:
$-\frac{1}{c^2}\int_{x_{LE}}^{x_{TE}} c_l(x)\,(x - X_{cp})\,dx = 0$
I must find the value of the constant $X_{cp}$ (greater than zero) that makes the integral equal to zero.
To do so, I have coded a loop in which the value of $X_{cp}$ advances in small increments on each iteration; the integral is evaluated and checked against zero, and when it reaches zero the loop finishes and that value of $X_{cp}$ is stored.
However, I don't think this is an efficient way to do this task. The running time grows a lot, because the loop is long and has to perform the integration and substitute the integration limits on every iteration.
Is there a smarter way to do this in MATLAB to get better code efficiency?
P.S.: I have used conv() to multiply the two polynomials, since cl(x) and (x - Xcp) are both polynomials.
EDIT: Piece of code.
Xcp = 0.001;
found = false;
while (Xcp <= x_te && ~found)   % Xcp is upper bounded by x_te
    p = [1 -Xcp];               % polynomial (x - Xcp), rebuilt each iteration
    int_cl_p = polyint(conv(cl, p));
    Cm_cp = (-1/c^2) * diff(polyval(int_cl_p, [x_le, x_te]));
    if Cm_cp == 0
        found = true;
    else
        Xcp = Xcp + 0.001;
    end
end
This is the code I used for this section. Another problem is that I have to do it for different cases (different cl functions), which makes the code even slower overall.
As far as I understood, you need to solve the equation for X_CP.
I suggest using a symbolic solver for this. It is not the most efficient way for large polynomials, but for a polynomial of degree 20 it takes less than 1 second. I do not claim this solution is the fastest, but it provides a generic solution to the problem. If your polynomial does not change every iteration, you can reuse this generic solution many times and avoid recomputing the integral.
So, a generic symbolic solution in terms of xLE and xTE is obtained like this:
syms xLE xTE c x xCP
a = 1:20;
%// arbitrary polynomial of degree 20 with random integer coefficients
cl = sum(x.^a .* randi([-100,100],1,20));
tic
eqn = -1/c^2 * int(cl * (x-xCP), x, xLE, xTE) == 0;
xCP = solve(eqn,xCP);
pretty(xCP)
toc
Elapsed time is 0.550371 seconds.
You can then use matlabFunction to find numerical solutions:
xCP_numerical = matlabFunction(xCP);
%// we then just plug xLE = 10 and xTE = 20 values into function
answer = xCP_numerical(10,20)
answer =
19.8038
A slight modification of the code allows you to use it for generic coefficients.
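For instance, a sketch with fully symbolic coefficients (the names a1..a20 are illustrative):
syms xLE xTE c x xCP
ak = sym('a', [1 20]);               % symbolic coefficients a1..a20
cl = sum(x.^(1:20) .* ak);           % generic polynomial of degree 20
eqn = -1/c^2 * int(cl * (x - xCP), x, xLE, xTE) == 0;
xCP_generic = solve(eqn, xCP);       % closed form in terms of a1..a20, xLE, xTE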
Hope that helps
Since the $-1/c^2$ factor is nonzero, you can divide it out and rearrange the condition as
$X_{CP} = \frac{\int_{x_{LE}}^{x_{TE}} x\,c_l(x)\,dx}{\int_{x_{LE}}^{x_{TE}} c_l(x)\,dx}$
and integrate however you fancy. Since $c_l$ is a polynomial of order $N$, if it's defined in MATLAB using the usual notation for polyval, where the coefficients are stored in a vector a such that
$c_l(x) = a_1 x^{N} + a_2 x^{N-1} + \dots + a_N x + a_{N+1}$
then integration is straightforward:
$\int c_l(x)\,dx = \frac{a_1}{N+1} x^{N+1} + \frac{a_2}{N} x^{N} + \dots + \frac{a_N}{2} x^{2} + a_{N+1} x + C$
MATLAB code might look something like this:
int_cl_p = polyint(cl);        % antiderivative of cl(x)
int_cl_x_p = polyint([cl 0]);  % antiderivative of x*cl(x); appending a 0 multiplies cl by x
X_CP = diff(polyval(int_cl_x_p,[x_le,x_te]))/diff(polyval(int_cl_p,[x_le,x_te]));
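To make this concrete, a small worked example with hypothetical test values (cl, x_le, x_te chosen only for illustration):
cl = [1 1];                          % cl(x) = x + 1
x_le = 0;  x_te = 1;
int_cl_p = polyint(cl);              % antiderivative of cl(x)
int_cl_x_p = polyint([cl 0]);        % antiderivative of x*cl(x)
X_CP = diff(polyval(int_cl_x_p,[x_le,x_te])) / diff(polyval(int_cl_p,[x_le,x_te]))
% X_CP = (5/6)/(3/2) = 5/9; check that the original integral vanishes:
residual = diff(polyval(polyint(conv(cl,[1 -X_CP])),[x_le,x_te]))   % ~0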
I'm looking to duplicate a 784x784 matrix in MATLAB along a third axis. The following code seems to work:
mat = reshape(repmat(mat, 1,10000),784,784,10000);
Unfortunately, it takes so long to run that it's useless (changing the 10,000s to 1,000 makes it take a few minutes, and using 10,000 practically freezes my whole machine). Is there a faster way to do this?
For reference, I'm looking to use mvnpdf on 10,000 vectors each of length 784, using the same covariance matrix for each. So my final call looks like
mvnpdf(X,mu,mat)
% size(X) = (10000,784), size(mu) = (10000,784), size(mat) = (784,784,10000)
If there's a way to do this without repeating the covariance matrix 10,000 times, that'd be helpful too. Thanks!
For replication in more than 2 dimensions, you need to supply the replication counts as an array:
out = repmat(mat,[1,1,10000])
Replicating a 784x784 matrix 10,000 times isn't going to take advantage of MATLAB's vectorization, and it is enormously wasteful: a 784x784x10000 double array needs roughly 49 GB of RAM, which is why your machine freezes. Avoiding a for loop also won't help much here, given the following:
The main speedup you can gain here is by computing the inverse of the covariance matrix once, and then computing the pdf yourself. Inverting sigma costs O(n^3), and you would needlessly be doing that 10,000 times. (The square root of the determinant can also be precomputed.) For reference, the pdf of the multivariate normal distribution is computed as follows:
http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Properties
Better to just compute the inverse once, then compute z = x - mu for each value, then z'*S*z for each pdf value, and apply a simple function and a constant. But wait! You can vectorize that, too.
I don't have MATLAB in front of me, but this is basically what you need to do, and it'll run in an instant.
s = inv(sigma);                          % inverse of the covariance, computed once
k = size(x, 2);                          % dimension of each observation (784 here)
c = 0.5*log(det(s)) - (k/2)*log(2*pi);   % log normalizing constant (det(s) = 1/det(sigma))
z = x - mu;                              % 10000 x 784 matrix
p = exp(c - 0.5 .* dot(z*s, z, 2));      % 10000 x 1 vector of pdf values
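A quick sanity check of this formula against mvnpdf on small, hypothetical test data (the sizes and covariance here are illustrative; assumes the Statistics Toolbox):
k = 5;  n = 100;
x = rand(n, k);
mu = zeros(n, k);
R = rand(k);  sigma = R*R' + k*eye(k);     % a well-conditioned SPD covariance
s = inv(sigma);
c = 0.5*log(det(s)) - (k/2)*log(2*pi);     % log normalizing constant
z = x - mu;
p_fast = exp(c - 0.5 .* dot(z*s, z, 2));
p_ref = mvnpdf(x, mu, sigma);              % reference implementation
max(abs(p_fast - p_ref))                   % should be ~1e-15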
I have a 3-dimensional grid, and for each point of the grid I want to evaluate a time-dependent function G(t) for a large number of time steps and then sum G over those time steps. Using 4 for loops, the execution time becomes very large, so I am trying to avoid this using parfor.
A part of my code:
for i = 1:50
    for j = 1:50
        for k = 1:25
            x_in = i*dx;
            y_in = j*dy;
            z_in = k*dz;
            % dx, dy, dz are some fixed values
            r = sqrt((xx-x_in).^2 + (yy-y_in).^2 + (zz-z_in).^2);
            % xx, yy, zz are 50x50x25 matrices generated from meshgrid;
            % r is a 3D matrix over all the points of the grid
            parfor q = 1:100
                t = 0.5*q;
                G(q) = ((a*p)/(t.^1.5)).*(exp(-r.^2/(4*a*t)));
                % a, p are some fixed values
            end
            GG(i,j,k) = sum(G(:));
        end
    end
end
When I use parfor the execution time gets larger, and I am not sure why. Maybe I am not so familiar with sliced and indexed variables in a parfor loop.
My PC's processor has 8 threads and I have 8 GB of DDR3 RAM.
Any help would be great.
Thanks
As has been discussed in a previous question, parfor comes with an overhead, so a loop that is too simple will execute more slowly with parfor.
In your case, the solution may be to parallelize the outermost loops instead:
%# preassign GG
GG = zeros(50,50,25);
%# loop over linear indices into GG
parfor idx = 1:(50*50*25)
    [i,j,k] = ind2sub([50 50 25],idx);
    x_in = i*dx;
    y_in = j*dy;
    z_in = k*dz;
    % dx, dy, dz are some fixed values
    r = sqrt((xx-x_in).^2 + (yy-y_in).^2 + (zz-z_in).^2);
    % xx, yy, zz are 50x50x25 matrices generated from meshgrid
    G = zeros(1,100);  % initialize G inside the loop so parfor can classify it as a temporary
    for q = 1:100
        t = 0.5*q;
        G(q) = ((a*p)/(t.^1.5)).*(exp(-r.^2/(4*a*t)));
        % a, p are some fixed values
    end
    GG(idx) = sum(G(:));
end
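Independently of parfor, the inner time loop can be vectorized away. A sketch, under the assumption that r reduces to a scalar distance per grid point (which the scalar assignment to G(q) suggests was the intent):
t = 0.5 * (1:100);                            % all 100 time steps at once
G = (a*p) ./ t.^1.5 .* exp(-r^2 ./ (4*a*t));  % 1x100 vector of G(t) values
GG(idx) = sum(G);                             % summed contribution at this grid point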
I need to perform lots of evaluations of the form
X(:,i)' * A * X(:,i),   i = 1...n
where X(:,i) is a vector and A is a symmetric matrix. Ostensibly, I can either do this in a loop
for i = 1:n
    z(i) = X(:,i)' * A * X(:,i);
end
which is slow, or vectorise it as
z = diag(X' * A * X)
which wastes RAM unacceptably when X has a lot of columns. Currently I am compromising on
Y = A * X;
for i = 1:n
    z(i) = Y(:,i)' * X(:,i);
end
which is a little faster/lighter, but still seems unsatisfactory.
I was hoping there might be some MATLAB/Scilab idiom or trick to achieve this result more efficiently.
Try this in MATLAB:
z = sum(X.*(A*X));
This gives results equivalent to Federico's suggestion using the function DOT, but should run slightly faster. This is because the DOT function internally computes the result the same way as I did above using the SUM function. However, DOT also has additional input argument checks and extra computation for cases where you are dealing with complex numbers, which is extra overhead you probably don't want or need.
A note on computational efficiency:
Even though the time difference is small between how fast the two methods run, if you are going to be performing the operation many times over it's going to start to add up. To test the relative speeds, I created two 100-by-100 matrices of random values and timed the two methods over many runs to get an average execution time:
METHOD            AVERAGE EXECUTION TIME
-----------------------------------------
Z = sum(X.*Y);    0.0002595 sec
Z = dot(X,Y);     0.0003627 sec
Using SUM instead of DOT therefore reduces the execution time of this operation by about 28% for matrices with around 10,000 elements. The larger the matrices, the more negligible this difference will be between the two methods.
To summarize, if this computation represents a significant bottleneck in how fast your code is running, I'd go with the solution using SUM. Otherwise, either solution should be fine.
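For reference, a minimal sketch of such a timing comparison using timeit (a utility added in later MATLAB releases; absolute numbers will vary by machine):
n = 100;
A = rand(n);  A = A + A';            % random symmetric matrix
X = rand(n);                         % n column vectors of length n
Y = A * X;                           % shared intermediate product
t_sum = timeit(@() sum(X .* Y));     % SUM-based method
t_dot = timeit(@() dot(X, Y));       % DOT-based method
fprintf('sum: %.3g s, dot: %.3g s\n', t_sum, t_dot);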
Try this:
z = dot(X, A*X)
I don't have MATLAB here to test, but it works in Octave, so I expect MATLAB to have an analogous dot() function.
From Octave's help:
-- Function File: dot (X, Y, DIM)
Computes the dot product of two vectors. If X and Y are matrices,
calculate the dot-product along the first non-singleton dimension.
If the optional argument DIM is given, calculate the dot-product
along this dimension.
For completeness, gnovice's answer in Scilab would be
z = sum(X .* Y, 1)'