I have three nested for loops, and I would like to vectorize the two inner loops if possible.
for t=1:size(datesdaily1,1)
    for i=1:size(secids,1)
        acc=0;   % running sum (avoids shadowing the built-in sum)
        if inc(t,i)==1
            for j=1:size(secids,1)
                if inc(t,j)==1
                    acc=acc+weig1(t,j)*sqrt(Rates(t,j))*rhoneutral(i,j);
                end
            end
            b(t,i)=sqrt(Rates(t,i))*acc/MRates(t,1);
        end
    end
end
Any idea how to accomplish that? Here weig1, inc and Rates are size(datesdaily1,1)-by-size(secids,1) matrices, and rhoneutral is a size(secids,1)-by-size(secids,1) matrix.
I tried, but I was not able to figure out how to do it.
Actual full code:
for t=1:size(datesdaily1,1)
    rho=NaN(size(secids,1),size(secids,1));
    aux=datesdaily1(t,1);
    windowlenght=252;
    index=find(datesdaily==aux);
    auxret=dailyret(index-windowlenght+1:index,:);
    numerator=0;
    denominator=0;
    auxret(:,any(isnan(auxret))) = NaN;
    rho = corr(auxret, 'rows','pairwise');
    rho1 = 1 - rho;
    w = weig1(t,:) .* sqrt(Rates(t,:));
    x = w.' * w;
    y = x .* rho;
    z = x .* rho1;
    numerator = numerator + nansum(nansum(y));
    denominator = denominator + nansum(nansum(z));
    if not(denominator==0)
        alpha(t,1)=-(MRates(t,1)-numerator)/denominator;
        %Stocks included
        inc(t,:)=not(isnan(weig1(t,:).*diag(rho)'.*Rates(t,:)));
        rhoneutral=rho-alpha(t,1).*(1-rho);
        for i=1:size(secids,1)
            acc=0;   % running sum (avoids shadowing the built-in sum)
            if inc(t,i)==1
                for j=1:size(secids,1)
                    if inc(t,j)==1
                        acc=acc+weig1(t,j)*sqrt(Rates(t,j))*rhoneutral(i,j);
                    end
                end
                bet(t,i)=sqrt(Rates(t,i))*acc/MRates(t,1);
            end
        end
        check(t,1)=nansum(weig1(t,:).*bet(t,:));
    end
end
One vectorized approach, using fast matrix multiplication in MATLAB:
%// Mask of valid calculations
mask = inc==1;
%// Store the square root of Rates, since only sqrt(Rates) is used
sqRates = sqrt(Rates);
%// Use mask to set invalid positions in weig1 and sqRates to zeros
weig1masked = weig1.*mask;
sqRates = sqRates.*mask;
%// Perform the sum calculations using matrix multiplication.
%// This is where the magic happens!!
sum_vals = (weig1masked.*sqRates)*rhoneutral.';
%// Perform the outermost loop calculations for the final output
b_vect = bsxfun(@rdivide,sum_vals.*sqRates,MRates);
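To check that the matrix-multiplication version reproduces the loop, here is a small self-contained comparison. The sizes M, N and the random inputs are arbitrary choices for this sketch, not taken from the question.

```matlab
% Random test data; sizes are arbitrary for this check
M = 7; N = 5;
weig1 = rand(M, N);
inc = rand(M, N) > 0.5;
Rates = rand(M, N);
rhoneutral = rand(N, N);
MRates = rand(M, 1);

% Reference: the original triple loop (b stays 0 where inc is 0)
b = zeros(M, N);
for t = 1:M
  for i = 1:N
    if inc(t, i) == 1
      acc = 0;
      for j = 1:N
        if inc(t, j) == 1
          acc = acc + weig1(t, j) * sqrt(Rates(t, j)) * rhoneutral(i, j);
        end
      end
      b(t, i) = sqrt(Rates(t, i)) * acc / MRates(t, 1);
    end
  end
end

% Vectorized version from the answer
mask = inc == 1;
sqRates = sqrt(Rates) .* mask;                            % zero where excluded
sum_vals = (weig1 .* mask .* sqRates) * rhoneutral.';     % all inner j-sums at once
b_vect = bsxfun(@rdivide, sum_vals .* sqRates, MRates);   % divide row t by MRates(t)
```

This comparison uses finite inputs. If Rates or rhoneutral can hold NaN at excluded positions, multiplying by a zero mask still yields NaN (0*NaN is NaN), so those entries would have to be zeroed with logical indexing instead.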
Benchmarking
Here's a benchmark test dedicated to @Dmitry Grigoryev, for the doubts he raised about the performance of vectorization:
M = 200;
N = 200;
weig1 = rand(M,N);
inc = rand(M,N)>0.5;
Rates = rand(M,N);
rhoneutral = rand(N,N);
MRates = rand(M,1);
disp('--------------------------- With Original Approach')
tic
%// Code from the original approach
toc
disp('--------------------------- With DmitryGrigoryev Approach')
tic
%// Code from the DmitryGrigoryev's solution
toc
disp('--------------------------- With Much-Hated Vectorized Approach')
tic
%// Proposed matrix-multiplication approach in this solution
toc
Runtimes -
--------------------------- With Original Approach
Elapsed time is 0.104084 seconds.
--------------------------- With DmitryGrigoryev Approach
Elapsed time is 3.562170 seconds.
--------------------------- With Much-Hated Vectorized Approach
Elapsed time is 0.002058 seconds.
Posting runtimes for bigger data sizes might just be too embarrassing for the loopy approaches. Way to go, vectorization!!
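In the full code, rhoneutral is recomputed inside the t loop and inc comes from NaN checks, so excluded stocks can carry NaN values, and a zero mask does not remove them. A minimal sketch for replacing only the inner i/j loops for one date is to restrict the product to the included stocks with logical indexing. The row vectors weig1_t, Rates_t, inc_t and the scalar MRates_t are hypothetical stand-ins for weig1(t,:), Rates(t,:), inc(t,:) and MRates(t,1), filled with made-up data.

```matlab
% Made-up data for a single date t; names are hypothetical stand-ins
N = 6;
weig1_t = rand(1, N);
Rates_t = rand(1, N);
rhoneutral = rand(N, N);
MRates_t = rand();
inc_t = logical([1 0 1 1 0 1]);
% Excluded stocks may carry NaNs, as can happen in the full code
Rates_t(~inc_t) = NaN;
rhoneutral(:, ~inc_t) = NaN;

% Reference: the inner i/j loops from the question
betLoop = zeros(1, N);
for i = 1:N
  if inc_t(i)
    acc = 0;
    for j = 1:N
      if inc_t(j)
        acc = acc + weig1_t(j) * sqrt(Rates_t(j)) * rhoneutral(i, j);
      end
    end
    betLoop(i) = sqrt(Rates_t(i)) * acc / MRates_t;
  end
end

% Vectorized: keep only included stocks so NaNs never enter the product
k = inc_t;
w = weig1_t(k) .* sqrt(Rates_t(k));   % 1-by-nk weights
s = rhoneutral(k, k) * w.';           % nk-by-1 weighted sums over included j
betVec = zeros(1, N);
betVec(k) = sqrt(Rates_t(k)) .* s.' / MRates_t;
```

Inside the full loop the same idea would presumably read k = inc(t,:)==1; w = weig1(t,k).*sqrt(Rates(t,k)); bet(t,k) = sqrt(Rates(t,k)).*(rhoneutral(k,k)*w.').'/MRates(t,1);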
Related
I have the following code for an 8-dimensional empirical copula that creates an 8-D array, but I only need its diagonal, which is named EC in this code. Since this code is very slow, is there any way I can get EC without computing all of ecop?
function EC = ecopula8d(x)
[m n] = size(x);
y = sort(x);
for r=1:m
for q=1:m
for p=1:m
for o=1:m
for l=1:m
for k=1:m
for j=1:m
for i=1:m
ecop(i,j,k,l,o,p,q,r) = sum( (x(:,1)<=y(i,1)).*(x(:,2)<=y(j,2)).*(x(:,3)<=y(k,3)).*(x(:,4)<=y(l,4))...
.*(x(:,5)<=y(o,5)).*(x(:,6)<=y(p,6)).*(x(:,7)<=y(q,7)).*(x(:,8)<=y(r,8)) )/(m+1);
end
end
end
end
end
end
end
end
for i=1:m
EC(i,1)=ecop(i,i,i,i,i,i,i,i);
end
I haven't checked whether your initial computation is correct (i.e. compared your implementation with the formula in the Wikipedia article), but your code should be equivalent to:
[m n] = size(x);
y = sort(x);
for i = 1:m
EC(i,1) = sum(all(bsxfun(@le, x, y(i,:)), 2), 1)/(m+1);
end
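To sanity-check the claimed equivalence, the sketch below evaluates only the diagonal entries ecop(i,i,...,i) with the question's original 8-way product and compares them with the all()/bsxfun version. The sample size m = 20 and the random data are arbitrary choices for this check.

```matlab
% Small random sample: m observations of an 8-dimensional variable (sizes arbitrary)
m = 20;
x = rand(m, 8);
y = sort(x);

% Diagonal entries ecop(i,i,...,i) from the question's 8-way product
ECref = zeros(m, 1);
for i = 1:m
  ECref(i) = sum( (x(:,1)<=y(i,1)).*(x(:,2)<=y(i,2)).*(x(:,3)<=y(i,3)).*(x(:,4)<=y(i,4)) ...
    .*(x(:,5)<=y(i,5)).*(x(:,6)<=y(i,6)).*(x(:,7)<=y(i,7)).*(x(:,8)<=y(i,8)) )/(m+1);
end

% Answer's version: all() across the 8 columns replaces the product
EC = zeros(m, 1);
for i = 1:m
  EC(i, 1) = sum(all(bsxfun(@le, x, y(i,:)), 2), 1)/(m+1);
end
```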
You can employ a completely vectorized solution with bsxfun:
EC = squeeze(sum(all(bsxfun(@le,x,permute(y,[3 2 1])),2),1))/(m+1)
The magic here happens with the use of permute enabling us to go full throttle on vectorization.
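The shapes involved in the permute trick are easiest to see on a small example. The sizes m = 5 and n = 3 are arbitrary, and the loop at the end is the partially vectorized version used as a reference.

```matlab
% Small random sample; sizes are arbitrary
m = 5; n = 3;
x = rand(m, n);
y = sort(x);

yp = permute(y, [3 2 1]);        % 1-by-n-by-m: page i holds y(i,:)
cmp = bsxfun(@le, x, yp);        % m-by-n-by-m: page i compares every row of x with y(i,:)
counts = sum(all(cmp, 2), 1);    % 1-by-1-by-m: rows of x that are <= y(i,:) in every column
EC = squeeze(counts) / (m+1);    % m-by-1

% Reference: the partially vectorized loop
ECloop = zeros(m, 1);
for i = 1:m
  ECloop(i, 1) = sum(all(bsxfun(@le, x, y(i,:)), 2), 1) / (m+1);
end
```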
Here's a benchmark comparing the runtime efficiency of this approach with the other, partially vectorized bsxfun approach:
x = rand(2000,2000);
y = sort(x);
m = size(x,1);
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('----------- With completely vectorized solution')
tic
EC1 = squeeze(sum(all(bsxfun(@le,x,permute(y,[3 2 1])),2),1))/(m+1);
toc, clear EC1
disp('----------- With partial vectorized solution')
tic
for i = 1:m
EC2(i,1) = sum(all(bsxfun(@le, x, y(i,:)), 2), 1)/(m+1);
end
toc
The runtimes thus obtained were -
----------- With completely vectorized solution
Elapsed time is 2.883594 seconds.
----------- With partial vectorized solution
Elapsed time is 4.752508 seconds.
One can preallocate for the partially vectorized approach:
EC2 = zeros(m,1);
for i = 1:m
EC2(i,1) = sum(all(bsxfun(@le, x, y(i,:)), 2), 1)/(m+1);
end
The runtimes thus obtained weren't that different though -
----------- With completely vectorized solution
Elapsed time is 2.963835 seconds.
----------- With partial vectorized solution
Elapsed time is 4.620455 seconds.
One of the approaches I would use is to reshape the N-D array into a square 2-D matrix (if possible) and then simply extract the diagonal terms, since the entries ecop(i,i,...,i) also lie on the diagonal of the reshaped matrix:
EC=diag(reshape(ecop,size1,size2));
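A caveat on the reshape/diag idea: for an 8-D array of size m in every dimension, reshaping to m^4-by-m^4 does put every ecop(i,i,...,i) on the diagonal, but diag() then returns m^4 values, not just the m wanted ones. The wanted entries can also be read directly by linear indexing, since ecop(i,i,...,i) sits at linear index 1 + (i-1)*(1 + m + ... + m^7). A small sketch with a random stand-in for ecop (m = 3 is an arbitrary choice):

```matlab
% Stand-in for the 8-D copula array; m = 3 keeps it small (3^8 = 6561 entries)
m = 3;
ecop = rand(m, m, m, m, m, m, m, m);

% ecop(i,i,...,i) sits at linear index 1 + (i-1)*(1 + m + ... + m^7)
step = sum(m .^ (0:7));
EC = ecop(1 + (0:m-1) * step).';   % m-by-1 column

% The reshape/diag route returns m^4 diagonal values; the wanted ones are a subset
D = diag(reshape(ecop, m^4, m^4));
```

Either way ecop must already exist, so this only helps with extraction, not with the cost of computing ecop itself.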
I would suggest trying Python, because NumPy has a really nice and efficient linear algebra package for dealing with N-D arrays. MATLAB is pretty slow in adding and updating its libraries.
Let's say I have two 3-D arrays, A and B:
A = rand(4,5,3);
B = rand(4,5,6)
I want to apply the function 'corr2' to calculate the correlation coefficients.
corr2(A(:,:,1),B(:,:,1))
corr2(A(:,:,1),B(:,:,2))
corr2(A(:,:,1),B(:,:,3))
...
corr2(A(:,:,1),B(:,:,6))
...
corr2(A(:,:,2),B(:,:,1))
corr2(A(:,:,2),B(:,:,2))
...
corr2(A(:,:,3),B(:,:,6))
How can I vectorize this to avoid using loops?
I hacked into the m-file for corr2 to create a customized vectorized version that works with 3-D arrays. Proposed here are two approaches with bsxfun (of course!).
Approach #1
szA = size(A);
szB = size(B);
a1 = bsxfun(@minus,A,mean(mean(A)));
b1 = bsxfun(@minus,B,mean(mean(B)));
sa1 = sum(sum(a1.*a1));
sb1 = sum(sum(b1.*b1));
v1 = reshape(b1,[],szB(3)).'*reshape(a1,[],szA(3));
v2 = sqrt(sb1(:)*sa1(:).');
corr3_out = v1./v2; %// desired output
corr3_out stores corr2 results between all 3D slices of A and B.
Thus, for A = rand(4,5,3), B = rand(4,5,6), we would have corr3_out as a 6x3 array.
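A quick check of Approach #1 against a loop. Since corr2 computes the Pearson correlation over all elements of two equally sized matrices, the reference below uses corrcoef on the flattened slices instead, which avoids needing the Image Processing Toolbox; that substitution is a choice made for this sketch.

```matlab
A = rand(4,5,3);
B = rand(4,5,6);
szA = size(A);
szB = size(B);
a1 = bsxfun(@minus, A, mean(mean(A)));     % subtract each slice's mean
b1 = bsxfun(@minus, B, mean(mean(B)));
sa1 = sum(sum(a1.*a1));                    % 1-by-1-by-3 sums of squares
sb1 = sum(sum(b1.*b1));                    % 1-by-1-by-6 sums of squares
v1 = reshape(b1,[],szB(3)).' * reshape(a1,[],szA(3));   % 6-by-3 cross products
v2 = sqrt(sb1(:) * sa1(:).');
corr3_out = v1 ./ v2;

% Reference: Pearson correlation of each pair of flattened slices via corrcoef
ref = zeros(szB(3), szA(3));
for ii = 1:szA(3)
  for jj = 1:szB(3)
    c = corrcoef(reshape(A(:,:,ii),[],1), reshape(B(:,:,jj),[],1));
    ref(jj, ii) = c(1, 2);
  end
end
```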
Approach #2
A slightly different approach, which saves a few calls to sum and mean by using reshape instead:
szA = size(A);
szB = size(B);
dim12 = szA(1)*szA(2);
a1 = bsxfun(@minus,A,mean(reshape(A,dim12,1,[])));
b1 = bsxfun(@minus,B,mean(reshape(B,dim12,1,[])));
v1 = reshape(b1,[],szB(3)).'*reshape(a1,[],szA(3));
v2 = sqrt(sum(reshape(b1.*b1,dim12,[])).'*sum(reshape(a1.*a1,dim12,[])));
corr3_out = v1./v2; %// desired output
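On R2016b or later (or in Octave), where implicit expansion is available, Approach #2 can be written without bsxfun. This is a sketch of that variant, which flattens each slice into a column before centering; it is checked against the same corrcoef-based reference.

```matlab
A = rand(4,5,3);
B = rand(4,5,6);
dim12 = size(A,1) * size(A,2);
a1 = reshape(A, dim12, []);      % dim12-by-3, one slice per column
b1 = reshape(B, dim12, []);      % dim12-by-6
a1 = a1 - mean(a1, 1);           % center each column (implicit expansion)
b1 = b1 - mean(b1, 1);
corr3_out = (b1.' * a1) ./ sqrt(sum(b1.^2, 1).' * sum(a1.^2, 1));   % 6-by-3

% Reference: corrcoef on each pair of flattened slices
ref = zeros(size(B,3), size(A,3));
for ii = 1:size(A,3)
  for jj = 1:size(B,3)
    c = corrcoef(reshape(A(:,:,ii),[],1), reshape(B(:,:,jj),[],1));
    ref(jj, ii) = c(1, 2);
  end
end
```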
Benchmarking
Benchmark code -
%// Create random input arrays
N = 55; %// datasize scaling factor
A = rand(4*N,5*N,3*N);
B = rand(4*N,5*N,6*N);
%// Warm up tic/toc
for k = 1:50000
tic(); elapsed = toc();
end
%// Run vectorized and loopy approach codes on the input arrays
%// 1. Vectorized approach
%//... solution code (Approach #2) posted earlier
%// clear variables used
%// 2. Loopy approach
tic
s_A=size(A,3);
s_B=size(B,3);
out1 = zeros(s_B,s_A);
for ii=1:s_A
for jj=1:s_B
out1(jj,ii)=corr2(A(:,:,ii),B(:,:,jj));
end
end
toc
Results -
-------------------------- With BSXFUN vectorized solution
Elapsed time is 1.231230 seconds.
-------------------------- With loopy approach
Elapsed time is 139.934719 seconds.
MATLAB-JIT lovers show some love here! :)
Here are some examples, yet none is faster than plain loops. As Divakar says in a comment below, this is not a vectorized solution.
CODE:
A = rand(4,5,1000);
B = rand(4,5,200);
s_A=size(A,3);
s_B=size(B,3);
%%% option 1
tic
corr_AB=cell2mat(arrayfun(@(indx1) arrayfun(@(indx2) corr2(A(:,:,indx1),B(:,:,indx2)),1:s_B),1:s_A,'UniformOutput',false));
toc
%%% option 2
tic
indx1=repmat(1:s_A,s_B,1);
indx1=indx1(:);
indx2=repmat(1:s_B,1,s_A);
indx2=indx2(:);
indx=[indx1,indx2];
corr_AB=arrayfun(@(i) corr2(A(:,:,indx(i,1)),B(:,:,indx(i,2))),1:size(indx,1));
toc
%%% option 3
tic
a=1;
for i=1:s_A
for j=1:s_B
corr_AB(a)=corr2(A(:,:,i),B(:,:,j));
a=a+1;
end
end
toc
OUTPUT:
Elapsed time is 9.655696 seconds.
Elapsed time is 9.398979 seconds.
Elapsed time is 8.489744 seconds.
The following is Octave code (part of k-means):
centroidSum = zeros(K, 1);
valueSum = zeros(K, n);
for i = 1 : m
for j = 1 : K
if(idx(i) == j)
centroidSum(j) = centroidSum(j) + 1;
valueSum(j, :) = valueSum(j, :) + X(i, :);
end
end
end
The code works; is it possible to vectorize it?
It is easy to vectorize code without an if statement,
but how can code that contains an if statement be vectorized?
I assume the purpose of the code is to compute the centroids of subsets of a set of m data points in an n-dimensional space, where the points are stored in a matrix X (points x coordinates) and the vector idx specifies for each data point the subset (1 ... K) the point belongs to. Then a partial vectorization is:
centroid = zeros(K, n)
for j = 1 : K
centroid(j, :) = mean(X(idx == j, :));
end
The if is eliminated by indexing, in particular logical indexing: idx == j gives a boolean array which indicates which data points belong to subset j.
I think it might be possible to get rid of the second for-loop, too, but this would result in very convoluted, unintelligible code.
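A small check that the logical-indexing version matches the original loop (followed by the division that turns sums into means). The data are made up; mean(..., 1) is used instead of plain mean so that a cluster holding a single point is still averaged over rows rather than across its columns.

```matlab
% Made-up problem; every cluster is non-empty
K = 3; n = 2; m = 12;
X = rand(m, n);
idx = [1 2 3 1 2 3 1 2 3 1 2 3];

% Loop with if (from the question), followed by the division
centroidSum = zeros(K, 1);
valueSum = zeros(K, n);
for i = 1:m
  for j = 1:K
    if idx(i) == j
      centroidSum(j) = centroidSum(j) + 1;
      valueSum(j, :) = valueSum(j, :) + X(i, :);
    end
  end
end
ref = bsxfun(@rdivide, valueSum, centroidSum);

% Logical indexing replaces the if
centroid = zeros(K, n);
for j = 1:K
  centroid(j, :) = mean(X(idx == j, :), 1);   % mean over the rows of cluster j
end
```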
Brief introduction and solution code
This could be one fully vectorized approach, based on:
accumarray: for accumulating the summations, as done when calculating valueSum. This also introduces a technique for using accumarray on a 2-D matrix along a given dimension, which isn't possible with it in a straightforward manner.
bsxfun: for calculating linear indices across all columns for the matching row indices from idx.
Here's the implementation -
%// Store no. of columns in X for frequent usage later on
ncols = size(X,2);
%// Find indices in idx that are within [1:k] range, call them as labels
%// Also, find their locations in that range array, call those as pos
[pos,id] = ismember(idx,1:K);
labels = id(pos);
%// OR with bsxfun: [pos,labels] = find(bsxfun(@eq,idx(:),1:K));
%// Find all labels, i.e. across all columns of X
all_labels = bsxfun(@plus,labels(:),[0:ncols-1]*K);
%// Get truncated X corresponding to all indices matches across all columns
X_cut = X(pos,:);
%// Accumulate summations within each column based on the labels.
%// Note that accumarray doesn't accept matrices, so we were required
%// to create all_labels that had same labels within each column and
%// offset at constant intervals for consecutive columns
acc1 = accumarray(all_labels(:),X_cut(:));
%// Regularise accumulated array and reshape back to a 2D array version
acc1_reg2D = [acc1 ; zeros(K*ncols - numel(acc1),1)];
valueSum = reshape(acc1_reg2D,[],ncols);
centroidSum = histc(labels,1:K); %// Get labels counts as centroid sums
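A small consistency check of the accumarray technique against the original loop. There are two deliberate deviations from the answer's code: the target size [K*ncols 1] is passed to accumarray instead of padding acc1 afterwards, and the counts are taken with accumarray rather than histc. The made-up idx includes 5, which lies outside 1:K and is ignored by both versions.

```matlab
% Small made-up example
K = 4; n = 3; m = 10;
X = rand(m, n);
idx = [1 5 2 4 4 1 3 5 2 1];   % the 5s are outside 1:K

% Reference: the original loop
centroidSumRef = zeros(K, 1);
valueSumRef = zeros(K, n);
for i = 1:m
  for j = 1:K
    if idx(i) == j
      centroidSumRef(j) = centroidSumRef(j) + 1;
      valueSumRef(j, :) = valueSumRef(j, :) + X(i, :);
    end
  end
end

% accumarray technique from the answer
ncols = size(X, 2);
[pos, id] = ismember(idx, 1:K);                            % pos: entries of idx inside 1:K
labels = id(pos);                                          % their cluster labels
all_labels = bsxfun(@plus, labels(:), (0:ncols-1)*K);      % shift labels by K per column
X_cut = X(pos, :);                                         % matching rows of X
acc1 = accumarray(all_labels(:), X_cut(:), [K*ncols 1]);   % size argument replaces padding
valueSum = reshape(acc1, [], ncols);                       % K-by-ncols sums
centroidSum = accumarray(labels(:), 1, [K 1]);             % counts per cluster
```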
Benchmarking code
%// Datasize parameters
K = 5000;
n = 5000;
m = 5000;
idx = randi(9,1,m);
X = rand(m,n);
disp('----------------------------- With Original Approach')
tic
centroidSum1 = zeros(K,1);
valueSum1 = zeros(K, n);
for i = 1 : m
for j = 1 : K
if(idx(i) == j)
centroidSum1(j) = centroidSum1(j) + 1;
valueSum1(j, :) = valueSum1(j, :) + X(i, :);
end
end
end
toc, clear valueSum1 centroidSum1
disp('----------------------------- With Proposed Approach')
tic
%// ... Code from the earlier section
toc
Runtime results
----------------------------- With Original Approach
Elapsed time is 1.235412 seconds.
----------------------------- With Proposed Approach
Elapsed time is 0.379133 seconds.
Not sure about its runtime performance, but here's a non-convoluted vectorized implementation:
b = idx(:) == 1:K;   % m-by-K logical mask (needs implicit expansion, R2016b+)
centroids = (b' * X) ./ sum(b)';
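To see what the mask looks like, here is a tiny example with made-up data. idx is given as a row vector, which is why idx(:) is used to obtain an m-by-K mask; each centroid is then checked against a plain per-cluster mean.

```matlab
% Tiny made-up example: 6 points in 2-D, K = 3 clusters
K = 3;
X = rand(6, 2);
idx = [2 1 3 3 1 2];               % row vector
b = idx(:) == 1:K;                 % 6-by-3 logical: b(i,j) true when point i is in cluster j
centroids = (b' * X) ./ sum(b)';   % per-cluster sums divided by per-cluster counts
```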
Vectorizing the calculation makes a huge difference in performance.
Benchmarking the original code, the partial vectorization from A. Donda, and the full vectorization from Tom gave me the following results:
Original Code: Elapsed time is 1.327877 seconds.
Partial Vectorization: Elapsed time is 0.630767 seconds.
Full Vectorization: Elapsed time is 0.021129 seconds.
Benchmarking code here:
%// Datasize parameters
K = 5000;
n = 5000;
m = 5000;
idx = randi(9,1,m);
X = rand(m,n);
fprintf('\nOriginal Code: ')
tic
centroidSum1 = zeros(K,1);
valueSum1 = zeros(K, n);
for i = 1 : m
for j = 1 : K
if(idx(i) == j)
centroidSum1(j) = centroidSum1(j) + 1;
valueSum1(j, :) = valueSum1(j, :) + X(i, :);
end
end
end
centroids = valueSum1 ./ centroidSum1;
toc, clear valueSum1 centroidSum1 centroids
fprintf('\nPartial Vectorization: ')
tic
centroids = zeros(K,n);
for k = 1:K
centroids(k,:) = mean( X(idx == k, :) );
end
toc, clear centroids
fprintf('\nFull Vectorization: ')
tic
centroids = zeros(K,n);
b = idx(:) == 1:K;   % idx is 1-by-m here, so idx(:) is needed for an m-by-K mask
centroids = (b' * X) ./ sum(b)';
toc
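Note that I added an extra line to the original code to element-wise divide valueSum1 by centroidSum1, so that every version produces the same output. Also note that idx is a 1-by-m row vector in this benchmark, so the mask must be built as idx(:) == 1:K to get an m-by-K matrix; with a row idx and m == K, idx == 1:K compares elementwise instead, so the Full Vectorization timing above may not reflect the intended computation.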
Finally, I know this isn't strictly an "answer", but I don't have enough reputation to add a comment, and I thought the benchmarking figures would be useful to anyone who is learning MATLAB (like myself) and needs some extra motivation to master vectorization.
I have many points and I want to build a distance matrix, i.e. the distance from every point to all the other points, but I don't want to use loops because they take too much time.
Is there a better way to build this matrix?
This is my loop; for a setl of size 10000x3, this method takes a lot of time :(
for i=1:size(setl,1)
for j=1:size(setl,1)
dist = sqrt((xl(i)-xl(j))^2+(yl(i)-yl(j))^2+...
(zl(i)-zl(j))^2);
distanceMatrix(i,j) = dist;
end
end
How about using some linear algebra? The distance of two points can be computed from the inner product of their position vectors,
$D(x, y) = \lVert y - x \rVert = \sqrt{x^\top x + y^\top y - 2\, x^\top y}$,
and the inner product for all pairs of points can be obtained through a simple matrix operation.
x = [xl(:)'; yl(:)'; zl(:)'];
IP = x' * x;
d = sqrt(max(bsxfun(@plus, diag(IP), diag(IP)') - 2 * IP, 0));   % max(...,0) clips tiny negative round-off
For 10000 points, I get the following timing results:
ahmad's loop + shoelzer's preallocation: 7.8 seconds
Dan's vectorized indices: 5.3 seconds
Mohsen's bsxfun: 1.5 seconds
my solution: 1.3 seconds
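A small check of the inner-product formula against the question's double loop, on random points (n = 40 is arbitrary). Because the formula subtracts nearly equal quantities, entries close to zero, such as the diagonal, can pick up round-off on the order of the square root of machine epsilon, hence the loose tolerance and the max(..., 0) clamp in this sketch.

```matlab
% Random points; n is arbitrary
n = 40;
xl = rand(n, 1); yl = rand(n, 1); zl = rand(n, 1);
x = [xl(:)'; yl(:)'; zl(:)'];   % 3-by-n, one point per column
IP = x' * x;                    % n-by-n inner products
sq = diag(IP);                  % squared norms
d = sqrt(max(bsxfun(@plus, sq, sq') - 2 * IP, 0));   % clamp round-off negatives before sqrt

% Reference: the question's double loop (preallocated)
dref = zeros(n);
for i = 1:n
  for j = 1:n
    dref(i, j) = sqrt((xl(i)-xl(j))^2 + (yl(i)-yl(j))^2 + (zl(i)-zl(j))^2);
  end
end
```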
You can use bsxfun, which is generally a faster solution:
s = [xl(:) yl(:) zl(:)];
d = sqrt(sum(bsxfun(@minus, permute(s, [1 3 2]), permute(s, [3 1 2])).^2,3));
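The permute calls turn the n-by-3 point list into an n-by-1-by-3 and a 1-by-n-by-3 array, so bsxfun expands them into an n-by-n-by-3 array of coordinate differences. A small example with random points (n = 6 is arbitrary) showing the intermediate shapes:

```matlab
% Random points; n is arbitrary
n = 6;
xl = rand(n,1); yl = rand(n,1); zl = rand(n,1);
s = [xl(:) yl(:) zl(:)];         % n-by-3
P = permute(s, [1 3 2]);         % n-by-1-by-3
Q = permute(s, [3 1 2]);         % 1-by-n-by-3
diffs = bsxfun(@minus, P, Q);    % n-by-n-by-3: diffs(i,j,:) = s(i,:) - s(j,:)
d = sqrt(sum(diffs.^2, 3));      % n-by-n distance matrix
```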
You can do this fully vectorized like so:
n = numel(xl);
[X, Y] = meshgrid(1:n,1:n);
Ix = X(:);
Iy = Y(:);
d = reshape(sqrt((xl(Ix)-xl(Iy)).^2+(yl(Ix)-yl(Iy)).^2+(zl(Ix)-zl(Iy)).^2), n, n);
If you look at Ix and Iy (try it for a small dataset such as n = 3), together they list every combination of linear indices into your coordinate vectors. Now you can just do each subtraction in one shot!
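For n = 3, the index vectors look like this; listing [Iy Ix] side by side shows that every (row, column) position of the 3-by-3 result appears exactly once:

```matlab
n = 3;
[X, Y] = meshgrid(1:n, 1:n);
Ix = X(:);         % [1 1 1 2 2 2 3 3 3]'
Iy = Y(:);         % [1 2 3 1 2 3 1 2 3]'
pairs = [Iy Ix];   % row k: (row, column) that element k fills after reshape(..., n, n)
```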
However, combining the suggestions of shoelzer and Jost gives an almost identical performance boost:
n = 50;
xl = rand(n,1);
yl = rand(n,1);
zl = rand(n,1);
tic
for t = 1:100
distanceMatrix = zeros(n); %// Preallocation
for i=1:n
for j=min(i+1,n):n %// Taking advantage of symmetry
distanceMatrix(i,j) = sqrt((xl(i)-xl(j))^2+(yl(i)-yl(j))^2+(zl(i)-zl(j))^2);
end
end
d1 = distanceMatrix + distanceMatrix'; %'
end
toc
%// Vectorized solution that creates linear indices using meshgrid
tic
for t = 1:100
[X, Y] = meshgrid(1:n,1:n);
Ix = X(:);
Iy = Y(:);
d2 = reshape(sqrt((xl(Ix)-xl(Iy)).^2+(yl(Ix)-yl(Iy)).^2+(zl(Ix)-zl(Iy)).^2), n, n);
end
toc
Returns:
Elapsed time is 0.023332 seconds.
Elapsed time is 0.024454 seconds.
But if I change n to 500 then I get
Elapsed time is 1.227956 seconds.
Elapsed time is 2.030925 seconds.
This just goes to show that you should always benchmark solutions in MATLAB before writing off loops as slow! In this case, depending on the scale of your problem, loops could be significantly faster.
Be sure to preallocate distanceMatrix. Your loops will run much, much faster and vectorization probably isn't needed. Even if you do it, there may not be any further speed increase.
Since R2016b, MATLAB supports implicit expansion, also known as implicit broadcasting (see also the notes in the bsxfun() documentation).
Hence a fast way to compute the distance matrix is:
function [ mDistMat ] = CalcDistanceMatrix( mA, mB )
mDistMat = sum(mA .^ 2).' - (2 * mA.' * mB) + sum(mB .^ 2);   % squared Euclidean distances; apply sqrt() for plain distances
end
Here the points are stored along the columns of the set.
In your case, mA = mB.
Have a look at my Calculate Distance Matrix Project.
z - matrix of doubles, size Nx2;
x - matrix of doubles, size Nx2;
sup = x(i, :);
phi(1, i) = {@(z) exp(-g * sum((z - sup(ones([size(z, 1) 1]),:)) .^ 2, 2))};
This is a radial basis function (RBF) for logistic regression. Here is the formula:
$\phi_i(z) = \exp\left(-g \, \lVert z - x_i \rVert^2\right)$, where $x_i$ is the $i$-th row of x.
I need your advice: can I optimize this formula? It is called millions of times, and it takes a lot of time.
It seems in your recent edits, you introduced some syntax errors, but I think I understood what you were trying to do (from the first version).
Instead of using REPMAT or indexing to repeat the vector x(i,:) to match the rows of z, consider using the efficient BSXFUN function:
rbf(:,i) = exp( -g .* sum(bsxfun(@minus,z,x(i,:)).^2,2) );
The above obviously still has to be evaluated in a loop over every row of x.
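A sketch comparing the question's cell array of anonymous functions with the bsxfun column version, on random data (N = 50 and g = 0.5 are arbitrary). The phi and rbf names follow the question and answer; the loop that fills phi is an assumption about how the question's snippet is used.

```matlab
% Random data; N and g are arbitrary choices
N = 50; g = 0.5;
x = rand(N, 2);
z = rand(N, 2);

% Question's version: one anonymous function per row of x, kept in a cell array
phi = cell(1, N);
for i = 1:N
  sup = x(i, :);   % captured by value when the handle is created
  phi{1, i} = @(z) exp(-g * sum((z - sup(ones([size(z, 1) 1]), :)) .^ 2, 2));
end

% Answer's version: bsxfun subtracts x(i,:) from every row of z
rbf = zeros(N, N);
for i = 1:N
  rbf(:, i) = exp(-g .* sum(bsxfun(@minus, z, x(i,:)).^2, 2));
end

% Largest difference between the two versions over all columns
maxErr = 0;
for i = 1:N
  maxErr = max(maxErr, max(abs(phi{i}(z) - rbf(:, i))));
end
```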
You can go one step further and use PDIST2 to compute the Euclidean distance between every pair of rows in z and x:
%# some random data
X = rand(10,2);
Z = rand(10,2);
g = 0.5;
%# one-line solution
rbf = exp(-g .* pdist2(Z,X,'euclidean').^2);
Now every value in the matrix, rbf(i,j), corresponds to the function value between z(i,:) and x(j,:).
EDIT:
I timed the different methods, here is the code I used:
%# some random data
N = 5000;
X = rand(N,2);
Z = rand(N,2);
g = 0.5;
%# PDIST2
tic
rbf1 = exp(-g .* pdist2(Z,X,'euclidean').^2);
toc
%# BSXFUN+loop
tic
rbf2 = zeros(N,N);
for j=1:N
rbf2(:,j) = exp( -g .* sum(bsxfun(@minus,Z,X(j,:)).^2,2) );
end
toc
%# REPMAT+loop
tic
rbf3 = zeros(N,N);
for j=1:N
rbf3(:,j) = exp( -g .* sum((Z-repmat(X(j,:),[N 1])).^2,2) );
end
toc
%# check if results are equal
all( abs(rbf1(:)-rbf2(:)) < 1e-15 )
all( abs(rbf2(:)-rbf3(:)) < 1e-15 )
The results:
Elapsed time is 2.108313 seconds. # PDIST2
Elapsed time is 1.975865 seconds. # BSXFUN
Elapsed time is 2.706201 seconds. # REPMAT
Amro has mentioned some really good methods, but bsxfun can be exploited further by reshaping one of the matrices.
>> type r.m
N = 5000;
X = rand(N,2);
Z = rand(N,2);
g = 0.5;
%BSXFUN+loop
tic
rbf2 = zeros(N,N);
for j=1:N
rbf2(:,j) = exp( -g .* sum(bsxfun(@minus,Z,X(j,:)).^2,2) );
end
toc
tic
diffs = bsxfun(@minus, reshape(X', [1, 2, N]), Z);
dist = reshape(sum(diffs.^2, 2), [N, N]);
rbf3 = exp(-g .* dist);
toc
>> r
Elapsed time is 2.235527 seconds.
Elapsed time is 0.877833 seconds.
>> r
Elapsed time is 2.253943 seconds.
Elapsed time is 1.047295 seconds.
>> r
Elapsed time is 2.234132 seconds.
Elapsed time is 0.856302 seconds.
>> max(abs(rbf2(:) - rbf3(:)))
ans =
0
You want to subtract every row of X from every row of Z. This is usually straightforward when one of them is a vector and the other a matrix. When both are matrices, we can do it by reshaping one of them so that each page of the resulting 3-D volume contains just one of its rows; bsxfun then subtracts that row from every row of the other matrix. Here I chose X, but Z can be used interchangeably with X.
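The shapes in the reshape-based version can be checked on a tiny example (N = 4 is arbitrary). Page j of the volume holds X(j,:), and entry (i,j) of the result is the squared distance between Z(i,:) and X(j,:):

```matlab
% Tiny random example; N and g are arbitrary
N = 4; g = 0.5;
X = rand(N, 2);
Z = rand(N, 2);

Xv = reshape(X', [1, 2, N]);               % 1-by-2-by-N: page j holds X(j,:)
diffs = bsxfun(@minus, Xv, Z);             % N-by-2-by-N: page j is X(j,:) minus every row of Z
dist = reshape(sum(diffs.^2, 2), [N, N]);  % dist(i,j): squared distance between Z(i,:) and X(j,:)
rbf = exp(-g .* dist);

% Reference: explicit double loop
rbfRef = zeros(N, N);
for i = 1:N
  for j = 1:N
    rbfRef(i, j) = exp(-g * sum((Z(i,:) - X(j,:)).^2));
  end
end
```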