Matlab formula optimization: Radial Basis Function - matlab

z - matrix of doubles, size Nx2;
x - matrix of doubles, size Nx2;
sup = x(i, :);
phi(1, i) = {#(z) exp(-g * sum((z - sup(ones([size(z, 1) 1]),:)) .^ 2, 2))};
this is a Radial Basis Function (RBF) for logistic regression. Here is the formula:
I need your advice, can i optimize this formula? coz it calls millions times, and it takes a lot of time...

It seems in your recent edits, you introduced some syntax errors, but I think I understood what you were trying to do (from the first version).
Instead of using REPMAT or indexing to repeat the vector x(i,:) to match the rows of z, consider using the efficient BSXFUN function:
rbf(:,i) = exp( -g .* sum(bsxfun(#minus,z,x(i,:)).^2,2) );
The above obviously loops over every row of x
You can go one step further, and use the PDIST2 to compute the euclidean distance between every pair of rows in z and x:
%# some random data
X = rand(10,2);
Z = rand(10,2);
g = 0.5;
%# one-line solution
rbf = exp(-g .* pdist2(Z,X,'euclidean').^2);
Now every value in the matrix: rbf(i,j) corresponds to the function value between z(i,:) and x(j,:)
EDIT:
I timed the different methods, here is the code I used:
%# some random data
N = 5000;
X = rand(N,2);
Z = rand(N,2);
g = 0.5;
%# PDIST2
tic
rbf1 = exp(-g .* pdist2(Z,X,'euclidean').^2);
toc
%# BSXFUN+loop
tic
rbf2 = zeros(N,N);
for j=1:N
rbf2(:,j) = exp( -g .* sum(bsxfun(#minus,Z,X(j,:)).^2,2) );
end
toc
%# REPMAT+loop
tic
rbf3 = zeros(N,N);
for j=1:N
rbf3(:,j) = exp( -g .* sum((Z-repmat(X(j,:),[N 1])).^2,2) );
end
toc
%# check if results are equal
all( abs(rbf1(:)-rbf2(:)) < 1e-15 )
all( abs(rbf2(:)-rbf3(:)) < 1e-15 )
The results:
Elapsed time is 2.108313 seconds. # PDIST2
Elapsed time is 1.975865 seconds. # BSXFUN
Elapsed time is 2.706201 seconds. # REPMAT

Amro has mentioned some really good methods. But the bsxfun can be further exploited by reshaping one of the matrices.
>> type r.m
N = 5000;
X = rand(N,2);
Z = rand(N,2);
g = 0.5;
%BSXFUN+loop
tic
rbf2 = zeros(N,N);
for j=1:N
rbf2(:,j) = exp( -g .* sum(bsxfun(#minus,Z,X(j,:)).^2,2) );
end
toc
tic
diffs = bsxfun(#minus, reshape(X', [1, 2, N]), Z);
dist = reshape(sum(diffs.^2, 2), [N, N]);
rbf3 = exp(-g .* dist);
toc
>> r
Elapsed time is 2.235527 seconds.
Elapsed time is 0.877833 seconds.
>> r
Elapsed time is 2.253943 seconds.
Elapsed time is 1.047295 seconds.
>> r
Elapsed time is 2.234132 seconds.
Elapsed time is 0.856302 seconds.
>> max(abs(rbf2(:) - rbf3(:)))
ans =
0
You want to subtract every row of X from every row of Z. This usually is straight forward when one of them is a vector and the other is a matrix. But if both of them are matrices, we can do this by making sure that each matrix in the volume contains just one vector. Here I chose X, but Z can be used interchangeably with X.

Related

How to calculate 2-norm of a matrix efficiently?

Suppose I have a matrix A. I want to calculate its 2-norm/spectral norm. How can I calculate this efficiently?
I know 2-norm of a matrix is equal to its largest singular value. So, result of the following MATLAB code will be zero
>> [u,s,v]=svd(A,'econ');
norm(A,2)-s(1,1)
But to know 2-norm I have to calculate SVD of full matrix A, is there any efficient way to calculate 2-norm? Answer in form of MATLAB code will be much appereciated.
This example with norm and random data
A = randn(2000,2000);
tic;
n1 = norm(A)
toc;
gives
n1 = 89.298
Elapsed time is 2.16777 seconds.
You can try eigs to find only one (the largest) eigenvalue of the symmetric matrix A'*A (or A*A' if it is smaller for A rectangular). It uses a Lanczos iteration method.
tic;
B = A'*A; % symmetric positive-definite. B = A*A' if it is smaller
n2 = sqrt(eigs(B, 1)),
toc
it outputs:
n2 = 89.298
Elapsed time is 0.311942 seconds.
If you don't want to use norm or eigs, and your matrix A has good properties (singular values properly separated), you can try to approximate it with a power iteration method:
tic;
B = A'*A; % or B = A*A' if it is smaller
x = B(:,1); % example of starting point, x will have the largest eigenvector
x = x/norm(x);
for i = 1:200
y = B*x;
y = y/norm(y);
% norm(x - y); % <- residual, you can try to use it to stop iteration
x = y;
end;
n3 = sqrt(mean(B*x./x)) % translate eigenvalue of B to singular value of A
toc
which for the same random matrix (not particularly good properties) gives a ~0.1% accurate solution:
n3 = 89.420
Elapsed time is 0.428032 seconds.

running mean of a matrix in matlab

Given the nxN matrix A. I want to find the running mean of the rows of the matrix. For this I have done:
mean = cumsum(A, 2);
for k = 1:N
mean(:, k) = mean(:, k)/k;
end
but for large N this takes a while. Is there a more efficient way to do this in MATLAB?
Note: zeeMonkeez's solution is fastest according to some rough benchmarks at the end of my post.
How about
N = 1000;
A = rand(N, N);
m = cumsum(A, 2);
m1 = zeros(size(m));
tic
for j = 1:1000;
for k = 1:N
m1(:, k) = m(:, k)/k;
end
end
toc
Elapsed time is 6.971112 seconds.
tic
for j = 1:1000
n = repmat(1:N, N, 1);
m2 = m./n;
end
toc
Elapsed time is 2.471035 seconds.
Here, you transform your problem into a matrix multiplication (instead of dividing element-wise, divide one matrix by the other point-wise). The matrix you would like to divide by looks like this:
[1, 2, 3, ..., N;
1, 2, .....
.
.
1, 2, .... ]
which you can get using repmat.
EDIT: BENCHMARK
bsxfun as used by #zeeMonkeez is even faster. Both, for the above case (10% difference on my system) and also for a larger matrix (N = 10000) for which case my version actually performs worst (35 sec, vs 30 from OP and 23 from zeeMonkeez's solution).
On my machine, bsxfun is even faster:
N = 1000;
A = rand(N, N);
m = cumsum(A, 2);
tic
for j = 1:1000;
m2 = bsxfun(#rdivide, m, 1:N);
end
toc
Elapsed time: 1.555507 seconds.
bsxfun avoids having to allocate memory for the divisor as repmat does.

Vectorize loop to increase efficiency

I have a 3 for loops and I would like if possible to vectorize the two inner loops.
for t=1:size(datesdaily1)
for i=1:size(secids,1)
sum=0;
if inc(t,i)==1
for j=1:size(secids,1)
if inc(t,j)==1
sum=sum+weig1(t,j)*sqrt(Rates(t,j))*rhoneutral(i,j);
end
end
b(t,i)=sqrt(Rates(t,i))*sum/MRates(t,1);
end
end
end
Any idea on how to accomplish that? Here 'weig', 'inc' and 'Rates' are (size(datesdaily1) by size(secids,1)) matrixes and 'rhoneutral' is a (size(secids,1) by size(secids,1)) matrix.
I tried but I was not able to figure out how to do it ...
Actual full code:
for t=1:size(datesdaily1)
rho=NaN(size(secids,1),size(secids,1));
aux=datesdaily1(t,1);
windowlenght=252;
index=find(datesdaily==aux);
auxret=dailyret(index-windowlenght+1:index,:);
numerator=0;
denominator=0;
auxret(:,any(isnan(auxret))) = NaN;
rho = corr(auxret, 'rows','pairwise');
rho1 = 1 - rho;
w = weig1(t,:) .* sqrt(Rates(t,:));
x = w.' * w;
y = x .* rho;
z = x .* rho1;
numerator = numerator + nansum(nansum(y));
denominator = denominator + nansum(nansum(z));;
if not(denominator==0)
alpha(t,1)=-(MRates(t,1)-numerator)/denominator;
%Stocks included
inc(t,:)=not(isnan(weig1(t,:).*diag(rho)'.*Rates(t,:)));
rhoneutral=rho-alpha(t,1).*(1-rho);
for i=1:size(secids,1)
sum=0;
if inc(t,i)==1
for j=1:size(secids,1)
if inc(t,j)==1
sum=sum+weig1(t,j)*sqrt(Rates(t,j))*rhoneutral(i,j);
end
end
bet(t,i)=sqrt(Rates(t,i))*sum/MRates(t,1);
end
end
check(t,1)=nansum(weig1(t,:).*bet(t,:));
end
end
One vectorized approach using fast matrix multiplication in MATLAB -
%// Mask of valid calculations
mask = inc==1
%// Store square root of Rates which seem to be used rather than Rates itself
sqRates = sqrt(Rates)
%// Use mask to set invalid positions in weig1 and sqRates to zeros
weig1masked = weig1.*mask
sqRates = sqRates.*mask
%// Perform the sum calculations using matrix multiplication.
%// This is where the magic happens!!
sum_vals = (weig1masked.*sqRates)*rhoneutral' %//'
%// Perform the outermost loop calculations for the final output
b_vect = bsxfun(#rdivide,sum_vals.*sqRates,MRates)
Benchmarking
Here's a benchmark test specially dedicated to #Dmitry Grigoryev for the doubts put on vectorization for performance -
M = 200;
N = 200;
weig1 = rand(M,N);
inc = rand(M,N)>0.5;
Rates = rand(M,N);
rhoneutral = rand(N,N);
MRates = rand(M,1);
disp('--------------------------- With Original Approach')
tic
%// Code from the original approach
toc
disp('--------------------------- With DmitryGrigoryev Approach')
tic
%// Code from the DmitryGrigoryev's solution
toc
disp('--------------------------- With Much-Hated Vectorized Approach')
tic
%// Proposed matrix-multiplication approach in this solution
toc
Runtimes -
--------------------------- With Original Approach
Elapsed time is 0.104084 seconds.
--------------------------- With DmitryGrigoryev Approach
Elapsed time is 3.562170 seconds.
--------------------------- With Much-Hated Vectorized Approach
Elapsed time is 0.002058 seconds.
Posting runtimes for bigger datasizes might just be too embarrasing for loopy approches, way to go vectorization!!

How to Build a Distance Matrix without a Loop (Vectorization)?

I have many points and I want to build distance matrix i.e. distance of every point with all of other points but I want to don't use from loop because take too time...
Is a better way for building this matrix?
this is my loop: for a setl with size: 10000x3 this method take a lot of my time :(
for i=1:size(setl,1)
for j=1:size(setl,1)
dist = sqrt((xl(i)-xl(j))^2+(yl(i)-yl(j))^2+...
(zl(i)-zl(j))^2);
distanceMatrix(i,j) = dist;
end
end
How about using some linear algebra? The distance of two points can be computed from the inner product of their position vectors,
D(x, y) = ∥y – x∥ = √ (
xT x + yT y – 2 xT y ),
and the inner product for all pairs of points can be obtained through a simple matrix operation.
x = [xl(:)'; yl(:)'; zl(:)'];
IP = x' * x;
d = sqrt(bsxfun(#plus, diag(IP), diag(IP)') - 2 * IP);
For 10000 points, I get the following timing results:
ahmad's loop + shoelzer's preallocation: 7.8 seconds
Dan's vectorized indices: 5.3 seconds
Mohsen's bsxfun: 1.5 seconds
my solution: 1.3 seconds
You can use bsxfun which is generally a faster solution:
s = [xl(:) yl(:) zl(:)];
d = sqrt(sum(bsxfun(#minus, permute(s, [1 3 2]), permute(s, [3 1 2])).^2,3));
You can do this fully vectorized like so:
n = numel(xl);
[X, Y] = meshgrid(1:n,1:n);
Ix = X(:)
Iy = Y(:)
reshape(sqrt((xl(Ix)-xl(Iy)).^2+(yl(Ix)-yl(Iy)).^2+(zl(Ix)-zl(Iy)).^2), n, n);
If you look at Ix and Iy (try it for like a 3x3 dataset), they make every combination of linear indexes possible for each of your matrices. Now you can just do each subtraction in one shot!
However mixing the suggestions of shoelzer and Jost will give you an almost identical performance performance boost:
n = 50;
xl = rand(n,1);
yl = rand(n,1);
zl = rand(n,1);
tic
for t = 1:100
distanceMatrix = zeros(n); %// Preallocation
for i=1:n
for j=min(i+1,n):n %// Taking advantge of symmetry
distanceMatrix(i,j) = sqrt((xl(i)-xl(j))^2+(yl(i)-yl(j))^2+(zl(i)-zl(j))^2);
end
end
d1 = distanceMatrix + distanceMatrix'; %'
end
toc
%// Vectorized solution that creates linear indices using meshgrid
tic
for t = 1:100
[X, Y] = meshgrid(1:n,1:n);
Ix = X(:);
Iy = Y(:);
d2 = reshape(sqrt((xl(Ix)-xl(Iy)).^2+(yl(Ix)-yl(Iy)).^2+(zl(Ix)-zl(Iy)).^2), n, n);
end
toc
Returns:
Elapsed time is 0.023332 seconds.
Elapsed time is 0.024454 seconds.
But if I change n to 500 then I get
Elapsed time is 1.227956 seconds.
Elapsed time is 2.030925 seconds.
Which just goes to show that you should always bench mark solutions in Matlab before writing off loops as slow! In this case, depending on the scale of your solution, loops could be significantly faster.
Be sure to preallocate distanceMatrix. Your loops will run much, much faster and vectorization probably isn't needed. Even if you do it, there may not be any further speed increase.
The latest versions (Since R2016b) of MATLAB support Implicit Broadcasting (See also noted on bsxfun()).
Hence the fastest way for distance matrix is:
function [ mDistMat ] = CalcDistanceMatrix( mA, mB )
mDistMat = sum(mA .^ 2).' - (2 * mA.' * mB) + sum(mB .^ 2);
end
Where the points are along the columns of the set.
In your case mA = mB.
Have a look on my Calculate Distance Matrix Project.

binary matrix of all combination with equal probability of 1 and 0 in matlab

I want to generate a binary matrix, let say (8,1). with equal probability means four 1's and four 0s in a matrix. with different permutation of these elements total 70 combination are possible (e.g 8C4). I want these all possible combination one by one.
please help.
The straighforward answer is:
unique(perms([true(1, N / 2), false(1, N / 2)]), 'rows')
or in a fancier form:
unique(perms(sparse(1, 1:N / 2, true, 1, N)), 'rows')
where N is the length of your vector (N = 8 in your example). However, expect this solution to be very slow for large arrays.
Surprisingly, in this case a faster method is to generate all possible permutations (see here) and eliminate those that do not satisfy the desired criterion:
C = cell(N, 1); %// Preallocate memory
[C{:}] = ndgrid([true, false]); %// Generate N grids of binary values
p = cellfun(#(x){x(:)}, C); %// Convert grids to column vectors
p = [p{:}]; %// Obtain all combinations
p = p(sum(p, 2) == N / 2, :); %// Keep only desired combinations
Benchmark
N = 8;
%// Method #1 (one-liner)
tic
for k = 1:1e3
p = unique(perms(sparse(1, 1:N / 2, true, 1, N)), 'rows');
end
toc
%// Method #2
tic
for k = 1:1e3
C = cell(N, 1);
[C{:}] = ndgrid([true, false]);
p = cellfun(#(x){x(:)}, C);
p = [p{:}];
p = p(sum(p, 2) == N / 2, :);
end
toc
The results I got were:
Elapsed time is 0.858539 seconds. %// Method #1
Elapsed time is 0.803826 seconds. %// Method #2
... and for N = 10:
Elapsed time is 55.3068 seconds. %// Method #1
Elapsed time is 1.03664 seconds. %// Method #2
Not only that nchoosek fails for large values of N, it's also slower.
Here is an even faster way to do it, using the fact that you are looking for a subset of binary number representations:
b = dec2bin(1:2^N-1);
x = b-'0';
x = x(sum(x,2)==N/2,:);
The performance comparison:
N = 8;
% Dennis solution
tic
b = dec2bin(1:2^N-1);
x = b-'0';
x=x(sum(x,2)==N/2,:);
toc
% Eitan Method 2
tic
for k = 1:1e3
C = cell(N, 1);
[C{:}] = ndgrid([true, false]);
p = cellfun(#(x){x(:)}, C);
p = [p{:}];
p = p(sum(p, 2) == N / 2, :);
end
toc
Gives these timings:
Elapsed time is 0.002200 seconds.
Elapsed time is 0.594309 seconds.
Note that the resulting lines will be in different orders for both solutions.