Function "pdist" in Matlab - matlab

I have an N by 2 matrix called r (N is very large). r holds the positions of points in 2D. I searched for the most optimized way of calculating the distances between points, and found that pdist is the least time-consuming option as long as one doesn't convert its output to a square matrix. I wonder: if I write
D= pdist(r, 'euclidean');
When I need the distance between particles i and j, what is the best way to find it using the vector D? I don't really see any way to do it without using an if.
I know that I can do it by
if (i < j)
    D((i-1)*(N-i/2)+j-i)
end
But as N is very large, this is not efficient. Could anyone help me, please?

I'm using ii and jj as row and column indices into the hypothetical distance matrix M = squareform(D) of size N by N. The result is ind, such that D(ind) equals M(ii,jj).
t = sort([ii, jj]); % temporary variable
ii = t(2); % maximum of ii and jj
jj = t(1); % minimum of ii and jj
t = N-1:-1:1;
ind = sum(t(1:jj-1)) + ii - jj;
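As a quick sanity check (a sketch with a small made-up N, assuming the Statistics Toolbox for pdist and squareform), one can verify that D(ind) picks out the same value that the full distance matrix would give:
N = 6;                          % small number of points for the check
r = rand(N, 2);                 % random 2D positions
D = pdist(r, 'euclidean');      % condensed vector of pairwise distances
M = squareform(D);              % full N-by-N distance matrix
ii = 5; jj = 2;                 % an arbitrary pair of points
t = sort([ii, jj]); ii = t(2); jj = t(1);
t = N-1:-1:1;
ind = sum(t(1:jj-1)) + ii - jj;
isequal(D(ind), M(5,2))         % returns 1 (true)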


Generating random diagonally dominant dense/sparse matrices in matlab

Is there a matlab command for generating a random n by n matrix, with elements taken in the interval [0,1], where x% of the off-diagonal entries are 0, and then additionally setting each diagonal element to the sum of every element in its respective column, in order to create a diagonally dominant dense/sparse matrix? This may be easy enough to write code for, but I was wondering if there was already a built-in function with this capability.
EDIT:
I am new to Matlab/programming, so this was easier said than done. I'm having trouble building the matrix so that the percentage ignores the diagonal. It's an n x n matrix, so there are $n^2$ entries, with n of them on the diagonal; I want the percentage of zeros to be taken from the $n^2 - n$ off-diagonal elements. I cannot implement this correctly, and I do not know how to initialize my M (see below) so that it corresponds correctly.
% Enter percentage as a decimal
function [M] = DiagDomSparse(n,x)
M = rand(n);
disp("Original matrix");
disp(M);
x = sum(M);
for i=1:n
    for j=1:n
        if(i == j)
            M(i,j) = x(i);
        end
    end
end
disp(M);
Here is one approach that you could use. I'm sure you will get some other answers now with a more clever approach, but I like to keep things simple and understandable.
What I'm doing below is first creating the data to be put in the off-diagonal elements. I create a matrix of zeros and copy this data into the off-diagonal elements using linear indexing. Then I compute the column sums and write those into the diagonal elements, using linear indexing again. Because the matrix was initialized to zero, the diagonal elements are still zero when I compute the column sums, so they don't interfere.
n = 5;
x = 0.3; % fraction of zeros in off-diagonal
k = round(n*(n-1)*x); % number of zeros in off-diagonal
data = randn(n*(n-1)-k,1); % random numbers, pick your distribution here!
data = [data;zeros(k,1)]; % the k zeros
data = data(randperm(length(data))); % shuffle
diag_index = 1:n+1:n*n; % linear index to all diagonal elements
offd_index = setdiff(1:n*n,diag_index); % linear index to all other elements
M = zeros(n,n);
M(offd_index) = data; % set off-diagonal elements to data
M(diag_index) = sum(M,1); % set diagonal elements to sum of columns
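As a small sanity check (a sketch reusing the variables from the snippet above), one can confirm the fraction of zeros off the diagonal and that each diagonal entry equals the sum of its column's off-diagonal entries:
nnz(M(offd_index) == 0) / numel(offd_index)    % should be (approximately) x
max(abs(diag(M).' - (sum(M,1) - diag(M).')))   % should be 0, up to roundoff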
To refer to the diagonal you want eye(n,'logical'). Here is a solution:
n=5;
M = rand(n);
disp("Original matrix");
disp(M);
x = sum(M);
for i=1:n
    for j=1:n
        if(i == j)
            M(i,j) = x(i);
        end
    end
end
disp('loop solution:')
disp(M);
M(eye(n,'logical'))=x;
disp('eye solution:')
disp(M);

Simulate and plot with matlab

I'm trying to simulate random variables Y such that P(Y=1) = P(Y=-1) = 0.5, and X_n = sum of Y_i (i from 1 to n). I want to use matlab to simulate X_n and plot it versus n, for n = 1,2,3,...,100. Here is my matlab code:
N = 100;
for M = 1:N
y_i = randi([-1 1], M, 1);
X_n = sum(y_i);
end
plot(M, X_n)
But my plot looks like this; can someone help me fix it? Is there something wrong with my code? Thank you.
It seems somebody has provided you with the right answer already, but let me explain how I would go about it. The only thing you're doing wrong is the indexing. Try this.
N = 100; % sets your maximum
for M = 1:N % loops from 1 - N
y_i = randi([-1 1], M, 1); % your formula
X(M) = sum(y_i); % stores your data in vectors with increasing index from 1 - 100
end
index = 1:N % generates a vector 1-100 to serve as indexes
plot(index, X) % plots each point of X against its corresponding index
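One detail worth flagging as an aside (not part of the answer above): randi([-1 1]) also draws the value 0, so Y is not strictly +1 or -1. If P(Y=1) = P(Y=-1) = 0.5 is required exactly, one possible sketch is to map a 0/1 draw to -1/+1 and let cumsum build the partial sums directly:
N = 100;
Y = 2*randi([0 1], N, 1) - 1;   % each Y_i is -1 or +1 with probability 0.5
X = cumsum(Y);                  % X(n) = Y_1 + ... + Y_n for n = 1..N
plot(1:N, X)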

Vectorize with Matlab Meshgrid in Chebfun

I am trying to use meshgrid in Matlab together with Chebfun to get rid of double for loops. I first define a quasi-matrix of N functions,
%Define functions of type Chebfun
N = 10; %number of functions
x = chebfun('x', [0 8]); %Domain
psi = [];
for i = 1:N
psi = [psi sin(i.*pi.*x./8)];
end
A sample calculation would be to compute the double sum $\sum_{i,j=1}^{10} \psi_i \psi_j$, where $\psi_i$ is the column psi(:,i). I can achieve this using two for loops in Matlab:
h = 0;
for i = 1:N
for j = 1:N
h = h + psi(:,i).*psi(:,j);
end
end
I then tried to use meshgrid to vectorize in the following way:
[i j] = meshgrid(1:N,1:N);
h = psi(:,i).*psi(:,j);
I get the error "Column index must be a vector of integers". How can I overcome this issue so that I can get rid of my double for loops and make my code a bit more efficient?
BTW, Chebfun is not part of native MATLAB and you have to download it in order to run your code: http://www.chebfun.org/. However, that shouldn't affect how I answer your question.
Basically, psi is an N-column matrix and you want to add up the products of all combinations of pairs of its columns. You have the right idea with meshgrid, but what you should do instead is unroll the 2D matrices of coordinates for both i and j into single vectors. You then use these to build two N^2-column matrices, where each column is the column of psi selected by the corresponding entry of i and j. Finally, you do an element-wise multiplication between these two matrices and sum across the columns for each row. BTW, I'm going to use ii and jj as the output variables of meshgrid instead of i and j; those names are used for the imaginary unit in MATLAB and I don't want to shadow them unintentionally.
Something like this:
%// Your code
N = 10; %number of functions
x = chebfun('x', [0 8]); %Domain
psi = [];
for i = 1:N
psi = [psi sin(i.*pi.*x./8)];
end
%// New code
[ii,jj] = meshgrid(1:N, 1:N);
%// Create two matrices and sum
matrixA = psi(:, ii(:));
matrixB = psi(:, jj(:));
h = sum(matrixA.*matrixB, 2);
If you want to do away with the temporary variables, you can do it in one statement after calling meshgrid:
h = sum(psi(:, ii(:)).*psi(:, jj(:)), 2);
I don't have Chebfun installed, but we can verify that this calculates what we need with a simple example:
rng(123);
N = 10;
psi = randi(20, N, N);
Running this code with the above more efficient solution gives us:
>> h
h =
8100
17161
10816
12100
14641
9216
10000
8649
9025
11664
Running the above double for loop code also gives us:
>> h
h =
8100
17161
10816
12100
14641
9216
10000
8649
9025
11664
If you want to be absolutely sure, we can run both versions with their outputs stored in separate variables, then check whether they're equal:
%// Setup
rng(123);
N = 10;
psi = randi(20, N, N);
%// Old code
h = 0;
for i = 1:N
for j = 1:N
h = h + psi(:,i).*psi(:,j);
end
end
%// New code
[ii,jj] = meshgrid(1:N, 1:N);
hnew = sum(psi(:, ii(:)).*psi(:, jj(:)), 2);
%// Check for equality
eql = isequal(h, hnew);
eql checks whether both variables are equal, and indeed they are:
>> eql
eql =
1
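As an aside that is not part of either code path above: for this particular double sum the indexing can be avoided entirely, because $\sum_{i,j} \psi_i \psi_j = (\sum_i \psi_i)^2$ elementwise. For the plain-matrix test data above this reproduces hnew exactly (I have not tried it on a Chebfun quasimatrix):
h_alt = sum(psi, 2).^2;    % square of the row sums of psi
isequal(h_alt, hnew)       % returns 1 for the integer test data above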

Vectorize octave/matlab codes

Following is the Octave code (part of k-means):
centroidSum = zeros(K);
valueSum = zeros(K, n);
for i = 1 : m
    for j = 1 : K
        if(idx(i) == j)
            centroidSum(j) = centroidSum(j) + 1;
            valueSum(j, :) = valueSum(j, :) + X(i, :);
        end
    end
end
The code works; is it possible to vectorize it?
It is easy to vectorize code that has no if statement,
but how can we vectorize code that contains an if statement?
I assume the purpose of the code is to compute the centroids of subsets of a set of m data points in an n-dimensional space, where the points are stored in a matrix X (points x coordinates) and the vector idx specifies for each data point the subset (1 ... K) the point belongs to. Then a partial vectorization is:
centroid = zeros(K, n)
for j = 1 : K
centroid(j, :) = mean(X(idx == j, :));
end
The if is eliminated by indexing, in particular logical indexing: idx == j gives a boolean array which indicates which data points belong to subset j.
I think it might be possible to get rid of the second for-loop, too, but this would result in very convoluted, unintelligible code.
Brief introduction and solution code
This could be one fully vectorized approach based on -
accumarray: For accumulating summations, as done when calculating valueSum. This also introduces a technique for using accumarray on a 2D matrix along a certain direction, which isn't possible with it in a straightforward manner.
bsxfun: For calculating linear indices across all columns for matching row indices from idx.
Here's the implementation -
%// Store no. of columns in X for frequent usage later on
ncols = size(X,2);
%// Find indices in idx that are within [1:k] range, call them as labels
%// Also, find their locations in that range array, call those as pos
[pos,id] = ismember(idx,1:K);
labels = id(pos);
%// OR with bsxfun: [pos,labels] = find(bsxfun(@eq,idx(:),1:K));
%// Find all labels, i.e. across all columns of X
all_labels = bsxfun(@plus,labels(:),[0:ncols-1]*K);
%// Get truncated X corresponding to all indices matches across all columns
X_cut = X(pos,:);
%// Accumulate summations within each column based on the labels.
%// Note that accumarray doesn't accept matrices, so we were required
%// to create all_labels that had same labels within each column and
%// offsetted at constant intervals from consecutive columns
acc1 = accumarray(all_labels(:),X_cut(:));
%// Regularise accumulated array and reshape back to a 2D array version
acc1_reg2D = [acc1 ; zeros(K*ncols - numel(acc1),1)];
valueSum = reshape(acc1_reg2D,[],ncols);
centroidSum = histc(labels,1:K); %// Get labels counts as centroid sums
Benchmarking code
%// Datasize parameters
K = 5000;
n = 5000;
m = 5000;
idx = randi(9,1,m);
X = rand(m,n);
disp('----------------------------- With Original Approach')
tic
centroidSum1 = zeros(K,1);
valueSum1 = zeros(K, n);
for i = 1 : m
    for j = 1 : K
        if(idx(i) == j)
            centroidSum1(j) = centroidSum1(j) + 1;
            valueSum1(j, :) = valueSum1(j, :) + X(i, :);
        end
    end
end
toc, clear valueSum1 centroidSum1
disp('----------------------------- With Proposed Approach')
tic
%// ... Code from the earlier mentioned section
toc
Runtime results
----------------------------- With Original Approach
Elapsed time is 1.235412 seconds.
----------------------------- With Proposed Approach
Elapsed time is 0.379133 seconds.
Not sure about its runtime performance but here's a non-convoluted vectorized implementation:
b = idx == 1:K;
centroids = (b' * X) ./ sum(b)';
Vectorizing the calculation makes a huge difference in performance. Benchmarking the original code, the partial vectorization from A. Donda, and the full vectorization from Tom gave me the following results:
Original Code: Elapsed time is 1.327877 seconds.
Partial Vectorization: Elapsed time is 0.630767 seconds.
Full Vectorization: Elapsed time is 0.021129 seconds.
Benchmarking code here:
%// Datasize parameters
K = 5000;
n = 5000;
m = 5000;
idx = randi(9,1,m);
X = rand(m,n);
fprintf('\nOriginal Code: ')
tic
centroidSum1 = zeros(K,1);
valueSum1 = zeros(K, n);
for i = 1 : m
    for j = 1 : K
        if(idx(i) == j)
            centroidSum1(j) = centroidSum1(j) + 1;
            valueSum1(j, :) = valueSum1(j, :) + X(i, :);
        end
    end
end
centroids = valueSum1 ./ centroidSum1;
toc, clear valueSum1 centroidSum1 centroids
fprintf('\nPartial Vectorization: ')
tic
centroids = zeros(K,n);
for k = 1:K
centroids(k,:) = mean( X(idx == k, :) );
end
toc, clear centroids
fprintf('\nFull Vectorization: ')
tic
centroids = zeros(K,n);
b = idx == 1:K;
centroids = (b * X) ./ sum(b)';
toc
Note, I added an extra line to the original code to element-wise divide valueSum1 by centroidSum1 to make the output of each type of code the same.
Finally, I know this isn't strictly an "answer"; however, I don't have enough reputation to add a comment, and I thought the benchmarking figures would be useful to anyone who is learning MATLAB (like myself) and needs some extra motivation to master vectorization.

Gaussian Elimination in Matlab

I am using the matlab code from this book: http://books.google.com/books/about/Probability_Markov_chains_queues_and_sim.html?id=HdAQdzAjl60C
Here is the Code:
function [pi] = GE(Q)
A = Q';
n = size(A);
for i=1:n-1
    for j=i+1:n
        A(j,i) = -A(j,i)/A(i,i);
    end
    for j=i+1:n
        for k=i+1:n
            A(j,k) = A(j,k) + A(j,i)*A(i,k);
        end
    end
end
x(n) = 1;
for i = n-1:-1:1
    for j = i+1:n
        x(i) = x(i) + A(i,j)*x(j);
    end
    x(i) = -x(i)/A(i,i);
end
pi = x/norm(x,1);
Is there faster code that I am not aware of? I am calling this function millions of times, and it takes too much time.
MATLAB has a whole set of built-in linear algebra routines - type help slash, help lu or help chol to get started with a few of the common ways to efficiently solve linear equations in MATLAB.
Under the hood these functions generally call optimised LAPACK/BLAS library routines, which are usually the fastest way to do linear algebra in any programming language. Since interpreted MATLAB code is comparatively slow, it would not be surprising if they were orders of magnitude faster than an m-file implementation.
Hope this helps.
Unless you are specifically looking to implement your own, you should use Matlab's backslash operator (mldivide) or, if you want the factors, lu. Note that mldivide can do more than Gaussian elimination (e.g., it does linear least squares, when appropriate).
The algorithms used by mldivide and lu are from C and Fortran libraries, and your own implementation in Matlab will never be as fast. If, however, you are determined to use your own implementation and want it to be faster, one option is to look for ways to vectorize your implementation (maybe start here).
One other thing to note: the implementation from the question does not do any pivoting, so its numerical stability will generally be worse than an implementation that does pivoting, and it will even fail for some nonsingular matrices.
Different variants of Gaussian elimination exist, but they are all O(n^3) algorithms. Whether any one approach is better than another depends on your particular situation and is something you would need to investigate further.
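To make the pivoting point concrete: a matrix as simple as [0 1; 1 0] is nonsingular, yet elimination without row exchanges fails on it immediately because the first pivot is zero. And as a minimal sketch of the built-in route (using a generic square system A*x = b here, not the book's specific GE(Q) routine):
A = [2 1 -1; -3 -1 2; -2 1 2];
b = [8; -11; -3];
x1 = A \ b;                % mldivide picks an appropriate solver automatically
[L, U, P] = lu(A);         % factor once if you must solve against many right-hand sides
x2 = U \ (L \ (P*b));      % same solution, here [2; 3; -1], up to roundoff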
function x = naiv_gauss(A,b);
n = length(b); x = zeros(n,1);
for k=1:n-1 % forward elimination
    for i=k+1:n
        xmult = A(i,k)/A(k,k);
        for j=k+1:n
            A(i,j) = A(i,j) - xmult*A(k,j);
        end
        b(i) = b(i) - xmult*b(k);
    end
end
% back substitution
x(n) = b(n)/A(n,n);
for i=n-1:-1:1
    sum = b(i);
    for j=i+1:n
        sum = sum - A(i,j)*x(j);
    end
    x(i) = sum/A(i,i);
end
end
Let's assume Ax = d,
where A is a known matrix and d is a known vector.
We want to represent A as LU using the LU decomposition function built into matlab, so that:
LUx = d
This can be done in matlab as follows:
[L,U] = lu(A)
which in turn returns an upper triangular matrix in U and a permuted lower triangular matrix in L such that A = LU. The return value L is a product of lower triangular and permutation matrices. (https://www.mathworks.com/help/matlab/ref/lu.html)
Then, if we let y = Ux, we have Ly = d.
Since x is unknown, y is unknown too; once we find y, we can find x as follows:
y=L\d;
x=U\y
and the solution is stored in x.
This is the simplest way to solve a system of linear equations, provided that A is not singular (i.e. the determinant of A is not zero); otherwise the quality of the solution would not be as good as expected and might yield wrong results.
If A is singular and thus cannot be inverted, another method should be used to solve the system of linear equations.
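One small hedged sketch of how to detect that situation before trusting the LU-based solution (the 1e-12 threshold is arbitrary, and rcond only estimates the conditioning):
if rcond(A) < 1e-12
    warning('A is singular or nearly singular; the solution may be unreliable.');
end
[L,U] = lu(A);
y = L\d;
x = U\y;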
For the naive approach (aka without row swapping) for an n by n matrix:
function A = naiveGauss(A)
% finds the size
n = size(A);
n = n(1);
B = zeros(n,1);
% We have 3 steps for a 4x4 matrix so we have
% n-1 steps for an nxn matrix
for k = 1 : n-1
    for i = k+1 : n
        % step 1: compute the multiplier that eliminates A(i,k);
        % it must be saved before A(i,k) is overwritten inside the j loop
        xmult = A(i,k)/A(k,k);
        % printf("multi = %d / %d\n", A(i,k), A(k,k), xmult )
        for j = k : n
            A(i,j) = A(i,j) - xmult * A(k,j);
        end
        B(i) = B(i) - xmult * B(k);
    end
end
function Sol = GaussianElimination(A,b)
n = size(A,1);
Aug = [A b];                     % augmented matrix [A | b]
for j = 1:n-1                    % forward elimination
    for i = j+1:n
        Aug(i,:) = Aug(i,:) - (Aug(i,j)/Aug(j,j))*Aug(j,:);
    end
end
Sol = zeros(n,1);                % back substitution
for i = n:-1:1
    Sol(i) = (Aug(i,end) - Aug(i,i+1:n)*Sol(i+1:n))/Aug(i,i);
end
disp(Sol);
end
I think you can use the matlab function rref:
[R,jb] = rref(A,tol)
It produces a matrix in reduced row echelon form.
In my case, however, it wasn't the fastest option; the solution below was faster by about 30 percent.
function C = gauss_elimination(A,B)
i = 1; % loop variable
X = [ A B ];
[ nX mX ] = size( X); % determining the size of matrix
while i <= nX                        % start of loop
    if X(i,i) == 0                   % check whether the diagonal (pivot) element is zero
        disp('Diagonal element zero')  % report and stop if a zero pivot is found
        return
    end
    X = elimination(X,i,i);          % proceed if the diagonal element is non-zero
    i = i + 1;
end
C = X(:,mX);
function X = elimination(X,i,j)
% Pivoting (i,j) element of matrix X and eliminating other column
% elements to zero
[ nX mX ] = size( X);
a = X(i,j);
X(i,:) = X(i,:)/a;
for k = 1:nX % loop to find triangular form
    if k == i
        continue
    end
    X(k,:) = X(k,:) - X(i,:)*X(k,j);
end
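A hypothetical usage example, reusing the same 3x3 system as in the earlier sketch:
A = [2 1 -1; -3 -1 2; -2 1 2];
B = [8; -11; -3];
C = gauss_elimination(A, B)    % returns [2; 3; -1]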