Find largest subset of linearly independent vectors with MATLAB

I need to create a matlab function that finds the largest subset of linearly independent vectors in a matrix A.
Initialize the output of the program to be 0, which corresponds to the empty set (containing no column vectors). Scan the columns of A from left to right one by one: if adding the current column to the set of linearly independent vectors found so far makes the new set linearly dependent, skip this column; otherwise add it to the solution set. Then move on to the next column.
function out = maxindependent(A)
%MAXINDEPENDENT takes a matrix A and returns a matrix whose columns are a
%largest subset of linearly independent columns of A.
[~, c] = size(A);
out = [];                      % start from the empty set of columns
for jj = 1:c
    M = [out A(:,jj)];         % tentatively add the current column
    if rank(M) == size(M,2)    % are the columns of M still independent?
        out = M;               % keep the column
    end                        % otherwise skip it and move on
end
if isempty(out)
    out = 0;                   % 0 corresponds to the empty set
end
end

The number of linearly independent vectors in a matrix is equal to the rank of the matrix, and a particular subset of linearly independent vectors is not unique in general. Any 'largest subset' of linearly independent vectors will have size equal to the rank.
There is a function for this in MATLAB:
n = rank(A);
The algorithm you described is not necessary; you should just use the SVD. There is a concise way to do it here: how to get the maximally independent vectors given a set of vectors in MATLAB?
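If you also need the columns themselves rather than just their number, one common route (a sketch of mine, not the linked answer's exact code; the helper name is illustrative) is to combine rank with a pivoted QR factorization:
r = rank(A);               % size of any largest independent subset
[~, ~, E] = qr(A, 0);      % E is a permutation vector; column pivoting moves
                           % the "most independent" columns to the front
out = A(:, sort(E(1:r)));  % keep the first r pivot columns, in original order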

Related

Calculating the most similar pair of column vectors using cosine distance in a matrix

I have a 943x1682 matrix in which I want to find the two most similar column vectors. So I want to see the cosine distance of each vector in the matrix to every other vector in the matrix, not including the vector with itself; if that cannot be excluded, I can just ignore those entries.
I made this loop to try to calculate this, so I can get a 1682x1682 matrix, with each cell corresponding to the similarity between vectors i and j. However, when I run this, it takes forever, and when I try to open the resulting matrix in my workspace, it says:
Cannot display summaries of variables with more than 524288 elements.
Is there an easier way to do this or am I doing something wrong?
Cross posted on MATLAB Answers. Repeating answer here:
Use a standard matrix multiply to get the dot products. MATLAB is very fast at standard matrix multiplies. And then normalize the result. E.g.,
AA = A' * A; % the column dot products via a standard matrix multiply
Anorm = sqrt(diag(AA)); % the norms of the columns
Adist = AA ./ (Anorm .* Anorm.'); % normalize the column dot products into cosine distances
Then pick off the maximum value for your answer, disregarding the diagonal. E.g.,
n = size(A,2); % the number of columns
Adist(1:n+1:end) = -inf; % disregard the diagonal (column compared to itself)
[~,x] = max(Adist(:)); % find the max cosine distance linear index
[col1,col2] = ind2sub(size(Adist),x); % convert linear index into the original columns
Then col1 and col2 are the column numbers of the most similar columns, using cosine distance as the measure.
You can normalise the columns of the matrix first, then the cosine similarity equation simplifies to a matrix multiplication:
aNorm = normc(A);
cosSim = aNorm' * aNorm;
Generally, matrix multiplication is more performant than looping. In a quick test, with N = 1000, the looping code takes ~7 seconds and the matrix multiplication code ~0.5 seconds.
The resultant matrix may still be too large to open in your workspace; you could copy individual rows or columns into a temporary variable and view those, or do a contour plot (heat map) of the matrix to get a visual representation.
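Note that normc ships with a toolbox rather than base MATLAB (the Neural Network / Deep Learning Toolbox, if I remember correctly). If it is not available, the same normalization can be written in base MATLAB; a minimal sketch, assuming a release with implicit expansion (R2016b or later):
colNorms = sqrt(sum(A.^2, 1));  % 2-norm of every column of A
aNorm = A ./ colNorms;          % divide each column by its norm (implicit expansion)
cosSim = aNorm' * aNorm;        % cosine similarity between every pair of columns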

Determine the 'greatest' singular vector of U matrix after SVD in Matlab

It is known that in Matlab the SVD function outputs three matrices: [U,S,V] = svd(X).
Here 'U' is a square m x m matrix, where m is the number of rows of X. Also, 'S' is a (generally non-square) m x n matrix that stores the singular values on its diagonal, in descending order.
My question is how to determine (in Matlab) which of the m singular vectors in matrix 'U' corresponds to the first (greatest) singular value of the 'S' matrix. Furthermore, some values of that singular vector are positive and others are negative. Does this minus or plus sign hide any mathematical meaning? I have seen examples that use the sign of the 'greatest' singular vector for classification purposes.
The diagonal of the S matrix contains the singular values. So for the ith singular value (in the (i,i) position of S), the ith columns of U and V are the corresponding left and right singular vectors in the two defining equations.
I don't think the +/- hides any special meaning. After all, you could multiply both the U and the V matrices by a -1 constant and the result would still be valid.
To be perfectly accurate, by definition the singular values of an SVD are not necessarily ordered, but MATLAB's svd returns them in descending order.
The ith column of U corresponds to the ith singular value of M. Namely, for the ith singular value sigma_i,
M* . u_i = sigma_i v_i
and you also have
M . v_i = sigma_i u_i
Be careful, it might not be what you are looking for.
The coordinates of your singular vectors are coordinates in the original basis. A positive value means your new variable is positively proportional to the corresponding original variable. In statistics this is generally used when you know that both the original and the transformed variables increase or decrease together.
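A minimal sketch of how to pick off the singular vectors belonging to the largest singular value and verify the relations above (the example matrix is arbitrary):
X = rand(5,3);              % any m-by-n matrix; random here just for illustration
[U,S,V] = svd(X);
sigma1 = S(1,1);            % greatest singular value (MATLAB sorts them descending)
u1 = U(:,1);                % corresponding left singular vector
v1 = V(:,1);                % corresponding right singular vector
% the defining relations hold up to floating-point error:
norm(X*v1  - sigma1*u1)     % ~ 0
norm(X'*u1 - sigma1*v1)     % ~ 0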

Matlab : Help in finding minimum distance

I am trying to find the point that is at a minimum distance from the candidate set. Z is a matrix where the rows are the dimensions and the columns are the points. I compute the inter-point distances and then record the point with the minimum distance, along with that distance. Below is the code snippet. The code works fine for a small dimension and a small set of points, but it takes a long time for large data sets (N = 1 million data points, and the dimension is also high). Is there an efficient way?
I suggest that you use pdist to do the heavy lifting for you. This function computes the pairwise distance between every two points in your array. The resulting vector has to be put into matrix form using squareform in order to find the minimal value for each point:
N = 100;
Z = rand(2,N); % each column is a 2-dimensional point
% pdist assumes that the second index corresponds to dimensions
% so we need to transpose inside pdist()
distmatrix = squareform(pdist(Z.','euclidean')); % output is [N, N] in size
% set diagonal values to infinity to avoid getting 0 self-distance as minimum
distmatrix = distmatrix + diag(inf(1,size(distmatrix,1)));
mindists = min(distmatrix,[],2); % find the minimum for each row
sum_dist = sum(mindists); % sum of minimal distance between each pair of points
This computes every pair twice, but I think this is true for your original implementation.
The idea is that pdist computes the pairwise distance between the rows of its input (each row is treated as one observation), so we pass the transpose of Z into pdist. Since the full output is always a symmetric square matrix with a zero diagonal, pdist is implemented such that it only returns the values above the diagonal, as a vector. So a call to squareform is needed to get the proper distance matrix. Then the row-wise minimum of this matrix has to be found, but first we have to exclude the zeros on the diagonal. I was lazy, so I put inf on the diagonal to make sure that the minimum is found elsewhere. In the end we just sum up the minimal distances.
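A tiny illustration of that pdist/squareform relationship (three hand-picked 2-D points, values arbitrary):
P = [0 0; 3 0; 0 4];          % one point per row, as pdist expects
d = pdist(P, 'euclidean')     % [3 4 5]: distances (1,2), (1,3), (2,3) above the diagonal
D = squareform(d)             % the full symmetric 3x3 distance matrix with zero diagonal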

Matlab: How to convert a matrix into a Toeplitz matrix

Consider a discrete dynamical system where x[0] = rand() denotes the initial condition of the system.
I have generated an m by N matrix by the following step: generate m vectors with m different initial conditions, each with dimension N (N indicates the number of samples or elements). This matrix is called R. Using R, how do I create a Toeplitz matrix T?
Mathematically,
R = [ x_0[0],  ..., x_0[N-1];
        ...,   ...,   ...  ;
      x_m[0],  ..., x_m[N-1] ]
The Toeplitz matrix T =
[ x[N-1], x[N-2], ..., x[0];
  x[0],   x[N-1], ..., x[1];
    :        :           : ;
  x[m-2], x[m-3], ..., x[m-1] ]
I tried working with toeplitz(R), but the dimension changes. The dimension should not change, as seen mathematically.
According to the paper provided (Toeplitz structured chaotic sensing matrix for compressive sensing by Yu et al.) there are two Chaotic Sensing Matrices involved. Let's explore them separately.
The Chaotic Sensing Matrix (Section A)
It is clearly stated that to create such a matrix you have to build m independent signals (sequences) with m different initial conditions (in the range ]0;1[) and then concatenate those signals by rows (that is, one signal = one row). Each of these signals must have length N. This actually is your matrix R, which is correctly evaluated as it is. Although I'd like to suggest a code improvement: instead of building a column and then transposing the matrix, you can build the matrix directly row by row:
R=zeros(m,N);
R(:,1)=rand(m,1); %build the first column with m initial conditions
Please note: randn() draws values from a Gaussian (normal) distribution, and such values might not lie in the range ]0;1[ stated in the paper (right below equation 9), whereas rand() draws uniformly distributed values in that range.
After that, you can build every row separately according to the for-loop:
for i=1:m
for j=2:N %skip first column
R(i,j)=4*R(i,j-1)*(1-R(i,j-1));
R(i,j)=R(i,j)-0.5;
end
end
The Toeplitz Chaotic Sensing Matrix (Section B)
It is clearly stated at the beginning of Section B that to build the Toeplitz matrix you should consider a single sequence x with a given, single, initial condition. So let's build such a sequence:
x=rand();
for j=2:N %skip first element
x(j)=4*x(j-1)*(1-x(j-1));
x(j)=x(j)-0.5;
end
Now, to build the matrix you can consider:
what does the first row look like? Well, it looks like the sequence itself, but flipped (i.e. instead of going from 0 to N-1, it goes from N-1 to 0)
what does the first column look like? It is the last item of x concatenated with the elements in the range 0 to m-2
Let's then build the first row (r) and the first column (c):
r=fliplr(x);
c=[x(end) x(1:m-1)];
Please note: in Matlab the indices start from 1, not from 0 (so instead of going from 0 to m-2, we go from 1 to m-1). Also end means the last element from a given array.
Now, looking at the help for the toeplitz() function, it is clearly stated that you can build a non-square Toeplitz matrix by specifying the first row and the first column. Therefore, finally, you can build such a matrix as:
T=toeplitz(c,r);
Such a matrix will indeed have dimensions m-by-N, as reported in the paper.
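Putting the pieces above together, a minimal end-to-end sketch (m and N are just placeholder sizes for illustration):
m = 4; N = 10;                      % placeholder dimensions
x = rand();                         % single initial condition in ]0;1[
for j = 2:N                         % build the chaotic sequence as above
    x(j) = 4*x(j-1)*(1 - x(j-1));
    x(j) = x(j) - 0.5;
end
r = fliplr(x);                      % first row: the flipped sequence
c = [x(end) x(1:m-1)];              % first column
T = toeplitz(c, r);                 % Toeplitz-structured sensing matrix
size(T)                             % returns [m N]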
Even though the Authors call both of them \Phi, they actually are two separate matrices.
They do not take the Toeplitz of the Beta-Like Matrix (a Toeplitz matrix is not a function or operator of some kind), nor do they transform the Beta-Like Matrix into a Toeplitz matrix.
You have the Beta-Like Matrix (i.e. the Chaotic Sensing Matrix) first, and then the Toeplitz-structured Chaotic Sensing Matrix: such a structure is typical of Toeplitz matrices, namely a diagonal-constant structure (all elements along a given diagonal have the same value).
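A small illustration of that diagonal-constant structure (the numbers are arbitrary):
c = [1 6 7];        % first column
r = [1 2 3 4];      % first row (r(1) must agree with c(1))
T = toeplitz(c, r)
% T =
%      1     2     3     4
%      6     1     2     3
%      7     6     1     2
% every diagonal carries a single constant value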

Difference in between Covariance and Correlation Matrix

In Matlab, I have created a matrix A with size (244x2014723)
and a matrix B with size (244x1)
I was able to calculate the correlation matrix using corr(A,B), which yielded a matrix of size 2014723x1. So every column of matrix A is correlated with matrix B, giving one value per row of the 2014723x1 result.
My question is when I ask for a covariance matrix using cov(A,B), I get an error saying A and B should be of same sizes. Why do I get this error? How is the method to find corr(A,B) any different from cov(A,B)?
The answer is pretty clear if you read the documentation:
cov:
If A and B are matrices of observations, cov(A,B) treats A and B as vectors and is equivalent to cov(A(:),B(:)). A and B must have equal size.
corr
corr(X,Y) returns a p1-by-p2 matrix containing the pairwise correlation coefficient between each pair of columns in the n-by-p1 and n-by-p2 matrices X and Y.
The difference between corr(X,Y) and the MATLAB® function corrcoef(X,Y) is that corrcoef(X,Y) returns a matrix of correlation coefficients for the two column vectors X and Y. If X and Y are not column vectors, corrcoef(X,Y) converts them to column vectors.
One way you could get the covariances of your vector with each column of your matrix is to use a loop. Another way (which might be inefficient, depending on the size) is
C = cov([B,A])
and then look at the first row (or column) of C.
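A minimal sketch of that suggestion, with the sizes shrunk purely for illustration:
A = rand(244, 10);       % stand-in for the 244x2014723 matrix
B = rand(244, 1);
C = cov([B, A]);         % 11x11 covariance matrix of all columns
covBA = C(1, 2:end)';    % covariance of B with each column of A (10x1)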
See link
In the 'More About' section, the equation describing how cov(A,B) is computed makes it clear why they need to be the same size: the summation runs over a single index that enumerates the elements of A and B.