Remove duplicates in correlations in matlab - matlab

Please see the following issue:
P=rand(4,4);
for i=1:size(P,2)
for j=1:size(P,2)
[r,p]=corr(P(:,i),P(:,j))
end
end
Clearly, the loop will cause the number of correlations to be doubled (i.e., corr(P(:,1),P(:,4)) and corr(P(:,4),P(:,1)). Does anyone have a suggestion on how to avoid this? Perhaps not using a loop?
Thanks!

I have four suggestions for you, depending on what exactly you are doing to compute your matrices. I'm assuming the example you gave is a simplified version of what needs to be done.
First Method - Adjusting the inner loop index
One thing you can do is change your j loop index so that it only goes from 1 up to i. This way, you get a lower triangular matrix and just concentrate on the values within the lower triangular half of your matrix. The upper half would essentially be all set to zero. In other words:
for i = 1 : size(P,2)
for j = 1 : i
%// Your code here
end
end
Second Method - Leave it unchanged, but then use unique
You can go ahead and use the same matrix like you did before with the full two for loops, but you can then filter the duplicates by using unique. In other words, you can do this:
[Y,indices] = unique(P);
Y will give you a list of unique values within the matrix P and indices will give you the locations of where these occurred within P. Note that these are column major indices, and so if you wanted to find the row and column locations of where these locations occur, you can do:
[rows,cols] = ind2sub(size(P), indices);
Third Method - Use pdist and squareform
Since you're looking for a solution that requires no loops, take a look at the pdist function. Given a M x N matrix, pdist will find distances between each pair of rows in a matrix. squareform will then transform these distances into a matrix like what you have seen above. In other words, do this:
dists = pdist(P.', 'correlation');
distMatrix = squareform(dists);
Fourth Method - Use the corr method straight out of the box
You can just use corr in the following way:
[rho, pvals] = corr(P);
corr in this case will produce a m x m matrix that contains the correlation coefficient between each pair of columns an n x m matrix stored in P.
Hopefully one of these will work!

this works ?
for i=1:size(P,2)
for j=1:i

Since you are just correlating each column with the other, then why not just use (straight from the documentation)
[Rho,Pval] = corr(P);
I don't have the Statistics Toolbox, but according to http://www.mathworks.com/help/stats/corr.html,
corr(X) returns a p-by-p matrix containing the pairwise linear correlation coefficient between each pair of columns in the n-by-p matrix X.

Related

Is there a way to parse each row of a matrix in Octave?

I am new to Octave and I wanted to know if there is a way to parse each row of a matrix and use it individually. Ultimately I want to use the rows to check if they are all vertical to each other (the dot product have to be equal to 0 for two vectors to be vertical to each other) so if you have some ideas about that I would love to hear them. Also I wanted to know if there is a function to determine the length (or the amplitude) of a vector.
Thank you in advance.
If by "parse each row" you mean a loop that takes each row one by one, you only need a for loop over the transposed matrix. This works because the for loop takes successive columns of its argument.
Example:
A = [10 20; 30 40; 50 60];
for row = A.'; % loop over columns of transposed matrix
row = row.'; % transpose back to obtain rows of the original matrix
disp(row); % do whatever you need with each row
end
However, loops can often be avoided in Matlab/Octave, in favour of vectorized code. For the specific case you mention, computing the dot product between each pair of rows of A is the same as computing the matrix product of A times itself transposed:
A*A.'
However, for the general case of a complex matrix, the dot product is defined with a complex conjugate, so you should use the complex-conjugate transpose:
P = A*A';
Now P(m,n) contains the dot product of the n-th and m-th rows of A. The condition you want to test is equivalent to P being a diagonal matrix:
result = isdiag(P); % gives true of false

Vectorizing a parallel FOR loop across multiple dimensions MATLAB

Please correct me if there are somethings unclear in this question. I have two matrices pop, and ben of 3 dimensions. Call these dimensions as c,t,w . I want to repeat the exact same process I describe below for all of the c dimensions, without using a for loop as that is slow. For the discussion below, fix a value of the dimension c, to explain my thinking, later I will give a MWE. So when c is fixed I have a 2D matrix with dimension t,w.
Now I repeat the entire process (coming below!) for all of the w dimension.
If the value of u is zero, then I find the next non zero entry in this same t dimension. I save both this entry as well as the corresponding t index. If the value of u is non zero, I simply store this value and the corresponding t index. Call the index as i - note i would be of dimension (c,t,w). The last entry of every u(c,:,w) is guaranteed to be non zero.
Example if the u(c,:,w) vector is [ 3 0 4 2 0 1], then the corresponding i values are [1,3,3,4,6,6].
Now I take these entries and define a new 3d array of dimension (c,t,w) as follows. I take my B array and do the following what is not a correct syntax but to explain you: B(c,t,w)/u(c,i(c,t,w),w). Meaning I take the B values and divide it by the u values corresponding to the non zero indices of u from i that I computed.
For the above example, the denominator would be [3,4,4,2,1,1]. I hope that makes sense!!
QUESTION:
To do this, as this process simply repeats for all c, I can do a very fast vectorizable calculation for a single c. But for multiple c I do not know how to avoid the for loop. I don't knw how to do vectorizable calculations across dimensions.
Here is what I did, where c_size is the dimension of c.
for c=c_size:-1:1
uu=squeeze(pop(c,:,:)) ; % EXTRACT A 2D MATRIX FROM pop.
BB=squeeze(B(c,:,:)) ; % EXTRACT A 2D MATRIX FROM B
ii = nan(size(uu)); % Start with all nan values
[dum_row, ~] = find(uu); % Get row indices of non-zero values
ii(uu ~= 0) = dum_row; % Place row indices in locations of non-zero values
ii = cummin(ii, 1, 'reverse'); % Column-wise cumulative minimum, starting from bottomi
dum_i = ii+(time_size+1).*repmat(0:(scenario_size-1), time_size+1, 1); % Create linear index
ben(c,:,:) = BB(dum_i)./uu(dum_i);
i(c,:,:) = ii ;
clear dum_i dum_row uu BB ii
end
The central question is to avoid this for loop.
Related questions:
Vectorizable FIND function with if statement MATLAB
Efficiently finding non zero numbers from a large matrix
Vectorizable FIND function with if statement MATLAB

Matlab: How to convert a matrix into a Toeplitz matrix

Considering a discrete dynamical system where x[0]=rand() denotes the initial condition of the system.
I have generated an m by n matrix by the following step -- generate m vectors with m different initial conditions each with dimension N (N indicates the number of samples or elements). This matrix is called R. Using R how do I create a Toeplitz matrix, T? T
Mathematically,
R = [ x_0[0], ....,x_0[n-1];
..., ,.....;
x_m[0],.....,x_m[n-1]]
The toeplitz matrix T =
x[n-1], x[n-2],....,x[0];
x[0], x[n-1],....,x[1];
: : :
x[m-2],x[m-3]....,x[m-1]
I tried working with toeplitz(R) but the dimension changes. The dimension should no change, as seen mathematically.
According to the paper provided (Toeplitz structured chaotic sensing matrix for compressive sensing by Yu et al.) there are two Chaotic Sensing Matrices involved. Let's explore them separately.
The Chaotic Sensing Matrix (Section A)
It is clearly stated that to create such matrix you have to build m independent signals (sequences) with m different initials conditions (in range ]0;1[) and then concatenate such signals per rows (that is, one signal = one row). Each of these signals must have length N. This actually is your matrix R, which is correctly evaluated as it is. Although I'd like to suggest a code improvement: instead of building a column and then transpose the matrix you can directly build such matrix per rows:
R=zeros(m,N);
R(:,1)=rand(m,1); %build the first column with m initial conditions
Please note: by running randn() you select values with Gaussian (Normal) distribution, such values might not be in range ]0;1[ as stated in the paper (right below equation 9). As instead by using rand() you take uniformly distributed values in such range.
After that, you can build every row separately according to the for-loop:
for i=1:m
for j=2:N %skip first column
R(i,j)=4*R(i,j-1)*(1-R(i,j-1));
R(i,j)=R(i,j)-0.5;
end
end
The Toeplitz Chaotic Sensing Matrix (Section B)
It is clearly stated at the beginning of Section B that to build the Toeplitz matrix you should consider a single sequence x with a given, single, initial condition. So let's build such sequence:
x=rand();
for j=2:N %skip first element
x(j)=4*x(j-1)*(1-x(j-1));
x(j)=x(j)-0.5;
end
Now, to build the matrix you can consider:
how do the first row looks like? Well, it looks like the sequence itself, but flipped (i.e. instead of going from 0 to n-1, it goes from n-1 to 0)
how do the first column looks like? It is the last item from x concatenated with the elements in range 0 to m-2
Let's then build the first row (r) and the first column (c):
r=fliplr(x);
c=[x(end) x(1:m-1)];
Please note: in Matlab the indices start from 1, not from 0 (so instead of going from 0 to m-2, we go from 1 to m-1). Also end means the last element from a given array.
Now by looking at the help for the toeplitz() function, it is clearly stated that you can build a non-squared Toeplitz matrix by specifying the first row and the first column. Therefore, finally, you can build such matrix as:
T=toeplitz(c,r);
Such matrix will indeed have dimensions m*N, as reported in the paper.
Even though the Authors call both of them \Phi, they actually are two separate matrices.
They do not take the Toeplitz of the Beta-Like Matrix (Toeplitz matrix is not a function or operator of some kind), neither do they transform the Beta-Like Matrix into a Toeplitz-matrix.
You have the Beta-Like Matrix (i.e. the Chaotic Sensing Matrix) at first, and then the Toeplitz-structured Chaotic Sensing Matrix: such structure is typical for Toeplitz matrices, that is a diagonal-constant structure (all elements along a diagonal have the same value).

MATLAB: Efficient (vectorized) way to apply function on two matrices?

I have two matrices X and Y, both of order mxn. I want to create a new matrix O of order mxm such that each i,j th entry in this new matrix is computed by applying a function to ith and jth row of X and Y respectively. In my case m = 10000 and n = 500. I tried using a loop but it takes forever. Is there an efficient way to do it?
I am targeting two functions dot product -- dot(row_i, row_j) and exp(-1*norm(row_i-row_j)). But I was wondering if there is a general way so that I can plugin any function.
Solution #1
For the first case, it looks like you can simply use matrix multiplication after transposing Y -
X*Y'
If you are dealing with complex numbers -
conj(X*ctranspose(Y))
Solution #2
For the second case, you need to do a little more work. You need to use bsxfun with permute to re-arrange dimensions and employ the raw form of norm calculations and finally squeeze to get a 2D array output -
squeeze(exp(-1*sqrt(sum(bsxfun(#minus,X,permute(Y,[3 2 1])).^2,2)))
If you would like to avoid squeeze, you can use two permute's -
exp(-1*sqrt(sum(bsxfun(#minus,permute(X,[1 3 2]),permute(Y,[3 1 2])).^2,3)))
I would also advise you to look into this problem - Efficiently compute pairwise squared Euclidean distance in Matlab.
In conclusion, there isn't a common most efficient way that could be employed for every function to ith and jth row of X. If you are still hell bent on that, you can use anonymous function handles with bsxfun, but I am afraid it won't be the most efficient technique.
For the second part, you could also use pdist2:
result = exp(-pdist2(X,Y));

Correlation coefficient between rows of two matrices

I have the following code, how will I be able to simplify it using the function as it currently runs pretty slow, assume X is 10x7 and Y is 4x7 and D is a matrix stores the correlation between each pair of vectors. If the solution is to use the xcorr2 function can someone show me how it is done?
for i = 1:4
for j = 1:10
D(j,i) = corr2(X(j,:),Y(i,:));
end
end
Use pdist2 (Statistics toolbox) with 'correlation' option. It's faster than your code (even with preallocation), and requires just one line:
D = 1-pdist2(X,Y,'correlation');
Here is how I would do it:
First of all, store/process your matrix transposed. This makes for easier use of the correlation function.
Now assuming you have matrices X and Y and want to get the correlations between combinations of columns, this is easily achieved with a single loop:
Take the first column of X
use corrcoef to determine the correlation with all columns of Y at once.
As long as there is one, take the next column of X
Alternate to this, you can check whether it helps to replace corr2 in your original code with corr, xcorr or corrcoef and see which one runs fastest.
With corrcoef you can do this without a loop, and without using a toolbox:
D = corrcoef([X', Y']);
D = D(1 : size(X, 1), end - size(Y, 1) + 1 : end);
A drawback is that more coefficients are computed than necessary.
The transpose is necessary because your matrices do not follow the Matlab convention to enumerate samples with the first index and variables with the second.