Auto-correlation of each column of a matrix in MATLAB

My data is stored in a 900 x 10 matrix called input_data_matrix. Each column holds a time series of 900 random signal samples (integer light readings).
I want to measure how the 900 readings within each column relate to one another (not how they correlate with the readings in other columns), so that I get 10 correlation values, one per column, each indicating how strongly the 900 readings of that column are correlated.
So my question is: how can I compute this in MATLAB, and what type of correlation is best suited to it?

If I have understood correctly, what you want is the autocorrelation of each column of your input data. In that case, I would use the xcorr function (https://es.mathworks.com/help/signal/ref/xcorr.html), which for a given vector computes its autocorrelation. For a column of length m the result has 2*m-1 lags (from -(m-1) to m-1), so the output matrix needs 2*m-1 rows. The code would be the following:
[m, n] = size(input_data_matrix);
output_matrix = zeros(2*m-1, n); % one column of 2*m-1 lags per input column
for i = 1:n
    output_matrix(:,i) = xcorr(input_data_matrix(:,i)); % zero lag is at row m
end
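Since the question asks for a single value per column, one option is to summarise each column by its lag-1 autocorrelation coefficient instead of the full lag sequence. The following is only a sketch under that assumption: the simulated input and the variable names xc and lag1 are illustrative and not part of the original answer, and it uses base MATLAB only.

```matlab
% Simulated stand-in for the 900 x 10 matrix in the question (an assumption).
rng(0);
input_data_matrix = randi(100, 900, 10);

% Remove each column's mean so the coefficient measures co-variation.
xc = bsxfun(@minus, input_data_matrix, mean(input_data_matrix, 1));

% Lag-1 autocorrelation coefficient of every column: sum of products of
% neighbouring samples, normalised by the zero-lag energy of the column.
lag1 = sum(xc(1:end-1,:) .* xc(2:end,:), 1) ./ sum(xc.^2, 1);   % 1 x 10
```

Values close to 0 suggest that successive readings in a column are nearly unrelated, while values close to 1 or -1 suggest strong correlation between neighbours. A constant column would give 0/0 here and needs separate handling.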

Related

Vectorizing by splitting matrix by rows unequally

I have X_test which is a matrix of size 967874 x 3 where the columns are: doc#, wordID, wordCount, and there's 7505 unique doc#'s (length(unique(X_test(:,1))) == length(Y_test) == 7505). The matrix rows are also already sorted according to the doc#'s column.
I also have a likelihoods matrix of size 61188 x 20 where the rows are all possible wordIDs, and the columns are different classes (length(unique(Y_test))==20)
The result I'm trying to obtain is a matrix of size 7505 x 20 where each row signifies a different document and contains, for each class (column), the sum of the wordCounts of the values in the likelihood matrix rows which correspond to the wordIDs for that document (trying to think of better phrasing...)
My first thought was to rearrange this 2D matrix into a 3D matrix according to doc#s, but the number of rows for each unique doc# are unequal. I also think making a cell array of 7505 matrices isn't a great idea, but may be wrong about that.
It's probably more explanatory if I just show the code I have that works, but is slow because it iterates through each of the 7505 documents:
probabilities = zeros(length(Y_test),nClasses); % 7505 x 20
for n=1:length(Y_test) % 7505 iterations
doc = X_test(X_test(:,1)==n,:);
result = bsxfun(@times, doc(:,3), log(likelihoods(doc(:,2),:)));
% result ends up size(doc,1) x 20
probabilities(n,:) = sum(result,1); % sum over rows, even if doc has a single row
end
For context, this is what I use the probabilities matrix for:
% MAP decision rule
probabilities = bsxfun(@plus, probabilities, logpriors'); % add priors
[~,predictions] = max(probabilities,[],2);
CCR = sum(predictions==Y_test)/length(Y_test); % correct classification rate
fprintf('Correct classification percentage: %0.2f%%\n\n', CCR*100);
Edit: I separated the matrix into a cell array according to doc#'s, but I don't know how to apply bsxfun to all arrays in the cell at the same time.
counts = histc(X_test(:,1),unique(X_test(:,1)));
testdocs = mat2cell(X_test,counts);
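One possible way to avoid both the loop and the cell array is to note that the per-document sum is a matrix product between a document-by-word count matrix and log(likelihoods). The sketch below assumes, as the loop in the question does, that doc#'s are the integers 1..nDocs and that wordIDs index the rows of likelihoods. The small synthetic sizes and the names nDocs, nWords and docWord are illustrative only.

```matlab
% Small synthetic stand-in for the data in the question (sizes are assumptions).
rng(0);
nDocs = 5; nWords = 8; nClasses = 3;
X_test = sortrows([randi(nDocs,30,1), randi(nWords,30,1), randi(4,30,1)]);
likelihoods = rand(nWords, nClasses) + 0.1;   % strictly positive so log is finite

% Document-by-word count matrix; sparse() adds up repeated (doc, word) pairs.
docWord = sparse(X_test(:,1), X_test(:,2), X_test(:,3), nDocs, nWords);

% Row d is sum over words of wordCount * log(likelihood), for each class.
probabilities = full(docWord * log(likelihoods));   % nDocs x nClasses
```

For the sizes in the question, docWord would be 7505 x 61188 with at most 967874 non-zeros, so keeping it sparse matters.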

Finding the maximum condition number of a matrix after erasure

I am dealing with the following question: take a large random Gaussian matrix, for example 1000 by 500, arbitrarily remove 500 rows, and consider the condition number of the matrix that is left. What is the maximum possible condition number we can get with high probability?
Here a Gaussian matrix means a matrix with i.i.d. standard normal entries. I would like to write a MATLAB program to run some simulations. How can I write the program? Thanks for any help.
That's an interesting problem. I don't know of any theoretical results, but it's easy to set up a Monte Carlo simulation and see.
Note that arbitrarily removing 500 rows is equivalent to always removing the last 500 rows for example, because the rows are i.i.d. and the condition number is invariant to changing the order of the rows.
M = 100; %// initial number of rows
N = 50; %// number of columns
R = 1e4; %// number of Monte Carlo realizations
cond1 = NaN(1,R); %// preallocate
cond2 = NaN(1,R); %// preallocate
for r = 1:R
X = randn(M,N); %// matrix with i.i.d normalized Gaussian entries
cond1(r) = cond(X);
cond2(r) = cond(X(1:N,:));
end
loglog(cond1, cond2, '.', 'markersize', 1) %// scatter plot of results in logarithmic scale
xlabel('Condition number of original matrix')
ylabel('Condition number of reduced matrix')
This is the result for M=100; N=50;. Note that for larger sizes, such as the 1000 by 500 matrix in the question, it may take long to obtain a large number of realizations.
As expected, the condition number increases when you remove rows (although I didn't expect it to increase so much!).
From the obtained vectors cond1 and cond2 you can compute statistics, or percentiles. For example, the value that is exceeded with only 10% probability is, in each case,
>> quantile(cond1,.9)
ans =
5.837510220358853
>> quantile(cond2,.9)
ans =
9.422516183444204e+02
This means that in the original matrix, 90% of the time the condition number is less than 5.8375; whereas in the reduced matrix, 90% of the time the condition number is less than 942.25.
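The claim that the condition number does not depend on the order of the rows can be checked numerically. This is a minimal sketch; the names Xp and relDiff are illustrative, and the difference is expected to be at round-off level rather than exactly zero.

```matlab
rng(0);
X  = randn(100, 50);                 % same sizes as the simulation above
Xp = X(randperm(size(X,1)), :);      % the same rows in a random order

c1 = cond(X);
c2 = cond(Xp);
relDiff = abs(c1 - c2) / c1;         % relative difference between the two
```

Permuting rows leaves the singular values unchanged, which is why keeping the first N rows is as good as keeping any other fixed set of N rows when the rows are i.i.d.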

Vector and matrix comparison in MATLAB

I have a vector with 5 numbers in it, and a matrix of size 6000x20, so every row has 20 numbers. I want to count how many of the 6000 rows contain all values from the vector.
As the vector is a part of a matrix which has 80'000'000 rows, each containing unique combinations, I want a fast solution (which doesn't take more than 2 days).
Thanks
With the sizes you have, a bsxfun-based approach that builds an intermediate 6000x20x5 3D-array is affordable:
v = randi(9,1,5); %// example vector
M = randi(9,6000,20); %// example matrix
t = bsxfun(@eq, M, reshape(v,1,1,[]));
result = sum(all(any(t,2),3));
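The counting logic can be checked on a tiny hand-made example. The values below are made up for illustration, and rowHasAll is a hypothetical helper name for the per-row mask.

```matlab
v = [1 2 3];                          % values that must all appear in a row
M = [1 2 3 4 5 6;                     % contains 1, 2 and 3
     1 1 1 1 1 1;                     % contains only 1
     5 4 3 2 1 9];                    % contains 1, 2 and 3

t = bsxfun(@eq, M, reshape(v,1,1,[]));   % 3 x 6 x 3: t(i,j,k) is M(i,j)==v(k)
rowHasAll = all(any(t,2),3);             % 3 x 1 logical, true where every v(k) is found
result = sum(rowHasAll);                 % 2 for this example
```

Here any(t,2) asks whether v(k) occurs anywhere in a row, and all(...,3) requires that to hold for every k.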

Find frequency of elements above a threshold for each cell in MATLAB

I have a 4-D matrix. The dimensions are longitude, latitude, days, years as [17,14,122,16].
I have to find the frequency of values above the 98th percentile for each cell, so that the final output is a 17x14 array containing the number of occurrences of values above the 98th-percentile threshold.
I did something which gives me a matrix 17x14 of values associated with 98 percentile for each cell but I am unable to determine the frequency of occurrences.
k=0;
p=cell(1,238);
r=cell(1,238);
for i=1:17
for j=1:14
n=m(i,j,[1:122],[1:16]);
n=squeeze(n);
k=k+1;
q=prctile(n(:),98);
r{k}=nansum(nansum(n>=q));
p{k}=q;
end
end
This code gives matrix p fine but matrix r contains same values for all cells. How can this be possible? What am I doing wrong with this? Please help.
By definition, the frequency of values above the 98th percentile is 2%.
I'm guessing the value you are getting for every r is 39: the number of elements in the top 2% of your 122x16 matrix (i.e. 1952 elements).
r = 0.02*1952
r =
39.0400
Your code is verifying the theoretical value. Perhaps you are thinking of a different question?
Here's a simulated example, using randomly generated data (uniformly distributed from 0 to 100) in place of your matrix n.
k=0; % reset the cell counter before the loops
p=cell(1,238);
r=cell(1,238);
for i=1:17
for j=1:14
% n=m(i,j,[1:122],[1:16]);
% n=squeeze(n);
% After you do n=squeeze(n), it gives 2-D matrix of 122x16
% dimensions.
n = rand(122,16)*100; % simulation for your 2-D matrix
k=k+1;
q=prctile(n(:),98);
r{k}=nansum(nansum(n>=q));
p{k}=q;
end
end
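The same per-cell computation can also be written without loops, which returns the thresholds and counts directly as 17x14 arrays. This is a sketch on simulated data: the array m below is a stand-in for the real 4-D data, and prctile requires the Statistics Toolbox, as in the code above.

```matlab
rng(1);
m = rand(17,14,122,16)*100;          % simulated stand-in for the 4-D data
x = reshape(m, 17, 14, []);          % 17 x 14 x 1952: one series per cell

q = prctile(x, 98, 3);               % 17 x 14 per-cell 98th-percentile thresholds
r = sum(bsxfun(@ge, x, q), 3);       % 17 x 14 counts of values at or above q
```

Because each threshold is computed from the same values that are then counted, every entry of r comes out at roughly 2% of 1952, i.e. about 39, which is the effect described above.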

How to select values with the higher occurences from several matrices having the same size in matlab?

I would like to have a program that performs the following actions:
Read several matrices of the same size (1126x1440 double).
Select the most frequently occurring value in each cell (same i,j across the matrices).
Write this value to an output matrix of the same size (1126x1440) at the corresponding i,j position, so that each cell of the output holds the most frequent value found at that position across all the input matrices.
Building on @angainor's answer, I think there is a simpler method using the mode function.
nmatrices - number of matrices
n, m - dimensions of a single matrix
maxval - maximum value of an entry (99)
First organize data into a 3-D matrix with dimensions [n X m X nmatrices]. As an example, we can just generate the following random data in a 3-D form:
CC = round(rand(n, m, nmatrices)*maxval);
and then the computation of the most frequent values is one line:
B = mode(CC,3); %compute the mode along the 3rd dimension
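When the matrices already exist as separate variables, they can be stacked along the third dimension with cat before calling mode. The three 2x2 matrices below are made-up examples, and A1, A2, A3 are hypothetical names.

```matlab
A1 = [1 2; 3 4];
A2 = [1 5; 3 6];
A3 = [7 5; 8 4];

CC = cat(3, A1, A2, A3);   % 2 x 2 x 3 stack, one page per input matrix
B  = mode(CC, 3);          % most frequent value at each (i,j): [1 5; 3 4]
```

If several values tie for most frequent at a position, mode returns the smallest of them.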
Here is the code you need. I have introduced a number of constants:
nmatrices - number of matrices
n, m - dimensions of a single matrix
maxval - maximum value of an entry (99)
I first generate example matrices with rand. Matrices are changed to vectors and concatenated in the CC matrix. Hence, the dimensions of CC are [m*n, nmatrices]. Every row of CC holds individual (i,j) values for all matrices - those you want to analyze.
CC = [];
% concatenate all matrices into CC
for i=1:nmatrices
% generate some example matrices
% A = round(rand(m, n)*maxval);
A = eval(['neurone' num2str(i)]);
% flatten matrix to a vector, concatenate vectors
CC = [CC A(:)];
end
Now we do the real work. I have to transpose CC, because MATLAB works column-wise, so I want to analyze individual columns of CC, not rows. Next, using histc I find the most frequently occurring values in every column of CC, i.e. in the (i,j) entries of all matrices. histc counts the values that fall into given bins (in your case - 1:maxval) in every column of CC.
% transpose CC so that it has dimension [nmatrices, m*n];
% this gives better histc and sort performance
CC = CC';
% count values from 1 to maxval in every column of CC
counts = histc(CC, 1:maxval);
counts has dimensions [maxval, m*n] - for every (i,j) of your original matrices you know the number of times a given value from 1:maxval is represented. The last thing to do now is to sort the counts and find out which value occurs most frequently. I do not need the sorted counts, I need the permutation that will tell me which entry of counts has the highest value. That is exactly what you want to find out.
% sort the counts. Last row of the permutation will tell us,
% which entry is most frequently found in columns of CC
[~,perm] = sort(counts);
% the result is a reshaped last row of the permutation
B = reshape(perm(end,:)', m, n);
B is what you want.
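As a possible variation on the last step, the index of the largest count can be taken with max instead of a full sort. The sketch below reruns the histc approach on three made-up 2x2 matrices, so the names A1, A2, A3 and idx are illustrative. Note that on ties max picks the smallest value, whereas the sort-based version picks the largest.

```matlab
A1 = [1 2; 3 4];
A2 = [1 5; 3 6];
A3 = [7 5; 8 4];
m = 2; n = 2; maxval = 8;

CC = [A1(:) A2(:) A3(:)]';            % nmatrices x (m*n), one column per (i,j)
counts = histc(CC, 1:maxval, 1);      % maxval x (m*n) occurrence counts

[~, idx] = max(counts, [], 1);        % bin index of the largest count per column
B = reshape(idx, m, n);               % [1 5; 3 4] for these inputs
```

Since the bins are 1:maxval, the bin index equals the value itself, which is why idx can be reshaped straight into the result.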