MATLAB: Why is correlation NaN when using 'corrcoef'?

When I run corrcoef to find the correlation coefficients between two data arrays, I get NaNs. It only does that for one batch of data. Here is a download link to the data in a .mat file.
I run this code
[R(1).R,R(1).P,R(1).RL,R(1).RU] = corrcoef([data.Series1], [data.Series2], 'rows', 'pairwise');
and it gives me
NaN NaN
NaN 1
for R, P, RL, and RU.
I don't think the NaNs in the data are the problem, because I pass the 'pairwise' option to corrcoef, which tells it to ignore NaNs.
I copied the same data into Microsoft Excel and it calculated the correlation coefficient just fine. Here is the Excel file with the coefficient of correlation calculated. Why doesn't corrcoef do it? What can possibly go wrong here?

I had to download the file and run it to see what happened.
You are right that with the 'pairwise' option, any pair in which either element is NaN is effectively excluded from the computation.
But what about Inf values? Your [data.Series1] contains Inf entries, and that seems to be causing the problem.
I extracted your data series into 2 vectors A and B:
A = [data.Series1];
B = [data.Series2];
>> max (A)
ans =
Inf
Now set the Inf entries to NaN and rerun:
A(isinf(A)) = NaN;
[R(1).R,R(1).P,R(1).RL,R(1).RU] = corrcoef(A,B, 'rows', 'pairwise');
>> R.RL
ans =
1.0000 -0.0794
-0.0794 1.0000
Discussion: Inf clearly does not work with corrcoef in MATLAB, but why did it work in Excel? Did Excel turn Inf into NaN by default when using CORREL? The data was certainly loaded into MATLAB as Inf.
---------- EDIT ---------
After carefully reading the Excel documentation, these are the remarks from Office Support:
"If an array or reference argument contains text, logical values, or empty cells, those values are ignored; however, cells with the value zero are included."
So when NaN and Inf values are loaded into Excel, they are treated as strings (text format), not numbers, and are therefore ignored. This should explain why it worked in Excel.
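To see the effect in isolation, here is a minimal sketch with made-up vectors (A and B below are illustrative and are not the data from the question). It assumes only that 'pairwise' drops NaN pairs but keeps Inf, which matches the behaviour described above:

```matlab
% Illustrative data (hypothetical, not the asker's series)
A = [1 2 3 Inf 5 NaN 7];
B = [2 1 4 3   6 5   NaN];

% 'pairwise' removes the pairs containing NaN, but the Inf stays in,
% so the mean of A is Inf and the coefficient becomes NaN.
R_bad = corrcoef(A, B, 'rows', 'pairwise');

% Treat every non-finite value as missing before calling corrcoef.
A_clean = A;
A_clean(~isfinite(A_clean)) = NaN;   % catches Inf and -Inf
R_ok = corrcoef(A_clean, B, 'rows', 'pairwise');

disp(R_bad(1,2))   % NaN
disp(R_ok(1,2))    % a finite correlation coefficient
```

Using isfinite instead of isinf is a small design choice: it handles +Inf and -Inf in one test and leaves existing NaNs as they are.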

Related

Output of unique function in Matlab

I am using the unique function in Matlab and I am confused about the output of such a function.
Consider the following simple code
rng default
T=randn(232,50); %232*50
equalorder=randsample(232,80802,true); %80802*1
T_extended=T(equalorder,:); %80802*50
By construction, I expect the number of unique rows of T_extended to be 232. In fact,
S=size(unique(T_extended,'rows'),1); %232
Now, consider the specific T and equalorder arrays produced by some code of mine (T and equalorder are uploaded here
https://filebin.net/603zn7mt2efzq91c
unfortunately my code is too long to be reproduced, and I think that the issue may be numerical). Let's apply the code above to these arrays:
clear
load matrices %T, equalorder
T_extended=T(equalorder,:);
However, if I do
S=size(unique(T_extended,'rows'),1);
I get S=4696 and not S=232. Why?
The code or data necessary to reproduce the problem should be included in the question itself, as external links may stop working in the future. In this case, however, it was easy to identify the pattern that causes the problem (see below), so the question together with this answer should be self-contained.
In your linked example, T contains NaN at entry (216,37):
>> T(216,37)
ans =
NaN
(and this is the only such entry):
>> nnz(isnan(T))
ans =
1
By design, NaN values are not equal to each other. So when computing unique(T_extended, 'rows'), all rows of T_extended that correspond to the original 216th row of T are counted as being different. This is what causes the count of unique rows to increase. If you leave out the 37th column (which is the only one that contains NaN) you get the expected result:
>> S=size(unique(T_extended(:,[1:36 38:end]),'rows'),1)
S =
232
Let's count how many times a NaN entry appears in T_extended:
>> nnz(isnan(T_extended))
ans =
4465
(Of course, this happens because):
>> sum(equalorder==216)
ans =
4465
This means that the count of unique rows is increased by 4465 - 1 when each repetition of the row containing NaN is counted as a different row. And 4465 - 1 + 232 is 4696, which is the result you get.
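The same counting can be reproduced on a tiny example. The sketch below uses a hypothetical 3-row matrix and index vector, and shows one possible workaround: temporarily replacing NaN with a sentinel value. The sentinel (realmax here) is an assumption and must not occur in the real data:

```matlab
% Small stand-ins for T and equalorder (hypothetical values)
T = [1 2; 3 NaN; 5 6];
idx = [1 2 2 2 3 1];          % row 2 (the NaN row) is repeated 3 times
T_ext = T(idx, :);

% NaN ~= NaN, so each copy of the NaN row counts as a new unique row:
% 3 distinct rows + (3 - 1) extra copies = 5
n_naive = size(unique(T_ext, 'rows'), 1);

% Workaround sketch: map NaN to a sentinel before calling unique.
sentinel = realmax;           % assumed not to appear in the data
T_tmp = T_ext;
T_tmp(isnan(T_tmp)) = sentinel;
n_fixed = size(unique(T_tmp, 'rows'), 1);

fprintf('naive: %d, with sentinel: %d\n', n_naive, n_fixed);
```

Here n_naive is 5 and n_fixed is 3, mirroring the 4465 - 1 + 232 arithmetic above on a smaller scale.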

Is there any general way to remove NaNs from a matrix?

Is there any general way to remove NaNs from a matrix? Sometimes I run into NaNs in the middle of some code, and they make it hard to get the right output. Is there a way to add some kind of check so that NaNs do not arise in MATLAB code? An example illustrating the idea would be really helpful.
You can detect nan values with the isnan function:
A = [1 NaN 3];
A(~isnan(A))
1 3
This removes the NaN values outright. However, that is not always what you want, e.g.
A = [1 nan; 2 3];
A(~isnan(A))
1
2
3
As you can see, this destroys the matrix structure. You can avoid this by preallocating a matrix of zeros first and copying only the non-NaN values, which effectively sets the NaN entries to zero:
B = zeros(size(A));
B(~isnan(A))=A(~isnan(A))
B =
1 0
2 3
or, overwriting our original matrix A
A(isnan(A))=0
A =
1 0
2 3
There are several functions that work with NaNs: isnan and nanmean, for example. max() and min() also accept a NaN flag ('omitnan' or 'includenan') that controls whether NaNs are included in the evaluation.
Pay attention, though: sometimes the NaNs are generated by your own code, e.g. by 0/0, or when performing the standardization (x-mean(x))/std(x) if x contains either a single value or several equal values, because std(x) is then 0.
You cannot avoid NaN entirely, since some computations produce it as a result. For example, if you compute 1/0-1/0 you will get NaN. You should deal with NaNs at the code level, using built-in functions like isnan.
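As a rough illustration of these points, the sketch below shows the standardization case producing NaN, one possible guard against it, and the effect of the NaN flags. The variable names and the choice to return zeros for constant input are assumptions made for the example:

```matlab
% Standardizing a constant vector gives 0/0 in every entry.
x = [4 4 4];
z_raw = (x - mean(x)) / std(x);     % all NaN, since std(x) is 0

% One possible guard: check the spread before dividing.
s = std(x);
if s == 0
    z = zeros(size(x));             % assumed convention for constant input
else
    z = (x - mean(x)) / s;
end

% NaN flags: mean includes NaN by default, max omits it by default.
v = [1 NaN 3];
m_incl   = mean(v);                     % NaN
m_omit   = mean(v, 'omitnan');          % 2
max_omit = max(v);                      % 3
max_incl = max(v, [], 'includenan');    % NaN
```

A check such as any(isnan(z_raw(:))) placed after a suspicious computation is a cheap way to catch where NaNs first appear.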
Several situations that come up with a matrix A containing NaN values:
(1) Construct a new matrix where all rows with a NaN are removed.
row_mask = ~any(isnan(A),2);
A_nonans = A(row_mask,:);
(2) Construct a new matrix where all columns with a NaN are removed.
column_mask = ~any(isnan(A),1);
A_nonans = A(:, column_mask);
(3) Construct a new matrix where all NaN entries are replaced with 0.
A_nans_replaced = A;
A_nans_replaced(isnan(A_nans_replaced)) = 0;
Easy:
A=[1 2; nan 4];
A(isnan(A))=0;

Matlab programming code understanding

I have come across a piece of MATLAB code that I am unable to understand. If anybody knows what this code means, please help me.
Lambda(:,1) = [randi([1,4], 1,4), randi([1,30],1)*rand];
I know that randi returns a random integer in [min, max]. What I would like to know is what Lambda will receive: a row of values, a column of values, or only a scalar?
Well.. You could just run the code and see what happens:
[randi([1,4], 1,4), randi([1,30],1)*rand]
ans =
4.0000 2.0000 4.0000 1.0000 11.9046
So the answer will be: a row vector with 5 entries.
But let's look at it in more detail: with randi([1,4], 1,4) you create a row vector of size 1 x 4 containing random integers in [min,max], i.e. between 1 and 4. The second part similarly creates one integer in the range [1,30] and multiplies it by a random number from the interval (0,1).
With [x,y] you concatenate the two numbers or vectors. This leads to a row vector of size 1 x 5, as we saw at the beginning.
In the end you assign this to Lambda(:,1). Since in MATLAB the first index is for rows and the second for columns, you select the first column of Lambda. You are thus trying to assign a 1 x 5 row vector to a 5 x 1 column vector. Luckily MATLAB is smart enough to handle that, so this snippet will work anyway. It would be a nicer and clearer solution to create a column vector instead of a row vector in the first place. That would be
Lambda(:,1) = [randi([1,4], 4,1); randi([1,30],1)*rand];
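To check the shapes involved, here is a small sketch. The preallocation of Lambda as a 5 x 3 matrix is an assumption for illustration only, since the original snippet does not show how Lambda was created:

```matlab
% Hypothetical preallocation: 5 rows, 3 columns
Lambda = zeros(5, 3);

% Row-vector version from the question: 1 x 5 assigned into a 5 x 1 slot
v_row = [randi([1,4], 1,4), randi([1,30],1)*rand];
Lambda(:,1) = v_row;

% Column-vector version: shapes match without relying on reorientation
v_col = [randi([1,4], 4,1); randi([1,30],1)*rand];
Lambda(:,2) = v_col;

disp(size(v_row))    % 1 5
disp(size(v_col))    % 5 1
disp(size(Lambda))   % 5 3
```

The first four entries of each filled column are integers between 1 and 4, and the fifth is a non-integer value between 0 and 30.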

Matlab csvread: 450k+ columns

I am attempting to read a file with Matlab's csvread. It has 10 rows and ~900,000 columns. In the case of such a large number of columns, the function returns a vector, as opposed to a matrix of proper dimensions.
In order to test this, I truncated it to two sizes using cut, and when there are 457,000 columns we have the same behavior:
>> A = csvread( 'test.csv' );
>> size(A)
ans =
4570000 1
But when cut down to 45,700 columns, we have the desired behavior:
>> A = csvread( 'test.csv' );
>> size(A)
ans =
10 45700
Of course, MATLAB is capable of handling matrices of size 10x457,000, and I suppose I can use fscanf in a loop (I feel like this would be less efficient?), but I was wondering if anyone had any insight.
EDIT: I suppose I could also just reshape the vector into a matrix of the proper dimensions, but I would still like to understand this seemingly strange behavior of csvread.
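The reshaping workaround mentioned in the edit could look roughly like the sketch below. It assumes the returned vector holds the values in file order, i.e. row by row, which the question does not confirm, so the result should be checked against a few known entries. The small vector v stands in for the csvread output:

```matlab
% Stand-in for the vector returned by csvread (hypothetical values),
% assumed to be in file order: row 1 first, then row 2, and so on.
nrows = 2;
ncols = 3;
v = [1; 2; 3; 4; 5; 6];

% reshape fills column by column, so build an ncols x nrows matrix
% and transpose it to recover the row-by-row layout.
A = reshape(v, ncols, nrows).';

disp(A)
% 1 2 3
% 4 5 6
```

If the vector turned out to be in column order instead, reshape(v, nrows, ncols) without the transpose would be the matching call.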

Find all NaN elements inside an Array

Is there a command in MATLAB that allows me to find all NaN (Not-a-Number) elements inside an array?
As noted, the best answer is isnan() (though +1 for woodchips' meta-answer). A more complete example of how to use it with logical indexing:
>> a = [1 nan;nan 2]
a =
1 NaN
NaN 2
>> %replace nan's with 0's
>> a(isnan(a))=0
a =
1 0
0 2
isnan(a) returns a logical array, an array of true & false the same size as a, with "true" every place there is a nan, which can be used to index into a.
While isnan is the correct solution, I'll just point out the way to have found it. Use lookfor. When you don't know the name of a function in MATLAB, try lookfor.
lookfor nan
will quickly give you the names of some functions that work with NaNs, as well as giving you the first line of their help blocks. Here, it would have listed (among other things)
ISNAN True for Not-a-Number.
which is clearly the function you want to use.
I just found the answer:
k=find(isnan(yourarray))
k will be a list of the linear indices of the NaN elements.
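For a matrix, find can also return row and column subscripts instead of linear indices. A short sketch using the same small matrix as in the earlier answer:

```matlab
a = [1 NaN; NaN 2];

% Linear indices (column-major order)
k = find(isnan(a));          % [2; 3]

% Row and column subscripts of the same elements
[r, c] = find(isnan(a));     % r = [2; 1], c = [1; 2]

% Number of NaN entries, without building an index list
n = nnz(isnan(a));           % 2
```

When the positions are only needed for indexing, the logical mask isnan(a) can be used directly and find is not required.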