Is there a command in MATLAB that allows me to find all NaN (Not-a-Number) elements inside an array?
As noted, the best answer is isnan() (though +1 for woodchips' meta-answer). A more complete example of how to use it with logical indexing:
>> a = [1 nan;nan 2]
a =
1 NaN
NaN 2
>> %replace nan's with 0's
>> a(isnan(a))=0
a =
1 0
0 2
isnan(a) returns a logical array the same size as a, with true wherever a holds a NaN and false elsewhere. That logical array can be used directly to index into a.
While isnan is the correct solution, I'll just point out the way to have found it. Use lookfor. When you don't know the name of a function in MATLAB, try lookfor.
lookfor nan
will quickly give you the names of some functions that work with NaNs, as well as giving you the first line of their help blocks. Here, it would have listed (among other things)
ISNAN True for Not-a-Number.
which is clearly the function you want to use.
I just found the answer:
k=find(isnan(yourarray))
k will be a list of the linear indices of the NaN elements.
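A small sketch of how the logical mask and the index list relate, using a made-up matrix M (all variable names here are illustrative, not from the question). Calling find with two outputs gives row and column subscripts instead of linear indices.

```matlab
% Illustrative data; M is a hypothetical example matrix
M = [1 NaN 3; NaN 5 6];

mask = isnan(M);          % logical array, same size as M
k = find(mask);           % linear indices, counted in column-major order
[r, c] = find(mask);      % row and column subscripts of each NaN
numNaN = nnz(mask);       % how many NaNs there are
```

If the positions are only needed to read or overwrite the NaN elements, the mask alone is enough and the find call can be skipped.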
How can I add two matrices and keep only the numbers ignoring the NaN values?
for example:
A=[NaN 2 NaN];
B=[1 NaN 3];
I want some form of plus C=A+B such that:
C=[1 2 3]
You can achieve this without using any specific function call just by setting the NaNs to 0s and then performing the sum:
A(A~=A)=0
B(B~=B)=0
C=A+B
Edit: Another way of achieving this, as @rayryeng suggested in the first comment, is to use isnan:
A(isnan(A))=0
B(isnan(B))=0
C=A+B
You can use nansum (you need Statistics and Machine Learning Toolbox):
C = nansum([A;B])
and get:
C =
1 2 3
Alternatively, you can use sum with an excluding NaN flag:
C = sum([A;B],'omitnan')
And you will get the same result.
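One edge case worth checking is a position where both inputs are NaN. The sketch below extends the example vectors with such a column (the fourth element is an addition for illustration). With 'omitnan', a column with no valid entries sums to 0, so the NaN has to be restored explicitly if that is the desired behaviour.

```matlab
A = [NaN 2 NaN NaN];
B = [1 NaN 3 NaN];

C = sum([A; B], 1, 'omitnan');   % sum down each column, skipping NaNs
% The last column has no valid entries, so its sum is 0, not NaN.

% Optional policy (an assumption): keep NaN where both inputs were NaN
bothNaN = isnan(A) & isnan(B);
C(bothNaN) = NaN;
```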
Is there any general way to remove NaNs from a matrix? Sometimes I come across this problem in the middle of some code and then it creates problems to get appropriate outputs. Is there any way to generate any kind of check to avoid NaNs arising in a MATLAB code? It will be really helpful if someone can kindly give me an example with some idea related to it.
You can detect nan values with the isnan function:
A = [1 NaN 3];
A(~isnan(A))
1 3
This actually removes the NaN values. However, removal is not always possible, e.g.
A = [1 nan; 2 3];
A(~isnan(A))
1
2
3
as you can see this destroys the matrix structure. You can avoid this by preallocating first and thereby setting the nan values to zero:
B = zeros(size(A));
B(~isnan(A))=A(~isnan(A))
B =
1 0
2 3
or, overwriting our original matrix A
A(isnan(A))=0
A =
1 0
2 3
Several functions work with NaNs: isnan detects them, nanmean averages around them, and max() and min() accept a NaN flag ('omitnan' or 'includenan') that controls whether NaNs take part in the evaluation.
Pay attention, though: sometimes the NaNs are generated by your own code. Typical cases are 0/0, or a standardization such as (x-mean(x))/std(x) when x contains a single value or several identical values, because std(x) is then 0.
You cannot avoid NaN entirely, since some computations produce it as a result. For example, if you compute 1/0-1/0 you will get NaN. You should deal with NaNs at the code level, using built-in functions like isnan.
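A minimal sketch of such a code-level check, using the standardization case mentioned above. The warning identifier and the choice to replace NaN with 0 are assumptions made for illustration; the right policy depends on the application.

```matlab
x = [2 2 2];                      % constant data, so std(x) is 0
z = (x - mean(x)) / std(x);       % 0/0 in every element, which gives NaN

if any(isnan(z(:)))
    % 'standardize:nan' is a hypothetical warning identifier
    warning('standardize:nan', 'Result contains %d NaN value(s).', nnz(isnan(z)));
    z(isnan(z)) = 0;              % assumed policy: treat as zero deviation
end
```

Placing a check like this right after the step that can produce NaN makes the failure visible where it starts, rather than several computations later.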
Several situations that come up with a matrix A containing NaN values:
(1) Construct a new matrix where all rows with a NaN are removed.
row_mask = ~any(isnan(A),2);
A_nonans = A(row_mask,:);
(2) Construct a new matrix where all columns with a NaN are removed.
column_mask = ~any(isnan(A),1);
A_nonans = A(:, column_mask);
(3) Construct a new matrix where all NaN entries are replaced with 0.
A_nans_replaced = A;
A_nans_replaced(isnan(A_nans_replaced)) = 0;
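The three cases above, applied to one small made-up matrix so the results can be compared side by side (the output variable names are illustrative):

```matlab
A = [1 2 3; 4 NaN 6; 7 8 9];

row_mask = ~any(isnan(A), 2);          % [true; false; true]
rows_kept = A(row_mask, :);            % [1 2 3; 7 8 9]

column_mask = ~any(isnan(A), 1);       % [true false true]
cols_kept = A(:, column_mask);         % [1 3; 4 6; 7 9]

zero_filled = A;
zero_filled(isnan(zero_filled)) = 0;   % [1 2 3; 4 0 6; 7 8 9]
```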
Easy:
A=[1 2; nan 4];
A(isnan(A))=0;
When I run corrcoef to find the correlation coefficients between two data arrays, I get NaNs. It only does that for one batch of data. Here is a download link to the data within .mat file.
I run this code
[R(1).R,R(1).P,R(1).RL,R(1).RU] = corrcoef([data.Series1], [data.Series2], 'rows', 'pairwise');
and it gives me
NaN NaN
NaN 1
for R, P, RL, and RU.
I don't think the NaNs in the data are the problem because I use 'pairwise' parameter for corrcoef function, which tells it to ignore NaNs.
I copied the same data into Microsoft Excel and it calculated the correlation coefficient just fine. Here is the Excel file with the coefficient of correlation calculated. Why doesn't corrcoef do it? What can possibly go wrong here?
I had to download this file and plug it in to see what happened.
Yes, you are right that with the 'pairwise' option, any pair in which either element is NaN is effectively removed from the calculation;
BUT - what about INFs? In your [data.Series1] - you have INF entries, and that seems to be causing the problem.
I extracted your data series into 2 vectors A and B:
A = [data.Series1];
B = [data.Series2];
>> max (A)
ans =
Inf
Now by setting Inf to NaN:
A(isinf(A)) = NaN;
[R(1).R,R(1).P,R(1).RL,R(1).RU] = corrcoef(A,B, 'rows', 'pairwise');
>> R.RL
ans =
1.0000 -0.0794
-0.0794 1.0000
Discussion: Clearly Inf entries break the calculation in MATLAB, but the question is why it worked in Excel. Did Excel turn Inf into NaN by default when using CORREL? The data certainly got loaded in as Inf.
---------- EDIT ---------
After carefully reading the excel instructions:
Remarks from Office Support
"If an array or reference argument contains text, logical values, or empty cells, those values are ignored; however, cells with the value zero are included."
So when NaN and Inf values are loaded into Excel, they are treated as strings (text format) rather than numbers, and are therefore ignored. This should explain why it worked in Excel.
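A sketch of the same cleanup as a reusable pattern: count the non-finite entries first, then map Inf to NaN so that 'pairwise' drops those pairs too. The vectors below are hypothetical stand-ins, not the asker's data.

```matlab
% Hypothetical stand-in data; the real series come from the asker's .mat file
A = [1 2 Inf 4 NaN 6 7];
B = [2 1 5 3 4 NaN 8];

fprintf('A: %d Inf, %d NaN\n', nnz(isinf(A)), nnz(isnan(A)));

A(isinf(A)) = NaN;     % let 'pairwise' drop these pairs as well
B(isinf(B)) = NaN;
R = corrcoef(A, B, 'rows', 'pairwise');
```

Checking isinf before calling corrcoef is a cheap way to tell whether Inf, rather than NaN, is behind an all-NaN result.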
I need a fast way in Matlab to do something like this (I am dealing with huge vectors, so a normal loop takes forever!):
from a vector like
[0 0 2 3 0 0 0 5 0 0 7 0]
I need to get this:
[NaN NaN 2 3 3 3 3 5 5 5 7 7]
Basically, each zero value is replaced with the value of the previous non-zero one. The first are NaN because there is no previous non-zero element in the vector.
Try this, not sure about speed though. Got to run so explanation will have to come later if you need it:
interp1(1:nnz(A), A(A ~= 0), cumsum(A ~= 0), 'nearest')
Try this (it uses the cummax function, introduced in R2014b):
i1 = x==0;
i2 = cummax((1:numel(x)).*~i1);
x(i1&i2) = x(i2(i1&i2));
x(~i2) = NaN;
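For versions without cummax, a possible alternative is to index with cumsum. This is a sketch rather than one of the posted answers, and it assumes x is a row vector; the names nz, idx, vals and y are illustrative.

```matlab
x = [0 0 2 3 0 0 0 5 0 0 7 0];    % must be a row vector for this sketch

nz = x ~= 0;
idx = cumsum(nz);                  % number of non-zeros seen so far (0 at the start)
vals = [NaN, x(nz)];               % slot 1 holds the "no previous value" marker
y = vals(idx + 1);
% y is [NaN NaN 2 3 3 3 3 5 5 5 7 7]
```

The idea is that the running count of non-zero elements tells each position which non-zero value it should take, and a count of 0 maps to the NaN placed at the front of vals.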
Just for reference, here are some similar or identical functions from the File Exchange and/or other SO answers:
nearestpoint
knnimpute
Or, best of all, a function designed to do exactly your task:
repnan (obviously, first replace your zero values with NaN)
I had a similar problem once, and decided that the most effective way to deal with it was to write a mex file. The C++ loop is extremely trivial. Once you figure out how to work with the mex interface, it will be very easy.
I was looking to find the most efficient way to find the non zero minimum of a matrix and found this on a forum :
Let the data be a matrix A.
A(~A) = nan;
minNonZero = min(A);
This is very short and efficient (at least in number of lines of code), but I don't understand what happens when we do this. I can't find any documentation about it, since it's not an operation on matrices like +,-,\,... would be.
Could anyone explain this to me, or give me a link or something that could help me understand what is done?
Thank you !
It uses logical indexing
~ in Matlab is the not operator. When used on a double array, it finds all elements equal to zero. e.g.:
~[0 3 4 0]
Results in the logical matrix
[1 0 0 1]
i.e. it's a quick way to find all the zero elements
So if A = [0 3 4 0], then ~A = [1 0 0 1] (as a logical array), and A(~A) uses logical indexing to affect only the elements where the mask is true, in this case elements 1 and 4.
Finally A(~A) = NaN will replace all the elements in A that were equal to 0 with NaN which min ignores and thus you find the smallest non-zero element.
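The steps can be traced on the same small vector. The variable names mask and isMask are added for illustration.

```matlab
A = [0 3 4 0];
mask = ~A;               % logical [1 0 0 1]
isMask = islogical(mask);

A(mask) = NaN;           % A becomes [NaN 3 4 NaN]
minNonZero = min(A);     % 3, because min skips NaN by default

% By contrast, A([1 0 0 1]) with a double index is an error,
% since 0 is not a valid subscript.
```

The distinction matters: the result of ~A is of class logical, which is what makes MATLAB treat it as a mask instead of a list of positions.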
The code you provided:
A(~A) = NaN;
minNonZero = min(A);
Does the following:
(1) Create a logical index
(2) Apply the logical index on A
(3) Change A, by assigning NaN values
(4) Get the minimum of all values, while not including NaN values
Note that this leaves you with a changed A, which may be undesirable. More importantly, it has some inefficiencies: you spend time changing A, and you may then take the minimum over a larger array than necessary.
Therefore you could speed things up (and even reduce one line) by doing:
minNonZero = min(A(logical(A)))
Basically you have now skipped step 3 and possibly reduced step 4.
Furthermore, you seem to get an additional small speedup by doing:
minNonZero = min(A(A~=0))
I don't have any good reason for this, but it seems like step 1 is now done more efficiently.
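One difference shows up when A is a true matrix rather than a vector. In the sketch below (illustrative data and variable names), min applied to the NaN-filled matrix returns one minimum per column, while indexing with a mask first yields a single column vector and therefore a single overall minimum.

```matlab
A = [0 5 2; 7 0 9];        % illustrative matrix

B = A;
B(~B) = NaN;
perColumn = min(B);        % [7 5 2], one minimum per column
overall1 = min(B(:));      % 2

overall2 = min(A(A ~= 0)); % 2, and A itself is left unchanged
```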