I have a matrix A:
NaN NaN NaN NaN NaN NaN NaN 10 1 8 7 2 5 6 2 3 49 NaN NaN NaN NaN NaN NaN
I was wondering if there was a way to detect when the NaNs first turn to numbers and turn the 1st 2 points to NaNs, such as the 10 & 1 both to NaN.
Then find when the numbers turn to NaNs and turn the last two number points, 3 and 49 to NaNs.
Originally I was thinking of using the following, but I was wondering if this was the best way:
i= 2;
while i < 1440
if isnan(A(i)) < isnan(A(i-1)) //Transitioning from NaN to numbers
A(i:i+2) = NaN;
i = i+ 4;
elseif isnan(A(i)) > isnan(A(i-1)) //Transitioning from numbers to NaNs
A(i-2:i) = NaN;
i = i + 1;
else
i = i + 1;
end
end
but was wondering if there was any other way I could optimize it?
First I assume that your vector A is organized with NaN's at the start and end and a continous set of numerics in the middle, as in
A = [NaN ... NaN, contiguous numeric data, NaN ... NaN]
First, I suggest locating the numeric data and working from there, as in,
flagNumeric = ~isnan(A);
Now flagNumeric will be a true for the entries that are numeric and false for NaN's.
So the first numeric will be at
firstIndex = find(flagNumeric,1,'first');
and the last numeric at
lastIndex = find(flagNumeric,1,'last');
You can then use firstIndex and lastIndex to change the first and last numeric data to NaN's
A(firstIndex:firstIndex+1) = NaN;
A(lastIndex-1:lastIndex) = NaN;
% Set the first two non-NaN numbers to NaN
first = find(isfinite(A), 1, 'first');
A(first:first+1) = NaN;
% Set the last two non-NaN numbers to NaN
last = find(isfinite(A), 1, 'last');
A(last-1:last) = NaN;
Of course, the above code will break in special cases (e.g. when last == 1), but these should be straightforward to filter out.
Here's a slightly simpler version based on the same assumption as Azim's answer:
nums = find(~isnan(A));
A( nums([1 2 end-1 end]) ) = NaN;
Related
Let's consider this code only for exemplification purpose:
A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 nan];
dates = datetime({'2010','2011','2012','2013','2014'},'InputFormat','yyyy')';
TT = array2timetable(A,'RowTimes',dates);
The resulting timetable is:
I would like to use the matlab function fillmissing to impute missing data according to the following rules:
missing data at the beginning of the time series should not be
imputed
missing data at the end of the time series should not be
imputed
missing data within known values should be imputed only if
the number of missing values between known values is strictly minor
than 2
The resulting timetable should be:
Notice that only the 4th row in the column A2 has been imputed here. Can I do that with fillmissing? Otherwise how can I do that?
You can find the first and last non-NaN values using find. Based on these indicies, you can conditionally fill missing data if there are fewer than 2 missing values. For some vector v:
idxNaN = isnan( v ); % Get indicies of values which are NaN
idxDataStart = find( ~idxNaN, 1, 'first' ); % First NaN index
idxDataEnd = find( ~idxNaN, 1, 'last' ); % Last NaN index
idxData = idxDataStart:idxDataEnd; % Indices of valid data
numValsMissing = nnz( idxNaN(idxData) ); % Number of NaNs in valid data
if numValsMissing < 2 % Check for max number of NaNs
v(idxData) = fillmissing(v(idxData)); % Fill missing on this data
end
For your array A you can loop over the columns and apply the above, where each column is a vector v.
A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 nan];
for ii = 1:size(A,2)
v = A(:,ii);
idxNaN = isnan( v );
idxDataStart = find( ~idxNaN, 1, 'first' );
idxDataEnd = find( ~idxNaN, 1, 'last' );
idxData = idxDataStart:idxDataEnd;
numValsMissing = nnz( idxNaN(idxData) );
if numValsMissing < 2
v(idxData) = fillmissing(v(idxData),'linear');
end
A(:,ii) = v;
end
I have three matlab matrices A, B, and C with the same size:
A = [1:3; 4:6; 7:9];
B = [2 NaN 5; NaN NaN 7; 0 1 NaN];
C = [3 NaN 2; 1 NaN NaN; 1 NaN 5];
%>> A = %>>B = %>>C =
% 1 2 3 % 2 NaN 5 % 3 NaN 2
% 4 5 6 % NaN NaN 7 % 1 NaN NaN
% 7 8 9 % 0 1 NaN % 1 NaN 5
I would like the three matrices to keep only values for which each of the 3 matrices does not have a NaN in that specific position. That is, I would like to obtain the following:
%>> A = %>>B = %>>C =
% 1 NaN 3 % 2 NaN 5 % 3 NaN 2
% NaN NaN NaN % NaN NaN NaN % NaN NaN NaN
% 7 NaN NaN % 0 NaN NaN % 1 NaN NaN
In my attempt, I'm stacking the three matrices along the third dimension of a new matrix ABC with size 3x3x3 and then I'm using a for loop to make sure all the three matrices do not have NaN in that specific position.
ABC(:,:,1)=A; ABC(:,:,2)=B; ABC(:,:,3)=C;
for i=1:size(A,1)
for j=1:size(A,2)
count = squeeze(ABC(i,j,:));
if sum(~isnan(count))<size(ABC,3)
A(i,j)=NaN;
B(i,j)=NaN;
C(i,j)=NaN;
end
end
end
This code works fine. However, as I have more than 30 matrices of bigger size I was wondering whether there is a more elegant solution to this problem.
Thank you for your help.
Lets do fancy indexing!
First, the solution:
indnan=sum(isnan(cat(3,A,B,C)),3)>0;
A(indnan)=NaN;
B(indnan)=NaN;
C(indnan)=NaN;
What this code does is essentially creates a 3D matrix, and computes how many NaN there are in each (i,j,:) arrays. Then, if there are more than 0 (i.e.any of them is NaN) it gets a logical index for it. Finally, we fill up all those with NaN, leaving only the non-NaN alive.
Ander’s answer is good, but for very large matrices it might be expensive to create that 3D matrix.
First of all, I would suggest putting the matrices into a cell array. That makes it a lot easier to programmatically manage many arrays. That is, instead of A, B, etc, work with C{1}, C{2}, etc:
C = {A,B,C};
It takes essentially zero cost to make this change.
Now, to find all elements where one of the matrices is NaN:
M = isnan(C{1});
for ii=2:numel(C)
M = M | isnan(C{ii});
end
A similar loop then sets the corresponding elements to NaN:
for ii=1:numel(C)
C{ii}(M) = NaN,
end
This latter loop can be replaced by a call to cellfun, but I like explicit loops.
EDIT: Here are some timings. This is yet another example of loops being faster in modern MATLAB than the equivalent vectorized code. Back in the old days, the loop code would have been 100x slower.
This is the test code:
function so(sz) % input argument is the size of the arrays
C3 = cell(1,3);
for ii=1:numel(C3)
C3{ii} = create(sz,0.05);
end
C20 = cell(1,20);
for ii=1:numel(C20)
C20{ii} = create(sz,0.01);
end
if(~isequal(method1(C3),method2(C3))), error('not equal!'), end
if(~isequal(method1(C20),method2(C20))), error('not equal!'), end
fprintf('method 1, 3 arrays: %f s\n',timeit(#()method1(C3)))
fprintf('method 2, 3 arrays: %f s\n',timeit(#()method2(C3)))
fprintf('method 1, 20 arrays: %f s\n',timeit(#()method1(C20)))
fprintf('method 2, 20 arrays: %f s\n',timeit(#()method2(C20)))
% method 1 is the vectorized code from Ander:
function mask = method1(C)
mask = sum(isnan(cat(3,C{:})),3)>0;
% method 2 is the loop code from this answer:
function mask = method2(C)
mask = isnan(C{1});
for ii=2:numel(C)
mask = mask | isnan(C{ii});
end
function mat = create(sz,p)
mat = rand(sz);
mat(mat<p) = nan;
These are the results on my machine (with R2017a):
>> so(500)
method 1, 3 arrays: 0.003215 s
method 2, 3 arrays: 0.000386 s
method 1, 20 arrays: 0.016503 s
method 2, 20 arrays: 0.001257 s
The loop is 10x faster! For small arrays I see much less of a difference, but the loop code is still several times faster, even for 5x5 arrays.
I have a matrix A and B as the following:
A = [1 NaN 3 4 5 NaN NaN 8 9 10];
B = [2 6 7];
Matrix B has the same size as there are NaN values in matrix A (so 3x1 in this case).
I would like to replace the NaN values in the same order as the values appear in B. So the output should look like:
C = [1 2 3 4 5 6 7 8 9 10];
I can replace the NaN, if both matrices have the same size. For T = 10 and N = 1, I would use:
for t=1:T
for i=1:N
if A == NaN
C(t,i) = B;
else
C(t,i) = A(t,i);
end
end
end
However, I would like to know whether I could compare these matrices and replace the values even if the matrices are of different size? Saying differently, if A = NaN take the first value of B. For the next A = NaN take the second value in B.
You can simply do:
A(find(isnan(A))) = B; % store the result of find(...) to keep track of NaN indices
isnan() is the proper way of determining whether a value is NaN (since NaN ~= NaN), while find() returns the indices of A where an element is NaN in this case.
As per #Adiel's suggestion, you can use logical indexing instead to more compactly achieve the same result, provided you don't need the indices of NaN elements later on:
A(isnan(A)) = B;
I have the 4x2 matrix A:
A = [2 NaN 5 8; 14 NaN 23 NaN]';
I want to replace the non-NaN values with their associated indices within each column in A. The output looks like this:
out = [1 NaN 3 4; 1 NaN 3 NaN]';
I know how to do it for each column manually, but I would like an automatic solution, as I have much larger matrices to handle. Anyone has any idea?
out = bsxfun(#times, A-A+1, (1:size(A,1)).');
How it works:
A-A+1 replaces actual numbers in A by 1, and keeps NaN as NaN
(1:size(A,1)).' is a column vector of row indices
bsxfun(#times, ...) multiplies both of the above with singleton expansion.
As pointed out by #thewaywewalk, in Matlab R2016 onwards bsxfun(#times...) can be replaced by .*, as singleton expansion is enabled by default:
out = (A-A+1) .* (1:size(A,1)).';
An alternative suggested by #Dev-Il is
out = bsxfun(#plus, A*0, (1:size(A,1)).');
This works because multiplying by 0 replaces actual numbers by 0, and keeps NaN as is.
Applying ind2sub to a mask created with isnan will do.
mask = find(~isnan(A));
[rows,~] = ind2sub(size(A),mask)
A(mask) = rows;
Note that the second output of ind2sub needs to be requested (but neglected with ~) as well [rows,~] to indicate you want the output for a 2D-matrix.
A =
1 1
NaN NaN
3 3
4 NaN
A.' =
1 NaN 3 4
1 NaN 3 NaN
Also be careful the with the two different transpose operators ' and .'.
Alternative
[n,m] = size(A);
B = ndgrid(1:n,1:m);
B(isnan(A)) = NaN;
or even (with a little inspiration by Luis Mendo)
[n,m] = size(A);
B = A-A + ndgrid(1:n,1:m)
or in one line
B = A-A + ndgrid(1:size(A,1),1:size(A,2))
This can be done using repmat and isnan as follows:
A = [ 2 NaN 5 8;
14 NaN 23 NaN];
out=repmat([1:size(A,2)],size(A,1),1); % out contains indexes of all the values
out(isnan(A))= NaN % Replacing the indexes where NaN exists with NaN
Output:
1 NaN 3 4
1 NaN 3 NaN
You can take the transpose if you want.
I'm adding another answer for a couple of reasons:
Because overkill (*ahem* kron *ahem*) is fun.
To demonstrate that A*0 does the same as A-A.
A = [2 NaN 5 8; 14 NaN 23 NaN].';
out = A*0 + kron((1:size(A,1)).', ones(1,size(A,2)))
out =
1 1
NaN NaN
3 3
4 NaN
I have a vector of values such as the following:
1
2
3
NaN
4
7
NaN
NaN
54
5
2
7
2
NaN
NaN
NaN
5
54
3
2
NaN
NaN
NaN
NaN
4
NaN
How can I use
interp1
in such way that only a variable amount of consecutive NaN-values would be interpolated? That is for example I would want to interpolate only those NaN-values where there are at most three consecutive NaN-values. So NaN, NaN NaN and NaN NaN NaN would be interpolated but not NaN NaN NaN NaN.
Thank you for any help =)
P.S. If I can't do this with interp1, any ideas how to do this in another way? =)
To give an example, the vector I gave would become:
1
2
3
interpolated
4
7
interpolated
interpolated
54
5
2
7
2
interpolated
interpolated
interpolated
5
54
3
2
NaN
NaN
NaN
NaN
4
interpolated
First of all, find the positions and lengths of all sequences of NaN values:
nan_idx = isnan(x(:))';
nan_start = strfind([0, nan_idx], [0 1]);
nan_len = strfind([nan_idx, 0], [1 0]) - nan_start + 1;
Next, find the indices of the NaN elements not to interpolate:
thr = 3;
nan_start = nan_start(nan_len > thr);
nan_end = nan_start + nan_len(nan_len > thr) - 1;
idx = cell2mat(arrayfun(#colon, nan_start, nan_end, 'UniformOutput', false));
Now, interpolate everything and replace the elements that shouldn't have been interpolated back with NaN values:
x_new = interp1(find(~nan_idx), x(~nan_idx), 1:numel(x));
x_new(idx) = NaN;
I know this is an bad habit in matlab, but I would think this particular case requires a loop:
function out = f(v)
out = zeros(numel(v));
k = 0;
for i = 1:numel(v)
if v(i) ~= NaN
if k > 3
out(i-k:i - 1) = ones(1, k) * NaN;
else
out(i-k: i - 1) = interp1();%TODO: call interp1 with right params
end
out(i) = v(i)
k = 0
else
k = k + 1 % number of consecutive NaN value encoutered so far
end
end