I want to change [1 nan 1 2 2 nan nan 3 nan 4 nan nan 5] into [1 1 1 2 2 2 3 3 3.5 4 4 5 5]. If there is a single NaN, I want the NaN to be filled in with the average of the numbers before and after. If there is more than one NaN in a row, I want the NaNs to be filled in with the nearest number.
So far, I only have code that fills in the single NaNs:
max_x = x(:, 2);
min_x = x(:, 3);
for jj = 2:length(max_x)-1   % first and last elements have only one neighbour
    if isnan(max_x(jj))
        max_x(jj) = (max_x(jj-1) + max_x(jj+1))/2;
    end
    if isnan(min_x(jj))
        min_x(jj) = (min_x(jj-1) + min_x(jj+1))/2;
    end
end
How do I fill in the NaNs that aren't single?
Much thanks.
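The title of this question is also nearly the answer: fill in missing values using fillmissing. (Nearly, because the 'linear' method interpolates across runs of several NaNs instead of copying the nearest value.)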
A = [1 nan 1 2 2 nan nan 3 nan 4 nan nan 5];
B = fillmissing(A,'linear');
This function was introduced in R2016b.
The same logic can be implemented using interp1 and isnan.
idx = ~isnan( A );
x = 1:numel(A);
B = interp1( x(idx), A(idx), x, 'linear', 'extrap' );
Note that the extrapolation here gives slightly different behaviour for NaN values at each end of the input vectors.
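A minimal sketch of the question's exact rule (average for a lone NaN, nearest value inside longer runs), built only on interp1 and isnan as in the second snippet: fill every NaN both ways ('linear' and 'nearest'), then choose per element according to the length of the NaN run it belongs to. The names linFill, nearFill and runOfElem are illustrative, and leading or trailing NaNs stay NaN because no extrapolation is requested.

```matlab
A = [1 nan 1 2 2 nan nan 3 nan 4 nan nan 5];
x = 1:numel(A);
known = ~isnan(A);

% Two candidate fills from the non-NaN samples
linFill  = interp1(x(known), A(known), x, 'linear');   % average of neighbours for a lone NaN
nearFill = interp1(x(known), A(known), x, 'nearest');  % copy of the nearest known value

% Length of the NaN run each element belongs to (0 for non-NaN elements)
d = diff([0, ~known, 0]);
runStart = find(d == 1);
runLen = find(d == -1) - runStart;
runOfElem = zeros(size(A));
for r = 1:numel(runStart)
    runOfElem(runStart(r):runStart(r) + runLen(r) - 1) = runLen(r);
end

% Lone NaNs get the average, longer runs get the nearest value
B = A;
B(runOfElem == 1) = linFill(runOfElem == 1);
B(runOfElem > 1)  = nearFill(runOfElem > 1);
disp(B)
```

For an odd-length run, the middle element is equidistant from both neighbours, so the value it receives depends on interp1's tie-breaking for 'nearest'; check that case if it matters for your data.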
Sample code (the bounds-checked search also covers NaNs at the first and last positions):
% To paste in the main .m file
A = [1 nan 1 2 2 nan nan 3 nan 4 nan nan 5]; % Input array
[A] = new_array(A) % Function that returns the filled array
% To paste in a separate file, new_array.m
function x = new_array(x)
is_nan_ar = isnan(x);   % Logical mask of NaN elements (computed once, before any filling)
array_l = numel(x);     % Length of x (computed only once)
% True if index idx lies inside x and holds an original non-NaN value
isval = @(idx) idx >= 1 && idx <= array_l && ~is_nan_ar(idx);
for k = 1:array_l                         % Check every element of the input array
    if is_nan_ar(k)                       % Only NaN elements need filling
        s_r = 1;                          % Search range (1 element to the left/right)
        while ~isval(k-s_r) && ~isval(k+s_r) && s_r < array_l
            s_r = s_r + 1;                % Widen the range if nothing was found
        end
        if isval(k-s_r) && isval(k+s_r)   % Two non-NaNs at the same distance: average
            x(k) = (x(k-s_r) + x(k+s_r))/2;
        elseif isval(k-s_r)               % Nearest non-NaN is on the left
            x(k) = x(k-s_r);
        elseif isval(k+s_r)               % Nearest non-NaN is on the right
            x(k) = x(k+s_r);
        end                               % All-NaN input: the element stays NaN
    end
end
end
Consider the following code, for illustration purposes only:
A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 nan];
dates = datetime({'2010','2011','2012','2013','2014'},'InputFormat','yyyy')';
TT = array2timetable(A,'RowTimes',dates);
The resulting timetable is:
I would like to use the MATLAB function fillmissing to impute missing data according to the following rules:
- missing data at the beginning of the time series should not be imputed
- missing data at the end of the time series should not be imputed
- missing data within known values should be imputed only if the number of missing values between known values is strictly less than 2
The resulting timetable should be:
Notice that only the 4th row in the column A2 has been imputed here. Can I do that with fillmissing? If not, how can I do it?
You can find the first and last non-NaN values using find. Based on these indices, you can conditionally fill missing data if there are fewer than 2 missing values. For some vector v:
idxNaN = isnan( v ); % Get indices of values which are NaN
idxDataStart = find( ~idxNaN, 1, 'first' ); % First non-NaN index
idxDataEnd = find( ~idxNaN, 1, 'last' ); % Last non-NaN index
idxData = idxDataStart:idxDataEnd; % Indices from the first to the last valid value
numValsMissing = nnz( idxNaN(idxData) ); % Number of NaNs between them
if numValsMissing < 2 % Check for max number of NaNs
v(idxData) = fillmissing(v(idxData),'linear'); % Fill missing on this data
end
For your array A you can loop over the columns and apply the above, where each column is a vector v.
A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 nan];
for ii = 1:size(A,2)
v = A(:,ii);
idxNaN = isnan( v );
idxDataStart = find( ~idxNaN, 1, 'first' );
idxDataEnd = find( ~idxNaN, 1, 'last' );
idxData = idxDataStart:idxDataEnd;
numValsMissing = nnz( idxNaN(idxData) );
if numValsMissing < 2
v(idxData) = fillmissing(v(idxData),'linear');
end
A(:,ii) = v;
end
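The check above counts every NaN between the first and last known value, so a column such as [1 NaN 2 NaN 3]' would be left untouched even though each gap contains only one missing value. If the rule is meant per gap, one possible sketch (using interp1 rather than fillmissing, so it also runs on releases without fillmissing) locates each run of NaNs with diff and fills only interior runs no longer than maxRun, a name introduced here for illustration:

```matlab
A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 nan];
maxRun = 1;                       % hypothetical parameter: longest interior NaN run to fill
B = A;
for ii = 1:size(A,2)
    v = A(:,ii)';                 % work on a row vector
    nanIdx = isnan(v);
    known = find(~nanIdx);
    if numel(known) < 2
        continue;                 % interp1 needs at least two known points
    end
    d = diff([0, nanIdx, 0]);
    runStart = find(d == 1);      % first index of each NaN run
    runEnd = find(d == -1) - 1;   % last index of each NaN run
    lin = interp1(known, v(known), 1:numel(v));   % linear, no extrapolation
    for r = 1:numel(runStart)
        % Skip runs touching the start or end of the series (rules 1 and 2)
        interior = runStart(r) > 1 && runEnd(r) < numel(v);
        if interior && (runEnd(r) - runStart(r) + 1) <= maxRun
            v(runStart(r):runEnd(r)) = lin(runStart(r):runEnd(r));
        end
    end
    B(:,ii) = v';
end
disp(B)
```

For this A, only B(4,2) changes (to 4); the two consecutive NaNs in column 4 are left alone because that run is longer than maxRun.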
Is there a way to do the following?
I would like to turn a MATLAB array:
>> (1:10)'
ans =
1
2
3
4
5
6
7
8
9
10
Into the following sequential matrix (I am not sure what the name for this is):
ans =
NaN NaN NaN 1
NaN NaN NaN 2
NaN NaN NaN 3
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9
7 8 9 10
I am able to do this with the following function, but it iterates over each row and is slow:
function V = vec2stack(X, num_sequences)
n = numel(X);
len_out = n - num_sequences + 1;
V = zeros(len_out,num_sequences);
for kk = 1:len_out
V(kk,:) = X(kk:kk +num_sequences - 1)';
end
V = [nan(num_sequences,num_sequences); V(1:end-1,:)];
end
X = 1:10;
AA = vec2stack(X,3)
Running the above function on a vector of length 1,000,000 takes about 1 second, which is not fast enough for my purposes:
tic;
Lag = vec2stack(1:1000000,5);
toc;
Elapsed time is 1.217854 seconds.
You can use repmat() to repeat the X vector horizontally. Then, notice that each column is one more than the previous (this holds because X is 1:n in this example). You can add a row vector that has the same number of columns as the matrix, and MATLAB will broadcast the vector onto the entire matrix and do the addition for you.
In MATLAB versions before R2016b, which lack implicit expansion, you need to explicitly repmat() the row vector (or use bsxfun) to get the shapes to match.
function V = vec2stack(X, num_sequences)
% Repeat X, 1 time vertically, num_seq times horizontally
% X(:) ensures it's a column vector so we have the correct shape
mat = repmat(X(:), 1, num_sequences);
% make a row vector from 0:num_sequences-1
vec = 0:(num_sequences-1);
% or explicitly repmat on vec if you need to:
% vec = repmat(0:(num_sequences-1), numel(X), 1);
% Add the two. Matlab broadcasts the row vector onto the matrix
% Because they have the same number of columns
mat = mat + vec;
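% Build the return matrix: NaN rows on top, and drop the last num_sequences rows
% so the result has numel(X) rows, like the loop version
V = [nan(num_sequences, num_sequences); mat(1:end-num_sequences, :)];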
end
X = (1:10)';
AA = vec2stack(X,3)
% You can easily add the last column as another column
Testing speed on Octave Online:
%% Loopy version
tic;
Lag = vec2stack(1:1000000,5);
toc;
Elapsed time is 17.4092 seconds
%% Vectorized version
tic;
Lag = vec2stack((1:1000000)',5);
toc;
Elapsed time is 0.110762 seconds.
~150x speedup. Pretty cool!
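Adding 0:(num_sequences-1) to the repeated values reproduces the lag matrix only because X is 1:n in the example. A sketch that works for arbitrary values applies the same repmat-plus-offset idea to the indices instead and then indexes into X; it is written as a plain script with made-up example data so it can be checked against the loop version:

```matlab
X = [5 3 8 1 9 2 7 4 6 0];   % arbitrary, non-consecutive example values
num_sequences = 3;

Xc = X(:);                   % force a column vector
n = numel(Xc);
len_out = n - num_sequences + 1;

% idx(kk, jj) = kk + jj - 1: each column of the index matrix is one more than the previous
idx = repmat((1:len_out)', 1, num_sequences) + repmat(0:(num_sequences - 1), len_out, 1);
V = Xc(idx);                 % indexing a vector with a matrix returns that matrix's shape

% Same NaN padding as the loop version: numel(X) rows in total
V = [nan(num_sequences, num_sequences); V(1:end-1, :)];
disp(V)
```

Because each entry depends only on the sum of its row and column index, the unpadded part is a Hankel matrix, which is probably the name the question was looking for; MATLAB's hankel(Xc(1:len_out), Xc(len_out:end)) should build the same unpadded matrix.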
I have three MATLAB matrices A, B, and C of the same size:
A = [1:3; 4:6; 7:9];
B = [2 NaN 5; NaN NaN 7; 0 1 NaN];
C = [3 NaN 2; 1 NaN NaN; 1 NaN 5];
%>> A = %>>B = %>>C =
% 1 2 3 % 2 NaN 5 % 3 NaN 2
% 4 5 6 % NaN NaN 7 % 1 NaN NaN
% 7 8 9 % 0 1 NaN % 1 NaN 5
I would like the three matrices to keep only values for which each of the 3 matrices does not have a NaN in that specific position. That is, I would like to obtain the following:
%>> A = %>>B = %>>C =
% 1 NaN 3 % 2 NaN 5 % 3 NaN 2
% NaN NaN NaN % NaN NaN NaN % NaN NaN NaN
% 7 NaN NaN % 0 NaN NaN % 1 NaN NaN
In my attempt, I'm stacking the three matrices along the third dimension of a new matrix ABC with size 3x3x3 and then I'm using a for loop to make sure all the three matrices do not have NaN in that specific position.
ABC(:,:,1)=A; ABC(:,:,2)=B; ABC(:,:,3)=C;
for i=1:size(A,1)
for j=1:size(A,2)
count = squeeze(ABC(i,j,:));
if sum(~isnan(count))<size(ABC,3)
A(i,j)=NaN;
B(i,j)=NaN;
C(i,j)=NaN;
end
end
end
This code works fine. However, as I have more than 30 matrices of bigger size I was wondering whether there is a more elegant solution to this problem.
Thank you for your help.
Let's do fancy indexing!
First, the solution:
indnan=sum(isnan(cat(3,A,B,C)),3)>0;
A(indnan)=NaN;
B(indnan)=NaN;
C(indnan)=NaN;
This code concatenates the matrices along the third dimension into a 3D array and counts how many NaNs there are in each (i,j,:) vector. If the count is greater than 0 (i.e., any of them is NaN), that position is marked true in the logical index indnan. Finally, we set all those positions to NaN in every matrix, leaving only the positions that are non-NaN in all of them.
Ander’s answer is good, but for very large matrices it might be expensive to create that 3D matrix.
First of all, I would suggest putting the matrices into a cell array. That makes it a lot easier to programmatically manage many arrays. That is, instead of A, B, etc, work with C{1}, C{2}, etc:
C = {A,B,C};
It takes essentially zero cost to make this change.
Now, to find all elements where one of the matrices is NaN:
M = isnan(C{1});
for ii=2:numel(C)
M = M | isnan(C{ii});
end
A similar loop then sets the corresponding elements to NaN:
for ii=1:numel(C)
C{ii}(M) = NaN;
end
This latter loop can be replaced by a call to cellfun, but I like explicit loops.
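For comparison, a possible cellfun version of that second loop, on small made-up matrices. An anonymous function cannot contain an indexed assignment such as C{ii}(M) = NaN, so this sketch calls subsasgn, the functional form of that assignment; it is one option among several, and the explicit loop is arguably easier to read:

```matlab
C = {[1 NaN; 3 4], [5 6; NaN 8]};   % made-up example matrices

% Mask of positions where any matrix is NaN (same loop as above)
M = isnan(C{1});
for ii = 2:numel(C)
    M = M | isnan(C{ii});
end

% subsasgn(m, substruct('()', {M}), NaN) does the same as m(M) = NaN and returns the result
C = cellfun(@(m) subsasgn(m, substruct('()', {M}), NaN), C, 'UniformOutput', false);
disp(C{1}); disp(C{2});
```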
EDIT: Here are some timings. This is yet another example of loops being faster in modern MATLAB than the equivalent vectorized code. Back in the old days, the loop code would have been 100x slower.
This is the test code:
function so(sz) % input argument is the size of the arrays
C3 = cell(1,3);
for ii=1:numel(C3)
C3{ii} = create(sz,0.05);
end
C20 = cell(1,20);
for ii=1:numel(C20)
C20{ii} = create(sz,0.01);
end
if(~isequal(method1(C3),method2(C3))), error('not equal!'), end
if(~isequal(method1(C20),method2(C20))), error('not equal!'), end
fprintf('method 1, 3 arrays: %f s\n',timeit(@()method1(C3)))
fprintf('method 2, 3 arrays: %f s\n',timeit(@()method2(C3)))
fprintf('method 1, 20 arrays: %f s\n',timeit(@()method1(C20)))
fprintf('method 2, 20 arrays: %f s\n',timeit(@()method2(C20)))
% method 1 is the vectorized code from Ander:
function mask = method1(C)
mask = sum(isnan(cat(3,C{:})),3)>0;
% method 2 is the loop code from this answer:
function mask = method2(C)
mask = isnan(C{1});
for ii=2:numel(C)
mask = mask | isnan(C{ii});
end
function mat = create(sz,p)
mat = rand(sz);
mat(mat<p) = nan;
These are the results on my machine (with R2017a):
>> so(500)
method 1, 3 arrays: 0.003215 s
method 2, 3 arrays: 0.000386 s
method 1, 20 arrays: 0.016503 s
method 2, 20 arrays: 0.001257 s
The loop is 10x faster! For small arrays I see much less of a difference, but the loop code is still several times faster, even for 5x5 arrays.
I have a vector of values such as the following:
1
2
3
NaN
4
7
NaN
NaN
54
5
2
7
2
NaN
NaN
NaN
5
54
3
2
NaN
NaN
NaN
NaN
4
NaN
How can I use interp1 in such a way that only runs of up to a chosen number of consecutive NaN values are interpolated? For example, I would want to interpolate only those NaN values where there are at most three consecutive NaN values. So NaN, NaN NaN and NaN NaN NaN would be interpolated, but not NaN NaN NaN NaN.
Thank you for any help =)
P.S. If I can't do this with interp1, any ideas how to do this in another way? =)
To give an example, the vector I gave would become:
1
2
3
interpolated
4
7
interpolated
interpolated
54
5
2
7
2
interpolated
interpolated
interpolated
5
54
3
2
NaN
NaN
NaN
NaN
4
interpolated
First of all, find the positions and lengths of all sequences of NaN values:
nan_idx = isnan(x(:))';
nan_start = strfind([0, nan_idx], [0 1]);
nan_len = strfind([nan_idx, 0], [1 0]) - nan_start + 1;
Next, find the indices of the NaN elements not to interpolate:
thr = 3;
nan_start = nan_start(nan_len > thr);
nan_end = nan_start + nan_len(nan_len > thr) - 1;
idx = cell2mat(arrayfun(@colon, nan_start, nan_end, 'UniformOutput', false));
Now, interpolate everything and replace the elements that shouldn't have been interpolated back with NaN values:
x_new = interp1(find(~nan_idx), x(~nan_idx), 1:numel(x));
x_new(idx) = NaN;
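strfind is documented for text, and the numeric use above may not work in every environment (Octave's strfind, for instance, expects strings). A sketch of the same steps with diff used to locate the runs instead, applied to the vector from the question with thr = 3:

```matlab
x = [1 2 3 NaN 4 7 NaN NaN 54 5 2 7 2 NaN NaN NaN 5 54 3 2 NaN NaN NaN NaN 4 NaN]';
thr = 3;                                  % longest run of NaNs that may be interpolated

nan_idx = isnan(x(:))';
d = diff([0, nan_idx, 0]);
nan_start = find(d == 1);                 % first index of each NaN run
nan_len = find(d == -1) - nan_start;      % length of each NaN run

% Indices of NaNs in runs longer than thr (these must stay NaN)
idx = [];
for r = find(nan_len > thr)
    idx = [idx, nan_start(r):nan_start(r) + nan_len(r) - 1]; %#ok<AGROW>
end

% Interpolate everything, then restore the long runs
x_new = interp1(find(~nan_idx), x(~nan_idx), 1:numel(x));
x_new(idx) = NaN;
disp(x_new)
```

The final NaN stays NaN as well: interp1 does not extrapolate unless asked to, so a NaN after the last known value cannot be interpolated.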
I know loops are often considered a bad habit in MATLAB, but I think this particular case calls for one:
function out = f(v)
out = v;                          % Copy the input; long runs and runs at either end stay NaN
known = find(~isnan(v));          % Positions of the non-NaN samples
k = 0;                            % Number of consecutive NaN values encountered so far
for i = 1:numel(v)
    if ~isnan(v(i))               % Note: v(i) ~= NaN is always true, so use isnan
        if k > 0 && k <= 3        % A short run of NaNs just ended: interpolate it
            out(i-k:i-1) = interp1(known, v(known), i-k:i-1);
        end
        k = 0;
    else
        k = k + 1;
    end
end
end
I have a matrix A:
NaN NaN NaN NaN NaN NaN NaN 10 1 8 7 2 5 6 2 3 49 NaN NaN NaN NaN NaN NaN
I was wondering if there is a way to detect where the NaNs first turn into numbers and turn the first 2 numbers into NaNs, i.e. turn both the 10 and the 1 into NaN.
Then find where the numbers turn back into NaNs and turn the last two numbers, 3 and 49, into NaNs.
Originally I was thinking of using the following, but I was wondering if this was the best way:
i = 2;
while i <= numel(A)
    if isnan(A(i)) < isnan(A(i-1))      % Transitioning from NaN to numbers
        A(i:i+1) = NaN;                 % First two numbers
        i = i + 3;                      % Skip ahead so this change isn't detected as a new transition
    elseif isnan(A(i)) > isnan(A(i-1))  % Transitioning from numbers to NaNs
        A(i-2:i) = NaN;                 % Last two numbers (A(i) is already NaN)
        i = i + 1;
    else
        i = i + 1;
    end
end
Is there any other way I could optimize it?
First, I assume that your vector A is organized with NaNs at the start and end and a contiguous set of numerics in the middle, as in
A = [NaN ... NaN, contiguous numeric data, NaN ... NaN]
First, I suggest locating the numeric data and working from there, as in,
flagNumeric = ~isnan(A);
Now flagNumeric will be true for the entries that are numeric and false for NaNs.
So the first numeric will be at
firstIndex = find(flagNumeric,1,'first');
and the last numeric at
lastIndex = find(flagNumeric,1,'last');
You can then use firstIndex and lastIndex to change the first two and the last two numeric values to NaNs:
A(firstIndex:firstIndex+1) = NaN;
A(lastIndex-1:lastIndex) = NaN;
% Set the first two non-NaN numbers to NaN
first = find(isfinite(A), 1, 'first');
A(first:first+1) = NaN;
% Set the last two non-NaN numbers to NaN
last = find(isfinite(A), 1, 'last');
A(last-1:last) = NaN;
Of course, the above code will break in special cases (e.g. when last == 1), but these should be straightforward to filter out.
Here's a slightly simpler version based on the same assumption as Azim's answer:
nums = find(~isnan(A));
A( nums([1 2 end-1 end]) ) = NaN;
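All three answers assume a single block of numbers, while the loop in the question would also handle several blocks separated by NaNs. If that matters, a possible vectorized sketch finds every numeric block with diff and sets its first two and last two elements to NaN (runStart, runEnd and trim are illustrative names, and the example vector is made up):

```matlab
A = [NaN NaN 10 1 8 7 2 NaN NaN 5 6 2 3 49 NaN];   % made-up example with two numeric blocks

isNum = ~isnan(A(:))';
d = diff([0, isNum, 0]);
runStart = find(d == 1);        % first index of each numeric block
runEnd = find(d == -1) - 1;     % last index of each numeric block

% First two and last two indices of every block; for a block of length 1 some of
% these fall on neighbouring NaNs or outside A, so drop the out-of-range ones
trim = [runStart, runStart + 1, runEnd - 1, runEnd];
trim = trim(trim >= 1 & trim <= numel(A));
A(trim) = NaN;
disp(A)
```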