Finding the NaN boundary of a matrix in MATLAB


I have a very large (2019x1678 double) DEM (digital elevation model) loaded as a matrix in MATLAB. Its edges contain NaN values. To account for edge effects in my code, I have to put a 1-cell buffer (with the same value as the adjacent cell) around my DEM. Where NaNs are present, I need to find the edge of the NaN region in order to build that buffer. I have tried doing this in two ways:
In the first, I get the row and column coordinates of all non-NaN DEM values, find the first and last row numbers in each column to get the north and south boundaries, and then find the first and last column numbers in each row to get the east and west boundaries. I pass these to sub2ind() to create my buffer.
[r, c] = find(~isnan(Zb_ext)); %Zb is my DEM matrix
idx = accumarray(c, r, [], @(x) {[min(x) max(x)]});
idx = vertcat(idx{:});
NorthBoundary_row = transpose(idx(:,1)); % the values to fill my buffer with
NorthBoundary_row_ext = transpose(idx(:,1) - 1); % My buffer cells
columnmax = length(NorthBoundary_row);
column1 = min(c);
Boundary_Colu = linspace(column1,column1+columnmax-1,columnmax);
SouthBoundary_row = (transpose(idx(:,2))); % Repeat for south Boundary
SouthBoundary_row_ext = transpose(idx(:,2) + 1);
SouthB_Ind = sub2ind(size(Zb_ext),SouthBoundary_row,Boundary_Colu);
SouthB_Ind_ext = sub2ind(size(Zb_ext),SouthBoundary_row_ext, Boundary_Colu);
NorthB_Ind = sub2ind(size(Zb_ext),NorthBoundary_row, Boundary_Colu);
NorthB_Ind_ext = sub2ind(size(Zb_ext),NorthBoundary_row_ext, Boundary_Colu);
Zb_ext(NorthB_Ind_ext) = Zb_ext(NorthB_Ind);
Zb_ext(SouthB_Ind_ext) = Zb_ext(SouthB_Ind);
% Repeat above for East and West Boundary by reversing the roles of row and
% column
[r, c] = find(~isnan(Zb_ext));
idx = accumarray(r, c, [], @(x) {[min(x) max(x)]});
idx = vertcat(idx{:});
EastBoundary_colu = transpose(idx(:,1)); % Repeat for east Boundary
EastBoundary_colu_ext = transpose(idx(:,1) - 1);
row1 = min(r);
rowmax = length(EastBoundary_colu);
Boundary_row = linspace(row1,row1+rowmax-1,rowmax);
WestBoundary_colu = transpose(idx(:,2)); % Repeat for west Boundary
WestBoundary_colu_ext = transpose(idx(:,2) + 1);
EastB_Ind = sub2ind(size(Zb_ext),Boundary_row, EastBoundary_colu);
EastB_Ind_ext = sub2ind(size(Zb_ext),Boundary_row, EastBoundary_colu_ext);
WestB_Ind = sub2ind(size(Zb_ext),Boundary_row, WestBoundary_colu);
WestB_Ind_ext = sub2ind(size(Zb_ext),Boundary_row, WestBoundary_colu_ext);
Zb_ext(NorthB_Ind_ext) = Zb_ext(NorthB_Ind);
Zb_ext(SouthB_Ind_ext) = Zb_ext(SouthB_Ind);
Zb_ext(EastB_Ind_ext) = Zb_ext(EastB_Ind);
Zb_ext(WestB_Ind_ext) = Zb_ext(WestB_Ind);
This works well on my small development matrix, but fails on my full-sized DEM. I do not understand the behavior of my code, but looking at the data, there are gaps in my boundary. I wonder if I need to better control the order of the max/min row/column values, though in my test on a smaller dataset everything seemed to be in order.
I took the second method from a similar question; it is basically a dilation. However, when I switch to my full dataset, it takes hours to calculate ZbDilated. Although my first method does not work, it at least runs within seconds.
[m, n] = size(Zb);
Zb_ext = nan(size(Zb)+2);
Zb_ext(2:end-1, 2:end-1) = Zb; % pad Zb with NaNs on each side
ZbNANs = ~isnan(Zb_ext);
ZbDilated = zeros(m + 2, n + 2); % this will hold the dilated shape.
for i = 1:(m+2)
    if i == 1 % handling boundary situations during dilation
        i_f = i;
        i_l = i+1;
    elseif i == m+2
        i_f = i-1;
        i_l = i;
    else
        i_f = i-1;
        i_l = i+1;
    end
    for j = 1:(n+2)
        mask = zeros(size(ZbNANs));
        if j == 1 % handling boundary situations again
            j_f = j;
            j_l = j+1;
        elseif j == n+2
            j_f = j-1;
            j_l = j;
        else
            j_f = j-1;
            j_l = j+1;
        end
        mask(i_f:i_l, j_f:j_l) = 1; % this places a 3x3 square of 1's around (i, j)
        ZbDilated(i, j) = max(ZbNANs(logical(mask)));
    end
end
Zb_ext(logical(ZbDilated)) = fillmissing(Zb_ext(logical(ZbDilated)),'nearest');
Does anyone have any ideas on making either of these usable?
Here is what I start out with:
NaN NaN 2 5 39 55 44 8 NaN NaN
NaN NaN NaN 7 33 48 31 66 17 NaN
NaN NaN NaN 28 NaN 89 NaN NaN NaN NaN
Here is the matrix padded on all sides with NaNs:
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN 2 5 39 55 44 8 NaN NaN NaN
NaN NaN NaN NaN 7 33 48 31 66 17 NaN NaN
NaN NaN NaN NaN 28 NaN 89 NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Here is what I want to get after using fillmissing (though I have noticed some irregularities in how the buffer values are filled):
NaN NaN 2 2 5 39 55 44 8 17 NaN NaN
NaN NaN 2 2 5 39 55 44 8 17 17 NaN
NaN NaN 2 2 7 33 48 31 66 17 17 NaN
NaN NaN NaN 2 28 33 89 31 66 17 17 NaN
NaN NaN NaN 5 28 55 89 8 NaN NaN NaN NaN
To clear up any confusion about what I am doing, here is the logical mask I get from the dilation and pass to fillmissing:
0 0 1 1 1 1 1 1 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 0
0 0 1 1 1 1 1 1 1 1 1 0
0 0 0 1 1 1 1 1 1 1 1 0
0 0 0 1 1 1 1 1 0 0 0 0

A faster way to apply a 3x3 dilation would be as follows. It does involve some large intermediate matrices, which make it less efficient than, say, applying imdilate (Image Processing Toolbox).
[m, n] = size(Zb); %
Zb_ext = nan(size(Zb)+2);
Zb_ext(2:end-1, 2:end-1) = Zb; % pad Zb with NaNs on each side
ZbNANs = ~isnan(Zb_ext);
ZbDilated = ZbNANs; % this will hold the dilated shape.
% up and down neighbors
ZbDilated(2:end, :) = max(ZbDilated(2:end, :), ZbNANs(1:end-1, :));
ZbDilated(1:end-1, :) = max(ZbDilated(1:end-1, :), ZbNANs(2:end, :));
% left and right neighbors
ZbDilated(:, 2:end) = max(ZbDilated(:, 2:end), ZbNANs(:, 1:end-1));
ZbDilated(:, 1:end-1) = max(ZbDilated(:, 1:end-1), ZbNANs(:, 2:end));
% and 4 diagonal neighbors
ZbDilated(2:end, 2:end) = max(ZbDilated(2:end, 2:end), ZbNANs(1:end-1, 1:end-1));
ZbDilated(1:end-1, 2:end) = max(ZbDilated(1:end-1, 2:end), ZbNANs(2:end, 1:end-1));
ZbDilated(2:end, 1:end-1) = max(ZbDilated(2:end, 1:end-1), ZbNANs(1:end-1, 2:end));
ZbDilated(1:end-1, 1:end-1) = max(ZbDilated(1:end-1, 1:end-1), ZbNANs(2:end, 2:end));
This is a tedious way to write it; I'm sure a shorter loop could be written, but I think this version makes the intention clearer.
[Edit: Because we're dealing with a logical array here, instead of max(A,B) we could also do A | B. I'm not sure if there would be any difference in time.]
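The eight neighbor updates above can also be folded into a loop over the row and column offsets, using | as in the edit note. This is a sketch under the same setup; the small Zb is the example matrix from the question, and the names dr, dc, rDst, cDst are illustrative.

```matlab
% Example DEM from the question (NaN marks missing cells)
Zb = [NaN NaN 2   5   39  55 44  8   NaN NaN
      NaN NaN NaN 7   33  48 31  66  17  NaN
      NaN NaN NaN 28  NaN 89 NaN NaN NaN NaN];
Zb_ext = nan(size(Zb) + 2);
Zb_ext(2:end-1, 2:end-1) = Zb;        % pad Zb with NaNs on each side
ZbNANs = ~isnan(Zb_ext);              % true where data exists
ZbDilated = ZbNANs;                   % will hold the 3x3 dilation
[M, N] = size(ZbNANs);
for dr = -1:1
    for dc = -1:1
        % destination rows/columns that have a neighbor at offset (-dr, -dc)
        rDst = max(1, 1+dr):min(M, M+dr);
        cDst = max(1, 1+dc):min(N, N+dc);
        ZbDilated(rDst, cDst) = ZbDilated(rDst, cDst) | ZbNANs(rDst-dr, cDst-dc);
    end
end
```

For dr = 1 this reproduces ZbDilated(2:end, :) | ZbNANs(1:end-1, :), and likewise for the other offsets; the (0, 0) offset is a harmless no-op.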
What @beaker suggested in a comment was not to use
mask = zeros(size(ZbNANs));
mask(i_f:i_l, j_f:j_l) = 1; % this places a 3x3 square of 1's around (i, j)
ZbDilated(i, j) = max(ZbNANs(logical(mask)));
but rather do
ZbDilated(i, j) = max(ZbNANs(i_f:i_l, j_f:j_l), [], 'all');
[Edit: Because we're dealing with a logical array here, instead of max(A,[],'all') we could also do any(A,'all'), which should be faster. See @beaker's other comment.]
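Combining both suggestions, the original double loop can also drop the if/elseif boundary handling by clamping the 3x3 window with max/min. This is a sketch assuming the same padded Zb_ext as in the question (the small Zb is the question's example); rWin and cWin are illustrative names, and any(block(:)) is used because it is equivalent to any(block, 'all') while also working before R2018b.

```matlab
Zb = [NaN NaN 2   5   39  55 44  8   NaN NaN
      NaN NaN NaN 7   33  48 31  66  17  NaN
      NaN NaN NaN 28  NaN 89 NaN NaN NaN NaN];
[m, n] = size(Zb);
Zb_ext = nan(m + 2, n + 2);
Zb_ext(2:end-1, 2:end-1) = Zb;              % pad Zb with NaNs on each side
ZbNANs = ~isnan(Zb_ext);
ZbDilated = false(m + 2, n + 2);
for i = 1:(m+2)
    rWin = max(i-1, 1):min(i+1, m+2);       % rows of the window, clamped at the edges
    for j = 1:(n+2)
        cWin = max(j-1, 1):min(j+1, n+2);   % columns of the window, clamped
        block = ZbNANs(rWin, cWin);
        ZbDilated(i, j) = any(block(:));    % no full-size mask allocated per cell
    end
end
```

The main saving compared with the original is that no full-size mask is allocated for every cell.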

Related

How to make Matlab fillmissing function impute only a certain number of missing values between known values?

Consider the following code, purely as an example:
A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 nan];
dates = datetime({'2010','2011','2012','2013','2014'},'InputFormat','yyyy')';
TT = array2timetable(A,'RowTimes',dates);
The resulting timetable is:
I would like to use the MATLAB function fillmissing to impute missing data according to the following rules:
missing data at the beginning of the time series should not be imputed
missing data at the end of the time series should not be imputed
missing data between known values should be imputed only if the number of consecutive missing values between those known values is strictly less than 2
The resulting timetable should be:
Notice that only the 4th row in the column A2 has been imputed here. Can I do that with fillmissing? Otherwise how can I do that?

You can find the first and last non-NaN values using find. Based on these indices, you can conditionally fill the missing data if there are fewer than 2 missing values in between. For some vector v:
idxNaN = isnan( v ); % Get indices of values which are NaN
idxDataStart = find( ~idxNaN, 1, 'first' ); % First non-NaN index
idxDataEnd = find( ~idxNaN, 1, 'last' ); % Last non-NaN index
idxData = idxDataStart:idxDataEnd; % Indices from the first to the last valid value
numValsMissing = nnz( idxNaN(idxData) ); % Number of NaNs in that range
if numValsMissing < 2 % Check for max number of NaNs
    v(idxData) = fillmissing(v(idxData), 'linear'); % Fill missing on this data
end
For your array A you can loop over the columns and apply the above, where each column is a vector v.
A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 nan];
for ii = 1:size(A,2)
    v = A(:,ii);
    idxNaN = isnan( v );
    idxDataStart = find( ~idxNaN, 1, 'first' );
    idxDataEnd = find( ~idxNaN, 1, 'last' );
    idxData = idxDataStart:idxDataEnd;
    numValsMissing = nnz( idxNaN(idxData) );
    if numValsMissing < 2
        v(idxData) = fillmissing(v(idxData),'linear');
    end
    A(:,ii) = v;
end
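Note that numValsMissing counts every NaN between the first and last known values, not the length of each run, so a column such as [1 NaN 2 NaN 3]' would be left untouched even though each gap is a single value. If the rule should apply run by run, a sketch like the following checks each gap between consecutive known values and fills it with interp1 (maxGap, known and gapLen are illustrative names, not part of the answer above):

```matlab
A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 nan];
maxGap = 2;                                % fill only runs strictly shorter than this
for ii = 1:size(A,2)
    v = A(:,ii);
    known = find(~isnan(v));               % indices of known values
    if numel(known) < 2, continue; end     % no run lies between two known values
    gapLen = diff(known) - 1;              % NaN-run length between consecutive known values
    for g = find(gapLen > 0 & gapLen < maxGap).'
        a = known(g); b = known(g+1);      % known values bracketing this run
        v(a+1:b-1) = interp1([a b], [v(a) v(b)], a+1:b-1);   % linear fill
    end
    A(:,ii) = v;
end
```

Leading and trailing NaNs are never touched, because they are not bracketed by two known values.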

histogram of signals gaps width (Matlab)

I am looking for an efficient, vectorized algorithm to compute a histogram of gap (NaN) widths, defined as follows:
signals are represented by an (Nsamples x Nsig) array
gaps in a signal are encoded by NaNs
gap width is the number of consecutive NaNs in a signal
the gap-width histogram is the frequency of gaps of each specific width in a signal
The following conditions must hold:
[Nsamples,Nsig ]= size(signals)
isequal(size(signals),size(gapwidthhist)) % true
isequal(sum(gapwidthhist.*(1:Nsamples)',1),sum(isnan(signals),1)) % true
A compressed form of gapwidthhist (represented by two cell arrays, "gapwidthhist_compressed_widths" and "gapwidthhist_compressed_freqs") is required too.
Example:
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN]' % signal No. 2
gapwidthhist = [1 1 1 1 0 0 0 0 0 0 0 0 0 0; % gap histogram for signal No. 1
3 1 0 0 1 0 0 0 0 0 0 0 0 0]' % gap histogram for signal No. 2
where integer histogram bins (gap widths) are 1:Nsamples (Nsamples=14).
The corresponding compressed gap histogram looks like this:
gapwidthhist_compressed_widths = cell(1,Nsig)
gapwidthhist_compressed_widths =
1×2 cell array
{[1 2 3 4]} {[1 2 5]}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
gapwidthhist_compressed_freqs = cell(1, Nsig)
gapwidthhist_compressed_freqs =
1×2 cell array
{[1 1 1 1]} {[3 1 1]}
Typical problem dimension:
Nsamples = 1e5 - 1e6
Nsig = 1e2 - 1e3.
Thanks in advance for any help.
Added remark: my best solution so far is the following code:
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
           NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
           1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]' % signal No. 3
[numData, numSignals] = size(signals)
gapwidthhist = zeros(numData, numSignals);
for column = 1 : numSignals
    thisSignal = signals(:, column); % Extract this column.
    % Find lengths of all NaN runs
    props = regionprops(isnan(thisSignal), 'Area');
    allLengths = [props.Area]
    edges = [1:max(allLengths), inf]
    hc = histcounts(allLengths, edges)
    % Load up gapwidthhist
    for k2 = 1 : length(hc)
        gapwidthhist(k2, column) = hc(k2);
    end
end
% What it is:
gapwidthhist'
But I am mainly looking for pure MATLAB code that does not use toolbox functions (such as regionprops from the Image Processing Toolbox)!

This is a much simpler MATLAB implementation, but it is still not optimal (and not vectorized):
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
           NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
           1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
signals
[numData, numSignals] = size(signals);
gapwidthhist = zeros(numData, numSignals);
gaps = zeros(numData+1,numSignals);
auxnan = isnan(signals);
for i = 1:numSignals
    c = 0;
    for j = 1:numData
        if auxnan(j,i)
            c = c + 1;
        else
            gaps(j,i) = c;
            c = 0;
        end
    end
    gaps(numData+1,i) = c;
    gapwidthhist(:,i) = histcounts(gaps(:,i),1:numData+1);
end
gapwidthhist
Thanks to @breaker for the help.
Any idea how to optimize (vectorize) this code to make it more efficient?
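One loop-free sketch, using only core MATLAB functions: pad every column with false, take diff to locate run starts (+1) and ends (-1), and let accumarray count the run widths per column. Because find scans column by column, the k-th start and the k-th end always belong to the same run. The compressed cell arrays are then read off the nonzero bins. For the largest sizes mentioned (1e6 x 1e3) the intermediate arrays are as large as signals itself (diff returns double), so processing blocks of columns may be necessary.

```matlab
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
           NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
           1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
[numData, numSignals] = size(signals);
isGap = isnan(signals);
pad = false(1, numSignals);
d = diff([pad; isGap; pad]);           % +1 at the first NaN of a run, -1 just after its last
[startRow, sigIdx] = find(d == 1);     % run starts, with the signal (column) they belong to
[endRow, ~] = find(d == -1);           % matching run ends, in the same order
widths = endRow - startRow;            % number of NaNs in each run
gapwidthhist = accumarray([widths, sigIdx], ones(numel(widths), 1), [numData, numSignals]);

% Compressed form: keep only the nonzero bins of each column
gapwidthhist_compressed_widths = cell(1, numSignals);
gapwidthhist_compressed_freqs  = cell(1, numSignals);
for k = 1:numSignals
    w = find(gapwidthhist(:, k)).';
    gapwidthhist_compressed_widths{k} = w;
    gapwidthhist_compressed_freqs{k}  = gapwidthhist(w, k).';
end
```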

Here is a slightly more vectorized version that may be a bit quicker. I use Octave, so I don't know how much MATLAB's JIT compiler will optimize the inner loop in the other approach.
% Set up the data
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
           NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
           1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
signals
[numData, numSignals] = size(signals);
gapwidthhist = zeros(numData, numSignals);
gaps = zeros(numData+1,numSignals);
auxnan = ~isnan(signals); % We want non-NaN values to be 1
for i = 1:numSignals
    difflist = diff(find([1; auxnan(:,i); 1])) - 1; % get the gap lengths
    gapList = difflist(find(difflist)); % keep only the non-zero gaps
    for c = gapList.' % need row vector to loop over elements
        gapwidthhist(c,i) = gapwidthhist(c,i) + 1; % each gap length increments the histogram
    end
end
gapwidthhist
Here's the program flow:
First, negate the auxnan array so that NaN is 0 and non-NaN is 1.
In the outer loop, pad each column with 1's on top and bottom to capture strings of NaN at the beginning and end of the signal.
Use find to get the indices of the 1 (non-NaN) elements.
Take the diff of the indices.
A diff of 1 means no gap and a diff greater than 1 gives the length of the gap plus 1, so subtract 1 from the diff result.
Use the results (indices) of find to get the values of the nonzero elements. These are the gap widths.
Now loop through the values and accumulate the results in the histogram. You might try replacing this inner loop with accumarray to see if that speeds things up any.
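Following the suggestion to try accumarray, the inner loop can be replaced by one call per column that counts how often each gap length occurs. This is a sketch reusing the same variables as the code above; the isempty check simply skips signals that contain no gaps.

```matlab
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
           NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
           1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
[numData, numSignals] = size(signals);
gapwidthhist = zeros(numData, numSignals);
auxnan = ~isnan(signals);                              % non-NaN values are 1
for i = 1:numSignals
    difflist = diff(find([1; auxnan(:,i); 1])) - 1;    % gap lengths (0 means no gap)
    gapList = difflist(difflist > 0);                  % keep only the real gaps
    if ~isempty(gapList)
        % histogram of gap lengths as a numData-by-1 column
        gapwidthhist(:,i) = accumarray(gapList, 1, [numData 1]);
    end
end
```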

Possibly the final solution:
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
           NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
           1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
signals
[numData, numSignals] = size(signals);
gapwidthhist = zeros(numData, numSignals);
auxnan = isnan(signals);
for i = 1:numSignals
    c = 0;
    for j = 1:numData
        if auxnan(j,i)
            c = c + 1;
        else
            if c > 0
                gapwidthhist(c,i) = gapwidthhist(c,i) + 1;
                c = 0;
            end
        end
    end
    if c > 0
        gapwidthhist(c,i) = gapwidthhist(c,i) + 1;
    end
end
gapwidthhist
Open question: how can the code be modified so that the outer for-loop can be a parfor-loop?
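One possible direction for the open question, as I understand the parfor classification rules: an assignment such as gapwidthhist(c,i), where c is computed inside the loop, keeps gapwidthhist from being classified as a sliced output. A common workaround, sketched here, is to build each column in a temporary vector (h, an illustrative name) and assign the whole column once per iteration. Without Parallel Computing Toolbox (or in Octave) parfor simply runs like an ordinary for loop.

```matlab
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
           NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
           1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
[numData, numSignals] = size(signals);
gapwidthhist = zeros(numData, numSignals);
auxnan = isnan(signals);
parfor i = 1:numSignals
    col = auxnan(:, i);              % sliced input: this signal's NaN flags
    h = zeros(numData, 1);           % histogram for this signal only (temporary)
    c = 0;                           % length of the current NaN run
    for j = 1:numData
        if col(j)
            c = c + 1;
        elseif c > 0
            h(c) = h(c) + 1;         % a run has just ended
            c = 0;
        end
    end
    if c > 0
        h(c) = h(c) + 1;             % run that reaches the end of the signal
    end
    gapwidthhist(:, i) = h;          % sliced output: one whole column per iteration
end
```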

Replace non-NaN values with their row indices within matrix

I have the 4x2 matrix A:
A = [2 NaN 5 8; 14 NaN 23 NaN]';
I want to replace the non-NaN values with their associated indices within each column in A. The output looks like this:
out = [1 NaN 3 4; 1 NaN 3 NaN]';
I know how to do it for each column manually, but I would like an automatic solution, as I have much larger matrices to handle. Does anyone have an idea?

out = bsxfun(@times, A-A+1, (1:size(A,1)).');
How it works:
A-A+1 replaces actual numbers in A by 1, and keeps NaN as NaN
(1:size(A,1)).' is a column vector of row indices
bsxfun(@times, ...) multiplies both of the above with singleton expansion.
As pointed out by @thewaywewalk, from MATLAB R2016b onwards bsxfun(@times, ...) can be replaced by .*, as singleton (implicit) expansion is enabled by default:
out = (A-A+1) .* (1:size(A,1)).';
An alternative suggested by @Dev-Il is
out = bsxfun(@plus, A*0, (1:size(A,1)).');
This works because multiplying by 0 replaces actual numbers by 0, and keeps NaN as is.
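One caveat with both the A-A+1 and the A*0 tricks, assuming A may also contain Inf: Inf-Inf and Inf*0 are NaN, so an Inf entry would come out as NaN instead of its row index. Building the result from an isnan mask avoids that. In this sketch the Inf in A is added purely for illustration, and viaSubtraction/viaMask are illustrative names.

```matlab
A = [2 NaN Inf 8; 14 NaN 23 NaN].';                  % 4x2, with an Inf added for illustration
viaSubtraction = (A - A + 1) .* (1:size(A,1)).';      % Inf - Inf + 1 is NaN
viaMask = repmat((1:size(A,1)).', 1, size(A,2));      % row index of every element
viaMask(isnan(A)) = NaN;                              % only genuine NaNs stay NaN
```

Here viaSubtraction(3,1) is NaN, while viaMask(3,1) is 3 as intended.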

Applying ind2sub to a mask created with isnan will do.
mask = find(~isnan(A));
[rows,~] = ind2sub(size(A),mask)
A(mask) = rows;
Note that the second output of ind2sub must also be requested (and discarded with ~), as in [rows,~], so that you get row subscripts for a 2-D matrix; with a single output, ind2sub returns the linear indices unchanged.
A =
1 1
NaN NaN
3 3
4 NaN
A.' =
1 NaN 3 4
1 NaN 3 NaN
Also be careful with the two different transpose operators: ' (complex conjugate transpose) and .' (non-conjugate transpose).
Alternative
[n,m] = size(A);
B = ndgrid(1:n,1:m);
B(isnan(A)) = NaN;
or even (with a little inspiration from Luis Mendo)
[n,m] = size(A);
B = A-A + ndgrid(1:n,1:m)
or in one line
B = A-A + ndgrid(1:size(A,1),1:size(A,2))

Working directly on the 2x4 matrix (before the transpose), this can be done using repmat and isnan as follows:
A = [ 2 NaN 5 8;
14 NaN 23 NaN];
out=repmat([1:size(A,2)],size(A,1),1); % out contains indexes of all the values
out(isnan(A))= NaN % Replacing the indexes where NaN exists with NaN
Output:
1 NaN 3 4
1 NaN 3 NaN
You can take the transpose if you want.

I'm adding another answer for a couple of reasons:
Because overkill (*ahem* kron *ahem*) is fun.
To demonstrate that A*0 does the same as A-A.
A = [2 NaN 5 8; 14 NaN 23 NaN].';
out = A*0 + kron((1:size(A,1)).', ones(1,size(A,2)))
out =
1 1
NaN NaN
3 3
4 NaN

Speed up replacing NaNs with the last non-NaN value

I'd like to replace all the NaNs in a vector with the most recent preceding non-NaN value:
input = [1 2 3 NaN NaN 2];
output = [1 2 3 3 3 2];
I'd like to speed up the loop I already have:
input = [1 2 3 NaN NaN 2];
if isnan(input(1))
    input(1) = 0;
end
for i = 2:numel(input)
    if isnan(input(i))
        input(i) = input(i-1);
    end
end
Thanks in advance.

Since you want the previous non-NaN value, I'll assume that the first value must be a number.
while any(isnan(input))
    input(isnan(input)) = input(find(isnan(input))-1);
end
I profiled dylan's solution, Oleg's solution, and mine on a vector of 47.7 million elements. The times were 12.3 s for dylan's, 3.7 s for Oleg's, and 1.9 s for mine.

Here is a commented solution; it works only for a vector, but it could be extended to work on a matrix:
A = [NaN NaN 1 2 3 NaN NaN 2 NaN NaN NaN 3 NaN 5 NaN NaN];
% start/end positions of NaN sequences
sten = diff([0 isnan(A) 0]);
B = [NaN A];
% replace with previous non NaN
B(sten == -1) = B(sten == 1);
% Trim first value (previously padded)
B = B(2:end);
Comparison
A: NaN NaN 1 2 3 NaN NaN 2 NaN NaN NaN 3 NaN 5 NaN NaN
B: NaN NaN 1 2 3 NaN 3 2 NaN NaN 2 3 3 5 NaN 5
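As the comparison shows, this fills only the last NaN of each run (position 6, for example, stays NaN). A different technique, not the one above, gives a fully vectorized forward fill: cumsum counts the known values seen so far, and that count indexes into the list of known values. Leading NaNs have no previous value and stay NaN. In MATLAB R2016b or later, fillmissing(A, 'previous') also offers a built-in forward fill.

```matlab
A = [NaN NaN 1 2 3 NaN NaN 2 NaN NaN NaN 3 NaN 5 NaN NaN];
known = ~isnan(A);
k = cumsum(known);          % number of known values at or before each position
vals = A(known);            % the known values, in order
B = NaN(size(A));
B(k > 0) = vals(k(k > 0));  % each position takes the most recent known value
```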

Not fully vectorized but quite simple and probably still fairly efficient:
x = [1 2 3 NaN NaN 2];
for f = find(isnan(x))
    x(f) = x(f-1);
end
Of course, this is only slightly different from the solution provided by @Hugh Nolan.

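% Fills each NaN from its left neighbor in one vectorized step, so a run of
% k consecutive NaNs needs k passes (compare the while-loop answer above).
% Assumes A(1) is not NaN.
nan_ind = find(isnan(A));
A(nan_ind) = A(nan_ind-1);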

How to remove variable amount of consecutive NaN values from vector in Matlab?

I have a vector of values such as the following:
x = [1 2 3 NaN 4 7 NaN NaN 54 5 2 7 2 NaN NaN NaN 5 54 3 2 NaN NaN NaN NaN 4 NaN]';
How can I use interp1 in such a way that only runs of up to a variable number of consecutive NaN values are interpolated? For example, I would want to interpolate only those NaN values that occur in runs of at most three consecutive NaNs. So NaN, NaN NaN and NaN NaN NaN would be interpolated, but not NaN NaN NaN NaN.
Thank you for any help =)
P.S. If I can't do this with interp1, any ideas how to do this in another way? =)
To give an example, the vector I gave would become:
1, 2, 3, interpolated, 4, 7, interpolated, interpolated, 54, 5, 2, 7, 2, interpolated, interpolated, interpolated, 5, 54, 3, 2, NaN, NaN, NaN, NaN, 4, interpolated

First of all, find the positions and lengths of all sequences of NaN values:
nan_idx = isnan(x(:))';
nan_start = strfind([0, nan_idx], [0 1]);
nan_len = strfind([nan_idx, 0], [1 0]) - nan_start + 1;
Next, find the indices of the NaN elements not to interpolate:
thr = 3;
nan_start = nan_start(nan_len > thr);
nan_end = nan_start + nan_len(nan_len > thr) - 1;
idx = cell2mat(arrayfun(@colon, nan_start, nan_end, 'UniformOutput', false));
Now, interpolate everything and replace the elements that shouldn't have been interpolated back with NaN values:
x_new = interp1(find(~nan_idx), x(~nan_idx), 1:numel(x));
x_new(idx) = NaN;
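strfind is documented for text inputs, so if relying on it for numeric vectors is a concern, the runs can also be located with diff and find. This sketch keeps the same three steps (find the runs, mark those longer than thr, interpolate and restore NaNs) on the vector from the question; runStart, runEnd and tooLong are illustrative names. As with the version above, interp1 does not extrapolate, so the trailing NaN stays NaN.

```matlab
x = [1 2 3 NaN 4 7 NaN NaN 54 5 2 7 2 NaN NaN NaN 5 54 3 2 NaN NaN NaN NaN 4 NaN]';
thr = 3;                                   % longest NaN run that may be interpolated
isGap = isnan(x(:));
d = diff([0; isGap; 0]);                   % +1 where a run starts, -1 just after it ends
runStart = find(d == 1);
runEnd = find(d == -1) - 1;
tooLong = false(size(isGap));
for k = find(runEnd - runStart + 1 > thr).'
    tooLong(runStart(k):runEnd(k)) = true; % runs that must stay NaN
end
known = find(~isGap);
x_new = interp1(known, x(known), (1:numel(x)).');   % NaN outside the known range
x_new(tooLong) = NaN;
```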

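I know loops are often considered a bad habit in MATLAB, but I think this particular case calls for one:
function out = f(v)
maxRun = 3;                          % longest NaN run that is interpolated
out = v;                             % same shape as v (zeros(numel(v)) would be n-by-n)
k = 0;                               % number of consecutive NaNs encountered so far
for i = 1:numel(v)
    if ~isnan(v(i))                  % v(i) ~= NaN is always true, so use isnan
        if k > 0 && k <= maxRun && i - k > 1
            % linear interpolation between the known values v(i-k-1) and v(i)
            out(i-k:i-1) = interp1([i-k-1, i], [v(i-k-1), v(i)], i-k:i-1);
        end
        k = 0;                       % longer runs and a leading run stay NaN
    else
        k = k + 1;
    end
end
end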