Essentially, I have a matrix of data with many "holes" represented by NaN, and I want to retrieve the indices of all NaNs that occur in runs of fewer than 4 consecutive NaNs within a single column.
e.g. with the matrix:
A =
23 12 NaN 56 60 21 NaN
60 56 94 22 45 NaN NaN
23 55 19 83 NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN
84 99 43 32 89 12 NaN
76 92 73 47 22 12 10
23 55 12 93 61 94 20
NaN NaN NaN NaN NaN NaN NaN
41 16 83 39 82 37 43
14 78 92 40 81 29 60
it would return:
ans =
[4; 5; 6; 10; 16; 17; 18; 22; 25; 28; 29; 30; 34; 40; 41; 42; 46; 58; 70; 82]
So far, I have a vector with the indices of all the NaN values from
nan_list=find(isnan(A(:)))
but I don't know how to extract sequential numbers from that vector without using loops, which would be too expensive. I also tried something similar to the answer posted by b3 here, by switching all NaN's to a value that doesn't appear in the matrix, but that code was not as transferable for other data sets.
Thanks for any suggestions!
Code
N = 4; %// NaNs in clusters of fewer than N are to be detected
nan_pos = isnan(A) %// Find NaN positions as a binary array
conv_res = conv2(double(nan_pos),[0 ones(1,N)]')==N %//' Perform convolution
start_ind = find(conv_res(N+1:end,:)) %// Find positions where clusters of N or N+ NaNs start
nan_pos(unique(bsxfun(@plus,start_ind,[0:N-1])))=0 %// Get positions of all those clustered N or N+ NaNs and set them in NaN position array as zeros
out = find(nan_pos) %// Finally the desired output
Example
As an example, let's try this code on a slightly different input that would hopefully test out various aspects of the problem -
A = [
23 12 NaN 56 60 21 NaN
60 56 94 22 45 NaN NaN
23 55 19 83 NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN
84 99 43 32 89 12 NaN
76 92 73 47 22 12 10
23 55 12 93 61 94 20
NaN NaN NaN NaN NaN NaN NaN
41 NaN NaN 39 82 37 43
14 78 NaN 40 81 NaN 60]
Now, let's assume that we are looking for the indices of NaNs in clusters of fewer than 3. Setting N to 3 in the code, the output is -
out =
10 22 23 25 46 58 70 72 82
This makes sense when we look into the input.
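To make the convolution step easier to follow, here is a minimal single-column sketch of the same idea. The column values and the variable name long_runs are made up for illustration; the logic is otherwise the code above with N = 3.

```matlab
% Hypothetical single column: one run of 3 NaNs (rows 3-5), plus shorter runs
N   = 3;
col = [NaN; 1; NaN; NaN; NaN; 2; NaN; NaN];

nan_pos  = isnan(col);                                    % logical NaN mask
conv_res = conv2(double(nan_pos), [0 ones(1,N)]') == N;   % N NaNs in a row
start_ind = find(conv_res(N+1:end,:));                    % rows where such runs start

% Expand each start into the N positions it covers and clear them from the mask
long_runs = unique(bsxfun(@plus, start_ind, 0:N-1));
nan_pos(long_runs) = 0;

out = find(nan_pos);   % NaNs that sit in runs shorter than N
```

Here start_ind is 3, so rows 3 to 5 are cleared and out contains rows 1, 7 and 8.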
This should work:
[rows, ~] = size(A);
maxNansPerCol = 4;
% find which columns have few enough NaNs
Anans = isnan(A);
nansInCols = sum(Anans);
qualifyingCols = nansInCols <= maxNansPerCol;
% zero the other columns
mask = repmat(qualifyingCols,rows,1);
B = Anans .* mask;
% get the NaN locations
indices = find(B(:));
(Apologies if something is slightly off--I don't have MATLAB on this computer to test it)
Related
I have a matrix with values in the center and NaNs on the border (imagine a matrix representing a watershed, which is never square). I need to pad it with one cell to do some component stress calculations. I am trying to stick to core Matlab functionality and avoid outside libraries; what I am trying to do is similar to padarray with the symmetric option, but for an irregular border:
padarray(Zb,[1 1],'symmetric','both');
For example:
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN 2 5 39 55 44 8 NaN NaN NaN
NaN NaN NaN NaN 7 33 48 31 66 17 NaN NaN
NaN NaN NaN NaN 28 NaN 89 NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Becomes:
NaN NaN 2 2 5 39 55 44 8 8 NaN NaN
NaN NaN 2 2 5 39 55 44 8 17 17 NaN
NaN NaN 2 2 7 33 48 31 66 17 17 NaN
NaN NaN NaN 28 28 33 89 31 66 17 17 NaN
NaN NaN NaN 28 28 28 89 89 NaN NaN NaN NaN
(Not sure how to handle convex corners with two adjacent values since I need to control edge effects).
This post follows on an earlier question today in which I was able to extract the locations of these padded cells (buffers) into a dilated logical. However using fillmissing with nearest did not create the effect I expected (what padarray does).
Zb_ext(logical(ZbDilated)) = fillmissing(Zb_ext(logical(ZbDilated)),'nearest');
I might be able to reverse what I did to find the padcells to find the adjacent values and use those to replace the pad cell NaNs. But I thought I would first see if there was a simpler solution?
You can use two 2D convolutions to achieve this, where conv2 is within the core MATLAB library so nothing external is needed, and it should be fast.
However, you noted this:
Not sure how to handle convex corners with two adjacent values since I need to control edge effects
I've taken the liberty of defining a "sensible" output for convex corners which is to take the average value, because from your example it seems undefined how these cases, and more complicated ones like cell (5,6), should be handled.
I've added detailed comments to the below code for explanation
% Example matrix
A = [
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN 2 5 39 55 44 8 NaN NaN NaN
NaN NaN NaN NaN 7 33 48 31 66 17 NaN NaN
NaN NaN NaN NaN 28 NaN 89 NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
];
% Track the "inner" indices, where values are defined
inner = (~isnan(A));
B = A; % Copy A so we don't change it
B(~inner) = 0; % Replace NaN with 0 so that convolutions work OK
% First dilate the inner region by one element, taking the average of
% neighbours which are up/down/left/right (no diagonals). This is required
% to avoid including interior points (which only touch diagonally) in the
% averaging. These can be considered the "cardinal neighbours"
kernel = [0 1 0 ; 1 0 1; 0 1 0]; % Cardinal directions in 3x3 stencil
s = conv2(B,kernel,'same'); % 2D convolution to get sum of neighbours
n = conv2(inner,kernel,'same'); % 2D convolution to get count of neighbours
s(inner) = 0; % Zero out the inner region
s = s./n; % Get the mean of neighbours
% Second, dilate the inner region but including the mean from all
% directions. This lets us handle convex corners in the image
s2 = conv2(B,ones(3),'same'); % Sum of neighbours (and self, doesn't matter)
n = conv2(inner,ones(3),'same'); % Count of neighbours (self=0 for dilated elems)
s2 = s2./n; % Get the mean of neighbours
% Finally piece together the 3 matrices:
out = s2; % Start with outmost dilation inc. corners
out(~isnan(s)) = s(~isnan(s)); % Override with inner dilation for cardinal neighbours
out(inner) = A(inner); % Override with original inner data
So for this example, the output would be the same as your example output, except for corners as mentioned:
NaN NaN 2 2 5 39 55 44 8 8 NaN NaN
NaN NaN 2 2 5 39 55 44 8 12.5 17 NaN
NaN NaN 2 4.5 7 33 48 31 66 17 17 NaN
NaN NaN NaN 28 28 50 89 60 66 17 17 NaN
NaN NaN NaN 28 28 58.5 89 89 NaN NaN NaN NaN
Related (and utilised): MATLAB/Octave: Calculate the sum of adjacent/neighboring elements in a matrix
I use the following code with different data files, without any problem. However, there is one file that creates troubles, and when I run the code with that file, I get an error:
'Index exceeds matrix dimensions.'
I think that is because 'i' is equal to 2546, but when the code runs the line: i=i+1, instead of stopping at 2546, it keeps going and stops (giving the error) at 2547 - which of course exceeds the matrix dimensions. In fact, when the code stops working, producing the error, I can see in the Workspace that 'i' is equal to 2547, and 'j' to 2 (instead of 5, if the loop would have worked fine).
As the exact same code works perfectly fine with other files, I assume there is something to do with this specific file. Any insight on how to solve the issue?
Here is the code:
for i=1:size(colInd,1)
for j=1:size(colInd,2)
if colInd(i,j)>0 && colInd(i,j)<=13
M1(i,j)=Windowsdata(i,colInd(i,j));
elseif colInd(i,j)==0 | colInd(i,j)==14
M1(i,j)=NaN;
elseif colInd(i,j)==-1 | colInd(i,j)==15
M1(i,:)=NaN;
i=i+1;
end
end
end
Example lines from colInd, which is 2546 x 5 double
4 5 6 7 8
-1 0 1 2 3
2 3 4 5 6
11 12 13 14 15
0 1 2 3 4
5 6 7 8 9
3 4 5 6 7
5 6 7 8 9
-1 0 1 2 3
11 12 13 14 15
Example lines from Windowsdata, which is 2546 x 13 double
-4.37370443344116 -1.64714550971985 0.569347918033600 1.62668454647064 3.73541021347046 5.15196514129639 4.04361486434937 1.77491927146912 0.702701866626740 -0.354207783937454 1.18695282936096 2.82701897621155 4.01644039154053
3.72757863998413 1.44241857528687 -1.15181946754456 -2.97936320304871 -5.16328191757202 -4.25508642196655 -2.47518587112427 0.287524074316025 -1.17596077919006 -2.04023623466492 -2.78539514541626 -2.96725606918335 -5.59557294845581
-5.52127933502197 -1.69257545471191 3.61181259155273 4.46472501754761 0.345008432865143 -4.78608989715576 -7.80892658233643 -8.83082866668701 -5.61083126068115 -4.40270948410034 -3.05102157592773 -4.67261123657227 -5.50971889495850
1.24733197689056 0.692575275897980 0.549045324325562 1.33569169044495 2.26527953147888 3.19271230697632 1.92626762390137 -0.00543282041326165 -1.76812970638275 -3.55482935905457 -2.28071475028992 2.58129334449768 6.07476711273193
2.17950797080994 2.73428583145142 1.63492679595947 -0.256836771965027 -0.773400425910950 -1.04227805137634 -1.82435607910156 -2.64025163650513 -1.53338134288788 -2.29410648345947 -4.26442241668701 -4.76120758056641 -4.47712421417236
-0.246993020176888 0.157185763120651 0.250829964876175 -0.986824631690979 1.40918886661530 5.03370332717896 8.15515422821045 6.41663646697998 2.43448591232300 -2.98093175888062 -3.53510475158691 -1.89243125915527 1.47953033447266
4.36318445205688 5.06837177276611 5.78645181655884 6.97499608993530 7.49895095825195 5.27076244354248 4.75153970718384 4.35132837295532 2.37539553642273 0.0745598822832108 0.782306909561157 1.98255372047424 1.82295107841492
0.393009424209595 0.348423480987549 -0.0242169145494699 -0.451373100280762 0.792472958564758 3.95410203933716 6.95971775054932 6.07247447967529 4.61793804168701 2.25326156616211 1.17793440818787 -1.02191674709320 -1.40514099597931
2.97367334365845 2.56695508956909 -0.0324615947902203 -0.512259364128113 -0.169182881712914 1.99416732788086 2.05820631980896 1.26427924633026 -0.107465483248234 -1.26579785346985 -2.51656532287598 -2.19553661346436 -1.86673855781555
-5.92374515533447 -4.78130531311035 -5.02523994445801 -4.12971973419189 -2.56698751449585 -2.16855669021606 -2.66882371902466 -3.24165868759155 -4.10617780685425 -4.71752023696899 -4.63748264312744 -3.33325529098511 -2.00388121604919
If I understand you correctly, colInd is a NxM matrix with N=2546 and you have the expectation, that in your last elseif, by incrementing i by one you end up in the next outer for loop iteration, starting j from 1 to M in the (i+1)th iteration.
If this is the behaviour you want to achieve, you need to use the break statement to break out of the inner for loop. Else, if j<size(colInd,2), j will be incremented and the inner loop continues.
If I understood your desired behaviour correctly, it is not a problem with the file, but rather with your algorithm. But as @excaza pointed out, you should really give an example that is an mcve, including the necessary variables (such as Windowsdata). Otherwise it is really difficult to make sense of what you're trying to do.
Edit: Give this a try:
for i=1:size(colInd,1)
for j=1:size(colInd,2)
if colInd(i,j)>0 && colInd(i,j)<=13
M1(i,j)=Windowsdata(i,colInd(i,j));
elseif colInd(i,j)==0 | colInd(i,j)==14
M1(i,j)=NaN;
elseif colInd(i,j)==-1 | colInd(i,j)==15
M1(i,:)=NaN;
break;
end
end
end
Edit2: To clarify: The problem you describe occurs if in the iteration where i=size(colInd,1) (i.e. i=2546) and j<size(colInd,2) (i.e. j<5, let's assume j=1 for simplicity) the last elseif holds. Thus, causing your i to be incremented to i=2546+1=2547, and the inner loop goes through the next iteration. With j=2 now, in the first if you attempt to access colInd(2547,2), which exceeds the dimensions of colInd.
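A minimal sketch of the behaviour described above, using a made-up variable visited: changing the loop variable inside a for loop only affects the rest of the current iteration, because the for statement assigns the next value from the range at the start of each iteration.

```matlab
% Illustrative only: record the value of i at the start of each iteration
visited = [];
for i = 1:3
    visited(end+1) = i;   % value handed out by the for statement
    i = i + 1;            % changes i only until the end of this iteration
end
% visited is [1 2 3]: the increment never skips an iteration
```

This is why i=i+1 does not jump to the next row, while break does leave the inner loop.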
Edit3: If you want a more Matlab-y implementation for this, because for loops are not very good Matlab coding style, I also append this solution that uses vectorization (albeit not that great).
colInd = [ 4 5 6 7 8;
-1 0 1 2 3;
2 3 4 5 6;
11 12 13 14 15;
0 1 2 3 4;
5 6 7 8 9;
3 4 5 6 7;
5 6 7 8 9;
-1 0 1 2 3;
11 12 13 14 15;
10 11 12 13 14];
Windowsdata = reshape([1:size(colInd,1)*13],[size(colInd,1) 13]);
M1 = zeros(size(colInd));
M2 = zeros(size(colInd));
c1 = find(colInd>0&colInd<=13);
c2 = find(colInd==0|colInd==14);
c3 = find(colInd==-1|colInd==15);
[x1,~] = ind2sub(size(colInd),c1);
[x3,~] = ind2sub(size(colInd),c3);
M2(c1) = Windowsdata(sub2ind(size(Windowsdata),x1,colInd(c1)));
M2(c2) = NaN;
M2(x3,:) = NaN;
It runs about 3-5 times as fast as your for loop implementation.
Edit: Added the missing term to the sub2ind call, and it gives the same result as the for loop:
M1 =
34 45 56 67 78
NaN NaN NaN NaN NaN
14 25 36 47 58
NaN NaN NaN NaN NaN
NaN 5 16 27 38
50 61 72 83 94
29 40 51 62 73
52 63 74 85 96
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
110 121 132 143 NaN
M2 =
34 45 56 67 78
NaN NaN NaN NaN NaN
14 25 36 47 58
NaN NaN NaN NaN NaN
NaN 5 16 27 38
50 61 72 83 94
29 40 51 62 73
52 63 74 85 96
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
110 121 132 143 NaN
all,
I have a large dataset with a lot of continuous NAs, is there any fast way to replace the NAs with the average of previous and next non-missing value by column?
Thanks a lot
Lou
Interesting question... if only you explained clearly what you want. Maybe it's this?
data = [1 3 NaN 7 6 NaN NaN 2].'; %'// example data: column vector
isn = isnan(data); %// determine which values are NaN
inum = find(~isn); %// indices of numbers
inan = find(isn); %// indices of NaNs
comp = bsxfun(@lt,inan.',inum); %'// for each (number,NaN): 1 if NaN precedes num
[~, upper] = max(comp); %// next number to each NaN (max finds *first* maximum)
data(isn) = (data(inum(upper))+data(inum(upper-1)))/2; %// fill with average
In this example: original data:
>> data.'
ans =
1 3 NaN 7 6 NaN NaN 2
Result:
>> data.'
ans =
1 3 5 7 6 4 4 2
If you have a 2D array and want to work by columns, a for loop over columns is probably the best option.
And of course, if there can be NaN's at the beginning or end of a column, the problem is undefined.
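As a rough sketch of the column loop mentioned above, the same steps can be applied to each column in turn. The matrix and the name filled are invented for illustration, and the sketch assumes that no column has a NaN in its first or last row.

```matlab
% Hypothetical 2-D data; first and last rows contain no NaN (assumption)
data   = [1 10; NaN 20; 3 NaN; NaN NaN; NaN 50; 9 60];
filled = data;

for c = 1:size(data, 2)
    col = data(:, c);
    isn = isnan(col);
    if ~any(isn), continue; end          % nothing to fill in this column
    inum = find(~isn);                   % rows holding numbers
    inan = find(isn);                    % rows holding NaN
    comp = bsxfun(@lt, inan.', inum);    % comp(i,j): NaN j precedes number i
    [~, upper] = max(comp, [], 1);       % first number after each NaN
    col(isn) = (col(inum(upper)) + col(inum(upper-1))) / 2;
    filled(:, c) = col;
end
```

For this input, filled is [1 10; 2 20; 3 35; 6 35; 6 50; 9 60].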
Assuming NaNs are not in the first/last row in any column, here is how I would do it:
(If there are multiple consecutive NaNs, it searches for the previous and next non-missing values and averages them).
% Creating A
A=magic(7);
newA=A; %Result will be in newA
A(3,4)=NaN;
A(2,1)=NaN;
A(5,6)=NaN;
A(6,6)=NaN;
A(4,6)=NaN;
% Finding NaN position and calculating positions where we have to average numbers
ind=find(isnan(A));
otherInd=setdiff(1:numel(A(:)),ind);
for i=1:size(ind,1)
temp=otherInd(otherInd<ind(i));
prevInd(i,1)=temp(end);
temp=otherInd(otherInd>ind(i));
nextInd(i,1)=temp(1);
end
% For faster processing purposes
allInd(1:2:2*length(prevInd))=prevInd;
allInd(2:2:2*length(prevInd))=nextInd;
fun=@(block_struct) mean(block_struct.data)
prevNextNums=A(allInd);
A
newA(ind)=blockproc(prevNextNums,[1 2],fun)
%-----------------------Answer--------------------------
A =
30 39 48 1 10 19 28
NaN 47 7 9 18 27 29
46 6 8 NaN 26 35 37
5 14 16 25 34 NaN 45
13 15 24 33 42 NaN 4
21 23 32 41 43 NaN 12
22 31 40 49 2 11 20
newA =
30 39 48 1 10 19 28
38 47 7 9 18 27 29
46 6 8 17 26 35 37
5 14 16 25 34 23 45
13 15 24 33 42 23 4
21 23 32 41 43 23 12
22 31 40 49 2 11 20
I have a data :
minval = NaN 7 8 9 9 9 10 10 10 10
NaN NaN 10 10 10 10 10 10 10 10
NaN NaN NaN 10 10 9 10 10 10 9
NaN NaN NaN NaN 9 9 10 9 10 10
NaN NaN NaN NaN NaN 9 10 10 10 10
NaN NaN NaN NaN NaN NaN 10 11 10 10
NaN NaN NaN NaN NaN NaN NaN 10 10 10
NaN NaN NaN NaN NaN NaN NaN NaN 10 10
NaN NaN NaN NaN NaN NaN NaN NaN NaN 10
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
and I do this following :
C=size(minval,2);
D1(1,2:end) = minval(1,2:C);
D2 = bsxfun(@plus,minval(2:C-1,3:C),D1(1,1:C-2)');
D2 = [zeros(1,size(D2,2)) ;D2];
D2(D2==0) = NaN;
D1(2,3:end) = nanmin(D2);
D3 = bsxfun(@plus,minval(3:C-1,4:C),D1(2,2:C-2)');
D3 = [zeros(2,size(D3,2)) ;D3];
D3(D3==0) = NaN;
D1(3,4:end)= nanmin(D3);
Then, I want to backtrack the path which D1(end,end)comes from.
Is there any help? Thank you.
In MATLAB you can index out parts of matrices directly. There's no need for loops here:
C=size(minval,2);
D1(2:C) = minval(1,2:C);
For these ones you are not doing what you hoped, I suspect:
for e=3:C
for b=2:e-1
D2(e)=min(minval(b,e)+D1(b-1));
end
end
In the inner loop, for each value of b (from 2 to e-1), you are overwriting the value of D2 at each step. Only the result for the last value of b will be recorded. There may well be a much simpler way of getting the result you want. min and other functions do not just work on two single values but on entire matrices - e.g. you can do:
min(minval)
ans =
NaN 7 8 9 9 9 10 9 10 9
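For the backtracking part of the question, one possible starting point (an assumption about what is wanted, not a full solution) is the second output of min, which reports the row each column minimum came from. The matrix X below is made up for illustration.

```matlab
% Hypothetical data: min along columns, keeping track of where each minimum was
X = [3 9; 1 5; 2 4];
[m, idx] = min(X);   % m holds the column minima, idx the row of each minimum
% m   is [1 4]
% idx is [2 3]
```

Storing idx at each step would give the information needed to walk back from the final value.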
Currently, I need to write a program using matlab to transform a matrix using homogeneous coordinates like this
% for translation
T = [1 0 dx; 0 1 dy; 0 0 1];
For example:
A =
92 99 1 8 15 67 74 51 58 40
98 80 7 14 16 73 55 57 64 41
4 81 88 20 22 54 56 63 70 47
85 87 19 21 3 60 62 69 71 28
86 93 25 2 9 61 68 75 52 34
17 24 76 83 90 42 49 26 33 65
23 5 82 89 91 48 30 32 39 66
79 6 13 95 97 29 31 38 45 72
10 12 94 96 78 35 37 44 46 53
11 18 100 77 84 36 43 50 27 59
>> I = translate(A, 4, 4)
I =
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN 92 99 1 8 15 67
NaN NaN NaN NaN 98 80 7 14 16 73
NaN NaN NaN NaN 4 81 88 20 22 54
NaN NaN NaN NaN 85 87 19 21 3 60
NaN NaN NaN NaN 86 93 25 2 9 61
NaN NaN NaN NaN 17 24 76 83 90 42
Here NaN cells mean 'empty spaces'. As you can see, matrix A was translated 4 units along the x axis and 4 units along the y axis, leaving NaN values behind. The output matrix I must be the same size as A.
However, my current program doesn't work properly with images (it does not put 'NaN' values in the empty spaces, it puts '1'):
So, this is my program:
function t_matrix = translate(input_matrix, dx, dy)
[rows cols] = size(input_matrix);
t_matrix = input_matrix;
t_matrix(:) = NaN;
T = [1 0 dx; 0 1 dy; 0 0 1];
for n = 1:numel(input_matrix)
[x y] = ind2sub([rows cols], n);
v = [x y 1]';
v = T*v;
a = floor(v(1));
b = floor(v(2));
if a > 0 && b > 0
t_matrix(a, b) = input_matrix(x,y);
end
end
t_matrix = t_matrix(1:rows, 1:cols);
How can I implement a homogeneous transformation in matlab in an easier way?
Only restriction: keep using this matrix:
% for translation
T = [1 0 dx; 0 1 dy; 0 0 1];
And keep NaN values for empty spaces.
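The problem with your code might be that your image data is of an integer type, while NaN only exists for floating-point (double or single) values. Because t_matrix starts as a copy of input_matrix, it inherits the integer class and cannot store NaN. You should create t_matrix using the nan function instead: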
t_matrix = nan(size(input_matrix));
The following is a direct translation of your code, I just removed the loop
function I = translate(input_matrix, dx, dy)
% get matrix dimensions
[rows cols] = size(input_matrix);
T = [1 0 dx; 0 1 dy; 0 0 1];
% create a nan's output matrix
I = nan(size(input_matrix));
% create row-column index pairs
[C R] = meshgrid(1:cols, 1:rows); % meshgrid returns column indices first, then row indices
% append 1 at the end
IDX = [R(:) C(:) ones(numel(input_matrix),1)]';
% transform coordinates
V = floor(T*IDX);
% find indices that fall into [rows, cols] range
keep = find(V(1,:)>0 & V(1,:)<=rows & V(2,:)>0 & V(2,:)<=cols);
% assign output only to the correct indices
I(sub2ind([rows cols], V(1,keep), V(2,keep))) = input_matrix(sub2ind([rows cols], R(keep), C(keep)));
end
On the other hand, you can obtain the same result as in the question just by running the following function (no T matrix though..)
function I = translate(A, dx, dy)
I = nan(size(A));
I(dx+1:end, dy+1:end) = A(1:end-dx, 1:end-dy);
end
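A small check, on a made-up 3x4 matrix, that the slicing version moves elements to the same place the T matrix predicts. The variable names v and same are only for illustration, and the sketch assumes non-negative integer dx and dy.

```matlab
% Hypothetical 3x4 input; the values are for illustration only
A  = reshape(1:12, 3, 4);
dx = 1;  dy = 2;
T  = [1 0 dx; 0 1 dy; 0 0 1];

% Homogeneous form: element (row 1, column 1) maps to (1+dx, 1+dy)
v = T * [1; 1; 1];                   % v is [2; 3; 1]

% Slicing form, as in the function above
I = nan(size(A));
I(dx+1:end, dy+1:end) = A(1:end-dx, 1:end-dy);

% Both agree on where A(1,1) ends up
same = (I(v(1), v(2)) == A(1,1));    % true
```

Here I is [NaN NaN NaN NaN; NaN NaN 1 4; NaN NaN 2 5], so the output keeps the size of A and the vacated cells stay NaN.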
The easiest way to achieve this, if you have the Image Processing Toolbox, is to use the built-in functions maketform and imtransform:
I = imread('cameraman.tif');
dx = 40;
dy = 100;
tform = maketform('affine',[1 0 0; 0 1 0; dx dy 1]); %#Create a translation matrix
J = imtransform(I,tform,'XData',[0 size(I,2)+dx],'YData',[0 size(I,1)+dy]);
imshow(I), figure, imshow(J)
The matrix given as input to maketform is the transpose of your matrix.
It is important to set XData and YData, otherwise you will not get the "translation effect", since imtransform finds the smallest output range.
If you want to get the same size as the initial image, use the following syntax:
J = imtransform(I,tform,'XData',[0 size(I,2)],'YData',[0 size(I,1)]);
Image Before:
Image After:
Image After (Keeping the same size):