I have a 3D matrix padded with NaNs so that each 2D matrix, i.e. each (:,:,ind), has the same number of rows. Now I need to find the number of actual non-NaN rows in each (:,:,ind).
A simple example of what I need:
% Input:
A(:,:,1) = [ 1 1;
2 2;
NaN NaN];
A(:,:,2) = [ 2 2;
NaN NaN;
NaN NaN];
% Function call:
B = callingfunction(A);
% Output:
B = [2 1] % Number of Non-NaN rows in each 2D Matrix
Approach #1
B = squeeze(sum(all(~isnan(A),2),1))
Here's the build-up process to get the hang of it -
Start>>> Given A:
>> A
A(:,:,1) =
1 1
2 2
NaN NaN
A(:,:,2) =
2 2
NaN NaN
NaN NaN
1) Detect all non-NaN positions:
>> ~isnan(A)
ans(:,:,1) =
1 1
1 1
0 0
ans(:,:,2) =
1 1
0 0
0 0
2) Find rows with all non-NaN elements:
>> all(~isnan(A),2)
ans(:,:,1) =
1
1
0
ans(:,:,2) =
1
0
0
3) Sum up the number of all such rows:
>> sum(all(~isnan(A),2),1)
ans(:,:,1) =
2
ans(:,:,2) =
1
4) Get the result as a 1D array:
>> squeeze(sum(all(~isnan(A),2),1))
ans =
2
1
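squeeze returns a column vector here (2; 1). If you want the row vector B = [2 1] shown in the question, one option is to reshape the per-page counts into a row instead. This is a minimal sketch assuming the example A from the question; reshape(...,1,[]) also gives a row when A has only one page.

```matlab
% Example input from the question: NaN-padded pages of equal height
A = NaN(3,2,2);
A(:,:,1) = [1 1; 2 2; NaN NaN];
A(:,:,2) = [2 2; NaN NaN; NaN NaN];

% Count the fully non-NaN rows in each page (result is 1x1xP) ...
counts = sum(all(~isnan(A),2),1);
% ... and lay the P counts out as a 1xP row vector
B = reshape(counts, 1, [])
```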
Approach #2
B = squeeze(sum(~any(isnan(A),2),1))
Use the same break-the-code-into-pieces process as above to understand this one, and for any other MATLAB code that doesn't make sense at first!
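By De Morgan's law, all(~isnan(A),2) and ~any(isnan(A),2) are the same test, so both approaches agree. Both count only rows with no NaN at all; a partially filled row is not counted. The sketch below checks this with a hypothetical partially filled row [3 NaN] added to page 2, and also shows the variant any(~isnan(A),2), which would count rows holding at least one number.

```matlab
% Same example, but with a hypothetical partially filled row in page 2
A = NaN(3,2,2);
A(:,:,1) = [1 1; 2 2; NaN NaN];
A(:,:,2) = [2 2; 3 NaN; NaN NaN];

% Approach #1: rows in which every element is non-NaN
B1 = squeeze(sum(all(~isnan(A),2),1));
% Approach #2: rows in which no element is NaN (same test, by De Morgan's law)
B2 = squeeze(sum(~any(isnan(A),2),1));
% Variant: rows that contain at least one non-NaN element
B3 = squeeze(sum(any(~isnan(A),2),1));
```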
Related
Maybe I'm missing something basic, but I recently discovered this behaviour and I don't understand what's happening:
A = [3 NaN .12 NaN NaN 9]
A =
3 NaN 0.12 NaN NaN 9
>> nansA = isnan(A)
nansA =
0 1 0 1 1 0
>> nnansA = ~isnan(A)
nnansA =
1 0 1 0 0 1
>> nnansA1 = ~isnan(A(1:end))
nnansA1 =
1 0 1 0 0 1
>> nnansA2 = ~isnan(A(2:end))
nnansA2 =
0 1 0 0 1
>> AnnansA1 = A(~isnan(A(1:end)))
AnnansA1 =
3 0.12 9
>> AnnansA2 = A(~isnan(A(2:end)))
AnnansA2 =
NaN NaN
What is happening here?
Does this happen in Matlab too?
I would expect something like ... AnnansA2 = 0.12 9
What is happening here is that you're indexing A with a logical array of a different size and expecting the indexing not to start at the beginning of A.
Let's deconstruct, from the inside:
>> A = [3 NaN .12 NaN NaN 9]
A =
3.0000 NaN 0.1200 NaN NaN 9.0000
>> # B is a new array with 5 elements (A had 6 elements)
>> B = A(2:end)
B =
NaN 0.1200 NaN NaN 9.0000
>> # mask is a logical array with 5 elements, like B, and not 6, like A.
>> # mask knows nothing about A or B. It simply "selects" (indexes) the
>> # 1st, 3rd, and 4th element of any array.
>> mask = isnan (B)
mask =
1 0 1 1 0
>> # Invert the mask, now "selects" the 2nd and 5th element of any array.
>> not_mask = ! mask
not_mask =
0 1 0 0 1
>> # Return the 2nd and 5th element of A.
>> A(not_mask)
ans =
NaN NaN
I think you're surprised by the behaviour because you expect the mask built from A(2:end) to "remember" that it came from A and to index the right "region" of A. This does not happen: it's just a logical array that remembers nothing about the array it was built from (logical masks are often used to index different arrays).
As a side note, and answering one of your questions, MATLAB behaves the same as Octave here.
Anyway, what you're doing looks a bit odd; maybe do this instead (Octave only, since MATLAB does not allow chained indexing):
A(! isnan (A))(2:end)
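Chained indexing such as A(...)(2:end) is an Octave extension. A sketch of a MATLAB-compatible two-step version, plus a second option that keeps the mask the same size as A so that positions line up. The second option matches what the question intended (the non-NaN elements of A(2:end)); the two give the same result here only because A(1) is not NaN.

```matlab
A = [3 NaN .12 NaN NaN 9];

% Two-step equivalent of A(!isnan(A))(2:end), valid in MATLAB and Octave
nonNaN = A(~isnan(A));      % [3 0.12 9]
AnnansA2 = nonNaN(2:end)    % [0.12 9]

% Alternative: keep the mask the same size as A, then switch off position 1
mask = ~isnan(A);
mask(1) = false;
AnnansA2_mask = A(mask)     % [0.12 9]
```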
You're off by one.
You need to do AnnansA2 = A(~isnan(A(1:end))), i.e. build the mask from the whole of A.
If you want to return only the last two non-NaNs, index the result (not the mask) like this:
ananIdxs = ~isnan(A);
AnnansA = A(ananIdxs);
AnnansA2 = AnnansA(end-1:end)
I need help with the following functions: histc and numel, in either a for loop or vectorized code. I have a matrix which could be of any dimension. For each row, the code needs to output the length of each run of consecutive occurrences of an element, up to the end of the row. So for the following example, I want to find how many times the number 1 occurs consecutively in row 1. In row 1, the number one occurs two times before being interrupted by two 0's. Then it occurs once more in the last column. So the output would be 2 1.
I appreciate any help. Thank you
x = hist( data, numel(unique(data)) );
y = histc( data, unique(data) );
data (input) 5x5
1 1 0 0 1
1 1 1 1
0 0 1
0 0 1 1 1
1 0 1 1 1
y (output)
2 1
2 2
1
3
1 3
Assuming x to be the input array, this could be one approach -
[nrows,ncols] = size(x);
new_ncols = ncols + 2;
%// Pad one column of zeros on the left and right sides of x
x_pz = [zeros(nrows,1) x zeros(nrows,1)]
%// Flatten padded x row by row into a single row vector
x_pzf = reshape(x_pz',[],1)'
%// [0 1] matches at the zero just before each island of ones and
%// [1 0] matches at the last one of each island (padded, flattened indices)
starts = strfind(x_pzf,[0 1]);
ends = strfind(x_pzf,[1 0])
row_ids = ceil(ends/new_ncols); %// row IDs for each island of ones
%// Start & end indices of islands of ones for flattened non-padded (corrected) x
starts_cor = starts - 2*(row_ids-1)
ends_cor = ends - (2*row_ids-1)
%// Get number of elements in each island of ones
counts = ends_cor - starts_cor + 1
%// Bin row_ids to count the islands in each row of the input array
counts_per_row = histc(row_ids,1:nrows)
%// Set up the output array: each row holds the island counts of the
%// corresponding input row, with the blank spaces filled with NaNs
mask = bsxfun(@ge,counts_per_row,(1:max(counts_per_row))') %//'
y = NaN(size(mask))
y(mask) = counts
y = y'
Code run -
>> x (modified from the original one to test out more varied situations)
x =
1 1 0 0 1 1
1 1 0 1 1 0
0 0 0 0 1 0
0 0 1 1 1 0
1 0 1 1 1 1
1 1 1 1 1 1
0 0 0 0 0 0
0 1 0 1 0 1
>> y
y =
2 2 NaN
2 2 NaN
1 NaN NaN
3 NaN NaN
1 4 NaN
6 NaN NaN
NaN NaN NaN
1 1 1
If you are looking for a more intuitive and concise way to obtain and display the final output, you can use a cell array instead, like this -
>> ycell = arrayfun(@(n) counts(row_ids==n),1:nrows,'Uni',0);
>> celldisp(ycell)
ycell{1} =
2 2
ycell{2} =
2 2
ycell{3} =
1
ycell{4} =
3
ycell{5} =
1 4
ycell{6} =
6
ycell{7} =
[]
ycell{8} =
1 1 1
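Since the question mentions a for loop, here is a loop-based sketch that should give the same island lengths as ycell above. It swaps in diff-based run detection on each zero-padded row instead of strfind on the flattened matrix, and it assumes x contains only 0s and 1s.

```matlab
% Test input (the modified x used above)
x = [1 1 0 0 1 1
     1 1 0 1 1 0
     0 0 0 0 1 0
     0 0 1 1 1 0
     1 0 1 1 1 1
     1 1 1 1 1 1
     0 0 0 0 0 0
     0 1 0 1 0 1];

nrows = size(x,1);
runs = cell(nrows,1);               % runs{r} holds the island lengths of row r
for r = 1:nrows
    d = diff([0 x(r,:) 0]);         % +1 where an island starts, -1 just after it ends
    runs{r} = find(d == -1) - find(d == 1);
end
```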
I have an incomplete dataset,
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]'
I wish to identify clusters of NaNs, that is, runs where the number of consecutive NaNs exceeds 2. How do I do that?
You could do something like this:
aux = diff([0; isnan(N); 0]);
clusters = [find(aux == 1) find(aux == -1) - 1];
Then clusters will be an M-by-2 matrix, where M is the number of NaN clusters (all of them), and each row gives you the start and end index of a cluster.
In this example, that would be:
clusters =
1 1
5 5
8 9
15 19
It means you have 4 NaN clusters, and cluster one ranges from index 1 to index 1, cluster two ranges from 5 to 5, cluster three ranges from 8 to 9 and cluster four ranges from 15 to 19.
If you want only the clusters with at least K NaNs, you could do it like this (for example, with K = 2):
K = 2;
clusters(clusters(:,2) - clusters(:,1) + 1 >= K, :)
That would give you this:
ans =
8 9
15 19
That is, clusters 8-9 and 15-19 have 2 or more NaNs.
Explanation:
Finding the clusters
isnan(N) gives you a logical vector containing the NaNs as ones:
N --------> NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN
isnan(N) -> 1 0 0 0 1 0 0 1 1 0 0 0 0 0 1 1 1 1 1
We want to know where each sequence of ones starts, so we use diff, which calculates each value minus the previous one, giving us this:
aux = diff(isnan(N));
N ----> NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN
aux --> -1 0 0 1 -1 0 1 0 -1 0 0 0 0 1 0 0 0 0
A 1 indicates the start of a group and a -1 indicates the end of a group. But this misses the first group start and the last group end: the first element has no previous value in N to be compared with, so the leading 1 is absent, and there is nothing after the last element, so the trailing -1 is absent too. (Every position is also shifted by one, since aux(i) compares N(i+1) with N(i).) A common fix is to add a zero before and after the array, which gives us this:
aux = diff([0; isnan(N); 0]);
N ----> NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN
aux --> 1 -1 0 0 1 -1 0 1 0 -1 0 0 0 0 1 0 0 0 0 -1
Notice two things:
If the diff at index i is 1, N(i) is the start of the NaN block.
If the diff at index i is -1, N(i - 1) is the end of the NaN block.
To get the start and end, we use find to get the indexes where aux == 1 and aux == -1. Hence, we call find twice, and concatenate both calls using [ and ]:
aux = diff([0; isnan(N); 0]);
clusters = [find(aux == 1) find(aux == -1) - 1];
Filtering the clusters which have K or more elements
The last step is to find clusters which have K or more elements. To do that, we first take the cluster matrix, subtract the first column from the second, and add 1, like this:
clusters(:,2) - clusters(:,1) + 1
ans =
1
1
2
5
It means clusters 1 and 2 have 1 NaN each, cluster 3 has 2 NaNs and cluster 4 has 5 NaNs. If we ask which values are greater than or equal to K, we get this:
clusters(:,2) - clusters(:,1) + 1 >= K
ans =
0
0
1
1
It's a logical array. We can use that to index only the 1 (true) rows of the cluster matrix, like this:
clusters(clusters(:,2) - clusters(:,1) + 1 >= K, :)
ans =
8 9
15 19
It's like asking: give us only the rows of clusters where this logical vector is true, and give us all columns (denoted by the :).
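If you need every index inside the long clusters rather than just their [start end] pairs (the form produced by the filter-based answer further down), a minimal sketch is to expand each filtered row with a loop. K = 2 is used here only as an example threshold.

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]';
K = 2;

aux = diff([0; isnan(N); 0]);
clusters = [find(aux == 1) find(aux == -1) - 1];
long = clusters(clusters(:,2) - clusters(:,1) + 1 >= K, :);   % [8 9; 15 19]

% Expand each [start end] row into the indices start:end
idx = [];
for k = 1:size(long,1)
    idx = [idx, long(k,1):long(k,2)]; %#ok<AGROW>
end
```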
Here is a modular solution:
% the number of NaN you consider as a cluster
num = 3;
% moving-sum filter: Z(i) counts the NaNs among the last num elements
Z = filter(ones(num,1),1,isnan(N));
x = arrayfun(@(x) find(Z == num) - num + x, 1:num,'uni',0)
y = unique(cell2mat(x))
(UPDATE: faster version below)
gives for num = 1:
y = 1 5 8 9 15 16 17 18 19
for num = 2:
y = 8 9 15 16 17 18 19
for num = 3, num = 4 and num = 5:
y = 15 16 17 18 19
and finally, for num = 6 and more:
y = Empty matrix: 1-by-0
Explanation
isnan(N) returns a logical array with ones at the positions of NaN.
Z = filter(ones(num,1),1,isnan(N)); implements a moving-sum filter (a moving average without the division) with the window ones(num,1) = [1; 1; 1] (for num = 3). So the window of size 3 slides over the array and reaches the value num = 3 only when there are 3 NaNs in a row.
So it basically looks like this:
%// N isnan(N) Z
NaN 1 1
1 0 1
2 0 1
3 0 0
NaN 1 1
5 0 1
6 0 1
NaN 1 1
NaN 1 2
7 0 2
8 0 1
10 0 0
12 0 0
20 0 0
NaN 1 1
NaN 1 2
NaN 1 3
NaN 1 3
NaN 1 3
Now it is easy to find all elements where Z is 3: find(Z == num). But (for num = 3) you also need the two elements right before each of them: find(Z == num) - num + 2 and find(Z == num) - num + 1. Instead of a loop, arrayfun is used, which does basically the same thing. As a result you get a matrix with a lot of indices, many of them repeated, but you only need the unique ones. I hope everything is clear now.
Actually, it is much faster to take find out of arrayfun; arrayfun can then be replaced by bsxfun, which also gets rid of cell2mat. This leads to the following form:
Faster:
Z = find( filter(ones(num,1),1,isnan(N)) == num ) - num;
y = unique( bsxfun(@plus, Z,1:num) );
or, as the obligatory fancy one-liner:
y = unique(bsxfun(@plus,find(filter(ones(num,1),1,isnan(N))==num)-num,1:num));
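In MATLAB R2016b and later (and in Octave, which broadcasts automatically), implicit expansion can replace bsxfun. A sketch of the faster version written that way, checked with num = 2:

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]';
num = 2;   % minimum number of consecutive NaNs that counts as a cluster

% Moving sum of the NaN flags over the last num elements
Z = filter(ones(num,1), 1, isnan(N));
% Each Z == num marks the last index of a full window of NaNs;
% subtracting num gives the index just before that window
last = find(Z == num) - num;
% Column + row implicit expansion lists every index of every window
y = unique(last + (1:num));
```

unique returns the sorted indices as a column vector here, matching the num = 2 result listed above.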
STRFIND Approach
I. Fancy One Liner:
%%// Given input N
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]
out = [strfind(num2str(isnan([ 0 N 0]),'%1d'),'011');strfind(num2str(isnan([ 0 N 0]),'%1d'),'110')]'
Output
out =
8 9
15 19
II. Detailed one with explanation:
Basically you are trying to do sliding-window checks, for which there is no direct method when working with double arrays; after converting to strings, however, one can use strfind. That is the trick used here.
I would suggest following the comments in the code and the output numbers to understand it. Please note that in this particular case a cluster means a group of two or more consecutive NaNs.
Code
%%// Given input N
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN]
%%// Set the locations where NaNs are present and then
%%// append at the start and end with zeros
N2 = isnan([ 0 N 0])
%%// Find the start indices of all NaN clusters
start_ind = strfind(num2str(N2,'%1d'),'011')
%%// Find the stop indices of all NaN clusters
stop_ind = [strfind(num2str(N2,'%1d'),'110')]
%%// Put start and stop indices into a Mx2 matrix
out = [start_ind' stop_ind']
Output
N =
NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN
N2 =
0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 1 1 1 1 1 0
start_ind =
8 15
stop_ind =
9 19
out =
8 9
15 19
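The two patterns above hard-code a minimum cluster length of two. A sketch of one way to generalize them to at least K consecutive NaNs, where K is a parameter introduced here: the start pattern becomes '0' followed by K ones, and the stop pattern K ones followed by '0'. The stop match lands K-1 positions before the last NaN in padded coordinates, hence the + K - 2 shift back to original indices (for K = 2 this reduces to the code above).

```matlab
N = [NaN 1 2 3 NaN 5 6 NaN NaN 7 8 10 12 20 NaN NaN NaN NaN NaN];
K = 3;   % minimum number of consecutive NaNs

% '0'/'1' string of the zero-padded NaN flags
s = num2str(isnan([0 N 0]), '%1d');
ones_K = repmat('1', 1, K);
% Padded index of the '0' before a long run = original index of its first NaN
start_ind = strfind(s, ['0' ones_K]);
% Shift the match back to the original index of the run's last NaN
stop_ind = strfind(s, [ones_K '0']) + K - 2;
out = [start_ind' stop_ind']
```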
This uses diff, like Rafael Monteiro's answer, but seems to be simpler:
ind = diff([0; isnan(N(:))]);
result = find(ind(1:end-1)==1 & ind(2:end)==0);
In your example, this gives [8 15].
How it works: ind takes the values:
1 where a run of (one or more) NaN values starts;
0 where there is no change between NaN and numeric with respect to the previous value;
-1 where a run of (one or more) numeric values starts.
The second line selects the positions at which a run of NaN starts and such that the next position is also NaN. Thus it gives the start of each run of more than one NaN, as desired.
This is just a very simple example to show my problem.
a=ones(5)
How can I insert a row of NaNs after every two rows?
I know the way to do this simple example is:
b(:,1:5)=NaN
[a(1:2,:);b;a(3:4,:);b;a(end,:)]
But the problem is that the matrix may be 60000-by-200 (or even larger), so how can I insert NaN rows after every two rows?
Thanks so much.
a = ones(5); %// example data
n = 2; %// number of rows
N = floor(size(a,1)*(1+1/n)); %// final number of rows
ind = mod(1:N, n+1) ~= 0; %// logical index for non-NaN rows
b = NaN(N,size(a,2)); %// initialize result to NaN
b(ind,:) = a; %// fill in non-NaN rows
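A quick check of the same indexing idea on data with distinct rows, so you can see that the original row order is kept. Here the final row count is written as size(a,1) + floor(size(a,1)/n), which equals floor(size(a,1)*(1+1/n)) for integer sizes but avoids the floating-point division; the 5-by-3 test matrix is only an illustration.

```matlab
a = reshape(1:15, 5, 3);              % illustrative 5x3 data with distinct rows
n = 2;                                % insert a NaN row after every n rows
N = size(a,1) + floor(size(a,1)/n);   % final number of rows
ind = mod(1:N, n+1) ~= 0;             % true for rows that receive data from a
b = NaN(N, size(a,2));                % start with all-NaN rows
b(ind,:) = a;                         % copy a into the non-NaN rows, in order
```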
I can't think of an easy, one-line type solution. It can be done in a pretty tight loop though.
a = ones(5);
a_with_nans = nan(floor(size(a,1)*(3/2)), size(a,2)); %Start with all nans in a larger matrix
for ix = 1:2:size(a,1)
a_with_nans(ix*3/2-(1/2),:) = a(ix,:);
if ix+1<=size(a,1)
a_with_nans(ix*3/2-(1/2)+1,:) = a(ix+1,:);
end
end
Then:
a_with_nans =
1 1 1 1 1
1 1 1 1 1
NaN NaN NaN NaN NaN
1 1 1 1 1
1 1 1 1 1
NaN NaN NaN NaN NaN
1 1 1 1 1
For a row vector, you can do it like this:
>> a= [ 1 2 3 4 5 6 7 8 9]
a =
1 2 3 4 5 6 7 8 9
>> b = nan(floor(length(a)/2),1)'
b =
NaN NaN NaN NaN
>> a_new = zeros(1, length(a)+length(b))
a_new =
0 0 0 0 0 0 0 0 0 0 0 0 0
>> b_i = 3:2:length(a)+1   % +1 so that an even-length a also works
b_i =
3 5 7 9
>> a_new(b_i+(0:length(b_i)-1)) = b
a_new =
0 0 NaN 0 0 NaN 0 0 NaN 0 0 NaN 0
>> a_new(~isnan(a_new))=a
a_new =
1 2 NaN 3 4 NaN 5 6 NaN 7 8 NaN 9
Possible Duplicate:
Dealing with NaN’s in matlab functions
Is there a one line command that allows you to take the elementwise average of a matrix (ignoring NaN's) in Matlab? For example,
>> A = [1 0 NaN; 0 3 4; 0 NaN 2]
A =
1 0 NaN
0 3 4
0 NaN 2
So the mean(A) should equal (1+3+2+4+0+0+0)/7 = 1.4286
Also, I don't have access to the stats toolbox so I cannot use nanmean()
You can use isnan() to filter out the unwanted elements:
mean(A(~isnan(A)))
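If you also want per-column results like nanmean(A) without the Statistics Toolbox, one sketch is to zero out the NaNs and divide each column sum by the number of non-NaN entries in that column. A column containing only NaNs gives 0/0 = NaN. (Recent MATLAB releases also accept mean(A, 'omitnan'), but check that your version supports that flag.)

```matlab
A = [1 0 NaN; 0 3 4; 0 NaN 2];

% Overall mean of the non-NaN elements: (1+0+0+3+4+0+2)/7
m_all = mean(A(~isnan(A)));

% Column-wise mean ignoring NaNs
valid = ~isnan(A);                      % true where A holds a number
A0 = A;
A0(~valid) = 0;                         % NaNs contribute nothing to the sums
m_cols = sum(A0, 1) ./ sum(valid, 1);   % divide by the count of numbers per column
```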
nanmean
Performs just like mean, but ignores NaNs.
For example:
>> A = [1 0 NaN; 0 3 4; 0 NaN 2]
A =
1 0 NaN
0 3 4
0 NaN 2
>> nanmean(A)
ans =
0.333333333333333 1.5 3
>> nanmean(A,2)
ans =
0.5
2.33333333333333
1
>> nanmean(A(:))
ans =
1.42857142857143