finding indices of similar group elements

I have a vector test2 that contains NaN, 0 and 1 in arbitrary order (no assumptions can be made about the order):
test2 = [NaN 1 1 1 0 0 0 NaN NaN NaN 0 0 0 1 1 1 0 1 1 1 ];
I would like to group consecutive 1s and store the first and last index of each group in the separate vectors start and finish.
In this case start and finish should be:
start = [2 14 18];
finish = [4 16 20];
I tried to adapt the code provided here and came up with the solution below, but it does not work. Could you help me with the right solution and tell me why mine doesn't work?
a = (test2 ==1);
d = diff(a);
start = find([a(1) d]==1); % Start index of each group
finish = find([d - a(end)]==-1); % Last index of each group
start =
2 14 18
finish =
2 3 5 6 7 8 9 10 11 12 14 15 18 19
I am using MATLAB R2013b on Windows.
I also tried MATLAB R2013a on Ubuntu.

a = (test2 ==1)
d=diff([0 a 0])
start=find(d==1)
finish=find(d==-1)-1
Padding a zero at the beginning and at the end is the simplest approach: the special cases where a group starts at index 1 or ends at the last index then need no extra handling.
Full output:
>> test2 = [NaN 1 1 1 0 0 0 NaN NaN NaN 0 0 0 1 1 1 0 1 1 1 ]
test2 =
Columns 1 through 16
NaN 1 1 1 0 0 0 NaN NaN NaN 0 0 0 1 1 1
Columns 17 through 20
0 1 1 1
>> a = (test2 ==1)
a =
Columns 1 through 16
0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1
Columns 17 through 20
0 1 1 1
>> d=diff([0 a 0])
d =
Columns 1 through 16
0 1 0 0 -1 0 0 0 0 0 0 0 0 1 0 0
Columns 17 through 21
-1 1 0 0 -1
>> start=find(d==1)
start =
2 14 18
>> finish=find(d==-1)-1
finish =
4 16 20
>>
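The padding approach above can be wrapped so that it also accepts a column vector. A minimal sketch, assuming test2 may arrive in either orientation; the reshape call and the groupLength variable are illustrative additions, not part of the original answer:

```matlab
test2 = [NaN 1 1 1 0 0 0 NaN NaN NaN 0 0 0 1 1 1 0 1 1 1];

a = reshape(test2 == 1, 1, []);   % logical row vector, even if test2 is a column
d = diff([0 a 0]);                % +1 where a group of 1s starts, -1 just after it ends
start = find(d == 1);             % first index of each group
finish = find(d == -1) - 1;       % last index of each group
groupLength = finish - start + 1; % number of 1s in each group
```

Forcing a row first matters because [0 a 0] concatenates horizontally and would fail for a column a.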

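The problem is the line finish = find([d - a(end)]==-1);, in particular the fact that a(end) == 1: subtracting that scalar shifts every element of d down by 1, so the comparison finds the positions where d == 0 instead of those where d == -1. There are two steps to correcting this. First, change the problem line to finish = find(d==-1); This tells MATLAB, "Look for the elements where the difference between adjacent elements is -1". In other words, find where a changes from 1 to 0, i.e. where test2 changes from 1 to 0 or NaN. If you run the code, you'll get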
start = 2 14 18
finish = 4 16
Now, you'll notice the end of the last group isn't detected (i.e. we should get finish(3) == 20). This is because d has one element fewer than test2: diff cannot compute the difference between the last element and a non-existent (last+1)th element!
To remedy this, we should modify a:
a = [(test2 == 1) 0];
And you will get the right output for start and finish.
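With both changes applied, the code from the question reads as follows. A sketch that keeps the asker's variable names:

```matlab
test2 = [NaN 1 1 1 0 0 0 NaN NaN NaN 0 0 0 1 1 1 0 1 1 1];

a = [(test2 == 1) 0];          % trailing 0 so a group ending at the last element still produces a -1
d = diff(a);                   % d(k) = a(k+1) - a(k)
start = find([a(1) d] == 1);   % prepending a(1) catches a group that starts at index 1
finish = find(d == -1);        % d(k) == -1 means a(k) == 1 and a(k+1) == 0, so k is a group's last index
```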

Related

How to normalize matrix setting 0 for minimum values and 1 for maximum values?

I need to transform a neural network output matrix of size 2 X N into zeros and ones, where 0 represents the minimum value of each column and 1 the maximum. This is needed to calculate the confusion matrix.
For example, consider this matrix 2 X 8:
2 33 4 5 6 7 8 9
1 44 5 4 7 5 2 1
I need to get this result:
1 0 0 1 0 1 1 1
0 1 1 0 1 0 0 0
How can I do this in MATLAB without for loops? Thanks in advance.

>> d = [ 2 33 4 5 6 7 8 9;
1 44 5 4 7 5 2 1];
>> bsxfun(@rdivide, bsxfun(@minus, d, min(d)), max(d) - min(d))
ans =
1 0 0 1 0 1 1 1
0 1 1 0 1 0 0 0
The bsxfun function is needed to broadcast the subtraction and division across matrices of different sizes (min and max each return a single row). In R2016b and later, implicit expansion lets you write (d - min(d)) ./ (max(d) - min(d)) directly.
Another solution is the following (works only for 2 rows):
>> [d(1,:) > d(2,:); d(1,:) < d(2,:)]
ans =
1 0 0 1 0 1 1 1
0 1 1 0 1 0 0 0
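For the confusion-matrix use case, all that matters is whether an entry is its column's maximum, and that test works for any number of rows. A sketch, assuming that ties should mark every tied entry with 1:

```matlab
d = [2 33 4 5 6 7 8 9;
     1 44 5 4 7 5 2 1];

% Compare every entry with its column maximum (bsxfun expands the 1xN row of maxima)
R = double(bsxfun(@eq, d, max(d)));
% In R2016b and later, implicit expansion allows the shorter form: R = double(d == max(d));
```

Unlike the min/max normalization, this avoids dividing by max(d) - min(d), which is zero when all entries of a column are equal.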

If it's just 2xN and all entries are positive, then this will work:
floor(A./[max(A); max(A)])
For any number of rows (again assuming positive entries):
floor(A./repmat(max(A),size(A,1),1))

find longest sequence of non nan values but allow for threshold

Is it possible to find the longest run of non-NaN values in a vector while also allowing up to n NaNs within it? For example, if I have the following data:
X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24]; %// input array
thres = 1; % this is the number of nans to allow
I would like to keep only the longest sequence of non-NaN values, but allow up to 'n' NaNs to be kept in the data. So, if I am willing to keep 1 NaN, the output would be
X_out = [8 10 11 nan 9 14 6 1 4 23 24]; %// output array
That is, the two NaNs near the beginning have been removed because they exceed the value of thres above, but the third NaN is on its own and can therefore be kept in the data. I would like to develop a method where thres can be set to any value.
I can find the non nan values with
Y = ~isnan(X); %// convert to zeros and ones
Any ideas?

In order to find the longest sequence containing at most threshold NaNs, we must find the start and the end of said sequence(s).
To generate all possible start points, we can use hankel (row i of H holds X(i:end), padded with zeros):
H = hankel(X)
H =
18 3 NaN NaN 8 10 11 NaN 9 14 6 1 4 23 24
3 NaN NaN 8 10 11 NaN 9 14 6 1 4 23 24 0
NaN NaN 8 10 11 NaN 9 14 6 1 4 23 24 0 0
NaN 8 10 11 NaN 9 14 6 1 4 23 24 0 0 0
8 10 11 NaN 9 14 6 1 4 23 24 0 0 0 0
10 11 NaN 9 14 6 1 4 23 24 0 0 0 0 0
11 NaN 9 14 6 1 4 23 24 0 0 0 0 0 0
NaN 9 14 6 1 4 23 24 0 0 0 0 0 0 0
9 14 6 1 4 23 24 0 0 0 0 0 0 0 0
14 6 1 4 23 24 0 0 0 0 0 0 0 0 0
6 1 4 23 24 0 0 0 0 0 0 0 0 0 0
1 4 23 24 0 0 0 0 0 0 0 0 0 0 0
4 23 24 0 0 0 0 0 0 0 0 0 0 0 0
23 24 0 0 0 0 0 0 0 0 0 0 0 0 0
24 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Now we need to find the last valid element in each row.
To do so, we can use cumsum:
C = cumsum(isnan(H),2)
C =
0 0 1 2 2 2 2 3 3 3 3 3 3 3 3
0 1 2 2 2 2 3 3 3 3 3 3 3 3 3
1 2 2 2 2 3 3 3 3 3 3 3 3 3 3
1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The end point for each row is the last position where the corresponding element of C is at most threshold:
threshold = 1;
T = C<=threshold
T =
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
The last valid element is found using (the transpose makes lastone a row vector, matching lengths below):
[~,idx]=sort(T,2);
lastone=idx(:,end)'
lastone =
3 2 1 4 15 15 15 15 15 15 15 15 15 15 15
We must make sure that the actual length of each row is respected, because H is padded with zeros, which are not NaN and would otherwise be counted as valid:
lengths = length(X):-1:1;
real_length = min(lastone,lengths);
[max_length,max_idx] = max(real_length)
max_length =
11
max_idx =
5
In case there are more sequences of equal maximum length, we just take the first and display it:
selected_max_idx = max_idx(1);
H(selected_max_idx, 1:max_length)
ans =
8 10 11 NaN 9 14 6 1 4 23 24
full script
X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24];
H = hankel(X);
C = cumsum(isnan(H),2);
threshold = 1;
T = C<=threshold;
[~,idx]=sort(T,2);
lastone=idx(:,end)';
lengths = length(X):-1:1;
real_length = min(lastone,lengths);
[max_length,max_idx] = max(real_length);
selected_max_idx = max_idx(1);
H(selected_max_idx, 1:max_length)
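Because cumsum is nondecreasing along each row, T is true on a prefix of each row, so the sort step can be replaced by counting the true entries. A sketch of that variant, which also indexes X directly instead of H; it still builds the n-by-n Hankel matrix, so memory grows quadratically with numel(X). Counting also gives 0 for a row whose T is all false (threshold 0 and a row starting with NaN), where idx(:,end) would instead point at the last column:

```matlab
X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24];
threshold = 1;

H = hankel(X);                                 % row i holds X(i:end), zero-padded
T = cumsum(isnan(H), 2) <= threshold;          % true on the valid prefix of each row
lastone = sum(T, 2).';                         % prefix length = number of true entries
real_length = min(lastone, numel(X):-1:1);     % do not count the zero padding
[max_length, max_idx] = max(real_length);      % max returns the first maximum
result = X(max_idx:max_idx+max_length-1);      % row max_idx of H starts at X(max_idx)
```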

Approach 1: convolution
One possible approach is to convolve Y = double(~isnan(X)) with a window of n ones, where n is decreased by one until an acceptable subsequence is found. "Acceptable" means that the subsequence contains at least n-thres ones, that is, the convolution gives at least n-thres.
Y = double(~isnan(X));
for n = numel(Y):-1:1 %// try all possible sequence lengths
w = find(conv(Y,ones(1,n),'valid')>=n-thres); %// is there any acceptable subsequence?
if ~isempty(w)
break
end
end
result = X(w(1):w(1)+n-1); %// first acceptable window if there are several
Approach 2: cumulative sum
Convolving Y with a window of n ones (as in approach 1) is equivalent to computing a cumulative sum of Y and then taking differences with n spacing. This is more efficient in terms of number of operations.
Y = double(~isnan(X));
Z = cumsum(Y);
for n = numel(Y):-1:1
w = find([Z(n) Z(n+1:end)-Z(1:end-n)]>=n-thres);
if ~isempty(w)
break
end
end
result = X(w(1):w(1)+n-1); %// first acceptable window if there are several
Approach 3: 2D convolution
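This essentially computes all iterations of the loop in approach 1 at once. Because conv2 pads Y with zeros, windows that extend past either end of X have to be excluded; otherwise a window hanging off the start of X can be accepted and give w < 1.
Y = double(~isnan(X));
N = numel(Y);
z = conv2(Y, tril(ones(N)));     %// z(n,j) = number of non-NaNs in the length-n window ending at j
z = z(:, 1:N);                   %// drop windows that end after the last element
ok = bsxfun(@ge, z, (1:N).'-thres) & bsxfun(@le, (1:N).', 1:N); %// enough non-NaNs, and window starts at index >= 1
[nn, ww] = find(ok);
[n, ind] = max(nn);
w = ww(ind)-n+1;
result = X(w:w+n-1);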
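Approaches 1 and 2 try window lengths from longest to shortest. A different technique, a two-pointer sliding window, finds the same leftmost longest window (under the same "at most thres NaNs in total" interpretation) in a single pass over X. A minimal sketch; nanMask, left, nanCount, bestLen and bestStart are illustrative names:

```matlab
X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24];
thres = 1;

nanMask = isnan(X);
left = 1;          % left edge of the current window X(left:right)
nanCount = 0;      % number of NaNs inside the current window
bestLen = 0;
bestStart = 1;
for right = 1:numel(X)
    nanCount = nanCount + nanMask(right);
    while nanCount > thres             % shrink from the left until the window is acceptable
        nanCount = nanCount - nanMask(left);
        left = left + 1;
    end
    if right - left + 1 > bestLen      % strictly greater keeps the first longest window
        bestLen = right - left + 1;
        bestStart = left;
    end
end
result = X(bestStart:bestStart+bestLen-1);
```

Each index enters and leaves the window at most once, so the work is linear in numel(X) and no n-by-n matrix is built.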

Let's try my favorite tool: RLE (run-length encoding). MATLAB doesn't have a built-in function for it, so use my "seqle", posted to the MATLAB Central File Exchange. By default, seqle returns the run-length encoding. So:
>> foo = [ nan 1 2 3 nan nan 4 5 6 nan 5 5 5 ];
>> seqle(isnan(foo))
ans =
run: [1 3 2 3 1 3]
val: [1 0 1 0 1 0]
The "run" indicates the length of the current run; "val" indicates the value. In this case, val==1 indicates the value is nan and val==0 indicates numeric values. You can see it'll be relatively easy to extract the longest sequence of "run" values meeting the condition val==0 | run < 2 to get no more than one nan in a row. Then just grab the cumulative indices of that run and that's the subset of foo you want.
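If seqle is not available, the same run and value vectors can be computed with built-in functions. A minimal sketch; runLen and runVal are illustrative names standing in for seqle's run and val fields (naming a variable run would shadow MATLAB's run function):

```matlab
foo = [nan 1 2 3 nan nan 4 5 6 nan 5 5 5];

v = double(isnan(foo));                  % 1 = NaN, 0 = numeric value
edges = [find(diff(v) ~= 0) numel(v)];   % last index of each run
runLen = diff([0 edges]);                % length of each run
runVal = v(edges);                       % value (1 or 0) of each run
```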
EDIT:
Sadly, what's trivial to find by eye may not be so easy to extract via code. I suspect there's a much faster way to use the indices identified by longrun to get the desired subsequence.
>> foo = [ nan 1 2 3 nan nan 4 5 6 nan nan 5 5 nan 5 nan 4 7 4 nan ];
>> sfoo= seqle(isnan(foo))
sfoo =
run: [1 3 2 3 2 2 1 1 1 3 1]
val: [1 0 1 0 1 0 1 0 1 0 1]
>> longrun = sfoo.run<2 |sfoo.val==0;
% longrun identifies which runs might be part of a valid sequence
>> longlong = seqle(longrun)
longlong =
run: [2 1 1 1 6]
val: [1 0 1 0 1]
% longlong identifies the longest sequence of valid runs
>> lfoo = find(sfoo.run<2 |sfoo.val==0);
>> sbar = seqle(lfoo,1);
>> maxind=find(sbar.run==max(sbar.run),1,'first');
>> getlfoo = lfoo( sum(sbar.run(1:(maxind-1)))+1 );
% first value in longrun , which is part of max run
% getbar finds end of run indices
>> getbar = getlfoo:(getlfoo+sbar.run(maxind)-1);
>> getsbar = sfoo.run(getbar);
% retrieve indices of input vector
>> startit = sum(sfoo.run(1:(getbar(1)-1))) +1;
>> endit = startit+ ((sum(sfoo.run(getbar(1):getbar(end ) ) ) ) )-1;
>> therun = foo( startit:endit )
therun =
5 5 NaN 5 NaN 4 7 4 NaN

Hmmm, who doesn't like challenges, my solution is not as good as m.s.'s, but it is an alternative.
X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24]; %// input array
thresh =1;
X(isnan(X))= 0 ; %// replace NaN by 0 (assumes X contains no genuine zeros)
for i = 1:thresh
Y(i,:) = circshift(X',-i); %//circular shift
end
D = X + sum(Y,1);
Discard = find(D==0)+thresh; %//give you the index of the part that needs to be discarded
chunk = find(X==0); %//Segment the Vector into segments delimited by NaNs
seriesOfZero = circshift(chunk',-1)' - chunk;
bigchunk =[1 chunk( find(seriesOfZero ~= 1)) size(X,2)]; %//Convert series of NaNs into 1 chunk
[values,DiscardChunk] = intersect(bigchunk,Discard);
DiscardChunk = sort(DiscardChunk,'descend')
for t = 1:size(DiscardChunk,2)
X(bigchunk(DiscardChunk(t)-1):bigchunk(DiscardChunk(t))) = []; %//Discard the data
end
X(X == 0) = NaN
%//End of Code
Output:
8 10 11 NaN 9 14 6 1 4 23 24
With:
X = [18 3 nan nan nan 8 10 11 nan nan 9 14 6 1 nan nan nan 4 23 24]; %// input array
thresh =2;
Output:
8 10 11 NaN 4 23 24

Matlab: How to find the indices of rows without any zero in a matrix?

How to find the indices of rows without any zero in a matrix?
Example:
A = [
14 0 6 9 8 17
85 14 1 3 0 99
0 0 0 0 0 0
29 4 5 8 7 46
0 0 0 0 0 0
17 0 5 0 0 49
]
The desired result:
V =[4]

Since Adiel did not post an answer, I'll make their comment a community wiki (CW) answer: the command
V = find(all(A,2))
does the job, because all(A,2) processes every row, returning 1 only if all entries in that row are nonzero. Then find returns the indices of the nonzero (true) entries of that vector, which are the desired row numbers.
Similarly, V = find(all(A,1)) returns the indices of the columns without any zero.
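To make the difference between all and any concrete, here is a sketch on the example matrix; allZeroRows is an illustrative contrast, not something the question asked for:

```matlab
A = [14 0 6 9 8 17;
     85 14 1 3 0 99;
     0 0 0 0 0 0;
     29 4 5 8 7 46;
     0 0 0 0 0 0;
     17 0 5 0 0 49];

noZeroRows = all(A, 2);           % 6x1 logical: true only where every entry of the row is nonzero
V = find(noZeroRows);             % row numbers without any zero
allZeroRows = find(~any(A, 2));   % contrast: rows in which every entry is zero
```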

Reshape / Transform an upper triangular matrix in MATLAB

I have an upper triangular matrix (without the diagonal) given by:
M = [0 3 2 2 0 0; 0 0 8 6 3 2; 0 0 0 3 2 1; 0 0 0 0 2 1; 0 0 0 0 0 0]
The resulting matrix should look like this:
R = [0 0 0 0 0 0; 0 2 0 0 0 0; 2 3 1 0 0 0; 2 6 2 1 0 0; 3 8 3 2 0 0]
Since I couldn't find a simple explanation which describes my goal I tried to visualize it with an image:
I already tried lots of different combinations of rot90, transpose, flipud etc., but I couldn't find the right transformation that gives me the matrix R.
EDIT:
The rows of the matrix M are not always sorted as in the example above. For another matrix M_2:
M_2 = [0 2 3 1 0 0; 0 0 3 6 3 9; 0 0 0 1 2 4; 0 0 0 0 2 6; 0 0 0 0 0 0]
the resulting matrix R_2 needs to be the following:
R_2 = [0 0 0 0 0 0; 0 9 0 0 0 0; 1 3 4 0 0 0; 3 6 2 6 0 0; 2 3 1 2 0 0]
Again the visualization below:

EDIT:
Inspired by the tip from @Dan's comment, it can be further simplified to
R = reshape(rot90(M), size(M));
Original Answer:
This should be a simple way to do this
F = rot90(M);
R = F(reshape(1:numel(M), size(M)))
which returns
R =
0 0 0 0 0 0
0 2 0 0 0 0
2 3 1 0 0 0
2 6 2 1 0 0
3 8 3 2 0 0
The idea is that when you rotate the matrix you get
>> F = rot90(M)
F =
0 2 1 1 0
0 3 2 2 0
2 6 3 0 0
2 8 0 0 0
3 0 0 0 0
0 0 0 0 0
which is a 6 by 5 matrix. If you consider the linear indexing over F the corresponding indices are
>> reshape(1:30, size(F))
1 7 13 19 25
2 8 14 20 26
3 9 15 21 27
4 10 16 22 28
5 11 17 23 29
6 12 18 24 30
where elements 6, 11, 12, 16, 17, 18, ... are zero. Now, if you reshape this to a 5 by 6 matrix, you get
>> reshape(1:30, size(M))
1 6 11 16 21 26
2 7 12 17 22 27
3 8 13 18 23 28
4 9 14 19 24 29
5 10 15 20 25 30
Now those elements corresponding to zero values are on top, exactly what we wanted. So by passing this indexing array to F we get the desired R.
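A quick check that the one-liner also handles the unsorted case from the question's EDIT; a sketch using M and M_2 exactly as given there:

```matlab
M   = [0 3 2 2 0 0; 0 0 8 6 3 2; 0 0 0 3 2 1; 0 0 0 0 2 1; 0 0 0 0 0 0];
M_2 = [0 2 3 1 0 0; 0 0 3 6 3 9; 0 0 0 1 2 4; 0 0 0 0 2 6; 0 0 0 0 0 0];

% rot90 turns each 5x6 matrix into a 6x5 one; reshape refills a 5x6 matrix
% column by column, which is the same as F(reshape(1:numel(M), size(M)))
R   = reshape(rot90(M),   size(M));
R_2 = reshape(rot90(M_2), size(M_2));
```

The approach never compares values, only moves positions, so the order of the entries within each row does not matter.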

Without relying on order (just rotating the colored strips and pushing them to the bottom).
First solution: note that it doesn't work if there are zeros between the "data" values (for example, if M(1,3) is 0 in the example given). If there may be such zeros, please see the second solution below:
[nRows nCols]= size(M);
R = [flipud(M(:,2:nCols).') zeros(nRows,1)];
[~, rowSubIndex] = sort(~~R);
index = sub2ind([nRows nCols],rowSubIndex,repmat(1:nCols,nRows,1));
R = R(index);
Second solution: works even if there are zeros within the data:
[nRows nCols]= size(M);
S = [flipud(M(:,2:nCols).') zeros(nRows,1)];
mask = 1 + fliplr(tril(NaN*ones(nRows, nCols)));
S = S .* mask;
[~, rowSubIndex] = sort(~isnan(S));
index = sub2ind([nRows nCols],rowSubIndex,repmat(1:nCols,nRows,1));
R = S(index);
R(isnan(R)) = 0;

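An alternative, using a loop (this assumes, as in the example, that M has one more column than rows):
[nRows nCols]= size(M);
R = zeros(nRows,nCols);
for n = 1:nRows
    % row n of M has nCols-n values right of the diagonal; reversed, they fill rows n..nRows of column n
    R(n:nRows,n)=fliplr(M(n,(n+1):nCols));
end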

Split an array in MATLAB

I have an array of integers, and I want to split it wherever a 0 occurs, using a function that gives me the split points.
Example: Array : 0 0 0 1 2 4 5 6 6 0 0 0 0 0 22 4 5 6 6 0 0 0 4 4 0
The function must return these numbers:
[ 3 10 ;14 20 ;22 25 ]
These numbers are the indices of the start and end of the nonzero numbers.

Here's a simple vectorized solution using the functions DIFF and FIND:
>> array = [0 0 0 1 2 4 5 6 6 0 0 0 0 0 22 4 5 6 6 0 0 0 4 4 0]; %# Sample array
>> edgeArray = diff([0; (array(:) ~= 0); 0]);
>> indices = [find(edgeArray > 0)-1 find(edgeArray < 0)]
indices =
3 10
14 20
22 25
The above code works by first creating a column array with ones indicating non-zero elements, padding this array with zeroes (in case any of the non-zero spans extend to the array edges), and taking the element-wise differences. This gives a vector edgeArray with 1 indicating the start of a non-zero span and -1 indicating the end of a non-zero span. Then the function FIND is used to get the indices of the starts and ends.
One side note/nitpick: these aren't the indices of the starts and ends of the non-zero spans like you say. They are technically the indices just before the starts and just after the ends of the non-zero spans. You may actually want the following instead:
>> indices = [find(edgeArray > 0) find(edgeArray < 0)-1]
indices =
4 9
15 19
23 24
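The question also asks to split the array; once the start and end of each nonzero span are known, the pieces themselves can be collected in a cell array. A sketch using arrayfun, with startIdx, endIdx and pieces as illustrative names:

```matlab
array = [0 0 0 1 2 4 5 6 6 0 0 0 0 0 22 4 5 6 6 0 0 0 4 4 0];

edgeArray = diff([0; (array(:) ~= 0); 0]);   % +1 at a span start, -1 just after a span end
startIdx = find(edgeArray > 0);              % first index of each nonzero span
endIdx = find(edgeArray < 0) - 1;            % last index of each nonzero span
pieces = arrayfun(@(s, e) array(s:e), startIdx, endIdx, 'UniformOutput', false);
```

'UniformOutput', false is needed because the pieces have different lengths.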

Try this
a = [0 0 0 1 2 4 5 6 6 0 0 0 0 0 22 4 5 6 6 0 0 0 4 4 0];
%#Places where value was zero and then became non-zero
logicalOn = a(1:end-1)==0 & a(2:end)~=0;
%#Places where value was non-zero and then became zero
logicalOff = a(1:end-1)~=0 & a(2:end)==0;
%#Build a matrix to store the results
M = zeros(sum(logicalOn),2);
%#Indices where value was zero and then became non-zero
[~,indOn] = find(logicalOn);
%#Indices where value was non-zero and then became zero
[~,indOff] = find(logicalOff);
%#We're looking for the zero AFTER the transition happened
indOff = indOff + 1;
%#Fill the matrix with results
M(:,1) = indOn(:);
M(:,2) = indOff(:);
%#Display result
disp(M);

On the theme, but with a slight variation:
>> a= [0 0 0 1 2 4 5 6 6 0 0 0 0 0 22 4 5 6 6 0 0 0 4 4 0];
>> adjust= [0 1]';
>> tmp= reshape(find([0 diff(a== 0)])', 2, [])
tmp =
4 15 23
10 20 25
>> indices= (tmp- repmat(adjust, 1, size(tmp, 2)))'
indices =
4 9
15 19
23 24
As gnovice already pointed out regarding the positional semantics of the indices, I'll just add that with this solution various indexing schemes can be handled in a very straightforward manner when calculating the indices. Thus, for your request:
>> adjust= [1 0]';
>> tmp= reshape(find([0 diff(a== 0)])', 2, []);
>> indices= (tmp- repmat(adjust, 1, size(tmp, 2)))'
indices =
3 10
14 20
22 25
