Count length and frequency of island of consecutive numbers

I have a sequence of ones and zeros and I would like to count how often islands of consecutive ones appear.
Given:
S = [1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 1]
By counting the islands of consecutive ones I mean this:
R = [4 3 1]
…because there are four single ones, three double ones and a single triplet of ones.
Multiplying these counts by the island lengths [1 2 3] gives:
[4 3 1] * [1 2 3]' = 13
This equals sum(S), because there are thirteen ones.
I hope to vectorize the solution rather than loop something.
I came up with something like:
R = histcounts(diff( [0 (find( ~ (S > 0) ) ) numel(S)+1] ))
But the result does not make much sense. It counts too many triplets.
All the code I find online revolves around diff([0 something numel(S)]), but the questions are always slightly different and don't really help me.
Thankful for any advice!

The following should do it. Hopefully the comments are clear.
S = [1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 1];
% use diff to find the rising and falling edges, padding the start and end with 0
edges = diff([0,S,0]);
% get a list of the rising edges
rising = find(edges==1);
% and falling edges
falling = find(edges==-1);
% and thereby get the lengths of all the runs
SRuns = falling - rising;
% The longest run
maxRun = max(SRuns);
% Finally make a histogram, putting the bin centres at 1:maxRun
R = hist(SRuns,1:maxRun);
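The histcounts attempt from the question can be repaired along similar lines. The gap between two neighbouring zeros is one larger than the run of ones between them, and adjacent zeros produce empty runs that have to be dropped. A minimal sketch, where the names zeroPos, gaps and runs are illustrative and accumarray is swapped in for the histogram call:

```matlab
S = [1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 1];
% Positions of the zeros, with virtual zeros before the start and after the end
zeroPos = [0, find(S == 0), numel(S) + 1];
% Number of ones strictly between neighbouring zeros
gaps = diff(zeroPos) - 1;
% Adjacent zeros give empty runs; keep only the real islands
runs = gaps(gaps > 0);
% Tally how often each run length occurs (index = run length)
R = accumarray(runs(:), 1).';
```

For the S above this should give R = [4 3 1], and R * (1:numel(R))' matches sum(S).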

You could also obtain the same result with:
x = find(S==1)-(1:sum(S)) % assign one shared value to each group of ones
h = histc(x,x) % compute the length of each group; histc(x,unique(x)) also works
r = histc(h,1:max(h)) % count the occurrences of each length
Result:
r =
4,3,1
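The trick works because subtracting the running count 1:sum(S) from the positions of the ones leaves a value that stays constant within an island and jumps between islands. The sketch below shows those intermediate values and tallies them with unique and accumarray instead of histc, which the current MATLAB documentation no longer recommends. The variable names pos and group are illustrative.

```matlab
S = [1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 1];
pos = find(S == 1);            % 1 2 5 6 7 9 12 16 17 20 23 24 26
x = pos - (1:numel(pos));      % 0 0 2 2 2 3 5 8 8 10 12 12 13
[~, ~, group] = unique(x);     % island number for every one
h = accumarray(group(:), 1).'; % island lengths: 2 3 1 1 2 1 2 1
r = accumarray(h(:), 1).';     % number of islands of each length: 4 3 1
```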

Related

Replace repeated value based on sequence size - Matlab

I have a 2D matrix composed of ones and zeros.
mat = [0 0 0 0 1 1 1 0 0
1 1 1 1 1 0 0 1 0
0 0 1 0 1 1 0 0 1];
I need to find all consecutive repetitions of ones in each row and replace all ones with zeros only when the sequence size is smaller than 5 (5 consecutive ones):
mat = [0 0 0 0 0 0 0 0 0
1 1 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0];
Any suggestion on how to approach this problem would be very welcome.

You can use diff to find the start and end points of the runs of 1, and some logic based on that to zero out the runs which are too short. Please see the code below, with associated comments:
% Input matrix of 0s and 1s
mat = [0 0 0 0 1 1 1 0 0
1 1 1 1 1 0 0 1 0
0 0 1 0 1 1 0 0 1];
% Minimum run length of 1s to keep
N = 5;
% Get the start and end points of the runs of 1. Add in values from the
% original matrix to ensure that start and end points are always paired
d = [mat(:,1),diff(mat,1,2),-mat(:,end)];
% Find those start and end points. Use the transpose during the find to
% flip rows/cols and search row-wise relative to input matrix.
[cs,r] = find(d.'>0.5); % Start points
[ce,~] = find(d.'<-0.5); % End points
c = [cs, ce]; % Column number array for start/end
idx = diff(c,1,2) < N; % From column number, check run length vs N
% Loop over the runs which didn't satisfy the threshold and zero them
for ii = find(idx.')
    mat(r(ii),c(ii,1):c(ii,2)-1) = 0;
end
If you want to throw legibility out of the window, this can be condensed for a slightly faster and denser version, based on the exact same logic:
[c,r] = find([mat(:,1),diff(mat,1,2),-mat(:,end)].'); % find run start/end points
for ii = 1:2:numel(c) % Loop over runs
    if c(ii+1)-c(ii) < N % Check if run is shorter than the threshold length
        mat(r(ii),c(ii):c(ii+1)-1) = 0; % Zero the run if so
    end
end

The vectorized solution by @Wolfie is nice and concise, but a bit hard to understand and far from the wording of the problem. Here is a direct translation of the problem using loops. It is easier to follow, slightly faster, and makes fewer memory allocations, so it should also cope with huge inputs.
[m,n] = size(mat);
for i = 1:m
    j = 1;
    while j <= n
        seqSum = 1;
        if mat(i,j) == 1
            for k = j+1:n
                if mat(i,k) == 1
                    seqSum = seqSum + 1;
                else
                    break
                end
            end
            if seqSum < 5
                mat(i,j:j+seqSum-1) = 0;
            end
        end
        j = j + seqSum;
    end
end
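If a loop-free version is preferred, one possible sketch pads each row with a trailing zero so that runs cannot spill into the next row, flattens the matrix row by row, and labels every run with cumsum. The names P, v, labels and keepRun are illustrative and not taken from the answers above.

```matlab
mat = [0 0 0 0 1 1 1 0 0
       1 1 1 1 1 0 0 1 0
       0 0 1 0 1 1 0 0 1];
N = 5;                         % minimum run length to keep
[m, n] = size(mat);
% Append a zero column so runs cannot continue across row boundaries,
% then read the padded matrix row by row into one long row vector
P = [mat, zeros(m, 1)].';
v = P(:).';
d = diff([0, v]);              % +1 at a run start, -1 just after a run end
starts = find(d == 1);
stops  = find(d == -1);        % first index after each run
runLen = stops - starts;
% Label each element with the number of the run it belongs to (0 for zeros)
labels = cumsum(d == 1) .* v;
keepRun = runLen >= N;
% Every one inherits the keep/discard decision of its run
v(v == 1) = keepRun(labels(v == 1));
out = reshape(v, n + 1, m).';
out = out(:, 1:n);             % drop the padding column again
```

With the matrix from the question, only the five ones at the start of the second row should survive in out.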

Count the number of the first zero elements

I would like to find the number of consecutive zero elements at the start of a vector. For example, in [0 0 1 -5 3 0] there are two consecutive zeros that appear first in the vector.
Could you please suggest a way without using for loops?

V=[0 0 1 -5 3 0] ;
k=find(V);
Number_of_first_zeros=k(1)-1;
Or,
Number_of_first_zeros=find(V,1,'first')-1;
To address @The minion's comment (if that was the purpose), count from the first zero instead of from the start of the vector:
Number_of_first_zeros=find(V(find(~V,1,'first'):end),1,'first')-1;

Use a logical array to find the zeros and then look at where the zeros and ones are alternating.
V=[1 2 0 0 0 3 5123];
diff(V==0)
ans =
0 1 0 0 -1 0
Create sample data
V=[1 2 0 0 0 3 5123];
Find the zeros. The result will be a logical array where 1 represents the location of the zeros
D=V==0
D =
0 0 1 1 1 0 0
Take the difference of that array. 1 would then represent the start and -1 would represent the end.
T = diff(D)
T =
0 1 0 0 -1 0
find(T==1) gives the index just before each run of zeros starts, and find(T==-1) gives the index at which each run ends. So the first run of zeros starts at find(T==1,1)+1 and ends at find(T==-1,1).
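The edge detection above misses a run of zeros that touches the start or the end of the vector, because there is no transition to detect there. A small sketch that pads the logical mask with zeros on both sides avoids this; the names firstStart, firstStop and runLength are illustrative. With the padding, the rising edge falls directly on the first zero of a run, so no +1 is needed.

```matlab
V = [1 2 0 0 0 3 5123];
% Pad the mask so that runs touching either end still produce
% a rising and a falling edge
T = diff([0, V == 0, 0]);
firstStart = find(T == 1, 1);        % index of the first zero in the first run
firstStop  = find(T == -1, 1) - 1;   % index of the last zero in that run
runLength  = firstStop - firstStart + 1;
```

If V contains no zeros at all, the three results are empty.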

You could find the position of the first nonzero element using find.
I=find(A, 1);
The number of leading zeros is then I-1.
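One edge case is worth guarding against: find returns an empty result when the vector contains no nonzero element, so subtracting 1 would give an empty answer rather than a count. A minimal sketch, with firstNonzero and numLeadingZeros as illustrative names:

```matlab
V = [0 0 1 -5 3 0];
firstNonzero = find(V, 1);
if isempty(firstNonzero)
    numLeadingZeros = numel(V);          % every element is zero
else
    numLeadingZeros = firstNonzero - 1;  % zeros before the first nonzero
end
```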

My solution is quite complex yet it doesn't use the loops and it does the trick. I am pretty sure, that there is a more direct approach.
Just in case no one else posts a working solution, here is my idea.
x=[1 2 4 0 20 0 10 1 23 45];
x1=find(x==0);
if numel(x1)>1
x2=[x1(2:end), 0];
x3=x2-x1;
y=find(x3~=1);
y(1)
elseif numel(x1)==1
display(1)
else
display('No zero found')
end
x is the dataset. x1 contains the indices of all zero elements. x2 contains all those indices except the first one (because matrix dimensions must agree, one zero is appended). x3 is the difference between consecutive zero indices in your dataset. I then find all the differences that are not 1 (i.e. that do not continue a sequence of zeros), and the first such index is the required result. The if statement is needed in case there is only one zero or none at all.

I'm assuming your question is the following: for the following vector [0 0 1 -5 3 0], I would like to find the index of the first element of a pair of 0 values. Is this correct? Therefore, the desired output for your vector would be '1'?
To extend the other answers to find any such pair, not just 0 0 (e.g. 0 1, 0 2, 3 4), this might help.
% define the pattern
ptrn = [ 0 0 ];
difference = ptrn(2) - ptrn(1)
V = [0 0 1 -5 3 0 0 2 3 4 0 0 1 0 0 0]
x = diff(V) == difference
indices = find(x)
indices =
1 6 11 14 15
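Note that comparing differences matches every neighbouring pair with the same step, not only the exact values in ptrn. If only exact matches are wanted, comparing both elements directly is one option. The sketch below contrasts the two for the pattern [3 4]; diffBased and exactMatch are illustrative names.

```matlab
V = [0 0 1 -5 3 0 0 2 3 4 0 0 1 0 0 0];
ptrn = [3 4];
% Difference-based test: finds every pair that increases by 1
diffBased = find(diff(V) == ptrn(2) - ptrn(1));                  % 2 8 9 12
% Value-based test: finds only the pair 3 followed by 4
exactMatch = find(V(1:end-1) == ptrn(1) & V(2:end) == ptrn(2));  % 9
```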

How to replace non-zero elements randomly with zero?

I have a matrix including 1 and 0 elements like below which is used as a network adjacency matrix.
A =
0 1 1 1
1 1 0 1
1 1 0 1
1 1 1 0
I want to simulate an attack on the network, so I must replace some specific percent of 1 elements randomly with 0. How can I do this in MATLAB?
I know how to replace a percentage of elements randomly with zeros, but I must be sure that the element that is replaced randomly, is one of the 1 elements of matrix not zeros.

If you want to change each 1 with a certain probability:
p = 0.1; % desired probability of change
A_ones = find(A); % linear index of ones in A
A_ones_change = A_ones(rand(size(A_ones))<=p); % entries to be changed
A(A_ones_change) = 0; % apply changes in those entries
If you want to randomly change a fixed fraction of the 1 entries:
f = 0.1; % desired fraction
A_ones = find(A);
n = round(f*length(A_ones));
A_ones_change = randsample(A_ones,n);
A(A_ones_change) = 0;
Note that in this case the resulting fraction may differ slightly from the intended one, because the number of entries has to be rounded to an integer.
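If randsample is not available (it belongs to the Statistics Toolbox), the fixed-fraction variant can be sketched with randperm instead. The fraction 0.25 is only an example value, and onesIdx, k, pick and B are illustrative names.

```matlab
A = [0 1 1 1
     1 1 0 1
     1 1 0 1
     1 1 1 0];
f = 0.25;                          % fraction of ones to remove (example value)
onesIdx = find(A);                 % linear indices of the nonzero entries
k = round(f * numel(onesIdx));     % number of entries to zero out
% Draw k distinct positions among the ones, without replacement
pick = onesIdx(randperm(numel(onesIdx), k));
B = A;
B(pick) = 0;
```

Because pick is drawn only from onesIdx, existing zeros are never selected, and exactly k ones are removed.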

@horchler's point is a good one. However, if we keep it simple, you can just multiply your input matrix by a mask matrix.
>> a1=randi([0 1],5,5) % before replacing 1->0
a1 =
1 1 1 0 1
0 1 1 1 0
0 1 0 0 1
0 0 1 0 1
1 0 1 0 1
>> a2=random('unif',0,1,5,5) % assuming the frequency distribution is uniform ('unif')
a2 =
0.7889 0.3200 0.2679 0.8392 0.6299
0.4387 0.9601 0.4399 0.6288 0.3705
0.4983 0.7266 0.9334 0.1338 0.5751
0.2140 0.4120 0.6833 0.2071 0.4514
0.6435 0.7446 0.2126 0.6072 0.0439
>> a1.*(a2>0.1) % and the replacement probability is 0.1
ans =
1 1 1 0 1
0 1 1 1 0
0 1 0 0 1
0 0 1 0 1
1 0 1 0 0
Other tricks can be built into the mask matrix (a2), such as a different frequency distribution or a structure (e.g. once a cell is replaced, the adjacent cells become less likely to be replaced, and so on).
Cheers.

The function find is your friend:
indices = find(A);
This returns an array with the indices of the 1 elements in your matrix A. You can apply your method of picking a percentage of elements to this array to obtain a subset of the indices. Then,
A(subsetIndices) = 0;
will set the selected elements of A to zero.

Efficient ways to count a streak of consecutive integers in MATLAB

Say I have a vector containing only logical values, such as
V = [1 0 1 0 1 1 1 1 0 0]
I would like to write a function in MATLAB which returns a 'streak' vector S for V, where S(i) represents the number of consecutive 1s in V up to but not including V(i). For the example above, the streak vector would be
S = [0 1 0 1 0 1 2 3 4 0]
Given that I have to do this for a very large matrix, I would very much appreciate any solution that is vectorized / efficient.

This should do the trick:
S = zeros(size(V));
for i=2:length(V)
    if(V(i-1)==1)
        S(i) = 1 + S(i-1);
    end
end
The complexity is only O(n), which I guess should be good enough.
For your sample input:
V = [1 0 1 0 1 1 1 1 0 0];
S = zeros(size(V));
for i=2:length(V)
    if(V(i-1)==1)
        S(i) = 1 + S(i-1);
    end
end
display(V);
display(S);
The result would be:
V =
1 0 1 0 1 1 1 1 0 0
S =
0 1 0 1 0 1 2 3 4 0

You could also do it completely vectorized with a couple intermediate steps:
V = [1 0 1 0 1 1 1 1 0 0];
Sall = cumsum(V);
stopidx = find(diff(V)==-1)+1;
V2=V;
V2(stopidx) = -Sall(stopidx)+[0 Sall(stopidx(1:end-1))];
S2 = cumsum(V2);
S = [0 S2(1:end-1)];
Afaik the only thing that can take a while is the find call; you can't use logical indexing everywhere and bypass the find call, because you need the absolute indices.
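An alternative that avoids find altogether is to track the index of the most recent zero with cummax and subtract it from the current index. This is only a sketch: it assumes V is a row vector and a MATLAB release that provides cummax (R2014b or later), and the names idx, lastZero and runSoFar are illustrative.

```matlab
V = [1 0 1 0 1 1 1 1 0 0];
idx = 1:numel(V);
% Index of the most recent zero at or before each position (0 if none yet)
lastZero = cummax(idx .* (V == 0));
% Length of the run of ones ending at each position
runSoFar = idx - lastZero;
% Shift right by one so that S(i) excludes V(i) itself
S = [0, runSoFar(1:end-1)];
```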

It's outside the box - but have you considered using text functions? Since strings are just vectors for Matlab it should be easy to use them.
Regexp contains some nice functions for finding repeated values.
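As a rough illustration of the text-based idea, the vector can be turned into a string of '0' and '1' characters and the runs of ones located with regexp. This sketch only recovers the start, end and length of each run, not the full streak vector asked for; str, startIdx, endIdx and runLengths are illustrative names.

```matlab
V = [1 0 1 0 1 1 1 1 0 0];
str = char(V + '0');                     % '1010111100'
% Start and end index of every maximal run of ones
[startIdx, endIdx] = regexp(str, '1+');
runLengths = endIdx - startIdx + 1;      % 1 1 4
```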

find non-overlapping sequences of zeros in matlab arrays

This is related to:
Finding islands of zeros in a sequence.
However, the problem is not exactly the same:
Let's take the same vector as in the above post for the purpose of comparison:
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
What I am trying to find are the starting indices of islands of n consecutive zeros; however, overlapping is not allowed. For example for n=2, I want the result:
v=[3, 5, 14, 25];
I found the solution of Amro brilliant as a starting point (especially with regards to strfind), but the second part of his answer does not give me the result that I expect. This is a non-vectorized solution that I have so far:
function v=findIslands(sig, n)
% Finds indices of unique islands
% sig --> target vector
% n --> This is the length of the island
% This will find the starting indices for all "islands" of zeros
% but it marks long strings multiple times
startIndex = strfind(sig, zeros(1,n));
L=length(startIndex);
% ongoing gap counter
spc=0;
if L>0 % Check if empty
    v=startIndex(1);
    for i=2:L
        % Count the distance
        spc=spc+(startIndex(i)-startIndex(i-1));
        if spc>=n
            v=[v,startIndex(i)];
            % Reset odometer
            spc=0;
        end
    end
else
    v=[];
    display('No Islands Found!')
end
I was wondering if someone has a faster vectorized solution to the above problem.

You can convert everything into strings and use regular expressions:
regexp(sprintf('%d', sig(:)), sprintf('%d', zeros(n, 1)))
Example
>> sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
>> n = 2;
>> regexp(sprintf('%d', sig(:)), sprintf('%d', zeros(n, 1)))
ans =
3 5 14 25

Do this. As an example, let's look at the case where the run length you want is 2.
1. Convert the vector to a binary number n.
2. Set index = size-1, set starting = [].
3. Loop until n < 4:
   Is n divisible by 4?
   Yes? Append index to starting. Set n = n / 4.
   No? Set n = n / 2.
4. Goto 3.
For any other run length replace 4 with 2**run.

Use gnovice's answer from the same linked question. It's vectorized, and the runs where duration == n are the ones you want.
https://stackoverflow.com/a/3274416/105904
Take the runs with duration >= n, and then divide duration by n, and that'll tell you how many consecutive runs you have at each position and how to expand the index list. This could end up faster than the regexp version, if your island density isn't too high.
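The idea in this answer can be sketched as follows. This is not the code from the linked answer; it is an illustration that finds the runs of zeros with diff and then expands each run into floor(duration/n) non-overlapping start indices using repelem (available from R2015a). The names runStart, runStop, runLen, k, base and offset are illustrative, and sig is assumed to be a row vector.

```matlab
sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
n = 2;
% Pad with ones so that zero runs touching either end are closed off
d = diff([1, sig, 1]);
runStart = find(d == -1);      % first index of every run of zeros
runStop  = find(d == 1);       % index just after every run of zeros
runLen   = runStop - runStart;
k = floor(runLen / n);         % non-overlapping islands that fit in each run
% Repeat each run start once per island, then step through the run in strides of n
base   = repelem(runStart, k);
offset = (1:sum(k)) - 1 - repelem(cumsum(k) - k, k);
v = base + offset * n;
```

For n = 2 this should reproduce v = [3 5 14 25] from the question.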
