Count number of bouts separated by zeros - matlab

I have a vector like this:
A = [1 2 1 1 1 4 5 0 0 1 2 0 2 3 2 2 2 0 0 0 0 33]
I would like to count how many GROUPS of non zero elements it contains and save them.
so I want to isolate:
[1 2 1 1 1 4 5]
[1 2]
[2 3 2 2 2]
[33]
and then count the groups (they should be 4) :)
Can you help me please?
Thanks

To count your groups, a fast vectorized method using logical indexing is:
count = sum(diff([A 0]==0)==1)
This assumes that A is a row vector as in your example. This works with no zeros, all zeros, the empty vector, and several other test cases I tried.
To obtain your groups of values themselves, you can use a variation to my answer to a similar question:
a0 = (A~=0);
d = diff(a0);
start = find([a0(1) d]==1) % Start index of each group
len = find([d -a0(end)]==-1)-start+1 % Length, number of indexes in each group
In your case it might make sense to replace len with
finish = find([d -a0(end)]==-1) % Last index of each group
The length of start, len, and finish should be the same as the value of count so you could just use this if you need to do the breaking up. You can then use start and len (or finish) to store your groups in a cell array or struct or some other ragged array. For example:
count = length(start);
B = cell(count,1);
for i = 1:count
B{i} = A(start(i):finish(i));
end

Related

MATLAB: How to count number frequency columnwise, in continuous blocks

I have a matrix a in Matlab that looks like the following:
a = zeros(10,3);
a(3:6,1)=2; a(5:9,3)=1; a(5:7,2)=3; a(8:10,1)=2;
a =
0 0 0
0 0 0
2 0 0
2 0 0
2 3 1
2 3 1
0 3 1
2 0 1
2 0 1
2 0 0
I would like to obtain a cell array with the number of times that each number appears in a column. Also, it should be ordered depending on the element value, regardless of the column number. In the example above I would like to obtain the cell:
b = {[5],[4,3],[3]}
Because the number 1 appears once for 5 times, the number 2 twice in blocks of 4 and 3, and the number 3 once for 3 times. As you can see the recurrences are ordered according to the element value and not to the number of the column where the elements appear.
Since you're not concerned with the column, you can string all the columns into a single column vector, padding with zeroes on either end to prevent spans at the start and end of columns from running together:
v = reshape(padarray(a, [1 0]), [], 1);
% Or if you don't have the Image Processing Toolbox function padarray...
v = reshape([zeros(1, size(a, 2)); a; zeros(1, size(a, 2))], [], 1);
Now, assuming spans are always separated by 1 or more zeroes, you can find the length of each span as follows:
endPoints = find(diff(v) ~= 0); % Find where transitions to or from 0 occur
spans = endPoints(2:2:end)-endPoints(1:2:end); % Index of transitions to 0 minus
% index of transitions from 0
And finally you can accumulate the spans based on the value present in those spans:
b = accumarray(v(endPoints(1:2:end)+1), spans, [], #(v) {v(:).'}).';
And for your example:
b =
1×3 cell array
[5] [1×2 double] [3]
Note:
The ordering of values in the resulting cell array is not guaranteed to match the order in spans (i.e. b{2} above is [3 4] instead of [4 3]). If order matters, you'll need to sort your subscripts as per this section of the documentation. Here's how you would change the computation of b:
[vals, index] = sort(v(endPoints(1:2:end)+1));
b = accumarray(vals, spans(index), [], #(v) {v(:).'}).';
The hard part is finding and separating the blocks. diff will find the starting point of any run of numbers, which is the starting point for this solution:
b = [zeros(1,size(a,2)); a; zeros(1,size(a,2))];
idx = diff(b)~=0;
block_values = b(idx);
block_lengths = diff([0; find(idx)]);
Now we have two vectors of the values of each block, and how long they are, and they just need to be captured in the cell array, ignoring the zero blocks
c = accumarray(block_values(block_values~=0), block_lengths(block_values~=0), [], #(x) {x}).';
b = {}
for i = 1:ncolumns
for n = 1:nnumbers
b{i}(n) = sum(a(:,i) == n)
end
end
(Note that this places zeros for numbers at which the count is 0, but otherwise I don't see how else you would be able to recognize which value is being counted)

Comparing Vectors of Different Length

I am trying to compare two vectors of different size. For instance when I run the code below:
A = [1 4 3 7 9];
B = [1 2 3 4 5 6 7 8 9];
myPadded = [A zeros(1,4)];
C = ismember(myPadded,B)
I get the following output:
C = 1 1 1 1 1 0 0 0 0
However, I want an output that will reflect the positions of the compared values, hence, I would like an output that is displayed as follows:
C = 1 0 1 1 0 0 1 0 1
Please, I need some help :)
There are 2 points. First, you are writing the inputs of ismember in the wrong order. Additionally, you do not need to grow your matrix. Simply try ismember(B, A) and you will get what you expect.
The function ismember(myPadded, B) returns a vector the same size of myPadded, indicating if the i-th element of myPadded is present in B.
To get what you want, just invert parameter order: ismember(B, myPadded).
A quick way of doing this is to use logical indexing. This will only work if the last digit of B is included in A.
A = [1 4 3 7 9];
c(A) = 1; % or true.
An assumption here is that you want to subindex a vector 1:N, so that B always is B = 1:N. In case the last digit is not one this is easy to fix. Just remember to return all to its previous state after you are done. It will be 2 rows extra though.
This solution is meant as a special case working on a very common problem.

MATLAB: Creating a matrix with all possible group combinations

I'm running an experiment with lots of conditions, and particular numbers of groups in each condition.
A. 3 groups
B. 3 groups
C. 2 groups
D. 3 groups
E. 3 groups
I've worked out that there are 3×3×2×3×3 = 162 possible combinations of groups.
I want to create a MATLAB matrix with 162 rows and 5 columns. That is, one row for each combination and one column to indicate the value for each group.
So, for instance, the first row would be [1 1 1 1 1], indicating that this combination is group 1 for all conditions. The second row would be [1 1 1 1 2], indicating that it's group 1 for all conditions except for the last which is group 2. The 162nd and final row would be [3 3 2 3 3].
M = 1 1 1 1 1
1 1 1 1 2
.........
3 3 2 3 3
What's the most efficient way to achieve this? I realise I could use a loop, but feel sure there's a better way. I thought maybe the perms function would work but I can't see how.
You can use combvec (see last line, the rest is only generating test data):
% A. 3 groups
% B. 3 groups
% C. 2 groups
% D. 3 groups
% E. 3 groups
ngroups = zeros(5, 1);
ngroups(1) = 3;
ngroups(2) = 3;
ngroups(3) = 2;
ngroups(4) = 3;
ngroups(5) = 3;
v = {};
for i = 1:length(ngroups)
v{i} = 1:ngroups(i) % generate a vector of valid group indices
end
% get all possible combinations
x = combvec( v{:} )
As this will return a 5 x 162 double you need to transpose the resulting matrix x:
x.'

Filtering sequences in MATLAB

Is it possible to do something like regular expressions with MATLAB to filter things out? Basically I'm looking for something that will let me take a vector like:
[1 2 1 1 1 2 1 3 3 3 3 1 1 4 4 4 1 1]
and will return:
[3 3 3 3 4 4 4]
These are the uninterrupted sequences (where there's no interspersion).
Is this possible?
Using regular expressions
Use MATLAB's built-in regexp function for regular expression matching. However, you have to convert the input array to a string first, and only then feed it to regexp:
C = regexp(sprintf('%d ', x), '(.+ )(\1)+', 'match')
Note that I separated the values with spaces so that regexp can match multiple digit numbers as well. Then convert the result back to a numerical array:
res = str2num([C{:}])
The dot (.) in the pattern string represents any character. To find sequences of certain digits only, specify them in brackets ([]). For instance, the pattern to find only sequences of 3 and 4 would be:
([34]+ )(\1)+
A simpler approach
You can filter out successively repeating values by checking the similarity between adjacent elements using diff:
res = x((diff([NaN; x(:)])' == 0) | (diff([x(:); NaN])' == 0))
Optionally, you can keep only certain values from the result, for example:
res(res == 3 | res == 4)
You can do it like this:
v=[1 2 1 1 1 2 1 3 3 3 3 1 1 4 4 4 1 1];
vals=unique(v); % find all unique values in the vector
mask=[]; % mask of values to retain
for i=1:length(vals)
indices=find(v==vals(i)); % find indices of each unique value
% if the maximum difference between indices containing
% a given value is 1, it is contiguous
% --> add this value to the mask
if max(indices(2:end)-indices(1:end-1))==1
mask=[mask vals(i)];
end
end
% filter out what's necessary
vproc=v(ismember(v,mask))
Result:
vproc =
3 3 3 3 4 4 4
This can be another approach, although a little bit too elaborated.
If you see the plot of your array, you want to retain its level sets (i.e. a == const) which are topologically connected (i.e. made by one piece).
Coherently, such level sets are exactly the ones corresponding to a==3 and a==4.
Here is a possible implementation
a = [1 2 1 1 1 2 1 3 3 3 3 1 1 4 4 4 1 1]
r = []; % result
b = a; % b will contain the union of the level sets not parsed yet
while ~isempty(b)
m = a == b(1); % m is the current level set
connected = sum(diff([0 m 0]).^2) == 2; % a condition for being a connected set:
% the derivative must have 1 positive and 1
% negative jump
if connected == true % if the level set is connected we add it to the result
r = [r a(m)];
end
b = b(b~=b(1));
end
If you try something like
a = [1 2 1 1 1 2 1 3 3 3 3 1 1 4 4 4 1 1] %initial vector
b = a>=3 %apply filter condition
a = a(b) %keep values that satisfy filter
a will output
a = [3 3 3 3 4 4 4]

MATLAB - Efficient methods for populating matrices using information in other (sparse) matrices?

Apologies for the awkward title, here is a more specific description of the problem. I have a large (e.g. 10^6 x 10^6) sparse symmetric matrix which defines bonds between nodes.
e.g. The matrix A = [0 1 0 0 0; 1 0 0 2 3; 0 0 0 4 0; 0 2 4 0 5; 0 3 0 5 0] would describe a 5-node system, such that nodes 1 and 2 are connected by bond number A(1,2) = 1, nodes 3 and 4 are connected by bond number A(3,4) = 4, etc.
I want to form two new matrices. The first, B, would list the nodes connected to each node (i.e. each row i of B has elements given by find(A(i,:)), and padded with zeros at the end if necessary) and the second, C, would list the bonds connected to that node (i.e. each row i of C has elements given by nonzeros(A(i,:)), again padded if necessary).
e.g. for the matrix A above, I would want to form B = [2 0 0; 1 4 5; 4 0 0; 2 3 5; 2 4 0] and C = [1 0 0; 1 2 3; 4 0 0; 2 4 5; 3 5 0]
The current code is:
B=zeros(length(A), max(sum(spones(A))))
C=zeros(length(A), max(sum(spones(A))))
for i=1:length(A)
B(i,1:length(find(A(i,:)))) = find(A(i,:));
C(i,1:length(nonzeros(A(i,:)))) = nonzeros(A(i,:));
end
which works, but is slow for large length(A). I have tried other formulations, but they all include for loops and don't give much improvement.
How do I do this without looping through the rows?
Hmm. Not sure how to vectorize (find returns linear indices when given a matrix, which is not what you want), but have you tried this:
B=zeros(length(A), 0);
C=zeros(length(A), 0);
for i=1:length(A)
Bi = find(A(i,:));
B(i,1:length(Bi)) = Bi;
Ci = nonzeros(A(i,:));
C(i,1:length(Ci)) = Ci;
end
I made two changes:
removed call to spones (seems unnecessary; the performance hit needed to expand the # of columns in B and C is probably minimal)
cached result of find() and nonzeros() so they're not called twice
I know it's hard to read, but that code is a vectorized version of your code:
[ i j k ] = find(A);
A2=(A~=0);
j2=nonzeros(cumsum(A2,2).*A2);
C2=accumarray([i,j2],k)
k2=nonzeros(bsxfun(#times,1:size(A,2),A2));
B2=accumarray([i,j2],k2);
Try it and tell me if it works for you.