MATLAB: return every second occurrence of a value in a vector

I have a vector of ID numbers, each of which repeats an even number of times. I am only interested in every second time each number appears, so I want to create a boolean mask that gives true/1 for every second occurrence of a number. I have already done this with a loop, but the actual vector will contain millions of elements, so the loop is too slow. I need a "vectorized" solution.
Here is an example vector:
101
102
103
101
104
102
101
103
101
104
This should output the following mask:
0 (first occurrence of 101)
0 (first occurrence of 102)
0 (first occurrence of 103)
1 (second occurrence of 101)
0 (first occurrence of 104)
1 (second occurrence of 102)
0 (third occurrence of 101)
1 (second occurrence of 103)
1 (fourth occurrence of 101)
1 (second occurrence of 104)

You can do this very easily with a combination of unique and accumarray. First assign each value a unique ID, then bin all array locations together that are part of the same ID. You'll need to sort them as accumarray doesn't guarantee an order when you are binning things together. The output of this will be a cell array where each cell gives you the array locations that occurred for a particular index.
Once you do this, extract out every second element from each cell generated from accumarray, then use these to set all of the corresponding locations in a mask to 1. You can use a combination of cellfun, which can be used to process each cell individually and extracting every second element to create a new cell array, and vertcat which can be used to stack all of the cell arrays together into one final index array. This index array can be used to accomplish setting the locations in your mask to true:
%// Your data
V = [101,102,103,101,104,102,101,103,101,104];
%// Get list of unique IDs
[~,~,id] = unique(V,'stable');
%// Bin all of the locations in V together that belong to the
%// same bin
out = accumarray(id, (1:numel(V)).',[], @(x) {sort(x)}); %'
%// Extract out every second value that is for each bin
out2 = cellfun(@(x) x(2:2:end), out, 'uni', 0);
%// Create a mask and set the corresponding locations to true
mask = false(numel(V), 1);
mask(vertcat(out2{:})) = 1;
We get:
>> mask
mask =
0
0
0
1
0
1
0
1
1
1

Let's bsxfun it for a vectorized solution -
%// Assuming A as the input vector
M = bsxfun(@eq,A(:),unique(A(:).')) %//'
out = any(M - mod(cumsum(M,1).*M,2),2)
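
One thing to keep in mind with the bsxfun version is that M has one column per unique ID, so with millions of elements and many distinct IDs it may not fit in memory. A possible lower-memory sketch is shown here. It is an alternative technique, not the method of the answer above: it sorts the vector and numbers the occurrences inside each run of equal values. It relies on sort keeping equal elements in their original order, and all variable names are my own.

```matlab
% Example data from the question, as a column vector
V = [101,102,103,101,104,102,101,103,101,104].';

% Sort the IDs; equal IDs stay in order of appearance
[sortedV, order] = sort(V);

% Mark the first element of each run of equal IDs
isStart = [true; diff(sortedV) ~= 0];
runStartIdx = find(isStart);      % position where each run begins
runId = cumsum(isStart);          % run number of every sorted element

% Occurrence number of each element within its own run (1, 2, 3, ...)
occ = (1:numel(V)).' - runStartIdx(runId) + 1;

% Even occurrence numbers are the "every second" occurrences;
% scatter the result back to the original positions
mask = false(numel(V), 1);
mask(order) = mod(occ, 2) == 0;
```

For the example vector this should reproduce the mask 0 0 0 1 0 1 0 1 1 1 shown in the question.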

Here is one approach:
A = [101,102,103,101,104,102,101,103,101,104];
IDs = unique(A); % Find all the IDs present
test = arrayfun(@(x) find(A==x), IDs, 'UniformOutput', false); % Per ID, find where A == ID
repeatidx = cellfun(@(x) x(2:2:length(x)), test, 'UniformOutput', false); % Keep every second match
repeatidx = cell2mat(repeatidx); % Flatten the cell array
B = false(size(A)); % Initialize output boolean array
B(repeatidx) = true; % Set true values based on repeatidx
Which returns:
B =
0 0 0 1 0 1 0 1 1 1

Related

Any way for MATLAB to sum an array according to specified bins without a for loop? Ideally with a built-in function

For example, if
A = [7,8,1,1,2,2,2]; % the bins (or subscripts)
B = [2,1,1,1,1,1,2]; % the array
then the desired function "binsum" has two outputs, one is the bins, and the other is the sum. It is just adding values in B according to subscripts in A. For example, for 2, the sum is 1 + 1 + 2 = 4, for 1 it is 1 + 1 = 2.
[bins, sums] = binsum(A,B);
bins = [1,2,7,8]
sums = [2,4,2,1]
The elements in "bins" need not be ordered but must correspond to the elements in "sums". This can surely be done with a "for" loop, but a loop is not desired because of performance concerns. A built-in function would be best.
Thanks a lot!
This is another job for accumarray
A = [7,8,1,1,2,2,2]; % the bins (or subscripts)
B = [2,1,1,1,1,1,2]; % the array
sums = accumarray(A.', B.').';
bins = unique(A);
Results:
>> bins
bins =
1 2 7 8
sums =
2 4 0 0 0 0 2 1
The index in sums corresponds to the bin value, so sums(2) = 4. You can use nonzeros to remove the unused bins so that bins(n) corresponds to sums(n). Note that nonzeros would also drop a used bin whose values happen to sum to zero.
sums = nonzeros(sums).';
sums =
2 4 2 1
or, to generate this form of sums in one line:
sums = nonzeros(accumarray(A.', B.')).';
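
Because nonzeros cannot tell an unused bin from a used bin whose values sum to zero, a variant that first maps the bins to consecutive labels may be safer. The following is a minimal sketch under that assumption; it combines unique and accumarray, and the second data set (A2, B2) is made up purely to show the zero-sum case.

```matlab
A = [7,8,1,1,2,2,2];   % the bins (or subscripts)
B = [2,1,1,1,1,1,2];   % the array

% Map each bin value to a label 1..numel(bins), then sum per label
[bins, ~, labels] = unique(A);
sums = accumarray(labels(:), B(:)).';

% Hypothetical data where bin 1 sums to zero: the bin is still reported
A2 = [1,1,3];
B2 = [2,-2,5];
[bins2, ~, labels2] = unique(A2);
sums2 = accumarray(labels2(:), B2(:)).';
```

With this form bins(n) always lines up with sums(n), and the bin values no longer need to be positive integers.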
Another possibility is to use sparse and then find.
Assuming A contains positive integers,
[bins, ~, sums] = find(sparse(A, 1, B));
This works because sparse automatically adds values (third input) for matching positions (as defined by the first two inputs).
If A can contain arbitrary values, you also need a call to unique, and find can be replaced by nonzeros:
[bins, ~, labels]= unique(A);
sums = nonzeros(sparse(labels, 1, B));
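
The accumulation behaviour of sparse that this answer relies on can be seen in a tiny example. This is only an illustration with made-up numbers, not part of the original answer:

```matlab
% Two values are given for position (1,1) and one for position (2,1)
rows = [1 1 2];
cols = [1 1 1];
vals = [10 20 5];

% sparse adds the values that share the same (row, col) position
S = sparse(rows, cols, vals);
total = full(S);   % 2-by-1 column: [30; 5]
```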
Here is a solution using sort and cumsum:
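[s,I]=sort(A); % sort the bins; I maps sorted positions back to A
c=cumsum(B(I)); % running total of B in sorted-bin order
k= [s(1:end-1)~=s(2:end) true]; % marks the last element of each run of equal bins
sums = diff([0 c(k)]) % difference of the run-end totals = sum per bin
bins = s(k)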

Remove zero columns and rows from a matrix in MATLAB

I would like to remove some columns and rows from a big matrix: those in which every value is zero. Is there a function in MATLAB that can do this reasonably fast? My matrices are sparse. This is how I do it at the moment:
% To remove all zero columns from A
ind = find(sum(A,1)==0) ;
A(:,ind) = [] ;
% To remove all zeros rows from A
ind = find(sum(A,2)==0) ;
A(ind,:) = [] ;
It would be nice to have a line of code for this as I may do this kind of task repeatedly. Thanks
A single line of code would be:
A=A(any(A,2),any(A,1))
There is no need to use find like you did, you can directly index using logical vectors. The any function finds the rows or columns with any non-zero elements.
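
A small check may make the difference from the sum-based test in the question clearer. The matrix below is made up for illustration; its first row contains a +1 and a -1, so its sum is zero even though the row is not empty.

```matlab
% Hypothetical matrix: row 2 and column 3 are entirely zero,
% row 1 sums to zero but still has non-zero entries
A = [1 -1 0;
     0  0 0;
     2  3 0];

keepRows = any(A, 2);          % rows with at least one non-zero entry
keepCols = any(A, 1);          % columns with at least one non-zero entry
B = A(keepRows, keepCols);     % B is [1 -1; 2 3]

% The sum-based test would also flag row 1 for removal
flaggedBySum = (sum(A, 2) == 0);
```

Here keepRows keeps row 1, while flaggedBySum marks it as a zero row, which is why any is the safer test when entries can be negative.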
1 Dimension:
I'll first show a simpler example, based on another duplicate question, which asks how to remove only the rows consisting entirely of zeros.
Given the matrix A=[1,2;0,0];
To remove the rows of zeros, you can:
Sum the absolute values of each row (to avoid getting a zero sum from a mix of negative and positive numbers), which gives you a column vector of the row sums.
Keep the index of each row where the sum is non-zero.
in code:
A=[1,2;0,0];
% sum each row of the matrix, then find rows with non-zero sum
idx_nonzerolines = sum(abs(A),2)>0 ;
% Create matrix B containing only the non-zero lines of A
B = A(idx_nonzerolines,:) ;
will output:
>> idx_nonzerolines = sum(abs(A),2)>0
idx_nonzerolines =
1
0
>> B = A(idx_nonzerolines,:)
B =
1 2
2 Dimensions:
The same method can be used for 2 dimensions:
A=[ 1,2,0,4;
0,0,0,0;
1,3,0,5];
idx2keep_columns = sum(abs(A),1)>0 ;
idx2keep_rows = sum(abs(A),2)>0 ;
B = A(idx2keep_rows,idx2keep_columns) ;
outputs:
>> B = A(idx2keep_rows,idx2keep_columns)
B =
1 2 4
1 3 5
Thanks to @Adriaan in comments for spotting the edge case ;)

Condition execute for different columns in each row

A = [0,0,1,0,1,0,1,0,0,0;
0,0,0,0,1,0,1,0,0,0;
0,0,1,0,1,0,1,0,0,0];
B = [2,5;
1,6;
3,10];
Expected Output Cell Array:
C = {[1,1,1,1]; %// 2-3-4, 3-4-5, 4-5, 5
[0,0,1,1,1,0]; %// 1-2-3, 2-3-4, 3-4-5, 4-5-6, 5-6, 6
[1,1,1,1,1,0,0,0]}; %// 3-4-5, 4-5-6, 5-6-7, 6-7-8, 7-8-9, 8-9-10, 9-10, 10
Matrix B specifies which columns of matrix A the condition should be evaluated on. For example, the first row of B is 2 and 5, so the elements between the 2nd and 5th columns of A are used. The second row of B is 1 and 6, so the elements between the 1st and 6th columns are used. And so on...
The condition: if sum of successive 3 elements is bigger than or equal to 1 then write 1 to matrix C; otherwise write 0. For example, A includes 0,1,0 as three successive elements (sum is 0+1+0=1), so write 1 to matrix C. Another example, first three elements of A in second row are 0,0,0 (sum is 0), so write 0 to matrix C. And so on...
"Sometimes it can be considered only 1 or 2 successive elements."
For example, condition execution of first row of A ends with 5th column, so only value 5th column should be considered; which is 1. So 1 is written to matrix C.
Explaining the first row of C:
1, since (sum of 2,3,4 elements of A(1,:)) >= 1
1, since (sum of 3,4,5 elements of A(1,:)) >= 1
since max limit is 5, only 2 successive elements are taken here
1, since (sum of 4,5 elements alone of A(1,:)) >= 1
since max limit is 5, only 1 successive element is taken here
1, since (sum of 5th element alone of A(1,:)) >= 1
How can I do this complex task without a for loop, using only matrix operations? Or is there any trick?
Using mat2cell, cellfun, im2col and any
subMatLen = 3;
%// Converting both A & B matrix to Cell Arrays to perform operations Row-wise
AC = mat2cell(A,ones(1,size(A,1)),size(A,2));
BC = mat2cell(B,ones(1,size(B,1)),size(B,2));
%// Getting only the columns of each rows within the limits specified by Matrix B
%// Also appended with zeros for my own convenience as it won't affect the 'summing' process
out = cellfun(@(x,y) [x(y(1):y(2)),zeros(1,subMatLen-1)],AC, BC, 'uni', 0);
%// Finally taking each 1x3 sliding sub-matrix and returning 1 if `any` of it is non-zero
%// which is equivalent to summing and checking whether they are >= 1
out = cellfun(@(x) any(im2col(x, [1,subMatLen], 'sliding')), out, 'uni', 0);
Your Sample Input:
A = [0,0,1,0,1,0,1,0,0,0;
0,0,0,0,1,0,1,0,0,0;
0,0,1,0,1,0,1,0,0,0];
B = [2,5;
1,6;
3,10];
Output:
>> celldisp(out)
out{1} =
1 1 1 1
out{2} =
0 0 1 1 1 0
out{3} =
1 1 1 1 1 0 0 0
If you want them as a single row or column matrix, you could add this to the bottom of the code:
out = cat(2,out{:})
or
out = (cat(2,out{:})).'
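
Since im2col belongs to the Image Processing Toolbox, a toolbox-free variant may be useful. The sketch below swaps in conv for the sliding window, which is a different technique from the answer above; the helper name rowFun and the variable winLen are my own, and the zero padding plays the same role as in the original code.

```matlab
A = [0,0,1,0,1,0,1,0,0,0;
     0,0,0,0,1,0,1,0,0,0;
     0,0,1,0,1,0,1,0,0,0];
B = [2,5;
     1,6;
     3,10];
winLen = 3;   % number of successive elements to sum

% For row r: cut out the columns B(r,1)..B(r,2), pad with winLen-1 zeros,
% sum every window of winLen elements with conv, and test the sum against 1
rowFun = @(r) conv([A(r, B(r,1):B(r,2)), zeros(1, winLen-1)], ...
                   ones(1, winLen), 'valid') >= 1;

% Apply the helper to every row; rows differ in length, so collect in a cell
out = arrayfun(rowFun, (1:size(A,1)).', 'UniformOutput', false);
```

For the sample input this should give the same three rows as the celldisp output above.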

How to compare columns of a binary matrix and compare elements in matlab?

I have a [sentences*words] matrix as shown below:
out = 0 1 1 0 1
1 1 0 0 1
1 0 1 1 0
0 0 0 1 0
I want to process this matrix so that it reports, for example, that W1 and W2 occur with the same value in sentence 2 and sentence 4, i.e. 1 1 and 0 0. The output should be as follows:
output{1,2}= 2 4
output{1,2} says that words 1 and 2 occur with the same value in sentences 2 and 4.
After comparing W1 and W2, the next candidate is W1 and W3, which occur with the same value in sentences 3 and 4:
output{1,3}= 3 4
and so on, until every word has been compared with every other word and the result saved.
This would be one vectorized approach -
%// Get number of columns in input array for later usage
N = size(out,2);
%// Get indices for pairwise combinations between columns of input array
[idx2,idx1] = find(bsxfun(@gt,[1:N]',[1:N])); %//'
%// Get indices for matches between columns idx1 and idx2 of out. The row indices
%// represent the occurrence values for the final output and the column indices
%// select the pair, i.e. the position in the final output.
[R,C] = find(out(:,idx1) == out(:,idx2))
%// Form cells off each unique C (these will be final output values)
output_vals = accumarray(C(:),R(:),[],#(x) {x})
%// Setup output cell array
output = cell(N,N)
%// Indices for places in output cell array where occurrence values are to be put
all_idx = sub2ind(size(output),idx1,idx2)
%// Finally store the output values at appropriate indices
output(all_idx(1:max(C))) = output_vals
You can get a logical matrix of size #words-by-#words-by-#sentences easily using bsxfun:
coc = bsxfun( @eq, permute( out, [3 2 1]), permute( out, [2 3 1] ) );
In this logical array, coc( wi, wj, si ) is true iff word wi and word wj occur in sentence si with the same value.
To get the output cell array from coc you need
nw = size( out, 2 ); %// number of words
output = cell(nw,nw);
for wi = 1:(nw-1)
for wj = (wi+1):nw
output{wi,wj} = find( coc(wi,wj,:) );
output{wj,wi} = output{wi,wj}; %// you can force it to be symmetric if you want
end
end
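
For comparison, here is a compact sketch that enumerates the word pairs with nchoosek and fills only the upper triangle of the output. It is not either of the approaches above, just one possible variant; the names pairs and matches are my own.

```matlab
out = [0 1 1 0 1;
       1 1 0 0 1;
       1 0 1 1 0;
       0 0 0 1 0];
nw = size(out, 2);                 % number of words

% All pairs (wi, wj) with wi < wj, one pair per row
pairs = nchoosek(1:nw, 2);

% For each pair, list the sentences in which both words have the same value
matches = arrayfun(@(k) find(out(:, pairs(k,1)) == out(:, pairs(k,2))), ...
                   (1:size(pairs,1)).', 'UniformOutput', false);

% Store each list at position (wi, wj) of the output cell array
output = cell(nw, nw);
output(sub2ind([nw nw], pairs(:,1), pairs(:,2))) = matches;
```

With the example matrix, output{1,2} should contain 2 and 4, and output{1,3} should contain 3 and 4, as requested in the question.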

Matlab: Getting Random values from each column w/o zeros

I have a 2d matrix as follows:
possibleDirections =
1 1 1 1 0
0 0 2 2 0
3 3 0 0 0
0 4 0 4 4
5 5 5 5 5
From every column I need to pick one of its non-zero values at random and collect the picks into a vector. The value 5 will always exist, so there won't be any columns with all zeros.
Any ideas how this can be achieved with the use of operations on the vectors (w/o treating each column separately)?
An example result would be [1 1 1 1 5]
Thanks
You can do this without looping, whether directly or via arrayfun.
[rowCount,colCount] = size(possibleDirections);
nonZeroCount = sum(possibleDirections ~= 0);
index = round(rand(1,colCount) .* nonZeroCount +0.5);
[nonZeroIndices,~] = find(possibleDirections);
index(2:end) = index(2:end) + cumsum(nonZeroCount(1:end-1));
result = possibleDirections(nonZeroIndices(index)+(0:rowCount:(rowCount*colCount-1))');
Alternative solution:
[r,c] = size(possibleDirections);
[notUsed, idx] = max(rand(r, c).*(possibleDirections>0), [], 1);
val = possibleDirections(idx+(0:c-1)*r);
If the elements in the matrix possibleDirections are always either zero or equal to the respective row number like in the example given in the question, the last line is not necessary as the solution would already be idx.
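
The reason the max trick picks fairly is that every non-zero entry gets its own independent uniform random score, while zero entries get a score of exactly 0; since rand returns values strictly between 0 and 1, a zero entry can never win. The sketch below is a rough empirical check of that reasoning on a single column. The number of trials and all variable names are assumptions of mine, and the counts will vary from run to run.

```matlab
% Assumed test column: column 3 of the example matrix (non-zero values 1, 2, 5)
col = [1; 2; 0; 0; 5];
r = numel(col);
nTrials = 20000;                   % hypothetical number of repetitions

% One random score per entry and trial; zero entries get score 0
scores = bsxfun(@times, rand(r, nTrials), col > 0);
[~, idx] = max(scores, [], 1);     % row of the largest score in each trial
picks = col(idx);                  % picked value for each trial (column vector)

% Count how often each value was picked (position in counts = value)
counts = accumarray(picks, 1, [r 1]).';
disp(counts)
```

The counts at positions 1, 2 and 5 should come out roughly equal, and positions 3 and 4 should stay at zero.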
And a (rather funny) one-liner:
result = imag(max(1e05+rand(size(possibleDirections)).*(possibleDirections>0) + 1i*possibleDirections, [], 1));
Note, however, that this one-liner only works if the values in possibleDirections are much smaller than 1e5.
Try this code with two arrayfun calls:
nc = size(possibleDirections,2); %# number of columns
idx = possibleDirections ~=0; %# non-zero values
%# indices of non-zero values for each column (cell array)
tmp = arrayfun(@(x)find(idx(:,x)),1:nc,'UniformOutput',0);
s = sum(idx); %# number of non-zeros in each column
%# for each column get random index and extract the value
result = arrayfun(@(x) tmp{x}(randi(s(x),1)), 1:nc);