Given a vector A of group numbers (such as the one returned by findgroups), how can I return a vector B of the same length that contains, for each element, its index within its group in A?
For example, if A = [1 1 1 2 2 2 1 1 2 2] then B = [1 2 3 1 2 3 4 5 4 5].
My own solution to this is
s = splitapply(@(x) {x, [1:numel(x)]'}, [1:numel(A)]', A(:))
B(vertcat(s{:,1})) = vertcat(s{:,2})
but it seems somewhat convoluted.
A solution using sort and accumarray:
[s is]=sort(A);
idx = accumarray(s(:),1,[],@(x){1:numel(x)});
B(is)=[idx{:}];
Another solution using the image processing toolbox:
p=regionprops(A,'PixelIdxList');
B = zeros(size(A));
for k = 1: numel(p)
B(p(k).PixelIdxList) = 1:numel(p(k).PixelIdxList);
end
Here's a solution that may not look as compact, but it's quite fast since it uses cumsum and indexing:
mA = max(A);
nA = numel(A);
ind = false(mA, nA);
ind(mA.*(0:(nA-1))+A) = true;
B = cumsum(ind, 2);
B = B(ind).';
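To see why this works, here is the same code traced on the example vector from the question. The intermediate name counts is introduced only for this illustration; the values in the comments follow from the example input.

```matlab
A = [1 1 1 2 2 2 1 1 2 2];
mA = max(A);                       % number of groups (2)
nA = numel(A);                     % number of elements (10)
ind = false(mA, nA);               % one row per group, one column per element
ind(mA.*(0:(nA-1)) + A) = true;    % linear index of entry (A(k), k)
% ind = 1 1 1 0 0 0 1 1 0 0
%       0 0 0 1 1 1 0 0 1 1
counts = cumsum(ind, 2);           % running member count along each group row
% counts = 1 2 3 3 3 3 4 5 5 5
%          0 0 0 1 2 3 3 3 4 5
B = counts(ind).';                 % read each element's count from its own group row
```

Each column of ind holds exactly one true entry, so the logical indexing returns the counts in the original element order. The ind matrix has max(A)*numel(A) entries, so memory use grows with the number of groups as well as with the length of A.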
And here are some timing results for the solutions thus far:
A = [1 1 1 2 2 2 1 1 2 2];
rahnema1: 6.51343e-05
Luis: 3.00891e-05
OmG: 2.36826e-05
gnovice: 4.93539e-06 % <---
A = randi(20, 1, 1000);
rahnema1: 0.000274138
Luis: 0.000257126
OmG: 0.000233348
gnovice: 9.95673e-05 % <---
A = randi(20, 1, 10000);
rahnema1: 0.00162955
Luis: 0.00163943
OmG: 0.00126571
gnovice: 0.00107134 % <---
My solution is the fastest for the above test cases (moderate size, moderate number of unique values). For larger cases, other solutions gain the edge. The solution from rahnema1 seems to do better as the number of unique values in A increases, whereas the basic for loop from OmG does better when the number of elements in A increases (with relatively fewer unique values):
>> A = randi(200, 1, 100000);
rahnema1: 0.0108024 % <---
Luis: 0.0931876
OmG: 0.0427542
gnovice: 0.0815516
>> A = randi(20, 1, 1000000);
rahnema1: 0.131256
Luis: 0.171415
OmG: 0.106548 % <---
gnovice: 0.124446
One solution is to loop over the unique values of A and number the elements of each group:
c = unique(A);
B= A;
for (idx = c)
f = find(A == idx);
B(f) = 1:length(f);
end
Here's a way that avoids loops:
s = bsxfun(@eq, A(:).', unique(A(:))); % Or, in recent versions, s = A==unique(A).';
t = cumsum(s,2);
B = reshape(t(s), size(A));
Related
How can I remove any number that has duplicate from an array.
for example:
b =[ 1 1 2 3 3 5 6]
becomes
b =[ 2 5 6]
Use the unique function to extract the unique values, then compute a histogram of the data over those values and keep only the values whose count is 1.
a =[ 1 1 2 3 3 5 6];
u = unique(a)
idx = hist(a, u) ==1;
b = u(idx)
result
2 5 6
For multi-column input, the same can be done row-wise:
a = [1 2; 1 2;1 3;2 1; 1 3; 3 5 ; 3 6; 5 9; 6 10] ;
[u ,~, uid] = unique(a,'rows');
idx = hist(uid,1:size(u,1))==1;
b= u(idx,:)
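MATLAB's documentation marks hist as not recommended, so here is a minimal sketch of the same counting idea using accumarray on the index output of unique. This is a swapped-in technique, not part of the answer above; the variable names j and counts are chosen for illustration.

```matlab
a = [1 1 2 3 3 5 6];
[u, ~, j] = unique(a);             % j(k) is the position of a(k) within u
counts = accumarray(j(:), 1).';    % number of occurrences of each unique value
b = u(counts == 1);                % keep the values that occur exactly once
```

For the example input this yields b = [2 5 6], the same result as the hist version.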
You can first sort your elements and then remove every element that has the same value as one of its neighbors, as follows:
A_sorted = sort(A); % sort elements
A_diff = diff(A_sorted)~=0; % check whether each element differs from the next one
A_unique = [A_diff true] & [true A_diff]; % check whether each element differs from both the previous and the next one
A = A_sorted(A_unique); % keep only the elements that occur once
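As an illustration, here are the intermediate values of the sort-and-diff approach for the vector from the question. The comments show what each step evaluates to for this particular input.

```matlab
A = [1 1 2 3 3 5 6];
A_sorted = sort(A);                        % [1 1 2 3 3 5 6] (already sorted here)
A_diff = diff(A_sorted) ~= 0;              % [0 1 1 0 1 1]: differs from the next element?
A_unique = [A_diff true] & [true A_diff];  % [0 0 1 0 0 1 1]: differs from both neighbors
A = A_sorted(A_unique);                    % [2 5 6]
```

The padding with true handles the first and last elements, which have only one neighbor each.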
Benchmark
I will benchmark my solution with the other provided solutions, i.e.:
using diff (my solution)
using hist (rahnema1)
using sum (Jean Logeart)
using unique (my alternative solution)
I will use two cases:
small problem (yours): A = [1 1 2 3 3 5 6];
larger problem
rng('default');
A= round(rand(1, 1000) * 300);
Result:
                 | Small      | Large      % Comments
-----------------|------------|------------%----------------
using `diff`     | 6.4080e-06 | 6.2228e-05 % Fastest method for large problems
using `unique`   | 6.1228e-05 | 2.1923e-04 % Good performance
using `sum`      | 5.4352e-06 | 0.0020     % Only fast for small problems, preserves the original order
using `hist`     | 8.4408e-05 | 1.5691e-04 % Good performance
My solution (using diff) is the fastest method for somewhat larger problems. The solution of Jean Logeart using sum is faster for small problems, but the slowest method for larger problems, while mine is almost equally fast for the small problem.
Conclusion: In general, my proposed solution using diff is the fastest method.
timeit(@() usingDiff(A))
timeit(@() usingUnique(A))
timeit(@() usingSum(A))
timeit(@() usingHist(A))
function A = usingDiff (A)
A_sorted = sort(A);
A_unique = [diff(A_sorted)~=0 true] & [true diff(A_sorted)~=0];
A = A_sorted(A_unique);
end
function A = usingUnique (A)
[~, ia1] = unique(A, 'first');
[~, ia2] = unique(A, 'last');
A = A(ia1(ia1 == ia2));
end
function A = usingSum (A)
A = A(sum(A==A') == 1);
end
function A = usingHist (A)
u = unique(A);
A = u(hist(A, u) ==1);
end
I am trying to implement the paper detection of copy move forgery using histogram of oriented gradients.
The algorithm is:
1. Divide the image into overlapping blocks.
2. Calculate a feature vector for each block and store the vectors in a matrix.
3. Sort the matrix lexicographically.
4. Use block matching to identify forged regions.
https://www.researchgate.net/publication/276518650_Detection_of_copy-move_image_forgery_using_histogram_of_orientated_gradients
I am stuck with the 3rd step and can't proceed.
The code I have implemented is:
clc;
clear all;
close all;
%read image
img = imread('006_F.png');
img=rgb2gray(img);
img=imresize(img, 1/4);
figure(1);
imshow(img);
b=16; %block size
nrc=5; %no. of rows to check
td=416; %threshold
[r, c]=size(img);%Rows and columns;
column=(r-b+1)*(c-b+1);
M= zeros(column,4);
Mi = zeros(1,2);
i=1;
disp('starting extraction of features');
for r1 = 1:r-b+1
for c1 = 1:c-b+1
% Extract each block
B = img(r1:r1+b-1,c1:c1+b-1);
features = extractHOGFeatures(B);%extracting features
M(i, :) = features;
Mi(i,:) = [r1 c1];
i=i+1;
end
end
[S, index] = sortrows(M , [ 1 2 3 4]);
P= zeros(1,6);
b2=r-b+1;
disp('Finding Duplicates');
for i = 1:column
iv = index(i);
xi=mod(iv,b2) + 1;
yi=ceil(iv/b2);
j = i+1;
while j < column && abs(i - j) < 5
jv=index(j);
xj=mod(jv,b2) + 1;
yj=ceil(jv/b2);
z=sqrt(power(xi-xj,2) + power(yi-yj,2));
% only process those whose size is above Nd
if z > 16
offset = [xi-xj yi-yj];
P = [P;[xi yi xj yj xi-xj yi-yj]];
end
j = j + 1;
end
end
rows = size(P,1);
P(:,6) = P(:,6) - min(P(:,6));
P(:,5) = P(:,5) - min(P(:,5));
maxValP = max(P(:,6)) + 1;
P(:,5) = maxValP .* P(:,5) + P(:,6);
mostfrequentval = mode(P(:,5));
disp('Creating Image');
idx = 2;
% Create a copy of the image and mask it
RI = img;
while idx < rows
x1 = P(idx,1);
y1 = P(idx,2);
x2 = P(idx,3);
y2 = P(idx,4);
if (P(idx,5) == mostfrequentval)
RI(y1:y1,x1:x1) = 0;
RI(y2:y2,x2:x2) = 0;
end
idx = idx + 1;
end;
After going through some references indicated in the paper you are working on (ref. [8] and [20]):
Lexicographic sorting is the equivalent of alphabetical sorting for numbers, i.e., [1 1 1 1] < [1 1 2 1] < [2 3 4 5] < [2 4 4 5].
So, in your case, you can use the function sortrows() in the following way:
A = [1 1 1 1;1 1 1 2;1 1 1 4;1 2 2 2; 1 2 2 1; 1 4 6 3; 2 3 4 5; 2 3 6 6]; % sample matrix
[B,idx] = sortrows(A,[1 2 3 4]); % Explicit notation but it is the Matlab default setting so equivalent to sortrows(A)
It means: Sort the rows of A by first looking at the first column and, in case of equality, looking at the second one, and so on.
If you are looking for reverse (descending) order, put a '-' before the column number, e.g. sortrows(A,[-1 2]).
So in the end, your code is good and if the results are not as expected it has to come from another step of the implementation...
Edit: the output idx records the original row index of each sorted row, i.e. B is the same as A(idx,:).
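A short sketch of both points on the sample matrix from this answer: the index output of sortrows, and descending order via a negative column number. The names sameRows and D are introduced only for this illustration.

```matlab
A = [1 1 1 1; 1 1 1 2; 1 1 1 4; 1 2 2 2; 1 2 2 1; 1 4 6 3; 2 3 4 5; 2 3 6 6];
[B, idx] = sortrows(A);             % ascending lexicographic order over all columns
sameRows = isequal(B, A(idx, :));   % idx maps each sorted row to its original position
D = sortrows(A, [-1 2]);            % column 1 descending, ties broken by column 2 ascending
```

Here only rows 4 and 5 of A change places. Applied to the question's code, this suggests that Mi(index(k), :) already holds the [r1 c1] position of the k-th sorted block, since M and Mi are filled with the same row counter.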
I have a matrix A of size 16x16x155460 and a vector B of size 12955x1. I want to multiply each block A(1:16, 1:16, 1+12*n : 12+12*n) by the corresponding element B(n), so my goal is to find the weighted sum of A according to B. My attempt is below (I don't want to use a for loop). It gives the wrong answer, because I could not get each weight to apply to 12 consecutive slices:
B = repmat(B,[1 16 16]);
B = permute(B,[2 3 1]);
B = repmat(B,[1 1 12]);
result = B.*(A);
As a small example n=2 :
A(:,:,1)=[1 2; 3 4]
A(:,:,2)=[1 2; 3 4]
A(:,:,3)=[1 2; 3 4]
A(:,:,4)=[1 2; 3 4]
B = [2,3]
Result would be:
result(:,:,1)=A(:,:,1)*B(1);
result(:,:,2)=A(:,:,2)*B(1);
result(:,:,3)=A(:,:,1)*B(2);
result(:,:,4)=A(:,:,2)*B(2);
If I understood the problem correctly, you can use the powerful trio of bsxfun, permute and reshape to solve it, like so -
[M,N,R] = size(A);
mult_out = bsxfun(@times,reshape(A,M,N,numel(B),[]),permute(B(:),[4 3 1 2]))
out = reshape(mult_out,M,N,[])
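If the intent is instead that each weight applies to a run of consecutive slices (as the 1+12*n:12+12*n range in the question suggests), a variant of the same reshape idea might look like the sketch below. This is an assumption about the intended layout, not a correction confirmed by the asker; blk, A4 and w are illustrative names, and the small input is made up so that each weight covers 2 slices rather than 12.

```matlab
A = repmat([1 2; 3 4], [1 1 4]);        % 4 identical 2x2 slices
B = [2 3];                              % one weight per block of consecutive slices
[M, N, R] = size(A);
blk = R / numel(B);                     % slices per weight (12 in the question)
A4 = reshape(A, M, N, blk, numel(B));   % 4th dimension indexes the block
w = reshape(B, 1, 1, 1, []);            % weights laid out along the 4th dimension
out = reshape(bsxfun(@times, A4, w), M, N, R);
```

With this layout, slices 1 and 2 are scaled by B(1) and slices 3 and 4 by B(2).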
I have a vector a=[1 2 3 1 4 2 5]'
I am trying to create a new vector that gives, for each row, the occurrence number of that element in a. For instance, with this vector, the result would be [1 1 1 2 1 2 1]': the fourth element is 2 because this is the first time that 1 is repeated.
The only way I can see to achieve that is by creating a zero vector with one row per unique element (here: c = [0 0 0 0 0], because I have 5 unique elements).
I also create a zero vector d of the same length as a. Then, going through the vector a, adding one to the row of c whose element we read and the corresponding number of c to the current row of d.
Can anyone think about something better?
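For reference, the counter-based approach described above can be sketched as follows. Using the third output of unique to map each element to its counter is an assumption made here; the question does not say how the lookup is done.

```matlab
a = [1 2 3 1 4 2 5]';
[~, ~, k] = unique(a);        % k(i) is the position of a(i) among the unique values
c = zeros(max(k), 1);         % one counter per unique value
d = zeros(size(a));           % occurrence number of each element
for i = 1:numel(a)
    c(k(i)) = c(k(i)) + 1;    % count this occurrence
    d(i) = c(k(i));           % record the running count at this position
end
```

This runs in a single pass over a and needs only one counter per unique value.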
This is a nice way of doing it
C=sum(triu(bsxfun(@eq,a,a.')))
My first suggestion was this not very nice for loop:
for i=1:length(a)
F(i)=sum(a(1:i)==a(i));
end
This does what you want, without loops:
m = max(a);
aux = cumsum([ ones(1,m); bsxfun(@eq, a(:), 1:m) ]);
aux = (aux-1).*diff([ ones(1,m); aux ]);
result = sum(aux(2:end,:).');
My first thought:
M = cumsum(bsxfun(@eq,a,1:numel(a)));
v = M(sub2ind(size(M),1:numel(a),a'))
On a completely different level, you can look into tabulate to get information about the frequency of the values. For example:
tabulate([1 2 4 4 3 4])
Value Count Percent
1 1 16.67%
2 1 16.67%
3 1 16.67%
4 3 50.00%
Please note that the solutions proposed by David, chappjc and Luis Mendo are beautiful but cannot be used if the vector is big. In this case a couple of naïve approaches are:
% Big vector
a = randi(1e4, [1e5, 1]);
a1 = a;
a2 = a;
% Super-naive solution
tic
x = sort(a);
x = x([find(diff(x)); end]);
for hh = 1:size(x, 1)
inds = (a == x(hh));
a1(inds) = 1:sum(inds);
end
toc
% Other naive solution
tic
x = sort(a);
y(:, 1) = x([find(diff(x)); end]);
y(:, 2) = histc(x, y(:, 1));
for hh = 1:size(y, 1)
a2(a == y(hh, 1)) = 1:y(hh, 2);
end
toc
% The two solutions are of course equivalent:
all(a1(:) == a2(:))
Actually, now the question is: can we avoid the last loop? Maybe using arrayfun?
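One possible way to avoid the loop, sketched here under the assumption that sort is stable (which MATLAB documents for its sort function), is to sort once and subtract the start position of each run of equal values. The names starts, pos, firstPos and grp are illustrative and do not come from the answers above.

```matlab
a = [1 2 3 1 4 2 5]';              % small example; a large randi vector works the same way
[s, is] = sort(a(:).');            % equal values keep their original relative order
starts = [true, diff(s) ~= 0];     % true where a new run of equal values begins
pos = 1:numel(s);                  % position of each element in the sorted vector
firstPos = pos(starts);            % start position of each run
grp = cumsum(starts);              % run number of each sorted element
B = zeros(size(a));
B(is) = pos - firstPos(grp) + 1;   % rank within the run, written back in original order
```

Memory use stays proportional to numel(a), unlike the bsxfun-based answers, which build a matrix with one row or column per distinct value.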
I know this is a simple question, but it is difficult to formulate in one sentence to google the answer. So, I have a 3D matrix of size 2x2x3 like this
A(:,:,1) =[1 1; 1 1];
A(:,:,2) =[2 2; 2 2];
A(:,:,3) =[4 4; 4 4];
and matrix B with size 2x2
B = [ 1 2; 2 3];
What I need is to choose, for each position, just one number along the third dimension of A, using matrix B:
for i=1:2,
for j=1:2,
C(i,j) = A(i,j,B(i,j));
end
end
How can I do that in one line, without a loop?
Not really a single line, but without a loop:
[I J] = ind2sub (size(B), 1:numel(B));
linInd = sub2ind (size (A), I, J, B(:)');
C = reshape (A(linInd), size(B));
Here is another variation:
[r,c,~] = size(A);
[J,I] = meshgrid(1:size(B,2), 1:size(B,1)); % J: column indices, I: row indices (also valid for non-square B)
idx = reshape(I(:) + r*(J(:)-1) + r*c*(B(:)-1), size(B));
C = A(idx)
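A more compact sketch along the same lines, assuming the first two dimensions of A match the size of B, builds the row and column subscripts with ndgrid and passes B directly as the third subscript:

```matlab
A = cat(3, [1 1; 1 1], [2 2; 2 2], [4 4; 4 4]);   % the 2x2x3 example from the question
B = [1 2; 2 3];
[I, J] = ndgrid(1:size(B, 1), 1:size(B, 2));      % row and column subscripts, same size as B
C = A(sub2ind(size(A), I, J, B));                 % C(i,j) = A(i,j,B(i,j))
```

Because I, J and B all have the same size, C comes out with the shape of B and no reshape is needed.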