Cumulative count of unique element in Matlab array - matlab

Working with Matlab 2019b.
x = [10 10 10 20 20 30]';
How do I get a cumulative count of unique elements in x, which should look like:
y = [1 2 3 1 2 1]';
EDIT:
My real array is actually much longer than the example given above. Below are the methods I tested:
x = randi([1 100], 100000, 1);
x = sort(x);
% method 1: check neighboring values in one loop
tic
y = ones(size(x));
for ii = 2:length(x)
if x(ii) == x(ii-1)
y(ii) = y(ii-1) + 1;
end
end
toc
% method 2 (Wolfie): count occurrence of unique values explicitly
tic
u = unique(x);
y = zeros(size(x));
for ii = 1:numel(u)
idx = (x == u(ii));
y(idx) = 1:nnz(idx);
end
toc
% method 3 (Luis Mendo): triangular matrix
tic
y = sum(triu(x==x'))';
toc
Results:
Method 1: Elapsed time is 0.016847 seconds.
Method 2: Elapsed time is 0.037124 seconds.
Method 3: Elapsed time is 10.350002 seconds.

EDIT:
Assuming that x is sorted:
x = [10 10 10 20 20 30].';
x = sort(x);
d = [1 ;diff(x)];
f = find(d);
d(f) = f;
ic = cummax(d);
y = (2 : numel(x) + 1).' - ic;
When x is unsorted use this:
[s, is] = sort(x);
d = [1 ;diff(s)];
f = find(d);
d(f) = f;
ic = cummax(d);
y(is) = (2 : numel(s) + 1).' - ic;
Original Answer that only works on GNU Octave:
Assuming that x is sorted:
x = [10 10 10 20 20 30].';
x = sort(x);
[~, ic] = cummax(x);
y = (2 : numel(x) + 1).' - ic;
When x is unsorted use this:
[s, is] = sort(x);
[~, ic] = cummax(s);
y(is) = (2 : numel(s) + 1).' - ic;

You could loop over the unique elements, and set their indices to 1:n each time...
u = unique(x);
y = zeros(size(x));
for ii = 1:numel(u)
idx = (x == u(ii));
y(idx) = 1:nnz(idx);
end

This is a little inefficient because it generates an intermediate matrix, when actually only a triangular half is needed:
y = sum(triu(x==x.')).';

Here's a no-for-loop version. On my machine it's a bit faster than the previous working methods:
% if already sorted, can omit this first and last line
[s, is] = sort(x);
[u,~,iu] = unique(s);
c = accumarray(iu,1);
cs = cumsum([0;c]);
z = (1:numel(x))'-repelem(cs(1:end-1),c);
y(is) = z;

Related

How can I do vectorization for this matlab "for loop"?

I have some matlab code as follow, constructing KNN similarity weight matrix.
[D,I] = pdist2(X, X, 'squaredeuclidean', 'Smallest', k+1);
D = D < threshold;
W = zeros(n, n);
for i=1:size(I,2)
W(I(:,i), i) = D(:,i);
W(i, I(:,i)) = D(:,i)';
end
I want to vectorize the for loop. I have tried
W(I) = D;
but failed to get the correct value.
I add test case here:
n = 5;
D = [
1 1 1 1 1
0 1 1 1 1
0 0 0 0 0
];
I = [
1 2 3 4 5
5 4 5 2 3
3 1 1 1 1
];
There are some undefined variables that makes it hard to check what it is doing, but this should do the same as your for loop:
D,I] = pdist2(X, X, 'squaredeuclidean', 'Smallest', k+1);
D = D < threshold;
W = zeros(n);
% set the diagonal values
W(sub2ind(size(X), I(1, :), I(1, :))) = D(1,:);
% set the other values
W(sub2ind(size(W), I(2, :), 1:size(I, 2))) = D(2, :);
W(sub2ind(size(W), 1:size(I, 2), I(2, :))) = D(2, :).';
I splited the directions, it works now with your test case.
A possible solution:
idx1 = reshape(1:n*n,n,n).';
idx2 = bsxfun(#plus,I,0:n:n*size(I,2)-1);
W=zeros(n,n);
W(idx2) = D;
W(idx1(idx2)) = D;
Here assumed that you repeatedly want to compute D and I so compute idx only one time and use it repeatedly.
n = 5;
idx1 = reshape(1:n*n,n,n).';
%for k = 1 : 1000
%[D,I] = pdist2(X, X, 'squaredeuclidean', 'Smallest', k+1);
%D = D < threshold;
idx2 = bsxfun(#plus,I,0:n:n*size(I,2)-1);
W=zeros(n,n);
W(idx2) = D;
W(idx1(idx2)) = D;
%end
But if n isn't constant and it varies in each iteration it is better to change the way idx1 is computed:
n = 5;
%for k = 1 : 1000
%n = randi([2 10]);%n isn't constant
%[D,I] = pdist2(X, X, 'squaredeuclidean', 'Smallest', k+1);
%D = D < threshold;
idx1 = bsxfun(#plus,(0:n:n^2-1).',1:size(I,2));
idx2 = bsxfun(#plus,I,0:n:n*size(I,2)-1);
W=zeros(n,n);
W(idx2) = D;
W(idx1(idx2)) = D;
%end
You can cut some corners with linear indices but if your matrices are big then you should only take the nonzero components of D. Following copies all values of D
W = zeros(n);
W(reshape(sub2ind([n,n],I,[1;1;1]*[1:n]),1,[])) = reshape(D,1,[]);

Matlab: Optimization of reducing vector size

I have developed a function to reduce the size of an initial vector X = [x,y]. But for an X of 500,000 points and points_limit = 10000, Matlab needs 16sec to complete this function.
Are there any ways to optimize this, maybe by removing the loop using matrix operations (vectorisation)?
function X = reduce_vector_size(X,points_limit)
while length(X) > points_limit
k = 1;
X2 = zeros(round(length(X(:,1))/2),2);
X = sortrows(X);
for i=1:2:length(X(:,1))-1
X2(k,1) = mean([X(i,1) ,X(i+1,1) ]);
X2(k,2) = mean([X(i,2) ,X(i+1,2) ]);
k = k + 1;
end
X = X2;
end
An other best idea is to have a new approach :
Ratio = ceil(length(X(:,1))/points_limit);
X = ceil(X);
X = sortrows(X,1);
X = sortrows(X,2);
X1=[];
for i=1:points_limit - 1
X1 = [X1; mean(X(i*Ratio:(i+1)*Ratio,1)), mean(X(i*Ratio:(i+1)*Ratio,2))];
end
X = X1;
The objective is to reduce the number of points in a vector: a form of compression function for a 2D vectors.
Do you know if I can do this new method with loop ?
What do you think about my algorithm of compression ?
you can easily vectorize the inner for loop:
k = 1;
X = rand(5e5,2);
X2 = zeros(round(length(X(:,1))/2),2);
tic
for i=1:2:length(X(:,1))-1
X2(k,1) = mean([X(i,1) ,X(i+1,1) ]);
X2(k,2) = mean([X(i,2) ,X(i+1,2) ]);
k = k + 1;
end
toc % Elapsed time is 1.988739 seconds.
tic
X3 = (X(1:2:length(X(:,1))-1,:) + X(2:2:length(X(:,1)),:))/2;
toc % Elapsed time is 0.014575 seconds.
isequal(X2,X3) % true

Vectorization in Matlab to speed up expensive loop

How can I speed up the following MATLAB code, using vectorization? Right now the single line in the loop is taking hours to run for the case upper = 1e7.
Here is the commented code with sample output:
p = 8;
lower = 1;
upper = 1e1;
n = setdiff(lower:upper,primes(upper)); % contains composite numbers between lower + upper
x = ones(length(n),p); % Preallocated 2-D array of ones
% This loop stores the unique prime factors of each composite
% number from 1 to n, in each row of x. Since the rows will have
% varying lengths, the rows are padded with ones at the end.
for i = 1:length(n)
x(i,:) = [unique(factor(n(i))) ones(1,p-length(unique(factor(n(i)))))];
end
output:
x =
1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1
2 3 1 1 1 1 1 1
2 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1
2 5 1 1 1 1 1 1
For example, the last row contains the prime factors of 10, if we ignore the ones. I have made the matrix 8 columns wide to account for the many prime factors of numbers up to 10 million.
Thanks for any help!
This is not vectorization, but this version of the loop will save about half of the time:
for k = 1:numel(n)
tmp = unique(factor(n(k)));
x(k,1:numel(tmp)) = tmp;
end
Here is a quick benchmark for this:
function t = getPrimeTime
lower = 1;
upper = 2.^(1:8);
t = zeros(numel(upper),2);
for k = 1:numel(upper)
n = setdiff(lower:upper(k),primes(upper(k))); % contains composite numbers between lower to upper
t(k,1) = timeit(#() getPrime1(n));
t(k,2) = timeit(#() getPrime2(n));
disp(k)
end
p = plot(log2(upper),log10(t));
p(1).Marker = 'o';
p(2).Marker = '*';
xlabel('log_2(range of numbers)')
ylabel('log(time (sec))')
legend({'getPrime1','getPrime2'})
end
function x = getPrime1(n) % the originel function
p = 8;
x = ones(length(n),p); % Preallocated 2-D array of ones
for k = 1:length(n)
x(k,:) = [unique(factor(n(k))) ones(1,p-length(unique(factor(n(k)))))];
end
end
function x = getPrime2(n)
p = 8;
x = ones(numel(n),p); % Preallocated 2-D array of ones
for k = 1:numel(n)
tmp = unique(factor(n(k)));
x(k,1:numel(tmp)) = tmp;
end
end
Here's another approach:
p = 8;
lower = 1;
upper = 1e1;
p = 8;
q = primes(upper);
n = setdiff(lower:upper, q);
x = bsxfun(#times, q, ~bsxfun(#mod, n(:), q));
x(~x) = inf;
x = sort(x,2);
x(isinf(x)) = 1;
x = [x ones(size(x,1), p-size(x,2))];
This seems to be faster than the other two options (but is uses more memory). Borrowing EBH's benchmarking code:
function t = getPrimeTime
lower = 1;
upper = 2.^(1:12);
t = zeros(numel(upper),3);
for k = 1:numel(upper)
n = setdiff(lower:upper(k),primes(upper(k)));
t(k,1) = timeit(#() getPrime1(n));
t(k,2) = timeit(#() getPrime2(n));
t(k,3) = timeit(#() getPrime3(n));
disp(k)
end
p = plot(log2(upper),log10(t));
p(1).Marker = 'o';
p(2).Marker = '*';
p(3).Marker = '^';
xlabel('log_2(range of numbers)')
ylabel('log(time (sec))')
legend({'getPrime1','getPrime2','getPrime3'})
grid on
end
function x = getPrime1(n) % the originel function
p = 8;
x = ones(length(n),p); % Preallocated 2-D array of ones
for k = 1:length(n)
x(k,:) = [unique(factor(n(k))) ones(1,p-length(unique(factor(n(k)))))];
end
end
function x = getPrime2(n)
p = 8;
x = ones(numel(n),p); % Preallocated 2-D array of ones
for k = 1:numel(n)
tmp = unique(factor(n(k)));
x(k,1:numel(tmp)) = tmp;
end
end
function x = getPrime3(n) % Approach in this answer
p = 8;
q = primes(max(n));
x = bsxfun(#times, q, ~bsxfun(#mod, n(:), q));
x(~x) = inf;
x = sort(x,2);
x(isinf(x)) = 1;
x = [x ones(size(x,1), p-size(x,2))];
end

How to do a ~= vector operation in matlab

I'm trying to write my own program to sort vectors in matlab. I know you can use the sort(A) on a vector, but I'm trying to code this on my own. My goal is to also sort it in the fewest amount of swaps which is kept track of by the ctr variable. I find and sort the min and max elements first, and then have a loop that looks at the ii minimum vector value and swaps it accordingly.
This is where I start to run into problems, I'm trying to remove all the ii minimum values from my starting vector but I'm not sure how to use the ~= on a vector. Is there a way do this this with a loop? Thanks!
clc;
a = [8 9 13 3 2 8 74 3 1] %random vector, will be function a once I get this to work
[one, len] = size(a);
[mx, posmx] = max(a);
ctr = 0; %counter set to zero to start
%setting min and max at first and last elements
if a(1,len) ~= mx
b = mx;
c = a(1,len);
a(1,len) = b;
a(1,posmx) = c;
ctr = ctr + 1;
end
[mn, posmn] = min(a);
if a(1,1) ~= mn
b = mn;
c = a(1,1);
a(1,1) = b;
a(1,posmn) = c;
ctr = ctr + 1;
end
ii = 2; %starting at 2 since first element already sorted
mini = [mn];
posmini = [];
while a(1,ii) < mx
[mini(ii), posmini(ii - 1)] = min(a(a~=mini))
if a(1,ii) ~= mini(ii)
b = mini(ii)
c = a(1,ii)
a(1,ii) = b
a(1,ii) = c
ctr = ctr + 1;
end
ii = ii + 1;
end

How to vectorize a sum of dot products between a normal and array of points

I need to evaluate following expression (in pseudo-math notation):
∑ipi⋅n
where p is a matrix of three-element vectors and n is a three-element vector. I can do this with for loops as follows but I can't figure out
how to vectorize this:
p = [1 1 1; 2 2 2];
n = [3 3 3];
s = 0;
for i = 1:size(p, 1)
s = s + dot(p(i, :), n)
end
Why complicate things? How about simple matrix multiplication:
s = sum(p * n(:))
where p is assumed to be an M-by-3 matrix.
I think you can do it with bsxfun:
sum(sum(bsxfun(#times,p,n)))
----------
% Is it the same for this case?
----------
n = 200; % depending on the computer it might be
m = 1000*n; % that n needs to be chosen differently
A = randn(n,m);
x = randn(n,1);
p = zeros(m,1);
q = zeros(1,m);
tic;
for i = 1:m
p(i) = sum(x.*A(:,i));
q(i) = sum(x.*A(:,i));
end
time = toc; disp(['time = ',num2str(time)]);