I have cell array A of dimension m * k.
I want to keep the rows of A unique up to an order of the k cells.
The "tricky" part is "up to an order of the k cells": consider the k cells in the ith row of A, A(i,:); there could be a row j of A, A(j,:), that is equivalent to A(i,:) up to a re-ordering of its k cells, meaning that for example if k=4it could be that:
A{i,1}=A{j,2}
A{i,2}=A{j,3}
A{i,3}=A{j,1}
A{i,4}=A{j,4}
What I am doing at the moment is:
G=[0 -1 1; 0 -1 2; 0 -1 3; 0 -1 4; 0 -1 5; 1 -1 6; 1 0 6; 1 1 6; 2 -1 6; 2 0 6; 2 1 6; 3 -1 6; 3 0 6; 3 1 6];
h=7;
M=reshape(G(nchoosek(1:size(G,1),h),:),[],h,size(G,2));
A=cell(size(M,1),2);
for p=1:size(M,1)
A{p,1}=squeeze(M(p,:,:));
left=~ismember(G, A{p,1}, 'rows');
A{p,2}=G(left,:);
end
%To find equivalent rows up to order I use a double loop (VERY slow).
indices=[];
for j=1:size(A,1)
if ismember(j,indices)==0 %if we have not already identified j as a duplicate
for i=1:size(A,1)
if i~=j
if (isequal(A{j,1},A{i,1}) || isequal(A{j,1},A{i,2}))...
&&...
(isequal(A{j,2},A{i,1}) || isequal(A{j,2},A{i,2}))...
indices=[indices;i];
end
end
end
end
end
A(indices,:)=[];
It works but it is too slow. I am hoping that there is something quicker that I can use.
I'd like to propose another idea, which has some conceptual resemblance to erfan's. My idea uses hash functions, and specifically, the GetMD5 FEX submission.
The main task is how to "reduce" each row in A to a single representative value (such as a character vector) and then find unique entries of this vector.
Judging by the benchmark vs. the other suggestions, my answer doesn't perform as well as one of the alternatives, but I think its raison d'ĂȘtre lies in the fact that it is completely data-type agnostic (within the limitations of the GetMD51), that the algorithm is very straightforward to understand, it's a drop-in replacement as it operates on A, and that the resulting array is exactly equal to the one obtained by the original method. Of course this requires a compiler to get working and has a risk of hash collisions (which might affect the result in VERY VERY rare cases).
Here are the results from a typical run on my computer, followed by the code:
Original method timing: 8.764601s
Dev-iL's method timing: 0.053672s
erfan's method timing: 0.481716s
rahnema1's method timing: 0.009771s
function q39955559
G=[0 -1 1; 0 -1 2; 0 -1 3; 0 -1 4; 0 -1 5; 1 -1 6; 1 0 6; 1 1 6; 2 -1 6; 2 0 6; 2 1 6; 3 -1 6; 3 0 6; 3 1 6];
h=7;
M=reshape(G(nchoosek(1:size(G,1),h),:),[],h,size(G,2));
A=cell(size(M,1),2);
for p=1:size(M,1)
A{p,1}=squeeze(M(p,:,:));
left=~ismember(G, A{p,1}, 'rows');
A{p,2}=G(left,:);
end
%% Benchmark:
tic
A1 = orig_sort(A);
fprintf(1,'Original method timing:\t\t%fs\n',toc);
tic
A2 = hash_sort(A);
fprintf(1,'Dev-iL''s method timing:\t\t%fs\n',toc);
tic
A3 = erfan_sort(A);
fprintf(1,'erfan''s method timing:\t\t%fs\n',toc);
tic
A4 = rahnema1_sort(G,h);
fprintf(1,'rahnema1''s method timing:\t%fs\n',toc);
assert(isequal(A1,A2))
assert(isequal(A1,A3))
assert(isequal(numel(A1),numel(A4))) % This is the best test I could come up with...
function out = hash_sort(A)
% Hash the contents:
A_hashed = cellfun(#GetMD5,A,'UniformOutput',false);
% Sort hashes of each row:
A_hashed_sorted = A_hashed;
for ind1 = 1:size(A_hashed,1)
A_hashed_sorted(ind1,:) = sort(A_hashed(ind1,:));
end
A_hashed_sorted = cellstr(cell2mat(A_hashed_sorted));
% Find unique rows:
[~,ia,~] = unique(A_hashed_sorted,'stable');
% Extract relevant rows of A:
out = A(ia,:);
function A = orig_sort(A)
%To find equivalent rows up to order I use a double loop (VERY slow).
indices=[];
for j=1:size(A,1)
if ismember(j,indices)==0 %if we have not already identified j as a duplicate
for i=1:size(A,1)
if i~=j
if (isequal(A{j,1},A{i,1}) || isequal(A{j,1},A{i,2}))...
&&...
(isequal(A{j,2},A{i,1}) || isequal(A{j,2},A{i,2}))...
indices=[indices;i];
end
end
end
end
end
A(indices,:)=[];
function C = erfan_sort(A)
STR = cellfun(#(x) num2str((x(:)).'), A, 'UniformOutput', false);
[~, ~, id] = unique(STR);
IC = sort(reshape(id, [], size(STR, 2)), 2);
[~, col] = unique(IC, 'rows');
C = A(sort(col), :); % 'sort' makes the outputs exactly the same.
function A1 = rahnema1_sort(G,h)
idx = nchoosek(1:size(G,1),h);
%concatenate complements
M = [G(idx(1:size(idx,1)/2,:),:), G(idx(end:-1:size(idx,1)/2+1,:),:)];
%convert to cell so A1 is unique rows of A
A1 = mat2cell(M,repmat(h,size(idx,1)/2,1),repmat(size(G,2),2,1));
1 - If more complicated data types need to be hashed, one can use the DataHash FEX submission instead, which is somewhat slower.
Stating the problem: The ideal choice in identifying unique rows in an array is to use C = unique(A,'rows'). But there are two major problems here, preventing us from using this function in this case. First is that you want to count in all the possible permutations of each row when comparing to other rows. If A has 5 columns, it means checking 120 different re-arrangements per row! Sounds impossible.
The second issue is related to unique itself; It does not accept cells except cell arrays of character vectors. So you cannot simply pass A to unique and get what you expect.
Why looking for an alternative? As you know, because currently it is very slow:
With nested loop method:
------------------- Create the data (first loop):
Elapsed time is 0.979059 seconds.
------------------- Make it unique (second loop):
Elapsed time is 14.218691 seconds.
My solution:
Generate another cell array containing same cells, but converted to string (STR).
Find the index of all unique elements there (id).
Generate the associated matrix with the unique indices and sort rows (IC).
Find unique rows (rows).
Collect corresponding rows of A (C).
And this is the code:
disp('------------------- Create the data:')
tic
G = [0 -1 1; 0 -1 2; 0 -1 3; 0 -1 4; 0 -1 5; 1 -1 6; 1 0 6; ...
1 1 6; 2 -1 6; 2 0 6; 2 1 6; 3 -1 6; 3 0 6; 3 1 6];
h = 7;
M = reshape(G(nchoosek(1:size(G,1),h),:),[],h,size(G,2));
A = cell(size(M,1),2);
for p = 1:size(M,1)
A{p, 1} = squeeze(M(p,:,:));
left = ~ismember(G, A{p,1}, 'rows');
A{p,2} = G(left,:);
end
STR = cellfun(#(x) num2str((x(:)).'), A, 'UniformOutput', false);
toc
disp('------------------- Make it unique (vectorized):')
tic
[~, ~, id] = unique(STR);
IC = sort(reshape(id, [], size(STR, 2)), 2);
[~, col] = unique(IC, 'rows');
C = A(sort(col), :); % 'sort' makes the outputs exactly the same.
toc
Performance check:
------------------- Create the data:
Elapsed time is 1.664119 seconds.
------------------- Make it unique (vectorized):
Elapsed time is 0.017063 seconds.
Although initialization needs a bit more time and memory, this method is extremely faster in finding unique rows with the consideration of all permutations. Execution time is almost insensitive to the number of columns in A.
It seems that G is a misleading point.
Here is result of nchoosek for a small number
idx=nchoosek(1:4,2)
ans =
1 2
1 3
1 4
2 3
2 4
3 4
first row is complement of the last row
second row is complement of one before the last row
.....
so if we extract rows {1 , 2} from G then its complement will be rows {3, 4} and so on. In the other words if we assume number of rows of G to be 4 then G(idx(1,:),:) is complement of G(idx(end,:),:).
Since rows of G are all unique then all A{m,n}s always have the same size.
A{p,1} and A{p,2} are complements of each other. and size of unique rows of A is size(idx,1)/2
So no need to any loop or further comparison:
h=7;
G = [0 -1 1; 0 -1 2; 0 -1 3; 0 -1 4; 0 -1 5; 1 -1 6; 1 0 6; ...
1 1 6; 2 -1 6; 2 0 6; 2 1 6; 3 -1 6; 3 0 6; 3 1 6];
idx = nchoosek(1:size(G,1),h);
%concatenate complements
M = [G(idx(1:size(idx,1)/2,:).',:), G(idx(end:-1:size(idx,1)/2+1,:).',:)];
%convert to cell so A1 is unique rows of A
A1 = mat2cell(M,repmat(h,size(idx,1)/2,1),repmat(size(G,2),2,1));
Update: Above method works best however if the idea is to get A1 from A other than G I suggest following method based of erfan' s. Instead of converting array to string we can directly work with the array:
STR=reshape([A.'{:}],numel(A{1,1}),numel(A)).';
[~, ~, id] = unique(STR,'rows');
IC = sort(reshape(id, size(A, 2),[]), 1).';
[~, col] = unique(IC, 'rows');
C1 = A(sort(col), :);
Since I use Octave I can not currently run mex file then I cannot test Dev-iL 's method
Result:
erfan method (string): 4.54718 seconds.
rahnema1 method (array): 0.012639 seconds.
Online Demo
I have a code as follows:
Ne = 100;
H = rand(Ne,Ne);
g = zeros(Ne,1);
for e =1:Ne
hue = H(:,e);
ss1 = bsxfun(#times, hue', hue) .* M; % M is a Ne*Ne matrix
g(e) = sum(ss1(:));
end
when Ne > 1000, it runs very slowly.
I read the matlab documents, and find permute function is a possible way to speed up.
But I tried a whole day and failed.
Here is my code, and I do not know what is wrong.
C = permute(bsxfun(#times, permute(H, [1 3 2]), permute(H', [1 3 2])), [1 3 2]);
g = sum(sum(C))
If you do the math, you'll see that all you have to do is this:
g = sum(H) .^ 2;
Run speed: 0.000681 seconds, with Ne = 1000 (the original code took 3.047315 seconds).
EDIT:
Now, for your edited code, all you have to do is this:
g = diag(H.' * M * H);
Run speed: 0.072273 seconds, with Ne = 1000.
A speedup can be obtained if you notice that if you rearrange the terms, you can avoid a second matrix multiplication (which changes to a dot product) and all you have to do is sum the columns, like this:
g = sum(M.' * H .* H);
Run speed: 0.044190 seconds, with Ne = 1000.
It's always a good idea to do the math. We spend some time, but the code gains a good speedup. :)
NOTE: Run speeds were measured by averaging the time of a hundred runs.
For your edited code, this would work -
H1 = permute(H,[1 3 2]);
H2 = permute(H,[3 1 2]);
p1 = bsxfun(#times,H2,H1);
p1 = bsxfun(#times,p1,M);
g = sum(reshape(p1,Ne*Ne,[]),1)';
There isn't a great performance improvement with this though, as it's a bit faster only on a limited range of small datasizes.
I have the following two arrays:
A = [1 2;3 4] and B = [1 5 4]
I want to do the following operation:
for each element of A(call it A(i))
for each element of B~=b do
( (A(i) - 1)/(b-1) ) * ( (A(i) - 5)/(b-5) ) * ( (A(i)- 4)/(b-4) )
end
end
It means that, sometimes the numerator equals to zero, so the product should be zeros. And I want to do the operation for the elements of B which are not equal to the b in denominator to not make it Inf.
How can I do this for the whole matrix A instead of using for loop?
Code
A = [1 2;3 4];
B = [1 5 4];
m1 = bsxfun(#minus,A,permute([1 5 4],[3 1 2]));
m2 = bsxfun(#minus,B,permute([1 5 4],[3 1 2]));
for k1=1:size(A,1)
for k2=1:size(A,2)
t2 = squeeze(bsxfun(#rdivide,m1(k1,k2,:),m2));
t2(1:size(t2,1)+1:end)=1;
A1(k1,k2) = prod(t2(:)); %%// Output
end
end
Output
A1 =
0 -0.2500
-0.1111 0
You can remove the nested loops, but at least two issues there -
You would be going to 4th and 5th dimension with it, using bsxfun. So, debugging would be tough.
bsxfun with higher dimensions to my knowledge seems to get slower.
You could just do the operation, and correct later:
C = (A-1)./(B-1) .* (A-5)./(B-5) .* (A-4)./(B-4)
C(isinf(C)) = 0;
or
C(B==b) = 0;
Possibly you'd need bsxfun, I'm not clear on the size of the output you want...
Hey there, I've been having difficulty writing the matlab equivalent of the conv(x,y) function. I cant figure out why this gives the incorrect output. For the arrays
x1 = [1 2 1] and x2 = [3 1 1].
Here's what I have
x1 = [1 2 1];
x2 = [3 1 1];
x1len = leng(x1);
x2len = leng(x2);
len = x1len + x2len - 1;
x1 = zeros(1,len);
x2 = zeros(1,len);
buffer = zeros(1,len);
answer = zeros(1,len);
for n = 1:len
buffer(n) = x(n);
answer(n) = 0;
for i = 1:len
answer(n) = answer(n) + x(i) * buffer(i);
end
end
The matlab conv(x1,x2) gives 3 7 6 3 1 as the output but this is giving me 3 5 6 6 6 for answer.
Where have I gone wrong?
Also, sorry for the formatting I am on opera mini.
Aside from not having x defined, and having all zeroes for your variables x1, x2, buffer, and answer, I'm not certain why you have your nested loops set up like they are. I don't know why you need to reproduce the behavior of CONV this way, but here's how I would set up a nested for-loop solution:
X = [1 2 1];
Y = [3 1 1];
nX = length(X);
nY = length(Y);
nOutput = nX+nY-1;
output = zeros(1,nOutput);
for indexY = 1:nY
for indexX = 1:nX
indexOutput = indexY+indexX-1;
output(indexOutput) = output(indexOutput) + X(indexX)*Y(indexY);
end
end
However, since this is MATLAB, there are vectorized alternatives to looping in this way. One such solution is the following, which uses the functions SUM, SPDIAGS, and FLIPUD:
output = sum(spdiags(flipud(X(:))*Y));
In the code as given, all vectors are zeroed out before you start, except for x which is never defined. So it's hard to see exactly what you're getting at. But a couple of things to note:
In your inner for loop you are using values of buffer which have not yet been set by your outer loop.
The inner loop always covers the full range 1:len rather than shifting one vector relative to the other.
You might also want to think about "vectorizing" some of this rather than nesting for loops -- eg, your inner loop is just calculating a dot product, for which a perfectly good Matlab function already exists.
(Of course the same can be said for conv -- but I guess you're reimplementing either as homework or to understand how it works?)