Matlab: Find row indexes with common elements - matlab

I have a Matrix:
1 2 3
4 5 6
7 8 1
How may I use matlab to find this:
for 1st row: row3
for 2nd row: ---
for 3rd row: row1
I want to have row indices for each row witch have common elements.

Consider this
A = [1 2 3; %Matrix A is a bit different from yours for testing
4 5 6;
7 8 1;
1 2 7;
4 5 6];
[row col] =size(A)
answers = zeros(row,row); %matrix of answers,...
%(i,j) = 1 if row_i and row_j have an equal element
for i = 1:row
for j = i+1:row %analysis is performed accounting for
% symmetry constraint
C = bsxfun(#eq,A(i,:),A(j,:)'); %Tensor comparison
if( any(C(:)) ) %If some entry is non-zero you have equal elements
answers(i,j) = 1; %output
end
end
end
answers = answers + answers'; %symmetric
The output here is
answers =
0 0 1 1 0
0 0 0 0 1
1 0 0 1 0
1 0 1 0 0
0 1 0 0 0
of course the answers matrix is symmetric because your relation is.

The solution proposed by Acorbe can be quite slow if you have many rows and/or long rows. I have checked that in most cases the two solutions that I present below should be considerably faster. If your matrix contains just few different values (relative to the size of the matrix) then this should work pretty fast:
function rowMatches = find_row_matches(A)
% Returns a cell array with row matches for each row
c = unique(A);
matches = false(size(A,1), numel(c));
for i = 1:numel(c)
matches(:, i) = any(A == c(i), 2);
end
rowMatches = arrayfun(#(j) ...
find(any(matches(:, matches(j,:)),2)), 1:size(A,1), 'UniformOutput', false);
This other alternative might be faster when you have short rows, i.e. when size(A,2) is small:
function answers = find_answers(A)
% Returns an "answers" matrix like in Acorbe's solution
c = unique(A);
answers = false(size(A,1), size(A,1));
idx = 1:size(A,1);
for i = 1:numel(c)
I = any(A == c(i), 2);
uMatch = idx(I);
answers(uMatch, uMatch) = true;
isReady = all(A <= c(i), 2);
if any(isReady),
idx(isReady) = [];
A = A(~isReady,:);
end
end

Depending on what you are planning to do with this output it could be redundant to have a match for "3rd row: row1".
You already have this match earlier in your output in the form of "1st row: row3"

Related

Constructing vectors of different lengths

I want to find out row and column number of zeros in 3 dimensional space. Problem is I get output vectors(e.g row) of different length each time, hence dimension error occurs.
My attempt:
a (:,:,1)= [1 2 0; 2 0 1; 0 0 2]
a (:,:,2) = [0 2 8; 2 1 0; 0 0 0]
for i = 1 : 2
[row(:,i) colum(:,i)] = find(a(:,:,i)==0);
end
You can use linear indexing:
a (:,:,1) = [1 2 0; 2 0 1; 0 0 2];
a (:,:,2) = [0 2 8; 2 1 0; 0 0 0];
% Answer in linear indexing
idx = find(a == 0);
% Transforms linear indexing in rows-columns-3rd dimension
[rows , cols , third] = ind2sub(size(a) ,idx)
More on the topic can be found in Matlab's help
Lets assume your Matrix has the format N-by-M-by-P.
In your case
N = 3;
M = 3;
P = 2;
This would mean that the maximum length of rows and coloms from your search (if all entries are zero) is N*M=9
So one possible solution would be
%alloc output
row=zeros(size(a,1)*size(a,2),size(a,3));
colum=row;
%loop over third dimension
n=size(a,3);
for i = 1 : n
[row_t colum_t] = find(a(:,:,i)==0);
%copy your current result depending on it's length
row(1:length(row_t),i)=row_t;
colum(1:length(colum_t),i)=colum_t;
end
However, when you past the result to the next function / script you have to keep in mind to operate on the non-zero elements.
I would go for the vectorized solution of Zep. As for bigger matrices a it is more memory efficient and I am sure it must be way faster.

Find unique rows of a cell array considering all possible permutations on each row

I have cell array A of dimension m * k.
I want to keep the rows of A unique up to an order of the k cells.
The "tricky" part is "up to an order of the k cells": consider the k cells in the ith row of A, A(i,:); there could be a row j of A, A(j,:), that is equivalent to A(i,:) up to a re-ordering of its k cells, meaning that for example if k=4it could be that:
A{i,1}=A{j,2}
A{i,2}=A{j,3}
A{i,3}=A{j,1}
A{i,4}=A{j,4}
What I am doing at the moment is:
G=[0 -1 1; 0 -1 2; 0 -1 3; 0 -1 4; 0 -1 5; 1 -1 6; 1 0 6; 1 1 6; 2 -1 6; 2 0 6; 2 1 6; 3 -1 6; 3 0 6; 3 1 6];
h=7;
M=reshape(G(nchoosek(1:size(G,1),h),:),[],h,size(G,2));
A=cell(size(M,1),2);
for p=1:size(M,1)
A{p,1}=squeeze(M(p,:,:));
left=~ismember(G, A{p,1}, 'rows');
A{p,2}=G(left,:);
end
%To find equivalent rows up to order I use a double loop (VERY slow).
indices=[];
for j=1:size(A,1)
if ismember(j,indices)==0 %if we have not already identified j as a duplicate
for i=1:size(A,1)
if i~=j
if (isequal(A{j,1},A{i,1}) || isequal(A{j,1},A{i,2}))...
&&...
(isequal(A{j,2},A{i,1}) || isequal(A{j,2},A{i,2}))...
indices=[indices;i];
end
end
end
end
end
A(indices,:)=[];
It works but it is too slow. I am hoping that there is something quicker that I can use.
I'd like to propose another idea, which has some conceptual resemblance to erfan's. My idea uses hash functions, and specifically, the GetMD5 FEX submission.
The main task is how to "reduce" each row in A to a single representative value (such as a character vector) and then find unique entries of this vector.
Judging by the benchmark vs. the other suggestions, my answer doesn't perform as well as one of the alternatives, but I think its raison d'être lies in the fact that it is completely data-type agnostic (within the limitations of the GetMD51), that the algorithm is very straightforward to understand, it's a drop-in replacement as it operates on A, and that the resulting array is exactly equal to the one obtained by the original method. Of course this requires a compiler to get working and has a risk of hash collisions (which might affect the result in VERY VERY rare cases).
Here are the results from a typical run on my computer, followed by the code:
Original method timing: 8.764601s
Dev-iL's method timing: 0.053672s
erfan's method timing: 0.481716s
rahnema1's method timing: 0.009771s
function q39955559
G=[0 -1 1; 0 -1 2; 0 -1 3; 0 -1 4; 0 -1 5; 1 -1 6; 1 0 6; 1 1 6; 2 -1 6; 2 0 6; 2 1 6; 3 -1 6; 3 0 6; 3 1 6];
h=7;
M=reshape(G(nchoosek(1:size(G,1),h),:),[],h,size(G,2));
A=cell(size(M,1),2);
for p=1:size(M,1)
A{p,1}=squeeze(M(p,:,:));
left=~ismember(G, A{p,1}, 'rows');
A{p,2}=G(left,:);
end
%% Benchmark:
tic
A1 = orig_sort(A);
fprintf(1,'Original method timing:\t\t%fs\n',toc);
tic
A2 = hash_sort(A);
fprintf(1,'Dev-iL''s method timing:\t\t%fs\n',toc);
tic
A3 = erfan_sort(A);
fprintf(1,'erfan''s method timing:\t\t%fs\n',toc);
tic
A4 = rahnema1_sort(G,h);
fprintf(1,'rahnema1''s method timing:\t%fs\n',toc);
assert(isequal(A1,A2))
assert(isequal(A1,A3))
assert(isequal(numel(A1),numel(A4))) % This is the best test I could come up with...
function out = hash_sort(A)
% Hash the contents:
A_hashed = cellfun(#GetMD5,A,'UniformOutput',false);
% Sort hashes of each row:
A_hashed_sorted = A_hashed;
for ind1 = 1:size(A_hashed,1)
A_hashed_sorted(ind1,:) = sort(A_hashed(ind1,:));
end
A_hashed_sorted = cellstr(cell2mat(A_hashed_sorted));
% Find unique rows:
[~,ia,~] = unique(A_hashed_sorted,'stable');
% Extract relevant rows of A:
out = A(ia,:);
function A = orig_sort(A)
%To find equivalent rows up to order I use a double loop (VERY slow).
indices=[];
for j=1:size(A,1)
if ismember(j,indices)==0 %if we have not already identified j as a duplicate
for i=1:size(A,1)
if i~=j
if (isequal(A{j,1},A{i,1}) || isequal(A{j,1},A{i,2}))...
&&...
(isequal(A{j,2},A{i,1}) || isequal(A{j,2},A{i,2}))...
indices=[indices;i];
end
end
end
end
end
A(indices,:)=[];
function C = erfan_sort(A)
STR = cellfun(#(x) num2str((x(:)).'), A, 'UniformOutput', false);
[~, ~, id] = unique(STR);
IC = sort(reshape(id, [], size(STR, 2)), 2);
[~, col] = unique(IC, 'rows');
C = A(sort(col), :); % 'sort' makes the outputs exactly the same.
function A1 = rahnema1_sort(G,h)
idx = nchoosek(1:size(G,1),h);
%concatenate complements
M = [G(idx(1:size(idx,1)/2,:),:), G(idx(end:-1:size(idx,1)/2+1,:),:)];
%convert to cell so A1 is unique rows of A
A1 = mat2cell(M,repmat(h,size(idx,1)/2,1),repmat(size(G,2),2,1));
1 - If more complicated data types need to be hashed, one can use the DataHash FEX submission instead, which is somewhat slower.
Stating the problem: The ideal choice in identifying unique rows in an array is to use C = unique(A,'rows'). But there are two major problems here, preventing us from using this function in this case. First is that you want to count in all the possible permutations of each row when comparing to other rows. If A has 5 columns, it means checking 120 different re-arrangements per row! Sounds impossible.
The second issue is related to unique itself; It does not accept cells except cell arrays of character vectors. So you cannot simply pass A to unique and get what you expect.
Why looking for an alternative? As you know, because currently it is very slow:
With nested loop method:
------------------- Create the data (first loop):
Elapsed time is 0.979059 seconds.
------------------- Make it unique (second loop):
Elapsed time is 14.218691 seconds.
My solution:
Generate another cell array containing same cells, but converted to string (STR).
Find the index of all unique elements there (id).
Generate the associated matrix with the unique indices and sort rows (IC).
Find unique rows (rows).
Collect corresponding rows of A (C).
And this is the code:
disp('------------------- Create the data:')
tic
G = [0 -1 1; 0 -1 2; 0 -1 3; 0 -1 4; 0 -1 5; 1 -1 6; 1 0 6; ...
1 1 6; 2 -1 6; 2 0 6; 2 1 6; 3 -1 6; 3 0 6; 3 1 6];
h = 7;
M = reshape(G(nchoosek(1:size(G,1),h),:),[],h,size(G,2));
A = cell(size(M,1),2);
for p = 1:size(M,1)
A{p, 1} = squeeze(M(p,:,:));
left = ~ismember(G, A{p,1}, 'rows');
A{p,2} = G(left,:);
end
STR = cellfun(#(x) num2str((x(:)).'), A, 'UniformOutput', false);
toc
disp('------------------- Make it unique (vectorized):')
tic
[~, ~, id] = unique(STR);
IC = sort(reshape(id, [], size(STR, 2)), 2);
[~, col] = unique(IC, 'rows');
C = A(sort(col), :); % 'sort' makes the outputs exactly the same.
toc
Performance check:
------------------- Create the data:
Elapsed time is 1.664119 seconds.
------------------- Make it unique (vectorized):
Elapsed time is 0.017063 seconds.
Although initialization needs a bit more time and memory, this method is extremely faster in finding unique rows with the consideration of all permutations. Execution time is almost insensitive to the number of columns in A.
It seems that G is a misleading point.
Here is result of nchoosek for a small number
idx=nchoosek(1:4,2)
ans =
1 2
1 3
1 4
2 3
2 4
3 4
first row is complement of the last row
second row is complement of one before the last row
.....
so if we extract rows {1 , 2} from G then its complement will be rows {3, 4} and so on. In the other words if we assume number of rows of G to be 4 then G(idx(1,:),:) is complement of G(idx(end,:),:).
Since rows of G are all unique then all A{m,n}s always have the same size.
A{p,1} and A{p,2} are complements of each other. and size of unique rows of A is size(idx,1)/2
So no need to any loop or further comparison:
h=7;
G = [0 -1 1; 0 -1 2; 0 -1 3; 0 -1 4; 0 -1 5; 1 -1 6; 1 0 6; ...
1 1 6; 2 -1 6; 2 0 6; 2 1 6; 3 -1 6; 3 0 6; 3 1 6];
idx = nchoosek(1:size(G,1),h);
%concatenate complements
M = [G(idx(1:size(idx,1)/2,:).',:), G(idx(end:-1:size(idx,1)/2+1,:).',:)];
%convert to cell so A1 is unique rows of A
A1 = mat2cell(M,repmat(h,size(idx,1)/2,1),repmat(size(G,2),2,1));
Update: Above method works best however if the idea is to get A1 from A other than G I suggest following method based of erfan' s. Instead of converting array to string we can directly work with the array:
STR=reshape([A.'{:}],numel(A{1,1}),numel(A)).';
[~, ~, id] = unique(STR,'rows');
IC = sort(reshape(id, size(A, 2),[]), 1).';
[~, col] = unique(IC, 'rows');
C1 = A(sort(col), :);
Since I use Octave I can not currently run mex file then I cannot test Dev-iL 's method
Result:
erfan method (string): 4.54718 seconds.
rahnema1 method (array): 0.012639 seconds.
Online Demo

Setting Table Values using Loops with Index Vectors in Matlab

I've a series of coordinates (i,j) and I want to loop through each one.
For example
A = ones(3,3);
i = [1 2 3];
j = [3 2 1];
I tried with this but it doesn't work:
for (i = i && j = j)
A(i,j) = 0;
end
I also tried this but it doens't work as expected:
for i = i
for j = j
A(i,j) = 0;
end
end
Desired result:
A =
1 1 0
1 0 1
0 1 1
Although A is a matrix in this example, I am working with table data.
The correct syntax to do what you want is:
A = ones(3,3);
i = [1 2 3];
j = [3 2 1];
for ii = 1:length( i )
A( i(ii) , j(ii) ) = 0;
end
Essentially you loop through each element and index i and j accordingly using ii. ii loops through 1..3 indexing each element.
This will give the a final result below.
>> A
A =
1 1 0
1 0 1
0 1 1
While this works and fixes your issue, I would recommend rayryeng's alternate solution with conversions if you don't have more complex operations involved.
Though this doesn't answer your question about for loops, I would avoid using loops all together and create column-major linear indices to access into your matrix. Use sub2ind to help facilitate that. sub2ind takes in the size of the matrix in question, the row locations and column locations. The output will be an array of values that specify the column-major locations to access in your matrix.
Therefore:
A = ones(3); i = [1 2 3]; j = [3 2 1]; %// Your code
%// New code
ind = sub2ind(size(A), i, j);
A(ind) = 0;
Given that you have a table, you can perhaps convert the table into an array, apply sub2ind on this array then convert the result back to a table when you're done. table2array and array2table are useful tools here. Given that your table is stored in A, you can try:
Atemp = table2array(A);
ind = sub2ind(size(Atemp), i, j);
Atemp(ind) = 0;
A = array2table(Atemp);

For each element in vector, sum previous n elements

I am trying to write a function that sums the previous n elements for each the element
v = [1 1 1 1 1 1];
res = sumLastN(v,3);
res = [0 0 3 3 3 3];
Until now, I have written the following function
function [res] = sumLastN(vec,ppts)
if iscolumn(vec)~=1
error('First argument must be a column vector')
end
sz_x = size(vec,1);
res = zeros(sz_x,1);
if sz_x > ppts
for jj = 1:ppts
res(ppts:end,1) = res(ppts:end,1) + ...
vec(jj:end-ppts+jj,1);
end
% for jj = ppts:sz_x
% res(jj,1) = sum(vec(jj-ppts+1:jj,1));
% end
end
end
There are around 2000 vectors of about 1 million elements, so I was wondering if anyone could give me any advice of how I could speed up the function.
Using cumsum should be much faster:
function [res] = sumLastN(vec,ppts)
w=cumsum(vec)
res=[zeros(1,ppts-1),w(ppts+1:end)-w(1:end-ppts)]
end
You basically want a moving average filter, just without the averaging.
Use a digital filter:
n = 3;
v = [1 1 1 1 1 1];
res = filter(ones(1,n),1,v)
res =
1 2 3 3 3 3
I don't get why the first two elements should be zero, but why not:
res(1:n-1) = 0
res =
0 0 3 3 3 3

Replacing zeros (or NANs) in a matrix with the previous element row-wise or column-wise in a fully vectorized way

I need to replace the zeros (or NaNs) in a matrix with the previous element row-wise, so basically I need this Matrix X
[0,1,2,2,1,0;
5,6,3,0,0,2;
0,0,1,1,0,1]
To become like this:
[0,1,2,2,1,1;
5,6,3,3,3,2;
0,0,1,1,1,1],
please note that if the first row element is zero it will stay like that.
I know that this has been solved for a single row or column vector in a vectorized way and this is one of the nicest way of doing that:
id = find(X);
X(id(2:end)) = diff(X(id));
Y = cumsum(X)
The problem is that the indexing of a matrix in Matlab/Octave is consecutive and increments columnwise so it works for a single row or column but the same exact concept cannot be applied but needs to be modified with multiple rows 'cause each of raw/column starts fresh and must be regarded as independent. I've tried my best and googled the whole google but coukldn’t find a way out. If I apply that same very idea in a loop it gets too slow cause my matrices contain 3000 rows at least. Can anyone help me out of this please?
Special case when zeros are isolated in each row
You can do it using the two-output version of find to locate the zeros and NaN's in all columns except the first, and then using linear indexing to fill those entries with their row-wise preceding values:
[ii jj] = find( (X(:,2:end)==0) | isnan(X(:,2:end)) );
X(ii+jj*size(X,1)) = X(ii+(jj-1)*size(X,1));
General case (consecutive zeros are allowed on each row)
X(isnan(X)) = 0; %// handle NaN's and zeros in a unified way
aux = repmat(2.^(1:size(X,2)), size(X,1), 1) .* ...
[ones(size(X,1),1) logical(X(:,2:end))]; %// positive powers of 2 or 0
col = floor(log2(cumsum(aux,2))); %// col index
ind = bsxfun(#plus, (col-1)*size(X,1), (1:size(X,1)).'); %'// linear index
Y = X(ind);
The trick is to make use of the matrix aux, which contains 0 if the corresponding entry of X is 0 and its column number is greater than 1; or else contains 2 raised to the column number. Thus, applying cumsum row-wise to this matrix, taking log2 and rounding down (matrix col) gives the column index of the rightmost nonzero entry up to the current entry, for each row (so this is a kind of row-wise "cummulative max" function.) It only remains to convert from column number to linear index (with bsxfun; could also be done with sub2ind) and use that to index X.
This is valid for moderate sizes of X only. For large sizes, the powers of 2 used by the code quickly approach realmax and incorrect indices result.
Example:
X =
0 1 2 2 1 0 0
5 6 3 0 0 2 3
1 1 1 1 0 1 1
gives
>> Y
Y =
0 1 2 2 1 1 1
5 6 3 3 3 2 3
1 1 1 1 1 1 1
You can generalize your own solution as follows:
Y = X.'; %'// Make a transposed copy of X
Y(isnan(Y)) = 0;
idx = find([ones(1, size(X, 1)); Y(2:end, :)]);
Y(idx(2:end)) = diff(Y(idx));
Y = reshape(cumsum(Y(:)), [], size(X, 1)).'; %'// Reshape back into a matrix
This works by treating the input data as a long vector, applying the original solution and then reshaping the result back into a matrix. The first column is always treated as non-zero so that the values don't propagate throughout rows. Also note that the original matrix is transposed so that it is converted to a vector in row-major order.
Modified version of Eitan's answer to avoid propagating values across rows:
Y = X'; %'
tf = Y > 0;
tf(1,:) = true;
idx = find(tf);
Y(idx(2:end)) = diff(Y(idx));
Y = reshape(cumsum(Y(:)),fliplr(size(X)))';
x=[0,1,2,2,1,0;
5,6,3,0,1,2;
1,1,1,1,0,1];
%Do it column by column is easier
x=x';
rm=0;
while 1
%fields to replace
l=(x==0);
%do nothing for the first row/column
l(1,:)=0;
rm2=sum(sum(l));
if rm2==rm
%nothing to do
break;
else
rm=rm2;
end
%replace zeros
x(l) = x(find(l)-1);
end
x=x';
I have a function I use for a similar problem for filling NaNs. This can probably be cutdown or sped up further - it's extracted from pre-existing code that has a bunch more functionality (forward/backward filling, maximum distance etc).
X = [
0 1 2 2 1 0
5 6 3 0 0 2
1 1 1 1 0 1
0 0 4 5 3 9
];
X(X == 0) = NaN;
Y = nanfill(X,2);
Y(isnan(Y)) = 0
function y = nanfill(x,dim)
if nargin < 2, dim = 1; end
if dim == 2, y = nanfill(x',1)'; return; end
i = find(~isnan(x(:)));
j = 1:size(x,1):numel(x);
j = j(ones(size(x,1),1),:);
ix = max(rep([1; i],diff([1; i; numel(x) + 1])),j(:));
y = reshape(x(ix),size(x));
function y = rep(x,times)
i = find(times);
if length(i) < length(times), x = x(i); times = times(i); end
i = cumsum([1; times(:)]);
j = zeros(i(end)-1,1);
j(i(1:end-1)) = 1;
y = x(cumsum(j));