remove rows from cell based on multiple conditions - matlab

I have a cell array with dimensions 50 x 2. A subset of it is shown below.
code num
AAA 5
AAA 6
BBB 12
AAA 4
CCC 5
I want to find any rows where the code is equal to AAA and the num is not equal to 4. Then remove these rows to leave me with,
code num
AAA 4
BBB 12
CCC 5
I have tried the following,
indx_remove = rf_cell(:, 1) == 'AAA' && rf_cell(:, 2) ~= '4';
This line gives me undefined function eq for input arguments of type cell.

Use the following code:
A(strcmp(A(:,1),'AAA') &([A{:,2}]'~=4),:) = []
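For anyone wondering why the attempt in the question fails: == is not defined for cell arrays, and && only accepts scalar operands. Here is a minimal sketch of the same one-liner split into steps, assuming the second column holds numeric scalars (the names isAAA, nums and remove are just for illustration):

```matlab
% Example data from the question
A = {'AAA', 5; 'AAA', 6; 'BBB', 12; 'AAA', 4; 'CCC', 5};

isAAA  = strcmp(A(:,1), 'AAA');   % logical column: true where the code is AAA
nums   = cell2mat(A(:,2));        % numeric column built from the second cell column
remove = isAAA & nums ~= 4;       % element-wise AND, so use & rather than &&

A(remove, :) = [];                % delete the matching rows
```

After this, A holds the rows BBB 12, AAA 4 and CCC 5, in their original order.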

I believe I am doing this the hard way, but I hope it is not too stupid.
code num
AAA 5
AAA 6
BBB 12
AAA 4
CCC 5
%generate code vector and num vector
code = ['AAA', 'AAA', 'BBB', 'AAA','CCC']
code = AAAAAABBBAAACCC
num = [5;6;12;4;5]
k = strfind(code, 'AAA') %find your index
k = 1 2 3 4 10 %because the vector code is just a concatenation of your sub-strings, you will need to sort the index out
%here, you can do something smart, your choice, I use modulo, since your char length is 3 characters, the modulo 3 should return 1 for it to be the starting index.
b = mod(k,3)
b = 1 2 0 1 1
index = k(find(b==1)) % 1, 4, 10 returned
column1 = floor(index/3+1) %output 1 2 4, which is the rows with AAA
check = num(column1) % just a checking stage, output 5 6 4 of num.
Now you have the row indices (column1) of the strings in column 1 that equal AAA. Next, find the rows where column 2 equals 4:
column2 = find(num==4) % output 4
You can write an if statement that drops an index (here 4) from column1 when it also appears in column2; the indices left in column1 (here 1 and 2) are the rows to remove.
Happy coding!
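The last step can be sketched without an if statement by using setdiff. This is only an illustration: it keeps the codes in a cell array of strings instead of one concatenated char vector (so strcmp replaces the strfind/modulo trick), and removeRows is a made-up name.

```matlab
code = {'AAA'; 'AAA'; 'BBB'; 'AAA'; 'CCC'};
num  = [5; 6; 12; 4; 5];

column1 = find(strcmp(code, 'AAA'));     % rows whose code is AAA: 1, 2, 4
column2 = find(num == 4);                % rows whose num is 4: 4
removeRows = setdiff(column1, column2);  % AAA rows whose num is not 4: 1, 2

code(removeRows) = [];
num(removeRows)  = [];
```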

ind = cellfun(@(x,y) strcmp(x,'AAA') & y~=4, {A{:,1}}, {A{:,2}})'
A(find(ind==0),:)
ans =
{
[1,1] = BBB
[2,1] = AAA
[3,1] = CCC
[1,2] = 12
[2,2] = 4
[3,2] = 5
}
Details
% // Create the values
A = {'AAA', 5;
'AAA' , 6;
'BBB' , 12;
'AAA' , 4;
'CCC' , 5};
%// Create a cell array of the values
{A{:,1}}, {A{:,2}}
ans =
{
[1,1] = AAA
[1,2] = AAA
[1,3] = BBB
[1,4] = AAA
[1,5] = CCC
}
ans =
{
[1,1] = 5
[1,2] = 6
[1,3] = 12
[1,4] = 4
[1,5] = 5
}
%// Create an anonymous function that will be applied to each element of our cell.
%// It will take the elements of the first cell (represented by `x` in the anonymous function)
%// and compare it to `AAA` and the elements of the second cell (`y`) and compare it to `4`.
%// The result is an array with the logical result of the conditions.
ind = cellfun(@(x,y) strcmp(x,'AAA') & y~=4, {A{1:size(A,1),1}}, {A{1:size(A,1),2}})'
ind =
1
1
0
0
0
%// Then find the rows where these were zero as we wanted to exclude those values
A(find(ind==0),:)
ans =
{
[1,1] = BBB
[2,1] = AAA
[3,1] = CCC
[1,2] = 12
[2,2] = 4
[3,2] = 5
}

Related

How find rows and columns in matlab

I have variable matrix :
A = [1 2 8 8 1
4 6 8 1 1
5 3 1 1 8];
and I have variable B :
B=[2 3 1 8 8];
The question is how to find, for each element of B, its row and column in A (searching row by row).
For example, the first element of B is 2, so I want to find the value 2 in A and get the row and column of its first occurrence, and so on up to element 5. If a position has already been used, I want the next occurrence instead (e.g. elements 4 and 5 of B have the same value).
rows;
columns;
Result is:
rows = 1 3 1 1 1
columns = 2 2 1 3 4
You can use find and ind2sub to achieve what you want,
but for that you have to take the transpose of your A first
A = [1 2 8 8 1
4 6 8 1 1
5 3 1 1 8];
B= [2 3 1 8 8];
TMP = A.';
for i = 1:length(B)
indx = find(TMP== B(i),1,'first') %Finding the element of B present in A
if(~isempty(indx )) % If B(i) is a member of A
[column(i),row(i)] = ind2sub(size(TMP),indx) % store it in row and column matrix
TMP(indx) = nan; % remove that element
end
end
column =
2 2 1 3 4
row =
1 3 1 1 1
As Usama suggested in one of the comments, you can preallocate memory
by using
row = zeros(1,sum(ismember(B,A)))
column= zeros(1,sum(ismember(B,A)))
The above code works even if there are some members of B not present in A
Use find. The function can return either a linear index or row/column indices.
Using linear index a solution could be
idx = zeros(size(B));
for i = 1:numel(B)
% Find all indexes
tmpIdx = find(A == B(i));
% Remove those already used
tmpIdx = setdiff(tmpIdx, idx);
% Get the first new unique
idx(i) = tmpIdx(1);
end
% Convert index to row and col
[rows, cols] = ind2sub(size(A),idx)
Giving:
rows = 1 3 1 1 2
cols = 2 2 1 3 3
Note that as the linear indexing goes down column by column, the result here differs from the one in your example (although still a correct index)
rows = 1 3 1 1 1
columns= 2 2 1 3 4
But to get this you could just transpose the A matrix (A.') and flip the rows and cols (the result from ind2sub)
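A sketch of that transposed variant, reusing the same loop on A.' and swapping the outputs of ind2sub. It assumes every element of B occurs in A often enough; otherwise tmpIdx(1) would error. At is just an illustrative name.

```matlab
A = [1 2 8 8 1
     4 6 8 1 1
     5 3 1 1 8];
B = [2 3 1 8 8];

At  = A.';                 % linear indices of At walk A row by row
idx = zeros(size(B));
for i = 1:numel(B)
    % all positions of B(i) in At, minus those already used
    tmpIdx = setdiff(find(At == B(i)), idx);
    idx(i) = tmpIdx(1);    % setdiff returns sorted values, so this is the first unused one
end

% Rows of At are columns of A and vice versa, hence the swapped outputs
[cols, rows] = ind2sub(size(At), idx);
```

With the example data this gives rows = 1 3 1 1 1 and cols = 2 2 1 3 4, matching the result asked for in the question.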
Here is one solution where I use a for loop; I tried to minimize the number of iterations and the computational cost. If a value of B has no corresponding value in A, the row/col index is returned as NaN.
[Bu,~,ord] = unique(B,'stable');
% Index of each different values
[col,row] = arrayfun(@(x) find(A'==x),Bu,'UniformOutput',0)
% For each value in vector B we search the first "non already used" corresponding value in A.
for i = 1:length(B)
if ~isempty(row{ord(i)})
r(i) = row{ord(i)}(1);
row{ord(i)}(1) = [];
c(i) = col{ord(i)}(1);
col{ord(i)}(1) = [];
else
r(i) = NaN;
c(i) = NaN;
end
end
RESULT:
c = [2 2 1 3 4]
r = [1 3 1 1 1]

Find duplicates in a matrix

consider a matrix:
a = [1 2
1 3
2 3
4 5
6 1]
I want to find, for every unique element of a, the rows that contain it, and store those rows in separate matrices. For example, the answer for number 1 is:
a1 = [1 2
1 3
6 1]
The answer for number 2 is:
a2 = [1 2
2 3]
The answer for number 3 is:
a3 = [1 3
2 3]
and so on for every unique elements of matrix a. Any suggestions?
This will do it:
temp=unique(a);
for k=1:numel(temp)
[r,~]=find(a==temp(k));
assignin('base', ['a' num2str(k)], a(sort(r),:))
end
Results:-
>> a1
a1 =
1 2
1 3
6 1
>> a2
a2 =
1 2
2 3
>> a3
a3 =
1 3
2 3
>> a4
a4 =
4 5
>> a5
a5 =
4 5
>> a6
a6 =
6 1
You can use any to check if any element of a row contains the value you want. This will return a logical array that is true where the row contained the value. You can then use this to grab the relevant rows of a.
result = a(any(a == value, 2), :);
We could create an anonymous function that does this for you.
rows_that_contain_value = @(A, value)A(any(A == value, 2), :);
Then we can use this like this
a = [1 2
1 3
2 3
4 5
6 1]
a1 = rows_that_contain_value(a, 1);
a2 = rows_that_contain_value(a, 2);
a3 = rows_that_contain_value(a, 3);
If we want to do this for all unique values in a, we can do something like the following.
groups = arrayfun(@(x)rows_that_contain_value(a, x), unique(a), 'uniformoutput', 0);
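Since groups is a cell array ordered like unique(a), the group for a particular value can be looked up by comparing against that vector. A small self-contained sketch (vals and a3 are illustrative names):

```matlab
a = [1 2
     1 3
     2 3
     4 5
     6 1];

rows_that_contain_value = @(A, value) A(any(A == value, 2), :);

vals   = unique(a);   % sorted unique values: 1 2 3 4 5 6 (as a column)
groups = arrayfun(@(x) rows_that_contain_value(a, x), vals, 'UniformOutput', false);

% groups{k} belongs to vals(k), so pick the group by value rather than by position
a3 = groups{vals == 3};
```

Here a3 is [1 3; 2 3], the same matrix as in the question.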

Find the position of equal elements in a matrix using Matlab

Suppose I have:
m = [1,2,3;1,4,5;6,4,7]
I want to get a list containing the positions of the elements in the matrix m so that the positions of equal elements are grouped together. The output for matrix m must be:
{{1,1},{2,1}},{{2,2},{3,2}},{1,2},{1,3},{2,3},{3,1},{3,3}
% 1 2 3 4 5 6 7
We can see here that the positions for the elements that are all equal to each other are grouped together.
The simplest way would be to loop through every unique value and determine the row and column positions that match each value. Something like this could work:
val = unique(m);
pos = cell(1, numel(val));
for ii = 1 : numel(val)
[r,c] = find(m == val(ii));
pos{ii} = [r,c];
end
pos would be a cell array containing all of the positions for each unique value. We can show what these positions are by:
>> format compact; celldisp(pos)
pos{1} =
1 1
2 1
pos{2} =
1 2
pos{3} =
1 3
pos{4} =
2 2
3 2
pos{5} =
2 3
pos{6} =
3 1
pos{7} =
3 3
This of course is not meaningful unless you specifically show each unique value per group of positions. Therefore, we can try something like this instead where we can loop through each element in the cell array as well as display the corresponding element that each set of positions belongs to:
for ii = 1 : numel(val)
fprintf('Value: %f\n', val(ii));
fprintf('Positions:\n');
disp(pos{ii});
end
What I get is now:
Value: 1.000000
Positions:
1 1
2 1
Value: 2.000000
Positions:
1 2
Value: 3.000000
Positions:
1 3
Value: 4.000000
Positions:
2 2
3 2
Value: 5.000000
Positions:
2 3
Value: 6.000000
Positions:
3 1
Value: 7.000000
Positions:
3 3
This gives you what you want, except for the fact that indices of unique elements are also wrapped in cell twice, just like the indices of repeating elements:
m = [1,2,3;1,4,5;6,4,7];
[~, idx] = ismember(m(:), unique(m(:)));
linInd = 1:numel(m);
[i,j] = ind2sub(size(m), linInd);
res = accumarray(idx, linInd, [], @(x) {num2cell([i(x);j(x)]',2)});
Result:
>> celldisp(res)
res{1}{1} =
2 1
res{1}{2} =
1 1
res{2}{1} =
1 2
res{3}{1} =
1 3
res{4}{1} =
2 2
res{4}{2} =
3 2
res{5}{1} =
2 3
res{6}{1} =
3 1
res{7}{1} =
3 3
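If the extra level of cell nesting is not needed, a variant of the same accumarray idea can return one N-by-2 matrix of [row, column] pairs per unique value. This is only a sketch; sortrows is there because accumarray does not promise the order in which it hands the indices of a group to the function.

```matlab
m = [1,2,3;1,4,5;6,4,7];

[vals, ~, idx] = unique(m(:));       % idx(k) is the group number of element k
lin = (1:numel(m)).';                % linear index of every element
[r, c] = ind2sub(size(m), lin);      % row and column of every element

% One cell per unique value, each holding the sorted [row, column] pairs
pos = accumarray(idx, lin, [], @(k) {sortrows([r(k), c(k)])});
```

pos{k} then holds the positions of vals(k), e.g. pos{1} is [1 1; 2 1] for the value 1.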

Making a match-and-append code more efficient without 'for' loop

I am trying to match 1st column of A with 1st to 3rd columns of B, and append corresponding 4th column of B to A.
For example,
A=
1 2
3 4
B=
1 2 4 5 4
1 2 3 5 3
1 1 1 1 2
3 4 5 6 5
I compare A(:,1) and B(:, 1:3)
1 and 3 are in A(:,1)
1 is in the 1st, 2nd, 3rd rows of B(:, 1:3), so append B([1 2 3], 4:end)' to A's 1st row.
3 is in the 2nd and 4th rows of B(:,1:3), so append B([2 4], 4:end)' to A's 2nd row.
So that it becomes:
1 2 5 4 5 3 1 2
3 4 5 3 6 5 0 0
I could code this using only for and if.
clearvars AA A B mem mem2 mem3
A = [1 2 ; 3 4]
B = [1 2 4 5 4; 1 2 3 5 3; 1 1 1 1 2; 3 4 5 6 5]
for n=1:1:size(A,1)
mem = ismember(B(:,[1:3]), A(n,1));
mem2 = mem(:,1) + mem(:,2) + mem(:,3);
mem3 = find(mem2>0);
AA{n,:} = horzcat( A(n,:), reshape(B(mem3,[4,5])',1,[]) ); %'
end
maxLength = max(cellfun(@(x)numel(x),AA));
out = cell2mat(cellfun(@(x)cat(2,x,zeros(1,maxLength-length(x))),AA,'UniformOutput',false))
I am trying to make this code efficient, by not using for and if, but couldn't find an answer.
Try this
a = A(:,1);
b = B(:,1:3);
z = size(b);
b = repmat(b,[1,1,numel(a)]);
ab = repmat(permute(a,[2,3,1]),z);
row2 = mat2cell(permute(sum(ab==b,2),[3,1,2]),ones(1,numel(a)));
AA = cellfun(@(x)(reshape(B(x>0,4:end)',1,numel(B(x>0,4:end)))),row2,'UniformOutput',0);
maxLength = max(cellfun(@(x)numel(x),AA));
out = cat(2,A,cell2mat(cellfun(@(x)cat(2,x,zeros(1,maxLength-length(x))),AA,'UniformOutput',false)))
UPDATE: The code below runs in almost the same time as the iterative code.
a = A(:,1);
b = B(:,1:3);
z = size(b);
b = repmat(b,[1,1,numel(a)]);
ab = repmat(permute(a,[2,3,1]),z);
df = permute(sum(ab==b,2),[3,1,2])';
AA = arrayfun(@(x)(B(df(:,x)>0,4:end)),1:size(df,2),'UniformOutput',0);
AA = arrayfun(@(x)(reshape(AA{1,x}',1,numel(AA{1,x}))),1:size(AA,2),'UniformOutput',0);
maxLength = max(arrayfun(@(x)(numel(AA{1,x})),1:size(AA,2)));
out2 = cell2mat(arrayfun(@(x,i)((cat(2,A(i,:),AA{1,x},zeros(1,maxLength-length(AA{1,x}))))),1:numel(AA),1:size(A,1),'UniformOutput',0));
How about this:
%# example data
A = [1 2
3 4];
B = [1 2 4 5 4
1 2 3 5 3
1 1 1 1 2
3 4 5 6 5];
%# rename for clarity & reshape for algorithm's convenience
needle = permute(A(:,1), [2 3 1]);
haystack = B(:,1:3);
data = B(:,4:end).';
%# Get the relevant rows of 'haystack' for each entry in 'needle'
inds = any(bsxfun(@eq, haystack, needle), 2);
%# Create data that should be appended to A
%# All data and functionality in this loop is local and static, so speed
%# should be optimal.
append = zeros( size(A,1), numel(data) );
for ii = 1:size(inds,3)
newrow = data(:,inds(:,:,ii));
append(ii,1:numel(newrow)) = newrow(:);
end
%# Now append to A, stripping unneeded zeros
A = [A append(:, ~all(append==0,1))]
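On MATLAB R2016b or later (and in Octave), the bsxfun call can be replaced by implicit expansion. A compact sketch of the same idea under that assumption; hit and out are illustrative names, and a genuine all-zero data column would be stripped along with the padding, as in the code above.

```matlab
A = [1 2
     3 4];
B = [1 2 4 5 4
     1 2 3 5 3
     1 1 1 1 2
     3 4 5 6 5];

% hit(r, n) is true when row r of B(:,1:3) contains A(n,1)
hit  = squeeze(any(B(:,1:3) == permute(A(:,1), [2 3 1]), 2));
data = B(:,4:end).';                 % one column per row of B

out = zeros(size(A,1), numel(data)); % widest possible result, zero padded
for ii = 1:size(A,1)
    newrow = data(:, hit(:,ii));     % data of all matching rows of B
    out(ii, 1:numel(newrow)) = newrow(:);
end

% Drop the all-zero padding columns and append to A
out = [A, out(:, any(out ~= 0, 1))];
```

For the example data, out is [1 2 5 4 5 3 1 2; 3 4 5 3 6 5 0 0], the result given in the question.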

How to concatenate two datasets if not all the variables are the same?

I have two datasets that have different number of columns:
ds1:
A B C
1 2 3
ds2:
A C D
2 3 4
Now I want to merge these two datasets:
result:
A B C D
1 2 3 0
2 0 3 4
As you can see, I just want to add 0, NaN or a blank if the variable names are not present in both datasets. I tried to use cat and join, but I can't figure out how to do it. Any hints?
Here is an ugly way to do it - and then a cleaner way below (added later). Problem is that as soon as you are working with cell arrays (since the data type is mixed - letters for the columns, then numbers) life gets hard. You can probably do better by creating a structure where column names and data are two separate arrays (see below)... but for now here is "a solution". I made life a little bit more interesting by having different numbers of rows in the two datasets as well as different numbers of columns - just to make sure that didn't break something.
ds1 = {'a','bb','c';1,2,3};
ds2 = {'aa','c','d', 'e';2,3,4,5; 5,6,7,8};
cols = unique({ds1{1,:} ds2{1,:}});
ds3 = cols;
n1 = size(ds1,1) - 1;
%%
for ii = 1:size(ds1,2)
ci = find(cellfun(@(x) isequal(x, ds1{1,ii}), cols));
if numel(ci) > 0
for jj = 1:n1
ds3{1+jj,ci} = ds1{1+jj, ii};
end
end
end
n2 = size(ds2, 1) - 1;
for ii = 1:size(ds2,2)
ci = find(cellfun(@(x) isequal(x, ds2{1,ii}), cols));
if numel(ci) > 0
for jj = 1:n2
ds3{1+n1+jj,ci} = ds2{1+jj, ii};
end
end
end
The resulting merged array:
'a' 'aa' 'bb' 'c' 'd' 'e'
[1] [] [ 2] [3] [] []
[] [ 2] [] [3] [4] [5]
[] [ 5] [] [6] [7] [8]
Not optimal, I'm sure - but it does what you asked... I hate doing this in loops but couldn't see a way around it. I hope one of the "real Matlab experts" will puke when he sees this and be spurred into giving you the clever one line answer.
EDIT I thought about this some more, and came up with a much more efficient algorithm:
% assuming column headers and data are in two separate arrays
ds1headers = {'a','bb','c'};
ds1data = [1 2 3; 2 3 4];
ds2headers = {'aa','c','d', 'e'};
ds2data = [2 3 4 5; 3 4 5 6; 4 5 6 7];
% as before, find unique column headers:
cols = unique({ds1headers{:} ds2headers{:}});
% convert to column numbers:
ds1conv = cellfun(@(x)find(ismember(cols, x)), ds1headers);
ds2conv = cellfun(@(x)find(ismember(cols, x)), ds2headers);
% now conversion is easy:
n1 = size(ds1data,1);
n2 = size(ds2data,1);
ds3data = zeros(n1+n2, numel(cols));
ds3data(1:n1, ds1conv) = ds1data;
ds3data(n1+(1:n2), ds2conv) = ds2data;
disp(cols)
disp(ds3data)
The result is
'a' 'aa' 'bb' 'c' 'd' 'e'
1 0 2 3 0 0
2 0 3 4 0 0
0 2 0 3 4 5
0 3 0 4 5 6
0 4 0 5 6 7
Looks like it would do the trick, and no ugly loops... I recognize now that this looks a little bit like @Magla's solution below (I hadn't seen it when I posted my update, but it was clearly there before my latest edit), except that I still have a cell array for the column names, plus a few other improvements.
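Applied to the headers from the question, and filling with NaN instead of 0 (the question allows either), the same idea can be sketched as follows. It uses the second output of ismember to get the column positions directly; loc1, loc2 and merged are illustrative names.

```matlab
ds1headers = {'A','B','C'};  ds1data = [1 2 3];
ds2headers = {'A','C','D'};  ds2data = [2 3 4];

cols = union(ds1headers, ds2headers);     % sorted unique headers: A B C D

% Position of each dataset's headers within the merged header list
[~, loc1] = ismember(ds1headers, cols);
[~, loc2] = ismember(ds2headers, cols);

n1 = size(ds1data, 1);
n2 = size(ds2data, 1);
merged = NaN(n1 + n2, numel(cols));       % NaN marks "variable not present"
merged(1:n1, loc1)      = ds1data;
merged(n1+(1:n2), loc2) = ds2data;
```

Here merged is [1 2 3 NaN; 2 NaN 3 4], with columns in the order given by cols.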
I would go for something like this. It fills the final matrix with zeros.
%examples (ABCD are replaced by indexes 1234)
A = [1 2 3; 11 12 13];
B = [1 3 5 8; 111 112 113 114];
%first mix the first rows of A and B
header = union(A(1,:), B(1,:))
%find the corresponding indexes in A and B
[Lia,LocbA] = ismember(A(1,:),header);
[Lia,LocbB] = ismember(B(1,:),header);
%concatenate the second rows of A and B
C = header
C(2,LocbA) = A(2,:);
C(3,LocbB) = B(2,:);
Results:
A =
1 2 3
11 12 13
B =
1 3 5 8
111 112 113 114
C =
1 2 3 5 8
11 12 13 0 0
111 0 112 113 114
EDIT: the code initially provided works with cells too (see below for an example). In this case, it fills the final cell array with empty cells. Contrary to @Floris's solution, the datasets to be merged are composed of both the column headers (first row) and the data (second row). I guess the data format you have will suit one of the two solutions.
%input modification (now with cells)
A = {'A' 'B' 'C'; 11 12 13};
B = {'A' 'C' 'E' 'H'; 111 112 113 114};
Results:
C =
'A' 'B' 'C' 'E' 'H'
[ 11] [12] [ 13] [] []
[111] [] [112] [113] [114]
First we create the example:
% Create test datasets:
A=1;
B=2;
C=3;
save db1
A=2;
clear B;
D=4;
save db2
clear;
Now the script would look more or less like:
% Your script starts here, replace your paths with the correct paths:
path_to_db1 = 'db1';
path_to_db2 = 'db2';
db1 = load(path_to_db1);
db2 = load(path_to_db2);
merge = db1;
for field = fieldnames(db1)'
field = field{1};
if isfield(db2,field)
merge.(field) = [merge.(field);db2.(field)];
else
merge.(field) = [merge.(field);0];
end
end
for field = fieldnames(db2)'
field = field{1};
if ~isfield(db1,field)
merge.(field) = [0;db2.(field)];
end
end
clear db1 db2;
The output:
>> merge.A
ans =
1
2
>> merge.B
ans =
2
0
>> merge.C
ans =
3
3
>> merge.D
ans =
0
4
But you may want them to be free variables on the workspace, not on the merge struct, so you may add the following code:
for field = fieldnames(merge)'
field=field{1};
eval(sprintf('%s = merge.%s;',field,field));
end