Joining data from different cell arrays in MATLAB

I have data in Matlab that is in cell array format with columns representing different items. The cell arrays have different columns, as in the following example:
a = {'A', 'B', 'C' ; 1, 1, 1; 2, 2, 2 }
a =
'A' 'B' 'C'
[1] [1] [1]
[2] [2] [2]
b = {'C', 'D'; 3, 3; 4, 4}
b =
'C' 'D'
[3] [3]
[4] [4]
I would like to be able to join the different cell arrays in the following manner:
c =
'A' 'B' 'C' 'D'
[1] [1] [1] [NaN]
[2] [2] [2] [NaN]
[NaN] [NaN] [3] [3]
[NaN] [NaN] [4] [4]
In the real example I have hundreds of columns and several rows, so creating a new cell array manually is not an option for me.

If you were willing to store your data in dataset arrays (or convert them to dataset arrays for this purpose), you could do the following:
>> d1
d1 =
A B C
1 1 1
2 2 2
>> d2
d2 =
C D
3 3
4 4
>> join(d1,d2,'Keys','C','type','outer','mergekeys',true)
ans =
A B C D
1 1 1 NaN
2 2 2 NaN
NaN NaN 3 3
NaN NaN 4 4
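If you are on a release that has the table type (R2013b or later, as far as I know), a similar join can be sketched without the Statistics Toolbox. This is only a sketch under that assumption; the variable names ta, tb and tc are made up for the illustration, and the cell arrays are the a and b from the question:

```matlab
% Cell arrays from the question: first row holds the column names
a = {'A', 'B', 'C'; 1, 1, 1; 2, 2, 2};
b = {'C', 'D'; 3, 3; 4, 4};

% Convert each cell array to a table, using the first row as variable names
ta = cell2table(a(2:end,:), 'VariableNames', a(1,:));
tb = cell2table(b(2:end,:), 'VariableNames', b(1,:));

% Full outer join on the shared column C; MergeKeys keeps a single C column
tc = outerjoin(ta, tb, 'Keys', 'C', 'MergeKeys', true);

% Convert back to a cell array with a header row, if that format is needed
c = [tc.Properties.VariableNames; table2cell(tc)];
```

Missing numeric entries come back as NaN, which matches the layout asked for in the question.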

I'm assuming you want to join the two arrays based on their first row only.
% get the list of all keys
keys = unique([a(1,:) b(1,:)]);
lena = size(a,1)-1; lenb = size(b,1)-1;
% allocate space for the joined array
joined = cell(lena+lenb+1, length(keys));
joined(1,:) = keys;
% add a
tf = ismember(keys, a(1,:));
joined(2:(2+lena-1),tf) = a(2:end,:);
% add b
tf = ismember(keys, b(1,:));
joined((lena+2):(lena+lenb+1),tf) = b(2:end,:);
This will give you the joined array, except that it has empty cells instead of NaNs. I hope this is OK.
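If NaNs are required, the empty cells can be filled in afterwards. Here is a minimal sketch of the same idea that also uses the second output of ismember, so that it does not depend on the headers of a and b already being in sorted order (that change is mine, not part of the code above):

```matlab
a = {'A', 'B', 'C'; 1, 1, 1; 2, 2, 2};
b = {'C', 'D'; 3, 3; 4, 4};

% Union of all column names (sorted) and the number of data rows
keys = unique([a(1,:) b(1,:)]);
lena = size(a,1) - 1;
lenb = size(b,1) - 1;

% Allocate the joined array and write the header row
joined = cell(lena + lenb + 1, numel(keys));
joined(1,:) = keys;

% Position of each column of a and b within the joined header
[~, ia] = ismember(a(1,:), keys);
[~, ib] = ismember(b(1,:), keys);
joined(2:lena+1, ia) = a(2:end,:);
joined(lena+2:end, ib) = b(2:end,:);

% Replace the cells that were never assigned with NaN
joined(cellfun(@isempty, joined)) = {NaN};
```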

Here is my solution, adapted from an older answer to a similar question (simply transpose rows/columns):
%# input cell arrays
a = {'A', 'B', 'C' ; 1, 1, 1; 2, 2, 2 };
b = {'C', 'D'; 3, 3; 4, 4};
%# transpose rows/columns
a = a'; b = b';
%# get all key values, and convert them to indices starting at 1
[allKeys,~,ind] = unique( [a(:,1);b(:,1)] );
indA = ind(1:size(a,1));
indB = ind(size(a,1)+1:end);
%# merge the two datasets (key,value1,value2)
c = cell(numel(allKeys), size(a,2)+size(b,2)-1);
c(:) = {NaN}; %# fill with NaNs
c(:,1) = allKeys; %# available keys from both
c(indA,2:size(a,2)) = a(:,2:end); %# insert 1st dataset values
c(indB,size(a,2)+1:end) = b(:,2:end); %# insert 2nd dataset values
Here is the result (transposed to match original orientation):
>> c'
ans =
'A' 'B' 'C' 'D'
[ 1] [ 1] [1] [NaN]
[ 2] [ 2] [2] [NaN]
[NaN] [NaN] [3] [ 3]
[NaN] [NaN] [4] [ 4]
Also, here is the solution using the DATASET class from the Statistics Toolbox (with a and b in their original orientation, i.e. before the transpose above):
aa = dataset([cell2mat(a(2:end,:)) a(1,:)])
bb = dataset([cell2mat(b(2:end,:)) b(1,:)])
cc = join(aa,bb, 'Keys',{'C'}, 'type','fullouter', 'MergeKeys',true)
with
cc =
A B C D
1 1 1 NaN
2 2 2 NaN
NaN NaN 3 3
NaN NaN 4 4

Related

List cell contents in two columns of MATLAB cell array

I'm trying to display the contents of a cell array, which contain two columns, in a nice two column format in the command window.
tmp = [1:10]';
a{:,1} = tmp;
a{:,2} = dec2hex(tmp);
celldisp(a)
I would like the output to have the decimal values in the first column and hex values in the second column. Unfortunately I get:
celldisp(a)
a{1} =
1
2
3
4
5
6
7
8
9
10
a{2} =
1
2
3
4
5
6
7
8
9
A
I am trying to get something that looks more like this:
I also tried the table function but this gave:
Use num2cell to place each element of a into a separate cell.
disp([num2cell(a{1}) num2cell(a{2})]);
%Output:
% [ 1] '1'
% [ 2] '2'
% [ 3] '3'
% [ 4] '4'
% [ 5] '5'
% [ 6] '6'
% [ 7] '7'
% [ 8] '8'
% [ 9] '9'
% [10] 'A'
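One caveat: num2cell on a char matrix puts every single character in its own cell, so the line above only gives two columns while the hex values have one digit. A sketch that should also cope with longer hex strings uses cellstr instead (the range 1:20 is just an assumed example):

```matlab
tmp = (1:20)';

% dec2hex returns a char matrix with one row per number;
% cellstr turns each row into one cell, regardless of its width
out = [num2cell(tmp) cellstr(dec2hex(tmp))];
disp(out)
```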

Join time series in matlab and replace missing data points with NaN [duplicate]

I have two matrices like the following ones:
'01/01/2010' 1
'02/01/2010' 2
'03/01/2010' 3
'05/01/2010' 11
'06/01/2010' 17
'01/01/2010' 4
'02/01/2010' 5
'04/01/2010' 6
'05/01/2010' 7
, and after doing a few tricky things in MATLAB, I want to create the following three matrices:
'01/01/2010' 1 4
'02/01/2010' 2 5
'03/01/2010' 3 NaN
'04/01/2010' NaN 6
'05/01/2010' 11 7
'06/01/2010' 17 NaN
'01/01/2010' 1 4
'02/01/2010' 2 5
'05/01/2010' 11 7
Any idea on how to join these tables?
Cheers.
EDIT: Really sorry for my typos, guys. I updated both the question and the input/output data. Please, feel free to provide suggestions.
I believe what you are trying to achieve is called an inner join and a full outer join in the database world.
First we start with the two datasets:
d1 = {
'01/01/2010' 1
'02/01/2010' 2
'03/01/2010' 3
'05/01/2010' 11
'06/01/2010' 17
};
d2 = {
'01/01/2010' 4
'02/01/2010' 5
'04/01/2010' 6
'05/01/2010' 7
};
Here is the code to perform the two types of join:
%# get all possible dates, and convert them to indices starting at 1
[keys,~,ind] = unique( [d1(:,1);d2(:,1)] );
%# full outer join
ind1 = ind(1:size(d1,1));
ind2 = ind(size(d1,1)+1:end);
fullOuterJoin = cell(numel(keys),3);
fullOuterJoin(:) = {NaN}; %# fill with NaNs
fullOuterJoin(:,1) = keys; %# union of dates
fullOuterJoin(ind1,2) = d1(:,2); %# insert 1st dataset values
fullOuterJoin(ind2,3) = d2(:,2); %# insert 2nd dataset values
%# inner join
loc1 = ismember(ind1, ind2);
loc2 = ismember(ind2, ind1);
innerJoin = cell(sum(loc1),3);
innerJoin(:,1) = d1(loc1,1); %# intersection of dates
innerJoin(:,2) = d1(loc1,2); %# insert 1st dataset values
innerJoin(:,3) = d2(loc2,2); %# insert 2nd dataset values
Alternatively, we could have extracted the inner join from the outer join dataset by simply removing rows with any NaN values:
idx = all(~isnan(cell2mat(fullOuterJoin(:,2:end))), 2);
innerJoin = fullOuterJoin(idx,:);
Either way, the result:
>> fullOuterJoin
fullOuterJoin =
'01/01/2010' [ 1] [ 4]
'02/01/2010' [ 2] [ 5]
'03/01/2010' [ 3] [NaN]
'04/01/2010' [NaN] [ 6]
'05/01/2010' [ 11] [ 7]
'06/01/2010' [ 17] [NaN]
>> innerJoin
innerJoin =
'01/01/2010' [ 1] [4]
'02/01/2010' [ 2] [5]
'05/01/2010' [11] [7]
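Note that the inner join code above pairs rows by position, so it assumes the common dates appear in the same order in both inputs. If that is not guaranteed, a sketch based on intersect (my variation, not the code above) matches the rows explicitly:

```matlab
d1 = {'01/01/2010' 1; '02/01/2010' 2; '03/01/2010' 3; '05/01/2010' 11; '06/01/2010' 17};
d2 = {'01/01/2010' 4; '02/01/2010' 5; '04/01/2010' 6; '05/01/2010' 7};

% commonKeys holds the dates present in both sets;
% i1 and i2 are the matching row indices into d1 and d2
[commonKeys, i1, i2] = intersect(d1(:,1), d2(:,1));

% Assemble the inner join from the matched rows
innerJoin = [commonKeys, d1(i1,2), d2(i2,2)];
```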
In MATLAB, you cannot have strings as matrix elements. For that you need to use a cell array. This is a solution using cell arrays and containers.Maps.
FirstCellArray = {
'01/01/2010', 1;
'02/01/2010', 2;
'03/01/2010', 3;
'05/01/2010', 11;
'06/01/2010', 17
};
SecondCellArray = {
'01/01/2010', 4;
'02/01/2010', 5;
'04/01/2010', 6;
'05/01/2010', 7;
};
AllDatesCellArray = union(FirstCellArray(:,1), SecondCellArray(:,1));
% Create containers.Maps for both cell arrays. containers.Maps are hash tables.
DateToFirstNumberMap = containers.Map(FirstCellArray(:,1), FirstCellArray(:,2));
DateToSecondNumberMap = containers.Map(SecondCellArray(:,1), SecondCellArray(:,2));
WithNaNsCellArray = AllDatesCellArray;
for Index = 1:size(WithNaNsCellArray, 1)
    Key = AllDatesCellArray{Index, 1};
    try
        NumberOne = cell2mat(values(DateToFirstNumberMap, cellstr(Key)));
    catch
        NumberOne = NaN;
    end
    WithNaNsCellArray{Index, 2} = NumberOne;
    try
        NumberTwo = cell2mat(values(DateToSecondNumberMap, cellstr(Key)));
    catch
        NumberTwo = NaN;
    end
    WithNaNsCellArray{Index, 3} = NumberTwo;
end
WithoutNaNsCellArray = WithNaNsCellArray;
NaNIndicesVector = (isnan([WithNaNsCellArray{:,2}]) | isnan([WithNaNsCellArray{:,3}]));
WithoutNaNsCellArray(NaNIndicesVector == 1, :) = [];
Then WithNaNsCellArray contains the result with NaN rows and WithoutNaNsCellArray contains the result without NaN rows.
WithNaNsCellArray =
'01/01/2010' [ 1] [ 4]
'02/01/2010' [ 2] [ 5]
'03/01/2010' [ 3] [NaN]
'04/01/2010' [NaN] [ 6]
'05/01/2010' [ 11] [ 7]
'06/01/2010' [ 17] [NaN]
WithoutNaNsCellArray =
'01/01/2010' [ 1] [4]
'02/01/2010' [ 2] [5]
'05/01/2010' [11] [7]
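The try/catch blocks are only there to detect missing keys. A shorter sketch of the same lookup, assuming isKey is acceptable in place of error handling (the shortened input data and the names first, second, m1, m2 and joined are just for illustration):

```matlab
first  = {'01/01/2010', 1; '03/01/2010', 3};
second = {'01/01/2010', 4; '04/01/2010', 6};

% One map per data set: date string -> number
m1 = containers.Map(first(:,1), first(:,2));
m2 = containers.Map(second(:,1), second(:,2));

% Start with every date and NaN in both value columns
allKeys = union(first(:,1), second(:,1));
joined = [allKeys, repmat({NaN}, numel(allKeys), 2)];

% Overwrite the NaNs wherever a map actually contains the date
for k = 1:numel(allKeys)
    key = allKeys{k};
    if isKey(m1, key), joined{k,2} = m1(key); end
    if isKey(m2, key), joined{k,3} = m2(key); end
end
```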
The statistics toolbox contains a function called JOIN that basically does what you want.
http://www.mathworks.de/de/help/stats/dataset.join.html
Unfortunately, it probably can't handle strings and polytyped matrices. But you might be able to use JOIN to shorten the solutions proposed by the other answers.
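For what it is worth, on releases that have tables (R2013b or later, as far as I know) the join functions for tables do accept string keys. A hedged sketch with the data from the question; the column names Date, V1 and V2 are my own choice:

```matlab
d1 = {'01/01/2010' 1; '02/01/2010' 2; '03/01/2010' 3; '05/01/2010' 11; '06/01/2010' 17};
d2 = {'01/01/2010' 4; '02/01/2010' 5; '04/01/2010' 6; '05/01/2010' 7};

% Convert the cell arrays to tables with named columns
t1 = cell2table(d1, 'VariableNames', {'Date', 'V1'});
t2 = cell2table(d2, 'VariableNames', {'Date', 'V2'});

% Full outer join: every date, NaN where one side has no value
tOuter = outerjoin(t1, t2, 'Keys', 'Date', 'MergeKeys', true);

% Inner join: only the dates present in both tables
tInner = innerjoin(t1, t2, 'Keys', 'Date');
```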

Elegant way of stripping the 1st layer of cell array in MATLAB?

I have a 1x2 cell array a such that
a{1, 1} is a 5x1 int array containing [1 2 3 4 5]
a{1, 2} is a 5x1 cell array containing 'aa', 'bb', 'cc', 'dd', 'ee'
What is the most elegant way of stripping the first layer, producing a 5x2 cell array as follows?
1 'aa'
2 'bb'
3 'cc'
4 'dd'
5 'ee'
How about:
% original cell
a = cell(1,2);
a{1} = [1 2 3 4 5];
a{2} = {'aa', 'bb', 'cc', 'dd', 'ee'};
% flattened
aa = reshape([num2cell(a{1}) a{2}], [], 2)
I figured one solution out, but am not sure about its "elegantness".
a = cell(1, 2);
a{1, 1} = [1, 2, 3, 4, 5];
a{1, 2} = {'aa','bb','cc','dd','ee'};
result = [num2cell(a{1, 1})' a{1, 2}']
result =
[1] 'aa'
[2] 'bb'
[3] 'cc'
[4] 'dd'
[5] 'ee'
You can try this code:
a{1, 1} = [1,2,3,4,5];
a{1, 2} = {'aa','bb','cc','dd','ee'};
temp = num2cell(a{1});
b = {temp{:};a{2}{:}}.'
b =
[1] 'aa'
[2] 'bb'
[3] 'cc'
[4] 'dd'
[5] 'ee'
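The answers above build a with 1x5 rows, while the question describes 5x1 columns. A small sketch that should work for either orientation forces both parts into columns with (:) first:

```matlab
% Inputs as described in the question: 5x1 numeric and 5x1 cell
a = {[1; 2; 3; 4; 5], {'aa'; 'bb'; 'cc'; 'dd'; 'ee'}};

% (:) reshapes each part into a column, whatever its original orientation
result = [num2cell(a{1}(:)), a{2}(:)];
```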

How to concatenate two datasets if not all the variables are the same?

I have two datasets that have a different number of columns:
ds1:
A B C
1 2 3
ds2:
A C D
2 3 4
Now I want to merge these two datasets:
result:
A B C D
1 2 3 0
2 3 0 4
As you can see, I just want to add 0, NaN or a blank if the variable names are not present in both datasets. I tried to use cat and join, but I can't figure out how to do it. Any hints?
Here is an ugly way to do it - and then a cleaner way below (added later). Problem is that as soon as you are working with cell arrays (since the data type is mixed - letters for the columns, then numbers) life gets hard. You can probably do better by creating a structure where column names and data are two separate arrays (see below)... but for now here is "a solution". I made life a little bit more interesting by having different numbers of rows in the two datasets as well as different numbers of columns - just to make sure that didn't break something.
ds1 = {'a','bb','c';1,2,3};
ds2 = {'aa','c','d', 'e';2,3,4,5; 5,6,7,8};
cols = unique({ds1{1,:} ds2{1,:}});
ds3 = cols;
n1 = size(ds1,1) - 1;
%%
for ii = 1:size(ds1,2)
    ci = find(cellfun(@(x) isequal(x, ds1{1,ii}), cols));
    if numel(ci) > 0
        for jj = 1:n1
            ds3{1+jj,ci} = ds1{1+jj, ii};
        end
    end
end
n2 = size(ds2, 1) - 1;
for ii = 1:size(ds2,2)
    ci = find(cellfun(@(x) isequal(x, ds2{1,ii}), cols));
    if numel(ci) > 0
        for jj = 1:n2
            ds3{1+n1+jj,ci} = ds2{1+jj, ii};
        end
    end
end
The resulting merged array:
'a' 'aa' 'bb' 'c' 'd' 'e'
[1] [] [ 2] [3] [] []
[] [ 2] [] [3] [4] [5]
[] [ 5] [] [6] [7] [8]
Not optimal, I'm sure - but it does what you asked... I hate doing this in loops but couldn't see a way around it. I hope one of the "real Matlab experts" will puke when he sees this and be spurred into giving you the clever one line answer.
EDIT I thought about this some more, and came up with a much more efficient algorithm:
% assuming column headers and data are in two separate arrays
ds1headers = {'a','bb','c'};
ds1data = [1 2 3; 2 3 4];
ds2headers = {'aa','c','d', 'e'};
ds2data = [2 3 4 5; 3 4 5 6; 4 5 6 7];
% as before, find unique column headers:
cols = unique({ds1headers{:} ds2headers{:}});
% convert to column numbers:
ds1conv = cellfun(@(x)find(ismember(cols, x)), ds1headers);
ds2conv = cellfun(@(x)find(ismember(cols, x)), ds2headers);
% now conversion is easy:
n1 = size(ds1data,1);
n2 = size(ds2data,1);
ds3data = zeros(n1+n2, numel(cols));
ds3data(1:n1, ds1conv) = ds1data;
ds3data(n1+(1:n2), ds2conv) = ds2data;
disp(cols)
disp(ds3data)
The result is
'a' 'aa' 'bb' 'c' 'd' 'e'
1 0 2 3 0 0
2 0 3 4 0 0
0 2 0 3 4 5
0 3 0 4 5 6
0 4 0 5 6 7
Looks like it would do the trick - and no ugly loops... I recognize now that this looks a little bit like @Magla's solution below (hadn't seen it when I posted my update, but it was clearly there before my latest edit) - except I still have a cell array for column names, and a few other improvements.
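Since the question also allows NaN as the filler, here is a sketch of the same approach with two small changes of mine: the matrix is preallocated with nan instead of zeros, and the column positions come from the second output of ismember rather than cellfun:

```matlab
ds1headers = {'a', 'bb', 'c'};
ds1data = [1 2 3; 2 3 4];
ds2headers = {'aa', 'c', 'd', 'e'};
ds2data = [2 3 4 5; 3 4 5 6; 4 5 6 7];

% Sorted union of all column headers
cols = unique([ds1headers ds2headers]);

% Position of each header within cols
[~, ds1conv] = ismember(ds1headers, cols);
[~, ds2conv] = ismember(ds2headers, cols);

% Preallocate with NaN so columns missing from a data set stay NaN
n1 = size(ds1data, 1);
n2 = size(ds2data, 1);
ds3data = nan(n1 + n2, numel(cols));
ds3data(1:n1, ds1conv) = ds1data;
ds3data(n1 + (1:n2), ds2conv) = ds2data;
```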
I would go for something like this. It fills the final matrix with zeros.
%examples (ABCD are replaced by indexes 1234)
A = [1 2 3; 11 12 13];
B = [1 3 5 8; 111 112 113 114];
%first mix the first rows of A and B
header = union(A(1,:), B(1,:))
%find the corresponding indexes in A and B
[Lia,LocbA] = ismember(A(1,:),header);
[Lia,LocbB] = ismember(B(1,:),header);
%concatenate the second rows of A and B
C = header
C(2,LocbA) = A(2,:);
C(3,LocbB) = B(2,:);
Results:
A =
1 2 3
11 12 13
B =
1 3 5 8
111 112 113 114
C =
1 2 3 5 8
11 12 13 0 0
111 0 112 113 114
EDIT: the code initially provided works with cells too (see below for an example). In this case, it fills the final cell array with empty cells. Contrary to @Floris's solution, the datasets to be merged are composed of both the column headers (first row) and the data (second row). I guess the data format you have will suit one of the two solutions.
%input modification (now with cells)
A = {'A' 'B' 'C'; 11 12 13};
B = {'A' 'C' 'E' 'H'; 111 112 113 114};
Results:
C =
'A' 'B' 'C' 'E' 'H'
[ 11] [12] [ 13] [] []
[111] [] [112] [113] [114]
First we create the example:
% Create test datasets:
A=1;
B=2;
C=3;
save db1
A=2;
clear B;
D=4;
save db2
clear;
Now the script would look more or less like:
% Your script starts here, replace your paths with the correct paths:
path_to_db1 = 'db1';
path_to_db2 = 'db2';
db1 = load(path_to_db1);
db2 = load(path_to_db2);
merge = db1;
for field = fieldnames(db1)'
    field = field{1};
    if isfield(db2,field)
        merge.(field) = [merge.(field);db2.(field)];
    else
        merge.(field) = [merge.(field);0];
    end
end
for field = fieldnames(db2)'
    field = field{1};
    if ~isfield(db1,field)
        merge.(field) = [0;db2.(field)];
    end
end
clear db1 db2;
The output:
>> merge.A
ans =
1
2
>> merge.B
ans =
2
0
>> merge.C
ans =
3
3
>> merge.D
ans =
0
4
But you may want them to be free variables on the workspace, not on the merge struct, so you may add the following code:
for field = fieldnames(merge)'
field=field{1};
eval(sprintf('%s = merge.%s;',field,field));
end
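If the goal is mainly to see the merged result as one A/B/C/D block rather than as separate workspace variables, a table view of the struct may be enough and avoids eval. A sketch assuming struct2table is available (R2013b or later, as far as I know); the struct is written out by hand here to match the merge output above:

```matlab
% Same content as the merge struct produced by the script above
merged = struct('A', [1; 2], 'B', [2; 0], 'C', [3; 3], 'D', [0; 4]);

% Each field becomes a table variable; each row is one original dataset
t = struct2table(merged);
disp(t)
```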
