How to copy one table column to another respecting row names? - matlab

In the following toy example, tables t1 and t2 have shapes (3 x 0) and (3 x 1), respectively. Furthermore, both tables have the same row names.
>> t1 = table('RowNames', {'a', 'b', 'c'});
>> t2 = table([3 ; 2 ; 1], ...
'RowNames', {'c', 'a', 'b'}, 'VariableNames', {'x'});
Then a copy of t2's single column is added to t1 as a new column, with the same variable name.
>> t1.('x') = t2.('x');
The resulting table t1, however, differs from t2 in the association between row names and the values in the x-column:
>> t1({'a', 'b', 'c'}, :)
ans =
x
_
a 3
b 2
c 1
>> t2({'a', 'b', 'c'}, :)
ans =
x
_
a 2
b 1
c 3
What's the simplest way to assign t2.('x') to t1.('x') "respecting rownames"? By this last condition I mean that the final t1 should look just like t2; e.g.:
>> t1({'a', 'b', 'c'}, :)
ans =
x
_
a 2
b 1
c 3

You can index the table using row names so if you extract the list of rownames from t1 you can use that as the ordering for t2:
order = t1.Properties.RowNames % cell array
intermediate = t2(order, :);
or just do it all in one go:
t2(t1.Properties.RowNames, :);
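For completeness, here is a minimal end-to-end sketch of that idea using the toy tables from the question. The variable name `reordered` is only illustrative; the rest comes from the question.

```matlab
% Same toy tables as in the question.
t1 = table('RowNames', {'a', 'b', 'c'});
t2 = table([3; 2; 1], 'RowNames', {'c', 'a', 'b'}, 'VariableNames', {'x'});

% Reorder t2's rows to follow t1's row names, then copy the column.
reordered = t2(t1.Properties.RowNames, :);
t1.x = reordered.x;

disp(t1)
```

Because the reordering happens before the assignment, the positional copy into t1 now lines up with t1's row names.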

Since t1 doesn't have the x column you can concatenate t1 with column x of t2
>> t1=[t1, t2(:,'x')]
t1 =
x
_
a 2
b 1
c 3
It will automatically take care of matching rows.

OK, this is the OP here.
I found a (potential) answer to my question: instead of
t1.('x') = t2.('x');
use
t1.('x') = t2{t1.Properties.RowNames, 'x'};
I say that this is a "potential" answer because with MATLAB I never know when something that works for a particular type, or under particular circumstances, will generalize. E.g., at this point I don't know if the above will work if column x holds non-numeric values.
If someone knows of a better way, or can point to documentation in support of my lucky guess here, please post it. I'll be glad to accept it as the answer.
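As a quick check of the non-numeric case, here is a small sketch. The column names `label` and `flag` and their contents are made up for illustration; the indexing pattern is the one from the answer above.

```matlab
t1 = table('RowNames', {'a', 'b', 'c'});

% Two non-numeric columns (hypothetical data for illustration).
label = {'three'; 'one'; 'two'};            % cell array of character vectors
flag  = categorical({'hi'; 'lo'; 'mid'});   % categorical column
t2 = table(label, flag, 'RowNames', {'c', 'a', 'b'});

% Brace indexing with row names extracts each column in t1's row order.
rows = t1.Properties.RowNames;
t1.label = t2{rows, 'label'};
t1.flag  = t2{rows, 'flag'};

disp(t1)
```

Brace indexing returns the underlying data of the selected variable, so the same pattern should apply to any column type that can be extracted from a single variable this way.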

Related

Collapsing and averaging redundant entries in MATLAB table

I have the following MATLAB table
item_a item_b score
a b 1
a b 1
b c 3
d e 2
d e 1
d e 0
I want to average the redundant rows. The desired result is as follows:
item_a item_b score
a b (1+1)/2
b c 3
d e (2+1+0)/3
This is a classic scenario for the findgroups, split-apply workflow.
Given your table named t:
% Find mean values.
G = findgroups(t.item_a, t.item_b); % group on both key columns
meanValues = splitapply(@mean, t.score, G);
% Create new table.
[~,i] = unique(G);
newTable = t(i,:);
newTable.score = meanValues
newTable contains the desired table.
See this documentation page for more examples.
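As an alternative sketch (not the method used in the answer above), varfun with the 'GroupingVariables' option can do the grouping and averaging in one call. The table below is rebuilt from the question's data, assuming the item columns are cell arrays of character vectors.

```matlab
% Rebuild the example table from the question (assumed column types).
item_a = {'a'; 'a'; 'b'; 'd'; 'd'; 'd'};
item_b = {'b'; 'b'; 'c'; 'e'; 'e'; 'e'};
score  = [1; 1; 3; 2; 1; 0];
t = table(item_a, item_b, score);

% Average score within each unique (item_a, item_b) pair.
avg = varfun(@mean, t, ...
    'InputVariables', 'score', ...
    'GroupingVariables', {'item_a', 'item_b'});

disp(avg)
```

The result has one row per unique pair, with the extra variables GroupCount and mean_score.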
This is what I got. You can tweak the final result. There is a similar example in the MATLAB documentation. The two key functions are accumarray and unique. Note that this solution works only for array inputs, not cell data types; with some data type conversion it can be adapted to tables and cell arrays. Otherwise, I think a for loop will be necessary.
items = ['a' 'b'
'a' 'b'
'b' 'c'
'd' 'e'
'd' 'e'
'd' 'e' ];
scores = [1 1 3 2 1 0]';
[items_unique,ia,ic] = unique(items,'rows');
score_mean = accumarray(ic, scores, [], @mean);
result = {items_unique score_mean};

How can the value of one field be looked up using the values of other fields?

I have a Table as follows:
c1 c2 c3 .....
r1 x A 4
r2 y B 5
r3 z C 2
.
.
.
r1, r2, r3 are the row labels and c1, c2, c3 are the column labels. Given values for c1 and c2, I want the corresponding value of c3. For example, given y and B, I want to get 5.
I have read about the find and sub2ind functions, but I do not know how to use them in this case.
You can use simple logical indexing to accomplish this. You want the third column where the value of the first column is 'y' and the value of the second column is 'B':
t = table({'x'; 'y'; 'z'}, {'A'; 'B'; 'C'}, [4; 5; 2], 'VariableNames', {'c1', 'c2', 'c3'});
value = t.c3(ismember(t.c1, 'y') & ismember(t.c2, 'B'))
% 5
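A slightly more defensive variant, as a sketch: it uses strcmp for the comparison and checks that exactly one row matches before reading the value. The error identifier `lookup:notUnique` is a made-up name for illustration.

```matlab
t = table({'x'; 'y'; 'z'}, {'A'; 'B'; 'C'}, [4; 5; 2], ...
    'VariableNames', {'c1', 'c2', 'c3'});

% Logical mask of rows where both key columns match.
mask = strcmp(t.c1, 'y') & strcmp(t.c2, 'B');

% Guard against zero or multiple matches before extracting the value.
if nnz(mask) == 1
    value = t.c3(mask);
else
    error('lookup:notUnique', ...
        'Expected exactly one matching row, found %d.', nnz(mask));
end
```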

Why can a matlab table variable be changed using the old value, but a cell string can not?

I can rename a matlab table variable using its current value. E.g. with a table x this works ok:
x.Properties.VariableNames{'Value'} = 'New_Variable_Name'; % this works
So why doesn't the same thing work with a cell array of strings?
y = {'aa', 'bb'};
y{'bb'} = 'cc'; % this does not work
The right hand side of this assignment has too few values to satisfy the left hand side.
What is the reason? The two objects appear to have (or at least, return) the same class (cell).
>> class(x.Properties.VariableNames)
ans =
cell
>> class(y)
ans =
cell
Is this behaviour specific to matlab tables?
Yes, this is specific to tables. It doesn't work on cell arrays because they are indexed with a numeric index from 1 to numel(cellstr) rather than by their values.
To see why it might be a problem to index by values, consider what you would expect to happen in the following case -
y = {'a', 'b', 'b'};
y{'b'} = 'c';
Do you expect {'a', 'c', 'c'} or {'a', 'c', 'b'} to be the result?
Note that if you want behaviour like this, you can do the string comparison manually -
>> y = {'aa', 'bb'};
>> y{ strcmp(y, 'bb') } = 'cc';
>> y
y =
'aa' 'cc'
or, if you want to update multiple values simultaneously,
>> y = {'aa', 'bb', 'bb'};
>> y( strcmp(y, 'bb') ) = {'cc'};
>> y
y =
'aa' 'cc' 'cc'
I believe the answer lies within matlab.internal.table.parseArg, which is used in table.m. When you write:
x.Properties.VariableNames{'Value'} = 'New_Variable_Name';
A string comparison is run on Value. I believe this is used to find the index of Value in VariableNames, enabling you to set the element to New_Variable_Name.
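Whatever the internal mechanism is, the observable effect matches a manual name-to-index lookup. The sketch below does that explicitly; the table contents are made up, and this is an illustration of the equivalent steps, not the actual internals of table.m.

```matlab
% Hypothetical two-column table for illustration.
x = table([1; 2], [3; 4], 'VariableNames', {'Key', 'Value'});

% Resolve the name to a numeric position, then assign by position.
names = x.Properties.VariableNames;
idx = find(strcmp(names, 'Value'));
names{idx} = 'New_Variable_Name';
x.Properties.VariableNames = names;

disp(x.Properties.VariableNames)
```

Variable names in a table are unique, so the lookup always resolves to a single position; a plain cell array has no such guarantee, which is the ambiguity described in the other answer.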

Reshape Matlab table

I have the following table
name = ['A' 'A' 'A' 'B' 'B' 'C' 'C' 'C' 'C' 'D' 'D' 'E' 'E' 'E']';
value = randn(14, 1);
T = table(name, value);
i.e.
T =
name value
____ _________
A 0.0015678
A -0.76226
A 0.98404
B -1.0942
B 0.71249
C 1.688
C 1.4001
C -0.9278
C -1.3725
D 0.11563
D 0.076776
E 1.0568
E 1.1972
E 0.29037
I want to transform it in the following way: for each distinct value in name, take the first two entries of value and put them into a 5x2 matrix. The rows of this matrix correspond to the names A, B, C, D, E and the columns to the values, e.g. the first two rows are
0.0015678 -0.76226
-1.0942 0.71249
This can be done with accumarray using a custom function. The first step is to convert the name column of T into a numeric vector; and then accumarray can be applied.
This approach requires T being sorted according to column 1, because only in this case is accumarray guaranteed to preserve order (as indicated in its documentation). So if T may not be sorted (although it is in your example), sort it first using sortrows.
T = sortrows(T, 1); %// you can remove this line if T is guaranteed to be sorted
[~, ~, names] = unique(T(:,1)); %// names as a numeric vector
result = cell2mat(accumarray(names, T.value, [], @(x) {x([1 2]).'}));
First figure out where each name has values located in the table, then cycle through each name and place the first two values encountered for each name into individual cell arrays. Once you're done, reshape the matrix to 5 x 2 as you have said. As such, do something like this:
names = unique(T.name); %// 1
ind = arrayfun(@(x) find(T.name == x), names, 'uni', 0); %// 2
vals = cellfun(@(x) T.value(x(1:2)), ind, 'uni', 0); %// 3
m = [vals{:}].'; %// 4
Let's go through each line of code slowly.
Line #1
The first line finds all unique names through unique and we store them into names.
Line #2
The next line goes through all of the unique names and finds those locations / rows in the table that share that particular name. I use arrayfun and go through each name in names, find those rows that share the same name as one we are looking for, and place those row locations into individual cells; these are stored into ind. To find the locations of each valid name in our table, I use find and the locations are placed into a column vector. As such, we will have five column vectors where each column vector is placed into an individual cell. These column vectors will tell us which rows match a particular name located in your table.
Line #3
The next line uses cellfun to go through each of the cells in ind and extracts the first two row locations that share a particular name, indexes into the value field for your table to pull those two values, and these are placed as two-element vectors into individual cells for each name.
Line #4
The last line of code simply unrolls each two-element vector. The first two elements of each name get stored into columns. To get them into rows, I simply transpose the unrolling. The output matrix is stored into m.
If you want to see what the output looks like, this is what I get when I run the above code with your example table:
m =
0.0016 -0.7623
-1.0942 0.7125
1.6880 1.4001
0.1156 0.0768
1.0568 1.1972
Be advised that I only showed the first 5 digits of precision, so there is some round-off at the end. However, this is only for display purposes, so what I got is equivalent to what you expect for the output.
Hope this helps!
If you want to use the tables, you could try something like this:
count = 1;
U = unique(table2array(T(:,1)));
for ii = 1:size(U,1)
A = find(table2array(T(:,1)) == U(ii));
A = A(1:2);
B(count,1:2) = table2array(T(A,2));
count = count + 1;
end
Personally, I would find this simpler to do with your name and value arrays and forget about the table. If it is a requirement then I understand, however I will provide my solution still. It may provide some insight either way.
count = 1;
U = unique(name);
for ii = 1:size(U,1)
A = find(name == U(ii));
A = A(1:2);
B(count,1:2) = value(A);
count = count + 1;
end
Quick and dirty, but hopefully it's good enough. Good luck.
Another solution that is more manageable and easily scalable exists. Since MATLAB R2013b you can use a specialized function for pivoting a table (which is what you want to do): unstack.
In order to get exactly what you wanted, you need to add an extra variable to your table that will indicate replications:
name = ['A' 'A' 'A' 'B' 'B' 'C' 'C' 'C' 'C' 'D' 'D' 'E' 'E' 'E']';
value = randn(14, 1);
rep = [1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2, 1, 2, 3]'; % column vector, to match name and value
T = table(name, value, rep);
T =
name value rep
____ _________ ___
A 0.53767 1
A 1.8339 2
A -2.2588 3
B 0.86217 1
B 0.31877 2
C -1.3077 1
C -0.43359 2
C 0.34262 3
C 3.5784 4
D 2.7694 1
D -1.3499 2
E 3.0349 1
E 0.7254 2
E -0.063055 3
Then you just use unstack like this:
pivotTable = unstack(T, 'value','name')
pivotTable =
rep A B C D E
___ _______ _______ ________ _______ _________
1 0.53767 0.86217 -1.3077 2.7694 3.0349
2 1.8339 0.31877 -0.43359 -1.3499 0.7254
3 -2.2588 NaN 0.34262 NaN -0.063055
4 NaN NaN 3.5784 NaN NaN
Afterwards, it's a matter of re-arranging the table if you still want to.
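A sketch of that last step, together with one way to build rep automatically instead of typing it by hand. The data here is deterministic (1 to 14 instead of randn) so the result is easy to check, and name is assumed to be a cell array of character vectors; both are choices made for this illustration.

```matlab
name  = {'A';'A';'A';'B';'B';'C';'C';'C';'C';'D';'D';'E';'E';'E'};
value = (1:14)';

% Number the repetitions within each name: 1, 2, 3, ... per group.
[~, ~, g] = unique(name);
rep = zeros(size(g));
for k = 1:max(g)
    idx = (g == k);
    rep(idx) = 1:nnz(idx);
end

% Pivot, make sure rows are ordered by rep, then keep the first two
% repetitions and transpose so that each name becomes a row.
T = table(name, value, rep);
pivotTable = sortrows(unstack(T, 'value', 'name'), 'rep');
M = pivotTable{1:2, {'A', 'B', 'C', 'D', 'E'}}.';

disp(M)
```

M is then the 5x2 matrix asked for in the question, with one row per name.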
The easiest way is to first convert the table into a matrix form and then reshape it by using the "reshape" function in Matlab.
matrix = t{:,:};% t-- your table variable
reshape_matrix = reshape(matrix ,[2,3]) % [2,3]--> the size of the matrix you desire
These two steps can be done by one line of code
reshape_matrix = reshape(t{:,:},[2,3]);

Efficient way of mapping similar inputs to similar outputs

Is there an efficient way of approaching this particular problem in MATLAB?
I am trying to map this matrix (or possibly array) BeansRice (see below)
Beans={0:1,0:1,0:2,0:2,0:2,0:2,0:1,0:1,0:2,0:2}
[a b c d e f g h i j ] = ndgrid(Beans{:})
BeansRice = [a(:) b(:) c(:) d(:) e(:) f(:) g(:) h(:) i(:) j(:)]
into a matrix/array BR (see below)
BR=[abc, de, fg, hij];
where the rules are as follows. If columns a, b and c all have value 0 (a tie, broken by preference), my preference is c>b>a. If columns a, b and c all have value 1 (a tie, no preference), BR(1)=1. If columns a and b have value 0 and column c has value 2, BR(1)=2. If columns a and b have value 1 and column c has value 2, BR(1)=1.
I have an if statement with indexing, but I was wondering whether it can be improved by using the rank/order of the values in the matrix to break ties. I am looking for a more efficient approach, as this is only one part of a larger problem.
You can use logical indexing instead of if conditions. For example
BR1 = zeros(size(a)); % preallocate so BR1 has one entry per combination
BR1(a==1 & b==1 & c==1)=1
BR1(a==0 & b==0 & c==2)=2
BR1(a==1 & b==1 & c==2)=1
...
then process the other parts, BR2(d==... & e>...)=##, then concatenate to obtain what you need
BR=[BR1(:) BR2(:) ...]
etc...
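A reduced sketch of the logical-indexing approach, restricted to columns a, b and c and to the three rules the question states with an explicit result. Combinations not covered by those rules are left as NaN here, which is an assumption of this sketch and not part of the question.

```matlab
% Only the first three factors, to keep the example small.
Beans = {0:1, 0:1, 0:2};
[a, b, c] = ndgrid(Beans{:});
a = a(:); b = b(:); c = c(:);

% Preallocate with NaN so unassigned combinations are easy to spot.
BR1 = nan(size(a));

% Rules quoted from the question.
BR1(a == 1 & b == 1 & c == 1) = 1;
BR1(a == 0 & b == 0 & c == 2) = 2;
BR1(a == 1 & b == 1 & c == 2) = 1;

% Show each combination next to its mapped value.
disp([a b c BR1])
```

Each assignment touches all matching rows at once, so no loop or if statement is needed; the remaining rules would be added as further masked assignments.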