Reshape Matlab table - matlab

I have the following table
name = ['A' 'A' 'A' 'B' 'B' 'C' 'C' 'C' 'C' 'D' 'D' 'E' 'E' 'E']';
value = randn(14, 1);
T = table(name, value);
i,e.
T =
name value
____ _________
A 0.0015678
A -0.76226
A 0.98404
B -1.0942
B 0.71249
C 1.688
C 1.4001
C -0.9278
C -1.3725
D 0.11563
D 0.076776
E 1.0568
E 1.1972
E 0.29037
I want to transform it in the following way: take the first two cells in value corresponding to different values in name and put it in the 5x2 matrix. This matrix would have rows corresponding to different names A,B,C,D,E and columns corresponding to values, e.g. the first two rows are
0.0015678 -0.76226
-1.0942 0.71249

This can be done with accumarray using a custom function. The first step is to convert the name column of T into a numeric vector; and then accumarray can be applied.
This approach requires T being sorted according to column 1, because only in this case is accumarray guaranteed to preserve order (as indicated in its documentation). So if T may not be sorted (although it is in your example), sort it first using sortrows.
T = sortrows(T, 1); %// you can remove this line if T is guaranteed to be sorted
[~, ~, names] = unique(T(:,1)); %// names as a numeric vector
result = cell2mat(accumarray(names, T.value, [], #(x) {x([1 2]).'}));

First figure out where each name has values located in the table, then cycle through each name and place the first two values encountered for each name into individual cell arrays. Once you're done, reshape the matrix to 5 x 2 as you have said. As such, do something like this:
names = unique(T.name); %// 1
ind = arrayfun(#(x) find(T.name == x), names, 'uni', 0); %// 2
vals = cellfun(#(x) T.value(x(1:2)), ind, 'uni', 0); %// 3
m = [vals{:}].'; %// 4
Let's go through each line of code slowly.
Line #1
The first line finds all unique names through unique and we store them into names.
Line #2
The next line goes through all of the unique names and finds those locations / rows in the table that share that particular name. I use arrayfun and go through each name in names, find those rows that share the same name as one we are looking for, and place those row locations into individual cells; these are stored into ind. To find the locations of each valid name in our table, I use find and the locations are placed into a column vector. As such, we will have five column vectors where each column vector is placed into an individual cell. These column vectors will tell us which rows match a particular name located in your table.
Line #3
The next line uses cellfun to go through each of the cells in ind and extracts the first two row locations that share a particular name, indexes into the value field for your table to pull those two values, and these are placed as two-element vectors into individual cells for each name.
Line #4
The last line of code simply unrolls each two-element vector. The first two elements of each name get stored into columns. To get them into rows, I simply transpose the unrolling. The output matrix is stored into m.
If you want to see what the output looks like, this is what I get when I run the above code with your example table:
m =
0.0016 -0.7623
-1.0942 0.7125
1.6880 1.4001
0.1156 0.0768
1.0568 1.1972
Be advised that I only showed the first 5 digits of precision so there is some round-off at the end. However, this is only for display purposes and so what I got is equivalent to what your expect for the output.
Hope this helps!

If you want use the tables, you could try something like this:
count = 1;
U = unique(table2array(T(:,1)));
for ii = 1:size(U,1)
A = find(table2array(T(:,1)) == U(ii));
A = A(1:2);
B(count,1:2) = table2array(T(A,2));
count = count + 1;
end
Personally, I would find this simpler to do with your name and value arrays and forget about the table. If it is a requirement then I understand, however I will provide my solution still. It may provide some insight either way.
count = 1;
U = unique(name);
for ii = 1:size(U,1)
A = find(name == U(ii));
A = A(1:2);
B(count,1:2) = value(A);
count = count + 1;
end
Quick and dirty, but hopefully it's good enough. Good luck.

Another solution that is more manageable and easily scalable exists. Since MATLAB R2013b you can use a specialized function for pivoting a table (which is what you want to do): unstack.
In order to get exactly what you wanted, you need to add an extra variable to your table that will indicate replications:
name = ['A' 'A' 'A' 'B' 'B' 'C' 'C' 'C' 'C' 'D' 'D' 'E' 'E' 'E']';
value = randn(14, 1);
rep = [1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2, 1, 2, 3];
T = table(name, value, rep);
T =
name value rep
____ _________ ___
A 0.53767 1
A 1.8339 2
A -2.2588 3
B 0.86217 1
B 0.31877 2
C -1.3077 1
C -0.43359 2
C 0.34262 3
C 3.5784 4
D 2.7694 1
D -1.3499 2
E 3.0349 1
E 0.7254 2
E -0.063055 3
Then you just use unstack like this:
pivotTable = unstack(T, 'value','name')
pivotTable =
rep A B C D E
___ _______ _______ ________ _______ _________
1 0.53767 0.86217 -1.3077 2.7694 3.0349
2 1.8339 0.31877 -0.43359 -1.3499 0.7254
3 -2.2588 NaN 0.34262 NaN -0.063055
4 NaN NaN 3.5784 NaN NaN
Afterwards, it's a matter of re-arranging the table if you still want to.

The easiest way is to first convert the table into a matrix form and then reshape it by using the "reshape" function in Matlab.
matrix = t{:,:};% t-- your table variable
reshape_matrix = reshape(matrix ,[2,3]) % [2,3]--> the size of the matrix you desire
These two steps can be done by one line of code
reshape_matrix = reshape(t{:,:},[2,3]);

Related

Collapsing and averaging redundant entries in MATLAB table

I have the following MATLAB table
item_a item_b score
a b 1
a b 1
b c 3
d e 2
d e 1
d e 0
I want to average the redundant rows. The desired result is as follows:
item_a item_b score
a b (1+1)/2
b c 3
d e (2+1+0)/3
This is a classic scenario for the findgroups, split-apply workflow.
Given your table named t:
% Find mean values.
G = findgroups(t.item_a);
meanValues = splitapply(#mean,t.score,G);
% Create new table.
[~,i] = unique(G);
newTable = t(i,:);
newTable.score = meanValues
newTable contains the desired table.
See this documentation page for more examples.
This is what I got. You can tweak with the final results. There is a similar example on MATLAB documentation. Here are two key functions, accumarray and unique. Note that this solution works only for array inputs not cell data types. By manipulating data types, you can also find the solution for table and cell data types. Otherwise, I think for loop will be necessary.
items = ['a' 'b'
'a' 'b'
'b' 'c'
'd' 'e'
'd' 'e'
'd' 'e' ];
scores = [1 1 3 2 1 0]';
[items_unique,ia,ic] = unique(items,'rows');
score_mean = accumarray(ic,scores, [], #mean);
result = {items_unique score_mean};

Reshape by Certain Name in Matlab

I have a cell-matrix like the below,
the first column: A, B, C
the second column: A, B, D
the third column: 1, 1, 1
Which means that A and A has 1 unit, B and B has 1 unit and C and D has one unit
How can I conveniently create the following matrix (mat) in matlab?
[Name,A,B,C,D
A,1,NA,NA,NA
mat = B,NA,1,NA,NA
C,NA,NA,NA,NA
D,NA,NA,NA,1]
I think I can use a loop to achieve that, but actually the dimension is much larger than the example able. How can I do that?
A, B, C, D here are characters, if the matrix cannot contain both numeric and character, I can remove the first column and the first row in mat. Also actually the first matrix containing the relationship of A, B, C, D is a 3*3 cell.
Desired output can be produced:
mycell = {...
'A' , 'B' , 'C'
'A' , 'B' , 'D'
1 , 1 , 1};
[U, ~, IDX] = unique(mycell(1:2,:));
IDX = reshape(IDX,2,[]).';
mat = accumarray(IDX,[mycell{3,:}],[],[],NA);
so U is names of rows/columns
IDX shows indices of rows and columns of mat that contain non NA values
accumarray is the function that can convert indices and values to desired matrix
Now you can access elements of mat by numeric indices i.e. mat(1,3) that represent relationship of A and C
If you want to work with character indices instead of numeric ones you can do this:
index=cell2struct(num2cell(1:numel(U)),U,2);
mat(index.A, index.C)
index is a struct that maps characters to numbers. Also you can use index this way:
mat(index.('A'), index.('C'))

Shifting repeating rows to a new column in a matrix

I am working with a n x 1 matrix, A, that has repeating values inside it:
A = [0;1;2;3;4; 0;1;2;3;4; 0;1;2;3;4; 0;1;2;3;4]
which correspond to an n x 1 matrix of B values:
B = [2;4;6;8;10; 3;5;7;9;11; 4;6;8;10;12; 5;7;9;11;13]
I am attempting to produce a generalised code to place each repetition into a separate column and store it into Aa and Bb, e.g.:
Aa = [0 0 0 0 Bb = [2 3 4 5
1 1 1 1 4 5 6 7
2 2 2 2 6 7 8 9
3 3 3 3 8 9 10 11
4 4 4 4] 10 11 12 13]
Essentially, each repetition from A and B needs to be copied into the next column and then deleted from the first column
So far I have managed to identify how many repetitions there are and copy the entire column over to the next column and then the next for the amount of repetitions there are but my method doesn't shift the matrix rows to columns as such.
clc;clf;close all
A = [0;1;2;3;4;0;1;2;3;4;0;1;2;3;4;0;1;2;3;4];
B = [2;4;6;8;10;3;5;7;9;11;4;6;8;10;12;5;7;9;11;13];
desiredCol = 1; %next column to go to
destinationCol = 0; %column to start on
n = length(A);
for i = 2:1:n-1
if A == 0;
A = [ A(:, 1:destinationCol)...
A(:, desiredCol+1:destinationCol)...
A(:, desiredCol)...
A(:, destinationCol+1:end) ];
end
end
A = [...] retrieved from Move a set of N-rows to another column in MATLAB
Any hints would be much appreciated. If you need further explanation, let me know!
Thanks!
Given our discussion in the comments, all you need is to use reshape which converts a matrix of known dimensions into an output matrix with specified dimensions provided that the number of elements match. You wish to transform a vector which has a set amount of repeating patterns into a matrix where each column has one of these repeating instances. reshape creates a matrix in column-major order where values are sampled column-wise and the matrix is populated this way. This is perfect for your situation.
Assuming that you already know how many "repeats" you're expecting, we call this An, you simply need to reshape your vector so that it has T = n / An rows where n is the length of the vector. Something like this will work.
n = numel(A); T = n / An;
Aa = reshape(A, T, []);
Bb = reshape(B, T, []);
The third parameter has empty braces and this tells MATLAB to infer how many columns there will be given that there are T rows. Technically, this would simply be An columns but it's nice to show you how flexible MATLAB can be.
If you say you already know the repeated subvector, and the number of times it repeats then it is relatively straight forward:
First make your new A matrix with the repmat function.
Then remap your B vector to the same size as you new A matrix
% Given that you already have the repeated subvector Asub, and the number
% of times it repeats; An:
Asub = [0;1;2;3;4];
An = 4;
lengthAsub = length(Asub);
Anew = repmat(Asub, [1,An]);
% If you can assume that the number of elements in B is equal to the number
% of elements in A:
numberColumns = size(Anew, 2);
newB = zeros(size(Anew));
for i = 1:numberColumns
indexStart = (i-1) * lengthAsub + 1;
indexEnd = indexStart + An;
newB(:,i) = B(indexStart:indexEnd);
end
If you don't know what is in your original A vector, but you do know it is repetitive, if you assume that the pattern has no repeats you can use the find function to find when the first element is repeated:
lengthAsub = find(A(2:end) == A(1), 1);
Asub = A(1:lengthAsub);
An = length(A) / lengthAsub
Hopefully this fits in with your data: the only reason it would not is if your subvector within A is a pattern which does not have unique numbers, such as:
A = [0;1;2;3;2;1;0; 0;1;2;3;2;1;0; 0;1;2;3;2;1;0; 0;1;2;3;2;1;0;]
It is worth noting that from the above intuitively you would have lengthAsub = find(A(2:end) == A(1), 1) - 1;, But this is not necessary because you are already effectively taking the one off by only looking in the matrix A(2:end).

find row indices of different values in matrix

Having matrix A (n*2) as the source and B as a vector containing a subset of elements A, I'd like to find the row index of items.
A=[1 2;1 3; 4 5];
B=[1 5];
F=arrayfun(#(x)(find(B(x)==A)),1:numel(B),'UniformOutput',false)
gives the following outputs in a cell according to this help page
[2x1 double] [6]
indicating the indices of all occurrence in column-wise. But I'd like to have the indices of rows. i.e. I'd like to know that element 1 happens in row 1 and row 2 and element 5 happens just in row 3. If the indices were row-wise I could use ceil(F{x}/2) to have the desired output. Now with the variable number of rows, what's your suggested solution? As it may happens that there's no complete inclusion tag 'rows' in ismember function does not work. Besides, I'd like to know all indices of specified elements.
Thanks in advance for any help.
Approach 1
To convert F from its current linear-index form into row indices, use mod:
rows = cellfun(#(x) mod(x-1,size(A,1))+1, F, 'UniformOutput', false);
You can combine this with your code into a single line. Note also that you can directly use B as an input to arrayfun, and you avoid one stage of indexing:
rows = arrayfun(#(x) mod(find(x==A)-1,size(A,1))+1, B(:), 'UniformOutput', false);
How this works:
F as given by your code is a linear index in column-major form. This means the index runs down the first column of B, the begins at the top of the second column and runs down again, etc. So the row number can be obtained with just a modulo (mod) operation.
Approach 2
Using bsxfun and accumarray:
t = any(bsxfun(#eq, B(:), reshape(A, 1, size(A,1), size(A,2))), 3); %// occurrence pattern
[ii, jj] = find(t); %// ii indicates an element of B, and jj is row of A where it occurs
rows = accumarray(ii, jj, [], #(x) {x}); %// group results according to ii
How this works:
Assuming A and B as in your example, t is the 2x3 matrix
t =
1 1 0
0 0 1
The m-th row of t contains 1 at column n if the m-th element of B occurs at the n-th row of B. These values are converted into row and column form with find:
ii =
1
1
2
jj =
1
2
3
This means the first element of B ocurrs at rows 1 and 2 of A; and the second occurs at row 3 of B.
Lastly, the values of jj are grouped (with accumarray) according to their corresponding value of ii to generate the desired result.
One approach with bsxfun & accumarray -
%// Create a match of B's in A's with each column of matches representing the
%// rows in A where there is at least one match for each element in B
matches = squeeze(any(bsxfun(#eq,A,permute(B(:),[3 2 1])),2))
%// Get the indices values and the corresponding IDs of B
[indices,B_id] = find(matches)
%// Or directly for performance:
%// [indices,B_id] = find(any(bsxfun(#eq,A,permute(B(:),[3 2 1])),2))
%// Accumulate the indices values using B_id as subscripts
out = accumarray(B_id(:),indices(:),[],#(x) {x})
Sample run -
>> A
A =
1 2
1 3
4 5
>> B
B =
1 5
>> celldisp(out) %// To display the output, out
out{1} =
1
2
out{2} =
3
With arrayfun,ismember and find
[r,c] = arrayfun(#(x) find(ismember(A,x)) , B, 'uni',0);
Where r gives your desired results, you could also use the c variable to get the column of each number in B
Results for the sample input:
>> celldisp(r)
r{1} =
1
2
r{2} =
3
>> celldisp(c)
c{1} =
1
1
c{2} =
2

MATLAB: Conditional summation

I have two arrays of the following form:
v1 = [ 1 2 3 4 5 6 7 8 9 ... ]
c2 = { 'a' 'a' 'a' 'b' 'b' 'c' 'c' 'c' 'c' ... }
(all values are examples only, no pattern can be assumed in the real data. v1 and c2 have the same size)
I want to obtain a vector containing the summation of the components of v1 corresponding to equal values in c2. In the example above, the first component of the resulting vector would be 1+2+3, the second 4+5, and so on.
I know I can do it in a loop of the form:
uni_c2 = unique(c2);
result = zeros(size(uni_c2));
for i = 1:numel(uni_c2)
result(i) = sum( v1(strcmp(uni_c2(i),c2)) );
end
Is there a single command or a vectorized way of doing the same operation?
You can do this in two lines:
[b, m, n] = unique(c2)
result = accumarray(n', v1)
The elements of result correspond to the strings in the cell array b.
This is vectorized but a bad idea for very large vectors. For some problems a "vectorized" solution is worse than a for loop.
>> v1 = [ 1 2 3 4 5 6 7 8 9];
>> c2 = 'aaabbcccc'-'a'
c2 =
0 0 0 1 1 2 2 2 2
>> N = repmat(c2',1,max(c2)-min(c2)+1) == repmat([min(c2):max(c2)],size(c2,2),1);
>> v1*N
ans =
6 9 30
I think a very general (and vectorized) solution is something like this:
v1 = [ 1 2 3 4 5 6 7 8 9 ]
c2 = { 'a' 'a' 'a' 'b' 'b' 'c' 'c' 'c' 'c' }
uniqueValuesInC2 = unique(c2);
conditionalSumOfV1 = #(x)(sum(v1(strcmp(c2, x))));
result = cellfun(conditionalSumOfV1, uniqueValuesInC2)
Perhaps my solution needs a bit of an explanation to the untrained eye:
So first you actually need to compute the different possible values in c2, which is done by unique.
The conditionalSumOfV1 function takes an argument x, it compares every element in c2 with x and selects the corresponding elements in v1 and sums them.
Finally cellfun is comparable to a foreach construct in some other languages: the function conditionalSum is evaluated for every value in the cell array you provide (in this case: every unique value in c2) and stores it in the output array. For other types of container variables (arrays, structs), MATLAB has equivalent foreach-like constructs: arrayfun, structfun.
This will work for contents of c2 that are longer than a single character and it does not require a large repmat operation as stardt's solution. I do however have my doubts when it comes to long arrays where c2 has only a few duplicate values., but I guess that will be a hard case for most algorithms. If you are in such a case, you might need to take a look at the extra outputs of unique or write your own alternative to unique (i.e. write for loops, preferably in a compiled language/MEX).