Join tables with non-equal rows in Matlab - matlab

I'm trying to use the relatively new data type in Matlab, table. I have a number of variables that each contains a value for a set of parameters (Rows). The rows are not (necessarily) equal for each variable, however. I want to join the variables together so the results are all displayed in a single table. E.g., I want to join these together: (drawn side by side to save space)
          Var_A                  Var_B
         ________               _______
    a     0.36744          b    0.88517
    b     0.98798          c    0.91329
    c    0.037739          d    0.79618
Is it possible to join these two tables?
Here's an example of what I'm trying to do:
A = table(rand(3,1),'VariableNames',{'Var_A'},'RowNames',{'a','b','c'})
B = table(rand(3,1),'VariableNames',{'Var_B'},'RowNames',{'b','c','d'})
try
C = join(A,B)
catch e
disp(e.identifier)
disp(e.message)
end
This results in:
MATLAB:table:join:CantInferKey
Cannot find a common table variable to use as a key variable.
Okay, so maybe join isn't intended for this -- what about outerjoin? Its documentation sounds promising:
The outer join includes the rows that match between A and B, and also unmatched rows from either A or B, all with respect to the key variables. C contains all variables from both A and B, including the key variables.
Well, outerjoin apparently can't be used with tables with row names! This is the closest I've found that does what I want, but seems to be against the idea of the table data structure to some degree:
AA = table({'a';'b';'c'},rand(3,1));
AA.Properties.VariableNames = {'param','Var_A'}
BB = table({'b';'c';'d'},rand(3,1));
BB.Properties.VariableNames = {'param','Var_B'}
CC = outerjoin(AA,BB,'Keys',1,'MergeKeys',true)
This results in
param Var_A Var_B
_____ _______ _______
'a' 0.10676 NaN
'b' 0.65376 0.77905
'c' 0.49417 0.71504
'd' NaN 0.90372
I.e., the row is just stored as a separate variable. This means it can't be indexed using "logical" notation such as CC{'a',:}.
So this can be fixed with:
CCC = CC(:,2:end);
CCC.Properties.RowNames = CC{:,1}
Which finally results in:
CCC =
Var_A Var_B
_______ ________
a 0.4168 NaN
b 0.65686 0.29198
c 0.62797 0.43165
d NaN 0.015487
But is this really the best way to go about things? Matlab is weird.

There must be a better way to do this, but here is another option:
clear;
%// Create two tables to play with.
tableA = table([.5; .6; .7 ],'variablenames',{'varA'},'rowname',{'a','b','c'});
tableB = table([.55; .62; .68],'variablenames',{'varB'},'rowname',{'b','c','d'});
%// Let's add rows to tableA so that it has the same rows as tableB
%// First, get the set difference of tableB rows and tableA rows
%// Then, make a new table with those rows and NaN for data.
%// Finally, concatenate tableA with the new table
tableAnewRows=setdiff(tableB.Properties.RowNames,tableA.Properties.RowNames);
tableAadd=table( nan(length(tableAnewRows),1) ,'variablenames',{'varA'},'rownames',tableAnewRows);
tableA=[tableA;tableAadd];
%// Let's add rows to tableB so that it has the same rows as tableA
tableBnewRows=setdiff(tableA.Properties.RowNames,tableB.Properties.RowNames);
tableBadd=table( nan(length(tableBnewRows),1) ,'variablenames',{'varB'},'rownames',tableBnewRows);
tableB=[tableB;tableBadd];
%// Form tableC from tableA and tableB. Could also use join().
tableC=[tableA tableB];

Related

How to reference a column in the select clause in the order clause in SQLAlchemy like you do in Postgres instead of repeating the expression twice

In Postgres, if one of your columns is a big complicated expression, you can just say ORDER BY 3 DESC, where 3 is the position of that column in the select list. Is there any way to do this in SQLAlchemy?
As Gord Thompson observes in this comment, you can pass the column index as a text object to group_by or order_by:
q = sa.select(sa.func.count(), tbl.c.user_id).group_by(sa.text('2')).order_by(sa.text('2'))
serialises to
SELECT count(*) AS count_1, posts.user_id
FROM posts GROUP BY 2 ORDER BY 2
There are other techniques that don't require re-typing the expression.
You could use the selected_columns property:
q = sa.select(tbl.c.col1, tbl.c.col2, tbl.c.col3)
q = q.order_by(q.selected_columns[2]) # order by col3
You could also order by a label (but this will affect the names of result columns):
q = sa.select(tbl.c.col1, tbl.c.col2, tbl.c.col3.label('c')).order_by('c')
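For anyone who wants to try this end to end, here is a minimal sketch that runs the positional and the label variants against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or later; the posts table and its rows are made up for illustration and are not from the question.

```python
import sqlalchemy as sa

# Hypothetical table and data, used only for illustration.
metadata = sa.MetaData()
posts = sa.Table(
    "posts",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("user_id", sa.Integer),
)

engine = sa.create_engine("sqlite://")  # in-memory SQLite database
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(
        sa.insert(posts),
        [{"user_id": u} for u in (1, 2, 2, 3, 3, 3)],
    )

# Order by the first selected column (the count) by position, descending.
by_position = (
    sa.select(sa.func.count(), posts.c.user_id)
    .group_by(posts.c.user_id)
    .order_by(sa.text("1 DESC"))
)

# Same ordering, but referring to the expression through a label.
by_label = (
    sa.select(sa.func.count().label("n"), posts.c.user_id)
    .group_by(posts.c.user_id)
    .order_by(sa.desc("n"))
)

with engine.connect() as conn:
    rows_by_position = [tuple(r) for r in conn.execute(by_position)]
    rows_by_label = [tuple(r) for r in conn.execute(by_label)]
```

Both queries should return the same rows here. The positional form passes raw SQL text through, so it silently changes meaning if the select list is reordered; the label form does not have that problem.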

Is there a way to create a table with multi-line column names?

I am attempting to create a table that has the following format of multi line heading for the columns
|Col1 Co2 Col3|
|Col1 Co2 Col3|
I tried this using the documentation example, adding a | between the first and second lines of each name, but it did not work:
T = table(categorical({'M';'F';'M'}),[45;32;34],...
{'NY';'CA';'MA'},logical([1;0;0]),...
'VariableNames',{'Gender|Gender2','Age|Age2','State|State2','Vote|Vote2'})
I am using R2018b student edition
The ability to have arbitrary variable names in tables was added in release R2019b of MATLAB. Using that release, your code works as expected and produces:
T =
3×4 table
Gender|Gender2 Age|Age2 State|State2 Vote|Vote2
______________ ________ ____________ __________
M 45 {'NY'} true
F 32 {'CA'} false
M 34 {'MA'} false
However, in your question you state that you want multi-line variables. You can make these in R2019b, but the display collapses the newline character into a ↵, like this:
>> T = table(1, 'VariableNames', {['a', newline, 'b']})
T =
table
a↵b
___
1
If it's just the display you're after, you could consider making nested tables, like this:
t1 = table(1);
t2 = table(2);
T = table(t1, t2)
which results in:
T =
1×2 table
t1 t2
Var1 Var1
____ ____
1 2
Note that that final approach works in R2019a and prior releases.
No can do. In releases before R2019b, valid variable names for tables follow the same rules as other variable names in Matlab: they cannot contain \n (newline) or anything other than letters, digits, and underscores.

Filter Table by Range

I have a parent table and a child table. The parent table only lists ranges of attributes. I'm looking to merge the two to create a proper hierarchy, but I need a way to filter the child table by the parent range first, I believe.
Here is a sample of the parent table:
parent_item start_attribute end_attribute
A 10 120
B 130 130
C 140 200
And the child table:
child_item child_attribute
U 10
V 50
W 60
X 130
Y 140
Z 150
The output table I'd be looking for is such:
parent_item child_item
A U
A V
A W
B X
C Y
C Z
To further confuse things, the attributes are alphanumeric, which I believe rules out using a List.Generate() function. I think I'm looking for something similar to the EARLIER() function in DAX, but I'm not sure I'm even looking at this problem the right way. Here is my pseudocode as I'd see it working:
Table.AddColumn(
#"parent_table",
"child_item",
each
Table.SelectRows(
child_table,
each ([child_attribute] <= EARLIER(end_attribute) and [child_attribute]>= EARLIER(start_attribute) )
)
)
This is a simplification as the child table actually contains five attributes and the parent table contains five respective attribute ranges.
I found this blog post, which held the key to referencing the current row environment. The main takeaway is this:
Each is a keyword to create simple functions. Each is an abbreviation for (_) =>, in which the underscore represents (if you are in a table environment, as we are) the current row.
Using an explicit function with parameter C for the rows of the child table, we can write
= Table.AddColumn(#"parent_table", "child_table", each
Table.SelectRows(Child, (C) =>
C[child_attribute] >= [start_attribute] and
C[child_attribute] <= [end_attribute]))
or more explicitly as
= Table.AddColumn(#"parent_table", "child_table", (P) =>
Table.SelectRows(Child, (C) =>
C[child_attribute] >= P[start_attribute] and
C[child_attribute] <= P[end_attribute]))
Once you add this column, just expand the child_item column from your new child_table column.
One possible approach is to do a full cross join and then filter out the rows you don't want.
Create a custom column on both tables with a constant value of, say, 1.
Merge the Child table into the Parent table matching on the new column.
Expand out the Child table, so that every parent row is paired with every child row.
Create a custom column with all your desired logic. For example,
if [child_attribute] >= [start_attribute] and
[child_attribute] <= [end_attribute]
then 1
else 0
Filter out just the 1 values in this new column.
Remove all other columns except for parent_item and child_item.
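To make the logic of these steps concrete, here is a rough sketch in plain Python rather than Power Query. It only illustrates the cross-join-then-filter idea on the sample data from the question; the variable and function names are my own, and it assumes numeric attributes, so alphanumeric attributes would need a comparison that matches your ordering rules.

```python
from itertools import product

# Sample rows copied from the question's tables.
parent_table = [
    {"parent_item": "A", "start_attribute": 10, "end_attribute": 120},
    {"parent_item": "B", "start_attribute": 130, "end_attribute": 130},
    {"parent_item": "C", "start_attribute": 140, "end_attribute": 200},
]
child_table = [
    {"child_item": "U", "child_attribute": 10},
    {"child_item": "V", "child_attribute": 50},
    {"child_item": "W", "child_attribute": 60},
    {"child_item": "X", "child_attribute": 130},
    {"child_item": "Y", "child_attribute": 140},
    {"child_item": "Z", "child_attribute": 150},
]


def in_range(parent, child):
    """Return True when the child's attribute lies within the parent's range."""
    return (
        parent["start_attribute"]
        <= child["child_attribute"]
        <= parent["end_attribute"]
    )


# Cross join: pair every parent row with every child row.
cross_joined = list(product(parent_table, child_table))

# Filter the pairs, then keep only the two columns of interest.
result = [
    (parent["parent_item"], child["child_item"])
    for parent, child in cross_joined
    if in_range(parent, child)
]
```

The intermediate table has one row per parent/child pair (18 here), so this approach gets expensive as both tables grow. With five attribute ranges, the in_range check would simply test all five conditions.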

Filter on parts of words in Matlab tables

Similar to Excel, I need to find out how to filter out rows of a table that do not contain a certain string.
For example, I need only rows that contain the letters "MX". Within the sheet, there are rows with strings like ZMX01, MX002, and US001. I would want the first two rows.
This seems like a simple question, so I am surprised I couldn't find any help for this!
It is similar to the question Filter on words in Matlab tables (as in Excel)
You may not find a lot of information on tables in MATLAB, as they were introduced with version R2013b, which is not that long ago. So, about your question: let's first create a sample table:
% Create a sample table
col1 = {'ZMX01'; 'MX002'; 'US001'};
col2 = {5;7;3};
T = table(col1, col2);
T =
col1 col2
_______ ____
'ZMX01' [5]
'MX002' [7]
'US001' [3]
Now, MATLAB provides the rowfun function to apply any function to each row in a table. By default, the function you call has to be able to work on all columns of the table.
To only apply rowfun to one column, you can use the 'InputVariables' parameter, which lets you specify either the number of the column (e.g. 2 for the second column) or the name of the column (e.g. 'myColumnName').
Then, you can set 'OutputFormat' to 'uniform' to get an array and not a new table as output.
In your case, you'll want to use strfind on the column 'col1'. The return value of strfind is either an empty array (if 'MX' wasn't found), or an array of all indices where 'MX' was found.
% Apply rowfun
idx = rowfun(@(x)strfind(x,'MX'), T, 'InputVariables', 'col1', 'OutputFormat', 'uniform');
The output of this will be
idx =
[2]
[1]
[]
i.e. a 3-by-1 cell array, which is empty for 'US001' and contains a positive value for both other inputs. To create a subset of the table with this data, we can do the following:
% Create logical array, which is true for all rows to keep.
idx = ~cellfun(@isempty, idx);
% Save these rows and all columns of the table into a new table
R = T(idx,:);
And finally, we have our resulting table R:
R =
col1 col2
_______ ____
'ZMX01' [5]
'MX002' [7]

Select all values from 2 tables

I have three code-first tables: A, B, and a join table (a one-to-many relationship linking A and B). I would like to get all the results from tables A and B, without duplicates, and return them as a SelectList:
var a = from s in db.A
join ss in db.B on s.ps_id equals ss.ss_id
orderby s.ps_label
select new SelectListItem
{
Text = s.ps_id.ToString(),
Value = s.ps_label
};
return a;
This only returns results from the A table, but not from B as well. What is wrong, and what is your advice for best practice and performance for this?