If I create a table with:
t = table(magic(3));
I get a table with a single variable name.
However if I:
a = magic(3);
T = array2table(a);
Then I get a table with three variable names.
If I try to group the columns by assigning a single variable name to the table:
T.Properties.VariableNames = {'OneName'};
I get the error: "The VariableNames property must contain one name for each variable in the table."
In the second situation, there is an option to combine the columns into one column manually, by highlighting the columns and right-clicking.
How can I programmatically group the three variables into one variable, as in the first example, if I have already created the matrix a?
EDIT:
*as in the first example if I already created the table a?
I am using R2017b
Based on the comment below, I am asking how to do mergevars prior to R2018a.
In the above example, I would be able to group them into one variable with:
t = table(a);
In other words, I hoped to create multiple multicolumn variables. In other-other words, to do mergevars prior to R2018a.
Once the table T has been created with a variable name for each column, the column values could be extracted, and then assigned back to T:
b = T{:, 1:2};
c = T{:, 3};
T = table(b, c);
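Putting it together, here is a minimal self-contained sketch of this mergevars-style grouping, including naming the merged variable (the names 'Merged' and 'Last' are made up for illustration):

```matlab
% Sketch: grouping table variables pre-R2018a (variable names are hypothetical)
a = magic(3);
T = array2table(a);                 % three variables: a1, a2, a3
b = T{:, 1:2};                      % extract columns 1-2 as a numeric matrix
c = T{:, 3};
T = table(b, c, 'VariableNames', {'Merged', 'Last'});  % one multicolumn variable + one scalar variable
```

The 'VariableNames' option avoids ending up with the generic names b and c in the result.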
Related
I have a table with several sets of columns that share suffixes, for example: column_foo
I need to update all columns that have the suffix "foo" at once, without having to list them after SET in my command line.
So, basically, if I were to write the command line literally, it'd be something like this:
"Update myTable set (all columns which name contains 'foo') = value where id = 1".
How do I achieve that?
Thanks in advance
I am working in MATLAB and I have not yet found a way to split a table T into different tables {T1,T2,T3,...} dynamically. By "dynamic" I mean that the split must be based on conditions of the table T that are not known a priori. For now, I do it in a non-dynamic way with the following code (I hard-code the number of tables I want):
%% Separate data of table T in tables T1,T2,T3
starting_index = 1;
T1 = T(1:counter_simulations(1),:);
starting_index = counter_simulations(1)+1;
T2 = T(starting_index:starting_index+counter_simulations(2)-1,:);
starting_index = starting_index + counter_simulations(2);
T3 = T(starting_index:starting_index+counter_simulations(3)-1,:);
Any ideas on how to do it dynamically? I would like to do something like that:
for i = 1:number_of_tables_to_create
    T{i} = ...
end
EDIT: the variable counter_simulations is an array containing the number of rows I want to extract for each table. Example: counter_simulations(1)=200 means that the first table will be T1 = T(1:200, :). If counter_simulations(2)=300, the second table will be T2 = T(counter_simulations(1)+1 : counter_simulations(1)+300, :), and so on.
I hope I was clear.
Should I use cell arrays instead of tables maybe?
Thanks!
For the example you give, where counter_simulations contains a list of the number of rows to take from T in each of the output tables, MATLAB's mat2cell function actually implements this behaviour directly:
T = mat2cell(T,counter_simulations);
While you haven't specified the contents of counter_simulations, it's clear that if sum(counter_simulations) > height(T) the example would fail. If sum(counter_simulations) < height(T) (and so your desired output doesn't contain the last row(s) of T) then you would need to add a final element to counter_simulations and then discard the resulting output table:
counter_simulations(end+1) = height(T) - sum(counter_simulations);
T = mat2cell(T,counter_simulations);
T(end) = [];
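As a self-contained sketch of this padding approach (the data below is made up, and depending on your release you may need to pass the column distribution to mat2cell explicitly, as done here):

```matlab
% Sketch: split a 10-row table into chunks of 4, 3 and 2 rows, discarding the rest
T = array2table(rand(10, 3));
counter_simulations = [4; 3; 2];                                    % rows per output table
counter_simulations(end+1) = height(T) - sum(counter_simulations);  % pad to full height
Ts = mat2cell(T, counter_simulations, width(T));                    % cell array of sub-tables
Ts(end) = [];                                                       % drop the leftover row
% Ts{1} is T(1:4,:), Ts{2} is T(5:7,:), Ts{3} is T(8:9,:)
```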
Whether this solution covers every case of "some conditions of the table T that are not known a priori", as you put it in the question, depends on the range of conditions you actually mean; for a broad enough interpretation there will be no general solution, but you may be able to narrow it down if mat2cell performs too specific a job for your actual problem.
I have multiple tables stored in a structure. I would like to merge all of them. The number of rows differs, but the number of columns is the same; the common key is always in the first column.
For two tables the join is easy, but it's a little trickier with multiple ones. How can I achieve this?
The best thing I can think of is to poll your current workspace and see what variables currently exist. Then, for each variable, if it's a structure, concatenate it onto a larger structure. When you're done, you'll have one large structure that combines all of them. This requires the use of whos and, unfortunately, eval:
%// Larger structure initialization
largeStruct = [];
%// Get all variable names currently in workspace
vars = whos;
%// For each variable...
for ii = 1 : numel(vars)
%// If this is a structure, and if the variable is not any of
%// the current structure, the automatic variable answer and
%// the current variable storing our variable names...
if strcmpi(vars(ii).class, 'struct') && ~any(strcmpi(vars(ii).name, {'largeStruct', 'ans', 'vars'}))
%// Concatenate to the larger structure
largeStruct = eval(['[largeStruct ' vars(ii).name '];']);
end
end
BTW, using eval is considered bad practice. I had to use it because of the current state of your workspace. Consider using a single structure that stores all of these nested structures, where the fields are the actual variable names themselves... something like stock.stockQuotes_070715, stock.stockQuotes_070815, etc. If you did it this way, we wouldn't have to use eval at all.
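A minimal sketch of that eval-free design, assuming the datasets live as fields of one struct (the field names and contents below are made up):

```matlab
% Sketch: concatenate struct-array datasets stored as fields of one struct,
% using dynamic field names instead of eval
stock.stockQuotes_070715 = struct('price', 1);
stock.stockQuotes_070815 = struct('price', 2);

largeStruct = [];
names = fieldnames(stock);
for ii = 1:numel(names)
    largeStruct = [largeStruct, stock.(names{ii})];  %#ok<AGROW> dynamic field access
end
```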
I would poll the workspace, put all the datasets in a cell array, use cellfun to convert to tables, and then use a recursive outerjoin function like this:
tablecell = {Table1, Table2, Table3, ...}
tables = outerjoinmultiple(tablecell)
function table = outerjoinmultiple(tables)
if size(tables, 2) == 1
    table = tables{1};
else
    t2 = outerjoinmultiple(tables(2:end));
    table = outerjoin(tables{1}, t2, 'MergeKeys', true);
end
end
In MATLAB I have read in a table from a CSV file, then moved the two columns I am interested in into a new table. These columns are "ID" (of a person, 1-400) and another ID representing their occupation (1-12).
What I want to do is create a simple table with 12 records and 2 columns: a record for each job, with the number of user IDs who have that job aggregated/summed; such a table could easily be bar-charted. At the moment I have 400 user records, each with an ID and one of the 12 possible job IDs.
So, much like an SQL aggregate/sum function, but I want to do it in MATLAB with a table object. The problem I am having is finding how to do this without using a cell array or something similar.
Thanks!
I know that you found an answer yourself, but I would like to mention the histc function, which avoids the loop (and is faster for larger matrices):
JobCounts = histc(OccupationTable(:,2), 1:NumberOfJobs);
Combining this with the job number gives the desired result:
result = [(1:NumberOfJobs)' JobCounts];
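On newer releases (R2014b and later), histcounts is the recommended replacement for histc; a sketch of the same count with made-up data:

```matlab
% Sketch: count how many of 400 people hold each of 12 job IDs (data is synthetic)
NumberOfJobs = 12;
jobIDs = randi(NumberOfJobs, 400, 1);                % made-up occupation column
JobCounts = histcounts(jobIDs, 1:NumberOfJobs+1)';   % edges 1..13 give 12 integer bins
result = [(1:NumberOfJobs)' JobCounts];              % [job ID, count] per row
```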
Never mind, solved it. I just looped through the job numbers and ran "sum" where the ID was equal to what I wanted:
for i = 1:NumberOfJobs
    JobCounts(i,:) = sum(OccupationTable(:,2) == i);
end
I want to store an ID and a date, and I want to retrieve all entries from dateA up to dateB. What exactly do I need to be able to perform select from my_column_family where date >= dateA and date < dateB; ?
The guys at #cassandra (IRC) helped me find a way; there are many subtle details, so I'd like to document them here.
First you need to declare a column family similar to this (examples from cassandra-cli):
create column family users with comparator=UTF8Type and key_validation_class=UTF8Type and column_metadata=[
{column_name: id, validation_class: LongType},
{column_name: name, validation_class: UTF8Type, index_type: KEYS},
{column_name: age, validation_class: LongType}
];
A few important things about this declaration:
the comparator and key_validation_class are there to be able to use strings as key names
the first declared column is special: it's the "row key", which is used to address each row and therefore cannot contain duplicate values (an INSERT is really an UPSERT, so when there are duplicates the new values overwrite the old ones)
the second column declares a "secondary index" on its values (more on that below)
the dates are stored as Long datatypes, interpretation is up to the client
Now let's add some values:
set users[1][name] = john;
set users[1][age] = 19;
set users[2][name] = jane;
set users[2][age] = 21;
set users[3][name] = john;
set users[3][age] = 32;
According to http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/, Cassandra does not support the < operator on its own: what it does is manually exclude the rows that don't match, but only AFTER there is a result set, and it refuses to do so unless actual filtering has taken place.
What that means is that a query like get users where age > 20; will return null, but if we add a predicate that includes = it will magically work.
Here's where the secondary index is important: without it you can't use =, so in this example I can do get users where name = jane; but I cannot ask for get users where age = 21;
The funny thing is that after using = the < works, so having a secondary index allows you to ask for get users where name = john and age > 20; and it will filter correctly.
There are a few ways to solve this. The simplest is probably the secondary index solution with the equality limitation mentioned in your own answer. I've used this method, adding an additional column called 'valid' and setting its value to 1. Then the queries can become where valid=1 and date>nnnn
The other solutions require additional column families and additional queries.
When loading the data, create and add to a column family which contains the timestamps as keys, and each entry would list all the user ids as column names.
If the partitioning strategy is ordered, then a single RangeSliceQuery can specify the date range as a key range and get all the columns for each key. Then iterate through the result keys, using the column values for each user id and if needed, query the original column family for the data associated with each id. Cassandra always stores the column names sorted, and can be reversed when reading.
But, as documented, the ordered partitioner is not ideal, leading to hot spots and difficulty in load balancing the nodes.
Without the ordered partitioner, still keeping the timestamp column family, you would have to create another column family while loading data where you can store all the timestamps as the columns under one or more known keys (e.g. 'created' or 'updated'). The first query would be a SliceQuery for a known key, and then the column names (as timestamps) would provide the keys for the MultigetSliceQuery to the timestamp column family.
I've used variations on this, usually adding Composite keys or columns for additional flexibility.