Split a MATLAB table into several tables dynamically

I am working in MATLAB and have not yet found a way to split a table T into several tables {T1,T2,T3,...} dynamically. By dynamic I mean that the split must be based on conditions of the table T that are not known a priori. For now, I do it in a non-dynamic way with the following code (I hard-code the number of tables I want to have).
%% Separate data of table T in tables T1,T2,T3
starting_index = 1;
T1 = T(1:counter_simulations(1),:);
starting_index = counter_simulations(1)+1;
T2 = T(starting_index:starting_index+counter_simulations(2)-1,:);
starting_index = starting_index + counter_simulations(2);
T3 = T(starting_index:starting_index+counter_simulations(3)-1,:);
Any ideas on how to do it dynamically? I would like to do something like this:
for (i=1:number_of_tables_to_create)
T{i} = ...
end
EDIT: the variable counter_simulations is an array containing the number of rows I want to extract for each table. Example: counter_simulations(1)=200 means that the first table will be T1 = T(1:200, :). If counter_simulations(2)=300, the second table will be T2 = T(counter_simulations(1)+1:counter_simulations(1)+300, :), and so on.
I hope I was clear.
Should I use cell arrays instead of tables maybe?
Thanks!

For the example you give, where counter_simulations contains a list of the number of rows to take from T in each of the output tables, MATLAB's mat2cell function actually implements this behaviour directly:
T = mat2cell(T,counter_simulations);
While you haven't specified the contents of counter_simulations, it's clear that if sum(counter_simulations) > height(T) the example would fail. If sum(counter_simulations) < height(T) (and so your desired output doesn't contain the last row(s) of T) then you would need to add a final element to counter_simulations and then discard the resulting output table:
counter_simulations(end+1) = height(T) - sum(counter_simulations);
T = mat2cell(T,counter_simulations);
T(end) = [];
Whether this solution applies to all examples of "some conditions of the table T that are not known a priori" you ask for in the question depends on the range of conditions you actually mean; for a broad enough interpretation there will not be a general solution, but you might be able to narrow it down if mat2cell performs too specific a job for your actual problem.
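For comparison with the for-loop shape sketched in the question, here is a minimal loop-based version that stores the sub-tables in a cell array. The sample table, its variable names (id, value) and the name parts are assumptions made only for illustration; the row counts may sum to less than height(T), in which case the trailing rows are simply not used.

```matlab
% Hypothetical sample data (not from the question).
T = table((1:10)', (101:110)', 'VariableNames', {'id', 'value'});
counter_simulations = [2 3 4];   % rows per sub-table; may sum to less than height(T)

% Cumulative row boundaries: sub-table i spans rows edges(i)+1 .. edges(i+1).
edges = [0; cumsum(counter_simulations(:))];
assert(edges(end) <= height(T), 'counter_simulations requests more rows than T has');

% One cell per sub-table, so the number of tables is not hard-coded.
parts = cell(numel(counter_simulations), 1);
for i = 1:numel(parts)
    parts{i} = T(edges(i)+1 : edges(i+1), :);
end
```

Each parts{i} is itself a table, so a cell array of tables is probably a better fit here than numbered variables T1, T2, T3.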

Related

How can I programmatically group table column variables MATLAB?

If I create a table with:
t = table(magic(3));
I get a table with a single variable name.
However if I:
a = magic(3);
T = array2table(a);
Then I get a table with three variable names.
If I try to group the columns by sending it only one variable name for the table:
T.Properties.VariableNames = {'OneName'};
I get the error: The VariableNames property must contain one name for each variable in the table.
In the second situation, there is an option to combine the columns into one column manually by highlighting the columns and right clicking on the mouse.
How can I programmatically group the three variables to become one Variable as in the first example if I already created the matrix a ?
EDIT:
*as in the first example if I already created the table a ?
I am using R2017b
Based on the comment below, I am asking how to do mergevars prior to R2018a.
In the above example, I would be able to group them into one variable with:
t = table(a);
In other words, I hoped to create multiple multicolumn variables. In other-other words, to do mergevars prior to R2018a.
Once the table T has been created with a variable name for each column, the column values could be extracted, and then assigned back to T:
b = T{:, 1:2};
c = T{:, 3};
T = table(b, c);
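If the table has other variables that should be kept as they are, a variation on the same idea is to add the merged block as a new variable and then delete the originals. This is only a sketch: the name merged is hypothetical, and a1, a2, a3 are the names array2table derives from the input variable a in the example above.

```matlab
a = magic(3);
T = array2table(a);              % variables a1, a2, a3

% Add a single 3-by-2 variable built from the first two columns...
T.merged = T{:, {'a1', 'a2'}};

% ...then remove the original single-column variables.
T(:, {'a1', 'a2'}) = [];
```

After this, T contains a3 and the two-column variable merged.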

How to properly remove NaN values from table

After reading an Excel spreadsheet in Matlab I unfortunately have NaNs included in my resulting table. So for example this Excel table:
would result in this table:
where an additional column of NaNs occurs. I tried to remove the NaNs with the following code snippet:
measurementCells = readtable('MWE.xlsx','ReadVariableNames',false,'ReadRowNames',true);
measurementCells = measurementCells(any(isstruct(measurementCells('TIME',1)),1),:);
However this results in a 0x6 table, without any values present anymore. How can I properly remove the NaNs without removing any data from the table?
Either this:
tab = tab(~any(ismissing(tab),2),:);
or:
tab = rmmissing(tab);
if you want to remove rows that contain one or more missing values.
If you want instead to replace missing values with other values, read about how fillmissing (https://mathworks.com/help/matlab/ref/fillmissing.html) and standardizeMissing (https://mathworks.com/help/matlab/ref/standardizemissing.html) functions work. The examples are exhaustive and should help you to find the solution that best fits your needs.
One last option is to detect NaN values (and handle them however you prefer) within the call to the readtable function, using the EmptyValue parameter. But this works only for numeric data.
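Since the question describes an extra column of NaNs rather than missing rows, note that the row-wise forms above would drop every row that touches that column. A minimal sketch of the column-wise counterpart, using a made-up table (the variable names a, b, c are assumptions), removes only the columns that are missing in every row:

```matlab
% Hypothetical data: column b is entirely NaN, column c has one NaN.
tab = table([1; 2; 3], [NaN; NaN; NaN], [4; NaN; 6], ...
    'VariableNames', {'a', 'b', 'c'});

% ismissing returns a logical matrix with one column per table variable.
allMissing = all(ismissing(tab), 1);

% Drop only the columns that are missing in every row.
tab(:, allMissing) = [];
```

Column c is kept because it is only partly missing; whether that is what you want depends on your data.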

Creating the optimum index for my database

I have a table in postgresql with the following information:
rawData (fileID integer references otherTable, lineNum integer, data1 double, ...)
When I am searching this table, I do so with the following query:
SELECT lineNum, data1, ...other data FROM rawData WHERE
fileID = ? AND data1 < ? ORDER BY lineNum;
In general, the data in this table is a number of entries for each fileID, and each fileID has lineNum from 0 to x, with lineNum never repeating for each fileID (but it does repeat for different fileID's). Then data1 is effectively a random number that may or may not overlap.
In order to speed up the reading of this data, I am trying to create an index on it, but am having trouble figuring out the best way to index it. Currently I am looking at one of the following two index methods, and am wondering which would be better for my search, or if there is another option that I haven't thought of that would be better than either of them.
index idea 1:
CREATE INDEX searchIndex ON rawData (fileID, data1, lineNum);
index idea 2:
CREATE INDEX searchIndex ON rawData (fileID, lineNum, data1);
Note that at this time, this and a search not constrained by data1 are the only searches that I run on this table, so I'm not too concerned about this index slowing down other searches.
Lastly, would I have to change my search query to use the index, or would it automatically use that index when I search the table?
You should look at using this instead:
CREATE INDEX searchIndex ON rawData (fileID, lineNum);
A few things:
As per the docs, "Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized."
Since your second search query filters without the data1 column, an index ending at the second column, lineNum, should be sufficient (you mention that data1 is quasi-random), and in the rare case that there are repeats, table fetches should ensure correctness. This also means the index would be about one third smaller, which is a big win (think of an index small enough to stay in memory, index-only scans, etc.).
Either index can be used. Which is faster will depend on many things, like how many rows are in the table, how many lineNum there are per fileID, how selective the data1 < ? clause is, what your hardware is, what your config settings are, which version of PostgreSQL you are using, what physical order the table rows lie in, etc.
The only way to know for sure is to try it with your own data on your own system and see.
I'd just build an index on (fileID, lineNum, data1), or even just (fileID, lineNum), because that seems more natural, and then forget about it. Most likely it will be fast enough. Once there is a demonstrable performance problem, then you will have the test case at hand which is needed to come to a real conclusion.

How to join multiple Matlab tables stored in a structure

I have multiple tables stored in a structure. I would like to merge all of them. The number of rows is not the same, but the number of columns is. The common key is always in the first column.
For two tables it's easy to do the join, but it's a little bit tricky with multiple ones. How can I achieve this?
Best thing I can think of is to poll your current workspace and see what variables currently exist. Then, for each variable, if it's a structure, concatenate this onto a larger structure. When you're done, you'll have one large structure that contains all of these combined. This will require the use of whos and unfortunately eval:
%// Larger structure initialization
largeStruct = [];
%// Get all variable names currently in workspace
vars = whos;
%// For each variable...
for ii = 1 : numel(vars)
%// If this is a structure, and if the variable is not any of
%// the current structure, the automatic variable answer and
%// the current variable storing our variable names...
if strcmpi(vars(ii).class, 'struct') && ~any(strcmpi(vars(ii).name, {'largeStruct', 'ans', 'vars'}))
%// Concatenate to the larger structure
largeStruct = eval(['[largeStruct ' vars(ii).name '];']);
end
end
BTW, using eval is considered bad practice. I had to use it because of your current state of the workspace. Consider using a single structure that stores all of these nested structures where the fields are the actual variable names themselves... something like stock.stockQuotes_070715, stock.stockQuotes_070815, etc. If you did it this way, we wouldn't have had to use eval to begin with.
I would poll the workspace, put all the datasets in a cell array, use cellfun to convert to tables, and then use a recursive outerjoin function like this:
tablecell = {Table1, Table2, Table3, ...}
tables = outerjoinmultiple(tablecell)
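function table = outerjoinmultiple(tables)
    % Recursively outer-join a 1-by-N cell array of tables.
    if size(tables, 2) == 1
        % Base case: a single table is returned unchanged.
        table = tables{1};
    else
        % Join the first table with the join of all remaining ones.
        t2 = outerjoinmultiple(tables(2:end));
        table = outerjoin(tables{1}, t2, 'MergeKeys', true);
    end
end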
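When the tables already sit in the fields of one structure, as the question describes, the recursion can also be written as a plain loop over struct2cell. The sketch below rests on assumptions that are not in the question: the structure is called S, the shared key variable is named key in every table, and the non-key variables have distinct names.

```matlab
% Hypothetical structure of tables sharing the key variable 'key'.
S.a = table([1; 2; 3], [10; 20; 30],    'VariableNames', {'key', 'x'});
S.b = table([2; 3; 4], [200; 300; 400], 'VariableNames', {'key', 'y'});
S.c = table([1; 4],    [1000; 4000],    'VariableNames', {'key', 'z'});

% Collect the field values (the tables) into a cell array.
tables = struct2cell(S);

% Fold the tables together one at a time with a full outer join.
joined = tables{1};
for k = 2:numel(tables)
    joined = outerjoin(joined, tables{k}, 'Keys', 'key', 'MergeKeys', true);
end
```

Rows whose key is absent from one of the tables get missing values (NaN for numeric variables) in that table's columns.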

Aggregate/sum function of a table in Matlab

In matlab I have read in a table from a csv file, then moved two columns I am interested in into a new table. These columns are "ID" (of a person, 1-400) and then another ID to represent their occupation (1-12).
What I want to do is create a simple table with 12 records and 2 columns, there is a record for each job, and the number of user IDs who have this job must be aggregated/summed, such a table could be easily bar charted. At the moment I have 400 user records, all with their IDs and one of the 12 possible job IDs.
So much like an SQL aggregate/sum function, but I want to do it in Matlab, with a table object. The problem I am having is finding how to do this without using a cell array or something similar.
Thanks!
I know that you found an answer yourself, but I would like to mention the histc function, which avoids the loop (and is faster for larger matrices):
JobCounts = histc(OccupationTable(:,2), 1:NumberOfJobs);
Combining this with the job number gives the desired result:
result = [(1:NumberOfJobs)' JobCounts];
Never mind, solved it. I just looped through the job numbers and ran "sum" where the ID was equal to what I wanted:
for i = 1:1:NumberOfJobs;
JobCounts(i,:) = sum(OccupationTable(:,2) == i);
end
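Both snippets above index with OccupationTable(:,2), which works when the data is a numeric matrix. If it is a true table object, as the question says, a loop-free sketch could use accumarray on the job column instead. The sample data, the variable names ID and JobID, and the output name Count are assumptions for illustration.

```matlab
% Hypothetical data: 8 people, each assigned one of 3 job IDs.
OccupationTable = table((1:8)', [1; 2; 2; 3; 1; 2; 3; 3], ...
    'VariableNames', {'ID', 'JobID'});
NumberOfJobs = 3;

% Count how many rows fall on each job ID; jobs nobody holds get 0.
JobCounts = accumarray(OccupationTable.JobID, 1, [NumberOfJobs 1]);

% Assemble a small summary table with one row per job.
result = table((1:NumberOfJobs)', JobCounts, ...
    'VariableNames', {'JobID', 'Count'});
```

A call such as bar(result.JobID, result.Count) would then give the bar chart mentioned in the question.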