Filter on parts of words in Matlab tables

Similar to Excel, I need to find out how to filter out rows of a table that do not contain a certain string.
For example, I need only rows that contain the letters "MX". Within the sheet, there are rows with strings like ZMX01, MX002, and US001. I would want the first two rows.
This seems like a simple question, so I am surprised I couldn't find any help for this!
It is similar to the question Filter on words in Matlab tables (as in Excel)

You may not find much information on tables in MATLAB, as they were only introduced in R2013b, which is not that long ago. As for your question, let's first create a sample table:
% Create a sample table
col1 = {'ZMX01'; 'MX002'; 'US001'};
col2 = {5;7;3};
T = table(col1, col2);
T =
     col1      col2
    _______    ____
    'ZMX01'    [5]
    'MX002'    [7]
    'US001'    [3]
Now, MATLAB provides the rowfun function to apply any function to each row in a table. By default, the function you call has to be able to work on all columns of the table.
To only apply rowfun to one column, you can use the 'InputVariables' parameter, which lets you specify either the number of the column (e.g. 2 for the second column) or the name of the column (e.g. 'myColumnName').
Then, you can set 'OutputFormat' to 'uniform' to get an array and not a new table as output.
In your case, you'll want to use strfind on the column 'col1'. The return value of strfind is either an empty array (if 'MX' wasn't found), or an array of all indices where 'MX' was found.
% Apply rowfun
idx = rowfun(@(x)strfind(x,'MX'), T, 'InputVariables', 'col1', 'OutputFormat', 'uniform');
The output of this will be
idx =
[2]
[1]
[]
i.e. a 3-by-1 cell array, which is empty for 'US001' and contains a positive value for both other inputs. To create a subset of the table with this data, we can do the following:
% Create logical array, which is true for all rows to keep.
idx = ~cellfun(@isempty, idx);
% Save these rows and all columns of the table into a new table
R = T(idx,:);
And finally, we have our resulting table R:
R =
     col1      col2
    _______    ____
    'ZMX01'    [5]
    'MX002'    [7]
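The same filter can also be written without rowfun. The following is a minimal sketch, assuming that col1 is a cell array of character vectors as in the sample table, and that contains is available (it was added in R2016b). The names keep and keepOld are only illustrative.

```matlab
% Rebuild the sample table from the answer above
col1 = {'ZMX01'; 'MX002'; 'US001'};
col2 = {5; 7; 3};
T = table(col1, col2);

% contains returns one logical value per element of col1 (R2016b or later)
keep = contains(T.col1, 'MX');

% On older releases, an equivalent mask can be built with strfind, which
% accepts a cell array and returns a cell array of index vectors
keepOld = ~cellfun(@isempty, strfind(T.col1, 'MX'));

% Keep the matching rows and all columns
R = T(keep, :);
```

Both masks select the same rows here; which one to use mostly depends on the MATLAB release at hand.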

Related

Creating a For loop that iterates through all the numbers in a column of a table in Matlab

I am a new user of MATLAB R2021b and I have a table where the last column (named loadings) spans multiple sub-columns (all sub-columns were added under the same variable and are treated as one column). I want to create a for loop that iterates through each separate loading sub-column, prior to creating a tbl that I will input into a model. The sub-columns contain numbers, with one row per participant.
Previously, I had a similar analogy where the loop was iterating through the names of different regions of interest, whereas now the loop has to iterate through columns that have numbers in them. First, the numbers in the first sub-column, then in the second, and so on.
I am not sure whether I should split the last column with T1 = splitvars(T1, 'loadings') first or whether I am not indexing into the table correctly or performing the right transformations. I would appreciate any help.
roi.ic = T.loadings;
roinames = roi.ic(:,1);
roinames = [num2str(roinames)];
for iroi = 1:numel(roinames)
f_roiname = roinames{iroi};
tbl = T1;
tbl.(roinames) = T1.loadings(:,roiname);
tbl.(roinames) = T1.loadings_rsfa(:,roiname)
Unable to use a value of type cell as an index.
Error in tabular/dotParenReference (line 120)
b = b(rowIndices,colIndices)
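One way to loop over the sub-columns is to index them by position rather than by name. This is only a sketch, assuming that loadings is a numeric matrix stored as a single table variable. The sample data, the id variable, the new variable name loading, and the mean used as a stand-in for the model call are all hypothetical.

```matlab
% Hypothetical sample data: 4 participants, 3 loading sub-columns
T1 = table((1:4)', rand(4, 3), 'VariableNames', {'id', 'loadings'});

nSub = size(T1.loadings, 2);   % number of sub-columns in the variable
colMeans = zeros(1, nSub);

for k = 1:nSub
    tbl = T1(:, {'id'});               % start from the other variables
    tbl.loading = T1.loadings(:, k);   % k-th sub-column as its own variable
    % tbl would be passed to the model here; as a stand-in, take a mean
    colMeans(k) = mean(tbl.loading);
end
```

Indexing with the numeric loop counter k avoids passing a cell or character value as a column index, so splitvars is not required for this pattern.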

How to assign the same value to all table cells in a column in a table in Matlab?

I tried
myTable.MyField{:}='AAA'
myTable.MyField(:)='AAA'
myTable.MyField{:}={'AAA'}
myTable.MyField{:}=deal('AAA')
but all failed.
Is there any way?
MATLAB requires:
To assign to or create a variable in a table, the number of rows must match the height of the table.
So it would be:
myTable.MyField = repmat('AAA', length(myTable.MyField), 1);
or if you know the column number of MyField, you can do:
myTable(:,colnum) = {'AAA'}; %where colnum is the column number
or otherwise if you don't know the column number, you can directly use the column name as well:
myTable(:,'MyField') = {'AAA'};
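A small check of the approaches above on a made-up table. The table contents and the Value variable are assumptions, since the question does not show the table; MyField is assumed to be a cell array of character vectors.

```matlab
% Hypothetical table with a cell-array text column
myTable = table({'x'; 'y'; 'z'}, [1; 2; 3], ...
    'VariableNames', {'MyField', 'Value'});

% Assign by variable name: the scalar cell is expanded to every row
myTable(:, 'MyField') = {'AAA'};

% Alternative: replicate the cell explicitly to match the table height,
% which keeps MyField a cell array rather than a char matrix
myTable.MyField = repmat({'AAA'}, height(myTable), 1);
```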

Join tables with non-equal rows in Matlab

I'm trying to use the relatively new data type in Matlab, table. I have a number of variables that each contains a value for a set of parameters (Rows). The rows are not (necessarily) equal for each variable, however. I want to join the variables together so the results are all displayed in a single table. E.g., I want to join these together: (drawn side by side to save space)
          Var_A                 Var_B
         ________              _______
    a     0.36744         b    0.88517
    b     0.98798         c    0.91329
    c    0.037739         d    0.79618
Is it possible to join these two tables?
Here's an example of what I'm trying to do:
A = table(rand(3,1),'VariableNames',{'Var_A'},'RowNames',{'a','b','c'})
B = table(rand(3,1),'VariableNames',{'Var_B'},'RowNames',{'b','c','d'})
try
C = join(A,B)
catch e
disp(e.identifier)
disp(e.message)
end
This results in:
MATLAB:table:join:CantInferKey
Cannot find a common table variable to use as a key variable.
Okay, so maybe join isn't intended for this -- what about outerjoin? Its documentation sounds promising:
The outer join includes the rows that match between A and B, and also unmatched rows from either A or B, all with respect to the key variables. C contains all variables from both A and B, including the key variables.
Well, outerjoin apparently can't be used with tables with row names! This is the closest I've found that does what I want, but seems to be against the idea of the table data structure to some degree:
AA = table({'a';'b';'c'},rand(3,1));
AA.Properties.VariableNames = {'param','Var_A'}
BB = table({'b';'c';'d'},rand(3,1));
BB.Properties.VariableNames = {'param','Var_B'}
CC = outerjoin(AA,BB,'Keys',1,'MergeKeys',true)
This results in
    param     Var_A      Var_B
    _____    _______    _______
    'a'      0.10676        NaN
    'b'      0.65376    0.77905
    'c'      0.49417    0.71504
    'd'          NaN    0.90372
I.e., the row is just stored as a separate variable. This means it can't be indexed using "logical" notation such as CC{'a',:}.
So this can be fixed with:
CCC = CC(:,2:end);
CCC.Properties.RowNames = CC{:,1}
Which finally results in:
CCC =
          Var_A       Var_B
         _______    ________
    a     0.4168         NaN
    b    0.65686     0.29198
    c    0.62797     0.43165
    d        NaN    0.015487
But is this really the best way to go about things? Matlab is weird.
There must be a better way to do this, but here is another option:
clear;
%// Create two tables to play with.
tableA = table([.5; .6; .7 ],'variablenames',{'varA'},'rowname',{'a','b','c'});
tableB = table([.55; .62; .68],'variablenames',{'varB'},'rowname',{'b','c','d'});
%// Let's add rows to tableA so that it has the same rows as tableB
%// First, get the set difference of tableB rows and tableA rows
%// Then, make a new table with those rows and NaN for data.
%// Finally, concatenate tableA with the new table
tableAnewRows=setdiff(tableB.Properties.RowNames,tableA.Properties.RowNames);
tableAadd=table( nan(length(tableAnewRows),1) ,'variablenames',{'varA'},'rownames',tableAnewRows);
tableA=[tableA;tableAadd];
%// Let's add rows to tableB so that it has the same rows as tableA
tableBnewRows=setdiff(tableA.Properties.RowNames,tableB.Properties.RowNames);
tableBadd=table( nan(length(tableBnewRows),1) ,'variablenames',{'varB'},'rownames',tableBnewRows);
tableB=[tableB;tableBadd];
%// Form tableC from tableA and tableB. Could also use join().
tableC=[tableA tableB];
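The padding steps above can also be condensed by preallocating the result over the union of the row names and filling in each variable by row name. This is a sketch of that variant using the same toy tables, not a built-in join; the helper names allRows and n are illustrative.

```matlab
% Same toy tables as above
tableA = table([.5; .6; .7], 'VariableNames', {'varA'}, ...
    'RowNames', {'a', 'b', 'c'});
tableB = table([.55; .62; .68], 'VariableNames', {'varB'}, ...
    'RowNames', {'b', 'c', 'd'});

% All row names that appear in either table (sorted, no duplicates)
allRows = union(tableA.Properties.RowNames, tableB.Properties.RowNames);
allRows = allRows(:);
n = numel(allRows);

% Start with NaN everywhere, then fill in the rows each table provides
tableC = table(nan(n, 1), nan(n, 1), ...
    'VariableNames', {'varA', 'varB'}, 'RowNames', allRows);
tableC{tableA.Properties.RowNames, 'varA'} = tableA.varA;
tableC{tableB.Properties.RowNames, 'varB'} = tableB.varB;
```

The result keeps the row names, so rows can still be addressed as tableC{'a',:}.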

Finding if values in two columns exist

I have two columns of dates and I want to run a query that returns TRUE if there is a date in existence in the first column and in existence in the second column.
I know how to do it when I'm looking for a match (if the data entry in column A is the SAME as the entry in column B), but I don't know how to check whether a data entry exists in both column A and column B.
Does anyone know how to do this? Thanks!
If data in a column is present, the column IS NOT NULL. You can test for that on both columns, combined with an AND clause, to get your result:
SELECT (date1 IS NOT NULL AND date2 IS NOT NULL) AS both_dates
FROM mytable;
So, rephrasing:
For any two entries in table x with date columns a and b, is there some pair of rows x1 and x2 where x1.a = x2.b?
If that's what you're trying to do, you want a self-join, e.g., presuming the presence of a single key column named id:
SELECT x1.id, x2.id, x1.a AS x1_a_x2_b
FROM mytable x1
INNER JOIN mytable x2 ON (x1.a = x2.b);

Pixel values of raster records to be inserted in the table as columns

I have a table with following columns:
(ID, row_num, col_num, pix_centroid, pix_val1).
I have more than 1000 records. I am inserting my data using:
insert into pixelbased (row_num, col_num, pix_centroid, pix_val)
select
(ST_PixelAsPolygons(rast, 1)).x as X,
(ST_PixelAsPolygons(rast, 1)).y as Y,
(ST_Centroid((ST_PixelAsPolygons(rast, 1)).geom)) as geom,
(ST_PixelAsPolygons(rast, 1)).val as pix_val1
from mytable
where rid=1
Now I am trying to insert all the other records as columns; the pix_val1 column is the important one for me. All the other columns will remain the same. In other words, I want the final table to have these columns:
(ID, row_num, col_num, pix_centroid, pix_val1, pix_val2, pix_val3, ....)
Is there a way to do it?
I would want to store this data as a bitmap in a bytea if possible. Here's how to take a series of byte values and turn it into a bytea:
WITH bytes(b) AS (SELECT x % 256 FROM generate_series(1,53000) x)
SELECT ('\x'||string_agg(lpad(to_hex(b),2,'0'),''))::bytea FROM bytes;
You can access fields or ranges of the byte array using the substr function. This bytea is organized as a linear pixel array, but you may find it more useful to organize it into a more traditional bitmap format. Also, if your pixels are more than one byte you may need to cope with big-endian vs little-endian. You could do that in SQL, but it's likely to be much easier in a procedural language like PL/Perl.
Failing that, a multidimensional array would be a somewhat reasonable choice.
Using a generate_series statement as a substitute for your pix_val field for convenient testing, this query produces a two-dimensional array of integers using two aggregation passes:
SELECT ('{'||string_agg(subarray, ',')||'}')::integer[] AS arr
FROM (
SELECT array_agg(x order by x)::text
FROM generate_series(1,53000) x
GROUP BY width_bucket(x, 1, 53001, 100)
) a(subarray);
The unfortunate use of the string literal form of the two dimensional array is made necessary by the fact that array_agg cannot aggregate arrays. In my view this is a real wart in PostgreSQL; in general its multidimensional arrays are odd to work with and inconsistent with how most applications and languages implement arrays.
You can get fields out of the array by indexing it. Example:
regress=> SELECT ('{'||string_agg(subarray, ',')||'}')::integer[] AS arr INTO test FROM (SELECT array_agg(x order by x)::text from generate_series(1,53000) x GROUP BY width_bucket(x, 1, 53001, 100)) a(subarray);
regress=> \d test
Table "public.test"
Column | Type | Modifiers
--------+-----------+-----------
arr | integer[] |
test contains a single array with two dimensions:
regress=> \x
regress=> select array_dims(test.arr), array_ndims(test.arr), array_length(test.arr,1), array_length(test.arr,2) FROM test;
-[ RECORD 1 ]+---------------
array_dims | [1:100][1:530]
array_ndims | 2
array_length | 100
array_length | 530
I can get elements with two-level indexing:
regress=> SELECT test.arr[4][4] FROM test;
arr
------
1594
(1 row)
or a "column" with slicing:
regress=> SELECT test.arr[4:4][1:530] FROM test;
Oddly, this is still a two-dimensional array; the top dimension is just one element deep. You can flatten it (inefficiently) with unnest and array_agg if you need to.
Two-dimensional arrays in PostgreSQL are somewhat weird, as you can see, but so is what you're trying to do.