readtable on text file ignores first row which contains the column names - matlab

I have a tab-delimited text file which contains some data organised into columns, with the first row acting as column names, such as:
TN Stim Task RT
1 A A 500.2
2 B A 569
3 C A 654
and so on.
I am trying to read this text file into MATLAB (R2018a) using readtable with
Data1 = readtable(filename);
I manage to get all the data into the Data1 table, but the column names show up as Var1, Var2, etc. If I use name-value pairs to specify that the first row should be read as column names, as in:
Data1 = readtable(filename, 'ReadVariableNames', true);
then I get the column names as the first data row, i.e.
1 A A 500.2
So it just looks like it is ignoring the first row completely. How can I modify the readtable call to use the entries on the first row as column names?

I figured it out. It appears there was an additional tab in some of the rows after the last column. Because of this, readtable was reading it as an additional column, but did not have a column name to assign to it. It seems that if any of the column names are missing, it names them all as Var1, Var2, etc.
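For anyone hitting the same problem, one possible workaround (only a sketch, not necessarily the cleanest fix) is to strip the stray trailing tabs before handing the file to readtable, writing the cleaned text to a temporary copy rather than touching the original; names like tmpfile below are purely illustrative:
% Remove tabs that sit directly before a line break, then read the copy.
txt = fileread(filename);
txt = regexprep(txt, '\t+(\r?\n)', '$1');
tmpfile = [tempname '.txt'];
fid = fopen(tmpfile, 'w');
fwrite(fid, txt);
fclose(fid);
Data1 = readtable(tmpfile, 'Delimiter', '\t', 'ReadVariableNames', true);
delete(tmpfile);
With the trailing tabs gone, every data row has exactly as many fields as the header, so the first row is used for the variable names as expected.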

Based on the way your sample file text is formatted above, it appears that the column labels are separated by spaces instead of by tabs the way the data is. In this case, readtable will assume (based on the data) that the delimiter is a tab and treat the column labels as a header line to skip. Add tabs between them and you should be good to go.
Test with spaces between column labels:
% File contents:
TN Stim Task RT
1 A A 500.2
2 B A 569
3 C A 654
>> Data1 = readtable('sample_table.txt')
Data1 =
Var1 Var2 Var3 Var4 % Default names
____ ____ ____ _____
1 'A' 'A' 500.2
2 'B' 'A' 569
3 'C' 'A' 654
Test with tabs between column labels:
% File contents:
TN Stim Task RT
1 A A 500.2
2 B A 569
3 C A 654
>> Data1 = readtable('sample_table.txt')
Data1 =
TN Stim Task RT
__ ____ ____ _____
1 'A' 'A' 500.2
2 'B' 'A' 569
3 'C' 'A' 654
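If editing the header line is not convenient, another sketch (assuming readtable keeps skipping the space-separated header as shown above, and that the header words are valid variable names) is to read that first line yourself and attach the names afterwards:
% Read the space-separated header line manually, then assign the names
% to the table readtable builds from the tab-delimited data rows.
fid = fopen('sample_table.txt');
names = strsplit(strtrim(fgetl(fid)));   % {'TN','Stim','Task','RT'}
fclose(fid);
Data1 = readtable('sample_table.txt');   % header line skipped, data rows read
Data1.Properties.VariableNames = names;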


Is there a way to create a table with multi-line column names?

I am attempting to create a table that has the following format of multi-line headings for the columns:
|Col1 Co2 Col3|
|Col1 Co2 Col3|
I tried this using the example and adding a | between the 1st and 2nd lines, but it did not work:
T = table(categorical({'M';'F';'M'}),[45;32;34],...
{'NY';'CA';'MA'},logical([1;0;0]),...
'VariableNames',{'Gender|Gender2','Age|Age2','State|State2','Vote|Vote2'})
I am using R2018b student edition
The ability to have arbitrary variable names in tables was added to release R2019b of MATLAB. Using that release, your code works as expected and produces:
T =
3×4 table
Gender|Gender2 Age|Age2 State|State2 Vote|Vote2
______________ ________ ____________ __________
M 45 {'NY'} true
F 32 {'CA'} false
M 34 {'MA'} false
However, in your question you state that you want multi-line variable names. You can create these in R2019b, but the display collapses the newline character into a ↵, like this:
>> T = table(1, 'VariableNames', {['a', newline, 'b']})
T =
table
a↵b
___
1
If it's just the display you're after, you could consider making nested tables, like this:
t1 = table(1);
t2 = table(2);
T = table(t1, t2)
which results in:
T =
1×2 table
t1 t2
Var1 Var1
____ ____
1 2
Note that this final approach also works in R2019a and prior releases.
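For instance, naming both the outer and the inner variables gives a display with two header rows, which is close to the layout asked about. A sketch reusing the names from the question:
% Nested tables: the outer variable names form the first header row and
% the inner variable names form the second.
inner1 = table(categorical({'M';'F';'M'}), 'VariableNames', {'Gender2'});
inner2 = table([45;32;34], 'VariableNames', {'Age2'});
T = table(inner1, inner2, 'VariableNames', {'Gender', 'Age'})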
No can do (prior to R2019b). Valid table variable names follow the same rules as other variable names in MATLAB: they cannot contain \n (newline) or anything that is not a letter or a digit; the underscore is the exception.

select only those columns from a table that have no null values in q/kdb

I have a table:
q)t:([] a:1 2 3; b:```; c:`a`b`c)
a b c
-----
1 a
2 b
3 c
From this table I want to select only the columns that have no null values; in this case, column b should be omitted from the output (something similar to the dropna method in pandas).
expected output
a c
---
1 a
2 b
3 c
I tried many things like
select from t where not null cols
but to no avail.
Here is a simple solution that does just what you want:
q)where[all null t]_t
a c
---
1 a
2 b
3 c
all null t gives a dictionary indicating, for each column, whether all of its values are null:
q)all null t
a| 0
b| 1
c| 0
where returns the keys of the dictionary whose values are true:
q)where[all null t]
,`b
Finally, you use _ to drop those columns from table t.
Hopefully this helps.
A modification of Sander's solution which handles string columns (or any nested columns):
q)t:([] a:1 2 3; b:```; c:`a`b`c;d:" ";e:("";"";"");f:(();();());g:(1 1;2 2;3 3))
q)t
a b c d e f g
----------------
1 a "" 1 1
2 b "" 2 2
3 c "" 3 3
q)where[{$[type x;all null x;all 0=count each x]}each flip t]_t
a c g
-------
1 a 1 1
2 b 2 2
3 c 3 3
The nature of kdb is column based, meaning that where clauses function on the rows of a given column.
To make a QSQL query produce your desired behaviour, you would need to first examine all your columns to determine which consist only of nulls, and then feed that into a functional statement, which would be horribly inefficient.
Given that you need to fully examine all the columns' data regardless (to check whether all the values are null), the following will achieve that:
q)@[flip;;enlist] k!d k:key[d] where not all each null each value d:flip t
a c
---
1 a
2 b
3 c
Here I'm transforming the table into a dictionary, and extracting its values to determine if any columns consist only of nulls (all each null each). I'm then applying that boolean list to the keys of the dictionary (i.e., the column names) through a where statement. We can then reindex into the original dictionary with those keys and create a subset dictionary of non-null columns and convert that back into a table.
Out of habit, I've generalized the final transformation back into a table with an error catch, to ensure that the dictionary is converted into a table even if only a single row is valid (preventing a 'rank error).

Scenario based questions in Datastage

I have two scenario based questions here.
Question 1
Input Dataset
Col1
A
A
B
C
C
B
D
A
C
Output Dataset
Col1 Col2
A 1
A 2
A 3
B 1
B 2
C 1
C 2
C 3
D 1
Question2
Input data string
AA-BB-CC-DD-EE-FF (the delimiter could be anything and the string can have any length)
Output data string
string 1 -> AA
string 2 -> BB
string 3 -> CC
string 4 -> DD
Thanks & Regards,
Subhasree
Question 1: This can be solved with a Transformer stage. Sort the data and use the LastRowInGroup functionality.
For Col2, just create a counter as a stage variable and add 1 for each row; reset it (with a second stage variable) when LastRowInGroup is reached.
Alternatively, you could use a row-number column in SQL.
Question 2: You have not provided enough information. Is string 1 a column or a row? If you do not know anything upfront about the structure (any delimiter), this will get hard...

Filter on parts of words in Matlab tables

Similar to Excel, I need to find out how to filter out rows of a table that do not contain a certain string.
For example, I need only rows that contain the letters "MX". Within the sheet, there are rows with strings like ZMX01, MX002, and US001. I would want the first two rows.
This seems like a simple question, so I am surprised I couldn't find any help for this!
It is similar to the question Filter on words in Matlab tables (as in Excel)
You may not find a lot of information on tables in MATLAB, as they were introduced with version R2013a, which is not that long ago. So, about your question: Let's first create a sample table:
% Create a sample table
col1 = {'ZMX01'; 'MX002'; 'US001'};
col2 = {5;7;3};
T = table(col1, col2);
T =
col1 col2
_______ ____
'ZMX01' [5]
'MX002' [7]
'US001' [3]
Now, MATLAB provides the rowfun function to apply any function to each row in a table. By default, the function you call has to be able to work on all columns of the table.
To only apply rowfun to one column, you can use the 'InputVariables' parameter, which lets you specify either the number of the column (e.g. 2 for the second column) or the name of the column (e.g. 'myColumnName').
Then, you can set 'OutputFormat' to 'cell' to get a cell array rather than a new table as output (a cell array is needed here, because strfind may return an empty result).
In your case, you'll want to use strfind on the column 'col1'. The return value of strfind is either an empty array (if 'MX' wasn't found), or an array of all indices where 'MX' was found.
% Apply rowfun
idx = rowfun(@(x)strfind(x,'MX'), T, 'InputVariables', 'col1', 'OutputFormat', 'cell');
The output of this will be
idx =
[2]
[1]
[]
i.e. a 3-by-1 cell array, which is empty for 'US001' and contains a positive value for both other inputs. To create a subset of the table with this data, we can do the following:
% Create logical array, which is true for all rows to keep.
idx = ~cellfun(@isempty, idx);
% Save these rows and all columns of the table into a new table
R = T(idx,:);
And finally, we have our resulting table R:
R =
col1 col2
_______ ____
'ZMX01' [5]
'MX002' [7]
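As a side note, if your MATLAB release is R2016b or newer, the logical index can also be built in one step with contains, which operates on the whole column at once. A minimal sketch using the sample table above:
% contains returns a logical array, true where col1 includes 'MX'
idx = contains(T.col1, 'MX');
R = T(idx, :);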

Perl + PostgreSQL-- Selective Column to Row Transpose

I'm trying to find a way to use Perl to further process PostgreSQL output. If there's a better way to do this within PostgreSQL itself, please let me know. I basically need to take certain columns (Realtime, Value) from a file and concatenate their values into a single row per group, while keeping ID and CAT.
First time posting, so please let me know if I missed anything.
Input:
ID CAT Realtime Value
A 1 time1 55
A 1 time2 57
B 1 time3 75
C 2 time4 60
C 3 time5 66
C 3 time6 67
Output:
ID CAT Time Values
A 1 time1,time2 55,57
B 1 time3 75
C 2 time4 60
C 3 time5,time6 66,67
You could do this most simply in Postgres like so (using array columns)
CREATE TEMP TABLE output AS SELECT
id, cat, ARRAY_AGG(realtime) as time, ARRAY_AGG(value) as values
FROM input GROUP BY id, cat;
Then select whatever you want out of the output table.
SELECT id
, cat
, string_agg(realtime, ',') AS realtimes
, string_agg(value, ',') AS values
FROM input
GROUP BY 1, 2
ORDER BY 1, 2;
string_agg() requires PostgreSQL 9.0 or later and concatenates all values into a delimiter-separated string, while array_agg() (v8.4+) creates an array out of the input values.
About 1, 2 - I quote the manual on the SELECT command:
GROUP BY clause
expression can be an input column name, or the name or ordinal number
of an output column (SELECT list item), or ...
ORDER BY clause
Each expression can be the name or ordinal number of an output column
(SELECT list item), or ...
Emphasis mine. So that's just notational convenience. Especially handy with complex expressions in the SELECT list.