Read just 6 columns out of 8 with the readtable instruction - MATLAB

I have a .txt file with 8 columns, separated by tabs (\t). I want to read just 6 of the columns with the readtable instruction. Please help me, thank you.
The call below reads all of the table's columns; please correct it for me:
Table = readtable('D:\DataIntable.txt','Delimiter','\t','ReadVariableNames',true);
The data has 5 million rows, so dropping the columns after reading would be a pointless waste of time.

The overhead of converting to *.xls is silly. If you read the documentation for readtable you will see that it supports textscan-style format specifiers. This allows you to use an asterisk (as in %*s) to ignore a field.
Using asdf.txt:
column1 column2 column3
a b c
d e f
And:
T = readtable('asdf.txt', 'ReadVariableNames', true, 'Delimiter', '\t', 'Format', '%s%s%*s');
We obtain:
T =

    column1    column2
    _______    _______

    'a'        'b'
    'd'        'e'
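Applied to your 8-column file, the same trick would look something like the sketch below. Two assumptions on my part: that the two columns you want to drop are the last two, and that the kept columns are numeric; move the starred specifiers to whichever positions should be skipped, and swap %f for %s wherever a column holds text.
% six columns kept, two skipped via the starred specifiers (positions are an assumption)
Table = readtable('D:\DataIntable.txt', 'Delimiter', '\t', ...
    'ReadVariableNames', true, 'Format', '%f%f%f%f%f%f%*f%*f');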

If you can save your data as an .xls file instead of a .txt file, you can use xlsread, which allows you to specify the range of your data in the call.
[data,txt] = xlsread('filename',sheet,xlRange)
You would have to know the cell indices of your data in the spreadsheet (e.g. A1:C500 would import a 500x3 matrix), but it lets you import only the desired columns. The txt output will import the column titles as strings, since it appears that you want the names associated with the data as well.
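For instance, a hypothetical call (the file name, sheet, and range are placeholders; you would point the range at wherever your 6 columns actually sit):
% range A1:F5000001 assumes a header row plus 5 million data rows in columns A-F
[data, txt] = xlsread('D:\DataIntable.xls', 1, 'A1:F5000001');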

Related

How to convert a string to a table with `textscan`?

I'm using MATLAB to read in COVID-19 data provided by Johns Hopkins as a .csv file using urlread, but I'm not sure how to use textscan in the next step to convert the string into a table. The first two columns of the .csv file are strings specifying the region, followed by a large number of columns containing the registered number of infections by date.
Currently, I just save the string returned by urlread locally and open this file with importdata afterwards, but surely there is a more elegant solution.
You have mixed up two things: either you want to read from the downloaded csv file using `textscan` (plus `fopen` and `fclose`, of course), or you want to use `urlread` (or rather `webread`, as MATLAB recommends not to use `urlread` anymore). I'll go with the latter, since I have never done this myself^^
So, first we read in the data and split it into rows
url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv";
% read raw data as single character array
web = webread(url);
% split the array into a cell array representing each row of the table
row = strsplit(web,'\n');
% drop a possible trailing empty line so the loop below is not fed an empty row
row = row(~cellfun(@isempty,row));
Then we allocate a table (pre-allocation is good for MATLAB as it stores variables on consecutive addresses in the RAM, so tell MATLAB beforehand how much space you need):
len = length(row);
% get the CSV-header as information about the number of columns
Head = strsplit(row{1},',');
% allocate table (len-1 data rows, since the first element of row is the header line)
T = [table(strings(len-1,1),strings(len-1,1),'VariableNames',Head(1:2)),...
repmat(table(NaN(len-1,1)),1,length(Head)-2)];
% rename columns of table
T.Properties.VariableNames = Head;
Note that I did a little trick to allocate so many separate columns of `NaN`s by repeating a single table. However, concatenating this table with the table of strings is difficult, as both would contain the column names Var1 and Var2. That is why I renamed the columns of the first table right away.
Now we can actually fill the table (which is a bit nasty thanks to the person who found it nice to write `Korea, South` into a comma-separated file):
for i = 2:len
    % split this row into columns
    col = strsplit(row{i},',');
    % quick conversion
    num = str2double(col);
    % keep strings where the result is NaN
    lg = isnan(num);
    str = cellfun(@string,col(lg));
    % fill data row i-1 (the first line of the file is the header)
    T{i-1,1} = str(1);
    T{i-1,2} = strjoin(str(2:end)); % nasty workaround necessary due to "Korea, South"
    T{i-1,3:end} = num(~lg);
end
This should also work for the days that are about to come. Let me know what you are actually going to do with the data!
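As a possibly more elegant alternative in the spirit of the question, here is an untested sketch built only on documented functions: websave downloads straight to a file, and readtable's csv parsing copes with the quoted "Korea, South" field on its own:
% download to a temporary file and let readtable do the parsing
fname = websave([tempname '.csv'], url);
T2 = readtable(fname, 'ReadVariableNames', true);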

read csv file in q kdb with more than 100 columns

Is there any way in kdb to read a csv file that is as simple as the read_csv() function in pandas?
I usually use something like below code to read a csv in kdb
("I*FS";enlist ",")0:`:a.csv / where a.csv is a csv file with Integer, String, Float and Symbol columns
Many times in practical cases, the csv file we want to read has more than 100 columns, and then it is difficult to provide the column types to the function.
Is there a way in kdb to read a csv where kdb can infer the type of each column by itself?
something like
("*";enlist ",")0:`:a.csv / this fails
Simon Garland wrote a "csv guess" script many years ago: https://github.com/simongarland/csvguess
It might still be relevant. Some IDEs (such as qStudio and Kx's analyst(?)) I believe also have this functionality built in.
Alternatively, you could read the first line of the csv to get the number of columns (say n) and then use n#"*" to read the entire csv as string columns:
q)(count["," vs first system"head -1 a.csv"]#"*";enlist ",")0:`:a.csv
col1 col2 col3
----------------------
,"a" ,"1" "2019-01-01"
,"b" ,"2" "2019-01-01"
,"c" ,"3" "2019-01-01"

How to split 2 or more delimited columns in a single row to multiple rows using Talend

I am trying to move data from a CSV file to a DB table. There are 2 delimited columns in the CSV file (separated by ";"). I would like to create a row for each of the delimited values at matching indexes, as shown below. The assumption is that both columns contain the same number of delimited items.
Example CSV Input:
Labels    Values
A;B;C     1;2;3
D         4
F;G       5;6
Expected Output:
Labels    Values
A         1
B         2
C         3
D         4
F         5
G         6
How can I achieve this? I have tried using tNormalize, but this only works for a single column. I also tried 2 successive tNormalize components, but as expected this resulted in unwanted combinations.
Thanks
Read your CSV file with a tFileInputDelimited, and
define your schema for the file.
Assuming you are using MySQL, also drop a tMysqlOutput component onto your designer to save your parsed file to the DB.

Trailing rows in datastore with multiple csv files

Matlab 2015b
I have several large (100-300 MB) csv files that I want to merge into one while filtering out some of the columns. They are shaped like this:
timestamp | variable1 | ... | variable200
01.01.16 00:00:00 | 1.59 | ... | 0.5
01.01.16 00:00:01 | ...
.
.
For this task I am using a datastore class including all the csv files:
ds = datastore('file*.csv');
When I read all of the entries and try to write them back to a csv file using writetable, I get an error that the input has to be a cell array.
When looking at the cell array read from the datastore in debug mode, I noticed that there are several rows containing only a timestamp, which are not in the original files. These rows sit between the last row of one file and the first rows of the following one. The timestamps of these rows are the logical continuation of the last timestamp (as you would get them using Excel).
Is this a bug or intended behaviour?
Can I avoid reading these rows in the first place, or do I have to filter them out afterwards?
Thanks in advance.
As it seems nobody else has had this problem, I will share how I dealt with it in the end:
toDelete = strcmp(data.(2), '');
data(toDelete, :) = [];
I took the second column of the table and checked for an empty string. Afterwards I deleted all faulty rows by assigning an empty array via logical indexing (as shown in the MATLAB documentation).
Sadly I found no way to prevent loading the faulty rows, but in the end the amount of data was not too big to do this processing step in memory.
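Incidentally, for the column-filtering part of the question, the datastore itself can drop columns before anything is read; a minimal sketch (the variable names below are placeholders for your real headers):
ds = datastore('file*.csv');
% select only the columns of interest before reading; the names are assumptions
ds.SelectedVariableNames = {'timestamp', 'variable1', 'variable200'};
data = readall(ds);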

Parsing a CSV file in matlab efficiently

I am working with a CSV file that contains information in the following format:
        col1    col2             col3
row1    id1  ,  text1 (year1) ,  a|b|c
row2    id2  ,  text2 (year2) ,  a|b|c|d|e
row3    id3  ,  text3 (year3) ,  a|b
...
The number of rows in the CSV is very large. The years are embedded in col2 in parentheses. Also, as can be seen, col3 can have a varying number of elements.
I would like to read the CSV file EFFICIENTLY and end up for each item (id) with an array as follows:
For the item with id_i:
A = [id_i, text_i, year_i, 101010001]
where, if all possible features in col3 are [a,b,c,d,...,z], the binary vector marks each feature's presence or absence.
I am interested in an efficient implementation of this in MATLAB. Ideas are more than welcome. Thank you!
I would like to add what I have found to be one of the fastest ways of reading a CSV file:
importdata()
This will allow you to read numeric and non-numeric data, but it assumes there is some number of header lines. You can either pass the number of header lines as an input argument to importdata(), or let it guess on its own, which didn't work for my use case in the past.
This was much faster than xlsread() for me: it took 1/6th the time to read something 6 times larger!
If you are reading only numeric data, you can use csvread(), which actually uses dlmread() under the hood.
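A minimal csvread sketch for the purely numeric case (the file name is a placeholder, and the offsets assume a single header row to skip; both offsets are zero-based):
% start at the second row, first column, skipping one header line
M = csvread('numbers.csv', 1, 0);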
Thing is, there are about 10 ways to read these files, and the right choice depends not only on your goals, but also on the file contents.
You can use T = readtable(filename). This has the option 'ReadVariableNames', which takes the first row as the header, and 'ReadRowNames', which takes the first column as row names.
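For completeness, here is a rough, untested sketch of the whole task from the question, with several assumptions baked in: the row1/row2 labels above are just illustration and the real file has three comma-separated fields per line, the year always sits in trailing parentheses in col2, and the col3 features are single letters a-z separated by |:
fid = fopen('data.csv');  % file name is a placeholder
C = textscan(fid, '%s%s%s', 'Delimiter', ',');
fclose(fid);

ids = strtrim(C{1});
% pull the year out of 'text (year)'
tok   = regexp(strtrim(C{2}), '^(.*)\((\d+)\)$', 'tokens', 'once');
texts = cellfun(@(t) strtrim(t{1}), tok, 'UniformOutput', false);
years = cellfun(@(t) str2double(t{2}), tok);

% binary presence/absence matrix over the features 'a'..'z'
feats = false(numel(ids), 26);
for k = 1:numel(ids)
    f = strtrim(strsplit(C{3}{k}, '|'));       % e.g. {'a','b','c'}
    feats(k, cellfun(@(s) s - 'a' + 1, f)) = true;
end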