Is there any way in kdb to read a csv file that is as simple as the read_csv() function in pandas?
I usually use something like the code below to read a csv in kdb:
("I*FS";enlist ",")0:`:a.csv / where a.csv is a csv file with Integer, String, Float and Symbol columns
In many practical cases the csv file we want to read has more than 100 columns, which makes it tedious to provide the column types to the function.
Is there a way in kdb to read a csv where kdb can work out the column types by itself?
Something like:
("*";enlist ",")0:`:a.csv / this fails
Simon Garland wrote a "csv guess" script many years ago: https://github.com/simongarland/csvguess
It might still be relevant. I believe some IDEs (such as qStudio and, possibly, KX Analyst) also have this functionality built in.
Alternatively you could read the first line of the csv to get the number of columns (say n) and then use n#"*" as the type string to read the entire csv as string columns:
q)(count["," vs first system"head -1 a.csv"]#"*";enlist ",")0:`:a.csv
col1 col2 col3
----------------------
,"a" ,"1" "2019-01-01"
,"b" ,"2" "2019-01-01"
,"c" ,"3" "2019-01-01"
I have thousands of csv files and they come in 2 basic formats. In the first format there are 100 rows and 2 columns; in the second format there are 50 columns and 5 rows. The numbers are given just to provide an example.
What I want to do is write Matlab code that extracts the complete second row of each csv file with the first format and makes it the first row of the corresponding csv file with the second format. The numbers of csv files with the first and second format are equal.
Any help is appreciated.
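A minimal sketch of one way to do this, assuming the two sets of files pair up when sorted by name, the files are purely numeric (csvread only handles numeric data), the extracted row has as many elements as the target files have columns, and the naming patterns below are made up:
firstFmt  = dir('format1_*.csv');   % files in the first format (hypothetical names)
secondFmt = dir('format2_*.csv');   % files in the second format
for k = 1:numel(firstFmt)
    A = csvread(fullfile(firstFmt(k).folder, firstFmt(k).name));
    row = A(2,:);                   % the complete second row
    B = csvread(fullfile(secondFmt(k).folder, secondFmt(k).name));
    % prepend the row and rewrite the second-format file
    csvwrite(fullfile(secondFmt(k).folder, secondFmt(k).name), [row; B]);
end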
I have a .csv file that has numbers as column names. I want to import that file into a table in PostgreSQL, but it gives an error.
I have 1024 columns, so I can't manually change them in my file. Is there a way around that?
If you want a table with 1024 columns, you are doing something wrong.
You should choose a different data model.
But it is possible to use numbers as column names, as long as you surround them with double quotes.
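For example, a quick sketch (the table and column names here are invented):
-- quoted identifiers allow purely numeric column names
CREATE TABLE readings ("1" integer, "2" integer, "3" integer);
-- the quotes are also required whenever the columns are referenced
SELECT "1", "2" FROM readings;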
I am working with a CSV file that contains information in the following format:
col1 col2 col3
row1 id1 , text1 (year1) , a|b|c
row2 id2 , text2 (year2) , a|b|c|d|e
row3 id3 , text3 (year3) , a|b
...
The number of rows in the CSV is very large. The years are embedded in col2 in parentheses. Also, as can be seen, col3 can have a varying number of elements.
I would like to read the CSV file EFFICIENTLY and end up, for each item (id), with an array as follows:
For the item with id_i:
A = [id_i, text_i, year_i, 101010001]
where, if all possible features in col3 are [a,b,c,d,...,z], the binary vector encodes the presence or absence of each feature.
I am interested in an efficient implementation of this in MATLAB. Ideas are more than welcome. Thank you.
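For reference, a rough sketch of the parsing itself; the filename, the regexp pattern and the a-z feature alphabet are all assumptions:
fid = fopen('data.csv');                                % hypothetical filename
c = textscan(fid, '%s %s %s', 'Delimiter', ',');
fclose(fid);
years = regexp(c{2}, '\((\d+)\)', 'tokens', 'once');    % year inside parentheses
feats = cellfun(@(s) strsplit(s, '|'), c{3}, 'UniformOutput', false);
allFeats = cellstr(('a':'z')');                         % assumed feature alphabet
% one row per item: true where the feature is present
present = cell2mat(cellfun(@(f) ismember(allFeats', f), feats, 'UniformOutput', false));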
I would like to add what I have found to be one of the fastest ways of reading a CSV file:
importdata()
This will allow you to read numeric and non-numeric data, but it assumes there is some number of header lines. You can either pass the number of header lines as an input argument to importdata(), or you can let it guess on its own, which did not work for my use case in the past.
This was much faster than xlsread() for me: it took a sixth of the time to read a file six times larger!
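A minimal usage sketch (the filename, delimiter and header count are just examples):
S = importdata('a.csv', ',', 1);   % delimiter ',' and 1 header line
S.data                             % the numeric part
S.textdata                         % the header and any non-numeric fields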
If you are reading only numeric data, you can use csvread(), which actually uses dlmread() under the hood.
The thing is, there are about 10 ways to read these files, and the right choice depends not only on your goals but also on the file contents.
You can use T = readtable(filename). This has the option 'ReadVariableNames', which takes the first row as the header, and 'ReadRowNames', which takes the first column as row names.
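A quick sketch of those options in use (the filename is an example):
T = readtable('a.csv', 'ReadVariableNames', true, 'ReadRowNames', false);
summary(T)   % readtable infers a type for each column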
I have a .csv file and I can't read it on Octave. On R I just use the command below and everything reads fine:
myData <- read.csv("myData.csv", stringsAsFactors = FALSE)
However, when I go to Octave, it doesn't read it properly with the command below:
myData = csvread('myData.csv',1,0);
When I open the file with Notepad, the data looks something like the text below. Note there isn't a comma separating the last column name (i.e. Column3) from the first value (i.e. Value1), and the same thing happens with the last value of the first row (i.e. Value3) and the first value of the second row (i.e. Value4):
Column1,Column2,Column3Value1,Value2,Value3Value4,Value5,Value6
Column1 is meant for date values (with format yyyy-mm-dd hh:mm:ss); I don't know if that has anything to do with the problem.
Alex's answer already explains why csvread does not work for your case: that function only reads numeric data and returns an array. Since your fields are all strings, you need something that reads a csv file into a cell array.
That function is named csv2cell and is part of the io package.
As a separate note, if you plan to perform operations on those dates, you may want to convert them from strings into serial date numbers. This will allow you to put your dates in a numeric array, which allows faster operations and reduced memory usage. Also, the financial package has many functions to deal with dates.
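A minimal sketch, assuming the io package is installed and the layout from the question:
pkg load io
c = csv2cell('myData.csv');                            % cell array, header in row 1
dates = datenum(c(2:end,1), 'yyyy-mm-dd HH:MM:SS');    % Column1 as serial date numbers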
csvread only reads numeric data, so a date unfortunately does not qualify.
In Octave you might want to check out the dataframe package. In Matlab you would use readtable.
Otherwise there are also more primitive functions you can use, like textscan.
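For example, a rough textscan sketch for the three string columns in the question:
fid = fopen('myData.csv');
c = textscan(fid, '%s %s %s', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);   % c is a 1x3 cell array, one cell of strings per column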
After running "Execute query, write results to file", the columns in my output file for the money datatype get broken into two columns. E.g. if my revenue is $500 it is displayed correctly, but if my revenue is $1,500.00 there is an issue: it gets broken into two columns, $1 and $500.00.
Can you please help me get my results for the money datatype into a single column in the csv file?
What is this command "Execute query, write results to file"? Do you mean COPY? If so, have a look at the FORCE QUOTE option: http://www.postgresql.org/docs/current/static/sql-copy.html
E.g.:
COPY yourtable to '/some/path/and/file.csv' CSV HEADER FORCE QUOTE *;
Note: if the application that is consuming the csv files still fails because of the comma, you can change the delimiter from "," to whatever works for you (e.g. "|"), as in the sketch below.
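A sketch of that variant, using the same made-up table and path as above (note that in this older option syntax DELIMITER has to come before CSV):
COPY yourtable TO '/some/path/and/file.csv' DELIMITER '|' CSV HEADER FORCE QUOTE *;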
Additionally, if you do not want CSV but do want TSV, you can omit the CSV HEADER keywords and the results will be output in tab-separated format.
The comma is the list separator on computers in some regions; in other regions the semicolon is the list separator. So I think you need to replace the comma when you write to csv.