How to export every table to csv in a kdb+ database?

Assume my kdb+ database has a few tables. How can I export all tables to CSV files, where the name of each CSV is the same as the table name?

There are a number of ways to approach this; one solution is:
q)t1:([]a:1 2 3;b:1 2 3)
q)t2:([]a:1 2 3;b:1 2 3;c:1 2 3)
q){save `$(string x),".csv"} each tables[]
`:t1.csv`:t2.csv
ref: http://code.kx.com/q/ref/filewords/#save
If you wish to specify the directory of the file being saved, you could enhance the function like so:
q){[dir;tSym] save ` sv dir,(`$ raze string tSym,`.csv)}[`:C:/Users/dhughes1/Documents;] each tables[]
`:C:/Users/dhughes1/Documents/t1.csv`:C:/Users/dhughes1/Documents/t2.csv

An alternative method to save is to use 0: to prepare text, specifying a delimiter of ",":
q)tab:([]a:1 2 3;b:`a`b`c)
q)show t:","0:tab
"a,b"
"1,a"
"2,b"
"3,c"
And again to save text:
q)`:tab 0: t
`:tab
The advantage of this method is that the delimiter can be specified before saving to disk.
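For example, to prepare pipe-delimited text instead (the file name tab.psv here is arbitrary):
q)`:tab.psv 0: "|" 0: tab
`:tab.psv
q)read0 `:tab.psv
"a|b"
"1|a"
"2|b"
"3|c"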

Related

Why are CSVs copied from QPAD and CSVs saved from a q process so different in terms of size?

I am trying to save a csv generated from a table.
If I 'Export all as CSV' from QPAD the file is 22MB.
If I do `:path.csv 0: csv 0: table the file is 496MB.
The files contain the same data.
I have some columns which are lists of dates or lists of symbols, which cause some issues when parsing to CSV.
To get around that I use this: {`$$[1=count x;string first x;`$" "sv string x]}
i.e. one of the columns is called allDates and looks like this:
someOtherCol allDates              stackedSymCol
------------------------------------------------
val1         ,2001.01.01           ,`sym 1
val2         2001.01.01 2001.01.02 `sym 2`sym 3
Where is this massive difference in size coming from, and how can I reduce the size?
If I remove these 3 columns which are lists of lists, the file size goes down significantly.
Doing an ungroup is not an option.
I think the important question here is why QPAD is capable of handling columns which are lists of lists of type 'D', 'S', etc., and how I can achieve that without casting those columns to a space-delimited string, since that is what is causing my saved CSV to be so massive.
i.e. I can do an 'Export all to csv' from QPAD on this table and it is 21MB,
but if I want to save it programmatically, I need to change those allDates and DESK_NAME columns, and it goes up to 500MB.
UPDATE: Thanks everyone. I did not know that QPAD truncates data like that on exports. That is worrying.
These CSVs will not be identical. QPad truncates nested lists (including strings). The CSV exported directly from kdb+ will be complete.
E.g.
([]a:3#enlist til 1000;b:3#enlist til 1000)
The QPad CSV export of this is truncated, ending like this: 30j, 31j ....
Based on the update to your question, it seems you are exporting the data shown in the screenshot, which would not be the same as the data you are transforming to save to CSV directly from q.
Based on the screenshot, it is likely the CSV files are not identical for at least 3 reasons:
QPad is truncating the nested dates at a certain length
QPad adds enlist to nested lists of length 1
QPad adds/keeps backticks before symbols
Example data comparison
Here is a minimal example that should highlight this:
q)example:{n:1 2 20;([]someOtherCol:3?10;allDates:n?\:.z.d;stackedSymCol:n?\:`3)}[]
q)example
someOtherCol allDates              stackedSymCol
------------------------------------------------
1            ,2006.01.13           ,`hfg
1            2008.04.06 2008.01.11 `nha`plc
4            2009.06.12 2016.01.24 2021.02.02 2018.09.02 2011.06.19 2022.09.26 2008.10.29 2010.03.11 2022.07.30 2012.09.06 2021.11.27 2017.11.24 2007.09.10 2012.11.27 2020.03.10 2003.07.02 2007.11.29 2010.07.18 2001.10.23 2000.11.07 `ifd`jgp`eln`kkb`ahm`cal`eni`idj`mod`omb`dkc`ogf`eaj`mbf`kdd`hip`gkg`eef`edi`jak
I have used 'Export All to CSV' to save to C:/q/qpad.csv.
I couldn't get your "razing" function to work as-is, so I modified it slightly and used that to convert the nested lists to strings, then saved the file to CSV.
q)f:{`$$[1=count x;string first x;" "sv string x]}
q)`:C:/q/q.csv 0: csv 0: update f'[allDates], f'[stackedSymCol] from example
Reading both files back and comparing shows that the contents do not match:
q)a:read0`:C:/q/q.csv
q)b:read0`:C:/q/qpad.csv
q)a~b
0b
Side note
Since kdb+ V4.0 2020.03.17 it is possible to save nested vectors to csv using .h.cd to prepare the text. The variable .h.d is used as the delimiter for sublist items.
q).h.d:" ";
q).h.cd example
"someOtherCol,allDates,stackedSymCol"
"8,2013.09.10,pii"
"6,2007.08.09 2012.12.30,hbg blg"
"8,2011.04.04 2020.08.21 2006.02.12 2005.01.15 2016.05.31 2015.01.03 2021.12.09 2022.03.26 2013.10.15 2001.10.29 2011.02.17 2010.03.28 2005.11.14 2003.08.16 2002.04.20 2004.08.07 2014.09.19 2000.05.24 2018.06.19 2017.08.14,cim pgm gha chp dio gfc beh mbo cfe kec jbn bjh eni obf agb dce gnk jif pci ppc"
q)`:somefile.csv 0: .h.cd example
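If you prefer a different sublist delimiter, only .h.d needs to change; for illustration, a semicolon (the output below follows the same example data as above):
q).h.d:";";
q)3#.h.cd example
"someOtherCol,allDates,stackedSymCol"
"8,2013.09.10,pii"
"6,2007.08.09;2012.12.30,hbg;blg"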
CSV saved from q
Contents of the CSV saved from q and the character count are shown below:
q)read0`:C:/q/q.csv
"someOtherCol,allDates,stackedSymCol"
"8,2013.09.10,pii"
"6,2007.08.09 2012.12.30,hbg blg"
"8,2011.04.04 2020.08.21 2006.02.12 2005.01.15 2016.05.31 2015.01.03 2021.12.09 2022.03.26 2013.10.15 2001.10.29 2011.02.17 2010.03.28 2005.11.14 2003.08.16 2002.04.20 2004.08.07 2014.09.19 2000.05.24 2018.06.19 2017.08.14,cim pgm gha chp dio gfc beh mbo cfe kec jbn bjh eni obf agb dce gnk jif pci ppc"
q)count raze read0`:C:/q/q.csv
383
CSV saved from QPad
Similarly, the contents of the CSV saved from QPad and the character count:
q)read0`:C:/q/qpad.csv
"someOtherCol,allDates,stackedSymCol"
"1,enlist 2006.01.13,enlist `hfg"
"1,2008.04.06 2008.01.11,`nha`plc"
"4,2009.06.12 2016.01.24 2021.02.02 2018.09.02 2011.06.19 2022.09.26 2008.10.29 2010.03.11 2022.07.30 2012.09.06 2021.11.27 2017.11.24 2007.09.10 2012.11.27 ...,`ifd`jgp`eln`kkb`ahm`cal`eni`idj`mod`omb`dkc`ogf`eaj`mbf`kdd`hip`gkg`eef`edi`jak"
q)count raze read0`:C:/q/qpad.csv
338
Conclusion
We can see from these examples the points outlined above: the dates are truncated at a certain length, enlist is added to nested lists of length 1, and backticks are kept before symbols.
The truncated dates are likely why the file you exported from QPad is so much smaller; based on your comments above, the files are not identical.
TL;DR - Both files are created differently and that's why they differ.

How do I extract the last string of a csv file and append it to the other?

I have a CSV file of many rows, each having 101 columns, with the 101st column being a char while the rest of the columns are doubles. E.g.
1,-2.2,3 ... 98,99,100,N
I implemented a filter to operate on the numbers and wrote the result to a different file, but now I need to map the last column of my old CSV onto my new CSV. How should I approach this?
I did the original loading using loadcsv, but that didn't seem to load the character column, so how should I proceed?
In MATLAB there are many ways to do it; this answer expands on the use of tables:
Input
test.csv
1,2,5,A
2,3,5,G
5,6,8,C
8,9,7,T
test2.csv
1,2,1.2
2,3,8
5,6,56
8,9,3
Script
t1 = readtable('test.csv'); % Read the first csv file
lastcol = t1{:,end}; % Extract the last column
t2 = readtable('test2.csv'); % Read the second csv file
t2.addedvar = lastcol; % Add the last column of the first file to the table from the second file
writetable(t2,'test3.csv','Delimiter',',','WriteVariableNames',false) % write the new table in a file
Note that test3.csv is a new file but you could also overwrite test2.csv
'WriteVariableNames',false allows you to write the csv file without the headers of the table.
Output
test3.csv
1,2,1.2,A
2,3,8,G
5,6,56,C
8,9,3,T

The value in "sym" file disappears when using splayed tables

I am using the following line:
`:c:/dir/ set .Q.en[`:c:/dir; tablename]
Everything is OK if I don't exit kdb+, but if I do and then try to load the table using
get `dir
all the symbol columns come back as integers. I would really appreciate your help in understanding why this happens.
It looks like you forgot to repeat the table name on the l.h.s. of set.
Try
q)`:c:/dir/tablename/ set .Q.en[`:c:/dir; tablename]
This will correctly save the table columns in the c:/dir/tablename subdirectory and place the sym file alongside. You should then be able to load both your table and the sym file by using the \l command, or by specifying c:/dir on the command line when you restart q:
q c:/dir
or
q
q)\l c:/dir
(no backticks or leading :'s in either of those commands)
If you want to use get on this table, you will have to load sym separately:
q)load`:c:/dir/sym
q)get`:c:/dir/tablename/
(note the leading : in the path specs)
Finally, you may want to take a look at the rsave command, which will save your table without you having to write tablename twice.
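A hedged sketch of that route (a toy table t; db root c:/dir as above):
q)t:([]s:`a`b`c;v:1 2 3)
q)t:.Q.en[`:c:/dir] t  / enumerate symbol columns against c:/dir/sym first
q)\cd c:/dir
q)rsave `t             / writes the splayed table to c:/dir/t/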
.Q.en takes 2 params - a file handle and the table data.
Your first param isn't an hsym - it should be a backtick, then a colon, then the path to your db root.
Also, set takes 2 params - the first in this case should be the path where you want to save, like dir/splayedTableName/
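To make both points concrete, a small sketch (dbroot and an in-memory table tablename are assumed):
q)dbroot:`:c:/dir                 / hsym: backtick, colon, then the db root path
q)dst:` sv dbroot,`tablename,`    / trailing empty symbol gives `:c:/dir/tablename/
q)dst set .Q.en[dbroot; tablename]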

postgresql how to have COPY interpret formatted numeric fields automatically?

I have an input CSV file containing something like:
SD-32MM-1001,"100.00",4/11/2012
SD-32MM-1001,"1,000.00",4/12/2012
I was trying to import that with COPY into a PostgreSQL table (varchar, float8, date) and ran into an error:
# copy foo from '/tmp/foo.csv' with header csv;
ERROR: invalid input syntax for type double precision: "1,000.00"
Time: 1.251 ms
Aside from preprocessing the input file, is there some setting in PG that will have it read a file like the one above and convert to numeric form in COPY? Something other than COPY?
If preprocessing is required, can it be set as part of the COPY command? (Not the psql \copy)?
Thanks a lot.
One option for preprocessing is to first COPY into a temporary table as text. From there, insert into the definitive table using the to_number function:
select to_number('1,000.00', 'FM000,009.99')::double precision;
It's an odd CSV file that surrounds numeric values with double quotes but leaves values like SD-32MM-1001 unquoted. In fact, I'm not sure I've ever seen a CSV file like that.
If I were in your shoes, I'd try copy against a file formatted like this.
"SD-32MM-1001",100.00,4/11/2012
"SD-32MM-1001",1000.00,4/12/2012
Note that the numbers have no commas. I was able to import that file successfully with:
copy test from '/fullpath/test.dat' with csv
I think your best bet is to get better formatted output from your source.

How can I copy columns from several files into the same output file using Perl

This is my problem.
I need to copy 2 columns each from 7 different files to the same output file.
All input and output files are CSV files.
And I need to add each new pair of columns beside the columns that have already been copied, so that at the end the output file has 14 columns.
I believe I cannot use
open(FILEHANDLE,">>file.csv").
Also, all 7 CSV files have nearly 20,000 rows each, therefore I'm reading and writing the files line by line.
It would be a great help if you could give me an idea as to what I should do.
Thanks a lot in advance.
Provided that your lines are 1:1 (meaning you're combining data from line 1 of File_1, File_2, etc.):
open all 7 files for input
open output file
read line of data from all input files
write line of combined data to output file
Text::CSV is probably the way to access CSV files.
You could define a CSV handler for each file (including the output file), use the getline or getline_hr (returns a hashref) methods to fetch data, combine the fields into arrayrefs, then use print.