How can I read a zipped CSV file with KDB?

I've got a number of CSV files saved with pandas as zip files. I'd like to read them into KDB without having to manually unzip them in a terminal beforehand.
It looks like KDB supports compression:
https://code.kx.com/q/kb/file-compression/
But I can't figure out how to get it to decompress it. What I read in looks like the literal zip file.
How do I read a zipped CSV file in KDB?

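Named pipes can be used for this purpose: unzip -p writes the CSV into the pipe, and .Q.fps reads it from there in chunks, so nothing is unpacked to disk.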
https://code.kx.com/q/kb/named-pipes/
q)system"rm -f fifo && mkfifo fifo"
q)system"unzip -p t.zip t.csv > fifo &"
q)trade:flip `sym`time`ex`cond`size`price!"STCCFF"$\:()
q).Q.fps[{`trade insert ("STCCFF";",")0:x}]`:fifo
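Since the files in the question were written by pandas, the producer side of the pipe could also be a small Python helper instead of unzip -p. This is only a sketch of that idea: stream_zip_member is a made-up name, and the archive built at the bottom is dummy data so the snippet runs on its own.

```python
import io
import shutil
import zipfile


def stream_zip_member(zip_path, member, out, chunk_size=1 << 20):
    """Copy one archive member to `out` in chunks, much like `unzip -p`.

    `out` is any writable binary file object. `stream_zip_member` is a
    hypothetical helper name, not part of kdb+ or pandas.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as src:
            shutil.copyfileobj(src, out, chunk_size)


if __name__ == "__main__":
    # Build a tiny in-memory archive (dummy data) so the sketch is self-contained.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("t.csv", "sym,price\nAAA,1.5\n")
    buf.seek(0)

    out = io.BytesIO()
    stream_zip_member(buf, "t.csv", out)
    print(out.getvalue().decode(), end="")
```

In practice out would be the fifo opened for binary writing, with .Q.fps reading from the other end as in the q session above.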

Related

Import data in gzip archive to mongodb

I have data stored in gzip archive folders, every archive contains a big file that includes json in the following format:
{key:value, key:value}
{key:value, key:value}
{key:value, key:value}
I need to import the data to MongoDB. What is the best way to do that? I can't extract the gzip on my PC as each file (not archived) is about 1950MB.
You can unzip the files to STDOUT and pipe the stream into mongoimport. Then you don't need to save the uncompressed file to your local disk:
gunzip --stdout your_file.json.gz | mongoimport --uri=<connection string> --collection=<collection> --db=<database>
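If you would rather drive the same pipe from a script, here is a rough Python sketch. pipe_gzip_to_command is a hypothetical helper name; the command it feeds is whatever argument list you pass in, and nothing here is specific to mongoimport.

```python
import gzip
import shutil
import subprocess


def pipe_gzip_to_command(gz_path, argv):
    """Decompress gz_path and feed the bytes to argv's stdin.

    Roughly equivalent to `gunzip --stdout gz_path | argv...`.
    Returns the exit code of the command. Hypothetical helper name.
    """
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE)
    with gzip.open(gz_path, "rb") as src:
        # Copies in chunks, so the uncompressed data never sits on disk
        # or fully in memory.
        shutil.copyfileobj(src, proc.stdin)
    proc.stdin.close()  # signal end of input to the command
    return proc.wait()
```

For the case above, argv would be the mongoimport command line from the answer, split into a list, with your own connection string, collection and database filled in.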
I've imported tens of billions of lines of CSV and JSON to MongoDB in the past year, even from zipped formats. Having tried them all to save precious time, here's what I would like to recommend:
1. Unzip the file.
2. Pass it as an argument to mongoimport.
3. Create the index on the fields you want, but ONLY at the end of the entire data insert process.
You can find the mongoimport documentation at: https://www.mongodb.com/docs/database-tools/mongoimport/
If you have a lot of files, you may want to write a for loop in bash that unzips each file and passes its name as an argument to mongoimport.
If you are worried about running out of disk space, you can also delete the unzipped file as soon as its mongoimport run has finished.
Hope it helped!
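The unzip, import, delete loop described above could look something like this in Python. It is a sketch under assumptions: import_each is a made-up name, and import_one is a hypothetical callback that you would write yourself, for example to run mongoimport on the path it is given.

```python
import gzip
import os
import shutil
import tempfile


def import_each(gz_paths, import_one):
    """Decompress each archive to a temp file, import it, then delete it.

    `import_one` is a hypothetical callback taking the path of the
    uncompressed file; only one uncompressed file exists at a time.
    """
    for gz_path in gz_paths:
        fd, tmp_path = tempfile.mkstemp(suffix=".json")
        try:
            with os.fdopen(fd, "wb") as dst, gzip.open(gz_path, "rb") as src:
                shutil.copyfileobj(src, dst)
            import_one(tmp_path)
        finally:
            # Free the disk space before moving on to the next archive.
            os.remove(tmp_path)
```

Index creation is deliberately left out of the loop, in line with the advice to build indexes only once all inserts are done.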

How to merge gz files from a postgres dump into one big file?

There is a folder with postgres dump files like:
0001.dat.gz
0001.dat.gz
...
6000.dat.gz
toc.dat
How do I merge all these files into a single gz archive that postgres recognizes when restoring?
So it looks like you have the directory format. pg_restore will recognize that format. I don't think there is any supported way to convert it to one of the other formats. You can tar it up into a single file, but you will have to untar it before restoring. Next time you run pg_dump, you should tell it to use the format you want used.
There are subtle differences in the toc.dat file between the directory format and the tar format, so if you just uncompress and then tar up the directory, it will not work (at least in my hands). It does work the other way around, however.

ffmpeg concat command not reading input file correctly

I am trying to concatenate two video files using ffmpeg, and I am receiving an error.
ffmpeg -f concat -safe 0 -i list.txt -c copy concat.mp4
And the error output I receive is....
[concat @ 0x7ff922000000] Line 1: unknown keyword '43.mp4'
list.txt: Invalid data found when processing input
It looks like the file names in the list have to be specially formatted, like this:
file '/path/to/file1.wav'
with the keyword file in front of each path. I spent a lot of time trying to guess why ffmpeg failed to read the file names; it didn't matter whether they were in the list or on the command line. Only after I used this command from ffmpeg's manual to build the list
for f in *.wav; do echo "file '$f'" >> mylist.txt; done
did it work. The only difference was the additional word file.
Here you can read it yourself: https://trac.ffmpeg.org/wiki/Concatenate#demuxer
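If you generate the list from a script rather than the shell, the same format can be produced like this. It is a sketch only: the helper names are made up, and the handling of single quotes inside file names assumes ffmpeg's usual quoting rule (close the quote, write an escaped quote, reopen), so check it against the linked page if your names contain quotes.

```python
from pathlib import Path


def concat_list_lines(paths):
    """Return one `file '<path>'` line per input, in the given order."""
    lines = []
    for p in paths:
        # Assumption: a literal ' inside the quoted path is written as '\''
        escaped = str(p).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return lines


def write_concat_list(paths, list_path):
    """Write the lines to list_path, one entry per line."""
    Path(list_path).write_text("\n".join(concat_list_lines(paths)) + "\n")


if __name__ == "__main__":
    # Hypothetical file names, just to show the output format.
    for line in concat_list_lines(["43.mp4", "44.mp4"]):
        print(line)
```

The resulting file is what gets passed to -i in the ffmpeg -f concat command from the question.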

How to extract a bz2 file in spark

I have a CSV file compressed in bz2 format. As on Unix/Linux, is there a single-line command to extract/decompress file.csv.bz2 to file.csv in Spark (Scala)?
You can use the built-in textFile function on the SparkContext (sc); this worked for me:
sc.textFile("file.csv.bz2").saveAsTextFile("file.csv")
Note that saveAsTextFile writes a directory named file.csv containing part files rather than a single file.

How to import Zipped file into Postgres Table

I would like to import a file into my PostgreSQL system (specifically Redshift). I have found an argument for copy that allows importing a gzip file, but the provider of the data I am trying to load only produces .zip files. Are there any built-in postgres commands for opening a .zip?
From within Postgres:
COPY table_name FROM PROGRAM 'unzip -p input.csv.zip' DELIMITER ',';
From the man page for unzip -p:
-p extract files to pipe (stdout). Nothing but the file data is sent to stdout, and the files are always extracted in binary format, just as they are stored (no conversions).
Can you just do something like this?
unzip -p myfile.zip | gzip > myfile.gz
Easy enough to automate if you have enough files.
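For automating the conversion over many archives, a Python version of the same zip-to-gzip step might look like this. It is only a sketch: zip_member_to_gzip is a made-up name, and it assumes you know which member of each archive you want.

```python
import gzip
import shutil
import zipfile


def zip_member_to_gzip(zip_path, member, gz_path):
    """Re-compress one member of a .zip archive as a standalone .gz file.

    The data is copied in chunks, so the uncompressed file is never
    written to disk. Hypothetical helper name.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
```

The .gz output can then be loaded with whatever gzip-aware import option your database offers.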
This might only work when loading Redshift from S3, but you can simply include a gzip flag when copying data into Redshift tables.
This is the format that works for me if my S3 bucket contains a gzipped .csv:
copy <table> from 's3://mybucket/<foldername>' credentials '<aws-auth-args>' delimiter ',' gzip;
unzip -p /path/to/.zip | psql -U user
The 'user' must have superuser rights, otherwise you will get:
ERROR: must be superuser to COPY to or from a file
To learn more about this see
https://www.postgresql.org/docs/8.0/static/backup.html
This approach is mainly used when handling large databases.