How to extract a bz2 file in Spark (Scala)

I have a CSV file compressed in bz2 format. As on Unix/Linux, is there a single-line command in Spark-Scala to extract/decompress file.csv.bz2 to file.csv?

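You can use the built-in textFile method of SparkContext (sc), which decompresses .bz2 input transparently. This worked for me:
sc.textFile("file.csv.bz2").saveAsTextFile("file.csv")
Note that saveAsTextFile writes a directory named file.csv containing part-* files, not a single file; call .coalesce(1) before saving if you want only one part file.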
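When a single plain file.csv on local disk is what's needed, the decompression itself does not require Spark. Below is a minimal sketch using Python's standard bz2 module (a swapped-in technique, not Spark or Scala); bunzip2 is a hypothetical helper name, and the file is streamed so it never has to fit in memory.

```python
import bz2
import shutil


def bunzip2(src: str, dst: str) -> None:
    """Hypothetical helper: stream-decompress a .bz2 file into dst."""
    # bz2.open yields decompressed bytes; copyfileobj copies them in chunks.
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

Usage: bunzip2("file.csv.bz2", "file.csv") produces one ordinary file.csv, much like bunzip2 -k file.csv.bz2 on the command line.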

Related

How to merge csv files into single parquet file inside a folder in pyspark?

I want to merge three CSV files into a single Parquet file using PySpark.
Below is my S3 path. The folder for the 10th contains three files, and I want to merge them into a single Parquet file:
"s3://lla.raw.dev/data/shared/sap/orders/2022/09/10/orders1.csv,orders2.csv,orders3.csv"
Target (single file):
"s3://lla.raw.dev/data/shared/sap/orders/parquet"
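Just read the CSVs and write them out as Parquet:
(spark
    # read every CSV in the date folder (pass header=True if the files have a header row)
    .read.csv('s3://lla.raw.dev/data/shared/sap/orders/2022/09/10/')
    # collapse to one partition so that a single part file is written
    .coalesce(1)
    # write as Parquet; the output path is a directory holding that one part file
    .write
    .parquet('s3://lla.raw.dev/data/shared/sap/orders/parquet')
)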

How can I read a zipped CSV file with KDB?

I've got a number of CSV files saved with pandas as zip files. I'd like to read them into KDB without having to manually unzip them in a terminal beforehand.
It looks like KDB supports compression:
https://code.kx.com/q/kb/file-compression/
But I can't figure out how to make it decompress the file. What I read in looks like the raw zip file.
How do I read a zipped CSV file in KDB?
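Named pipes can be used for this purpose: unzip streams the decompressed CSV into a FIFO in the background, and .Q.fps loads it from the pipe in chunks. See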
https://code.kx.com/q/kb/named-pipes/
q)system"rm -f fifo && mkfifo fifo"
q)system"unzip -p t.zip t.csv > fifo &"
q)trade:flip `sym`time`ex`cond`size`price!"STCCFF"$\:()
q).Q.fps[{`trade insert ("STCCFF";",")0:x}]`:fifo
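To make the named-pipe mechanism concrete outside q, here is a minimal Python sketch (not KDB code) of the same producer/consumer pattern: one thread decompresses a zip member into a FIFO, as unzip -p does above, while the reader parses rows from the pipe. It assumes a POSIX system (os.mkfifo); the function names are hypothetical.

```python
import csv
import os
import tempfile
import threading
import zipfile


def stream_zip_member_via_fifo(zip_path: str, member: str, fifo_path: str) -> None:
    """Producer (hypothetical): like `unzip -p zip member > fifo`."""
    # Open the FIFO first so that, if the zip fails to open, the reader still sees EOF.
    with open(fifo_path, "wb") as fifo, zipfile.ZipFile(zip_path) as zf, zf.open(member) as src:
        while chunk := src.read(65536):
            fifo.write(chunk)


def read_csv_from_zip_via_fifo(zip_path: str, member: str) -> list[list[str]]:
    """Consumer (hypothetical): create the FIFO, start the producer, parse rows."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "fifo")
    os.mkfifo(fifo_path)  # POSIX only, like `mkfifo fifo`
    writer = threading.Thread(
        target=stream_zip_member_via_fifo, args=(zip_path, member, fifo_path)
    )
    writer.start()
    try:
        # Opening for read blocks until the producer opens the write end.
        with open(fifo_path, newline="") as fifo:
            rows = list(csv.reader(fifo))
    finally:
        writer.join()
        os.remove(fifo_path)
        os.rmdir(workdir)
    return rows
```

The FIFO never holds the whole decompressed file on disk; data flows through the kernel buffer as the reader consumes it, which is the same reason the q answer avoids unzipping manually.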

How to convert a JSON file into a CSV file using a Unix shell script or the command line?

I have a file, example.json, that I want to convert into a CSV file so that I can insert the data into an external table using PostgreSQL.
Is there an alternative way to convert it? Or is there a way to convert it with a Unix shell script, or to use PostgreSQL directly to insert example.json into the external table?
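One command-line-friendly option is a small Python script using only the standard json and csv modules. This is a minimal sketch that assumes example.json holds a JSON array of flat objects such as [{"id": 1, "name": "a"}, ...]; nested objects or arrays would need flattening first, and json_to_csv is a hypothetical name.

```python
import csv
import json


def json_to_csv(json_path: str, csv_path: str) -> None:
    """Hypothetical helper: convert a JSON array of flat objects to CSV with a header row."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    # Union of keys in first-seen order, so records with missing fields still line up.
    fieldnames: list[str] = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # Missing keys are written as empty strings (DictWriter's default restval).
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

Usage: json_to_csv("example.json", "example.csv"). The resulting file has a header row, so it could then be loaded with, for example, PostgreSQL's COPY ... WITH (FORMAT csv, HEADER).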

Using the command line to compress a .bak file and move the zip file

I am trying to use the command line to zip a .bak file and then cut (move) the zip file to another location, or copy it and delete the original once the copy is complete.
Right now my script is:
copy "\\a\*.bak" "\\b"
I would like to compress the .bak file (in folder a) as it is a huge file, and then CUT the zip file into folder b.
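Recent versions of Windows (Windows 10 and later) ship a built-in tar.exe that can create zip archives; -a picks the archive format from the .zip extension: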
tar.exe -a -c -f file_Output.zip your_file.bak
file_Output.zip is the resulting archive and your_file.bak is the file being compressed. To complete the "cut", move the archive to folder b afterwards with the move command (e.g. move file_Output.zip \\b).
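If a script is preferred over cmd, the zip-then-cut sequence can be sketched in Python with the standard zipfile and shutil modules. This is a minimal sketch; zip_and_move and the archive naming (your_file.bak becomes your_file.bak.zip) are hypothetical choices, not part of the answer above.

```python
import shutil
import zipfile
from pathlib import Path


def zip_and_move(bak_file: str, dest_dir: str) -> Path:
    """Hypothetical helper: zip one .bak file, move the archive to dest_dir, return its new path."""
    src = Path(bak_file)
    archive = src.with_name(src.name + ".zip")  # e.g. your_file.bak -> your_file.bak.zip
    # ZIP_DEFLATED actually compresses; ZipFile.write reads the source in chunks,
    # so a huge .bak file is not loaded into memory all at once.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname=src.name)
    # shutil.move renames on the same volume, otherwise copies and then deletes (a "cut").
    return Path(shutil.move(str(archive), dest_dir))
```

The original .bak file is left in place; deleting it afterwards is a separate, deliberate step.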

Linux zip command - adding date elements to file name

Occasionally I back up my phpBB forum files from the shell command line:
zip -r forum_backup ~/public_html/forum/*
I'd like to add date elements to the file name, so that the zip file created is automatically formed as
forum_backup_05182013.zip
Any other similar current-date format would also be acceptable.
now=$(date +"%m%d%Y")
zip -r forum_backup_$now ~/public_html/forum/
Without defining a variable first, you can do it in one line:
zip -r "forum_backup_$(date +"%Y-%m-%d").zip" filelist
Alternatively, use the following shell commands, changing the format as you want:
FORMAT="%Y%m%d"
_DATE=$(date +"$FORMAT")
zip -r "forum_backup_${_DATE}" ~/public_html/forum/*
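The same date-stamped backup can be sketched in Python with datetime.date.strftime and shutil.make_archive, which, like zip -r, appends .zip to the base name. This is a minimal sketch; dated_backup and its parameters are hypothetical, and the default format mirrors the question's forum_backup_05182013.zip.

```python
import shutil
from datetime import date
from pathlib import Path


def dated_backup(src_dir: str, out_dir: str, prefix: str = "forum_backup", fmt: str = "%m%d%Y") -> str:
    """Hypothetical helper: zip src_dir recursively into out_dir/<prefix>_<date>.zip."""
    stamp = date.today().strftime(fmt)  # e.g. 05182013 for 18 May 2013
    base_name = str(Path(out_dir) / f"{prefix}_{stamp}")
    # make_archive adds the ".zip" suffix itself and returns the archive's path.
    return shutil.make_archive(base_name, "zip", root_dir=src_dir)
```

Usage: dated_backup(os.path.expanduser("~/public_html/forum"), ".") after import os; pass fmt="%Y-%m-%d" for an ISO-style date.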