My data output after the PySpark SQL code is:
OUTPUT I AM GETTING
\Copy Home 400 Mini + 19\"\" TV + Antenna Upgrade EasyBuy Kapsabet\""
OUTPUT I WANT
Copy Home 400 Mini + 19" TV + Antenna Upgrade EasyBuy Kapsabet
If you notice, I have only one quotation mark in Copy Home 400 Mini + 19" TV, but my output is very different. I know about the escape option, but it's not working properly.
PySpark SQL code for the write:
spark_raw_5.repartition(1).write.mode("overwrite").option("header", "true") \
    .option("timestampFormat", "yyyy-MM-dd HH:mm:ss").option("escape", "\"") \
    .csv("/Users/adityaverma/Downloads/accounts_spark_output.csv")
But the option("escape", "\"") doesn't seem to work properly. Please help me get this output:
Copy Home 400 Mini + 19" TV + Antenna Upgrade EasyBuy Kapsabet
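One direction worth checking (a sketch, not a confirmed fix): the doubled quotes may already be sitting in the DataFrame because the source file was read without an escape option, in which case setting escape on the read and both quote and escape on the write keeps the single quotation mark intact. The input path below is illustrative; the rest follows the code from the question.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read with escape='"' so a doubled "" inside a quoted field collapses back to
# a single quote instead of being kept literally (input path is illustrative).
spark_raw_5 = (spark.read
    .option("header", "true")
    .option("escape", "\"")
    .csv("/Users/adityaverma/Downloads/accounts.csv"))

# Write with the same quote/escape pair so the quote round-trips unchanged.
(spark_raw_5.repartition(1).write.mode("overwrite")
    .option("header", "true")
    .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
    .option("quote", "\"")
    .option("escape", "\"")
    .csv("/Users/adityaverma/Downloads/accounts_spark_output.csv"))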
Related
I tried to read a CSV file with PySpark that has the following line in it:
2100,"Apple Mac Air A1465 11.6"" Laptop - MD/B (Apr, 2014)",Apple MacBook
My code for reading:
df = spark.read.options(header='true', inferschema='true').csv(file_path)
And the df splits the second component in the middle:
first component: 2100
second component: "Apple Mac Air A1465 11.6"" Laptop - MD/B (Apr,
Third component: 2014)"
Meaning that the second original component was split into two components.
I tried several more syntaxes (Databricks, SQL context, etc.) but all had the same result.
What is the reason for that? How could I fix it?
For this type of scenario, Spark provides a solution: the escape option.
Just add escape='"' to the options and you will get 3 components, as shown below.
df = spark.read.options(header='true', inferschema='true', escape='"').csv("file:///home/srikarthik/av.txt")
This is happening because the file separator is a comma (','), so a comma inside a quoted value is treated as a field boundary.
One option is to write code that ignores a comma when it appears between a pair of double quotes.
Otherwise, as a second solution, read the file as-is without the column header, replace any comma that falls between double quotes with '*' or any other punctuation character, save the file, and then read it back using the comma as the separator; it will work (a sketch follows below).
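A minimal sketch of that second workaround, using Python's csv module (file names are illustrative, and in practice the escape option shown above is the simpler fix):

import csv

raw_path = "av.txt"          # illustrative input path
clean_path = "av_clean.txt"  # illustrative output path

with open(raw_path, newline="") as src, open(clean_path, "w", newline="") as dst:
    # csv.reader understands the doubled-"" convention, so each quoted field
    # comes back whole, embedded commas and all.
    for row in csv.reader(src):
        # Replace in-field commas before writing a plain comma-separated line.
        dst.write(",".join(field.replace(",", "*") for field in row) + "\n")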
It's a Django app in which I'm loading a CSV. The table gets created OK, but copying the CSV into PostgreSQL fails with the error:
psycopg2.DataError: extra data after last expected column
CONTEXT: COPY csvfails, line 1:
Questions already referred to:
"extra data after last expected column" while trying to import a csv file into postgresql
I have tested multiple times with CSVs of different column counts, and am sure now the column count is not the issue; it's the content of the CSV file, because when I change the content and upload the same CSV, the table gets created and I don't get this error. The content of the CSV file that fails is shown below. Kindly advise what in this content prompts psycopg2/psql/postgres to give this error.
No, as suggested in the comment, I can't paste even a single row of the CSV file; the imgur image add-in won't allow it. Not sure what to do now?
Seen below are screenshots from the psql CLI: the table had been created with the correct column count, yet I still got the error.
EDIT_1 - Further, while saving on my Ubuntu machine using LibreOffice, I unchecked Separator Options >> Separated By >> TAB and SEMICOLON. The CSV was then saved with only Separator Options >> COMMA.
The Python code which might be the culprit is:
with open(path_csv_for_psql, 'r') as f:
    next(f)  # Skip the header row.
    csv_up_cursor.copy_from(f, str(new_table_name), sep=',')
conn.commit()
I thought I read somewhere that the separator parameter passed to copy_from (here sep=',') could be the issue?
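One possible direction (a sketch, not a confirmed fix): copy_from splits purely on the separator and does not understand CSV quoting, so a quoted field that contains a comma looks like an extra column. copy_expert with COPY ... FROM STDIN in CSV mode lets PostgreSQL's own CSV parser handle the quoting and the header row; the variable names are taken from the snippet above, and the COPY options are an assumption about the file's format.

# Sketch: COPY in CSV mode handles quoted fields and skips the header itself,
# so the manual next(f) is no longer needed.
copy_sql = "COPY {} FROM STDIN WITH (FORMAT csv, HEADER true)".format(new_table_name)

with open(path_csv_for_psql, 'r') as f:
    csv_up_cursor.copy_expert(copy_sql, f)
conn.commit()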
I'm currently trying to unzip an Excel file using 7-Zip in SAS.
I've done some looking around and I've managed to put this together, although I get the error message "7-Zip: Cannot find archive".
%let UNZIP = C:\Users\maz\Outputfile;
%let CDRIVE = C:\Users\maz\Zip File\TodayFile.zip;
data _null_;
X "cd C:\Program Files\7-Zip";
X "7zG e &CDRIVE. -o&UNZIP.";
run;
Doing some research tells me the folder does not exist, but I know it does. Also, some sources use 7za but I only have 7zG. Any ideas on what to look at next or what is going on?
This is very likely due to the space in 'Zip File'. Try putting quotes around the path name. You can use a double double-quote in a string to represent a single double-quote(!), like this:
X "7zG e ""&CDRIVE"" -o&UNZIP";
X cd "C:\Program Files\7-Zip";
Not really a SAS question. You need to follow the rules of the OS for a path with blanks.
I am making a payslip project using IBM mainframes and I am asked to create a payslip report for an employee every month. This payslip is supposed to be stored in a VSAM file in a format as follows:
The data can be fetched from DB2 via a COBOL program, but the problem is that I don't know how to write the contents to the file in such a user-formatted style.
So can anyone please send me some COBOL code to help me out? When I searched on the internet, all I could find was how to store data in a key-sequenced record format.
Also, please tell me what type of VSAM file I should use.
Id division.
Program-id. VSAMSEQW.
Environment division.
Input-output section.
File-control.
Select F1
Assign to AS-DD1
Organization is sequential
File status is FS FS2.
Data division.
File section.
FD F1.
1 R1 pic X(133).
Working-storage section.
1 fs pic 99.
1 fs2.
2 rc pic s9(4) binary.
2 fc pic s9(4) binary.
2 fb pic s9(4) binary.
1 X pic X(10).
Procedure division.
Declaratives.
F1-error section.
Use after standard error procedure on F1.
1. Display "Error declarative entered"
Display " File status: " fs
Display " VSAM return code: " rc
Display " VSAM function code: " fc
Display " VSAM feedback code: " fb
Display "Testcase VSAMSEQW failed!"
Move 16 to return-code
Stop run
.
End declaratives.
MainPgm section.
Display "Entering VSAMSEQW"
Open output F1
* Do whatever you need to get data and update R1
Move "Line 1" to R1
Write R1
* 2nd output line
Move "Line 2" to R1
Write R1
Close F1
Display "Exiting VSAMSEQW"
Goback.
I am trying to save query results to a text file automatically, without looping through a reader object, in VB.NET using an ODBC Windows connection.
But I can't find how!
This is what I have tried so far:
mCmd = New OdbcCommand( _
"SELECT my_id FROM " & myTable & " WHERE myflag='1' \o 'c:/a_result.txt'", mCon)
n = mCmd.ExecuteNonQuery
But that doesn't work at all.
Please advise, or give a code example of how to do it.
And second:
Ideally, along with saving the results to a text file, I would get the number of saved rows in the variable 'n'.
As for now I get only 0 or 1, depending on whether the query was successful or not.
EDIT:
After some fighting I found a way to do this with more or less success.
To a txt file:
mCmd = New OdbcCommand( _
"COPY (SELECT my_id FROM " & myTable & " WHERE myFlag='1' " & _
"ORDER BY my_id) TO 'c:/a_result.txt' DELIMITER AS '|'", mCon)
To csv file:
mCmd = New OdbcCommand( _
"COPY (SELECT my_id FROM " & myTable & " WHERE myFlag='1' " & _
"ORDER BY my_id) TO 'c:/a_result.csv' WITH CSV", mCon)
That works, but I am not able to escape quotes and '\', so I get doubled characters in the output file.
If someone with experience knows how to achieve escaping and how to change the delimiter for CSV files, I would be glad to see it applied to the given example.
The variable 'n' after the query contains the number of exported rows.
The \o sequence is a psql meta-command; that means it is a feature of psql. If you want this functionality you will have to implement it in your client. It is very easy though.
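For illustration only, a minimal sketch of that client-side pattern, shown in Python with psycopg2 for brevity (connection string, table name, and output path are placeholders; in VB.NET the same pattern would use an OdbcDataReader and a StreamWriter):

import csv
import psycopg2

# Placeholder connection details.
conn = psycopg2.connect("dbname=mydb user=me")
cur = conn.cursor()

cur.execute("SELECT my_id FROM my_table WHERE myflag = '1' ORDER BY my_id")

# Writing the rows out ourselves is essentially what psql's \o does, and
# counting them along the way also gives the exported-row count asked for.
n = 0
with open("a_result.csv", "w", newline="") as out:
    writer = csv.writer(out, delimiter="|", quotechar='"')
    for row in cur:
        writer.writerow(row)
        n += 1

print(n, "rows exported")
cur.close()
conn.close()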