Getting Python to accept a csv into a PostgreSQL table with ":" in the headers - postgresql

I receive a .csv export every 10 minutes that I'd like to import into a PostgreSQL server. Working with a test csv, I got everything to work, but I didn't notice that my actual csv file has a forced ":" at the end of each column header (except the first one, for some reason). This is built into the back end of the exporter, so I can't get it removed; I already asked the company. So I added the ":"s to my test csv to match.
My INSERT INTO statements no longer work and give me syntax errors. First, I'm trying to add the rows using the following code:
print("Reading file contents and copying into table...")
with open('C:\\Users\\admin\\Desktop\\test2.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    columns = next(readCSV)  # skips the header row
    query = 'insert into test({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    for data in readCSV:
        cursor.execute(query, data)
    con.commit()
This results in a '42601' (syntax error) near ":" in the second column header.
The result is the same when I list the column headers and ? placeholders out explicitly in the INSERT INTO statement.
What is the syntax to get the script to accept ":" in column headers? If there's no way, is there a way to scan through the headers and remove the ":" at the end of each?

Because : is a special character, if your column is named year: in the DB, you must double-quote its name, e.g. select "year:" from test;
You are getting a PG error because you are referencing the unquoted column names (insert into test({0})), so add double quotes there.
query = 'insert into test("year:","day:", "etc:") values (...)'
That being said, it might be simpler to remove every occurrence of : in your csv's first line.
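A minimal sketch of the quoting approach, assuming the header row is read with csv.reader as in the question. The helper names quote_ident and build_insert are hypothetical, the in-memory sample stands in for the real file, and the ? placeholder follows the question's driver (psycopg2 would use %s instead). Keep in mind that quoted identifiers are case-sensitive in PostgreSQL, so the quoted names have to match the table's column names exactly.

```python
import csv
import io


def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_insert(table, columns, placeholder="?"):
    """Build an INSERT statement with every column name double-quoted."""
    cols = ", ".join(quote_ident(c) for c in columns)
    params = ", ".join([placeholder] * len(columns))
    return f"insert into {quote_ident(table)} ({cols}) values ({params})"


# Hypothetical sample data standing in for the real export.
sample = io.StringIO("id,year:,day:\n1,2020,Mon\n")
reader = csv.reader(sample)
columns = next(reader)  # header row, colons included
query = build_insert("test", columns)
print(query)
```

The remaining rows from reader could then be passed to cursor.execute(query, row) as in the question.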

Much appreciated JGH and Adrian. I went with your suggestion to remove every occurrence of : by adding the following line after the first columns = ... statement
columns = [column.strip(':') for column in columns]
It worked well.
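One caveat, offered as a sketch rather than a correction: str.strip(':') removes colons from both ends of each header, whereas rstrip(':') only touches the trailing ones, which is what the exporter adds. The helper name clean_header and the sample data are hypothetical.

```python
import csv
import io


def clean_header(columns):
    """Trim surrounding whitespace, then drop trailing colons only."""
    return [c.strip().rstrip(":") for c in columns]


# Hypothetical sample data standing in for the real export.
sample = io.StringIO("id,year:,day:\n1,2020,Mon\n")
reader = csv.reader(sample)
columns = clean_header(next(reader))
rows = list(reader)
print(columns)
```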

Related

Adding field delimiter ";" in last column on header file

I'm new to DataStage and am trying to create a sequential file with ";" as the delimiter.
I would like to add the delimiter just after the last column in the header.
Please see the example below.
Currently I have this in my sequential file:
SERVICE_ID;OFFER_ID;MINIMUM;MAXIMUM
19441;162887;;;
19442;162889;;;
Expected result with delimiter after last column in header :
SERVICE_ID;OFFER_ID;MINIMUM;MAXIMUM;
19441;162887;;;
19442;162889;;;
How can I do that, please?
Use the Final Delimiter property in the Sequential File stage format properties.

psycopg2 error while writing data from csv file: extra data after last expected column

I am trying to insert data from a csv file (file.csv) into two columns of the table in Postgres. The data looks like this:
#Feature AC;Feature short label
EBI-517771;p.Leu107Phe
EBI-491052;p.Gly23Val
EBI-490120;p.Pro183His
EBI-517851;p.Gly12Val
EBI-492252;p.Lys49Met
EBI-527190;p.Cys360Ser
EBI-537514;p.Cys107Ser
The code I am running is as follows:
# create table in ebi_mut_db schema
cursor.execute("""
    CREATE TABLE IF NOT EXISTS ebi_mut_db.mutations_affecting_interactions(
        feature_ac TEXT,
        feature_short_label TEXT)
""")
with open('file.csv', 'r') as f:
    # Notice that we don't need the `csv` module.
    next(f)  # Skip the header row.
    cursor.copy_from(f, 'ebi_mut_db.mutations_affecting_interactions', sep=';')
conn.commit()
The table is created, but while writing the data it shows the error below.
Traceback (most recent call last):
  File "<stdin>", line 38, in <module>
    cursor.copy_from(f, 'ebi_mut_db.mutations_affecting_interactions', sep=';')
psycopg2.errors.BadCopyFileFormat: extra data after last expected column
CONTEXT: COPY mutations_affecting_interactions, line 23: "EBI-878110;"p.[Ala223Pro;Ala226Pro;Ala234Asp]""
There are no extra columns except the two. My understanding is the code is detecting more than 2 columns.
Thanks
You have not told COPY that you are using CSV format, so it is using the default TEXT format. In that format, quoting does not protect special characters, and since the line contains more than one ; it is read as having more than two columns.
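As a rough illustration in plain Python, using the failing line from the error message: a naive split on ; approximates how the TEXT format sees the line (the real TEXT format also handles backslash escapes, which this ignores), while the csv module approximates the CSV format.

```python
import csv

# The line reported in the COPY error context.
line = 'EBI-878110;"p.[Ala223Pro;Ala226Pro;Ala234Asp]"'

# Quotes are ordinary characters here, so every ; separates fields.
text_fields = line.split(";")

# Quotes protect the embedded ; characters here.
csv_fields = next(csv.reader([line], delimiter=";"))

print(len(text_fields), text_fields)
print(len(csv_fields), csv_fields)
```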
If you want the COPY to know that ; inside quotes do not count as separators, then you have to tell it to use CSV format. In psycopg2, I think you have to use copy_expert, not copy_from, in order to accomplish this.
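A minimal sketch of what that could look like with psycopg2's cursor.copy_expert. The helper name copy_csv is hypothetical, and it assumes the table and column names are trusted strings, since they are interpolated directly into the SQL. With HEADER true, COPY skips the first line itself, so the next(f) call is no longer needed.

```python
def copy_csv(cursor, fileobj, table, columns, delimiter=";"):
    """Load fileobj into table via COPY ... FROM STDIN in CSV format.

    cursor is expected to be a psycopg2 cursor; table and columns are
    assumed to be trusted names (they are not escaped here).
    """
    cols = ", ".join(columns)
    sql = (
        f"COPY {table} ({cols}) FROM STDIN "
        f"WITH (FORMAT csv, HEADER true, DELIMITER '{delimiter}')"
    )
    cursor.copy_expert(sql, fileobj)
    return sql
```

With the question's names, the call would be along the lines of copy_csv(cursor, f, 'ebi_mut_db.mutations_affecting_interactions', ['feature_ac', 'feature_short_label']) inside the with open(...) block, followed by conn.commit().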

psycopg2.DataError: extra data after last expected column CONTEXT: COPY csvfails, line 1:

It's a Django app in which I'm loading a CSV. The table gets created OK, but copying the CSV to PostgreSQL fails with this error:
psycopg2.DataError: extra data after last expected column
CONTEXT: COPY csvfails, line 1:
Questions already referred to:
"extra data after last expected column" while trying to import a csv file into postgresql
I have tested multiple times with CSVs of different column counts, and I am now sure the column count is not the issue; it is the content of the CSV file. When I change the content and upload the same CSV, the table gets created and I don't get this error. The content of the CSV file that fails is shown below. Kindly advise what in this content prompts psycopg2/psql/Postgres to give this error.
No, I can't paste even a single row of the CSV file as suggested in the comment; the imgur image add-in won't allow it, and I'm not sure what to do now.
The screenshots below from the psql CLI show that the table had been created with the correct column count, yet I still got the error.
EDIT_1 - Further, while saving on my Ubuntu machine using LibreOffice, I unchecked Separator Options >> Separated By >> TAB and SEMICOLON. The CSV was then saved with only Separator Options >> COMMA.
The Python code which might be the culprit is:
with open(path_csv_for_psql, 'r') as f:
    next(f)  # Skip the header row.
    csv_up_cursor.copy_from(f, str(new_table_name), sep=',')
conn.commit()
I thought I read somewhere that the separator parameter passed to copy_from (sep=',' here) could be the issue?
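One way to narrow this down, sketched under the assumption that copy_from treats every sep character as a field boundary (it uses COPY's text format, where quotes do not protect commas): scan the file for lines that split into more fields than the table has columns. The helper name find_overlong_lines and the sample data are hypothetical.

```python
import io


def find_overlong_lines(fileobj, expected_columns, sep=","):
    """Return (line_number, field_count) for lines with too many fields."""
    problems = []
    for number, line in enumerate(fileobj, start=1):
        count = len(line.rstrip("\r\n").split(sep))
        if count > expected_columns:
            problems.append((number, count))
    return problems


# Hypothetical sample: the second line has a comma inside a quoted value.
sample = io.StringIO('id,name\n1,"Smith, John"\n2,Jane\n')
problems = find_overlong_lines(sample, expected_columns=2)
print(problems)
```

Any line reported this way would trigger "extra data after last expected column" under text-format COPY, even though it is valid CSV.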

Use of column names in Redshift COPY command which is a reserved keyword

I have a table in Redshift where the column names are 'begin' and 'end'. They are Redshift keywords. I want to explicitly use them in the Redshift COPY command. Is there a workaround other than renaming the columns in the table? That would be my last option.
I tried to enclose them within single/double quotes, but looks like the COPY command only accepts comma separated column names.
The COPY command fails if you don't escape keywords used as column names, e.g. begin or end.
copy test1(col1,begin,end,col2) from 's3://example/file/data1.csv' credentials 'aws_access_key_id=XXXXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXXXX' delimiter ',';
ERROR: syntax error at or near "end"
But it works fine if begin and end are enclosed in double quotes ("), as below.
copy test1(col1,"begin","end",col2) from 's3://example/file/data1.csv' credentials 'aws_access_key_id=XXXXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXXXX' delimiter ',';
I hope it helps.
If there is some different error please update your question.

USQL Escape Quotes

I am new to Azure Data Lake Analytics. I am trying to load a csv in which strings are double-quoted, and some random rows have quotes inside a column.
For example
ID, BookName
1, "Life of Pi"
2, "Story about "Mr X""
When I try loading it, it fails on the second record and throws an error message.
1. Is there a way to fix this in the csv file? Unfortunately, we cannot extract it again from the source, as these are log files.
2. Is it possible to let ADLA ignore the bad rows and proceed with the rest of the records?
Execution failed with error '1_SV1_Extract Error :
'{"diagnosticCode":195887146,"severity":"Error","component":"RUNTIME","source":"User","errorId":"E_RUNTIME_USER_EXTRACT_ROW_ERROR","message":"Error
occurred while extracting row after processing 9045 record(s) in the
vertex' input split. Column index: 9, column name:
'instancename'.","description":"","resolution":"","helpLink":"","details":"","internalDiagnostics":"","innerError":{"diagnosticCode":195887144,"severity":"Error","component":"RUNTIME","source":"User","errorId":"E_RUNTIME_USER_EXTRACT_EXTRACT_INVALID_CHARACTER_AFTER_QUOTED_FIELD","message":"Invalid
character following the ending quote character in a quoted
field.","description":"Invalid character is detected following the
ending quote character in a quoted field. A column delimiter, row
delimiter or EOF is expected.\nThis error can occur if double-quotes
within the field are not correctly escaped as two
double-quotes.","resolution":"Column should be fully surrounded with
double-quotes and double-quotes within the field escaped as two
double-quotes."
As per the error message, if you are importing a quoted csv which has quotes within some of the columns, then these need to be escaped as two double-quotes. In your particular example, your second row needs to be:
..."Life after death and ""good death"" models - a qualitative study",...
So one option is to fix up the original file on output. If you are not able to do this, then you can import all the columns as one column, use RegEx to fix up the quotes, and output the file again, e.g.
// Import records as one row then use RegEx to clean columns
@input =
    EXTRACT oneCol string
    FROM "/input/input132.csv"
    USING Extractors.Text( '|', quoting: false );
// Fix up the quotes using RegEx
@output =
    SELECT Regex.Replace(oneCol, "([^,])\"([^,])", "$1\"\"$2") AS cleanCol
    FROM @input;
OUTPUT @output
TO "/output/output.csv"
USING Outputters.Csv(quoting : false);
The file will now import successfully.
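For anyone who wants to try the same regular expression outside U-SQL, here is a rough Python sketch. It assumes there is no space after the comma delimiter (with the spaced sample from the question, the opening quote would be doubled too) and that the file contains no quotes that are already escaped, since those would be doubled again. The helper name escape_inner_quotes is hypothetical.

```python
import csv
import re

# Same pattern as the Regex.Replace call: a quote with a non-comma on both sides.
INNER_QUOTE = re.compile(r'([^,])"([^,])')


def escape_inner_quotes(line):
    """Double every quote that is not adjacent to a comma or a line edge."""
    return INNER_QUOTE.sub(r'\1""\2', line)


raw = '2,"Story about "Mr X""'
fixed = escape_inner_quotes(raw)
parsed = next(csv.reader([fixed]))
print(fixed)
print(parsed)
```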