Question/Resolved - "extra data after last expected column" Error when trying to import a csv file into postgresql

Question/Resolved - "extra data after last expected column" Error when trying to import a csv file into postgresql - postgresql

Just posting this question and the solution since it took forever for me to figure this out.
Using CSV file, I was trying to import data into PostgreSQL with pgAdmin. I kept running into the same issue of "extra data after last expected column."

Solution that worked for me (instead of using Import module): copy tablename (columns) FROM 'file location .csv' CSV HEADER
Since some of the data included multiple commas within the cell, it was counting as a new column each time.

Related

PySpark - Read CSV and ignore file header (not using pandas)

I have a problem that I hope you can help me with.
The text file that looks like this:
Report Name :
column1,column2,column3
this is row 1,this is row 2, this is row 3
I am leveraging Synapse Notebooks to try to read this file into a dataframe. If I try to read the csv file using spark.read.csv() it thinks that the column name is "Report Name : ", which is obviously incorrect.
I know that the Pandas csv reader has a 'skipRows[1]' function but unfortunately I cannot read the file directly with Pandas, as I am getting some strange networking errors. I can however convert a PySpark dataframe to a Pandas dataframe via: df.toPandas()
I'd like to be able to solve this with straight PySpark dataframes.
Surely someone else has encountered this issue! Help!
I have tried every variation of reading files, and drop, etc. but the schema has already been defined when the first dataframe was created, with 1 column (Report Name : ).
Not sure what to do now..

Copied answer from similar question: How to skip lines while reading a CSV file as a dataFrame using PySpark?
import csv
from pyspark.sql.types import StringType
df = sc.textFile("test.csv")\
.mapPartitions(lambda line: csv.reader(line,delimiter=',', quotechar='"')).filter(lambda line: len(line)>=2 and line[0]!= 'column1')\
.toDF(['column1','column2','column3'])

Microsoft got back to me with an answer that worked! When using pandas csv reader, and you use the path to the source file you want to read. It requires an endpoint to blob storage (not adls gen2). I only had an endpoint that read dfs in the URI and not blob. After I added the endpoint to blob storage, the pandas reader worked great! Thanks for looking at my thread.

Importing issue in postgresql using pgadmin 4

The file is not importing after having created a table. The first line of code is for the table (COPY), the second line of code is for the path of the file (FROM) and the WITH I am not entirely sure if there's a prior line of code that needs to be entered for its success as its not being highlighted in pink. The importing should be going through in either the built-in tool of pgAdmin or the syntax but neither of them generates the needed output. Here are some screenshots:
So I did another table, this time focusing on a single column and ensuring that the name of the column matched on both the table and the file and it worked. The prior example had several columns that had difference in spellings of the column content in table and the file:

You can try this sequentially...
1. First create csv file. .csv file column sequence is most important.
2. Consider the below employee_info.csv file
And consider your database table employee_info table which contain (emp_id [numeric],emp_name[character],emp_sal[numeric],emp_loc [character])
Then Execute the below query
a. copy employee_info(emp_id,emp_name,emp_sal,emp_loc) from 'C:\Users\Zbook\Desktop\employee_info.csv' DELIMITERS ',' CSV;
Note: Ensure that each .csv file row value has not null. Like below...

How to import CSV into PostgreSQL

Respected,
I have problems with importing CSV into PostgreSQL via pgAdmin. No matter what I do, it shows the following error:
ERROR: extra data after last expected column.
Can anyone please help me and point me out a possible solution?
Thank you.
Milorad K.

check that your data is formatted as postgresql expects it to be
That error could be caused by specifying the wrong quote character or the wrong field separator. or it could be that your input file is corrupt.
I've had corrupt CSV files from banks before, so don't trust anyone.

import Multiple records from csv file into sqlite3 database

Iam working on a app which needs huge to be stord in database and display it later, but problem is that how to insert that much data in database , i use .csv file to import data from file and store it in table ,Every thing is working fine but only one Row has been inserted in dataBase , any one have any idea how to insert multiple records in databse,
For Information Iam using Terminal for sqlite3,

You can use .import file table to load large file into the table from terminal, your file and table needs to be organized properly so that they have same columns.
Also you can do search and replace in text editor and then import file with .read filename command.
Type .help in terminal when in sqlite console for list of the commands available.

OpenRowSet command in TSQL is returning NULLS

Been investigating for a while now and keep hitting a brick wall. I am importing from xls files into temp tables via the OpenRowset command. Now I have a problem where I’m trying to import a certain column has a range values but the most common are the following. Columns structured as long numbers i.e. 15598 and the some columns as strings i.e. 15598-E.
Now the openrowset is reading the string version no problem but is reporting the number version as a NULL. I read (http://www.sqldts.com/254.aspx ) that openrowset has that issue and the author speaks of implementing “HDR=YES;IMEX=1” into the query string but that’s not working for me at all.
Have any of you guys every encountered this?
Just some more info as well. I may not do this with the JET engine (Microsoft.Jet.OLEDB.4.0) so this is what my query looks like:
SELECT *
FROM
OPENROWSET('MSDASQL'
, 'Driver=Microsoft Excel Driver (*.xls);HDR=YES;IMEX=1;DBQ=C:\ImportFile.xls;'
, 'SELECT * FROM [Sheet1$]')

I notice you are using the Excel ODBC driver. Have you tried the JET OLEDB Provider with the equivalent connection string?
select * from openrowset(
'Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\ImportFile.xls;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1"',
'SELECT * FROM [Sheet1$]')
EDIT: Sorry, just noticed your last paragraph. Surely the Excel ODBC driver still goes via the JET engine, so what difference would it make?
EDIT: I have looked at the KB194124 link, and the registry values it recommends are the default values on my machine, which I have never changed. I have used the above method several times myself without problems. Maybe it's an environmental issue?

If you don't mind opening the file in Excel, take the columns that have the problem, select the column, and do
Data -> Text to Columns -> Next -> Next -> Text
Save the spreadsheet and they should all come in as Text in OPENROWSET
I've found using .CSV files instead of Excel, opened by setting up a Linked Server, and setting up the format of the files in schema.ini a more practical approach for handling imports like this, with that method you can explicitly choose each column's format.

We've come across the same issue. Unfortunately we've not found a solution either. There's more information here which indicates that there might be a registry fix.

I had the same problem. I fixed it cuting and pasting a row that contains a column with the string/numeric value (for example 123ABC) in the first row position of the sheet. For some reason T-SQL reads the first row and assumes that all the values are numeric.

Response by SqlACID in this link worked great [https://wikigurus.com/Article/Show/185717/OpenRowSet-command-in-TSQL-is-returning-NULLS] :-
If you don't mind opening the file in Excel, take the columns that have the problem, select the column, and do
Data -> Text to Columns -> Next -> Next -> Text
Save the spreadsheet and they should all come in as Text in OPENROWSET
I've found using .CSV files instead of Excel, opened by setting up a Linked Server, and setting up the format of the files in schema.ini a more practical approach for handling imports like this, with that method you can explicitly choose each column's format.