Ignore duplicates when importing from CSV - postgresql

I'm using a PostgreSQL database. After creating my table, I have to populate it from a CSV file. However, the CSV file is corrupted: it violates the primary key rule, so the database throws an error and I'm unable to populate the table. Any ideas how to tell the database to ignore the duplicates when importing from CSV? Writing a script to remove them from the CSV file is not acceptable. Any workarounds are welcome too. Thank you! :)

In PostgreSQL, duplicate rows are not permitted if they violate a unique constraint.
I think your best option is to import your CSV file into a temp table that has no constraints, delete the duplicate values from it, and finally import from this temp table into your final table.
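A minimal sketch of that staging-table approach, using a hypothetical `items` table whose primary key is `id`. Instead of deleting duplicates explicitly, `INSERT ... ON CONFLICT DO NOTHING` (PostgreSQL 9.5+) simply skips any row that would collide on the key, including duplicates within the staged data itself:

```sql
-- Hypothetical target table with a primary key:
CREATE TABLE items (
    id   integer PRIMARY KEY,
    name text
);

-- Staging table with the same columns but no key constraint:
CREATE TEMP TABLE items_staging (LIKE items);

-- Load the raw CSV, duplicates and all:
COPY items_staging FROM '/tmp/items.csv' WITH (FORMAT csv, HEADER);

-- Copy across, silently skipping rows that collide on the primary key:
INSERT INTO items
SELECT * FROM items_staging
ON CONFLICT (id) DO NOTHING;

DROP TABLE items_staging;
```

The table and file names here are assumptions for illustration; substitute your own.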

Related

prisma import fails to import relation tables with error: Failure inserting into relationtable

When trying to import data into a Postgres database with the Prisma CLI's prisma import --data export.<DATE>.zip, it fails with an error for all relations in the PostgreSQL database:
I have to run prisma deploy first to make sure the schema is updated, but when it imports the data it tries to import the relations twice, which violates some sort of duplicate constraint.
"Failure inserting into relationtable _ExampleForOtherExample with ids StringIdGCValue(ckamscvpi0eo00702ep0c7log) and StringIdGCValue(cka2n025p03ri0766hnoyxf8s). Cause: insert or update on table \"_ExampleForOtherExample\" violates foreign key constraint \"_ExampleForOtherExample_A_fkey\"\n Detail: Key (A)=(cjvba054700dz07236tafuscj) is not present in table \"Example\".",
"Failure inserting into relationtable _ThingToOtherThing with ids StringIdGCValue(ckbgyi96h0kl1079500z24pwu) and StringIdGCValue(ckbkvgde606s50855g62uhqsb). Cause: duplicate key value violates unique constraint \"_ThingToOtherThing_AB_unique\"\n Detail: Key (\"A\", \"B\")=(ck3kbmjgl036x0788furaqxkg, ck6uvgy7o09p40723rw34tna1) already exists.",
I'm not sure why it imports everything except for relations.
(prisma version 1.30.5)
You have foreign key dependencies between the tables, and your imported data doesn't satisfy the constraints.
Either import the data in an order that doesn't violate the constraints, or drop the constraints in the database and recreate them after the import.
You could dump all the constraints with
pg_dump --section=post-data databasename
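A hedged sketch of the drop-and-recreate route, using the constraint name from the error message above (verify the actual name and definition with \d in psql; Prisma may have created it as a unique index rather than a constraint, in which case DROP INDEX / CREATE UNIQUE INDEX apply instead):

```sql
-- Drop the constraint that blocks the import:
ALTER TABLE "_ThingToOtherThing"
    DROP CONSTRAINT "_ThingToOtherThing_AB_unique";

-- ... run the import here ...

-- Recreate it afterwards, using the DDL that
-- pg_dump --section=post-data printed:
ALTER TABLE "_ThingToOtherThing"
    ADD CONSTRAINT "_ThingToOtherThing_AB_unique" UNIQUE ("A", "B");
```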
tl;dr:
prisma1 export creates duplicate relation-table rows in its export data that are not present in the origin data. The error can be solved by removing those duplicates.
More Details
I found out that when I use the prisma1 export command, it creates duplicate relation-table rows in its exported JSON files. My source database did not have these duplicates, but they get introduced into the exported files.
I created a simple Node script that parsed all the JSON files and printed out the column A IDs of all duplicate relation rows.
With this data I carefully removed all the duplicates the prisma1 export command had created. Once I did this, the error went away and I was able to use prisma1 import without issue.
I had luck importing data after removing the .import/ folder
https://github.com/prisma/prisma1/issues/4216#issuecomment-513104926

copy csv postgres ignore rows that violate constraints

I have a .csv file with ~300,000 rows, some of which violate certain constraints I set in my postgres database. Is there a way to copy my .csv file into the database and have postgres filter out the rows that violate the constraints? I do not want these rows to show up in the database.
If this is not possible, is there any other way to solve this problem?
What I'm doing right now is
COPY blocksequences FROM '/tmp/blocksequences.csv' CSV HEADER;
And I get
ERROR: new row for relation "blocksequences" violates check constraint "blocksequences_partid3_check"
DETAIL: Failing row contains (M001-M049-S186, M001, null, M049, S186).
CONTEXT: COPY blocksequences, line 680: "M001-M049-S186,M001,,M049,S186"
The reason for the error: the column that contains M049 is not allowed to have that string. Many other rows have violations like this.
I read a little about "exception when check violation -- do nothing"; am I on the right track here? It seems like that might only be a MySQL thing.
Usually this is done this way:
1. create a temporary table with the same structure as the destination one, but without constraints;
2. copy the data into the temporary table with the COPY command;
3. copy the rows that do fulfill the constraints from the temp table to the destination one, using an INSERT command with conditions in the WHERE clause based on the table constraints;
4. drop the temporary table.
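A sketch of those steps against the blocksequences table from the question. The WHERE predicate is a hypothetical stand-in for whatever blocksequences_partid3_check actually tests; read the real definition with \d blocksequences and use that condition instead:

```sql
-- 1. Temp table with the same columns but no constraints:
CREATE TEMP TABLE blocksequences_raw (LIKE blocksequences);

-- 2. Bulk-load the raw CSV:
COPY blocksequences_raw FROM '/tmp/blocksequences.csv' WITH (FORMAT csv, HEADER);

-- 3. Keep only the rows that satisfy the check constraint
--    (stand-in predicate; substitute the real check condition):
INSERT INTO blocksequences
SELECT * FROM blocksequences_raw
WHERE partid3 IS DISTINCT FROM 'M049';

-- 4. Clean up:
DROP TABLE blocksequences_raw;
```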
When dealing with really large CSV files or very limited server resources, use the file_fdw extension instead of temporary tables. It's a much more efficient way, but it requires that the server have access to the CSV file (while copying to a temporary table can be done over the network).
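A hedged file_fdw sketch; the column names are a guess at the CSV layout shown in the error, and the filter predicate is again a stand-in for the real check condition:

```sql
-- file_fdw requires server-side access to the file:
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE blocksequences_csv (
    -- hypothetical columns matching the five CSV fields
    seq_id  text,
    partid1 text,
    partid2 text,
    partid3 text,
    partid4 text
) SERVER csv_files
  OPTIONS (filename '/tmp/blocksequences.csv', format 'csv', header 'true');

-- Filter while inserting; the CSV is never copied into a second table:
INSERT INTO blocksequences
SELECT * FROM blocksequences_csv
WHERE partid3 IS DISTINCT FROM 'M049';
```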
In Postgres 12 you can use the WHERE clause in COPY FROM.
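On PostgreSQL 12 or later the filter can go directly on the COPY itself; the predicate here is a hypothetical stand-in for the real check condition:

```sql
COPY blocksequences FROM '/tmp/blocksequences.csv'
WITH (FORMAT csv, HEADER)
WHERE partid3 IS DISTINCT FROM 'M049';
```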

How to avoid OIDs column from table in PostgreSQL?

I am using PostgreSQL 9.6. I have created a table with create query.
But when I checked the left panel of pgAdmin, under the table I found six more columns named tableoid, cmax, xmax, cmin, xmin and ctid.
When I searched about this, I found that these are system (OID) columns and do not affect the data in the other columns.
I have to import data into this table, so after selecting the table I used the Import/Export option from the right-click menu and imported a .csv file.
But when I tried to import the data into the table, I got an error like:
ERROR: column "tableoid" of relation "account" does not exist
Please suggest how to eliminate these OID columns from the table.
Your CSV must contain a column named "tableoid" that is not present in the table.
In this case, a table matching the import file must be created first; if there is no matching table, it won't work. This may help:
http://www.postgresqltutorial.com/import-csv-file-into-posgresql-table/
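One way around it, assuming the error really comes from a stray "tableoid" header in the CSV, is to name the table's real columns explicitly in the COPY so that system columns are never targeted. The column names here are hypothetical:

```sql
-- With HEADER, the header line is skipped, so the listed columns
-- decide the mapping rather than the CSV's own header names:
COPY account (id, name, balance)
FROM '/tmp/account.csv'
WITH (FORMAT csv, HEADER);
```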

Can I import CSV data into a table without knowing the columns of the CSV?

I have a CSV file file.csv.
In Postgres, I have made a table named grants:
CREATE TABLE grants
(
)
WITH (
OIDS=FALSE
);
ALTER TABLE grants
OWNER TO postgres;
I want to import file.csv data without having to specify columns in Postgres.
But if I run COPY grants FROM '/PATH/TO/grants.csv' CSV HEADER;, I get this error: ERROR: extra data after last expected column.
How do I import the CSV data without having to specify columns and types?
The error is expected.
You created a table with no columns. The COPY command tries to import data into a table with a matching structure.
So you have to create a table corresponding to your CSV file before executing the COPY command.
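For example, if file.csv had three columns, the table would need to declare them before COPY can load it. These column names and types are hypothetical stand-ins for whatever the file actually contains:

```sql
CREATE TABLE grants
(
    grant_id  integer,
    recipient text,
    amount    numeric
);

COPY grants FROM '/PATH/TO/grants.csv' WITH (FORMAT csv, HEADER);
```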
I discovered pgfutter :
"Import CSV and JSON into PostgreSQL the easy way. This small tool abstract all the hassles and swearing you normally have to deal with when you just want to dump some data into the database"
Perhaps a solution ...
The best method for me was to convert the CSV to a dataframe and then follow this guide:
https://github.com/sp-anna-jones/data_science/wiki/Importing-pandas-dataframe-to-postgres
No, it is not possible using the COPY command:
If a list of columns is specified, COPY will only copy the data in the
specified columns to or from the file. If there are any columns in the
table that are not in the column list, COPY FROM will insert the
default values for those columns.
COPY does not create columns for you.

"Invalid Input Syntax for Integer" in pgAdmin

I'm migrating data into Postgresql. I can generate my data into CSV or tab-delimited files, and I'm trying to import these files using pgAdmin.
An example CSV file looks exactly like this:
86,72,1,test
72,64,1,another test
The table I'm importing into looks like this:
CREATE TABLE common.category
(
id integer NOT NULL,
parent integer,
user_id integer,
name character varying(128),
CONSTRAINT category_pkey PRIMARY KEY (id),
CONSTRAINT category_parent_fkey FOREIGN KEY (parent)
REFERENCES common.category (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
)
However, upon importing this example, pgAdmin complains about an Invalid Input Syntax for Integer: "86" on the first line.
What am I missing here? I've tried performing the same import using a tab-delimited file, I've tried converting to both Windows and Unix EOLs.
Your sample has dependencies on the order in which the data is imported: there is a foreign key 'parent' referencing 'id'. With id 64 already in the table, and after changing the order of your sample lines, it imports just fine with:
COPY common.category
FROM 'd:\temp\importme.txt'
WITH CSV
I came across the same problem. After 2 hours of googling, this solved it: I just re-added the first line of the CSV file, and everything works now.
I had the same error after creating a new text file in Windows Explorer and changing the file extension to .csv.
I copied columns from an existing CSV file in Excel into the new one, also in Excel. After reading @Litty's comment about the file not being tab-delimited, I wondered if that was my problem.
Sure enough, opening the files in Excel hid the tab delimiting; when I opened the file in Notepad++ it was obvious. I had to Export -> Change File Type -> CSV (Comma delimited) before I could import the file with pgAdmin as a default CSV file.