Postgres import CSV duplicate keys

I have a table called measurement with three columns: value, moment, and seriesid. Sample rows look like this:
51.02|2006-12-31 23:00:00|1
24.88|2006-12-31 23:00:00|2
55|2006-12-31 23:00:00|3
3.34823004011|2006-12-31 23:00:00|5
I am trying to load a csv in this table and I am getting the following error:
Key (moment, seriesid)=(2009-05-25 00:00:00,186) already exists.
After reading some posts here on Stack Overflow, the best I managed to do was this:
CREATE TEMP TABLE measurement_tmp AS SELECT * FROM measurement LIMIT 0;
COPY measurement_tmp FROM '/home/airquality/dat/import.csv'
WITH DELIMITER ',';
INSERT INTO measurement
SELECT DISTINCT ON (moment,seriesid)
value,moment,seriesid
FROM measurement_tmp
As far as I understand:
1) A table measurement_tmp is created.
2) All contents of the measurement table are loaded into measurement_tmp.
3) All contents of the import.csv file are loaded into measurement_tmp without the Key (moment, seriesid) restriction.
4) Selecting DISTINCT ON (moment, seriesid) should return only 'sane' data and import it into measurement.
Still, I am getting the same error:
2014-11-20 10:06:24 GMT-2 ERROR: duplicate key value violates unique constraint
"measurement_pkey"
2014-11-20 10:06:24 GMT-2 DETAIL: Key (moment, seriesid)=(2009-05-25 00:00:00,
186) already exists.
Any ideas?
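A likely cause: the temp table is created with LIMIT 0, so it starts out empty, and DISTINCT ON therefore only removes duplicates within the CSV itself; rows whose (moment, seriesid) key already exists in measurement still collide. A minimal sketch of one workaround, filtering against the target table (assuming the table and column names above):

INSERT INTO measurement
SELECT DISTINCT ON (moment, seriesid) value, moment, seriesid
FROM measurement_tmp t
WHERE NOT EXISTS (
    SELECT 1 FROM measurement m
    WHERE m.moment = t.moment
      AND m.seriesid = t.seriesid
);

On PostgreSQL 9.5 or later, adding ON CONFLICT (moment, seriesid) DO NOTHING to the INSERT achieves the same effect more directly.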


UPSERT from table with different table sizes

I'm getting the error:
ERROR: column "some_col_name" does not exist Hint: There is a column named "some_col_name" in table "usert_test", but it cannot be referenced from this part of the query.
This happens on UPSERT. The cause of the error is that the source table (read in from an API) doesn't always have the same number of fields as the table I'm looking to UPSERT into. Within the UPSERT process, is there a way to handle this? So far I've tried the below:
INSERT INTO scratch."usert_test" (many_cols)
SELECT *
FROM scratch.daily_scraper
ON CONFLICT (same_unique_id)
DO UPDATE
SET
many_fields = excluded.many_fields;
Name each column specifically in every instance.
insert into scratch."usert_test" (column_name1, column_name2, column_name3, column_name4)
select cola, colb, colc, colf
from scratch.daily_scraper
on conflict (column_name1, column_name4)
do update
set
column_name3 = excluded.column_name3
, column_name2 = excluded.column_name2;
However many columns you have, properly name every one, as (IMHO) you always should.

Upserting into one Postgres table from another table?

I have two tables with identical structures. All columns are integers and are named "A", "B" and "key".
I can insert from one table into another with some SQL like this:
INSERT INTO test_table_kewmfsznj
SELECT * FROM tmp_test_table_kewmfsznj_cnxtbkbq ta
But this doesn't work:
INSERT INTO test_table_kewmfsznj
SELECT * FROM tmp_test_table_kewmfsznj_cnxtbkbq ta
ON CONFLICT ("key")
DO NOTHING
My expectation was that this code would skip any row from "ta" where the key already exists in the table I'm inserting into. Instead I get this error:
ERROR: missing FROM-clause entry for table "ta"
Position: 152
Here's what I really want to do: When the key already exists in the table I'm inserting into, I want to update certain columns:
INSERT INTO test_table_kewmfsznj
SELECT * FROM tmp_test_table_kewmfsznj_cnxtbkbq ta
ON CONFLICT ("key")
DO UPDATE SET "A" = ta."A", "B" = ta."B"
Unfortunately this gives me (almost) the same error:
ERROR: missing FROM-clause entry for table "ta"
Position: 134
Can somebody explain what I am doing wrong here?
EDIT0: I tried it without upper-case column names. The columns are now called "a", "b" and "key". The data remains unchanged - it's all integers.
INSERT INTO test_table_mrcvnaoia
SELECT * from tmp_test_table_mrcvnaoia_uuxkaidv ta
ON CONFLICT (key)
DO UPDATE SET a = ta.a, b = ta.b
... and now I get this error:
SQL Error [42P01]: ERROR: missing FROM-clause entry for table "ta"
Position: 129
To me, this suggests that there's something wrong with my ON CONFLICT statement, and probably not the first half of the query, but beyond that I'm out of clues. Can anybody help?
You almost had it, but you can't reference the source table in the DO UPDATE clause; you reference EXCLUDED instead:
INSERT INTO test_table_mrcvnaoia
SELECT * from tmp_test_table_mrcvnaoia_uuxkaidv
ON CONFLICT (key)
DO UPDATE SET a = EXCLUDED.a, b = EXCLUDED.b;
Furthermore, to avoid errors in the future, make sure you explicitly specify the column names in the insert and select portions of your statement. They are the same for now, but that might not always be the case.
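For example, a sketch of the same statement with the columns spelled out (assuming the tables have exactly the columns "key", "a", and "b" as described above):

INSERT INTO test_table_mrcvnaoia (key, a, b)
SELECT key, a, b
FROM tmp_test_table_mrcvnaoia_uuxkaidv
ON CONFLICT (key)
DO UPDATE SET a = EXCLUDED.a, b = EXCLUDED.b;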

SQLDeveloper not able to handle empty dates?

I have a DB table export (.csv file generated using SQLDeveloper) that I need to import into another DB table.
The issue is that there are date columns that are nullable, and these values are obviously exported as empty strings. When I try to import that file, SQLDeveloper internally seems to generate an insert statement for each line, since I get the below error message:
INSERT INTO <tablename> (<fieldnames here>) VALUES (... ,to_date('', 'DD.MM.RRRR HH24:MI:SS'), ...);
Error report -
ORA-01830: date format picture ends before converting entire input string
In that insert, SQLDeveloper apparently tries to convert the empty string into a date using to_date(...), which then obviously yields an error.
Is there some workaround that allows importing such dates as NULL into the DB? After all, it should somehow be feasible to re-import .csv files that were generated by SQLDeveloper, shouldn't it?
It's working for me.
Since you didn't provide a table definition or sample data, I made up my own scenario. Compare what I did to what you're doing.
create table csv_null_dates (id integer, dates date);
insert into csv_null_dates values (1, sysdate);
insert into csv_null_dates values (2, sysdate-1);
insert into csv_null_dates values (3, sysdate+1);
insert into csv_null_dates values (4, null);
insert into csv_null_dates values (5, sysdate);
set sqlformat csv
cd c:\users\jdsmith\desktop
spool null_dates.csv
select * from csv_null_dates;
spool off
The output:
Table CSV_NULL_DATES created.
1 row inserted.
1 row inserted.
1 row inserted.
1 row inserted.
1 row inserted.
"ID","DATES"
1,26-SEP-19
2,25-SEP-19
3,27-SEP-19
4,
5,26-SEP-19
I then opened the table import wizard and pointed it to my CSV file.
I finished the wizard, the import ran to completion, and here's my log:
** Import Start ** at 2019.09.26-08.12.59
Import C:\Users\jdsmith\Desktop\null_dates.csv to HR.HR.CSV_NULL_DATES
Load Method: Insert
** Import End ** at 2019.09.26-08.13.00
And when I go to browse my table, I see that I have 2x the records I had before, including the row with the NULL date.

postgresql on conflict-cannot affect row a second time

I have a table with auto-numbering (a sequence) on data_id:
tabledata
---------
data_id [PK]
data_code [Unique]
data_desc
Example code:
insert into tabledata(data_code,data_desc) values('Z01','red')
on conflict (data_code) do update set data_desc=excluded.data_desc
This works fine, and then I insert again:
insert into tabledata(data_code,data_desc) values('Z01','blue')
on conflict (data_code) do update set data_desc=excluded.data_desc
I got this error:
[Err] ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
This is my real code:
insert into psa_aso_branch (branch_code, branch_desc, regional_code, status, created_date, lastmodified_date)
(select branch_code, branch, kode_regional,
        case when status_data = 'Y' then true else false end,
        current_date, current_date
 from branch_history)
on conflict (branch_code) do update
set branch_desc = excluded.branch_desc,
    regional_code = excluded.regional_code,
    status = (case when excluded.status = 'Y' then true else false end),
    created_date = current_date,
    lastmodified_date = current_date;
It works fine at first, but not the next time (like the example I gave you before).
The UPDATE applies to the existing record/row, not to the row you are inserting.
The UPDATE in the ON CONFLICT clause refers to the row in the excluded pseudo-table, which holds the proposed row temporarily.
In the first case the record is inserted, since there is no clash on data_code, and the UPDATE is not executed at all.
In the second INSERT you are inserting 'Z01', which was already inserted as data_code, and data_code is unique.
The excluded table still holds the duplicate value of data_code after the update, so the record is not inserted; in the UPDATE SET, data_code would have to be changed for the record to be inserted properly.
I had been stuck on this issue for about 24 hours.
It is weird: when I test the query on the CLI it works fine, and it works fine when I insert a single data row. The error only appears when I'm using INSERT ... SELECT.
It is not really an insert-select problem: it is because the selected rows are not unique, which triggers the CONFLICT more than once.
Thanks to #zivaricha's comment - I experimented from his notes. It's just hard to understand at first.
Solution:
Use DISTINCT to make sure the SELECT returns unique results.
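A sketch of that fix applied to the real query above (note: DISTINCT ON (branch_code) keeps one arbitrary row per branch_code unless you add ORDER BY to pick which one, and I set status = excluded.status directly since the boolean conversion already happened in the SELECT):

insert into psa_aso_branch (branch_code, branch_desc, regional_code, status, created_date, lastmodified_date)
select distinct on (branch_code)
       branch_code, branch, kode_regional,
       case when status_data = 'Y' then true else false end,
       current_date, current_date
from branch_history
on conflict (branch_code) do update
set branch_desc = excluded.branch_desc,
    regional_code = excluded.regional_code,
    status = excluded.status,
    created_date = current_date,
    lastmodified_date = current_date;

With at most one row per branch_code in the SELECT, no target row can be affected twice by the same command.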
This error comes when the duplication occurs multiple times within a single insertion.
For example, you have columns a, b, c, the combination of a and b is unique, and on duplicate you are updating c.
Now suppose you already have a = 1, b = 2, c = 3, and you are inserting a = 1, b = 2, c = 4 and then a = 1, b = 2, c = 4 again.
That means the conflict occurs twice, so the command can't update the same row twice.
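A minimal sketch reproducing that scenario (hypothetical table name demo):

create table demo (a int, b int, c int, unique (a, b));
insert into demo values (1, 2, 3);
-- both proposed rows conflict with (1, 2), so the same target row
-- would have to be updated twice in one command:
insert into demo values (1, 2, 4), (1, 2, 4)
on conflict (a, b) do update set c = excluded.c;
-- ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time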
I think what is happening here is that when you do an UPDATE on conflict, the update itself conflicts again, and it then throws that error.
We can find the error message in the source code, which helps us understand why we got ON CONFLICT DO UPDATE command cannot affect row a second time.
In the source code of PostgreSQL at src/backend/executor/nodeModifyTable.c and the function of ExecOnConflictUpdate(), we can find this comment:
This can occur when a just inserted tuple is updated again in the same command. E.g. because multiple rows with the same conflicting key values are inserted.
This is somewhat similar to the ExecUpdate() TM_SelfModified case. We do not want to proceed because it would lead to the same row being updated a second time in some unspecified order, and in contrast to plain UPDATEs there's no historical behavior to break.
As the comment said, we can not update the row which we are inserting in INSERT ... ON CONFLICT, just like:
postgres=# CREATE TABLE t (id int primary key, name varchar);
postgres=# INSERT INTO t VALUES (1, 'smart'), (1, 'keyerror')
postgres=# ON CONFLICT (id) DO UPDATE SET name = 'Buuuuuz';
ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
Remember, the executor of PostgreSQL is a volcano model, so it processes the rows we insert one by one. When it processes (1, 'smart'), the table is empty, so the insert proceeds normally. When it gets to (1, 'keyerror'), there is a conflict with the (1, 'smart') we just inserted, so the update logic is executed, which would update data inserted by the same command, and PostgreSQL doesn't allow us to do that.
Similarly, we cannot update the same row of data twice:
postgres=# DROP TABLE IF EXISTS t;
postgres=# CREATE TABLE t (id int primary key, name varchar);
postgres=# INSERT INTO t VALUES (1, 'keyerror'), (1, 'buuuuz')
postgres=# ON CONFLICT (id) DO UPDATE SET name = 'Buuuuuuuuuz';
ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

How to properly use dblink_build_sql_insert (postgreSQL)

I cannot find an example of how it's supposed to work for a table with only one PK field.
My attempt looks something like this:
CREATE EXTENSION IF NOT EXISTS dblink;
select dblink_build_sql_insert('table_name'::text, '1'::int2vector, 1::int2, '{"12345"}'::text[], '{"column1", "column2", "column3", "column4"}'::text[]);
It keeps throwing the error "target key array length must match number of key attributes". As I see it, I told it that the number of key attributes is 1, and the target key array has 1 item. What am I doing wrong?
If I read the examples right, I think you need to do something like:
select dblink_build_sql_insert(
    'table_name'::text,       -- table name
    '1'::int2vector,          -- attribute numbers of the key column(s)
    1::int2,                  -- number of pkey attributes
    '{"12345"}'::text[],      -- old pkey value(s)
    '{"column1"}'::text[]     -- new pkey value(s)
);
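In other words, the last two arrays hold primary key values, not column names: the fourth argument identifies the existing row to copy, and the fifth supplies the key value substituted into the generated INSERT. Passing four column names there is why the length check failed, since the key here has only one attribute.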