Upsert error (On Conflict Do Update) pointing to duplicate constrained values - postgresql

I have a problem with ON CONFLICT DO UPDATE in Postgres 9.5 when I use more than one table in the FROM clause.
Example of working code:
INSERT INTO new.bookmonographs (citavi_id, abstract, createdon, edition, title, year)
SELECT "ID", "Abstract", "CreatedOn"::timestamp, "Edition", "Title", "Year"
FROM old."Reference"
WHERE old."Reference"."ReferenceType" = 'Book'
AND old."Reference"."Year" IS NOT NULL
AND old."Reference"."Title" IS NOT NULL
ON CONFLICT (citavi_id) DO UPDATE
SET (abstract, createdon, edition, title, year) = (excluded.abstract, excluded.createdon, excluded.edition, excluded.title, excluded.year)
;
Faulty code:
INSERT INTO new.bookmonographs (citavi_id, abstract, createdon, edition, title, year)
SELECT "ID", "Abstract", "CreatedOn"::timestamp, "Edition", "Title", "Year"
FROM old."Reference", old."ReferenceAuthor"
WHERE old."Reference"."ReferenceType" = 'Book'
AND old."Reference"."Year" IS NOT NULL
AND old."Reference"."Title" IS NOT NULL
AND old."ReferenceAuthor"."ReferenceID" = old."Reference"."ID"
--Year, Title and Author must be present in the data, otherwise the entry is deemed useless, hence won't be included
ON CONFLICT (citavi_id) DO UPDATE
SET (abstract, createdon, edition, title, year) = (excluded.abstract, excluded.createdon, excluded.edition, excluded.title, excluded.year)
;
I added a second table to the FROM clause and one more WHERE condition to make sure only entries that have a title, year and author are inserted into the new database. (If old."Reference"."ID" exists in old."ReferenceAuthor" as "ReferenceID", then an author exists.) The query fails even without the additional WHERE condition. The columns I specified in SELECT are only present in old."Reference", not in old."ReferenceAuthor".
Currently old."ReferenceAuthor" and old."Reference" don't have a UNIQUE constraint. The constraints on bookmonographs are:
CONSTRAINT bookmonographs_pk PRIMARY KEY (bookmonographsid),
CONSTRAINT bookmonographs_bookseries FOREIGN KEY (bookseriesid)
REFERENCES new.bookseries (bookseriesid) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT bookmonographs_citaviid_unique UNIQUE (citavi_id)
The error PSQL throws:
ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
********** Error **********
ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
SQL state: 21000
Hint: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
I don't know what's wrong, or why the hint points to a duplicated constrained value.

The problem is that some entries apparently have multiple authors. The inner join in your SELECT therefore returns several rows for the same entry, all with the same citavi_id, and INSERT ... ON CONFLICT DO UPDATE refuses to update the same target row twice within one statement. Since you only use the ReferenceAuthor table for filtering, you can rewrite the query to filter out entries without an author using EXISTS on a correlated subquery, which never multiplies rows. Here's how:
INSERT INTO new.bookmonographs (citavi_id, abstract, createdon, edition, title, year)
SELECT "ID", "Abstract", "CreatedOn"::timestamp, "Edition", "Title", "Year"
FROM old."Reference"
WHERE old."Reference"."ReferenceType" = 'Book'
AND old."Reference"."Year" IS NOT NULL
AND old."Reference"."Title" IS NOT NULL
AND exists(SELECT FROM old."ReferenceAuthor" WHERE old."ReferenceAuthor"."ReferenceID" = old."Reference"."ID")
--Year, Title and Author must be present in the data, otherwise the entry is deemed useless, hence won't be included
ON CONFLICT (citavi_id) DO UPDATE
SET (abstract, createdon, edition, title, year) = (excluded.abstract, excluded.createdon, excluded.edition, excluded.title, excluded.year)
;
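To make the row multiplication visible, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Postgres. The table contents and the author names are made up for illustration; only the shape of the two queries follows the question and the answer above.

```python
import sqlite3

# Hypothetical miniature of the old schema: one reference with two authors,
# one reference with a single author, and one reference with no author.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Reference (ID INTEGER, Title TEXT);
    CREATE TABLE ReferenceAuthor (ReferenceID INTEGER, Author TEXT);
    INSERT INTO Reference VALUES (1, 'Two authors'), (2, 'One author'), (3, 'No author');
    INSERT INTO ReferenceAuthor VALUES (1, 'A'), (1, 'B'), (2, 'C');
""")

# Join: one output row per (reference, author) pair, so ID 1 appears twice.
join_ids = [row[0] for row in conn.execute("""
    SELECT r.ID
    FROM Reference r, ReferenceAuthor ra
    WHERE ra.ReferenceID = r.ID
    ORDER BY r.ID
""")]

# EXISTS: at most one output row per reference, however many authors it has.
exists_ids = [row[0] for row in conn.execute("""
    SELECT r.ID
    FROM Reference r
    WHERE EXISTS (SELECT 1 FROM ReferenceAuthor ra WHERE ra.ReferenceID = r.ID)
    ORDER BY r.ID
""")]

print(join_ids)    # [1, 1, 2]
print(exists_ids)  # [1, 2]
```

The repeated ID in the join result is exactly the kind of duplicate constrained value that the Postgres hint complains about.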

Use an explicit INNER JOIN to join the two source tables together. The join still returns one row per author, so DISTINCT is added to collapse repeated references before they reach ON CONFLICT:
INSERT INTO new.bookmonographs (citavi_id, abstract, createdon, edition, title, year)
SELECT DISTINCT "ID", "Abstract", "CreatedOn"::timestamp, "Edition", "Title", "Year"
FROM old."Reference"
INNER JOIN old."ReferenceAuthor" -- explicit join
ON old."ReferenceAuthor"."ReferenceID" = old."Reference"."ID" -- ON condition
WHERE old."Reference"."ReferenceType" = 'Book' AND
old."Reference"."Year" IS NOT NULL AND
old."Reference"."Title" IS NOT NULL
ON CONFLICT (citavi_id) DO UPDATE
SET (abstract, createdon, edition, title, year) =
(excluded.abstract, excluded.createdon, excluded.edition, excluded.title,
excluded.year);

There's a great explanation of the issue in postgres' docs (ctrl + f: "Cardinality violation" errors in detail, as there's no direct link).
To quote from the docs:
The idea of raising "cardinality violation" errors is to ensure that any one row is affected no more than once per statement executed. In the lexicon of the SQL standard's discussion of SQL MERGE, the SQL statement is "deterministic". The user ought to be confident that a row will not be affected more than once - if that isn't the case, then it isn't predictable what the final value of a row affected multiple times will be.
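To replay their simpler example: on a table upsert, the query below cannot work, because we couldn't reliably know whether select val from upsert where key = 1 would return 'Foo' or 'Bar' afterwards. (The example predates the final syntax; released versions spell the clause ON CONFLICT (key) DO UPDATE SET and report the error shown in the question.)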
INSERT INTO upsert(key, val)
VALUES(1, 'Foo'), (1, 'Bar')
ON CONFLICT (key) UPDATE SET val = EXCLUDED.val;
ERROR: 21000: ON CONFLICT UPDATE command could not lock/update self-inserted tuple
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
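When the rows come from application code rather than a query, the hint can be followed on the client side before the statement is sent. The sketch below is plain Python and assumes the rows are tuples whose first element is the constrained key; the helper names duplicate_keys and keep_last_per_key are hypothetical and not part of any Postgres driver.

```python
from collections import Counter

def duplicate_keys(rows, key_index=0):
    """Return {key: count} for every key that occurs more than once in rows."""
    counts = Counter(row[key_index] for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

def keep_last_per_key(rows, key_index=0):
    """Keep only the last row seen for each key, preserving first-seen key order."""
    latest = {}
    for row in rows:
        latest[row[key_index]] = row
    return list(latest.values())

batch = [(1, "Foo"), (1, "Bar"), (2, "Baz")]
print(duplicate_keys(batch))     # {1: 2}
print(keep_last_per_key(batch))  # [(1, 'Bar'), (2, 'Baz')]
```

Choosing "last row wins" is itself an assumption; the point of the error is that the database will not make that choice for you.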

Related

How can I restrict a result to only include rows where one specific field is unique with UNION Select statement in BigQuery?

I have the following code. I am trying to stitch the two tables together so that each Opportunity_ID appears only once and, where an Opportunity_ID exists in both tables, the row comes from the second table (OpportunityUpdates).
SELECT
Opportunity.Account_Name,
Opportunity.Opportunity_Name,
Opportunity.Opportunity_Owner,
Opportunity.Opportunity_ID
FROM
Opportunity
UNION DISTINCT
SELECT
OpportunityUpdates.Account_Name,
OpportunityUpdates.Opportunity_Name,
OpportunityUpdates.Opportunity_Owner,
OpportunityUpdates.Opportunity_ID
FROM
OpportunityUpdates
WHERE OpportunityUpdates.Opportunity_ID <> Opportunity.Opportunity_ID
This code consolidates all records from both tables (by Opportunity_ID) and gives priority to the OpportunityUpdates table based on Opportunity_ID.
It assumes that the same Opportunity_ID could be in either table ("duplicates"), but that within each table an Opportunity_ID is unique. It also assumes that Opportunity_ID is not nullable (never null).
SELECT DISTINCT
IF(ou.Opportunity_ID IS NOT NULL, ou.Account_Name, o.Account_Name) Account_Name,
IF(ou.Opportunity_ID IS NOT NULL, ou.Opportunity_Name, o.Opportunity_Name) Opportunity_Name,
IF(ou.Opportunity_ID IS NOT NULL, ou.Opportunity_Owner, o.Opportunity_Owner) Opportunity_Owner,
COALESCE(ou.Opportunity_ID, o.Opportunity_ID) Opportunity_ID
FROM OpportunityUpdates ou
FULL OUTER JOIN
Opportunity o
ON o.Opportunity_ID = ou.Opportunity_ID
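The same priority rule can also be written in the UNION shape the question started from: take every row from OpportunityUpdates, then add only those Opportunity rows that have no update. The sketch below runs that variant in SQLite through Python's sqlite3 module rather than in BigQuery, with invented sample rows, and relies on the same assumption that Opportunity_ID is unique within each table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Opportunity (
        Account_Name TEXT, Opportunity_Name TEXT,
        Opportunity_Owner TEXT, Opportunity_ID INTEGER);
    CREATE TABLE OpportunityUpdates (
        Account_Name TEXT, Opportunity_Name TEXT,
        Opportunity_Owner TEXT, Opportunity_ID INTEGER);
    -- Made-up sample data: ID 1 is in both tables, 2 and 3 in one each.
    INSERT INTO Opportunity VALUES
        ('Acme', 'Old deal', 'Ann', 1),
        ('Beta', 'Only original', 'Bob', 2);
    INSERT INTO OpportunityUpdates VALUES
        ('Acme', 'New deal', 'Ann', 1),
        ('Gamma', 'Only update', 'Cy', 3);
""")

# All updates, plus the originals that have no matching update (anti-join).
rows = conn.execute("""
    SELECT Account_Name, Opportunity_Name, Opportunity_Owner, Opportunity_ID
    FROM OpportunityUpdates
    UNION ALL
    SELECT o.Account_Name, o.Opportunity_Name, o.Opportunity_Owner, o.Opportunity_ID
    FROM Opportunity o
    WHERE NOT EXISTS (
        SELECT 1 FROM OpportunityUpdates ou
        WHERE ou.Opportunity_ID = o.Opportunity_ID)
    ORDER BY Opportunity_ID
""").fetchall()

for row in rows:
    print(row)
```

For ID 1 only the 'New deal' row from OpportunityUpdates survives, which matches the result of the FULL OUTER JOIN version.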

Computed table column with MAX value between rows containing a shared value

I have the following table
CREATE TABLE T2
( ID_T2 integer NOT NULL PRIMARY KEY,
FK_T1 integer, -- foreign key to T1(Table1)
FK_DATE date, -- foreign key to T1(Table1)
T2_DATE date, -- user input field
T2_MAX_DIFF COMPUTED BY ( (SELECT DATEDIFF (day, MAX(T2_DATE), CURRENT_DATE) FROM T2 GROUP BY FK_T1) )
);
I want T2_MAX_DIFF to display the number of days since last input across all similar entries with a common FK_T1.
It works, but as soon as a second FK_T1 value is added to the table, I get the error "multiple rows in singleton select".
I'm assuming that I need some sort of WHERE FK_T1 = FK_T1 of corresponding row. Is it possible to add this? I'm using Firebird 3.0.7 with flamerobin.
The error "multiple rows in singleton select" means that a query that should provide a single scalar value produced multiple rows. And that is not unexpected for a query with GROUP BY FK_T1, as it will produce a row per FK_T1 value.
To fix this, you need to use a correlated sub-query by doing the following:
1. Alias the table in the subquery to disambiguate it from the table itself.
2. Add a WHERE clause, making sure to use the aliased table (e.g. src, and src.FK_T1), and explicitly reference the table itself for the other side of the comparison (e.g. T2.FK_T1).
3. (Optional) Remove the GROUP BY clause, because it is not necessary given the WHERE clause. However, leaving the GROUP BY in place may uncover certain types of errors.
The resulting subquery then becomes:
(SELECT DATEDIFF (day, MAX(src.T2_DATE), CURRENT_DATE)
FROM T2 src
WHERE src.FK_T1 = T2.FK_T1
GROUP BY src.FK_T1)
Notice the alias src for the table referenced in the subquery, the use of src.FK_T1 in the condition, and the explicit use of the table in T2.FK_T1 to reference the column of the current row of the table itself. If you'd use src.FK_T1 = FK_T1, it would compare with the FK_T1 column of src (as if you'd used src.FK_T1 = src.FK_T1), so that would always be true.
CREATE TABLE T2
( ID_T2 integer NOT NULL PRIMARY KEY,
FK_T1 integer,
FK_DATE date,
T2_DATE date,
T2_MAX_DIFF COMPUTED BY ( (
SELECT DATEDIFF (day, MAX(src.T2_DATE), CURRENT_DATE)
FROM T2 src
WHERE src.FK_T1 = T2.FK_T1
GROUP BY src.FK_T1) )
);
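The effect of the alias can be tried outside Firebird. The following sketch uses SQLite through Python's sqlite3 module, so it has no computed column and no DATEDIFF; it only selects the latest T2_DATE per row, once with the qualified T2.FK_T1 and once with the unqualified FK_T1. The sample rows are invented, and the unqualified result reflects SQLite's name resolution, which here behaves the way the answer describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T2 (ID_T2 INTEGER PRIMARY KEY, FK_T1 INTEGER, T2_DATE TEXT);
    -- Two rows share FK_T1 = 10, one row has FK_T1 = 20.
    INSERT INTO T2 VALUES
        (1, 10, '2021-01-01'),
        (2, 10, '2021-03-01'),
        (3, 20, '2021-02-01');
""")

# Correlated: T2.FK_T1 refers to the current row of the outer query.
correlated = conn.execute("""
    SELECT ID_T2,
           (SELECT MAX(src.T2_DATE) FROM T2 src WHERE src.FK_T1 = T2.FK_T1)
    FROM T2
    ORDER BY ID_T2
""").fetchall()

# Unqualified: FK_T1 binds to src, so the condition no longer correlates
# and every row gets the maximum over the whole table.
unqualified = conn.execute("""
    SELECT ID_T2,
           (SELECT MAX(src.T2_DATE) FROM T2 src WHERE src.FK_T1 = FK_T1)
    FROM T2
    ORDER BY ID_T2
""").fetchall()

print(correlated)   # [(1, '2021-03-01'), (2, '2021-03-01'), (3, '2021-02-01')]
print(unqualified)  # [(1, '2021-03-01'), (2, '2021-03-01'), (3, '2021-03-01')]
```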

Why does PostgreSQL report a duplicate key when the key does not exist?

When I insert data into PostgreSQL (9.6), it throws this error:
ERROR: duplicate key value violates unique constraint "book_intial_name_isbn_isbn10_key"
DETAIL: Key (name, isbn, isbn10)=(三銃士, , ) already exists.
SQL state: 23505
I added a unique constraint on the columns name, isbn, isbn10. But when I check the destination table, it does not contain the record:
select * from public.book where name like '%三銃%';
How can I fix this? This is my insert SQL:
insert into public.book
select *
from public.book_backup20190405 legacy
where legacy."name" not in
(
select name
from public.book
)
limit 1000
An educated guess: there may be more than one row in the source table book_backup20190405 with the unique key tuple ('三銃士', '', '').
Since the bulk INSERT INTO ... SELECT ... runs as a single atomic statement, every row it inserted is rolled back when the constraint fails, which is why the destination table shows no trace of the key afterwards.
You can verify this by running a dupe check on the source table to see if there are duplicates:
SELECT name, isbn, isbn10, COUNT(*)
FROM public.book_backup20190405
WHERE name = '三銃士'
GROUP BY name, isbn, isbn10
HAVING COUNT(*) > 1;
Here's an example of how the source table can be the sole source of duplicates:
http://sqlfiddle.com/#!17/29ba3
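The same situation can be reproduced locally. This is a minimal sketch with Python's sqlite3 module instead of PostgreSQL; the shortened table name book_backup and the sample rows are assumptions made for the example. The destination starts empty, the source contains the same key twice, and the failed statement leaves nothing behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (name TEXT, isbn TEXT, isbn10 TEXT,
                       UNIQUE (name, isbn, isbn10));
    CREATE TABLE book_backup (name TEXT, isbn TEXT, isbn10 TEXT);
    -- The source holds the same key tuple twice.
    INSERT INTO book_backup VALUES
        ('三銃士', '', ''),
        ('三銃士', '', ''),
        ('Other', '1', '2');
""")

try:
    # NOT IN only compares against rows already in book, so both copies pass.
    conn.execute("""
        INSERT INTO book
        SELECT * FROM book_backup legacy
        WHERE legacy.name NOT IN (SELECT name FROM book)
    """)
    insert_failed = False
except sqlite3.IntegrityError:
    insert_failed = True

# The failed statement is undone as a whole, so the destination stays empty.
rows_in_book = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]

# The dupe check on the source reveals the culprit.
duplicates = conn.execute("""
    SELECT name, isbn, isbn10, COUNT(*)
    FROM book_backup
    GROUP BY name, isbn, isbn10
    HAVING COUNT(*) > 1
""").fetchall()

print(insert_failed, rows_in_book, duplicates)
```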

Inserting records into table1 depending on row value in table2

For each row in table exam where exam.examRegulation is null, I want to insert one corresponding row into table examRegulation and copy the column values from exam to examRegulation. Apparently the following query is too naive and needs to be improved:
insert into examRegulation (graduation, course, examnumber, examversion)
values (exam.graduation, exam.course, exam.examnumber, exam.examversion)
where ?? (select graduation, course, examnumber, examversion
from exam
where exam.examRegulation isnull)
Is there a way to do this in postgresql?
You may rephrase this as an INSERT INTO ... SELECT statement:
INSERT INTO examRegulation (graduation, course, examnumber, examversion)
SELECT graduation, course, examnumber, examversion
FROM exam
WHERE examRegulation IS NULL;
The VALUES clause, as the name implies, can only supply values given directly in the statement; it cannot refer to the columns of another table such as exam. If you need to populate an insert using query logic, then you need to use a SELECT clause.
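A quick way to try the INSERT INTO ... SELECT form is the sketch below, which runs it in SQLite through Python's sqlite3 module. The column types and the three sample exams are assumptions; the INSERT statement itself is the one from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exam (graduation TEXT, course TEXT, examnumber INTEGER,
                       examversion INTEGER, examRegulation INTEGER);
    CREATE TABLE examRegulation (graduation TEXT, course TEXT,
                                 examnumber INTEGER, examversion INTEGER);
    -- Two exams without a regulation, one that already has one.
    INSERT INTO exam VALUES
        ('BSc', 'Math', 1, 1, NULL),
        ('BSc', 'Physics', 2, 1, 7),
        ('MSc', 'CS', 3, 2, NULL);
""")

# One new examRegulation row per exam row whose examRegulation is NULL.
conn.execute("""
    INSERT INTO examRegulation (graduation, course, examnumber, examversion)
    SELECT graduation, course, examnumber, examversion
    FROM exam
    WHERE examRegulation IS NULL
""")

copied = conn.execute("""
    SELECT graduation, course, examnumber, examversion
    FROM examRegulation
    ORDER BY examnumber
""").fetchall()

print(copied)  # [('BSc', 'Math', 1, 1), ('MSc', 'CS', 3, 2)]
```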

Using column aliases in derived selects or views breaks simple select queries

I have a genuine use-case which requires referring to column aliases in a "where" clause. I'm trying to use the techniques outlined here, which I expect to work in Sybase and MySQL but don't seem to work in either H2 or HSQLDB:
http://databases.aspfaq.com/database/how-do-i-use-a-select-list-alias-in-the-where-or-group-by-clause.html
If you'd be kind enough to try and recreate my issue, here's how you can do it:
create table CUSTOMER (code varchar(255), description varchar(255), active bit, accountOpeningDate date, currentBalance numeric(20,6), currentBalanceDigits int)
insert into CUSTOMER (code, description, active, accountOpeningDate, currentBalance, currentBalanceDigits) values ('BMW', 'BMW Motors', 0, '2011-01-01', 345.66, 2)
insert into CUSTOMER (code, description, active, accountOpeningDate, currentBalance, currentBalanceDigits) values ('MERC', 'Mercedes Motors', 1, '2012-02-02', 14032, 0)
Then, this SQL query fails:
select nest.* from (
select CODE "id", DESCRIPTION "description",
ACTIVE "active",
accountOpeningDate "accountOpeningDate",
currentBalance "currentBalance"
from customer
) as nest
where nest.id = 'BMW'
It's fine if you strip off the "where nest.id = 'BMW'". However, if you use any of the aliases in either the where clause or the select clause (nest.id rather than nest.*), the query fails. The error is: Column "NEST.ID" not found; ... [42122-167] 42S22/42122
The same failure occurs if you try and create a view with aliased column names then try and select from the view. For example:
create view customer_view as
select CODE "id", DESCRIPTION "description",
ACTIVE "active",
accountOpeningDate "accountOpeningDate",
currentBalance "currentBalance"
from customer
Then:
select id from customer_view
The problem is the mixed usage of unquoted and quoted identifiers. According to the SQL specification, unquoted identifiers (such as id) are case insensitive, and the database might convert them to uppercase or lowercase. Quoted identifiers (such as "id") are case sensitive, and the database engine must not convert the identifier.
H2 converts unquoted identifiers to uppercase (like other database engines such as Oracle). In your query, you have used both quoted and unquoted identifiers. Simplified test case (fails for H2 and other databases):
select * from (select 1 "id") where id = 1
To solve the problem, you need to use either quoted identifiers everywhere, or unquoted identifiers:
select * from (select 1 id) where id = 1
or
select * from (select 1 "id") where "id" = 1
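The folding rule can be modelled in a few lines of Python. This is only a toy model of the behaviour described above, assuming an engine that folds unquoted names to upper case; the function names fold_identifier and same_column are hypothetical and do not correspond to any H2 API.

```python
def fold_identifier(identifier: str) -> str:
    """Return the name an upper-case-folding engine would look up.

    Assumption: unquoted names are folded to upper case, while names in
    double quotes are kept verbatim (without the quotes).
    """
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        return identifier[1:-1]
    return identifier.upper()

def same_column(declared: str, referenced: str) -> bool:
    """True if a reference resolves to the declared column name."""
    return fold_identifier(declared) == fold_identifier(referenced)

print(same_column('"id"', 'id'))    # False: "id" stays id, id becomes ID
print(same_column('id', 'ID'))      # True: both fold to ID
print(same_column('"id"', '"id"'))  # True: both kept verbatim
```

Under this model the failing query declares the column as "id" and then looks for ID, which is why consistent quoting on both sides resolves it.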