PostgreSQL 9.5 UPSERT in rule - postgresql

I have an INSERT rule in an updatable view system, for which I would like to realize an UPSERT, such as :
CREATE OR REPLACE RULE _insert AS
ON INSERT TO vue_pays_gex.bals
DO INSTEAD (
INSERT INTO geo_pays_gex.voie(name, code, district) VALUES (new.name, new.code, new.district)
ON CONFLICT DO NOTHING;
But my since there can be many different combinations of these three columns, I don't think I can set a CONSTRAINT including them all (although I may be missing a point of understanding in the SQL logics), hence nullifying the ON CONFLIT DO NOTHING part.
The ideal solution would seem to be the use of an EXCEPT, but it only works in an INSERT INTO SELECT statement. Is there a way to use an INSERT INTO SELECT statement referring to the newly inserted row? Something like FROM new.bals (in my case)?
If not I could imagine a WHERE NOT EXISTS condition, but the same problem than before arises.
I'm guessing it is a rather common SQL need, but cannot find how to solve it. Any idea?
EDIT :
As requested, here is the table definition :
CREATE TABLE geo_pays_gex.voie
(
id_voie serial NOT NULL,
name character varying(50),
code character varying(15),
district character varying(50),
CONSTRAINT prk_constraint_voie PRIMARY KEY (id_voie),
CONSTRAINT voie_unique_key UNIQUE (name, code, district)
);

How do you define uniqueness? If it is the combination of name + code + district, then just add a constraint UNIQUE(name, code, district) on the table geo_pays_gex.voie. The 3, together, must be unique... but you can have several time the same name, or code, or district.
See it at http://rextester.com/EWR73154
EDIT ***
Since you can have Nulls and want to treat them as a unique value, you can replace the constraint creation by a unique index that replace the nulls
CREATE UNIQUE INDEX
voie_uniq ON voie
(COALESCE(name,''), code, COALESCE(district,''));

In addition to #JGH's answer.
INSERT in rule for INSERT will lead to infinity recursion (Postgres 9.6).
Full (NOT)runnable example:
CREATE SCHEMA ttest;
CREATE TABLE ttest.table_1 (
id bigserial
CONSTRAINT pk_table_1 PRIMARY KEY,
col_1 text,
col_2 text
);
CREATE OR REPLACE RULE table_1_always_upsert AS
ON INSERT TO ttest.table_1
DO INSTEAD (
INSERT INTO ttest.table_1(id, col_1, col_2)
VALUES (new.id, new.col_1, new.col_2)
ON CONFLICT ON CONSTRAINT pk_table_1
DO UPDATE
SET col_1 = new.col_1,
col_2 = new.col_2
);
INSERT INTO ttest.table_1(id, col_1, col_2) -- will result error: infinity recursion in rules
VALUES (1, 'One', 'A'),
(2, 'Two', 'B');
INSERT INTO ttest.table_1(id, col_1, col_2)
VALUES (1, 'One_updated', 'A_updated'),
(2, 'Two_updated', 'B_updated'),
(3, 'Three_inserted', 'C_inserted');
SELECT *
FROM ttest.table_1;

Related

postgresql DO UPDATE ON CONFLICT with multiple constraints [duplicate]

I have two columns in table col1, col2, they both are unique indexed (col1 is unique and so is col2).
I need at insert into this table, use ON CONFLICT syntax and update other columns, but I can't use both column in conflict_targetclause.
It works:
INSERT INTO table
...
ON CONFLICT ( col1 )
DO UPDATE
SET
-- update needed columns here
But how to do this for several columns, something like this:
...
ON CONFLICT ( col1, col2 )
DO UPDATE
SET
....
ON CONFLICT requires a unique index* to do the conflict detection. So you just need to create a unique index on both columns:
t=# create table t (id integer, a text, b text);
CREATE TABLE
t=# create unique index idx_t_id_a on t (id, a);
CREATE INDEX
t=# insert into t values (1, 'a', 'foo');
INSERT 0 1
t=# insert into t values (1, 'a', 'bar') on conflict (id, a) do update set b = 'bar';
INSERT 0 1
t=# select * from t;
id | a | b
----+---+-----
1 | a | bar
* In addition to unique indexes, you can also use exclusion constraints. These are a bit more general than unique constraints. Suppose your table had columns for id and valid_time (and valid_time is a tsrange), and you wanted to allow duplicate ids, but not for overlapping time periods. A unique constraint won't help you, but with an exclusion constraint you can say "exclude new records if their id equals an old id and also their valid_time overlaps its valid_time."
A sample table and data
CREATE TABLE dupes(col1 int primary key, col2 int, col3 text,
CONSTRAINT col2_unique UNIQUE (col2)
);
INSERT INTO dupes values(1,1,'a'),(2,2,'b');
Reproducing the problem
INSERT INTO dupes values(3,2,'c')
ON CONFLICT (col1) DO UPDATE SET col3 = 'c', col2 = 2
Let's call this Q1. The result is
ERROR: duplicate key value violates unique constraint "col2_unique"
DETAIL: Key (col2)=(2) already exists.
What the documentation says
conflict_target can perform unique index inference. When performing
inference, it consists of one or more index_column_name columns and/or
index_expression expressions, and an optional index_predicate. All
table_name unique indexes that, without regard to order, contain
exactly the conflict_target-specified columns/expressions are inferred
(chosen) as arbiter indexes. If an index_predicate is specified, it
must, as a further requirement for inference, satisfy arbiter indexes.
This gives the impression that the following query should work, but it does not because it would actually require a together unique index on col1 and col2. However such an index would not guarantee that col1 and col2 would be unique individually which is one of the OP's requirements.
INSERT INTO dupes values(3,2,'c')
ON CONFLICT (col1,col2) DO UPDATE SET col3 = 'c', col2 = 2
Let's call this query Q2 (this fails with a syntax error)
Why?
Postgresql behaves this way is because what should happen when a conflict occurs on the second column is not well defined. There are number of possibilities. For example in the above Q1 query, should postgresql update col1 when there is a conflict on col2? But what if that leads to another conflict on col1? how is postgresql expected to handle that?
A solution
A solution is to combine ON CONFLICT with old fashioned UPSERT.
CREATE OR REPLACE FUNCTION merge_db(key1 INT, key2 INT, data TEXT) RETURNS VOID AS
$$
BEGIN
LOOP
-- first try to update the key
UPDATE dupes SET col3 = data WHERE col1 = key1 and col2 = key2;
IF found THEN
RETURN;
END IF;
-- not there, so try to insert the key
-- if someone else inserts the same key concurrently, or key2
-- already exists in col2,
-- we could get a unique-key failure
BEGIN
INSERT INTO dupes VALUES (key1, key2, data) ON CONFLICT (col1) DO UPDATE SET col3 = data;
RETURN;
EXCEPTION WHEN unique_violation THEN
BEGIN
INSERT INTO dupes VALUES (key1, key2, data) ON CONFLICT (col2) DO UPDATE SET col3 = data;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- Do nothing, and loop to try the UPDATE again.
END;
END;
END LOOP;
END;
$$
LANGUAGE plpgsql;
You would need to modify the logic of this stored function so that it updates the columns exactly the way you want it to. Invoke it like
SELECT merge_db(3,2,'c');
SELECT merge_db(1,2,'d');
In nowadays is (seems) impossible. Neither the last version of the ON CONFLICT syntax permits to repeat the clause, nor with CTE is possible: not is possible to breack the INSERT from ON CONFLICT to add more conflict-targets.
If you are using postgres 9.5, you can use the EXCLUDED space.
Example taken from What's new in PostgreSQL 9.5:
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
Vlad got the right idea.
First you have to create a table unique constraint on the columns col1, col2 Then once you do that you can do the following:
INSERT INTO dupes values(3,2,'c')
ON CONFLICT ON CONSTRAINT dupes_pkey
DO UPDATE SET col3 = 'c', col2 = 2
ON CONFLICT ( col1, col2 )
DO UPDATE
SET
works fine. but you should not update col1, col2 in the SET section.
Create a constraint (foreign index, for example).
OR/AND
Look at existing constraints (\d in psq).
Use ON CONSTRAINT(constraint_name) in the INSERT clause.
You can typically (I would think) generate a statement with only one on conflict that specifies the one and only constraint that is of relevance, for the thing you are inserting.
Because typically, only one constraint is the "relevant" one, at a time. (If many, then I'm wondering if something is weird / oddly-designed, hmm.)
Example:
(License: Not CC0, only CC-By)
// there're these unique constraints:
// unique (site_id, people_id, page_id)
// unique (site_id, people_id, pages_in_whole_site)
// unique (site_id, people_id, pages_in_category_id)
// and only *one* of page-id, category-id, whole-site-true/false
// can be specified. So only one constraint is "active", at a time.
val thingColumnName = thingColumnName(notfificationPreference)
val insertStatement = s"""
insert into page_notf_prefs (
site_id,
people_id,
notf_level,
page_id,
pages_in_whole_site,
pages_in_category_id)
values (?, ?, ?, ?, ?, ?)
-- There can be only one on-conflict clause.
on conflict (site_id, people_id, $thingColumnName) <—— look
do update set
notf_level = excluded.notf_level
"""
val values = List(
siteId.asAnyRef,
notfPref.peopleId.asAnyRef,
notfPref.notfLevel.toInt.asAnyRef,
// Only one of these is non-null:
notfPref.pageId.orNullVarchar,
if (notfPref.wholeSite) true.asAnyRef else NullBoolean,
notfPref.pagesInCategoryId.orNullInt)
runUpdateSingleRow(insertStatement, values)
And:
private def thingColumnName(notfPref: PageNotfPref): String =
if (notfPref.pageId.isDefined)
"page_id"
else if (notfPref.pagesInCategoryId.isDefined)
"pages_in_category_id"
else if (notfPref.wholeSite)
"pages_in_whole_site"
else
die("TyE2ABK057")
The on conflict clause is dynamically generated, depending on what I'm trying to do. If I'm inserting a notification preference, for a page — then there can be a unique conflict, on the site_id, people_id, page_id constraint. And if I'm configuring notification prefs, for a category — then instead I know that the constraint that can get violated, is site_id, people_id, category_id.
So I can, and fairly likely you too, in your case?, generate the correct on conflict (... columns ), because I know what I want to do, and then I know which single one of the many unique constraints, is the one that can get violated.
Kind of hacky but I solved this by concatenating the two values from col1 and col2 into a new column, col3 (kind of like an index of the two) and compared against that. This only works if you need it to match BOTH col1 and col2.
INSERT INTO table
...
ON CONFLICT ( col3 )
DO UPDATE
SET
-- update needed columns here
Where col3 = the concatenation of the values from col1 and col2.
I get I am late to the party but for the people looking for answers I found this:
here
INSERT INTO tbl_Employee
VALUES (6,'Noor')
ON CONFLICT (EmpID,EmpName)
DO NOTHING;
ON CONFLICT is very clumsy solution, run
UPDATE dupes SET key1=$1, key2=$2 where key3=$3
if rowcount > 0
INSERT dupes (key1, key2, key3) values ($1,$2,$3);
works on Oracle, Postgres and all other database

UPSERT - INSERT ... ON CONFLICT fails with a function based index as 'lower()' unique constraint

I have a database table with a code column that uses a lowercase index to prevent code values that only differ in case (e.g. 'XYZ' = 'xYZ' = 'xyz'). The typical way in Postgresql is to create a function based index, like this: CREATE UNIQUE INDEX mytable_lower_code_idx ON mytable (lower(code)).
Now I have a case where I need upsert behaviour on that column:
-- first insert
INSERT INTO mytable (code) VALUES ('abcd');
-- second insert, with upsert behaviour
INSERT INTO mytable (code) VALUES ('Abcd')
ON CONFLICT (code) DO UPDATE
SET code='Abcd';
For the second insert I get a unique key violation: ERROR: duplicate key value violates unique constraint "mytable_lower_code_idx"
(I also tried to use ON CONFLICT ON CONSTRAINT mytable_lower_code_idx but Postgresql tells me that this constraint does not exist so maybe it doesn't treat the index as a constraint.)
My final question: Is there any way to make INSERT ... ON CONFLICT work together with indexes on expressions? Or must I introduce a physical indexed lowercase column to accomplish the task?
Use ON CONFLICT (lower(code)) DO UPDATE:
CREATE TABLE mytable (
code text
);
CREATE UNIQUE INDEX mytable_lower_code_idx ON mytable (lower(code));
INSERT INTO mytable VALUES ('abcd');
INSERT INTO mytable (code) VALUES ('Abcd')
ON CONFLICT (lower(code)) DO UPDATE
SET code='Abcd';
SELECT * FROM mytable;
yields
| code |
|------|
| Abcd |
Note that ON CONFLICT syntax
allows for the conflict target to be an index_expression (my emphasis):
ON CONFLICT conflict_target
where conflict_target can be one of:
( { index_column_name | ( index_expression ) } [ COLLATE collation ] [ opclass ] [, ...] ) [ WHERE index_predicate ]
ON CONSTRAINT constraint_name
and index_expression:
Similar to index_column_name, but used to infer expressions on
table_name columns appearing within index definitions (not simple
columns). Follows CREATE INDEX format.
Try to add your index as follow:
CREATE UNIQUE INDEX CONCURRENTLY mytable_lower_code_idx
ON mytable (lower(code));
ALTER TABLE mytable
ADD CONSTRAINT unique_mytab_code
UNIQUE USING INDEX mytable_lower_code_idx ;
and then:
INSERT INTO mytable (code) VALUES ('abcd');
-- second insert, with upsert behaviour
INSERT INTO mytable (code) VALUES ('Abcd')
ON CONFLICT ON CONSTRAINT unique_mytab_code DO UPDATE
SET code='Abcd';

sql drop primary key from temp table

I want to create e temp table using select into syntax. Like:
select top 0 * into #AffectedRecord from MyTable
Mytable has a primary key. When I insert record using merge into syntax primary key be a problem. How could I drop pk constraint from temp table
The "SELECT TOP (0) INTO.." trick is clever but my recommendation is to script out the table yourself for reasons just like this. SELECT INTO when you're actually bringing in data, on the other hand, is often faster than creating the table and doing the insert. Especially on 2014+ systems.
The existence of a primary key has nothing to do with your problem. Key Constraints and indexes don't get created when using SELECT INTO from another table, the data type and NULLability does. Consider the following code and note my comments:
USE tempdb -- a good place for testing on non-prod servers.
GO
IF OBJECT_ID('dbo.t1') IS NOT NULL DROP TABLE dbo.t1;
IF OBJECT_ID('dbo.t2') IS NOT NULL DROP TABLE dbo.t2;
GO
CREATE TABLE dbo.t1
(
id int identity primary key clustered,
col1 varchar(10) NOT NULL,
col2 int NULL
);
GO
INSERT dbo.t1(col1) VALUES ('a'),('b');
SELECT TOP (0)
id, -- this create the column including the identity but NOT the primary key
CAST(id AS int) AS id2, -- this will create the column but it will be nullable. No identity
ISNULL(CAST(id AS int),0) AS id3, -- this this create the column and make it nullable. No identity.
col1,
col2
INTO dbo.t2
FROM t1;
Here's the (cleaned up for brevity) DDL for the new table I created:
-- New table
CREATE TABLE dbo.t2
(
id int IDENTITY(1,1) NOT NULL,
id2 int NULL,
id3 int NOT NULL,
col1 varchar(10) NOT NULL,
col2 int NULL
);
Notice that the primary key is gone. When I brought in id as-is it kept the identity. Casting the id column as an int (even though it already is an int) is how I got rid of the identity insert. Adding an ISNULL is how to make a column nullable.
By default, identity insert is set to off here to this query will fail:
INSERT dbo.t2 (id, id3, col1) VALUES (1, 1, 'x');
Msg 544, Level 16, State 1, Line 39
Cannot insert explicit value for identity column in table 't2' when IDENTITY_INSERT is set to OFF.
Setting identity insert on will fix the problem:
SET IDENTITY_INSERT dbo.t2 ON;
INSERT dbo.t2 (id, id3, col1) VALUES (1, 1, 'x');
But now you MUST provide a value for that column. Note the error here:
INSERT dbo.t2 (id3, col1) VALUES (1, 'x');
Msg 545, Level 16, State 1, Line 51
Explicit value must be specified for identity column in table 't2' either when IDENTITY_INSERT is set to ON
Hopefully this helps.
On a side-note: this is a good way to play around with and understand how select insert works. I used a perm table because it's easier to find.

Use multiple conflict_target in ON CONFLICT clause

I have two columns in table col1, col2, they both are unique indexed (col1 is unique and so is col2).
I need at insert into this table, use ON CONFLICT syntax and update other columns, but I can't use both column in conflict_targetclause.
It works:
INSERT INTO table
...
ON CONFLICT ( col1 )
DO UPDATE
SET
-- update needed columns here
But how to do this for several columns, something like this:
...
ON CONFLICT ( col1, col2 )
DO UPDATE
SET
....
ON CONFLICT requires a unique index* to do the conflict detection. So you just need to create a unique index on both columns:
t=# create table t (id integer, a text, b text);
CREATE TABLE
t=# create unique index idx_t_id_a on t (id, a);
CREATE INDEX
t=# insert into t values (1, 'a', 'foo');
INSERT 0 1
t=# insert into t values (1, 'a', 'bar') on conflict (id, a) do update set b = 'bar';
INSERT 0 1
t=# select * from t;
id | a | b
----+---+-----
1 | a | bar
* In addition to unique indexes, you can also use exclusion constraints. These are a bit more general than unique constraints. Suppose your table had columns for id and valid_time (and valid_time is a tsrange), and you wanted to allow duplicate ids, but not for overlapping time periods. A unique constraint won't help you, but with an exclusion constraint you can say "exclude new records if their id equals an old id and also their valid_time overlaps its valid_time."
A sample table and data
CREATE TABLE dupes(col1 int primary key, col2 int, col3 text,
CONSTRAINT col2_unique UNIQUE (col2)
);
INSERT INTO dupes values(1,1,'a'),(2,2,'b');
Reproducing the problem
INSERT INTO dupes values(3,2,'c')
ON CONFLICT (col1) DO UPDATE SET col3 = 'c', col2 = 2
Let's call this Q1. The result is
ERROR: duplicate key value violates unique constraint "col2_unique"
DETAIL: Key (col2)=(2) already exists.
What the documentation says
conflict_target can perform unique index inference. When performing
inference, it consists of one or more index_column_name columns and/or
index_expression expressions, and an optional index_predicate. All
table_name unique indexes that, without regard to order, contain
exactly the conflict_target-specified columns/expressions are inferred
(chosen) as arbiter indexes. If an index_predicate is specified, it
must, as a further requirement for inference, satisfy arbiter indexes.
This gives the impression that the following query should work, but it does not because it would actually require a together unique index on col1 and col2. However such an index would not guarantee that col1 and col2 would be unique individually which is one of the OP's requirements.
INSERT INTO dupes values(3,2,'c')
ON CONFLICT (col1,col2) DO UPDATE SET col3 = 'c', col2 = 2
Let's call this query Q2 (this fails with a syntax error)
Why?
Postgresql behaves this way is because what should happen when a conflict occurs on the second column is not well defined. There are number of possibilities. For example in the above Q1 query, should postgresql update col1 when there is a conflict on col2? But what if that leads to another conflict on col1? how is postgresql expected to handle that?
A solution
A solution is to combine ON CONFLICT with old fashioned UPSERT.
CREATE OR REPLACE FUNCTION merge_db(key1 INT, key2 INT, data TEXT) RETURNS VOID AS
$$
BEGIN
LOOP
-- first try to update the key
UPDATE dupes SET col3 = data WHERE col1 = key1 and col2 = key2;
IF found THEN
RETURN;
END IF;
-- not there, so try to insert the key
-- if someone else inserts the same key concurrently, or key2
-- already exists in col2,
-- we could get a unique-key failure
BEGIN
INSERT INTO dupes VALUES (key1, key2, data) ON CONFLICT (col1) DO UPDATE SET col3 = data;
RETURN;
EXCEPTION WHEN unique_violation THEN
BEGIN
INSERT INTO dupes VALUES (key1, key2, data) ON CONFLICT (col2) DO UPDATE SET col3 = data;
RETURN;
EXCEPTION WHEN unique_violation THEN
-- Do nothing, and loop to try the UPDATE again.
END;
END;
END LOOP;
END;
$$
LANGUAGE plpgsql;
You would need to modify the logic of this stored function so that it updates the columns exactly the way you want it to. Invoke it like
SELECT merge_db(3,2,'c');
SELECT merge_db(1,2,'d');
In nowadays is (seems) impossible. Neither the last version of the ON CONFLICT syntax permits to repeat the clause, nor with CTE is possible: not is possible to breack the INSERT from ON CONFLICT to add more conflict-targets.
If you are using postgres 9.5, you can use the EXCLUDED space.
Example taken from What's new in PostgreSQL 9.5:
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
Vlad got the right idea.
First you have to create a table unique constraint on the columns col1, col2 Then once you do that you can do the following:
INSERT INTO dupes values(3,2,'c')
ON CONFLICT ON CONSTRAINT dupes_pkey
DO UPDATE SET col3 = 'c', col2 = 2
ON CONFLICT ( col1, col2 )
DO UPDATE
SET
works fine. but you should not update col1, col2 in the SET section.
Create a constraint (foreign index, for example).
OR/AND
Look at existing constraints (\d in psq).
Use ON CONSTRAINT(constraint_name) in the INSERT clause.
You can typically (I would think) generate a statement with only one on conflict that specifies the one and only constraint that is of relevance, for the thing you are inserting.
Because typically, only one constraint is the "relevant" one, at a time. (If many, then I'm wondering if something is weird / oddly-designed, hmm.)
Example:
(License: Not CC0, only CC-By)
// there're these unique constraints:
// unique (site_id, people_id, page_id)
// unique (site_id, people_id, pages_in_whole_site)
// unique (site_id, people_id, pages_in_category_id)
// and only *one* of page-id, category-id, whole-site-true/false
// can be specified. So only one constraint is "active", at a time.
val thingColumnName = thingColumnName(notfificationPreference)
val insertStatement = s"""
insert into page_notf_prefs (
site_id,
people_id,
notf_level,
page_id,
pages_in_whole_site,
pages_in_category_id)
values (?, ?, ?, ?, ?, ?)
-- There can be only one on-conflict clause.
on conflict (site_id, people_id, $thingColumnName) <—— look
do update set
notf_level = excluded.notf_level
"""
val values = List(
siteId.asAnyRef,
notfPref.peopleId.asAnyRef,
notfPref.notfLevel.toInt.asAnyRef,
// Only one of these is non-null:
notfPref.pageId.orNullVarchar,
if (notfPref.wholeSite) true.asAnyRef else NullBoolean,
notfPref.pagesInCategoryId.orNullInt)
runUpdateSingleRow(insertStatement, values)
And:
private def thingColumnName(notfPref: PageNotfPref): String =
if (notfPref.pageId.isDefined)
"page_id"
else if (notfPref.pagesInCategoryId.isDefined)
"pages_in_category_id"
else if (notfPref.wholeSite)
"pages_in_whole_site"
else
die("TyE2ABK057")
The on conflict clause is dynamically generated, depending on what I'm trying to do. If I'm inserting a notification preference, for a page — then there can be a unique conflict, on the site_id, people_id, page_id constraint. And if I'm configuring notification prefs, for a category — then instead I know that the constraint that can get violated, is site_id, people_id, category_id.
So I can, and fairly likely you too, in your case?, generate the correct on conflict (... columns ), because I know what I want to do, and then I know which single one of the many unique constraints, is the one that can get violated.
Kind of hacky but I solved this by concatenating the two values from col1 and col2 into a new column, col3 (kind of like an index of the two) and compared against that. This only works if you need it to match BOTH col1 and col2.
INSERT INTO table
...
ON CONFLICT ( col3 )
DO UPDATE
SET
-- update needed columns here
Where col3 = the concatenation of the values from col1 and col2.
I get I am late to the party but for the people looking for answers I found this:
here
INSERT INTO tbl_Employee
VALUES (6,'Noor')
ON CONFLICT (EmpID,EmpName)
DO NOTHING;
ON CONFLICT is very clumsy solution, run
UPDATE dupes SET key1=$1, key2=$2 where key3=$3
if rowcount > 0
INSERT dupes (key1, key2, key3) values ($1,$2,$3);
works on Oracle, Postgres and all other database

Foreign key constraints involving multiple tables

I have the following scenario in a Postgres 9.3 database:
Tables B and C reference Table A.
Table C has an optional field that references table B.
I would like to ensure that for each row of table C that references table B, c.b.a = c.a. That is, if C has a reference to B, both rows should point at the same row in table A.
I could refactor table C so that if c.b is specified, c.a is null but that would make queries joining tables A and C awkward.
I might also be able to make table B's primary key include its reference to table A and then make table C's foreign key to table B include table C's reference to table A but I think this adjustment would be too awkward to justify the benefit.
I think this can be done with a trigger that runs before insert/update on table C and rejects operations that violate the specified constraint.
Is there a better way to enforce data integrity in this situation?
There is a very simple, bullet-proof solution. Works for Postgres 9.3 - when the original question was asked. Works for the current Postgres 13 - when the question in the bounty was added:
Would like information on if this is possible to achieve without database triggers
FOREIGN KEY constraints can span multiple columns. Just include the ID of table A in the FK constraint from table C to table B. This enforces that linked rows in B and C always point to the same row in A. Like:
CREATE TABLE a (
a_id int PRIMARY KEY
);
CREATE TABLE b (
b_id int PRIMARY KEY
, a_id int NOT NULL REFERENCES a
, UNIQUE (a_id, b_id) -- redundant, but required for FK
);
CREATE TABLE c (
c_id int PRIMARY KEY
, a_id int NOT NULL REFERENCES a
, b_id int
, CONSTRAINT fk_simple_and_safe_solution
FOREIGN KEY (a_id, b_id) REFERENCES b(a_id, b_id) -- THIS !
);
Minimal sample data:
INSERT INTO a(a_id) VALUES
(1)
, (2);
INSERT INTO b(b_id, a_id) VALUES
(1, 1)
, (2, 2);
INSERT INTO c(c_id, a_id, b_id) VALUES
(1, 1, NULL) -- allowed
, (2, 2, 2); -- allowed
Disallowed as requested:
INSERT INTO c(c_id, a_id, b_id) VALUES (3,2,1);
ERROR: insert or update on table "c" violates foreign key constraint "fk_simple_and_safe_solution"
DETAIL: Key (a_id, b_id)=(2, 1) is not present in table "b".
db<>fiddle here
The default MATCH SIMPLE behavior of FK constraints works like this (quoting the manual):
MATCH SIMPLE allows any of the foreign key columns to be null; if any of them are null, the row is not required to have a match in the referenced table.
So NULL values in c(b_id) are still allowed (as requested: "optional field"). The FK constraint is "disabled" for this special case.
We need the logically redundant UNIQUE constraint on b(a_id, b_id) to allow the FK reference to it. But by making it out to be on (a_id, b_id) instead of (b_id, a_id), it is also useful in its own right, providing a useful index on b(a_id) to support the other FK constraint, among other things. See:
Is a composite index also good for queries on the first field?
(An additional index on c(a_id) is typically useful accordingly.)
Further reading:
Differences between MATCH FULL, MATCH SIMPLE, and MATCH PARTIAL?
Enforcing constraints “two tables away”
I ended up creating a trigger as follows:
create function "check C.A = C.B.A"()
returns trigger
as $$
begin
if NEW.b is not null then
if NEW.a != (select a from B where id = NEW.b) then
raise exception 'a != b.a';
end if;
end if;
return NEW;
end;
$$
language plpgsql;
create trigger "ensure C.A = C.B.A"
before insert or update on C
for each row
execute procedure "check C.A = C.B.A"();
Would like information on if this is possible to achieve without database triggers
Yes, it is possible. The mechanism is called ASSERTION and it is defined in SQL-92 Standard(though it is not implemented by any major RDBMS).
In short it allows to create multiple-row constraints or multi-table check constraints.
As for PostgreSQL it could be emulated by using view with WITH CHECK OPTION and performing operation on view instead of base table.
WITH CHECK OPTION
This option controls the behavior of automatically updatable views. When this option is specified, INSERT and UPDATE commands on the view will be checked to ensure that new rows satisfy the view-defining condition (that is, the new rows are checked to ensure that they are visible through the view). If they are not, the update will be rejected.
Example:
CREATE TABLE a(id INT PRIMARY KEY, cola VARCHAR(10));
CREATE TABLE b(id INT PRIMARY KEY, colb VARCHAR(10), a_id INT REFERENCES a(id) NOT NULL);
CREATE TABLE c(id INT PRIMARY KEY, colc VARCHAR(10),
a_id INT REFERENCES a(id) NOT NULL,
b_id INT REFERENCES b(id));
Sample inserts:
INSERT INTO a(id, cola) VALUES (1, 'A');
INSERT INTO a(id, cola) VALUES (2, 'A2');
INSERT INTO b(id, colb, a_id) VALUES (12, 'B', 1);
INSERT INTO c(id, colc, a_id) VALUES (15, 'C', 2);
Violating the condition(connecting C with B different a_id on both tables)
UPDATE c SET b_id = 12 WHERE id = 15;;
-- no issues whatsover
Creating view:
CREATE VIEW view_c
AS
SELECT *
FROM c
WHERE NOT EXISTS(SELECT 1
FROM b
WHERE c.b_id = b.id
AND c.a_id != b.a_id) -- here is the clue, we want a_id to be the same
WITH CHECK OPTION ;
Trying update second time(error):
UPDATE view_c SET b_id = 12 WHERE id = 15;
--ERROR: new row violates check option for view "view_c"
--DETAIL: Failing row contains (15, C, 2, 12).
Trying brand new inserts with incorrect data(also errors)
INSERT INTO b(id, colb, a_id) VALUES (20, 'B2', 2);
INSERT INTO view_c(id, colc, a_id, b_id) VALUES (30, 'C2', 1, 20);
--ERROR: new row violates check option for view "view_c"
--DETAIL: Failing row contains (30, C2, 1, 20)
db<>fiddle demo