How do I implement circular constraints in PostgreSQL?

I want to enforce that a row in one table must have a matching row in another table, and vice versa. I'm currently doing it like this to work around the fact that you can't REFERENCE a table that hasn't been created yet. Is there a more natural way that I'm not aware of?
CREATE TABLE LE (id int PRIMARY KEY);
CREATE TABLE LE_TYP (id int PRIMARY KEY, typ text);
ALTER TABLE LE ADD CONSTRAINT
twowayref FOREIGN KEY (id) REFERENCES LE_TYP (id) DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE LE_TYP ADD CONSTRAINT
twowayref_rev FOREIGN KEY (id) REFERENCES LE (id) DEFERRABLE INITIALLY DEFERRED;
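
Because both constraints are DEFERRABLE INITIALLY DEFERRED, the matching rows only need to exist by the time the transaction commits. A minimal usage sketch (the typ value is just an example):
BEGIN;
INSERT INTO LE (id) VALUES (1);
INSERT INTO LE_TYP (id, typ) VALUES (1, 'some type');
COMMIT; -- both deferred foreign keys are checked here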

What you are doing is the right approach where you have circular constraints. There is not much that can be improved on in your solution.
There are, however, two points worth mentioning briefly, just for the next reader. The first is that with this solution it isn't obvious from your code why you chose the order you did. If you take this approach, you probably want to be careful about which constraint is deferred. Usually this is obvious, but you will probably want to choose your insert order based on other foreign keys referencing the appropriate tables.
The other is the question of whether it is desirable to engineer around circular dependencies at all. On one hand it may seem like a good idea when doing so looks like an obvious simplification.
It is, however, worth pointing out that in a case like this you may want to look at multiple table inheritance instead. That way you might do:
CREATE TABLE p1 (
id serial not null,
attribute1 int not null,
attribute2 text not null,
check (attribute1 > 1 or length(attribute2) > 10)
);
CREATE TABLE p2 (
id int not null, -- shared primary key with p1
attribute3 int not null,
attribute4 text not null
);
CREATE TABLE combined (
primary key (id)
) INHERITS (p1, p2);
This puts all your columns in one table, but gives you logical interfaces to that table as if it is two tables joined on a common field.
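For illustration (the values here are made up), a row inserted into combined is visible when querying either parent:
INSERT INTO combined (attribute1, attribute2, attribute3, attribute4)
VALUES (5, 'some description', 42, 'detail');

SELECT id, attribute1, attribute2 FROM p1;  -- shows the combined row
SELECT id, attribute3, attribute4 FROM p2;  -- same row through the other parent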

Related

What is the best way to work with an optional FK constraint in Postgres? [duplicate]

I have a table in Postgres which contains Things. Each of these Things can be of 1 of 3 types:
is a self contained Thing
is an instance of a SuperThingA
is an instance of a SuperThingB.
Essentially, in object terms, there's an AbstractThing, extended in different ways by SuperThingA or SuperThingB, and all the records on the Thing table are either unextended or extended in 1 of the 2 ways.
In order to represent this on the Thing table, I have a number of fields which are common to Things of all types, and I have 2 optional FK reference columns into the SuperThingA and SuperThingB tables.
Ideally, I would like a "FK IF NOT NULL" constraint on each of the two, but this doesn't appear to be possible; as far as I can see, all I can do is make both fields nullable and rely on the application code to maintain the FK relationships, which is pretty sucky. This seems to be doable in other databases, for example SQL Server, as per SQL Server 2005: Nullable Foreign Key Constraint, but not in any way that I've found so far in PG.
How else can I handle this - an insert/update trigger which checks the values when either of those fields is not null and verifies that the value is present in whichever parent table? That might be doable with small parent tables and limited inserts on the Thing table (which, in fairness, is largely the case here - no more than a couple of hundred records in each of the parent tables, and small numbers of inserts on the Thing table), but in the more general case it would be a performance black hole on inserts if one or both parent tables were large.
This is not currently enforced with a FK relationship. I've reviewed the PG docs, and they seem pretty definitive that I can't have an optional FK relationship (which is understandable). It leaves me with a table definition something like this:
CREATE TABLE IF NOT EXISTS Thing(
Thing_id int4 NOT NULL,
Thing_description varchar(40),
Thing_SuperA_FK int4,
Thing_SuperB_FK char(10),
CONSTRAINT ThingPK PRIMARY KEY (Thing_id)
)
;
Every foreign key on a nullable column is only enforced when the value is non-null. This is the default behavior.
create table a (
id int4 not null primary key
);
create table b (
id int4 not null primary key,
a_id int4 references a(id)
);
In the example table b has an optional reference to table a.
insert into a values(1); -- entry in table a
insert into b values (1, 1); -- entry in b with reference to a
insert into b values (2, null); -- entry in b with no reference to a
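An insert that references a missing row still fails, since the constraint is checked whenever a_id is non-null:
insert into b values (3, 99); -- fails with a foreign key violation: no row with id 99 in a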
Depending on your use case it also might make sense to reverse your table structure. Instead of having the common table pointing to two more specialized tables you can have it the other way around. This way you avoid the non-null columns entirely.
create table common(
id int4 primary key
-- all the common fields
);
create table special1(
common_id int4 not null references common(id)
-- all the special fields of type special1
);
create table special2(
common_id int4 not null references common(id)
-- all the special fields of type special2
);
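To see which specialization (if any) a given row has, you can left join the common table to both child tables; a minimal sketch using the tables above:
select c.id,
       s1.common_id is not null as is_special1,
       s2.common_id is not null as is_special2
from common c
left join special1 s1 on s1.common_id = c.id
left join special2 s2 on s2.common_id = c.id;
Note that this layout does not by itself prevent a common row from having rows in both child tables (or several rows in one of them); if that matters, it has to be enforced separately.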
You need your SuperN_FK fields defined as nullable foreign keys, then you'll need check constraint(s) on the table to enforce the optional NULLability requirements.
CREATE TABLE Things
( ID int primary key
, col1 varchar(1)
, col2 varchar(1)
, SuperA_FK int constraint fk_SuperA references Things(ID)
, cola1 varchar(1)
, constraint is_SuperA check ((SuperA_FK is null and cola1 is null) or
(SuperA_FK is not null and cola1 is not null))
, SuperB_FK int constraint fk_SuperB references Things(ID)
, colb1 varchar(1)
, constraint is_SuperB check ((SuperB_FK is null and colb1 is null) or
(SuperB_FK is not null))
, constraint Super_Constraint check (
case when SuperA_FK is not null then 1 else 0 end +
case when SuperB_FK is not null then 1 else 0 end
<= 1 )
);
In the above example I've split the check constraints up for ease of maintenance. The two is_SuperN constraints enforce the NULL requirements on the FK and its related detail columns: either everything is NULL, or the FK is not null and some or all of the detail columns are not null. The final Super_Constraint ensures that at most one SuperN_FK is not null.
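A few illustrative inserts against that definition (all values are made up):
-- self-contained Thing: both FKs and their detail columns stay null
INSERT INTO Things (ID, col1, col2) VALUES (1, 'a', 'b');

-- Thing extended as SuperA: FK and detail column both set
INSERT INTO Things (ID, col1, SuperA_FK, cola1) VALUES (2, 'a', 1, 'x');

-- rejected by Super_Constraint: cannot be both SuperA and SuperB
-- INSERT INTO Things (ID, SuperA_FK, cola1, SuperB_FK) VALUES (3, 1, 'x', 1);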

Is this a good idea to store relations to many different tables in one field? [duplicate]

I have a database which has three tables
Messages - PK = MessageId
Drafts - PK = DraftId
History - FK = RelatedItemId
The History table has a single foreign Key [RelatedItemId] which maps to one of the two Primary keys in Messages and Drafts.
Is there a name for this relationship?
Is it just bad design?
Is there a better way to design this relationship?
Here are the CREATE TABLE statements for this question:
CREATE TABLE [dbo].[History](
[HistoryId] [uniqueidentifier] NOT NULL,
[RelatedItemId] [uniqueidentifier] NULL,
CONSTRAINT [PK_History] PRIMARY KEY CLUSTERED ( [HistoryId] ASC )
)
CREATE TABLE [dbo].[Messages](
[MessageId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED ( [MessageId] ASC )
)
CREATE TABLE [dbo].[Drafts](
[DraftId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Drafts] PRIMARY KEY CLUSTERED ( [DraftId] ASC )
)
To give it a short description, the solution you have used is called:
Polymorphic Association
Objective: Reference Multiple Parents
Resulting anti-pattern: use of a dual-purpose foreign key, violating first normal form (atomicity) and losing referential integrity
Solution: Simplify the Relationship
BTW, creating a common super-table will help you.
Is there a name for this relationship?
There is no standard name that I'm aware of, but I've heard people using the term "generic FKs" or even "inner-platform effect".
Is it just bad design?
Yes.
The reason: it prevents you from declaring a FOREIGN KEY, and therefore prevents the DBMS from enforcing referential integrity directly. You must therefore enforce it through imperative code, which is surprisingly difficult.
Is there a better way to design this relationship?
Yes.
Create a separate FOREIGN KEY column for each referenced table. Make them NULL-able, but make sure exactly one of them is non-NULL, through a CHECK constraint.
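A minimal sketch of that approach for the History table above, written as PostgreSQL-style DDL rather than the question's SQL Server syntax, and assuming Messages and Drafts with uuid primary keys (the FK column names here are made up):
CREATE TABLE History (
    HistoryId uuid PRIMARY KEY,
    MessageId uuid REFERENCES Messages (MessageId),
    DraftId   uuid REFERENCES Drafts (DraftId),
    -- exactly one of the two references must be set
    CONSTRAINT one_parent CHECK (
        (MessageId IS NOT NULL AND DraftId IS NULL) OR
        (MessageId IS NULL AND DraftId IS NOT NULL)
    )
);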
Alternatively, take a look at inheritance.
The best practice I have found is to create a function that returns whether the passed-in value exists in either the Messages or Drafts PK column. You can then add a check constraint on the History column that calls this function, so a row can only be inserted if the check passes (i.e. the value exists).
Example code:
CREATE FUNCTION dbo.is_related_there ( @value uniqueidentifier )
RETURNS tinyint
AS
BEGIN
    -- NULL means "no related item" and is allowed, mirroring FK semantics
    IF @value IS NULL
        RETURN 1;
    -- 1 if the value exists as a primary key in either Drafts or Messages
    IF EXISTS (SELECT 1 FROM Drafts WHERE DraftId = @value)
       OR EXISTS (SELECT 1 FROM Messages WHERE MessageId = @value)
        RETURN 1;
    RETURN 0;
END;
GO
ALTER TABLE History ADD CONSTRAINT
CK_HistoryExists CHECK (dbo.is_related_there(RelatedItemId) = 1);
Hope that runs and helps lol

postgres - constraint on sum of a column (without triggers)

I want to constrain the sum of a certain attribute of the child entities of a parent entity by a certain attribute of that parent entity. I want to do this in PostgreSQL and without using triggers. An example follows:
Assume we have a crate with a volume attribute. We want to fill it with smaller boxes, which have their own volume attributes. The sum of volumes of all boxes in the crate cannot be greater than the volume of the crate.
The idea I have in mind is something like:
CREATE TABLE crates (
crate_id int NOT NULL,
crate_volume int NOT NULL,
crate_volume_used int NOT NULL DEFAULT 0,
CONSTRAINT crates_pkey PRIMARY KEY (crate_id),
CONSTRAINT ukey_for_fkey_ref_from_boxes
UNIQUE (crate_id, crate_volume, crate_volume_used),
CONSTRAINT crate_volume_used_cannot_be_greater_than_crate_volume
CHECK (crate_volume_used <= crate_volume),
CONSTRAINT crate_volume_must_be_positive CHECK (crate_volume >= 0)
);
CREATE TABLE boxes (
box_id int NOT NULL,
box_volume int NOT NULL,
crate_id int NOT NULL,
crate_volume int NOT NULL,
crate_volume_used int NOT NULL,
id_of_previous_box int,
previous_sum_of_volumes_of_boxes int,
current_sum_of_volumes_of_boxes int NOT NULL,
id_of_next_box int,
CONSTRAINT boxes_pkey PRIMARY KEY (box_id),
CONSTRAINT box_volume_must_be_positive CHECK (box_volume >= 0),
CONSTRAINT crate_fkey FOREIGN KEY (crate_id, crate_volume, crate_volume_used)
REFERENCES crates (crate_id, crate_volume, crate_volume_used) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED,
CONSTRAINT previous_box_self_ref_fkey FOREIGN KEY (id_of_previous_box, previous_sum_of_volumes_of_boxes)
REFERENCES boxes (box_id, current_sum_of_volumes_of_boxes) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED,
CONSTRAINT ukey_for_previous_box_self_ref_fkey UNIQUE (box_id, current_sum_of_volumes_of_boxes),
CONSTRAINT previous_box_self_ref_fkey_validity UNIQUE (crate_id, id_of_previous_box),
CONSTRAINT next_box_self_ref_fkey FOREIGN KEY (id_of_next_box)
REFERENCES boxes (box_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED,
CONSTRAINT next_box_self_ref_fkey_validity UNIQUE (crate_id, id_of_next_box),
CONSTRAINT self_ref_key_integrity CHECK (
(id_of_previous_box IS NULL AND previous_sum_of_volumes_of_boxes IS NULL) OR
(id_of_previous_box IS NOT NULL AND previous_sum_of_volumes_of_boxes IS NOT NULL)
),
CONSTRAINT sum_of_volumes_of_boxes_check1 CHECK (current_sum_of_volumes_of_boxes <= crate_volume),
CONSTRAINT sum_of_volumes_of_boxes_check2 CHECK (
(previous_sum_of_volumes_of_boxes IS NULL AND current_sum_of_volumes_of_boxes=box_volume) OR
(previous_sum_of_volumes_of_boxes IS NOT NULL AND current_sum_of_volumes_of_boxes=box_volume+previous_sum_of_volumes_of_boxes)
),
CONSTRAINT crate_volume_used_check CHECK (
(id_of_next_box IS NULL AND crate_volume_used=current_sum_of_volumes_of_boxes) OR
(id_of_next_box IS NOT NULL)
)
);
CREATE UNIQUE INDEX single_first_box ON boxes (crate_id) WHERE id_of_previous_box IS NULL;
CREATE UNIQUE INDEX single_last_box ON boxes (crate_id) WHERE id_of_next_box IS NULL;
My question is whether this is a workable way of doing this and whether there is a better (less confusing, more optimized, etc.) way. Or should I just stick to triggers?
Thanks in advance.
My question is if there is a better (less confusing, more optimized etc.) way of doing this.
Yes, there is: in a word, use a trigger…
No, never mind that you don't want to use one. Use a trigger here; no ifs, no buts.
Expanding on the comments I and others posted earlier:
What you're doing amounts to writing a constraint trigger that is verifying that sum(boxes.volume) <= crate.volume. It's just doing so in a very, very bastardized way (by having check constraints and unique keys and foreign keys masquerade as an aggregate function), and doing the relevant calculations within your app at that.
Your only achievement in avoiding a genuine trigger will be errors down the road when two concurrent updates try to affect the same crate. All this at the cost of maintaining unnecessary unique indexes and foreign keys.
Sure, you'll end up fixing some or all of these issues and refining your "implementation" further by making the foreign keys deferrable, adding locks, yada yada. But in the end, you're basically doing what amounts to writing a vastly inefficient aggregate function.
So use a trigger. Either maintain a current_volume column in crates using after triggers on boxes, and enforce a check using a simple check() constraint on crates. Or add constraint triggers on boxes, to enforce the check directly.
If you need more convincing, just consider the overhead that you're creating. Really. Take a cold, hard look at it: instead of maintaining one volume column in crates using triggers (if even that), you're maintaining no fewer than six fields that serve absolutely no purpose beyond your constraint, plus so many unique indexes and foreign key constraints related to them that I genuinely lose count when I try to enumerate them. And check constraints on them, at that. This all adds up in terms of storage and write performance.
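For illustration, here is a minimal sketch of the first trigger option: maintain crate_volume_used from boxes with an AFTER trigger and let a plain check constraint reject overfilling. The simplified schema below is assumed, not taken from the question:
CREATE TABLE crates (
  crate_id          int PRIMARY KEY,
  crate_volume      int NOT NULL CHECK (crate_volume >= 0),
  crate_volume_used int NOT NULL DEFAULT 0 CHECK (crate_volume_used >= 0),
  CHECK (crate_volume_used <= crate_volume)
);

CREATE TABLE boxes (
  box_id     int PRIMARY KEY,
  crate_id   int NOT NULL REFERENCES crates (crate_id),
  box_volume int NOT NULL CHECK (box_volume >= 0)
);

CREATE FUNCTION boxes_maintain_crate_volume() RETURNS trigger AS $$
BEGIN
  -- keep the running total in crates up to date; the check constraint
  -- on crates rejects the change if the crate would be overfilled
  IF TG_OP IN ('UPDATE', 'DELETE') THEN
    UPDATE crates SET crate_volume_used = crate_volume_used - OLD.box_volume
    WHERE crate_id = OLD.crate_id;
  END IF;
  IF TG_OP IN ('INSERT', 'UPDATE') THEN
    UPDATE crates SET crate_volume_used = crate_volume_used + NEW.box_volume
    WHERE crate_id = NEW.crate_id;
  END IF;
  RETURN NULL; -- result is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER boxes_volume_trg
AFTER INSERT OR UPDATE OR DELETE ON boxes
FOR EACH ROW EXECUTE PROCEDURE boxes_maintain_crate_volume();
As a side effect, the UPDATE on crates takes a row lock on the crate, which serializes concurrent attempts to fill the same crate.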

Foreign Key referencing inherited table

I have the following tables:
CREATE TABLE mail (
id serial,
parent_mail_id integer,
...
PRIMARY KEY (id),
FOREIGN KEY (parent_mail_id) REFERENCES mail(id),
...
);
CREATE TABLE incoming (
from_contact_id integer NOT NULL REFERENCES contact(id),
...
PRIMARY KEY (id),
---> FOREIGN KEY (parent_mail_id) REFERENCES mail(id), <---
...
) INHERITS(mail);
CREATE TABLE outgoing (
from_user_id integer NOT NULL REFERENCES "user"(id),
...
PRIMARY KEY (id),
--> FOREIGN KEY (parent_mail_id) REFERENCES mail(id), <--
...
) INHERITS(mail);
incoming and outgoing inherit from mail and define their foreign keys (and primary keys) again, as they are not automatically inherited.
The problem is:
If I insert an incoming mail, it is not possible to reference it from the outgoing table, as the foreign key only sees rows stored directly in the parent table (mail).
Is there a workaround for that?
PostgreSQL 9.3 docs:
A serious limitation of the inheritance feature is that indexes
(including unique constraints) and foreign key constraints only apply
to single tables, not to their inheritance children. This is true on
both the referencing and referenced sides of a foreign key constraint.
Thus, in the terms of the above example:
If we declared cities.name to be UNIQUE or a PRIMARY KEY, this would not stop the capitals table from having rows with names
duplicating rows in cities. And those duplicate rows would by default
show up in queries from cities. In fact, by default capitals would
have no unique constraint at all, and so could contain multiple rows
with the same name. You could add a unique constraint to capitals, but
this would not prevent duplication compared to cities.
Similarly, if we were to specify that cities.name REFERENCES some other table, this constraint would not automatically propagate to
capitals. In this case you could work around it by manually adding the
same REFERENCES constraint to capitals.
Specifying that another table's column REFERENCES cities(name) would allow the other table to contain city names, but not capital
names. There is no good workaround for this case.
These deficiencies will probably be fixed in some future release, but
in the meantime considerable care is needed in deciding whether
inheritance is useful for your application.
Not really a workaround, but you could make mail a plain (non-inherited) table and add separate incoming_columns and outgoing_columns tables for their respective extra columns, with the mail id as both their primary key and a foreign key. You can then create a view outgoing as mail INNER JOIN outgoing_columns, for example.
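A minimal sketch of that layout (the column lists are placeholders; incoming_columns would follow the same pattern):
CREATE TABLE mail (
  id             serial PRIMARY KEY,
  parent_mail_id integer REFERENCES mail(id)
  -- ... columns common to all mail
);

CREATE TABLE outgoing_columns (
  id           integer PRIMARY KEY REFERENCES mail(id),
  from_user_id integer NOT NULL REFERENCES "user"(id)
  -- ... columns specific to outgoing mail
);

CREATE VIEW outgoing AS
SELECT m.*, oc.from_user_id
FROM mail m
JOIN outgoing_columns oc ON oc.id = m.id;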
You may use a constraint trigger
CREATE OR REPLACE FUNCTION mail_ref_trigger()
RETURNS trigger AS
$BODY$
DECLARE
BEGIN
-- allow NULL like a normal foreign key; otherwise require the parent to exist
IF NEW.parent_mail_id IS NOT NULL AND NOT EXISTS (
SELECT 1 FROM mail WHERE id = NEW.parent_mail_id
) THEN
RAISE foreign_key_violation USING MESSAGE = FORMAT('Referenced mail id not found, mail_id:%s', NEW.parent_mail_id);
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
CREATE CONSTRAINT TRIGGER mail_fkey_trigger
AFTER UPDATE OR INSERT ON incoming
DEFERRABLE
FOR EACH ROW EXECUTE PROCEDURE mail_ref_trigger();

Shared Primary key versus Foreign Key

I have a laboratory analysis database and I'm working on the best data layout. I've seen some suggestions, based on similar requirements, for using a "Shared Primary Key", but I don't see the advantages over just foreign keys. I'm using PostgreSQL; the tables are listed below:
Sample
___________
sample_id (PK)
sample_type (where in the process the sample came from)
sample_timestamp (when was the sample taken)
Analysis
___________
analysis_id (PK)
sample_id (FK references sample)
method (what analytical method was performed)
analysis_timestamp (when did the analysis take place)
analysis_notes
gc
____________
analysis_id (shared Primary key)
gc_concentration_meoh (methanol concentration)
gc_concentration_benzene (benzene concentration)
spectrophotometer
_____________
analysis_id
spectro_nm (wavelength used)
spectro_abs (absorbance measured)
I could use this design, or I could move the fields from the analysis table into both the gc and spectrophotometer tables, and just use foreign keys between sample, gc, and spectrophotometer tables. The only advantage I see of this design is in cases where I would just want information on how many or what types of analyses were performed, without having to join in the actual results. However, the additional rules to ensure referential integrity between the shared primary keys, and managing extra joins and triggers (on delete cascade, etc) appears to make it more of a headache than the minor advantages. I'm not a DBA, but a scientist, so please let me know what I'm missing.
UPDATE:
A shared primary key (as I understand it) is like a one-to-one foreign key with the additional constraint that each value in the parent tables(analysis) must appear in one of the child tables once, and no more than once.
I've seen some suggestions based on similar requirements for using a
"Shared Primary Key", but I don't see the advantages over just foreign
keys.
If I've understood your comments above, the advantage is that only the first implements the requirement that each row in the parent match a row in one child, and only in one child. Here's one way to do that.
create table analytical_methods (
method_id integer primary key,
method_name varchar(25) not null unique
);
insert into analytical_methods values
(1, 'gc'),(2, 'spec'), (3, 'Atomic Absorption'), (4, 'pH probe');
create table analysis (
analysis_id integer primary key,
sample_id integer not null, --references samples, not shown
method_id integer not null references analytical_methods (method_id),
analysis_timestamp timestamp not null,
analysis_notes varchar(255),
-- This unique constraint lets the pair of columns be the target of
-- foreign key constraints from other tables.
unique (analysis_id, method_id)
);
-- The combination of a) the default value and the check() constraint on
-- method_id, and b) the foreign key constraint on the paired columns
-- analysis_id and method_id guarantee that rows in this table match a
-- gc row in the analysis table.
--
-- If all the child tables have similar constraints, a row in analysis
-- can match a row in one and only one child table.
create table gc (
analysis_id integer primary key,
method_id integer not null
default 1
check (method_id = 1),
foreign key (analysis_id, method_id)
references analysis (analysis_id, method_id),
gc_concentration_meoh integer not null,
gc_concentration_benzene integer not null
);
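For completeness, a second child table would follow the same pattern; a sketch for the spectrophotometer table, using method_id = 2 ('spec') as inserted above (the column types are assumed):
create table spectrophotometer (
  analysis_id integer primary key,
  method_id integer not null
            default 2
            check (method_id = 2),
  foreign key (analysis_id, method_id)
    references analysis (analysis_id, method_id),
  spectro_nm integer not null,   -- wavelength used
  spectro_abs numeric not null   -- absorbance measured
);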
It looks like in my case this supertype/subtype model is not the best choice. Instead, I should move the fields from the analysis table into all the child tables and make a series of simple foreign key relationships. The advantage of the supertype/subtype model comes when using the primary key of the supertype as a foreign key in another table. Since I am not doing that, the extra layer of complexity will not add anything.