Shared Primary key versus Foreign Key - postgresql

I have a laboratory analysis database and I'm working on the bast data layout. I've seen some suggestions based on similar requirements for using a "Shared Primary Key", but I don't see the advantages over just foreign keys. I'm using PostgreSQL:tables listed below
Sample
___________
sample_id (PK)
sample_type (where in the process the sample came from)
sample_timestamp (when was the sample taken)
Analysis
___________
analysis_id (PK)
sample_id (FK references sample)
method (what analytical method was performed)
analysis_timestamp (when did the analysis take place)
analysis_notes
gc
____________
analysis_id (shared Primary key)
gc_concentration_meoh (methanol concentration)
gc_concentration_benzene (benzene concentration)
spectrophotometer
_____________
analysis_id
spectro_nm (wavelength used)
spectro_abs (absorbance measured)
I could use this design, or I could move the fields from the analysis table into both the gc and spectrophotometer tables, and just use foreign keys between sample, gc, and spectrophotometer tables. The only advantage I see of this design is in cases where I would just want information on how many or what types of analyses were performed, without having to join in the actual results. However, the additional rules to ensure referential integrity between the shared primary keys, and managing extra joins and triggers (on delete cascade, etc) appears to make it more of a headache than the minor advantages. I'm not a DBA, but a scientist, so please let me know what I'm missing.
UPDATE:
A shared primary key (as I understand it) is like a one-to-one foreign key with the additional constraint that each value in the parent tables(analysis) must appear in one of the child tables once, and no more than once.

I've seen some suggestions based on similar requirements for using a
"Shared Primary Key", but I don't see the advantages over just foreign
keys.
If I've understood your comments above, the advantage is that only the first implements the requirement that each row in the parent match a row in one child, and only in one child. Here's one way to do that.
create table analytical_methods (
method_id integer primary key,
method_name varchar(25) not null unique
);
insert into analytical_methods values
(1, 'gc'),(2, 'spec'), (3, 'Atomic Absorption'), (4, 'pH probe');
create table analysis (
analysis_id integer primary key,
sample_id integer not null, --references samples, not shown
method_id integer not null references analytical_methods (method_id),
analysis_timestamp timestamp not null,
analysis_notes varchar(255),
-- This unique constraint lets the pair of columns be the target of
-- foreign key constraints from other tables.
unique (analysis_id, method_id)
);
-- The combination of a) the default value and the check() constraint on
-- method_id, and b) the foreign key constraint on the paired columns
-- analysis_id and method_id guarantee that rows in this table match a
-- gc row in the analysis table.
--
-- If all the child tables have similar constraints, a row in analysis
-- can match a row in one and only one child table.
create table gc (
analysis_id integer primary key,
method_id integer not null
default 1
check (method_id = 1),
foreign key (analysis_id, method_id)
references analysis (analysis_id, method_id),
gc_concentration_meoh integer not null,
gc_concentration_benzene integer not null
);

It looks like in my case this supertype/subtype model in not the best choice. Instead, I should move the fields from the analysis table into all the child tables, and make a series of simple foreign key relationships. The advantage of the supertype/subtype model is when using the primary key of the supertype as a foreign key in another table. Since I am not doing this, the extra layer of complexity will not add anything.

Related

Composite FK referencing atomic PK + non unique attribute

I am trying to create the following tables in Postgres 13.3:
CREATE TABLE IF NOT EXISTS accounts (
account_id Integer PRIMARY KEY NOT NULL
);
CREATE TABLE IF NOT EXISTS users (
user_id Integer PRIMARY KEY NOT NULL,
account_id Integer NOT NULL REFERENCES accounts(account_id) ON DELETE CASCADE
);
CREATE TABLE IF NOT EXISTS calendars (
calendar_id Integer PRIMARY KEY NOT NULL,
user_id Integer NOT NULL,
account_id Integer NOT NULL,
FOREIGN KEY (user_id, account_id) REFERENCES users(user_id, account_id) ON DELETE CASCADE
);
But I get the following error when creating the calendars table:
ERROR: there is no unique constraint matching given keys for referenced table "users"
Which does not make much sense to me since the foreign key contains the user_id which is the PK of the users table and therefore also has a uniqueness constraint. If I add an explicit uniqueness constraint on the combined user_id and account_id like so:
ALTER TABLE users ADD UNIQUE (user_id, account_id);
Then I am able to create the calendars table. This unique constraint seems unnecessary to me as user_id is already unique. Can someone please explain to me what I am missing here?
Postgres is so smart/dumb that it doesn't assume the designer to do stupid things.
The Postgres designers could have taken different strategies:
Detect the transitivity, and make the FK not only depend on users.id, but also on users.account_id -> accounts.id. This is doable but costly. It also involves adding multiple dependency-records in the catalogs for a single FK-constraint. When imposing the constraint(UPDATE or DELETE in any of the two referred tables), it could get very complex.
Detect the transitivity, and silently ignore the redundant column reference. This implies: lying to the programmer. It would also need to be represented in the catalogs.
cascading DDL operations would get more complex, too. (remember: DDL is already very hard w.r.t. concurrency/versioning)
From the execution/performance point of view: imposing the constraints currently involves "pseudo triggers" on the referred table's indexes. (except DEFERRED, which has to be handled specially)
So, IMHO the Postgres developers made the sane choice of refusing to do stupid complex things.

How to edit a record that results in a uniqueness violation and echo the change to child tables?

PostgreSQL 11.1
How can "Editing" of a record be transmitted to dependent records?
Summary of my issue:
The master table, disease, needs a unique constraint on the description column. This unique constraint is needed for foreign key ON UPDATE CASCADE to its children tables.
To allow for a temporary violation of the unique constraint, it must be made deferrable. BUT A DEFERABLE CONSTRAINT CAN NOT BE USED IN A FOREIGN KEY.
Here is the situation.
The database has 100+ tables (and keeps on growing).
Most all information has been normalized in that repeating groups of information have been delegated to their own table.
Following normalization, most tables are lists without duplication of records. Duplication of records within a table is not allowed.
All tables have a unique ID assigned to each record (in addition to a unique constraint placed on the record information).
Most tables are dependent on another table. The foreign keys reference the primary key of the table they are dependent on.
Most unique constraints involve a foreign key (which in turn references the primary key of the parent table).
So, assume the following schema:
CREATE TABLE phoenix.disease
(
recid integer NOT NULL DEFAULT nextval('disease_recid_seq'::regclass),
code text COLLATE pg_catalog."default",
description text COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT disease_pkey PRIMARY KEY (recid),
CONSTRAINT disease_code_unique UNIQUE (code)
DEFERRABLE,
CONSTRAINT disease_description_unique UNIQUE (description)
,
CONSTRAINT disease_description_check CHECK (description <> ''::text)
)
CREATE TABLE phoenix.dx
(
recid integer NOT NULL DEFAULT nextval('dx_recid_seq'::regclass),
disease_recid integer NOT NULL,
patient_recid integer NOT NULL,
CONSTRAINT pk_dx_recid PRIMARY KEY (recid),
CONSTRAINT dx_unique UNIQUE (tposted, patient_recid, disease_recid)
,
CONSTRAINT dx_disease_recid_fkey FOREIGN KEY (disease_recid)
REFERENCES phoenix.disease (recid) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE RESTRICT,
CONSTRAINT dx_patients FOREIGN KEY (patient_recid)
REFERENCES phoenix.patients (recid) MATCH SIMPLE
ON UPDATE CASCADE
ON DELETE RESTRICT
)
(Columns not involved in this question have been removed. :) )
There are many other children tables of disease with the same basic dependency on the disease table. Note that the primary key of the disease table is a foreign key to the dx table and that the dx table uses this foreign key in a unique constraint. Also note that the dx table is just one table of a long chain of table references. (That is the dx table also has its primary key referenced by other tables).
The problem: I wish to "edit" the contents of the parent disease record. By "edit", I mean:
change the data in the description column.
if the result of the change causes a duplication in the disease table, then one of the "duplicated" records will need to be deleted.
Herein lies my problem. There are many different tables that use the primary key of the disease table in their own unique constraint. If those tables ALSO have a foreign key reference to the duplicated record (in disease), then cascading the delete to those tables would be appropriate -- i.e., no duplication of records will occur.
However, if the child table does NOT have a reference to the "correct" record in the parent disease table, then simply deleting the record (by cascade) will result in loss of information.
Example:
Disease Table:
record 1: ID = 1 description = "ABC"
record 2: ID = 2 description = "DEF"
Dx Table:
record 5: ID = 5 refers to ID=1 of Disease Table.
Editing of record 1 in Disease table results in description becoming "DEF"
Disease Table:
record 1: ID = 1 "ABC" --> "DEF"
I have tried deferring the primary key of the disease table so as to allow the "correct" ID to be "cascaded" to the child tables. This causes the following errors:
A foreign key can not be dependent on a deferred column. "cannot use a deferrable unique constraint for referenced table "disease"
additionally, the parent table (disease) has no way of knowing ahead of time if its children already have a reference to the "correct" record so allowing deletion, or if the child needs to change its own column data to reflect the new "correct" id.
So, how can I allow a change in the parent table (disease) and notify the child tables to change their column values -- and delete within them selves should a duplicate record arise?
Lastly, I do not know today what future tables I will need. So I cannot "precode" into the parent table who its children are or will be.
Thank you for any help with this.

Composite PRIMARY KEY enforces NOT NULL constraints on involved columns

This is one strange, unwanted behavior I encountered in Postgres:
When I create a Postgres table with composite primary keys, it enforces NOT NULL constraint on each column of the composite combination.
For example,
CREATE TABLE distributors (m_id integer, x_id integer, PRIMARY KEY(m_id, x_id));
enforces NOT NULL constraint on columns m_id and x_id, which I don't want!
MySQL doesn't do this. I think Oracle doesn't do it as well.
I understand that PRIMARY KEY enforces UNIQUE and NOT NULL automatically but that makes sense for single-column primary key. In a multi-column primary key table, the uniqueness is determined by the combination.
Is there any simple way of avoiding this behavior of Postgres? When I execute this:
CREATE TABLE distributors (m_id integer, x_id integer);
I do not get any NOT NULL constraints of course. But I would not have a primary key either.
If you need to allow NULL values, use a UNIQUE constraint (or index) instead of a PRIMARY KEY (and add a surrogate PK column - I suggest a serial or IDENTITY column in Postgres 10 or later).
Auto increment table column
A UNIQUE constraint allows columns to be NULL:
CREATE TABLE distributor (
distributor_id GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, m_id integer
, x_id integer
, UNIQUE(m_id, x_id) -- !
-- , CONSTRAINT distributor_my_name_uni UNIQUE (m_id, x_id) -- verbose form
);
The manual:
For the purpose of a unique constraint, null values are not considered equal, unless NULLS NOT DISTINCT is specified.
In your case, you could enter something like (1, NULL) for (m_id, x_id) any number of times without violating the constraint. Postgres never considers two NULL values equal - as per definition in the SQL standard.
If you need to treat NULL values as equal (i.e. "not distinct") to disallow such "duplicates", I see two three (since Postgres 15) options:
0. NULLS NOT DISTINCT
This option was added with Postgres 15 and allows to treat NULL values as "not distinct", so two of them conflict in a unique constraint or index. This is the most convenient option, going forward. The manual:
That means even in the presence of a unique constraint it is possible
to store duplicate rows that contain a null value in at least one of
the constrained columns. This behavior can be changed by adding the
clause NULLS NOT DISTINCT ...
Detailed instructions:
Create unique constraint with null columns
1. Two partial indexes
In addition to the UNIQUE constraint above:
CREATE UNIQUE INDEX dist_m_uni_idx ON distributor (m_id) WHERE x_id IS NULL;
CREATE UNIQUE INDEX dist_x_uni_idx ON distributor (x_id) WHERE m_id IS NULL;
But this gets out of hands quickly with more than two columns that can be NULL. See:
Create unique constraint with null columns
2. A multi-column UNIQUE index on expressions
Instead of the UNIQUE constraint. We need a free default value that is never present in involved columns, like -1. Add CHECK constraints to disallow it:
CREATE TABLE distributor (
distributor serial PRIMARY KEY
, m_id integer
, x_id integer
, CHECK (m_id &lt> -1)
, CHECK (x_id &lt> -1)
);
CREATE UNIQUE INDEX distributor_uni_idx
ON distributor (COALESCE(m_id, -1), COALESCE(x_id, -1));
When you want a polymorphic relation
Your table uses column names that indicate that they are probably references to other tables:
CREATE TABLE distributors (m_id integer, x_id integer);
So I think you probably are trying to model a polymorphic relation to other tables – where a record in your table distributors can refer to one m record xor one x record.
Polymorphic relations are difficult in SQL. The best resource I have seen about this topic is "Modeling Polymorphic Associations in a Relational Database". There, four alternative options are presented, and the recommendation for most cases is called "Exclusive Belongs To", which in your case would lead to a table like this:
CREATE TABLE distributors (
id serial PRIMARY KEY,
m_id integer REFERENCES m,
x_id integer REFERENCES x,
CHECK (
((m_id IS NOT NULL)::integer + (x_id IS NOT NULL)::integer) = 1
)
);
CREATE UNIQUE INDEX ON distributors (m_id) WHERE m_id IS NOT NULL;
CREATE UNIQUE INDEX ON distributors (x_id) WHERE x_id IS NOT NULL;
Like other solutions, this uses a surrogate primary key column because primary keys are enforced to not contain NULL values in the SQL standard.
This solution adds a 4th option to the three in #Erwin Brandstetter's answer for how to avoid the case where "you could enter something like (1, NULL) for (m_id, x_id) any number of times without violating the constraint." Here, that case is excluded by a combination of two measures:
Partial unique indexes on each column individually: two records (1, NULL) and (1, NULL) would not violate the constraint on the second column as NULLs are considered distinct, but they would violate the constraint on the first column (two records with value 1).
Check constraint: The missing piece is preventing multiple (NULL, NULL) records, still allowed because NULLs are considered distinct, and anyway because our partial indexes do not cover them to save space and write events. This is achieved by the CHECK constraint, which prevents any (NULL, NULL) records by making sure that exactly one column is NULL.
There's one difference though: all alternatives in #Erwin Brandstetter's answer allow at least one record (NULL, NULL) and any number of records with no NULL value in any column (like (1, 2)). When modeling a polymorphic relation, you want to disallow such records. That is achieved by the check constraint in the solution above.

Will a primary key index serve as an index for a foreign key when fk columns are subset of pk?

I have a table where part of the primary key is a foreign key to another table.
create table player_result (
event_id integer not null,
pub_time timestamp not null,
name_key varchar(128) not null,
email_address varchar(128),
withdrawn boolean not null,
place integer,
realized_values hstore,
primary key (event_id, pub_time, name_key),
foreign key (email_address) references email(address),
foreign key (event_id, pub_time) references event_publish(event_id, pub_time));
Will the index generated for the primary key suffice to back the foreign key on event_id and pub_time?
Yes.
Index A,B,C
is good for:
A
A,B
A,B,C (and any other combination of the full 3 fields, if default order is unimportant)
but not good for other combinations (such as B,C, C,A etc.).
It will be useful for the referencing side, such that a DELETE or UPDATE on the referenced table can use the PRIMARY KEY of the referencing side as an index when performing checks for the existence of referencing rows or running cascade update/deletes. PostgreSQL doesn't require this index to exist at all, it just makes foreign key constraint checks faster if it is there.
It is not sufficient to serve as the unique constraint for a reference to those columns. You couldn't create a FOREIGN KEY that REFERENCES player_result(event_id, pub_time) because there is no unique constraint on those columns. That pair can appear multiple times in the table so long as each pair has a different name_key.
As #xagyg accurately notes, the unique b-tree index created by the foreign key reference is also only useful for references to columns from the left of the index. It could not be used for a lookup of pub_time, name_key or just name_key, for example.

How do I implement circular constraints in PostgreSQL?

I want to enforce that a row in one table must have a matching row in another table, and vice versa. I'm currently doing it like this to work around the fact that you can't REFERENCE a table that hasn't been created yet. Is there a more natural way that I'm not aware of?
CREATE TABLE LE (id int PRIMARY KEY);
CREATE TABLE LE_TYP (id int PRIMARY KEY, typ text);
ALTER TABLE LE ADD CONSTRAINT
twowayref FOREIGN KEY (id) REFERENCES LE_TYP (id) DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE LE_TYP ADD CONSTRAINT
twowayref_rev FOREIGN KEY (id) REFERENCES LE (id) DEFERRABLE INITIALLY DEFERRED;
What you are doing is ideal where you have circular constraints. There is nothing that can be improved on in your solution.
There are however two points that are worth mentioning briefly just for the next reader. The first is that with this solution in your code isn't obvious why you are choosing the order you are. If you are doing this you probably want to be careful about which constraint is deferred. Usually this is obvious, but you probably want to choose your insert order based on other foreign keys referencing the appropriate tables.
The other is that of the idea of whether it is desirable to engineer around circular dependencies. On one hand it may seem like a good idea when doing so seems like an obvious simplification.
It is however worth pointing out that in a case like this you may want to look at multiple table inheritance instead. In this way you might do:
CREATE TABLE p1 (
id serial not null,
attribute1 int not null,
attribute2 text not null,
check (attribute1 > 1 or length(attribute2) > 10)
);
CREATE TABLE p2 (
id int not null, -- shared primary key with p1
attribute3 int not null,
attribute4 text not null
);
CREATE TABLE combined (
primary key (id)
) INHERITS (p1, p2);
This puts all your columns in one table, but gives you logical interfaces to that table as if it is two tables joined on a common field.