Postgres: best way to implement single favourite flag - postgresql

I have this table:
CREATE TABLE public.data_source__instrument (
instrument_id int4 NOT NULL,
data_source_id int4 NOT NULL,
CONSTRAINT data_source__instrument__pk PRIMARY KEY (data_source_id, instrument_id)
);
For clarity, here's an example of the data I might have in this table:
 instrument_id | data_source_id
---------------+----------------
             1 | 1
             1 | 2
             1 | 3
             2 | 2
             2 | 3
I would like to be able to set a favourite data source for each instrument. I'd also like each instrument to have 1 and only 1 favourite data source.
The solution I came up with is the below:
CREATE TABLE public.data_source__instrument (
instrument_id int4 NOT NULL,
data_source_id int4 NOT NULL,
fav_data_source boolean NULL, -- ### new column ###
CONSTRAINT data_source__instrument__pk PRIMARY KEY (data_source_id, instrument_id),
CONSTRAINT fav_data_source UNIQUE (instrument_id,fav_data_source) -- ### new constraint ###
);
I chose to mark favourites with the value true and set non-favourite tuples to null (because the unique constraint doesn't affect NULLs).
This solution will allow at most one true value per instrument in the fav_data_source column.
Example:
 instrument_id | data_source_id | fav_data_source
---------------+----------------+-----------------
             1 | 1              | true
             1 | 2              | null
             1 | 3              | null
             2 | 2              | null
             2 | 3              | true
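To make the behaviour concrete, here is a quick sketch (assuming the modified table above) of what the unique constraint accepts and rejects:
INSERT INTO data_source__instrument (instrument_id, data_source_id, fav_data_source)
VALUES (1, 1, true),   -- the favourite for instrument 1
       (1, 2, NULL),   -- NULL values never conflict with each other
       (1, 3, NULL);

-- Rejected: a second true for instrument 1 violates UNIQUE (instrument_id, fav_data_source).
INSERT INTO data_source__instrument (instrument_id, data_source_id, fav_data_source)
VALUES (1, 4, true);

-- Accepted, which is the confusing part: false is just another distinct value...
INSERT INTO data_source__instrument (instrument_id, data_source_id, fav_data_source)
VALUES (1, 4, false);

-- ...but a second false for the same instrument is then rejected,
-- even though neither row is a favourite.
INSERT INTO data_source__instrument (instrument_id, data_source_id, fav_data_source)
VALUES (1, 5, false);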
However, I'm not completely satisfied with this solution. For starters, it allows instruments without a favourite data source. Moreover, the value of fav_data_source could be set to false, which is confusing and not very useful (since it doesn't play well with the unique constraint).
Is there a better way of doing this? Am I overlooking something?
EDIT: Ideally, I'd prefer a simple solution to this problem and avoid using features like database triggers.

What you need is to think about DB normalisation and Entities and their Relations. Specifically:
- having that boolean flag in your joint table violates normal form (it creates a functional dependency on other rows);
- "favourite data source" is an entity in its own right; the flag you have describes a relation when it should describe an entity.
I assume you have separate tables for instruments and data sources, and omitted those just for the sake of simplifying your question.
Here's what you need:
create table instrument
(
id serial primary key
);
create table data_source
(
id serial primary key
);
create table data_source__instrument
(
data_source_id integer references data_source on delete cascade on update cascade,
instrument_id integer references instrument on delete cascade on update cascade,
constraint data_source__instrument_pk primary key (data_source_id, instrument_id)
);
create table favourite_data_source
(
data_source_id integer not null,
instrument_id integer not null unique,
constraint favourite_data_source_fk foreign key (data_source_id, instrument_id) references data_source__instrument (data_source_id, instrument_id) on update cascade on delete cascade
);
And now to reproduce your example:
insert into instrument
values (1),
(2);
insert into data_source
values (1),
(2),
(3);
insert into data_source__instrument (instrument_id, data_source_id)
values (1, 1),
(1, 2),
(1, 3),
(2, 2),
(2, 3);
insert into favourite_data_source (instrument_id, data_source_id)
values (1, 1),
(2, 3);
With this scheme:
- you can have 0 or 1 favourite data sources per instrument;
- you have to insert favourite data source rows separately;
- the FKs handle integrity for you, thanks to the cascades and checks (see the sketch below).
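For example (a quick sketch against the tables and data above):
-- Rejected: (instrument 2, data source 1) is not a row in data_source__instrument,
-- so the composite FK on favourite_data_source blocks it.
insert into favourite_data_source (instrument_id, data_source_id)
values (2, 1);

-- Rejected: instrument 1 already has a favourite (instrument_id is unique).
insert into favourite_data_source (instrument_id, data_source_id)
values (1, 2);

-- Changing a favourite is a plain update (or an upsert on instrument_id):
update favourite_data_source
set data_source_id = 2
where instrument_id = 1;

-- And deleting the (instrument 1, data source 2) pairing from data_source__instrument
-- would cascade and remove the favourite row as well.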

I chose to mark favourites with the value true and set non-favourite tuples to null (because the unique constraint doesn't affect NULLs), [however] the value of the fav_data_source could be set to false which is confusing and not very useful
There are two ways to improve this:
1. Put a check constraint on the column to prevent false values:
fav_data_source boolean NULL check (fav_data_source IS TRUE or fav_data_source IS NULL)
2. Use a partial index so that only rows with true values are checked (see also here); a short demo follows below:
fav_data_source boolean NOT NULL
…
CREATE UNIQUE INDEX unique_fav_data_source ON data_source__instrument (instrument_id) WHERE (fav_data_source);
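A quick demo of the partial-index variant (sketch, assuming fav_data_source is now boolean NOT NULL as above):
INSERT INTO data_source__instrument (instrument_id, data_source_id, fav_data_source)
VALUES (1, 1, true),    -- first favourite for instrument 1: accepted
       (1, 2, false);   -- non-favourites can repeat freely; false rows are not indexed

-- Rejected: a second favourite for instrument 1 hits unique_fav_data_source.
INSERT INTO data_source__instrument (instrument_id, data_source_id, fav_data_source)
VALUES (1, 3, true);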
It allows instruments without a favourite data source.
I don't think it's possible to express such an "out of x, at least one y must exist" constraint in a relational database without triggers (assuming you still want to allow instruments without any data sources at all).
I'd also like each instrument to have 1 and only 1 favourite data source.
You might consider storing the favourite data source in the instruments table itself, as a non-nullable foreign reference column.
To avoid duplication, put only the non-favourite (additional/alternate) data source in the relation table, then provide data_source__instrument as a view:
CREATE VIEW data_source__instrument(instrument_id, data_source_id, is_favourite) AS
SELECT id, fav_data_source_id, true FROM instrument
UNION ALL
SELECT instrument_id, data_source_id, false FROM alternate_data_source__instrument;
The downside of this approach is that you cannot create a foreign key reference to the view (should you need to do so), and that updating the data sources gets quite complicated. It also only works for M:N relations, not 1:N ones, where you would need a way to assert that the favourite source of an instrument actually belongs to that instrument (technically possible with an extra foreign key, but really ugly) and where you would introduce circular references, which are just a mess.
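For completeness, the underlying tables are not spelled out above; here is a minimal sketch of what they could look like. The names instrument.fav_data_source_id and alternate_data_source__instrument are assumptions taken from the view definition, and a data_source table is assumed to exist:
-- Sketch only; table and column names are inferred from the view above.
CREATE TABLE instrument (
    id serial PRIMARY KEY,
    fav_data_source_id int4 NOT NULL REFERENCES data_source (id)  -- every instrument has exactly one favourite
);
CREATE TABLE alternate_data_source__instrument (
    instrument_id int4 NOT NULL REFERENCES instrument (id),
    data_source_id int4 NOT NULL REFERENCES data_source (id),
    PRIMARY KEY (instrument_id, data_source_id)
);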


Composite FK referencing atomic PK + non unique attribute

I am trying to create the following tables in Postgres 13.3:
CREATE TABLE IF NOT EXISTS accounts (
account_id Integer PRIMARY KEY NOT NULL
);
CREATE TABLE IF NOT EXISTS users (
user_id Integer PRIMARY KEY NOT NULL,
account_id Integer NOT NULL REFERENCES accounts(account_id) ON DELETE CASCADE
);
CREATE TABLE IF NOT EXISTS calendars (
calendar_id Integer PRIMARY KEY NOT NULL,
user_id Integer NOT NULL,
account_id Integer NOT NULL,
FOREIGN KEY (user_id, account_id) REFERENCES users(user_id, account_id) ON DELETE CASCADE
);
But I get the following error when creating the calendars table:
ERROR: there is no unique constraint matching given keys for referenced table "users"
This does not make much sense to me, since the foreign key contains user_id, which is the PK of the users table and therefore already has a uniqueness constraint. If I add an explicit uniqueness constraint on the combination of user_id and account_id like so:
ALTER TABLE users ADD UNIQUE (user_id, account_id);
Then I am able to create the calendars table. This unique constraint seems unnecessary to me as user_id is already unique. Can someone please explain to me what I am missing here?
Postgres is smart/dumb enough that it doesn't assume the designer wants to do stupid things. The Postgres designers could have taken different strategies:
- Detect the transitivity, and make the FK depend not only on users.user_id but also on users.account_id -> accounts.account_id. This is doable but costly. It also involves adding multiple dependency records in the catalogs for a single FK constraint, and enforcing the constraint (on UPDATE or DELETE in either of the two referenced tables) could get very complex.
- Detect the transitivity, and silently ignore the redundant column reference. This amounts to lying to the programmer, and it would also need to be represented in the catalogs.
Cascading DDL operations would get more complex, too (remember: DDL is already very hard w.r.t. concurrency/versioning). From the execution/performance point of view, enforcing the constraints currently involves "pseudo triggers" on the referred table's indexes (except DEFERRED constraints, which have to be handled specially).
So, IMHO, the Postgres developers made the sane choice of refusing to do stupid, complex things.
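In practice that leaves two reasonable options: keep the explicit UNIQUE (user_id, account_id) from the question, or drop the redundant account_id column from calendars and reach the account through users. A sketch of the latter (not part of the original answer, just an illustration):
CREATE TABLE IF NOT EXISTS calendars (
    calendar_id Integer PRIMARY KEY NOT NULL,
    user_id Integer NOT NULL REFERENCES users(user_id) ON DELETE CASCADE
);

-- account_id is still reachable through users whenever it is needed:
SELECT c.calendar_id, u.account_id
FROM calendars c
JOIN users u USING (user_id);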

What is the best way to work with an optional FK constraint in Postgres? [duplicate]

I have a table in Postgres which contains Things. Each of these Things can be of 1 of 3 types:
- a self-contained Thing
- an instance of a SuperThingA
- an instance of a SuperThingB
Essentially, in object terms, there's an AbstractThing, extended in different ways by SuperThingA or SuperThingB, and all the records in the Thing table are either unextended or extended in 1 of the 2 ways.
In order to represent this on the Thing table, I have a number of fields which are common to Things of all types, and I have 2 optional FK reference columns into the SuperThingA and SuperThingB tables.
Ideally, I would like a "FK IF NOT NULL" constraint on each of the two, but this doesn't appear to be possible; as far as I can see, all I can do is make both fields nullable and rely on the application code to maintain the FK relationships, which is pretty sucky. This seems to be doable in other databases, for example SQL Server, as per SQL Server 2005: Nullable Foreign Key Constraint, but not in any way that I've found so far in PG.
How else can I handle this - an insert/update trigger which checks the values when either of those fields is not null and verifies the value is present in whichever parent table? That's maybe doable with small parent tables and limited inserts on the Thing table (which, in fairness, is largely the case here - no more than a couple of hundred records in each of the parent tables, and small numbers of inserts on the Thing table), but in the general case it would be a performance black hole on inserts if one or both parent tables were large.
This is not currently enforced with a FK relationship. I've reviewed the PG docs, and it seems pretty definitive that I can't have an optional FK relationship (which is understandable). It leaves me with a table definition something like this:
CREATE TABLE IF NOT EXISTS Thing(
Thing_id int4 NOT NULL,
Thing_description varchar(40),
Thing_SuperA_FK int4,
Thing_SuperB_FK char(10),
CONSTRAINT ThingPK PRIMARY KEY (Thing_id)
)
;
Every foreign key on a nullable column is only enforced when the value is non-null. This is default behavior.
create table a (
id int4 not null primary key
);
create table b (
id int4 not null primary key,
a_id int4 references a(id)
);
In the example table b has an optional reference to table a.
insert into a values(1); -- entry in table a
insert into b values (1, 1); -- entry in b with reference to a
insert into b values (2, null); -- entry in b with no reference to a
Depending on your use case it also might make sense to reverse your table structure. Instead of having the common table pointing to two more specialized tables you can have it the other way around. This way you avoid the non-null columns entirely.
create table common(
id int4 primary key
-- all the common fields
);
create table special1(
common_id int4 not null references common(id)
-- all the special fields of type special1
);
create table special2(
common_id int4 not null references common(id)
-- all the special fields of type special2
);
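A usage sketch for the reversed layout (the values are made up):
insert into common (id) values (1), (2), (3);
insert into special1 (common_id) values (1);  -- row 1 is extended as special1
insert into special2 (common_id) values (2);  -- row 2 is extended as special2
-- row 3 is a self-contained Thing: it simply has no row in either child table
Note that this layout on its own does not stop a common row from turning up in both child tables; if that matters, something like the check-constraint or discriminator approaches shown elsewhere on this page is still needed.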
You need your SuperN_FK fields defined as nullable foreign keys, then you'll need check constraint(s) on the table to enforce the optional NULLability requirements.
CREATE TABLE Things
( ID int primary key
, col1 varchar(1)
, col2 varchar(1)
, SuperA_FK int constraint fk_SuperA references Things(ID)
, cola1 varchar(1)
, constraint is_SuperA check ((SuperA_FK is null and cola1 is null) or
(SuperA_FK is not null and cola1 is not null))
, SuperB_FK int constraint fk_SuperB references Things(ID)
, colb1 varchar(1)
, constraint is_SuperB check ((SuperB_FK is null and colb1 is null) or
(SuperB_FK is not null))
, constraint Super_Constraint check (
case when SuperA_FK is not null then 1 else 0 end +
case when SuperB_FK is not null then 1 else 0 end
<= 1 )
);
In the above example I've split the check constraints up for ease of maintenance. The two is_SuperN constraints enforce the NULL requirements on the FK and its related detail columns: either everything is NULL, or the FK is not null and some or all of the detail columns are not null. The final Super_Constraint ensures that at most one SuperN_FK is not null.
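A few probe inserts show how the constraints interact (sketch, run against the Things table above):
insert into Things (ID, col1, col2) values (1, 'a', 'b');                    -- plain Thing: accepted
insert into Things (ID, SuperA_FK, cola1) values (2, 1, 'x');                -- SuperA instance: accepted
insert into Things (ID, SuperA_FK) values (3, 1);                            -- rejected by is_SuperA (cola1 must accompany SuperA_FK)
insert into Things (ID, SuperA_FK, cola1, SuperB_FK) values (4, 1, 'x', 1);  -- rejected by Super_Constraint (both FKs set)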

Composite PRIMARY KEY enforces NOT NULL constraints on involved columns

This is one strange, unwanted behavior I encountered in Postgres:
When I create a Postgres table with composite primary keys, it enforces NOT NULL constraint on each column of the composite combination.
For example,
CREATE TABLE distributors (m_id integer, x_id integer, PRIMARY KEY(m_id, x_id));
enforces NOT NULL constraint on columns m_id and x_id, which I don't want!
MySQL doesn't do this. I think Oracle doesn't do it as well.
I understand that PRIMARY KEY enforces UNIQUE and NOT NULL automatically, but that makes sense for a single-column primary key. In a table with a multi-column primary key, the uniqueness is determined by the combination.
Is there any simple way of avoiding this behavior of Postgres? When I execute this:
CREATE TABLE distributors (m_id integer, x_id integer);
I do not get any NOT NULL constraints of course. But I would not have a primary key either.
If you need to allow NULL values, use a UNIQUE constraint (or index) instead of a PRIMARY KEY (and add a surrogate PK column - I suggest a serial or IDENTITY column in Postgres 10 or later).
Auto increment table column
A UNIQUE constraint allows columns to be NULL:
CREATE TABLE distributor (
distributor_id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, m_id integer
, x_id integer
, UNIQUE(m_id, x_id) -- !
-- , CONSTRAINT distributor_my_name_uni UNIQUE (m_id, x_id) -- verbose form
);
The manual:
For the purpose of a unique constraint, null values are not considered equal, unless NULLS NOT DISTINCT is specified.
In your case, you could enter something like (1, NULL) for (m_id, x_id) any number of times without violating the constraint. Postgres never considers two NULL values equal - as per definition in the SQL standard.
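For example (sketch against the distributor table above):
-- Both inserts succeed: NULL is never equal to NULL, so the UNIQUE
-- constraint on (m_id, x_id) treats the two rows as distinct.
INSERT INTO distributor (m_id, x_id) VALUES (1, NULL);
INSERT INTO distributor (m_id, x_id) VALUES (1, NULL);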
If you need to treat NULL values as equal (i.e. "not distinct") to disallow such "duplicates", I see three options (the first one requires Postgres 15):
0. NULLS NOT DISTINCT
This option was added with Postgres 15 and allows NULL values to be treated as "not distinct", so two of them conflict in a unique constraint or index. This is the most convenient option going forward. The manual:
That means even in the presence of a unique constraint it is possible
to store duplicate rows that contain a null value in at least one of
the constrained columns. This behavior can be changed by adding the
clause NULLS NOT DISTINCT ...
Detailed instructions:
Create unique constraint with null columns
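In short, the table above can then be declared like this (sketch, Postgres 15 or later):
CREATE TABLE distributor (
  distributor_id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, m_id integer
, x_id integer
, UNIQUE NULLS NOT DISTINCT (m_id, x_id)  -- NULLs now conflict with each other
);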
1. Two partial indexes
In addition to the UNIQUE constraint above:
CREATE UNIQUE INDEX dist_m_uni_idx ON distributor (m_id) WHERE x_id IS NULL;
CREATE UNIQUE INDEX dist_x_uni_idx ON distributor (x_id) WHERE m_id IS NULL;
But this gets out of hands quickly with more than two columns that can be NULL. See:
Create unique constraint with null columns
2. A multi-column UNIQUE index on expressions
This works instead of the UNIQUE constraint. We need a "free" default value that is never present in the involved columns, like -1. Add CHECK constraints to disallow it:
CREATE TABLE distributor (
distributor serial PRIMARY KEY
, m_id integer
, x_id integer
, CHECK (m_id <> -1)
, CHECK (x_id <> -1)
);
CREATE UNIQUE INDEX distributor_uni_idx
ON distributor (COALESCE(m_id, -1), COALESCE(x_id, -1));
When you want a polymorphic relation
Your table uses column names that indicate that they are probably references to other tables:
CREATE TABLE distributors (m_id integer, x_id integer);
So I think you probably are trying to model a polymorphic relation to other tables – where a record in your table distributors can refer to one m record xor one x record.
Polymorphic relations are difficult in SQL. The best resource I have seen about this topic is "Modeling Polymorphic Associations in a Relational Database". There, four alternative options are presented, and the recommendation for most cases is called "Exclusive Belongs To", which in your case would lead to a table like this:
CREATE TABLE distributors (
id serial PRIMARY KEY,
m_id integer REFERENCES m,
x_id integer REFERENCES x,
CHECK (
((m_id IS NOT NULL)::integer + (x_id IS NOT NULL)::integer) = 1
)
);
CREATE UNIQUE INDEX ON distributors (m_id) WHERE m_id IS NOT NULL;
CREATE UNIQUE INDEX ON distributors (x_id) WHERE x_id IS NOT NULL;
Like other solutions, this uses a surrogate primary key column because primary keys are enforced to not contain NULL values in the SQL standard.
This solution adds a 4th option to the three in Erwin Brandstetter's answer for how to avoid the case where "you could enter something like (1, NULL) for (m_id, x_id) any number of times without violating the constraint." Here, that case is excluded by a combination of two measures:
- Partial unique indexes on each column individually: two records (1, NULL) and (1, NULL) would not violate the constraint on the second column, as NULLs are considered distinct, but they would violate the constraint on the first column (two records with value 1).
- Check constraint: the missing piece is preventing multiple (NULL, NULL) records, which are still allowed because NULLs are considered distinct, and anyway because our partial indexes do not cover them (to save space and write events). This is achieved by the CHECK constraint, which rules out (NULL, NULL) records by requiring that exactly one of the two columns is non-NULL.
There's one difference though: all alternatives in Erwin Brandstetter's answer allow at least one (NULL, NULL) record and any number of records with no NULL value in any column (like (1, 2)). When modeling a polymorphic relation, you want to disallow such records. That is achieved by the check constraint in the solution above.
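A few example inserts illustrate the intended behaviour (sketch, assuming m and x each contain a row with id 1):
INSERT INTO distributors (m_id, x_id) VALUES (1, NULL);    -- refers to an m record: accepted
INSERT INTO distributors (m_id, x_id) VALUES (NULL, 1);    -- refers to an x record: accepted
INSERT INTO distributors (m_id, x_id) VALUES (NULL, NULL); -- rejected by the CHECK (no target at all)
INSERT INTO distributors (m_id, x_id) VALUES (1, 1);       -- rejected by the CHECK (two targets)
INSERT INTO distributors (m_id, x_id) VALUES (1, NULL);    -- rejected by the partial unique index on m_id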

How do I create a check to make sure a value exists in another table?

Right now I have two tables: one that has a compound primary key, and another that references one of the columns of that primary key in a one-to-many relationship between Product and Mapping. The following is an idea of the setup:
CREATE TABLE dev."Product"
(
"Id" serial NOT NULL,
"ShortCode" character(6),
CONSTRAINT "ProductPK" PRIMARY KEY ("Id")
)
CREATE TABLE dev."Mapping"
(
"LookupId" integer NOT NULL,
"ShortCode" character(6) NOT NULL,
CONSTRAINT "MappingPK" PRIMARY KEY ("LookupId", "ShortCode")
)
Since the ShortCode is displayed to the user as a six-character string, I don't want to add another table just to get a proper foreign key reference, but trying to create one with the current design is not allowed by PostgreSQL. How can I create a check so that the short code in the Mapping table is verified to exist in Product?
Depending on the fine print of your requirements and your version of Postgres I would suggest a TRIGGER or a NOT VALID CHECK constraint.
We have just discussed the matter in depth in this related question on dba.SE:
Disable all constraints and table checks while restoring a dump
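If you do end up on the trigger route, a rough sketch could look like this (the function and trigger names are made up; the simpler UNIQUE + REFERENCES approach below is usually preferable):
CREATE OR REPLACE FUNCTION dev.check_shortcode_exists()
RETURNS trigger AS
$$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM dev."Product" p WHERE p."ShortCode" = NEW."ShortCode") THEN
        RAISE EXCEPTION 'ShortCode % not found in Product', NEW."ShortCode";
    END IF;
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER mapping_shortcode_check
BEFORE INSERT OR UPDATE OF "ShortCode" ON dev."Mapping"
FOR EACH ROW EXECUTE PROCEDURE dev.check_shortcode_exists();
Unlike a real foreign key, this does nothing when the matching Product row is later deleted or its ShortCode changes, which is one more reason to prefer the constraint-based solution below.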
If I understand you correctly, you need a UNIQUE constraint on "Product"."ShortCode". Surely it should be declared NOT NULL, too.
CREATE TABLE dev."Product"
(
"Id" serial NOT NULL,
"ShortCode" character(6) NOT NULL UNIQUE,
CONSTRAINT "ProductPK" PRIMARY KEY ("Id")
);
CREATE TABLE dev."Mapping"
(
"LookupId" integer NOT NULL,
"ShortCode" character(6) NOT NULL REFERENCES dev."Product" ("ShortCode"),
CONSTRAINT "MappingPK" PRIMARY KEY ("LookupId", "ShortCode")
);
Your original "Product" table will allow this INSERT statement to succeed, but it shouldn't.
insert into dev."Product" ("ShortCode") values
(NULL), (NULL), ('ABC'), ('ABC'), ('ABC');
Data like that is just about useless.
select * from dev."Product";

 Id | ShortCode
----+-----------
  1 |
  2 |
  3 | ABC
  4 | ABC
  5 | ABC

Shared Primary key versus Foreign Key

I have a laboratory analysis database and I'm working on the best data layout. I've seen some suggestions, based on similar requirements, for using a "shared primary key", but I don't see the advantages over just foreign keys. I'm using PostgreSQL; the tables are listed below.
Sample
___________
sample_id (PK)
sample_type (where in the process the sample came from)
sample_timestamp (when was the sample taken)
Analysis
___________
analysis_id (PK)
sample_id (FK references sample)
method (what analytical method was performed)
analysis_timestamp (when did the analysis take place)
analysis_notes
gc
____________
analysis_id (shared Primary key)
gc_concentration_meoh (methanol concentration)
gc_concentration_benzene (benzene concentration)
spectrophotometer
_____________
analysis_id
spectro_nm (wavelength used)
spectro_abs (absorbance measured)
I could use this design, or I could move the fields from the analysis table into both the gc and spectrophotometer tables and just use foreign keys between the sample, gc, and spectrophotometer tables. The only advantage I see in the shared-key design is for cases where I just want information on how many or what types of analyses were performed, without having to join in the actual results. However, the additional rules needed to ensure referential integrity between the shared primary keys, plus the extra joins and triggers to manage (on delete cascade, etc.), appear to make it more of a headache than those minor advantages are worth. I'm not a DBA but a scientist, so please let me know what I'm missing.
UPDATE:
A shared primary key (as I understand it) is like a one-to-one foreign key with the additional constraint that each value in the parent table (analysis) must appear in one of the child tables, and only once.
I've seen some suggestions based on similar requirements for using a
"Shared Primary Key", but I don't see the advantages over just foreign
keys.
If I've understood your comments above, the advantage is that only the first design implements the requirement that each row in the parent matches a row in one child, and only one child. Here's one way to do that.
create table analytical_methods (
method_id integer primary key,
method_name varchar(25) not null unique
);
insert into analytical_methods values
(1, 'gc'),(2, 'spec'), (3, 'Atomic Absorption'), (4, 'pH probe');
create table analysis (
analysis_id integer primary key,
sample_id integer not null, --references samples, not shown
method_id integer not null references analytical_methods (method_id),
analysis_timestamp timestamp not null,
analysis_notes varchar(255),
-- This unique constraint lets the pair of columns be the target of
-- foreign key constraints from other tables.
unique (analysis_id, method_id)
);
-- The combination of a) the default value and the check() constraint on
-- method_id, and b) the foreign key constraint on the paired columns
-- analysis_id and method_id guarantee that rows in this table match a
-- gc row in the analysis table.
--
-- If all the child tables have similar constraints, a row in analysis
-- can match a row in one and only one child table.
create table gc (
analysis_id integer primary key,
method_id integer not null
default 1
check (method_id = 1),
foreign key (analysis_id, method_id)
references analysis (analysis_id, method_id),
gc_concentration_meoh integer not null,
gc_concentration_benzene integer not null
);
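The spectrophotometer table would follow the same pattern, using method_id = 2 from the lookup rows above (a sketch; the data types for the measurement columns are guesses):
create table spectrophotometer (
analysis_id integer primary key,
method_id integer not null
default 2
check (method_id = 2),
foreign key (analysis_id, method_id)
references analysis (analysis_id, method_id),
spectro_nm integer not null,   -- wavelength used
spectro_abs numeric not null   -- absorbance measured
);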
It looks like in my case this supertype/subtype model is not the best choice. Instead, I should move the fields from the analysis table into all the child tables and make a series of simple foreign key relationships. The advantage of the supertype/subtype model comes when the primary key of the supertype is used as a foreign key in another table. Since I am not doing that, the extra layer of complexity will not add anything.