Composite FK referencing atomic PK + non-unique attribute - PostgreSQL

I am trying to create the following tables in Postgres 13.3:
CREATE TABLE IF NOT EXISTS accounts (
account_id Integer PRIMARY KEY NOT NULL
);
CREATE TABLE IF NOT EXISTS users (
user_id Integer PRIMARY KEY NOT NULL,
account_id Integer NOT NULL REFERENCES accounts(account_id) ON DELETE CASCADE
);
CREATE TABLE IF NOT EXISTS calendars (
calendar_id Integer PRIMARY KEY NOT NULL,
user_id Integer NOT NULL,
account_id Integer NOT NULL,
FOREIGN KEY (user_id, account_id) REFERENCES users(user_id, account_id) ON DELETE CASCADE
);
But I get the following error when creating the calendars table:
ERROR: there is no unique constraint matching given keys for referenced table "users"
This does not make much sense to me, since the foreign key contains user_id, which is the PK of the users table and therefore already carries a uniqueness constraint. If I add an explicit uniqueness constraint on the combination of user_id and account_id like so:
ALTER TABLE users ADD UNIQUE (user_id, account_id);
Then I am able to create the calendars table. This unique constraint seems unnecessary to me as user_id is already unique. Can someone please explain to me what I am missing here?

Postgres is so smart/dumb that it doesn't assume the designer will do redundant things.
The Postgres designers could have taken different strategies:
Detect the transitivity, and make the FK depend not only on users.user_id, but also on users.account_id -> accounts.account_id. This is doable but costly. It also involves adding multiple dependency records in the catalogs for a single FK constraint. Enforcing the constraint (on UPDATE or DELETE in either of the two referenced tables) could get very complex.
Detect the transitivity, and silently ignore the redundant column reference. This amounts to lying to the programmer, and it would also need to be represented in the catalogs.
Cascading DDL operations would get more complex, too. (Remember: DDL is already very hard w.r.t. concurrency/versioning.)
From the execution/performance point of view: enforcing the constraints currently involves "pseudo triggers" on the referenced table's indexes (except for DEFERRED constraints, which have to be handled specially).
So, IMHO, the Postgres developers made the sane choice of refusing to do stupidly complex things.
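For completeness, a minimal sketch of the two workable options (option 2 is an assumption about intent: since user_id alone is unique and each user row already determines its account, calendars may not need its own account_id column at all):
-- Option 1: make the referenced column pair explicitly unique;
-- after this, the composite FK from the question is accepted.
ALTER TABLE users ADD UNIQUE (user_id, account_id);
-- Option 2 (assumes calendars does not need its own account_id):
-- reference the PK alone and drop the redundant column.
CREATE TABLE IF NOT EXISTS calendars (
    calendar_id Integer PRIMARY KEY,
    user_id Integer NOT NULL REFERENCES users(user_id) ON DELETE CASCADE
);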

PostgreSQL declarative partition - unique constraint on partitioned table must include all partitioning columns

I'm trying to create a partitioned table which refers to itself, creating a doubly-linked list.
CREATE TABLE test2 (
id serial NOT NULL,
category integer NOT NULL,
time timestamp(6) NOT NULL,
prev_event integer,
next_event integer
) PARTITION BY HASH (category);
Once I add a primary key, I get the following error:
alter table test2 add primary key (id);
ERROR: unique constraint on partitioned table must include all partitioning columns
DETAIL: PRIMARY KEY constraint on table "test2" lacks column "category" which is part of the partition key.
Why does the unique constraint require all partitioning columns to be included?
EDIT: Now I understand why this is needed: https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-LIMITATIONS
Once I add a PK with both columns, it works:
alter table test2 add primary key (id, category);
But then adding the FK to itself doesn't work.
alter table test2 add foreign key (prev_event) references test2 (id) on update cascade on delete cascade;
ERROR: there is no unique constraint matching given keys for referenced table "test2"
Since the PK is not just id but (id, category), I can't create a FK pointing to id alone.
Is there any way to deal with this or am I missing something?
I would like to avoid using inheritance partitioning if possible.
EDIT2: It seems this is a known problem. https://www.reddit.com/r/PostgreSQL/comments/di5mbr/postgresql_12_foreign_keys_and_partitioned_tables/f3tsoop/
It seems there is no straightforward solution: PostgreSQL simply doesn't support this as of v14. One workaround is to use triggers to enforce "foreign key" behavior. Another is to use multi-column foreign keys, as sketched below. Both are far from optimal.
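For illustration, the multi-column FK workaround could look like this (a sketch, assuming linked events always live in the same category, so the existing category column can serve as the second FK column):
-- assumes prev_event always points to a row in the same category
alter table test2 add foreign key (prev_event, category)
references test2 (id, category) on update cascade on delete cascade;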

Dropping Unique Constraint - PostgreSQL

TL;DR
I am seeking clarity on this: does a FOREIGN KEY require a UNIQUE CONSTRAINT on the other side, specifically in Postgres and, more generally, in relational database systems?
Perhaps I could test this, but I'll ask anyway: if the UNIQUE CONSTRAINT is required by the FOREIGN KEY, what happens if I don't create it? Will the database create one, or will it throw an error?
How I got there
I had earlier created a table with a column username on which I imposed a unique constraint. I then created another table with a column bearer_name having a FOREIGN KEY referencing the previous table's username column; the one which had the UNIQUE CONSTRAINT.
Now I want to drop the UNIQUE CONSTRAINT on the username column, because I have since created a UNIQUE INDEX on the same column, and intuitively I feel that they serve the same purpose - or don't they? But the database complains that the UNIQUE CONSTRAINT has dependent objects and so cannot be dropped unless I pass CASCADE to drop the dependent objects as well. It identifies the FOREIGN KEY on the bearer_name column in the second table as the dependent object.
And is it possible for the FOREIGN KEY to point to the UNIQUE INDEX instead of the UNIQUE CONSTRAINT?
I am seeking clarity on this: does a FOREIGN KEY require a UNIQUE CONSTRAINT on the other side
No, it does not require a UNIQUE CONSTRAINT specifically. It can also be a PRIMARY KEY or a UNIQUE INDEX.
Perhaps I could test this, but I'll ask anyway: if the UNIQUE CONSTRAINT is required by the FOREIGN KEY, what happens if I don't create it? Will the database create one, or will it throw an error?
CREATE TABLE tab_a(a_id INT, b_id INT);
CREATE TABLE tab_b(b_id INT);
ALTER TABLE tab_a ADD CONSTRAINT fk_tab_a_tab_b FOREIGN KEY (b_id)
REFERENCES tab_b(b_id);
ERROR: there is no unique constraint matching given keys
for referenced table "tab_b"
And is it possible for the FOREIGN KEY to point to the UNIQUE INDEX instead of the UNIQUE CONSTRAINT?
Yes, it is possible.
CREATE UNIQUE INDEX tab_b_i ON tab_b(b_id);
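With that unique index in place, the FK definition from before is accepted (a quick sketch):
ALTER TABLE tab_a ADD CONSTRAINT fk_tab_a_tab_b FOREIGN KEY (b_id)
REFERENCES tab_b(b_id); -- succeeds: backed by the unique index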

PostgreSQL table inheritance and constraints

I have a DDL for some tables similar to the following:
CREATE TABLE devices
(
device_uuid uuid NOT NULL,
manufacturer_uuid uuid NOT NULL,
...
CONSTRAINT device_manufacturer_uuid_fkey FOREIGN KEY (manufacturer_uuid)
REFERENCES manufacturer (manufacturer_uuid) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT devices_device_uuid_key UNIQUE (device_uuid),
CONSTRAINT devices_pkey PRIMARY KEY (device_uuid)
);
I might have different types of devices, e.g. "peripherals", "power", "graphics", etc.
There are two approaches I can take:
Add a simple type column to this table
Pros: I have a pretty simple database structure.
Cons: There could potentially be hundreds of thousands of entries, which could lead to performance problems. The list of "devices" would be searched fairly regularly, and scanning all those entries to find the ones of type "peripherals" every time might not be great. (A sketch of this option follows the question.)
Inherit from the above table for every type.
The DDL for this would look similar to the following (I believe):
CREATE TABLE devicesperipherals
(
type device NOT NULL,
CONSTRAINT devicesperipherals_manufacturer_uuid_fkey FOREIGN KEY (manufacturer_uuid)
REFERENCES manufacturer (manufacturer_uuid) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT devicesperipherals_device_uuid_key UNIQUE (device_uuid)
)
INHERITS (devices)
WITH (
OIDS=FALSE
);
Pros: From what I know about table inheritance, this approach should have better performance.
Cons: More complex database structure, more space required for indexing.
Which is the preferred approach in such a case? Why?
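For concreteness, option 1 might look like the following sketch (the column type, default value, and index are illustrative assumptions; an index on type addresses much of the search concern, since lookups then only visit rows of the requested type):
-- illustrative column and index names
ALTER TABLE devices ADD COLUMN type varchar(32) NOT NULL DEFAULT 'generic';
CREATE INDEX devices_type_idx ON devices (type);
-- typical lookup, served via the index:
SELECT * FROM devices WHERE type = 'peripherals';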

Is it a good idea to store relations to many different tables in one field?

I have a database which has three tables
Messages - PK = MessageId
Drafts - PK = DraftId
History - FK = RelatedItemId
The History table has a single foreign key, [RelatedItemId], which maps to one of the two primary keys in Messages or Drafts.
Is there a name for this relationship?
Is it just bad design?
Is there a better way to design this relationship?
Here are the CREATE TABLE statements for this question:
CREATE TABLE [dbo].[History](
[HistoryId] [uniqueidentifier] NOT NULL,
[RelatedItemId] [uniqueidentifier] NULL,
CONSTRAINT [PK_History] PRIMARY KEY CLUSTERED ( [HistoryId] ASC )
)
CREATE TABLE [dbo].[Messages](
[MessageId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED ( [MessageId] ASC )
)
CREATE TABLE [dbo].[Drafts](
[DraftId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Drafts] PRIMARY KEY CLUSTERED ( [DraftId] ASC )
)
In short, the solution you have used is called:
Polymorphic Association
Objective: reference multiple parents.
Resulting anti-pattern: a dual-purpose foreign key, which violates first normal form and loses referential integrity.
Solution: simplify the relationship.
By the way, creating a common super-table would also help you here.
Is there a name for this relationship?
There is no standard name that I'm aware of, but I've heard people using the term "generic FKs" or even "inner-platform effect".
Is it just bad design?
Yes.
The reason: it prevents you from declaring a FOREIGN KEY, and therefore prevents the DBMS from enforcing referential integrity directly. You must instead enforce it through imperative code, which is surprisingly difficult.
Is there a better way to design this relationship?
Yes.
Create a separate nullable FOREIGN KEY column for each referenced table, and make sure exactly one of them is non-NULL through a CHECK constraint, as sketched below.
Alternatively, take a look at inheritance.
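A minimal sketch of that separate-FK design, reusing the tables from the question (the two new column names are made up for illustration):
CREATE TABLE [dbo].[History](
[HistoryId] [uniqueidentifier] NOT NULL,
-- MessageId / DraftId are illustrative names for the two separate FKs
[MessageId] [uniqueidentifier] NULL REFERENCES [dbo].[Messages] ([MessageId]),
[DraftId] [uniqueidentifier] NULL REFERENCES [dbo].[Drafts] ([DraftId]),
CONSTRAINT [PK_History] PRIMARY KEY CLUSTERED ( [HistoryId] ASC ),
-- exactly one parent reference must be set
CONSTRAINT [CK_History_OneParent] CHECK (
([MessageId] IS NOT NULL AND [DraftId] IS NULL) OR
([MessageId] IS NULL AND [DraftId] IS NOT NULL))
)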
Best practice I have found is to create a function that returns whether the passed-in value exists in either of your Messages and Drafts PK columns. You can then add a CHECK constraint on the History column that calls this function, so an insert only succeeds if the value exists.
Example code:
CREATE FUNCTION dbo.is_related_there (
@value uniqueidentifier )
RETURNS TINYINT
AS
BEGIN
-- returns 1 if the value exists as a PK in either Drafts or Messages
IF EXISTS (SELECT 1 FROM Drafts WHERE DraftId = @value)
OR EXISTS (SELECT 1 FROM Messages WHERE MessageId = @value)
RETURN 1;
RETURN 0;
END;
GO
ALTER TABLE History ADD CONSTRAINT
CK_HistoryExists CHECK (dbo.is_related_there (RelatedItemId) = 1);
Hope that runs and helps!

Do I need a primary key for my table, which has a UNIQUE (composite 4-columns), one of which can be NULL?

I have the following table (PostgreSQL 8.3) which stores prices of some products. The prices are synchronised with another database; basically, most of the fields below (apart from one) are not updated by our client, but are instead dropped and refreshed every once in a while to sync with another stock database:
CREATE TABLE product_pricebands (
template_sku varchar(20) NOT NULL,
colourid integer REFERENCES colour (colourid) ON DELETE CASCADE,
currencyid integer NOT NULL REFERENCES currency (currencyid) ON DELETE CASCADE,
siteid integer NOT NULL REFERENCES site (siteid) ON DELETE CASCADE,
master_price numeric(10,2),
my_custom_field boolean,
UNIQUE (template_sku, siteid, currencyid, colourid)
);
On the synchronisation, I basically DELETE most of the data above except for data WHERE my_custom_field is TRUE (if it's TRUE, it means the client updated this field via their CMS and therefore this record should not be dropped). I then INSERT 100s to 1000s of rows into the table, and UPDATE where the INSERT fails (i.e. where the combination of (template_sku, siteid, currencyid, colourid) already exists).
My question is - what best practice should be applied here to create a primary key? Is a primary key even needed? I wanted to make the primary key = (template_sku, siteid, currencyid, colourid) - but the colourid field can be NULL, and using it in a composite primary key is not possible.
From what I read on other forum posts, I think I have done the above correctly, and just need to clarify:
1) Should I use a "serial" primary key just in case I ever need one? At the moment I don't, and don't think I ever will, because the important data in the table is the price and my custom field, only identified by the (template_sku, siteid, currencyid, colourid) combination.
2) Since (template_sku, siteid, currencyid, colourid) is the combination that I will use to query a product's price, should I add any further indexing to my columns, such as the "template_sku" which is a varchar? Or is the UNIQUE constraint a good index already for my SELECTs?
Should I use a "serial" primary key just in case I ever need one?
You can easily add a serial column later if you need one:
ALTER TABLE product_pricebands ADD COLUMN id serial;
The column will be filled with unique values automatically. You can even make it the primary key in the same statement (if no primary key is defined yet):
ALTER TABLE product_pricebands ADD COLUMN id serial PRIMARY KEY;
If you reference the table from other tables, I would advise using such a surrogate primary key, because it is rather unwieldy to link by four columns; it is also slower in SELECTs with JOINs.
Either way, you should define a primary key. The UNIQUE index including a nullable column is not a full replacement. It allows duplicates for combinations including a NULL value, because two NULL values are never considered the same. This can lead to trouble.
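To illustrate that loophole (a sketch; the literal values are made up, and matching currency and site rows are assumed to exist):
-- made-up values; currencyid 1 and siteid 1 assumed to exist
INSERT INTO product_pricebands (template_sku, currencyid, siteid)
VALUES ('SKU-1', 1, 1);
INSERT INTO product_pricebands (template_sku, currencyid, siteid)
VALUES ('SKU-1', 1, 1); -- no unique violation: colourid IS NULL in both rows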
As
the colourid field can be NULL
you might want to create two unique indexes. The combination (template_sku, siteid, currencyid, colourid) cannot be a PRIMARY KEY, because of the nullable colourid, but you can create a UNIQUE constraint like you already have (implementing an index automatically):
ALTER TABLE product_pricebands ADD CONSTRAINT product_pricebands_uni_idx
UNIQUE (template_sku, siteid, currencyid, colourid);
This index perfectly covers the queries you mention in 2).
Create a partial unique index in addition if you want to avoid "duplicates" with (colourid IS NULL):
CREATE UNIQUE INDEX product_pricebands_uni_null_idx
ON product_pricebands (template_sku, siteid, currencyid)
WHERE colourid IS NULL;
That covers all bases. I wrote more about this technique in a related answer on dba.SE.
The simple alternative to the above is to make colourid NOT NULL and create a primary key instead of the above product_pricebands_uni_idx.
Also, as you
basically DELETE most of the data
for your refill operation, it will be faster to drop the indexes that are not needed during the refill and recreate them afterwards. It is faster by an order of magnitude to build an index from scratch than to add all rows incrementally.
How do you know which indexes are used (needed)?
Test your queries with EXPLAIN ANALYZE.
Or use the built-in statistics. pgAdmin displays statistics in a separate tab for the selected object.
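A minimal sketch of that pattern, using the partial index from above (assumption: it is not needed while the refill runs):
DROP INDEX product_pricebands_uni_null_idx;
-- ... run the DELETE / INSERT / UPDATE refill here ...
CREATE UNIQUE INDEX product_pricebands_uni_null_idx
ON product_pricebands (template_sku, siteid, currencyid)
WHERE colourid IS NULL;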
It may also be faster to select the few rows with my_custom_field = TRUE into a temporary table, TRUNCATE the base table and re-INSERT the survivors. Depends on whether you have foreign keys defined. Would look like this:
CREATE TEMP TABLE pr_tmp AS
SELECT * FROM product_pricebands WHERE my_custom_field;
TRUNCATE product_pricebands;
INSERT INTO product_pricebands SELECT * FROM pr_tmp;
This avoids a lot of vacuuming.