I have a schema like this:
create table thing
(
section uuid not null,
thing_id uuid not null,
alias_of uuid default null,
constraint alias_ref foreign key (section, alias_of)
references thing (section, thing_id),
primary key (section, thing_id)
) partition by list (section);
create table dependent_thing
(
section uuid not null,
dependent_thing_id uuid not null,
depends_on_thing uuid not null,
alias_of uuid default null,
constraint thing_ref foreign key (section, depends_on_thing)
references thing (section, thing_id),
constraint alias_ref foreign key (section, alias_of)
references dependent_thing (section, dependent_thing_id),
primary key (section, dependent_thing_id)
) partition by list (section);
So there are dependent_things which may reference things, and both things and dependent_things may reference themselves.
These tables are partitioned and as can be seen from the foreign key definitions they only ever reference things within their same partition.
However if I `truncate _thing_partitition_63df23f7b60d3345824bfe726cade143 cascade' it will take down all data from all sections from all partitions.
I know that if I were to remove foreign keys, detach the partitions, truncate their contents – I'd be fine. However I am wondering whether there is a smarter way to do this? Theoretically postgres should be able to prove that I am just pruning a partition. If I were to truncate _thing_partition_123 _dependent_thing_partition_123 it should remove all data for section 123, not for any other section; as the foreign keys are limited to the sections by which we partition.
If you have a foreign key pointing to a partitioned table, you cannot truncate, drop or detach a partition. But you can do all of these things if you have foreign key constraints between the individual partitions...
Related
I'm trying to run create_distributed_table for tables which i need to shard and almost all of the tables have self relation ( parent child )
but when I run SELECT create_distributed_table('table-name','id');
it throws error cannot create foreign key constraint
simple steps to reproduce
CREATE TABLE TEST (
ID TEXT NOT NULL,
NAME CHARACTER VARYING(255) NOT NULL,
PARENT_ID TEXT
);
ALTER TABLE TEST ADD CONSTRAINT TEST_PK PRIMARY KEY (ID);
ALTER TABLE TEST ADD CONSTRAINT TEST_PARENT_FK FOREIGN KEY (PARENT_ID) REFERENCES TEST (ID);
ERROR
citus=> SELECT create_distributed_table('test','id');
ERROR: cannot create foreign key constraint
DETAIL: Foreign keys are supported in two cases, either in between two colocated tables including partition column in the same ordinal in the both tables or from distributed to reference tables
For the time being, it is not possible to shard a table on PostgreSQL without dropping the self referencing foreign key constraints, or altering them to include a separate and new distribution column.
Citus places records into shards based on the hash values of the distribution column values. It is most likely the case that the hashes of parent and child id values are different and hence the records should be stored in different shards, and possibly on different worker nodes. PostgreSQL does not have a mechanism to create foreign key constraints that reference records on different PostgreSQL clusters.
Consider adding a new column tenant_id and adding this column to the primary key and foreign key constraints.
CREATE TABLE TEST (
tenant_id INT NOT NULL,
id TEXT NOT NULL,
name CHARACTER VARYING(255) NOT NULL,
parent_id TEXT NOT NULL,
FOREIGN KEY (tenant_id, parent_id) REFERENCES test(tenant_id, id),
PRIMARY KEY (tenant_id, id)
);
SELECT create_distributed_table('test','tenant_id');
Note that parent and child should always be in the same tenant for this to work.
I've got an existing table of dogs which I would like to partition by list using the colour column:
CREATE TABLE dogs (
id int PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
colour text,
name text
)
;
Because it's not possible to partition an existing table, I'm going to make a new empty partitioned table then copy the data across.
CREATE TABLE
trade_capture._customer_invoice
(
LIKE
trade_capture.customer_invoice
INCLUDING ALL
)
PARTITION BY LIST (bill_month)
;
This reports:
ERROR: insufficient columns in PRIMARY KEY constraint definition
DETAIL: PRIMARY KEY constraint on table "_dogs" lacks column "colour" which is part of the partition key.
I know I can ignore the primary key like this, but it seems bad to have a table with no primary key!
CREATE TABLE
_dogs
(
LIKE
dogs
INCLUDING ALL
EXCLUDING INDEXES
)
PARTITION BY LIST (colour)
;
What's the best way to proceed so I have a partitioned table which still has a primary key?
Add the partitioning key to the primary key:
ALTER TABLE trade_capture._customer_invoice
ADD PRIMARY KEY (id, bill_month);
There is no other option to have a unique constraint on a partitioned table.
I need to scale out our application DB due to the amount of data. It's on PostgreSQL 9.3. So, I've found PostgreSQL-XL and it looks awesome, but I'm having a hard time trying to wrap my head around the limitations for distributed tables. To distribute them by replication (where the whole table is replicated in every datanode) is quite OK, but let's say I have two big related tables that need to be "sharded" along the datanodes:
CREATE TABLE foos
(
id bigserial NOT NULL,
project_id integer NOT NULL,
template_id integer NOT NULL,
batch_id integer,
dataset_id integer NOT NULL,
name text NOT NULL,
CONSTRAINT pk_foos PRIMARY KEY (id),
CONSTRAINT fk_foos_batch_id FOREIGN KEY (batch_id)
REFERENCES batches (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_foos_dataset_id FOREIGN KEY (dataset_id)
REFERENCES datasets (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_foos_project_id FOREIGN KEY (project_id)
REFERENCES projects (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_foos_template_id FOREIGN KEY (template_id)
REFERENCES templates (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT uc_foos UNIQUE (project_id, name)
);
CREATE TABLE foo_childs
(
id bigserial NOT NULL,
foo_id bigint NOT NULL,
template_id integer NOT NULL,
batch_id integer,
ffdata hstore,
CONSTRAINT pk_ff_foos PRIMARY KEY (id),
CONSTRAINT fk_fffoos_batch_id FOREIGN KEY (batch_id)
REFERENCES batches (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_fffoos_foo_id FOREIGN KEY (foo_id)
REFERENCES foos (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_fffoos_template_id FOREIGN KEY (template_id)
REFERENCES templates (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
);
Now Postgres-XL documentation states that:
"(...) in distributed tables, UNIQUE constraints must include the
distribution column of the table"
"(...) the distribution column must be included in PRIMARY KEY"
"(...) column with REFERENCES (FK) should be the distribution column.
(...) PRIMARY KEY must be the distribution column as well."
Their examples are over simplistic and scarse, so can someone please DDL me the two above tables for postgres-XL using DISTRIBUTE BY HASH()?
Or maybe suggest other ways to scale out?
CREATE TABLE foos
( ... ) DISTRIBUTE BY HASH(id);
CREATE TABLE foos_child
( ... ) DISTRIBUTE BY HASH(foo_id);
Now any join on foos.id = foos_child.foo_id can be pushed down and done locally.
I have a table where part of the primary key is a foreign key to another table.
create table player_result (
event_id integer not null,
pub_time timestamp not null,
name_key varchar(128) not null,
email_address varchar(128),
withdrawn boolean not null,
place integer,
realized_values hstore,
primary key (event_id, pub_time, name_key),
foreign key (email_address) references email(address),
foreign key (event_id, pub_time) references event_publish(event_id, pub_time));
Will the index generated for the primary key suffice to back the foreign key on event_id and pub_time?
Yes.
Index A,B,C
is good for:
A
A,B
A,B,C (and any other combination of the full 3 fields, if default order is unimportant)
but not good for other combinations (such as B,C, C,A etc.).
It will be useful for the referencing side, such that a DELETE or UPDATE on the referenced table can use the PRIMARY KEY of the referencing side as an index when performing checks for the existence of referencing rows or running cascade update/deletes. PostgreSQL doesn't require this index to exist at all, it just makes foreign key constraint checks faster if it is there.
It is not sufficient to serve as the unique constraint for a reference to those columns. You couldn't create a FOREIGN KEY that REFERENCES player_result(event_id, pub_time) because there is no unique constraint on those columns. That pair can appear multiple times in the table so long as each pair has a different name_key.
As #xagyg accurately notes, the unique b-tree index created by the foreign key reference is also only useful for references to columns from the left of the index. It could not be used for a lookup of pub_time, name_key or just name_key, for example.
I have made classical Employee - Employer table that has one primary key and one foreign key. Foreign key references primary so it is a self referencing table:
CREATE TABLE Worker
(
OIB NUMERIC(2,0),
Name NVARCHAR(10),
Surname NVARCHAR(20),
DateOfEmployment DATETIME2 NOT NULL,
Adress NVARCHAR(20),
City NVARCHAR(10),
SUPERIOR NUMERIC(2,0) UNIQUE,
Constraint PK_Worker PRIMARY KEY(OIB),
CONSTRAINT FK_Worker FOREIGN KEY (Superior) REFERENCES Worker(OIB)
);
Second table that should keep points for all my employees is made like this:
CREATE TABLE Point
(
OIB_Worker NUMERIC(2,0) NOT NULL,
OIB_Superior NUMERIC(2,0) NOT NULL,
Pt_To_Worker tinyint,
Pt_To_Superior tinyint,
Month_ INT NOT NULL,
Year_ INT NOT NULL,
CONSTRAINT FK_Point_Worker FOREIGN KEY (OIB_Worker) References Worker(OIB),
CONSTRAINT FK_Point_Worker_2 FOREIGN KEY (OIB_Superior) References Worker(Superior),
CONSTRAINT PK_Point PRIMARY KEY(OIB_Worker,OIB_Superior,Month_,Year_)
);
It should enable storing grades per month for every employee.
That is, I have two foreign keys, one for Worker.OIB and one for Worker.Superior. Also, I have a composite primary key made of columns Point.OIB_Worker, Point.OIB_superior, Point.Month_ and Point.Year_. Key is composite because it needs to disable entering grades more then once a month.
My question is:
How to make a foreign key from Point to Worker so that any superior can have more then one employee assigned to him?
If you look closely, my implementation works but it can have only one employee per manager.
That is because of a fact that a foreign key has to reference either the primary or the unique column from other table. And my Worker.Superior is UNIQUE, so it can have only unique values (no repetition).
I think many people will find this example interesting as it is a common problem when making a new database.
I think your FK_Point_Worker_2 should also have References Worker(OIB), and you should remove the UNIQUE constraint from Worker.Superior. That way a superior can have more than one worker assigned to him.
Think about it. You have unique constraint on SUPERIOR and you are confused as to why two employees cannot have the same SUPERIOR. That is what a unique constraint does - not allow duplicates.
A FK can only reference a unique column or columns.
A FK_Point_Worker_2 with a References Worker(OIB) does not assure OIB is a Superior.
I would add a unique constraint on Worker on (OIB, SUPERIOR)
and remove the unique constraint on SUPERIOR.
It will always be unique as OIB is unique.
Then have composite FK relationship.
This is an example of a composite FK relationship
ALTER TABLE [dbo].[wfBchFolder] WITH CHECK ADD CONSTRAINT [FK_wfBchFolder_wfBch] FOREIGN KEY([wfID], [bchID])
REFERENCES [dbo].[WFbch] ([wfID], [ID])
GO