PostgreSQL sharding with the Citus extension not working

I am using PostgreSQL with the Citus extension for sharding and am unable to shard tables like the one below.
The table has a primary key and two unique constraints. I am trying to shard on the primary key column, pid.
Note: I am not allowed to change the table structure. These tables are created by a tool.
CREATE TABLE person
(
pid bigint NOT NULL,
name character varying(100),
address_pid bigint NOT NULL,
address_type character varying(100),
CONSTRAINT id_pkey PRIMARY KEY (pid),
CONSTRAINT addr_id UNIQUE (address_pid),
CONSTRAINT addr_type_id UNIQUE (address_type, address_pid)
);
This is my sharding query:
select create_distributed_table('person', 'pid');
The error it throws is:
Error: Distributed relations cannot have UNIQUE, EXCLUDE, or PRIMARY KEY constraints that do not include the partition column
Can anyone help me shard this kind of table?
@CraigKerstiens An addition to this question:
How should sharding be handled when a table has multiple foreign keys, like this one?
CREATE TABLE table
(
pid bigint NOT NULL,
search_order integer NOT NULL,
resource_pid bigint NOT NULL,
search_pid bigint NOT NULL,
CONSTRAINT hfj_search_result_pkey PRIMARY KEY (pid),
CONSTRAINT idx_searchres_order UNIQUE (search_pid, search_order),
CONSTRAINT fk_searchres_res FOREIGN KEY (resource_pid)
REFERENCES public.table1 (res_id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION,
CONSTRAINT fk_searchres_search FOREIGN KEY (search_pid)
REFERENCES public.table2 (pid) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
)
Assume that table1 and table2 are already sharded.

In Citus you currently cannot have a unique constraint that doesn't include the column you are partitioning on. In this case, you could enforce that addresses are unique per person id, but not globally unique. To do that, you could define the table as:
CREATE TABLE person
(
pid bigint NOT NULL,
name character varying(100),
address_pid bigint NOT NULL,
address_type character varying(100),
CONSTRAINT id_pkey PRIMARY KEY (pid),
CONSTRAINT addr_id UNIQUE (pid, address_pid),
CONSTRAINT addr_type_id UNIQUE (pid, address_type, address_pid)
);
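If the tool has already created the table, the same change can be sketched with ALTER TABLE instead of re-creating it. This is only a sketch under assumptions: it reuses the constraint names from the question and assumes the tool tolerates the redefined constraints. The commented-out alternative at the end assumes the table is small enough to be replicated to every node as a Citus reference table, in which case the original constraints can stay untouched.

```sql
-- Sketch: widen the existing unique constraints so that each one
-- includes the distribution column (pid), then distribute the table.
BEGIN;

ALTER TABLE person DROP CONSTRAINT addr_id;
ALTER TABLE person DROP CONSTRAINT addr_type_id;

-- Uniqueness is now enforced per pid, not globally.
ALTER TABLE person
    ADD CONSTRAINT addr_id UNIQUE (pid, address_pid);
ALTER TABLE person
    ADD CONSTRAINT addr_type_id UNIQUE (pid, address_type, address_pid);

COMMIT;

SELECT create_distributed_table('person', 'pid');

-- Alternative (assumption: the table is small): keep the table
-- structure exactly as the tool created it and replicate it instead
-- of sharding it.
-- SELECT create_reference_table('person');
```

Note that because pid is already the primary key, the widened constraints no longer restrict address_pid on their own; they only satisfy the Citus rule.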

Related

Trying to make a UNIQUE KEY become a FOREIGN KEY, but it says error

I created this table. I want user_id to be filled in without someone having to type it. Can someone help me? When I tried to create the table, it showed an error.
Why is the fk_id being compared to cpf?
CREATE TABLE code.went_to
(
user_id INTEGER,
cpf VARCHAR(11) NOT NULL,
cep VARCHAR(8) NOT NULL,
date DATE,
contact_id INTEGER,
CONSTRAINT fk_id
FOREIGN KEY (user_id) REFERENCES code.user,
CONSTRAINT fk_cpf_user
FOREIGN KEY (cpf) REFERENCES code.user,
CONSTRAINT fk_cep_place
FOREIGN KEY (cep) REFERENCES code.place,
CONSTRAINT fk_tipo_id
FOREIGN KEY (contact_id) REFERENCES code.contact,
EXCLUDE USING gist (cpf WITH =, cep WITH =, daterange(date, (date + interval '4 months')::date) WITH &&),
EXCLUDE USING gist (cep WITH =, contact_id WITH =, daterange(date, (date + interval '1 months')::date) WITH &&)
);
The table that user_id comes from:
CREATE TABLE code.user
(
user_id SERIAL NOT NULL,
cpf VARCHAR(11) NOT NULL,
name CHAR(75) NOT NULL,
nick_name VARCHAR(15),
CONSTRAINT pk_cpf PRIMARY KEY(cpf),
CONSTRAINT un_id UNIQUE (user_id),
CONSTRAINT un_nick_name UNIQUE (nick_name)
);
Sorry if this is worded strangely; my English is not the best. Essentially, the question is:
How can I import data (user_id) from one table (user) into another table (went_to) based on the primary key (cpf)?
If you don't specify a target column, the REFERENCES clause assumes the primary key of the target table. No matching on column names takes place, which is why user_id is being compared to cpf.
You need to include the column of the unique key in your foreign key definition:
CONSTRAINT fk_id FOREIGN KEY (user_id) REFERENCES code."user"(user_id)
To make things less confusing, I would also do that for the foreign key that references the primary key:
CONSTRAINT fk_cpf_user FOREIGN KEY (cpf) REFERENCES code."user"(cpf),
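The default-to-primary-key behavior can be reproduced in isolation. The sketch below uses hypothetical tables (parent, child_implicit, child_explicit) that are not part of the question; it only illustrates why an integer column ends up being compared with a varchar primary key.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE parent
(
    id   integer     NOT NULL UNIQUE,   -- unique key, not the primary key
    code varchar(11) PRIMARY KEY
);

-- No target column given: PostgreSQL resolves the reference to the
-- primary key (code), so this statement fails because integer and
-- varchar are incompatible types.
CREATE TABLE child_implicit
(
    parent_id integer REFERENCES parent
);

-- Target column given explicitly: the reference resolves to the
-- unique key (id) and the statement succeeds.
CREATE TABLE child_explicit
(
    parent_id integer REFERENCES parent (id)
);
```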

Do I need to declare a unique constraint for `bigint generated always as identity`?

I'm creating a multi-tenant application and am prepending the tenant_id to all tables that my tenants will access. All of the tables will also have an incrementing surrogate key. Will I need to declare a unique constraint of the surrogate key or is that redundant?
CREATE TABLE tenant (
primary key (tenant_id),
tenant_id bigint generated always as identity
);
CREATE TABLE person (
primary key (tenant_id, person_id),
person_id bigint generated always as identity,
tenant_id bigint not null,
unique (person_id), -- Do I need this?
foreign key (tenant_id) references tenant
);
The primary key of a table should be a minimal set of columns that uniquely identify a table row. So that should be person_id, as it was specifically created for that purpose.
Add another (non-unique) index on tenant_id or (tenant_id, person_id) if you need to speed up searches based on tenant_id.
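A minimal sketch of that layout follows, assuming the same two tables as in the question. The index name person_tenant_id_idx is made up for the example.

```sql
CREATE TABLE tenant (
    tenant_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
);

CREATE TABLE person (
    -- The surrogate key alone identifies a row, so it is the primary key.
    person_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id bigint NOT NULL REFERENCES tenant
);

-- Non-unique index to speed up lookups by tenant.
CREATE INDEX person_tenant_id_idx ON person (tenant_id);
```

With person_id as the primary key, the separate unique (person_id) constraint from the question is no longer needed.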

How to use timescale hypertables with foreign keys and keep a one-to-many relation?

I am trying to create a database with minimum redundancy in mind. We would like to use TimescaleDB hypertables (I run PostgreSQL 12 and timescaledb 1.7.4). The PostgreSQL code to create the tables is as follows; the diagram is available at https://dbdiagram.io/d/5f992f0e3a78976d7b797ca2.
CREATE TABLE "datapoints" (
"id" bigserial UNIQUE NOT NULL,
"tstz" timestamptz NOT NULL,
"entity_id" bigint NOT NULL,
"value" real NOT NULL,
PRIMARY KEY ("tstz", "entity_id")
);
CREATE TABLE "datapoint_quality" (
"tstz" timestamptz NOT NULL,
"datapoint_id" bigint NOT NULL,
"flag_id" bigint NOT NULL,
PRIMARY KEY ("tstz", "datapoint_id", "flag_id")
);
CREATE TABLE "quality_flags" (
"id" bigserial PRIMARY KEY,
"value" text
);
CREATE TABLE "sensor_types" (
"id" bigserial PRIMARY KEY,
"name" text UNIQUE NOT NULL
);
CREATE TABLE "sensors" (
"tstz" timestamptz NOT NULL DEFAULT (now()),
"id" bigserial UNIQUE NOT NULL,
"name" text NOT NULL,
"parent" bigint NOT NULL,
"type" bigint NOT NULL,
PRIMARY KEY ("tstz", "id")
);
CREATE TABLE "datapoint_annotation" (
"tstz" timestamptz NOT NULL,
"datapoint_id" bigint NOT NULL,
"annotation_id" bigint NOT NULL,
PRIMARY KEY ("tstz", "datapoint_id", "annotation_id")
);
CREATE TABLE "annotations" (
"id" bigserial PRIMARY KEY NOT NULL,
"value" text NOT NULL
);
ALTER TABLE "datapoints" ADD FOREIGN KEY ("entity_id") REFERENCES "sensors" ("id");
ALTER TABLE "datapoint_quality" ADD FOREIGN KEY ("datapoint_id") REFERENCES "datapoints" ("id");
ALTER TABLE "datapoint_quality" ADD FOREIGN KEY ("flag_id") REFERENCES "quality_flags" ("id");
ALTER TABLE "sensors" ADD FOREIGN KEY ("parent") REFERENCES "sensors" ("id");
ALTER TABLE "sensors" ADD FOREIGN KEY ("type") REFERENCES "sensor_types" ("id");
ALTER TABLE "datapoint_annotation" ADD FOREIGN KEY ("datapoint_id") REFERENCES "datapoints" ("id");
ALTER TABLE "datapoint_annotation" ADD FOREIGN KEY ("annotation_id") REFERENCES "annotations" ("id");
CREATE UNIQUE INDEX ON "quality_flags" ("value");
CREATE UNIQUE INDEX ON "annotations" ("value");
So far so good - next I want to create the hypertables, which I do as:
SELECT create_hypertable('datapoint_annotation', 'tstz');
SELECT create_hypertable('datapoint_quality', 'tstz');
SELECT create_hypertable('datapoints', 'tstz');
SELECT create_hypertable('sensors', 'tstz');
This works well for the first two lines, but for the latter two I get the following error:
ERROR: cannot create a unique index without the column "tstz" (used in partitioning)
SQL state: TS103
I can include tstz in the primary key as ("id", "tstz") and use that as the foreign key, but this gives me a one-to-one relation, and for minimum redundancy I would like to have a one-to-many relation.
I am sure there should be some way to do this - so what am I missing?
I'll take the foreign key constraint from datapoint_quality to datapoints as an example.
To make that work with a partitioned table, you need a unique constraint on datapoints. As the error message tells you, such a constraint must contain the partitioning key. So you end up with
ALTER TABLE datapoints ADD UNIQUE (id, tstz);
To reference that unique constraint from datapoint_quality, you need to have the timestamp there too:
ALTER TABLE datapoint_quality ADD datapoints_tstz timestamp with time zone;
You have to fill it with the appropriate values:
UPDATE datapoint_quality AS dq
SET datapoints_tstz = d.tstz
FROM datapoints AS d
WHERE d.id = dq.datapoint_id;
Then set it NOT NULL:
ALTER TABLE datapoint_quality ALTER datapoints_tstz SET NOT NULL;
Now you can define your foreign key:
ALTER TABLE datapoint_quality
ADD FOREIGN KEY (datapoint_id, datapoints_tstz)
REFERENCES datapoints (id, tstz) MATCH FULL;
There is no other way to have foreign key constraints with partitioned tables.
I tested the solution proposed by Laurenz both in a database of my own and in a replica of the original database from this question, using PostgreSQL 12.6 and timescaledb 1.7.5.
Everything worked up to the point of defining the foreign key for the datapoint_quality table:
ALTER TABLE datapoint_quality
ADD FOREIGN KEY (datapoint_id, datapoints_tstz)
REFERENCES datapoints (id, tstz) MATCH FULL;
After several attempts to define a foreign key to a hypertable (including the one above), both databases report the following error:
ERROR: foreign keys to hypertables are not supported
SQL state: 0A000
According to https://docs.timescale.com/timescaledb/latest/overview/limitations/##distributed-hypertable-limitations, this error is one of the documented hypertable limitations:
Foreign key constraints referencing a hypertable are not supported.
Given this, does anyone know a solution at the database level for establishing relationships (1..* or similar) between a regular table and tables that are backed by hypertables?
Another option might be to handle this at the REST API level (e.g. Django or Flask), since I have not found many solutions in TimescaleDB or PostgreSQL itself.
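One database-level workaround that is sometimes used when a real foreign key is unavailable is a trigger that checks for the referenced row. The sketch below is an assumption, not something taken from the TimescaleDB documentation: the function and trigger names are made up, it uses the table and column names from the question, and it assumes row-level BEFORE triggers work on the referencing table in your TimescaleDB version. It only guards inserts and updates on datapoint_quality; it does not stop rows in datapoints from being deleted later, and it is not safe against concurrent deletes the way a real foreign key is.

```sql
-- Hypothetical helper: reject rows whose datapoint_id has no match
-- in datapoints.
CREATE FUNCTION check_datapoint_exists() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    IF NOT EXISTS (SELECT 1
                   FROM datapoints AS d
                   WHERE d.id = NEW.datapoint_id) THEN
        RAISE EXCEPTION 'datapoint % does not exist', NEW.datapoint_id
            USING ERRCODE = 'foreign_key_violation';
    END IF;
    RETURN NEW;
END;
$$;

-- Run the check for every inserted row and whenever datapoint_id changes.
CREATE TRIGGER datapoint_quality_check_datapoint
    BEFORE INSERT OR UPDATE OF datapoint_id ON datapoint_quality
    FOR EACH ROW
    EXECUTE FUNCTION check_datapoint_exists();
```

Because the lookup filters on id alone, it keeps the one-to-many relation the question asks for, but it cannot use the time column to limit which chunks are searched, so an index on datapoints (id) matters for performance.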

Postgres: foreign key to foreign table

I have a foreign table, for example:
CREATE FOREIGN TABLE film (
id varchar(40) NOT NULL,
title varchar(40) NOT NULL,
did integer NOT NULL,
date_prod date,
kind varchar(10),
len interval hour to minute
)
SERVER film_server;
with id as the primary key for that table (set in the remote database). I would like to have a local table reference the foreign table, and set a foreign key constraint on the local table -- for example:
CREATE TABLE actor (
id varchar(40) NOT NULL,
name varchar(40) NOT NULL,
film_id varchar(40) NOT NULL
);
ALTER TABLE actor ADD CONSTRAINT actor_film_fkey FOREIGN KEY (film_id)
REFERENCES film(id);
However, when I try to add the foreign key constraint, I get the error:
ERROR: referenced relation "film" is not a table
Is it possible to add a foreign key constraint to a foreign table?
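No. A foreign key must reference a primary key or unique constraint, which is backed by an index, and it is not possible to create an index on a foreign table: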
CREATE INDEX idx_film ON film (id);
This is the error:
ERROR: cannot create index on foreign table

Migrating from Postgresql to Postgres-XL: distributed tables design

I need to scale out our application DB due to the amount of data. It's on PostgreSQL 9.3. So, I've found PostgreSQL-XL and it looks awesome, but I'm having a hard time trying to wrap my head around the limitations for distributed tables. To distribute them by replication (where the whole table is replicated in every datanode) is quite OK, but let's say I have two big related tables that need to be "sharded" along the datanodes:
CREATE TABLE foos
(
id bigserial NOT NULL,
project_id integer NOT NULL,
template_id integer NOT NULL,
batch_id integer,
dataset_id integer NOT NULL,
name text NOT NULL,
CONSTRAINT pk_foos PRIMARY KEY (id),
CONSTRAINT fk_foos_batch_id FOREIGN KEY (batch_id)
REFERENCES batches (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_foos_dataset_id FOREIGN KEY (dataset_id)
REFERENCES datasets (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_foos_project_id FOREIGN KEY (project_id)
REFERENCES projects (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_foos_template_id FOREIGN KEY (template_id)
REFERENCES templates (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT uc_foos UNIQUE (project_id, name)
);
CREATE TABLE foo_childs
(
id bigserial NOT NULL,
foo_id bigint NOT NULL,
template_id integer NOT NULL,
batch_id integer,
ffdata hstore,
CONSTRAINT pk_ff_foos PRIMARY KEY (id),
CONSTRAINT fk_fffoos_batch_id FOREIGN KEY (batch_id)
REFERENCES batches (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_fffoos_foo_id FOREIGN KEY (foo_id)
REFERENCES foos (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_fffoos_template_id FOREIGN KEY (template_id)
REFERENCES templates (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
);
Now Postgres-XL documentation states that:
"(...) in distributed tables, UNIQUE constraints must include the
distribution column of the table"
"(...) the distribution column must be included in PRIMARY KEY"
"(...) column with REFERENCES (FK) should be the distribution column.
(...) PRIMARY KEY must be the distribution column as well."
Their examples are oversimplified and scarce, so can someone please write the DDL for the two tables above for Postgres-XL using DISTRIBUTE BY HASH()?
Or maybe suggest other ways to scale out?
CREATE TABLE foos
( ... ) DISTRIBUTE BY HASH(id);
CREATE TABLE foo_childs
( ... ) DISTRIBUTE BY HASH(foo_id);
Now any join on foos.id = foo_childs.foo_id can be pushed down and done locally on each datanode.