Migrating from Postgresql to Postgres-XL: distributed tables design

Migrating from Postgresql to Postgres-XL: distributed tables design - postgresql

I need to scale out our application DB due to the amount of data. It's on PostgreSQL 9.3. So, I've found PostgreSQL-XL and it looks awesome, but I'm having a hard time trying to wrap my head around the limitations for distributed tables. To distribute them by replication (where the whole table is replicated in every datanode) is quite OK, but let's say I have two big related tables that need to be "sharded" along the datanodes:
CREATE TABLE foos
(
id bigserial NOT NULL,
project_id integer NOT NULL,
template_id integer NOT NULL,
batch_id integer,
dataset_id integer NOT NULL,
name text NOT NULL,
CONSTRAINT pk_foos PRIMARY KEY (id),
CONSTRAINT fk_foos_batch_id FOREIGN KEY (batch_id)
REFERENCES batches (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_foos_dataset_id FOREIGN KEY (dataset_id)
REFERENCES datasets (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_foos_project_id FOREIGN KEY (project_id)
REFERENCES projects (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_foos_template_id FOREIGN KEY (template_id)
REFERENCES templates (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT uc_foos UNIQUE (project_id, name)
);
CREATE TABLE foo_childs
(
id bigserial NOT NULL,
foo_id bigint NOT NULL,
template_id integer NOT NULL,
batch_id integer,
ffdata hstore,
CONSTRAINT pk_ff_foos PRIMARY KEY (id),
CONSTRAINT fk_fffoos_batch_id FOREIGN KEY (batch_id)
REFERENCES batches (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_fffoos_foo_id FOREIGN KEY (foo_id)
REFERENCES foos (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT fk_fffoos_template_id FOREIGN KEY (template_id)
REFERENCES templates (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
);
Now Postgres-XL documentation states that:
"(...) in distributed tables, UNIQUE constraints must include the
distribution column of the table"
"(...) the distribution column must be included in PRIMARY KEY"
"(...) column with REFERENCES (FK) should be the distribution column.
(...) PRIMARY KEY must be the distribution column as well."
Their examples are over simplistic and scarse, so can someone please DDL me the two above tables for postgres-XL using DISTRIBUTE BY HASH()?
Or maybe suggest other ways to scale out?

CREATE TABLE foos
( ... ) DISTRIBUTE BY HASH(id);
CREATE TABLE foos_child
( ... ) DISTRIBUTE BY HASH(foo_id);
Now any join on foos.id = foos_child.foo_id can be pushed down and done locally.

Related

there is no unique constraint matching given keys for referenced table "category"?

I'm trying to define more than one foreign key in the process table. but I am getting the error that the columns I am trying to define as foreign key are not 'unique value'.
For this, I wanted to define id and name columns as primary keys in category and subject tables. However, when I want to create the process table, I still get this error." there is no unique constraint matching given keys for referenced table "category"
I have researched and continue to do so on Stackoverflow and many more. but I couldn't figure it out with solutions or viewpoints of the issues that got the same error I was facing. Maybe there is something I'm not seeing.
first table;
CREATE TABLE category(
category_id INT GENERATED ALWAYS AS IDENTITY,
category_name VARCHAR(210),
category_description TEXT,
CONSTRAINT category_pk PRIMARY KEY(category_id,category_name)
);
second table;
CREATE TABLE subject(
subject_id INT GENERATED ALWAYS AS IDENTITY,
subject_name VARCHAR(210),
subject_description TEXT,
CONSTRAINT subject_pk PRIMARY KEY(subject_id,subject_name)
);
I tried that too but I keep getting the same error
ALTER TABLE category ADD CONSTRAINT some_constraint PRIMARY KEY(category_id,category_name);
third table;
CREATE TABLE process(
process_id INT GENERATED ALWAYS AS IDENTITY,
fk_category_id INTEGER,
fk_subject_id INTEGER,
FOREIGN KEY(fk_category_id) REFERENCES category(category_id) ON DELETE CASCADE ON UPDATE
CASCADE,
FOREIGN KEY(fk_subject_id) REFERENCES subject(subject_id) ON DELETE CASCADE ON UPDATE
CASCADE
);

In your FOREIGN KEY declaration either:
Include both the columns that make up the PRIMARY KEY on category and subject e.g. ... REFERENCES category(category_id, category_name) ...
OR
Do not refer to any column and let the FK pick up the PK automatically e.g. ... REFERENCES category ON DELETE ....
I am going to say that what you are really after is:
CREATE TABLE category(
category_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
category_name VARCHAR(210) UNIQUE,
category_description TEXT
);
CREATE TABLE subject(
subject_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
subject_name VARCHAR(210) UNIQUE,
subject_description TEXT
);
CREATE TABLE process(
process_id INT GENERATED ALWAYS AS IDENTITY,
fk_category_id INTEGER,
fk_subject_id INTEGER,
FOREIGN KEY(fk_category_id) REFERENCES category ON DELETE CASCADE ON UPDATE
CASCADE,
FOREIGN KEY(fk_subject_id) REFERENCES subject ON DELETE CASCADE ON UPDATE
CASCADE
);

An identity column isn't automatically a primary key. So your tables category and subjectdon't have any primary keys and thus can't be referenced by a foreign key.
You need to add PRIMARY KEY to the columns of the tables category and subject

Citus: How can I add self referencing table in distributed tables list

I'm trying to run create_distributed_table for tables which i need to shard and almost all of the tables have self relation ( parent child )
but when I run SELECT create_distributed_table('table-name','id');
it throws error cannot create foreign key constraint
simple steps to reproduce
CREATE TABLE TEST (
ID TEXT NOT NULL,
NAME CHARACTER VARYING(255) NOT NULL,
PARENT_ID TEXT
);
ALTER TABLE TEST ADD CONSTRAINT TEST_PK PRIMARY KEY (ID);
ALTER TABLE TEST ADD CONSTRAINT TEST_PARENT_FK FOREIGN KEY (PARENT_ID) REFERENCES TEST (ID);
ERROR
citus=> SELECT create_distributed_table('test','id');
ERROR: cannot create foreign key constraint
DETAIL: Foreign keys are supported in two cases, either in between two colocated tables including partition column in the same ordinal in the both tables or from distributed to reference tables

For the time being, it is not possible to shard a table on PostgreSQL without dropping the self referencing foreign key constraints, or altering them to include a separate and new distribution column.
Citus places records into shards based on the hash values of the distribution column values. It is most likely the case that the hashes of parent and child id values are different and hence the records should be stored in different shards, and possibly on different worker nodes. PostgreSQL does not have a mechanism to create foreign key constraints that reference records on different PostgreSQL clusters.
Consider adding a new column tenant_id and adding this column to the primary key and foreign key constraints.
CREATE TABLE TEST (
tenant_id INT NOT NULL,
id TEXT NOT NULL,
name CHARACTER VARYING(255) NOT NULL,
parent_id TEXT NOT NULL,
FOREIGN KEY (tenant_id, parent_id) REFERENCES test(tenant_id, id),
PRIMARY KEY (tenant_id, id)
);
SELECT create_distributed_table('test','tenant_id');
Note that parent and child should always be in the same tenant for this to work.

How to use timescale hypertables with foreign keys and keep a one-to-many relation?

I am trying to create a database with minimum redundancy in mind. We would like to use the timescaledb hypertables (I run postgreSQL v. 12 and timescaledb v. 1.7.4). The postgreSQL code to create the tables are as follows - you can see the dbdiagram here https://dbdiagram.io/d/5f992f0e3a78976d7b797ca2 or view the tables here Image of database
CREATE TABLE "datapoints" (
"id" bigserial UNIQUE NOT NULL,
"tstz" timestamptz NOT NULL,
"entity_id" bigint NOT NULL,
"value" real NOT NULL,
PRIMARY KEY ("tstz", "entity_id")
);
CREATE TABLE "datapoint_quality" (
"tstz" timestamptz NOT NULL,
"datapoint_id" bigint NOT NULL,
"flag_id" bigint NOT NULL,
PRIMARY KEY ("tstz", "datapoint_id", "flag_id")
);
CREATE TABLE "quality_flags" (
"id" bigserial PRIMARY KEY,
"value" text
);
CREATE TABLE "sensor_types" (
"id" bigserial PRIMARY KEY,
"name" text UNIQUE NOT NULL
);
CREATE TABLE "sensors" (
"tstz" timestamptz NOT NULL DEFAULT (now()),
"id" bigserial UNIQUE NOT NULL,
"name" text NOT NULL,
"parent" bigint NOT NULL,
"type" bigint NOT NULL,
PRIMARY KEY ("tstz", "id")
);
CREATE TABLE "datapoint_annotation" (
"tstz" timestamptz NOT NULL,
"datapoint_id" bigint NOT NULL,
"annotation_id" bigint NOT NULL,
PRIMARY KEY ("tstz", "datapoint_id", "annotation_id")
);
CREATE TABLE "annotations" (
"id" bigserial PRIMARY KEY NOT NULL,
"value" text NOT NULL
);
ALTER TABLE "datapoints" ADD FOREIGN KEY ("entity_id") REFERENCES "sensors" ("id");
ALTER TABLE "datapoint_quality" ADD FOREIGN KEY ("datapoint_id") REFERENCES "datapoints" ("id");
ALTER TABLE "datapoint_quality" ADD FOREIGN KEY ("flag_id") REFERENCES "quality_flags" ("id");
ALTER TABLE "sensors" ADD FOREIGN KEY ("parent") REFERENCES "sensors" ("id");
ALTER TABLE "sensors" ADD FOREIGN KEY ("type") REFERENCES "sensor_types" ("id");
ALTER TABLE "datapoint_annotation" ADD FOREIGN KEY ("datapoint_id") REFERENCES "datapoints" ("id");
ALTER TABLE "datapoint_annotation" ADD FOREIGN KEY ("annotation_id") REFERENCES "annotations" ("id");
CREATE UNIQUE INDEX ON "quality_flags" ("value");
CREATE UNIQUE INDEX ON "annotations" ("value");
So far so good - next I want to create the hypertables, which I do as:
SELECT create_hypertable('datapoint_annotation', 'tstz');
SELECT create_hypertable('datapoint_quality', 'tstz');
SELECT create_hypertable('datapoints', 'tstz');
SELECT create_hypertable('sensors', 'tstz');
This works well for the first two lines, but for the latter two I get the following error:
ERROR: cannot create a unique index without the column "tstz" (used in partitioning)
SQL state: TS103
I can include the tstz in the primary key as ("id", "tstz") and use that as foreign key, but this gives me a one-to-one relation, and for minimum redundancy I would like to have a one-to-many relation.
I am sure there should be some way to do this - so what am I missing?

I'll take the foreign key constraint from datapoint_quality to datapoints as an example.
To make that work with a partitioned table, you need a unique constraint on datapoint. As the error message tell you, such a constraint must contain the partitioning key. So you end up with
ALTER TABLE datapoints ADD UNIQUE (id, tstz);
To reference that unique constraint from datapoint_quality, you need to have the timestamp there too:
ALTER TABLE datapoint_quality ADD datapoints_tstz timestamp with time zone;
You have to fill it with the appropriate values:
UPDATE datapoint_quality AS dq
SET datapoints_tstz = d.tstz
FROM datapoints AS d
WHERE d.id = dq.datapoint_id;
Then set it NOT NULL:
ALTER TABLE datapoint_quality ALTER datapoints_tstz SET NOT NULL;
Now you can define your foreign key:
ALTER TABLE datapoint_quality
ADD FOREIGN KEY (datapoint_id, datapoints_tstz)
REFERENCES datapoints (id, tstz) MATCH FULL;
There is no other way to have foreign key constraints with partitioned tables.

After testing the proposed solution by Laurenz in a database I have and also after replicating the original database of this case. I use PostgreSQL 12.6 and timescaledb 1.7.5.
Basically, I arrived well until defining the Foreign Key for Table datapoint_quality:
ALTER TABLE datapoint_quality
ADD FOREIGN KEY (datapoint_id, datapoints_tstz)
REFERENCES datapoints (id, tstz) MATCH FULL;
The next error is present in both databases I've tested after several attempts (included above one) to define the foreign key to a hypertable:
ERROR: foreign keys to hypertables are not supported Blockquote SQL state: 0A000
According to https://docs.timescale.com/timescaledb/latest/overview/limitations/##distributed-hypertable-limitations, it looks like the above error is part of the hypertable limitations:
Foreign key constraints referencing a hypertable are not supported.
Considering this, does anyone know any solution at the DB level to establish the relationships (1..* or ...) among a table without hypertables to other tables with hypertables behind?
Maybe could be a solution to deal with this at even a REST API level (e.g. Django or Flask) given at timescaledb or PostgreSQL I have not found much more solutions.

Postgresql sharding with citus extension not working

I am using Postgresql with citus extension for sharding and unable to shard tables like below.
Below table has a primary key and 2 unique keys. I am trying to shard against column with primary key i.e pid.
Note: I am not allowed to change the table structure. These tables are created by tool.
CREATE TABLE person
(
pid bigint NOT NULL,
name character varying(100),
address_pid bigint NOT NULL,
address_type character varying(100),
CONSTRAINT id_pkey PRIMARY KEY (pid),
CONSTRAINT addr_id UNIQUE (address_pid),
CONSTRAINT addr_type_id UNIQUE (address_type, address_pid)
);
This my sharding query:
select create_distributed_table('person', 'pid');
Error it throw is:
Error: Distributed relations cannot have UNIQUE, EXCLUDE, or PRIMARY KEY constraints that do not include the partition column
Can anyone help me with sharding these kind of tables?
#CraigKerstiens Addition to this question:
How to handle sharding when we have multiple foreign keys like this one.
CREATE TABLE table
(
pid bigint NOT NULL,
search_order integer NOT NULL,
resource_pid bigint NOT NULL,
search_pid bigint NOT NULL,
CONSTRAINT hfj_search_result_pkey PRIMARY KEY (pid),
CONSTRAINT idx_searchres_order UNIQUE (search_pid, search_order),
CONSTRAINT fk_searchres_res FOREIGN KEY (resource_pid)
REFERENCES public.table1 (res_id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION,
CONSTRAINT fk_searchres_search FOREIGN KEY (search_pid)
REFERENCES public.table2 (pid) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
)
Assuming that table1 and table2 are already sharded.

Within Citus at this time you cannot have a unique constraint that doesn't include they column you are partitioning on. In this case, it'd be possible to enforce addresses were unique to the person id, but not globally unique. To do that you could:
CREATE TABLE person
(
pid bigint NOT NULL,
name character varying(100),
address_pid bigint NOT NULL,
address_type character varying(100),
CONSTRAINT id_pkey PRIMARY KEY (pid),
CONSTRAINT addr_id UNIQUE (pid, address_pid),
CONSTRAINT addr_type_id UNIQUE (pid, address_type, address_pid)
);

Adding primary key changes column type

Our database currently doesn't define primary keys on any tables. All of the id columns are simply unique indexes. I'm dropping those indexes and replacing them with proper primary keys.
My problem: In Postgres 8.4.7, one table in particular changes the data type from bigint to integer when I add the primary key to the table.
I've got the following table definition:
psql=# \d events
Table "public.events"
Column | Type | Modifiers
-----------------------+--------------------------+-----------------------------------------------------
id | bigint | not null default nextval('events_id_seq'::regclass)
[more columns omitted]
Indexes:
"events_id_unique_pk" UNIQUE, btree (id)
Foreign-key constraints:
"events_clearing_event_ref_fk" FOREIGN KEY (clearing_event_id) REFERENCES events(id)
"events_event_configs_id_fk" FOREIGN KEY (event_config_id) REFERENCES event_configs(id)
"events_pdu_circuitbreaker_id_fk" FOREIGN KEY (pdu_circuitbreaker_id) REFERENCES pdu_circuitbreaker(id)
"events_pdu_id_fk" FOREIGN KEY (pdu_id) REFERENCES pdus(id) ON DELETE CASCADE
"events_pdu_outlet_id_fk" FOREIGN KEY (pdu_outlet_id) REFERENCES pdu_outlet(id)
"events_sensor_id_fk" FOREIGN KEY (sensor_id) REFERENCES sensors(id)
"events_user_id_fk" FOREIGN KEY (clearing_user_id) REFERENCES users(id)
Referenced by:
TABLE "events" CONSTRAINT "events_clearing_event_ref_fk" FOREIGN KEY (clearing_event_id) REFERENCES events(id)
TABLE "event_params" CONSTRAINT "events_params_event_id_fk" FOREIGN KEY (event_id) REFERENCES events(id) ON DELETE CASCADE
Triggers:
event_validate BEFORE INSERT OR UPDATE ON events FOR EACH ROW EXECUTE PROCEDURE event_validate()
This is what happens:
psql=# ALTER TABLE events ADD PRIMARY KEY (id);
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "events_pkey" for table "events"
ALTER TABLE
psql=# \d events
Table "public.events"
Column | Type | Modifiers
-----------------------+--------------------------+-----------------------------------------------------
id | integer | not null default nextval('events_id_seq'::regclass)
[more columns omitted]
Indexes:
"events_pkey" PRIMARY KEY, btree (id)
"events_id_unique_pk" UNIQUE, btree (id)
Foreign-key constraints:
"events_clearing_event_ref_fk" FOREIGN KEY (clearing_event_id) REFERENCES events(id)
"events_event_configs_id_fk" FOREIGN KEY (event_config_id) REFERENCES event_configs(id)
"events_pdu_circuitbreaker_id_fk" FOREIGN KEY (pdu_circuitbreaker_id) REFERENCES pdu_circuitbreaker(id)
"events_pdu_id_fk" FOREIGN KEY (pdu_id) REFERENCES pdus(id) ON DELETE CASCADE
"events_pdu_outlet_id_fk" FOREIGN KEY (pdu_outlet_id) REFERENCES pdu_outlet(id)
"events_sensor_id_fk" FOREIGN KEY (sensor_id) REFERENCES sensors(id)
"events_user_id_fk" FOREIGN KEY (clearing_user_id) REFERENCES users(id)
Referenced by:
TABLE "events" CONSTRAINT "events_clearing_event_ref_fk" FOREIGN KEY (clearing_event_id) REFERENCES events(id)
TABLE "event_params" CONSTRAINT "events_params_event_id_fk" FOREIGN KEY (event_id) REFERENCES events(id) ON DELETE CASCADE
Triggers:
event_validate BEFORE INSERT OR UPDATE ON events FOR EACH ROW EXECUTE PROCEDURE event_validate()
I considered a few workarounds, but I'd really rather know why it's happening. There are a few other tables that also use bigint, so I don't want to just hack a solution in place.
This is scripted with Liquibase, but it happens directly in the Postgres console too.
Update
Two other points:
I can create a simple table with a bigint id and a unique index on id, add the primary key, and the column type stays the same.
All tables are empty at the time execution.
Could it have something to do with the constraints?

That's pretty interesting. I can't reproduce it with version 9.1.0 (yes, I should upgrade too!). But then I don't know precisely how the original table and sequence were created.
This page seems to allude to a similar automatic change of types between SERIAL and INTEGER: http://grover.open2space.com/content/migrate-data-postgresql-and-maintain-existing-primary-key
Could it be something like creating the table using SERIAL instead of BIGSERIAL, and then forcing the type to BIGINT? Something in between the sequence and primary key manipulations might have reset it.

I wasn't able to reproduce this the next day, even after reproducing it multiple times with witnesses the first time it occurred. I'm chalking it up to gremlins.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Migrating from Postgresql to Postgres-XL: distributed tables design - postgresql

CREATE TABLE foos ( ... ) DISTRIBUTE BY HASH(id); CREATE TABLE foos_child ( ... ) DISTRIBUTE BY HASH(foo_id); Now any join on foos.id = foos_child.foo_id can be pushed down and done locally.

Related

there is no unique constraint matching given keys for referenced table "category"?

Citus: How can I add self referencing table in distributed tables list

How to use timescale hypertables with foreign keys and keep a one-to-many relation?

Postgresql sharding with citus extension not working

Adding primary key changes column type

Categories

Resources