TimescaleDB/PostgreSQL: how to use unique constraint when creating hypertables? - postgresql

I am trying to create a table in PostgreSQL to hold a large amount of data, and for that reason I want to use TimescaleDB's hypertables, as in the example below.
CREATE TABLE "datapoints" (
"tstz" timestamptz NOT NULL,
"id" bigserial UNIQUE NOT NULL,
"entity_id" bigint NOT NULL,
"value" real NOT NULL,
PRIMARY KEY ("id", "tstz", "entity_id")
);
SELECT create_hypertable('datapoints','tstz');
However, this throws the error shown below. As far as I can tell, the error arises because this kind of unique constraint isn't allowed on hypertables, but I really need the uniqueness. Does anyone have an idea how to solve this or work around it?
ERROR: cannot create a unique index without the column "tstz" (used in partitioning)
SQL state: TS103

There is no way to avoid that.
TimescaleDB uses PostgreSQL partitioning, and it is not possible to have a primary key or unique constraint on a partitioned table that does not contain the partitioning key.
The reason behind that is that an index on a partitioned table consists of individual indexes on the partitions (these are the partitions of the partitioned index). Now the only way to guarantee uniqueness for such a partitioned index is to have the uniqueness implicit in the definition, which is only the case if the partitioning key is part of the index.
So you either have to sacrifice the uniqueness constraint on id (whose uniqueness is practically guaranteed anyway if you use a sequence) or you have to do without partitioning.
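In this particular case the composite primary key already contains "tstz", so the only offending constraint is the standalone UNIQUE on "id". A minimal sketch of the workaround, relying on the sequence for uniqueness of "id" in practice:

CREATE TABLE "datapoints" (
    "tstz" timestamptz NOT NULL,
    "id" bigserial NOT NULL,  -- standalone UNIQUE removed; the sequence generates distinct values anyway
    "entity_id" bigint NOT NULL,
    "value" real NOT NULL,
    PRIMARY KEY ("id", "tstz", "entity_id")  -- allowed: includes the partitioning column "tstz"
);
SELECT create_hypertable('datapoints', 'tstz');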

Related

Postgresql partition table unique index problem

postgres 14
I have the following table:
CREATE TABLE sometable (
    id integer NOT NULL PRIMARY KEY UNIQUE,
    a integer NOT NULL DEFAULT 1,
    b varchar(32) UNIQUE
) PARTITION BY RANGE (id);
But when I try to execute it, I get:
ERROR: unique constraint on partitioned table must include all partitioning columns
If I execute the same table definition without PARTITION BY RANGE (id) and check the indexes, I get:
tablename   indexname         indexdef
sometable   sometable_b_key   CREATE UNIQUE INDEX sometable_b_key ON public.sometable USING btree (b)
sometable   sometable_pkey    CREATE UNIQUE INDEX sometable_pkey ON public.sometable USING btree (id)
So... the unique constraints exist.
What's the problem? How can I fix it?
On partitioned tables, all primary keys, unique constraints and unique indexes must contain the partition expression. That is because indexes on partitioned tables are implemented by individual indexes on each partition, and there is no way to enforce uniqueness across different indexes.
If you want to use partitioning, you have to sacrifice some consistency guarantees. There is no way around that. What you can do is create unique constraints on the partitions. That will guarantee uniqueness within each partition, but not global uniqueness.
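For illustration, a sketch of what remains possible (the partition name and range bounds are made up for the example): the primary key may cover just the partition key, and b can only be made unique per partition:

CREATE TABLE sometable (
    id integer NOT NULL PRIMARY KEY,  -- allowed: "id" is the partition key
    a integer NOT NULL DEFAULT 1,
    b varchar(32)
) PARTITION BY RANGE (id);

CREATE TABLE sometable_p0 PARTITION OF sometable
    FOR VALUES FROM (0) TO (1000000);

-- "b" can only be made unique within each individual partition:
CREATE UNIQUE INDEX ON sometable_p0 (b);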
This limitation is also mentioned in the docs:
5.11.2.3. Limitations
The following limitations apply to partitioned tables:
Unique constraints (and hence primary keys) on partitioned tables must include all the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions.
There is no way to create an exclusion constraint spanning the whole partitioned table. It is only possible to put such a constraint on each leaf partition individually. Again, this limitation stems from not being able to enforce cross-partition restrictions.
https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE
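To illustrate the second point (a sketch, reusing the hypothetical partition sometable_p0 from the example above): an exclusion constraint can only be attached to a leaf partition, never to the partitioned table itself:

-- Allowed: exclusion constraint on a leaf partition (with "=", equivalent to a unique constraint).
ALTER TABLE sometable_p0
    ADD CONSTRAINT sometable_p0_b_excl EXCLUDE USING btree (b WITH =);

-- Not allowed: the same ALTER TABLE ... ADD CONSTRAINT ... EXCLUDE on "sometable" itself.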

PostgreSQL declarative partition - unique constraint on partitioned table must include all partitioning columns [duplicate]

I'm trying to create a partitioned table which refers to itself, creating a doubly-linked list.
CREATE TABLE test2 (
    id serial NOT NULL,
    category integer NOT NULL,
    time timestamp(6) NOT NULL,
    prev_event integer,
    next_event integer
) PARTITION BY HASH (category);
Once I add a primary key, I get the following error:
alter table test2 add primary key (id);
ERROR: unique constraint on partitioned table must include all partitioning columns
DETAIL: PRIMARY KEY constraint on table "test2" lacks column "category" which is part of the partition key.
Why does the unique constraint require all partitioning columns to be included?
EDIT: Now I understand why this is needed: https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-LIMITATIONS
Once I add a PK with both columns, it works:
alter table test2 add primary key (id, category);
But then adding the self-referencing FK doesn't work:
alter table test2 add foreign key (prev_event) references test2 (id) on update cascade on delete cascade;
ERROR: there is no unique constraint matching given keys for referenced table "test2"
Since PK is not just id but id-category I can't create FK pointing to id.
Is there any way to deal with this or am I missing something?
I would like to avoid using inheritance partitioning if possible.
EDIT2: It seems this is a known problem. https://www.reddit.com/r/PostgreSQL/comments/di5mbr/postgresql_12_foreign_keys_and_partitioned_tables/f3tsoop/
It seems there is no straightforward solution: PostgreSQL simply doesn't support this as of v14. One option is to use triggers to enforce 'foreign key' behavior; another is to use multi-column foreign keys. Both are far from optimal.
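A hedged sketch of the multi-column FK option (prev_event_category and next_event_category are illustrative names I am introducing; foreign keys referencing partitioned tables require PostgreSQL 12+): store the referenced row's partition key next to its id so the FK can match the composite primary key:

CREATE TABLE test2 (
    id serial NOT NULL,
    category integer NOT NULL,
    time timestamp(6) NOT NULL,
    prev_event integer,
    prev_event_category integer,  -- assumed extra column carrying the partition key
    next_event integer,
    next_event_category integer,  -- same for the forward link
    PRIMARY KEY (id, category),
    FOREIGN KEY (prev_event, prev_event_category)
        REFERENCES test2 (id, category)
        ON UPDATE CASCADE ON DELETE CASCADE
) PARTITION BY HASH (category);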

Composite FK referencing atomic PK + non unique attribute

I am trying to create the following tables in Postgres 13.3:
CREATE TABLE IF NOT EXISTS accounts (
    account_id Integer PRIMARY KEY NOT NULL
);
CREATE TABLE IF NOT EXISTS users (
    user_id Integer PRIMARY KEY NOT NULL,
    account_id Integer NOT NULL REFERENCES accounts(account_id) ON DELETE CASCADE
);
CREATE TABLE IF NOT EXISTS calendars (
    calendar_id Integer PRIMARY KEY NOT NULL,
    user_id Integer NOT NULL,
    account_id Integer NOT NULL,
    FOREIGN KEY (user_id, account_id) REFERENCES users(user_id, account_id) ON DELETE CASCADE
);
But I get the following error when creating the calendars table:
ERROR: there is no unique constraint matching given keys for referenced table "users"
This does not make much sense to me, since the foreign key contains user_id, which is the PK of the users table and therefore already carries a uniqueness constraint. If I add an explicit uniqueness constraint on the combination of user_id and account_id, like so:
ALTER TABLE users ADD UNIQUE (user_id, account_id);
then I am able to create the calendars table. This unique constraint seems unnecessary to me, as user_id is already unique on its own. Can someone please explain what I am missing here?
Postgres is smart/dumb in that it does not try to guess what the designer meant by the redundant column reference.
The Postgres designers could have taken different strategies:
Detect the transitivity, and make the FK depend not only on users.user_id, but also on users.account_id -> accounts.account_id. This is doable but costly, and it involves adding multiple dependency records in the catalogs for a single FK constraint. Enforcing the constraint (on UPDATE or DELETE in either of the two referenced tables) could get very complex.
Detect the transitivity, and silently ignore the redundant column reference. This amounts to lying to the programmer, and it would also need to be represented in the catalogs.
Cascading DDL operations would get more complex, too (remember: DDL is already very hard with respect to concurrency/versioning).
From the execution/performance point of view, enforcing the constraints currently involves "pseudo triggers" on the referenced table's indexes (except for DEFERRED constraints, which have to be handled specially).
So, IMHO, the Postgres developers made the sane choice of refusing to do stupidly complex things.
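As an alternative to the extra unique constraint, one possible simplification (my sketch, not from the original answer) is to drop the redundant account_id column from calendars entirely and reach the account through users:

CREATE TABLE IF NOT EXISTS calendars (
    calendar_id Integer PRIMARY KEY NOT NULL,
    -- account_id is reachable via users; duplicating it in calendars is what
    -- forced the extra UNIQUE (user_id, account_id) constraint on users.
    user_id Integer NOT NULL REFERENCES users(user_id) ON DELETE CASCADE
);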

How to create TimescaleDB Hypertable with time partitioning on non unique timestamp?

I have just started to use TimescaleDB and want to create a hypertable on a table with events.
Originally I thought of following the conventional pattern of:
CREATE TABLE event (
    id serial PRIMARY KEY,
    ts timestamp with time zone NOT NULL,
    details varchar(255) NOT NULL
);
CREATE INDEX event_ts_idx on event(ts);
However, when I tried to create the hypertable with the following query:
SELECT create_hypertable('event', 'ts');
I got:
ERROR: cannot create a unique index without the column "ts" (used in partitioning)
After doing some research, it seems that the timestamp itself needs to be the (or part of the) primary key.
However, I do not want the timestamp ts to be unique. It is very likely that these high frequency events will coincide in the same microsecond (the maximum resolution of the timestamp type). It is the whole reason why I am looking into TimescaleDB in the first place.
What is the best practice in this case?
I was thinking of maybe keeping the serial id as part of the primary key, and making it composite like this:
CREATE TABLE event_hyper (
    id serial,
    ts timestamp with time zone NOT NULL,
    details varchar(255) NOT NULL,
    PRIMARY KEY (id, ts)
);
SELECT create_hypertable('event_hyper', 'ts');
This sort of works, but I am unsure if it is the right approach, or if I am creating a complicated primary key which will slow down inserts or create other problems.
What is the right approach when you have possible collision in timestamps when using TimescaleDB hypertables?
There is no need to create a unique constraint on the time dimension (unique constraints are not required at all). This works:
CREATE TABLE event (
    id serial,
    ts timestamp with time zone NOT NULL,
    details varchar(255) NOT NULL
);
SELECT create_hypertable('event', 'ts');
Note that the primary key on id is removed.
If you do want a unique constraint or a primary key, then TimescaleDB requires that it include the time dimension. This mirrors PostgreSQL's limitation that, with declarative partitioning, unique constraints must include the partition key:
Unique constraints (and hence primary keys) on partitioned tables must include all the partition key columns. This limitation exists because PostgreSQL can only enforce uniqueness in each partition individually.
TimescaleDB likewise enforces uniqueness in each chunk individually; maintaining uniqueness across chunks could hurt ingest performance dramatically.
The most common way to fix the issue with the primary key is to create a composite key that includes the time dimension, as proposed in the question. If the index on the time dimension is not needed (no queries filtering only on time are expected), the default time index can be skipped:
CREATE TABLE event_hyper (
    id serial,
    ts timestamp with time zone NOT NULL,
    details varchar(255) NOT NULL,
    PRIMARY KEY (id, ts)
);
SELECT create_hypertable('event_hyper', 'ts', create_default_indexes => FALSE);
It is also possible to use an integer column as the time dimension. It is important that such a column have time-dimension properties: its value increases over time (important for insert performance), and queries select ranges on it (critical for query performance on a large database). The common case is storing a Unix epoch.
Since id in event_hyper is SERIAL, it will increase with time; however, I doubt that queries will select ranges on it. For completeness, the SQL would be:
CREATE TABLE event_hyper (
    id serial PRIMARY KEY,
    ts timestamp with time zone NOT NULL,
    details varchar(255) NOT NULL
);
SELECT create_hypertable('event_hyper', 'id', chunk_time_interval => 1000000);
To build on @k_rus's answer: the generated primary key here seems not to be what you are actually looking for. What meaning does that id have? Isn't it just identifying a unique (details, ts) combination? Or can there meaningfully be two rows with the same timestamp and the same details but different ids, where that distinction carries some semantic meaning? To me that seems somewhat nonsensical, in which case I would put the primary key on (details, ts), which should give you the uniqueness condition you need. I do not know if your ORM will like this; ORMs tend to be overly dependent on generated primary keys because, among other things, not all databases support composite primary keys. But in general, my advice for cases like this is to use a composite primary key with logical meaning.
Now, if you actually care about multiple messages with the same details at the same timestamp, I might suggest a table structure something like:
CREATE TABLE event_hyper (
    ts timestamp with time zone NOT NULL,
    details varchar(255) NOT NULL,
    count int,
    PRIMARY KEY (details, ts)
);
with which you can do an INSERT ... ON CONFLICT DO UPDATE to increment the count.
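A minimal sketch of that upsert (the values are illustrative):

-- Count repeated events per (details, ts); assumes the event_hyper table above.
INSERT INTO event_hyper (ts, details, count)
VALUES ('2021-01-01 00:00:00+00', 'example event', 1)
ON CONFLICT (details, ts)
DO UPDATE SET count = event_hyper.count + 1;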
I wish ORMs were better about this sort of thing, but you can usually trick an ORM into reading from another table (or from a view over it, since the ORM then assumes it cannot update records there, which is exactly why it wants a generated PK). Then there is only a little custom ingest code to write that inserts into the hypertable. That is often better anyway: in general, I've found that ORMs don't always follow best practices for high-volume inserts and often don't use bulk loading techniques.
So a table like that, plus a view that just selects * from it, should let you use the ORM for reads; write a very small amount of custom code to ingest into the time-series table, and voila, it works. The rest of your relational model, the part the ORM excels at, can live in the ORM, with a minor integration here via a bit of custom SQL and a few custom methods.
The limitation is:
All partitioning columns (primary and, if any, secondary space partitions) need to be part of any unique key on the table.
Refer: https://github.com/timescale/timescaledb/issues/447#issuecomment-369371441
Two choices, in my opinion:
partition by a single column that is a unique key (e.g. the primary key), or
partition with a second, space partitioning key, in which case the two columns together must form a combined unique key.
I got the same problem.
The solution was to avoid this field:
id: 'id'
I think I'm replying a little bit too late, but still.
You can try something like this:
CREATE TABLE event_hyper (
    id serial,
    ts timestamp with time zone NOT NULL,
    details varchar(255) NOT NULL
);
SELECT create_hypertable('event_hyper', 'ts', partitioning_column => 'id', number_partitions => X);
where X is the desired number of hash partitions on column 'id'.
https://docs.timescale.com/api/latest/hypertable/create_hypertable/#optional-arguments
As you may also notice, there is no PRIMARY KEY constraint on the event_hyper table.
The output of the create_hypertable() call should be:
create_hypertable
---------------------------
(1,public,event_hyper,t)

UNIQUE constraint on table "subscriber_historization" lacks column "processing_date" which is part of the partition key

I want to use a constraint so I can upsert, because I don't want duplicate entries on customer_identifier_value:
on conflict (customer_identifier_value) do nothing
[42P10] ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification
When I create the constraint:
alter table subscriber_historization
add constraint customer_identifier_value_unique unique (customer_identifier_value);
[0A000] ERROR: insufficient columns in UNIQUE constraint definition
Detail: UNIQUE constraint on table "subscriber_historization" lacks column "processing_date" which is part of the partition key.
Here is the DDL.
-- auto-generated definition
create table subscriber_historization
(
    customer_identifier_value text not null,
    product_value text,
    contract_date_end date,
    processing_date date not null,
    constraint subscriber_historization_pk
        primary key (processing_date, customer_identifier_value)
)
    partition by RANGE (processing_date);
If I use
ON CONFLICT ON CONSTRAINT subscriber_historization_pk DO NOTHING
the row will still be inserted whenever processing_date differs, and then there will be duplicate entries on customer_identifier_value.
How can I do an upsert then?
Thanks for your help.
You cannot prevent that with partitioned tables, because all unique indexes must contain the partitioning key.
Your only way out is to use SERIALIZABLE transaction isolation throughout and verify the constraint with a trigger. This will be a performance hit, however.
This is a limitation of partitioning.
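A hedged sketch of that trigger approach (the names are illustrative; on PostgreSQL 13+ the trigger can go on the partitioned table itself, on older versions it must be created on each partition, and it is only safe if every writing transaction runs at SERIALIZABLE):

CREATE FUNCTION reject_duplicate_customer() RETURNS trigger AS $$
BEGIN
    -- Two concurrent inserts could both pass this check; SERIALIZABLE
    -- isolation is what makes one of them fail instead.
    IF EXISTS (SELECT 1 FROM subscriber_historization
               WHERE customer_identifier_value = NEW.customer_identifier_value) THEN
        RETURN NULL;  -- skip the row, emulating ON CONFLICT DO NOTHING
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER subscriber_historization_unique_check
    BEFORE INSERT ON subscriber_historization
    FOR EACH ROW EXECUTE FUNCTION reject_duplicate_customer();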