Is it possible to achieve Upsert in Postgres without using the On Conflict clause?
I have a requirement where I converted a normal table into a partitioned table with a partition key that was not part of the Primary Key when the table was non-partitioned.
Since the partition key is added to the primary key column list now, my Upsert statements are failing as the On Conflict clause is missing the partition key. But as per the requirement, I cannot add the partition key to the On Conflict clause as I will have more than one row for the previous primary key column combination in the partitioned table.
Hence, I want to achieve an upsert without the ON CONFLICT clause. Can someone suggest alternatives that would work?
This question is for a database using PostgreSQL 12.3; we are using declarative partitioning and ON CONFLICT against the partitioned table is possible.
We had a single table representing application event data from client activity. Therefore, each row has fields client_id int4 and a dttm timestamp field. There is also an event_id text field and a project_id int4 field which together formed the basis of a composite primary key. (While rare, it was possible for two event records to have the same event_id but different project_id values for the same client_id.)
The table became non-performant, and we saw that queries most often targeted a single client in a specific timeframe. So we shifted the data into a partitioned table: first by LIST (client_id) and then each partition is further partitioned by RANGE(dttm).
We are running into problems shifting our upsert strategy to work with this new table. We used to perform a query of INSERT INTO table SELECT * FROM staging_table ON CONFLICT (event_id, project_id) DO UPDATE ...
But since the columns that determine uniqueness (event_id and project_id) are not part of the partitioning strategy (dttm and client_id), I can't do the same thing with the partitioned table. I thought I could get around this by building UNIQUE indexes on each partition on (project_id, event_id) but the ON CONFLICT is still not firing because there is no such unique index on the parent table (there can't be, since it doesn't contain all partitioning columns). So now a single upsert query appears impossible.
I've found two solutions so far but both require additional changes to the upsert script that seem like they'd be less performant:
I can still do an INSERT INTO table_partition_subpartition ... ON CONFLICT (event_id, project_id) DO UPDATE ... but that requires explicitly determining the name of the partition for each row instead of just INSERT INTO table ... once for the entire dataset.
I could implement the "old way" UPSERT procedure: https://www.postgresql.org/docs/9.4/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE but this again requires looping through all rows.
Is there anything else I could do to retain the cleanliness of a single, one-and-done INSERT INTO table SELECT * FROM staging_table ON CONFLICT () DO UPDATE ... while still keeping the partitioning strategy as-is?
Edit: if it matters, concurrency is not an issue here; there's just one machine executing the UPSERT into the main table from the staging table on a schedule.
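Since concurrency is ruled out here, one set-based alternative to ON CONFLICT is an UPDATE followed by an INSERT of the unmatched rows, both run against the parent table. The following is a minimal sketch, not a definitive implementation. The names events, events_c1, staging_events and payload are hypothetical, the table is simplified to a single LIST level, and the staging table is assumed to hold no duplicate (event_id, project_id) pairs.

```sql
-- Hypothetical, simplified schema: one LIST level only, one data column.
CREATE TABLE events (
    client_id  int4      NOT NULL,
    dttm       timestamp NOT NULL,
    event_id   text      NOT NULL,
    project_id int4      NOT NULL,
    payload    text
) PARTITION BY LIST (client_id);
CREATE TABLE events_c1 PARTITION OF events FOR VALUES IN (1);

CREATE TABLE staging_events (
    client_id  int4      NOT NULL,
    dttm       timestamp NOT NULL,
    event_id   text      NOT NULL,
    project_id int4      NOT NULL,
    payload    text
);

INSERT INTO events VALUES (1, '2022-01-01 10:00', 'e1', 10, 'old');
INSERT INTO staging_events VALUES
    (1, '2022-01-01 10:00', 'e1', 10, 'new'),    -- already in events
    (1, '2022-01-02 11:00', 'e2', 10, 'fresh');  -- not yet in events

BEGIN;

-- Step 1: update the rows that already exist in the target.
UPDATE events AS e
SET    payload = s.payload
FROM   staging_events AS s
WHERE  e.event_id   = s.event_id
AND    e.project_id = s.project_id;

-- Step 2: insert the staging rows that still have no match.
INSERT INTO events (client_id, dttm, event_id, project_id, payload)
SELECT s.client_id, s.dttm, s.event_id, s.project_id, s.payload
FROM   staging_events AS s
WHERE  NOT EXISTS (
    SELECT 1
    FROM   events AS e
    WHERE  e.event_id   = s.event_id
    AND    e.project_id = s.project_id
);

COMMIT;
```

This keeps the load at two statements for the whole dataset instead of one statement per row or per partition. It is only safe when a single writer loads the table, as described in the edit above; under concurrent writers the two steps can race.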
I am using Postgres 14.
I have this table definition:
CREATE TABLE sometable (
id integer NOT NULL PRIMARY KEY UNIQUE ,
a integer NOT NULL DEFAULT 1,
b varchar(32) UNIQUE)
PARTITION BY RANGE (id);
But when I try to execute it, I get:
ERROR: unique constraint on partitioned table must include all partitioning columns
If I execute the same table definition without PARTITION BY RANGE (id) and check the indexes, I get:
tablename, indexname, indexdef
sometable, sometable_b_key, CREATE UNIQUE INDEX sometable_b_key ON public.sometable USING btree (b)
sometable, sometable_pkey, CREATE UNIQUE INDEX sometable_pkey ON public.sometable USING btree (id)
So the unique constraints do exist.
What is the problem, and how can I fix it?
On partitioned tables, all primary keys, unique constraints and unique indexes must contain the partition expression. That is because indexes on partitioned tables are implemented by individual indexes on each partition, and there is no way to enforce uniqueness across different indexes.
If you want to use partitioning, you have to sacrifice some consistency guarantees. There is no way around that. What you can do is create unique constraints on the partitions. That will guarantee uniqueness within each partition, but not global uniqueness.
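As a rough sketch of the per-partition approach, the table from the question could be declared with only the primary key on the parent and a separate unique constraint on b in each partition. The partition names and ranges below are hypothetical.

```sql
CREATE TABLE sometable (
    id integer NOT NULL,
    a  integer NOT NULL DEFAULT 1,
    b  varchar(32),
    PRIMARY KEY (id)              -- allowed: contains the partition key
) PARTITION BY RANGE (id);

-- Hypothetical partitions.
CREATE TABLE sometable_p1 PARTITION OF sometable FOR VALUES FROM (1) TO (1000);
CREATE TABLE sometable_p2 PARTITION OF sometable FOR VALUES FROM (1000) TO (2000);

-- b is unique only within each partition, not across the whole table.
ALTER TABLE sometable_p1 ADD CONSTRAINT sometable_p1_b_key UNIQUE (b);
ALTER TABLE sometable_p2 ADD CONSTRAINT sometable_p2_b_key UNIQUE (b);

INSERT INTO sometable (id, b) VALUES (1, 'x');
INSERT INTO sometable (id, b) VALUES (1500, 'x');  -- accepted: other partition
-- INSERT INTO sometable (id, b) VALUES (2, 'x');  -- would fail: duplicate b in sometable_p1
```

The second insert succeeding is exactly the consistency guarantee that is given up: the same b value can appear once per partition.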
This limitation is also mentioned in the docs:
5.11.2.3. Limitations
The following limitations apply to partitioned tables:
Unique constraints (and hence primary keys) on partitioned tables must
include all the partition key columns. This limitation exists because
the individual indexes making up the constraint can only directly
enforce uniqueness within their own partitions; therefore, the
partition structure itself must guarantee that there are not
duplicates in different partitions.
There is no way to create an exclusion constraint spanning the whole
partitioned table. It is only possible to put such a constraint on
each leaf partition individually. Again, this limitation stems from
not being able to enforce cross-partition restrictions.
https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE
How can we prevent duplicate inserts after partitioning a Postgres table?
Because the partition key has to be part of the unique constraint, the other key column can now hold duplicates.
Example:
ID  date
1   01-01-2022
1   02-01-2022
Is a BEFORE INSERT trigger the only option to keep ID unique, or are there other ways?
You cannot have a unique constraint on a partitioned table that does not contain the partitioning key. A trigger won't be a reliable solution either, because it would be subject to race conditions (concurrent inserts) unless you are running with the SERIALIZABLE isolation level.
The best you can have is a unique constraint on each individual partition.
Alternatively, you can fill the values from a sequence, so that they are automatically unique, for example with an identity column.
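The sequence idea could look roughly like this. It is a sketch under assumptions: the names measurements, measurements_id_seq and event_date are hypothetical, and it uses a plain sequence default rather than an identity column. The values are unique only as long as every insert lets the default fill id.

```sql
-- Hypothetical names throughout.
CREATE SEQUENCE measurements_id_seq;

CREATE TABLE measurements (
    id         bigint NOT NULL DEFAULT nextval('measurements_id_seq'),
    event_date date   NOT NULL,
    PRIMARY KEY (id, event_date)   -- must include the partition key
) PARTITION BY RANGE (event_date);

CREATE TABLE measurements_2022_01 PARTITION OF measurements
    FOR VALUES FROM ('2022-01-01') TO ('2022-02-01');
CREATE TABLE measurements_2022_02 PARTITION OF measurements
    FOR VALUES FROM ('2022-02-01') TO ('2022-03-01');

-- id is omitted, so each row draws a fresh value from the sequence,
-- regardless of which partition the row lands in.
INSERT INTO measurements (event_date)
VALUES ('2022-01-01'), ('2022-02-01');
```

Nothing here stops a client from supplying an explicit id that already exists in another partition, so this is a convention backed by the sequence rather than an enforced constraint.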
For the ON CONFLICT (col) clause in an upsert, does there have to be a unique constraint on the column or combination of columns?
For example:
If I have a simple table create table test(id integer, name text), will I not be able to do an upsert? Does the uniqueness constraint have to be enforced?
Please help, as I am confused.
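Yes, ON CONFLICT (col) needs a unique index or constraint on those columns, because that is what defines the conflict; without one, the statement is rejected with an error. With the constraint in place the table can never end up non-unique: a plain INSERT that violates it raises an error and the transaction is rolled back. An upsert handles the conflict instead of failing: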
ON CONFLICT DO UPDATE guarantees an atomic INSERT or UPDATE outcome;
provided there is no independent error, one of those two outcomes is
guaranteed, even under high concurrency. This is also known as UPSERT
— “UPDATE or INSERT”.
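To make the requirement concrete, here is a small sketch using the test table from the question. The constraint name test_id_key is an assumption chosen for illustration.

```sql
CREATE TABLE test (id integer, name text);

-- Without a unique index or constraint on id, this is rejected:
--   INSERT INTO test (id, name) VALUES (1, 'first')
--   ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;
--   ERROR:  there is no unique or exclusion constraint matching
--           the ON CONFLICT specification

ALTER TABLE test ADD CONSTRAINT test_id_key UNIQUE (id);

-- First run inserts the row.
INSERT INTO test (id, name) VALUES (1, 'first')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;

-- Second run hits the unique constraint and updates the row instead.
INSERT INTO test (id, name) VALUES (1, 'second')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;
```

After both statements the table holds a single row for id 1 whose name is 'second'.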
Is it possible to do an upsert in Postgres 9.5 when a conflict happens on either of 2 columns in a table? Basically, I have 2 columns, and if either column throws a unique constraint violation, I would like to perform an update operation.
Not with a single ON CONFLICT DO UPDATE clause. Only ON CONFLICT DO NOTHING may omit the conflict target, in which case any unique constraint violation counts as a conflict; DO UPDATE requires a conflict_target. The INSERT statement can have only a single ON CONFLICT clause, and its conflict_target names either one constraint or a list of columns that must together be covered by one unique index, not one index per column. A violation of any other unique constraint still raises an error. You are also limited to a single conflict_action, and you will not have information on which constraint caused the conflict when processing that action. If you need that kind of information, or a specific action depending on the constraint violated, you should write a trigger function, but then you lose the all-important atomicity of the INSERT ... ON CONFLICT DO ... statement.
I think that in Postgres 9.5 ON CONFLICT can name only one constraint, or multiple column names, but those columns must be covered together by a single unique index.
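A short sketch of the multi-column case, assuming a hypothetical accounts table: the column list in ON CONFLICT is matched against one unique index over both columns.

```sql
-- Hypothetical table; the composite UNIQUE creates one index on (email, username).
CREATE TABLE accounts (
    email    text    NOT NULL,
    username text    NOT NULL,
    visits   integer NOT NULL DEFAULT 1,
    UNIQUE (email, username)
);

-- First run inserts the row with visits = 1.
INSERT INTO accounts (email, username) VALUES ('a@example.com', 'alice')
ON CONFLICT (email, username) DO UPDATE SET visits = accounts.visits + 1;

-- Same pair again: the composite index reports a conflict, so visits becomes 2.
INSERT INTO accounts (email, username) VALUES ('a@example.com', 'alice')
ON CONFLICT (email, username) DO UPDATE SET visits = accounts.visits + 1;
```

With two separate unique indexes, one on email and one on username, the same ON CONFLICT (email, username) target would not match either index and the statement would be rejected.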