Avoid scan on attach partition with check constraint - postgresql

I am recreating an existing table as a partitioned table in PostgreSQL 11.
After some research, I am approaching it using the following procedure so this can be done online while writes are still happening on the table:
1. Add a check constraint on the existing table, first as NOT VALID and then validating it.
2. Drop the existing primary key.
3. Rename the existing table.
4. Create the partitioned table under the prior table name.
5. Attach the existing table as a partition of the new partitioned table.
My expectation was that the last step would be relatively fast, but I don't really have a number for this. In my testing, it's taking about 30s. I wonder if my expectations are incorrect or if I'm doing something wrong with the constraint or anything else.
Here's a simplified version of the DDL.
First, the inserted_at column is declared like this:
inserted_at timestamp without time zone not null
Existing queries and writes still need an index on the ID after I drop the PK, so I create one:
create unique index concurrently my_events_temp_id_index on my_events (id);
The check constraint is created in one transaction:
alter table my_events add constraint my_events_2022_07_events_check
check (inserted_at >= '2018-01-01' and inserted_at < '2022-08-01')
not valid;
In the next transaction, it's validated (and the validation is successful):
alter table my_events validate constraint my_events_2022_07_events_check;
Then before creating the partitioned table, I drop the primary key of the existing table:
alter table my_events drop constraint my_events_pkey cascade;
Finally, in its own transaction, the partitioned table is created:
alter table my_events rename to my_events_2022_07;
create table my_events (
id uuid not null,
... other columns,
inserted_at timestamp without time zone not null,
primary key (id, inserted_at)
) partition by range (inserted_at);
alter table my_events attach partition my_events_2022_07
for values from ('2018-01-01') to ('2022-08-01');
That last transaction blocks inserts and takes about 30s for the 12M rows in my test database.
Edit
I wanted to add that in response to the attach I see this:
INFO: partition constraint for table "my_events_2022_07" is implied by existing constraints
That makes me think I'm doing this right.

The problem is not the check constraint, it is the primary key.
If you make the original unique index include both columns:
create unique index concurrently my_events_temp_id_index on my_events (id, inserted_at);
And if you make the new table have a unique index rather than a primary key on those two columns, then the attach is nearly instantaneous.
These seem to me like unneeded restrictions in PostgreSQL: both that a unique index on one column can't be used to imply uniqueness over the pair of columns, and that a unique index on both columns can't be used to satisfy the primary key (nor even a unique constraint--only a unique index will do).
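For concreteness, a sketch of that workaround using the question's simplified DDL (abridged; it assumes the validated check constraint from the earlier steps is still in place, so only the index handling changes):

-- create the index on both columns in the first place, not just id:
create unique index concurrently my_events_temp_id_index
    on my_events (id, inserted_at);

alter table my_events rename to my_events_2022_07;

create table my_events (
    id uuid not null,
    -- ... other columns ...
    inserted_at timestamp without time zone not null
) partition by range (inserted_at);

-- a unique index instead of a primary key on the partitioned table:
create unique index on my_events (id, inserted_at);

-- the partition already carries a matching unique index and a validated
-- check constraint, so the attach needs neither a scan nor an index build:
alter table my_events attach partition my_events_2022_07
    for values from ('2018-01-01') to ('2022-08-01');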

Related

PostgreSQL declarative partition - unique constraint on partitioned table must include all partitioning columns [duplicate]

I'm trying to create a partitioned table which refers to itself, creating a doubly-linked list.
CREATE TABLE test2 (
id serial NOT NULL,
category integer NOT NULL,
time timestamp(6) NOT NULL,
prev_event integer,
next_event integer
) PARTITION BY HASH (category);
Once I add a primary key, I get the following error.
alter table test2 add primary key (id);
ERROR: unique constraint on partitioned table must include all partitioning columns
DETAIL: PRIMARY KEY constraint on table "test2" lacks column "category" which is part of the partition key.
Why does the unique constraint require all partitioning columns to be included?
EDIT: Now I understand why this is needed: https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-LIMITATIONS
Once I add the PK with both columns, it works.
alter table test2 add primary key (id, category);
But then adding the FK to itself doesn't work.
alter table test2 add foreign key (prev_event) references test2 (id) on update cascade on delete cascade;
ERROR: there is no unique constraint matching given keys for referenced table "test2"
Since the PK is not just id but (id, category), I can't create an FK pointing to id alone.
Is there any way to deal with this or am I missing something?
I would like to avoid using inheritance partitioning if possible.
EDIT2: It seems this is a known problem. https://www.reddit.com/r/PostgreSQL/comments/di5mbr/postgresql_12_foreign_keys_and_partitioned_tables/f3tsoop/
It seems there is no straightforward solution: PostgreSQL simply doesn't support this as of v14. One option is to use triggers to enforce the foreign-key behavior; another is to use multi-column foreign keys. Both are far from optimal.
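For illustration, a sketch of the multi-column foreign key workaround (prev_event_category is a hypothetical extra column that mirrors the referenced row's category; v12 or later is needed, since the FK references a partitioned table):

ALTER TABLE test2 ADD COLUMN prev_event_category integer;

-- the composite FK can target the (id, category) primary key:
ALTER TABLE test2
    ADD FOREIGN KEY (prev_event, prev_event_category)
    REFERENCES test2 (id, category)
    ON UPDATE CASCADE ON DELETE CASCADE;

The cost is that every row has to carry, and keep in sync, the category of the row it points to.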

Postgres ALTER TABLE... unexpected performance when applying multiple changes

I am attempting to speed up a bulk load. The bulk load is performed into a table with all primary keys, indexes, and foreign keys dropped. After the data finishes loading, we apply the necessary constraints back to the database. As a simple test I have the following setup:
CREATE TABLE users
(
id int primary key
);
CREATE TABLE events
(
id serial,
user_id1 int,
user_id2 int,
unique_int1 int,
unique_int2 int,
unique_int3 int
);
INSERT INTO users (id)
SELECT generate_series(1, 100000000);

INSERT INTO events (user_id1, user_id2, unique_int1, unique_int2, unique_int3)
SELECT
    generate_series(1, 100000000),
    generate_series(1, 100000000),
    generate_series(1, 100000000),
    generate_series(1, 100000000),
    generate_series(1, 100000000);
I then wish to apply the following constraints via six individual ALTER TABLE statements:
ALTER TABLE ONLY public.events
ADD CONSTRAINT events_pkey PRIMARY KEY (id);
ALTER TABLE ONLY public.events
ADD CONSTRAINT events_user_id1_fkey FOREIGN KEY (user_id1) REFERENCES public.users(id);
ALTER TABLE ONLY public.events
ADD CONSTRAINT events_user_id2_fkey FOREIGN KEY (user_id2) REFERENCES public.users(id);
ALTER TABLE ONLY public.events
ADD CONSTRAINT events_unique_int1_me_key UNIQUE (unique_int1);
ALTER TABLE ONLY public.events
ADD CONSTRAINT events_unique_int2_me_key UNIQUE (unique_int2);
ALTER TABLE ONLY public.events
ADD CONSTRAINT events_unique_int3_me_key UNIQUE (unique_int3);
Each of the above statements takes approximately 90 seconds to run, for a total of about 450 seconds. I expected that combining the above statements into a single ALTER TABLE statement would reduce the total time, but in fact it increased.
alter table only public.events
ADD CONSTRAINT events_pkey PRIMARY KEY (id),
ADD CONSTRAINT events_user_id1_fkey FOREIGN KEY (user_id1) REFERENCES public.users(id),
ADD CONSTRAINT events_user_id2_fkey FOREIGN KEY (user_id2) REFERENCES public.users(id),
ADD CONSTRAINT events_unique_int1_me_key UNIQUE (unique_int1),
ADD CONSTRAINT events_unique_int2_me_key UNIQUE (unique_int2),
ADD CONSTRAINT events_unique_int3_me_key UNIQUE (unique_int3);
This takes 520 seconds, whereas I expected it to take less than 450 seconds at the very least. My reasoning comes from the Postgres documentation for the ALTER TABLE statement, whose Notes section reads:
The main reason for providing the option to specify multiple changes in a single ALTER TABLE is that multiple table scans or rewrites can thereby be combined into a single pass over the table.
Can anyone explain the above measurements or have any suggestions?
This case is not going to be helped much, as each of the subcommands needs its own pass over the data to verify its FK or UNIQUE constraint. Also from the docs:
When multiple subcommands are given, the lock acquired will be the strictest one required by any subcommand.
So by combining the subcommands, you hold the strictest lock required by any of them for the entire command.
The section you quoted is amplified further on:
For example, it is possible to add several columns and/or alter the type of several columns in a single command. This is particularly useful with large tables, since only one pass over the table need be made.
The conclusion is that combining commands is not necessarily a time saver.
Unifying the table read is particularly beneficial when there is no other I/O to be done beyond that one pass, as with a column type change or a check constraint. But here you need to actually build the indexes and look up the foreign keys, and that is what dominates your run time. So it is no surprise that you don't see a big gain.
As for the small loss in performance, it could be due to worse cache usage, less efficient parallel query execution, or an interrupted stride length for readahead or for bulk writes to your I/O devices.
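For contrast, a sketch of the kind of combination the documentation is describing, where the subcommands genuinely share a single pass (the type changes here are hypothetical, reusing the same events table):

-- both type changes are applied in one table rewrite,
-- instead of one full rewrite per separate ALTER TABLE statement:
ALTER TABLE public.events
    ALTER COLUMN unique_int1 TYPE bigint,
    ALTER COLUMN unique_int2 TYPE bigint;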

PostgreSQL create partitions of table with existing rows and with referential integrity

I'm trying to partition a table with existing rows in PostgreSQL 10.8.
The table in question is Item, which has around 5 million rows.
I create the partitions with those commands:
CREATE TABLE item_1 (CHECK (id >0 AND id <1000001)) INHERITS (item);
CREATE TABLE item_2 (CHECK (id >1000000 AND id <2000001)) INHERITS (item);
...
Then the rules:
CREATE RULE item_1_rule AS ON INSERT TO item WHERE (id >0 AND id <1000001) DO INSTEAD INSERT INTO item_1 VALUES (NEW.*);
CREATE RULE item_2_rule AS ON INSERT TO item WHERE (id >1000000 AND id <2000001) DO INSTEAD INSERT INTO item_2 VALUES (NEW.*);
...
Then the migration to the partitioned tables:
INSERT INTO item_1 SELECT * FROM item WHERE (id >0 AND id <1000001);
INSERT INTO item_2 SELECT * FROM item WHERE (id >1000000 AND id <2000001);
...
And finally I try to delete all the rows from the Item table:
DELETE FROM ONLY item;
But I get this error:
ERROR: update or delete on table "item" violates foreign key constraint "item_audit_item_id_fkey" on table "item_audit"
SQL state: 23503
Detail: Key (id)=(1) is still referenced from table "item_audit".
So, is it possible to drop the data from the main table Item so that the rows exist only in the partition tables?
Are there alternatives, i.e. a different way to set up the partitioning?
If you use inheritance partitioning, you won't be able to have foreign keys pointing to the partitioned table. You have to drop the constraint before you can delete the rows.
I recommend using declarative partitioning. If you upgrade to v12 or later, you can have a foreign key pointing to a partitioned table, but since the primary key on a partitioned table has to contain the partitioning key, you may have to add that column to the foreign key as well (here the partitioning column is id itself, so a primary key on id alone works).
Since partitioning is mostly useful for mass data deletion, it might make sense to partition item_audit the same way.
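A minimal sketch of the declarative version (v12 or later, since item_audit references the partitioned table; all columns other than the keys are omitted):

-- id is itself the partitioning column, so it can be the primary key alone:
CREATE TABLE item (
    id bigint PRIMARY KEY
) PARTITION BY RANGE (id);

CREATE TABLE item_1 PARTITION OF item FOR VALUES FROM (1) TO (1000001);
CREATE TABLE item_2 PARTITION OF item FOR VALUES FROM (1000001) TO (2000001);

-- from v12 on, a foreign key can reference the partitioned table directly:
CREATE TABLE item_audit (
    id bigserial PRIMARY KEY,
    item_id bigint REFERENCES item (id)
);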

How to safely reindex primary key on postgres?

We have a huge table whose primary key index is bloated. We constantly archive old records from that table.
We reindex other columns by recreating the index concurrently and dropping the old one. This is to avoid interfering with production traffic.
But this is not possible for a primary key since there are foreign keys depending on it. At least based on what we have tried.
What's the right way to reindex the primary key safely without blocking DML statements on the table?
REINDEX CONCURRENTLY seems to work as well. I tried it on my database and didn't get any error.
REINDEX INDEX CONCURRENTLY <indexname>;
I think it does something similar to what jlandercy has described in his answer. While the reindex was running, I saw an index with the suffix _ccnew while the existing one remained intact. I assume the new index was then renamed to the original name after the old one was dropped, since I end up with a single unique primary index on my table.
I am using Postgres v12.7 (REINDEX ... CONCURRENTLY requires v12 or later).
You can use pg_repack for this.
pg_repack is a PostgreSQL extension which lets you remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes.
It doesn't hold exclusive locks for the whole process. It still takes some locks, but only for short periods. You can check the details here: https://reorg.github.io/pg_repack/
To perform repack on indexes, you can try:
pg_repack -t table_name --only-indexes
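Note that pg_repack needs its server-side extension registered in the target database before the client will run; a minimal sketch:

-- run once per database (requires the pg_repack server packages installed):
CREATE EXTENSION pg_repack;

After that, the client connects with the usual libpq options, e.g. pg_repack -d mydb -t table_name --only-indexes (mydb being a placeholder database name).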
TL;DR
Just reindex it as other index using its index name:
REINDEX INDEX <indexname>;
MCVE
Let's create a table with a Primary Key constraint, which is backed by an index:
CREATE TABLE test(
Id BIGSERIAL PRIMARY KEY
);
Looking at the catalogue we see the constraint name:
SELECT conname FROM pg_constraint WHERE conname LIKE 'test%';
-- "test_pkey"
Having the name of the index, we can reindex it:
REINDEX INDEX test_pkey;
You can also fix the constraint name at creation time:
CREATE TABLE test(
Id BIGSERIAL NOT NULL
);
ALTER TABLE test ADD CONSTRAINT myconstraint PRIMARY KEY(Id);
If you must address concurrency, then use the method a_horse_with_no_name suggested: create a unique index concurrently:
-- Ensure Uniqueness while recreating the Primary Key:
CREATE UNIQUE INDEX CONCURRENTLY tempindex ON test USING btree(Id);
-- Drop PK:
ALTER TABLE test DROP CONSTRAINT myconstraint;
-- Recreate PK:
ALTER TABLE test ADD CONSTRAINT myconstraint PRIMARY KEY(Id);
-- Drop redundant Index:
DROP INDEX tempindex;
To check Index existence:
SELECT * FROM pg_index WHERE indexrelid::regclass = 'tempindex'::regclass;
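A variant of the drop-and-recreate steps worth knowing (a sketch using standard PostgreSQL syntax, not part of the answer above): ADD CONSTRAINT ... PRIMARY KEY USING INDEX promotes the concurrently built index into the primary key itself, so the recreate step needs no second scan and leaves no redundant index behind:

CREATE UNIQUE INDEX CONCURRENTLY tempindex ON test USING btree(Id);
-- promote the index to primary key; it is renamed to the constraint name.
-- Id is already NOT NULL here, so no extra validation pass is required:
ALTER TABLE test DROP CONSTRAINT myconstraint;
ALTER TABLE test ADD CONSTRAINT myconstraint PRIMARY KEY USING INDEX tempindex;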

Altering a table to add a foreign key constraint takes a very long time

I have one table, manual_errors_archive. I need to add a foreign key to this table referencing the values table. The values table has 800,000 records, while the manual_errors_archive table does not have any records.
ALTER TABLE manual_errors_archive
ADD CONSTRAINT fk_manua_reference_value
FOREIGN KEY (value_id)
REFERENCES values;
The Postgres version I am using is 9.1.
But this ran for more than an hour, after which I canceled the process. Any idea how to optimize it?
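No answer is recorded here, but one hedged suggestion, reusing the NOT VALID / VALIDATE pattern from the first question in this thread (NOT VALID is available for foreign key constraints on 9.1): split the ADD CONSTRAINT so the initial lock is brief and the check runs separately:

-- add the constraint without checking existing rows:
ALTER TABLE manual_errors_archive
    ADD CONSTRAINT fk_manua_reference_value
    FOREIGN KEY (value_id)
    REFERENCES values
    NOT VALID;

-- validate in a second step (trivial here, since the table has no rows):
ALTER TABLE manual_errors_archive
    VALIDATE CONSTRAINT fk_manua_reference_value;

Since manual_errors_archive is empty, there is nothing to validate, so an hour-long run more likely means the ALTER TABLE was waiting on a lock held by another session than that it was scanning data.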