From the docs: "Indicates not to recurse creating indexes on partitions, if the table is partitioned. The default is to recurse.".
Do I understand correctly that the index will not be created on the existing partitions? What kind of index is created then, and on what?
The objective is to build a partitioned index with as little locking as possible.
Normally, you'd use CREATE INDEX CONCURRENTLY to create an index on each partition, then CREATE INDEX on the partitioned table. If the index definitions match, the previously created indexes will become partitions of the partitioned index. See this related question.
The potential problem with that is that all partitions will be locked at the same time. Instead, you can do it one partition at a time:
create the index with ON ONLY on the partitioned table (the index will be invalid)
create a matching index on each partition with CREATE INDEX CONCURRENTLY, then use ALTER INDEX ... ATTACH PARTITION to attach it as a partition of the index
once all partitions are attached, the partitioned index will become valid
When CREATE INDEX is invoked on a partitioned table, the default
behavior is to recurse to all partitions to ensure they all have
matching indexes. Each partition is first checked to determine whether
an equivalent index already exists, and if so, that index will become
attached as a partition index to the index being created, which will
become its parent index. If no matching index exists, a new index will
be created and automatically attached; the name of the new index in
each partition will be determined as if no index name had been
specified in the command. If the ONLY option is specified, no
recursion is done, and the index is marked invalid. (ALTER INDEX ...
ATTACH PARTITION marks the index valid, once all partitions acquire
matching indexes.) Note, however, that any partition that is created
in the future using CREATE TABLE ... PARTITION OF will automatically
have a matching index, regardless of whether ONLY is specified.
small demo example:
create table index_part (a int, b int) partition by range (a, b);
create table index_part1 partition of index_part for values from (0,0) to (10, 10);
create table index_part2 partition of index_part for values from (10,10) to (20, 20);
create index index_part_a_b_idx on only index_part (a, b);
The partitioned index is now INVALID:
\d+ index_part_a_b_idx
---
btree, for table "public.index_part", invalid
Partitions: index_part2_a_b_idx
Access method: btree
create index idxpart1_a_b_idx on index_part1 (a, b);
alter index index_part_a_b_idx attach partition idxpart1_a_b_idx;
The partitioned index is still INVALID:
\d+ index_part_a_b_idx
---
btree, for table "public.index_part", invalid
Partitions: idxpart1_a_b_idx
Access method: btree
then
create index idxpart2_a_b_idx on index_part2(a, b);
alter index index_part_a_b_idx attach partition idxpart2_a_b_idx;
Now the partitioned index is VALID:
select indisvalid from pg_index where indexrelid = 'index_part_a_b_idx'::regclass; -- returns true
I'm running PostgreSQL 13.
The section of the PostgreSQL documentation quoted below says I should be able to avoid a scan under an ACCESS EXCLUSIVE lock when the partition constraint is validated.
Before running the ATTACH PARTITION command, it is recommended to create a CHECK constraint on the table to be attached that matches the expected partition constraint, as illustrated above. That way, the system will be able to skip the scan which is otherwise needed to validate the implicit partition constraint. Without the CHECK constraint, the table will be scanned to validate the partition constraint while holding an ACCESS EXCLUSIVE lock on that partition.
But, when I create a new partition with a check constraint, insert data into it, and then attach it, an ACCESS EXCLUSIVE lock is held while the table is scanned.
The partitioned table:
CREATE TABLE IF NOT EXISTS tasks
(
task_time timestamp(6) with time zone not null,
task_sp_time timestamp(6) with time zone,
task_org_id text not null,
build_id text,
unit_id text,
unit_req numeric(12,2),
... 30 columns truncated ...,
constraint tasks_pkey1
primary key (task_org_id, task_time)
)
partition by RANGE(task_time);
task_time is not null and of type timestamp (6) with timezone.
-- create new empty partition table
CREATE TABLE tasks_partitions.tasks_20230111
(LIKE tasks INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
-- add CHECK constraint on new partition
ALTER TABLE tasks_partitions.tasks_20230111 ADD CONSTRAINT tmp_20230111
CHECK (task_time >= '2023-01-11 00:00:00+00' AND task_time <= '2023-01-11 23:59:59.999999+00');
-- select around 100 million rows into the new partition from an old default partition that has been detached.
INSERT INTO tasks_partitions.tasks_20230111
SELECT * FROM tasks_partitions.tasks_default_old where (task_time >= '2023-01-11 00:00:00+00' AND task_time <= '2023-01-11 23:59:59.999999+00');
-- attach partition
ALTER TABLE tasks ATTACH PARTITION tasks_partitions.tasks_20230111
FOR VALUES FROM ('2023-01-11 00:00:00+00') TO ('2023-01-11 23:59:59.999999+00');
Attaching the partition still holds the ACCESS EXCLUSIVE lock and the entire table is scanned.
The tasks table did have a default partition at one point, but I detached it and renamed it in order to resolve another issue. I currently do not have a default partition attached to tasks.
When I attach the partition from the example above, I see an ACCESS EXCLUSIVE lock on the new partition and a seemingly random relation, 468140. I cannot insert any records into the tasks table while the partition is being attached and the locks are in place.
If it helps, the query I run to see locks is:
SELECT a.datname,
l.relation::regclass,
l.transactionid,
l.mode,
l.GRANTED,
a.usename,
a.query,
a.query_start,
age(now(), a.query_start) AS "age",
a.pid
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
ORDER BY a.query_start;
The check constraint you are creating does not match the partition boundaries. You missed this statement from the documentation:
When creating a range partition, the lower bound specified with FROM is an inclusive bound, whereas the upper bound specified with TO is an exclusive bound.
So you should define the constraint as
ALTER TABLE tasks_partitions.tasks_20230111 ADD
CHECK (task_time >= '2023-01-11 00:00:00+00' AND
task_time < '2023-01-12 00:00:00+00');
and attach the partition with
ALTER TABLE tasks ATTACH PARTITION tasks_partitions.tasks_20230111
FOR VALUES FROM ('2023-01-11 00:00:00+00')
TO ('2023-01-12 00:00:00+00');
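The half-open range is easy to get wrong by hand, so here is a minimal Python sketch that derives both statements from a single date, assuming daily partitions with UTC boundaries. The function name daily_partition_sql is hypothetical, and the identifiers are interpolated as-is, so it is only suitable for trusted names.

```python
from datetime import date, timedelta


def daily_partition_sql(parent: str, partition: str, column: str, day: date) -> list[str]:
    """Build CHECK and ATTACH statements that use the same half-open range."""
    lower = f"{day.isoformat()} 00:00:00+00"
    # The TO bound is exclusive, so the upper bound is midnight of the next day.
    upper = f"{(day + timedelta(days=1)).isoformat()} 00:00:00+00"
    check = (
        f"ALTER TABLE {partition} ADD CHECK "
        f"({column} >= '{lower}' AND {column} < '{upper}');"
    )
    attach = (
        f"ALTER TABLE {parent} ATTACH PARTITION {partition} "
        f"FOR VALUES FROM ('{lower}') TO ('{upper}');"
    )
    return [check, attach]


for statement in daily_partition_sql(
    "tasks", "tasks_partitions.tasks_20230111", "task_time", date(2023, 1, 11)
):
    print(statement)
```

Because both statements are built from the same lower and upper values, the CHECK constraint always matches the partition bounds, which is what lets the scan be skipped.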
Specifically, this section of the PostgreSQL documentation says I should be able to avoid a scan of the default partition:
Before running the ATTACH PARTITION command, it is recommended to create a CHECK constraint on the table to be attached that matches the expected partition constraint, as illustrated above. That way, the system will be able to skip the scan which is otherwise needed to validate the implicit partition constraint. Without the CHECK constraint, the table will be scanned to validate the partition constraint while holding an ACCESS EXCLUSIVE lock on that partition. It is recommended to drop the now-redundant CHECK constraint after the ATTACH PARTITION is complete. If the table being attached is itself a partitioned table, then each of its sub-partitions will be recursively locked and scanned until either a suitable CHECK constraint is encountered or the leaf partitions are reached.
Similarly, if the partitioned table has a DEFAULT partition, it is recommended to create a CHECK constraint which excludes the to-be-attached partition's constraint. If this is not done then the DEFAULT partition will be scanned to verify that it contains no records which should be located in the partition being attached. This operation will be performed whilst holding an ACCESS EXCLUSIVE lock on the DEFAULT partition. If the DEFAULT partition is itself a partitioned table, then each of its partitions will be recursively checked in the same way as the table being attached, as mentioned above.
But, the below doesn't work for me:
task_time is not null and of type timestamp (6) with timezone.
-- create new empty partition table
CREATE TABLE tasks_partitions.tasks_20230111
(LIKE tasks INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
-- add CHECK constraint on new partition
ALTER TABLE tasks_partitions.tasks_20230111 ADD CONSTRAINT tmp_20230111
CHECK (task_time >= '2023-01-11 00:00:00+00' AND task_time <= '2023-01-11 23:59:59.999999+00');
-- add CHECK constraint on default partition that excludes new partition constraint
ALTER TABLE tasks_partitions.tasks_20230111 ADD CONSTRAINT tmp20230111_default
CHECK (task_time < '2023-01-11 00:00:00+00' and task_time > '2023-01-11 23:59:59.999999+00') NOT VALID;
-- attach partition
ALTER TABLE tasks ATTACH PARTITION tasks_partitions.tasks_20230111
FOR VALUES FROM ('2023-01-11 00:00:00+00') TO ('2023-01-11 23:59:59.999999+00');
Attaching the partition still holds the AccessExclusiveLock.
This operation will always take an ACCESS EXCLUSIVE lock on the table being attached and on the default partition, if there is one. The documentation only tells you how you can reduce the time the lock is held.
I am using PostgreSQL 14.1, and I re-created my live database using partitions for some tables.
Before the server went live I could create indexes normally, but now that it is live I can only create them with CONCURRENTLY. Unfortunately, when I try to create an index concurrently, I get an error.
running this:
create index concurrently foo on foo_table (col1, col2, col3);
provides the error:
ERROR: cannot create index on partitioned table "foo_table" concurrently
Now it's a live server, so I cannot create indexes non-concurrently, and I need some indexes to improve performance. Any ideas how to do that?
thanks
No problem. First, use CREATE INDEX CONCURRENTLY to create the index on each partition. Then use CREATE INDEX to create the index on the partitioned table. That will be fast, and the indexes on the partitions will become the partitions of the index.
Step 1: Create an index on the partitioned (parent) table
CREATE INDEX foo_idx ON ONLY foo (col1, col2, col3);
This step creates an invalid index. That way, none of the table partitions will get the index applied automatically.
Step 2: Create the index for each partition using CONCURRENTLY and attach to the parent index
CREATE INDEX CONCURRENTLY foo_idx_1
ON foo_1 (col1, col2, col3);
ALTER INDEX foo_idx
ATTACH PARTITION foo_idx_1;
Repeat this step for every partition index.
Step 3: Verify that the parent index created at the beginning (Step 1) is valid. Once indexes for all partitions are attached to the parent index, the parent index is marked valid automatically.
SELECT * FROM pg_index WHERE pg_index.indisvalid = false;
The query should return zero rows. If that's not the case, check your script for mistakes.
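With many partitions, writing Step 2 by hand gets tedious. The following Python sketch only generates the statements for the three steps in order. The function name index_build_statements and the child index naming scheme are hypothetical, and the partition names are assumed to be known and trusted.

```python
def index_build_statements(parent_table, parent_index, partitions, columns):
    """Return the SQL statements for the three steps, in execution order."""
    column_list = ", ".join(columns)
    # Step 1: index on the parent only; it stays invalid until every partition is attached.
    statements = [f"CREATE INDEX {parent_index} ON ONLY {parent_table} ({column_list});"]
    for partition in partitions:
        child_index = f"{partition}_{parent_index}"  # hypothetical naming scheme
        # Step 2: build each partition's index concurrently, then attach it to the parent.
        statements.append(
            f"CREATE INDEX CONCURRENTLY {child_index} ON {partition} ({column_list});"
        )
        statements.append(f"ALTER INDEX {parent_index} ATTACH PARTITION {child_index};")
    # Step 3: check that the parent index has become valid.
    statements.append(
        f"SELECT indisvalid FROM pg_index WHERE indexrelid = '{parent_index}'::regclass;"
    )
    return statements


for statement in index_build_statements(
    "foo", "foo_idx", ["foo_1", "foo_2"], ["col1", "col2", "col3"]
):
    print(statement)
```

CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so the generated statements have to be executed one at a time in autocommit mode rather than wrapped in a single transaction.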
I have a Postgres table named services with the columns id, mac_addr, dns_name and hash. It is partitioned on mac_addr, so the partition names look like services_3eeeea123e3 and so on, and there are around 20K partitions.
Q1: No index was created when the tables were created. Now, when I try to add an index with CREATE INDEX idx_services_id on services (id), it throws an error: ERROR: cannot create an index on partitioned table "services"
But I am able to add indexes to individual partitioned tables CREATE INDEX idx_services_3eeeea123e3 on services_3eeeea123e3 (id).
So do I have to create an index on each partition table now? Is there a way to create an index on the base table(services) itself, which will automatically create an index on each partition table?
Q2: When I run a select query, it is fast when I query the partition directly, but very slow when I query the base table. Any idea what could be the reason?
Fast: SELECT id, dns_name, hash from services_3eeeea123e3 where id='123232'
very slow: SELECT id,dns_name, hash from services where mac_addr='3eeeea123e3' and id='123232'
postgres 14
I have some table:
CREATE TABLE sometable (
id integer NOT NULL PRIMARY KEY UNIQUE ,
a integer NOT NULL DEFAULT 1,
b varchar(32) UNIQUE)
PARTITION BY RANGE (id);
But when I try to execute it, I get
ERROR: unique constraint on partitioned table must include all partitioning columns
If I execute the same table definition without PARTITION BY RANGE (id) and check the indexes, I get:
tablename | indexname       | indexdef
sometable | sometable_b_key | CREATE UNIQUE INDEX sometable_b_key ON public.sometable USING btree (b)
sometable | sometable_pkey  | CREATE UNIQUE INDEX sometable_pkey ON public.sometable USING btree (id)
So the unique constraints do exist there.
What's the problem, and how can I fix it?
On partitioned tables, all primary keys, unique constraints and unique indexes must contain the partition expression. That is because indexes on partitioned tables are implemented by individual indexes on each partition, and there is no way to enforce uniqueness across different indexes.
If you want to use partitioning, you have to sacrifice some consistency guarantees. There is no way around that. What you can do is create unique constraints on the partitions. That will guarantee uniqueness within each partition, but not global uniqueness.
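As an illustration of the rule, here is a small Python sketch that reports which partitioning columns a unique constraint leaves out. The helper missing_partition_columns is hypothetical, and it assumes the partition key consists of plain columns rather than expressions.

```python
def missing_partition_columns(constraint_columns, partition_columns):
    """Return the partitioning columns that a unique constraint fails to include."""
    included = set(constraint_columns)
    return [column for column in partition_columns if column not in included]


# sometable is partitioned by RANGE (id)
print(missing_partition_columns(["id"], ["id"]))       # PRIMARY KEY (id)
print(missing_partition_columns(["b"], ["id"]))        # UNIQUE (b)
print(missing_partition_columns(["b", "id"], ["id"]))  # UNIQUE (b, id)
```

Only the second call reports a missing column, which corresponds to the UNIQUE constraint on b in the question. A constraint on (b, id) would be accepted, but it only guarantees that the pair is unique, not b on its own.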
This limitation is also mentioned in the docs
5.11.2.3. Limitations
The following limitations apply to partitioned tables:
Unique constraints (and hence primary keys) on partitioned tables must
include all the partition key columns. This limitation exists because
the individual indexes making up the constraint can only directly
enforce uniqueness within their own partitions; therefore, the
partition structure itself must guarantee that there are not
duplicates in different partitions.
There is no way to create an exclusion constraint spanning the whole
partitioned table. It is only possible to put such a constraint on
each leaf partition individually. Again, this limitation stems from
not being able to enforce cross-partition restrictions.
https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE