Adding indexes to a Postgres table after it is created and having partitions - postgresql

I have a Postgres table named services, and it has columns called id, mac_addr, dns_name, hash. It is partitioned based on mac_addr, so the partition table names look like services_3eeeea123e3 and so on. There are around 20K partitions based on mac_addrs.
Q1: No index was created when the tables were created. So now, when I try to add an index with CREATE INDEX idx_services_id on services (id), it throws an error: ERROR: cannot create index on partitioned table "services"
But I am able to add an index to an individual partition: CREATE INDEX idx_services_3eeeea123e3 on services_3eeeea123e3 (id).
So do I have to create an index on each partition table now? Is there a way to create an index on the base table (services) itself, which will automatically create an index on each partition table?
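If the server were on PostgreSQL 11 or later, CREATE INDEX on the parent would cascade to every partition automatically; the error above suggests an older release, so each partition needs its own index. One way to avoid typing ~20K statements by hand is to generate them from the catalog; the query below is only a sketch (it assumes the partitions are direct children of services and that psql's \gexec is available to run the output):
-- Generate one CREATE INDEX statement per partition of "services";
-- in psql, end the query with \gexec to execute the generated statements.
SELECT format('CREATE INDEX ON %I (id);', c.relname)
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
JOIN pg_class p ON p.oid = i.inhparent
WHERE p.relname = 'services';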
Q2: When I run a select query, it is fast when I use the partition table directly; however, using the base table is very slow. Any idea what the reason could be?
Fast: SELECT id, dns_name, hash from services_3eeeea123e3 where id='123232'
Very slow: SELECT id, dns_name, hash from services where mac_addr='3eeeea123e3' and id='123232'
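For Q2, comparing the plans of the two statements usually shows where the time goes: whether partition pruning happened and whether any index was used. A sketch of the comparison (the same queries as above, just wrapped in EXPLAIN):
-- With ~20K partitions and no index on id, the query against the parent has
-- to plan for, and possibly scan, far more relations than the direct query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, dns_name, hash FROM services_3eeeea123e3 WHERE id = '123232';
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, dns_name, hash FROM services WHERE mac_addr = '3eeeea123e3' AND id = '123232';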

Related

How do you manage UPSERTs on PostgreSQL partitioned tables for unique constraints on columns outside the partitioning strategy?

This question is for a database using PostgreSQL 12.3; we are using declarative partitioning and ON CONFLICT against the partitioned table is possible.
We had a single table representing application event data from client activity. Therefore, each row has fields client_id int4 and a dttm timestamp field. There is also an event_id text field and a project_id int4 field which together formed the basis of a composite primary key. (While rare, it was possible for two event records to have the same event_id but different project_id values for the same client_id.)
The table became non-performant, and we saw that queries most often targeted a single client in a specific timeframe. So we shifted the data into a partitioned table: first by LIST (client_id) and then each partition is further partitioned by RANGE(dttm).
We are running into problems shifting our upsert strategy to work with this new table. We used to perform a query of INSERT INTO table SELECT * FROM staging_table ON CONFLICT (event_id, project_id) DO UPDATE ...
But since the columns that determine uniqueness (event_id and project_id) are not part of the partitioning strategy (dttm and client_id), I can't do the same thing with the partitioned table. I thought I could get around this by building UNIQUE indexes on each partition on (project_id, event_id) but the ON CONFLICT is still not firing because there is no such unique index on the parent table (there can't be, since it doesn't contain all partitioning columns). So now a single upsert query appears impossible.
I've found two solutions so far but both require additional changes to the upsert script that seem like they'd be less performant:
I can still do an INSERT INTO table_partition_subpartition ... ON CONFLICT (event_id, project_id) DO UPDATE ... but that requires explicitly determining the name of the partition for each row instead of just INSERT INTO table ... once for the entire dataset.
I could implement the "old way" UPSERT procedure: https://www.postgresql.org/docs/9.4/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE but this again requires looping through all rows.
Is there anything else I could do to retain the cleanliness of a single, one-and-done INSERT INTO table SELECT * FROM staging_table ON CONFLICT () DO UPDATE ... while still keeping the partitioning strategy as-is?
Edit: if it matters, concurrency is not an issue here; there's just one machine executing the UPSERT into the main table from the staging table on a schedule.
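For illustration, here is a minimal sketch of option 2 above (the loop-based upsert) run against the parent table; events, staging_events and the payload column are hypothetical names standing in for the real ones. It assumes, as the original composite primary key implies, that an (event_id, project_id) pair always belongs to a single client, and since there is no concurrency the UPDATE-then-INSERT pattern needs no retry loop:
-- Loop-based upsert: try an UPDATE first, INSERT if no row was updated.
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM staging_events LOOP
        UPDATE events
           SET payload = r.payload, dttm = r.dttm
         WHERE event_id   = r.event_id
           AND project_id = r.project_id
           -- client_id is redundant for correctness but lets the planner
           -- prune the UPDATE to a single LIST partition.
           AND client_id  = r.client_id;
        IF NOT FOUND THEN
            INSERT INTO events (client_id, dttm, event_id, project_id, payload)
            VALUES (r.client_id, r.dttm, r.event_id, r.project_id, r.payload);
        END IF;
    END LOOP;
END
$$;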

Future index creation on partition tables postgres [duplicate]

I am using PostgreSQL 14.1, and I re-created my live database using partitions for some tables.
Before the server went live I could create indexes normally, but now that it is live I can only create them using CONCURRENTLY; unfortunately, when I try to create an index concurrently I get an error.
running this:
create index concurrently foo on foo_table(col1, col2, col3);
provides the error:
ERROR: cannot create index on partitioned table "foo_table" concurrently
Now it's a live server and I cannot create indexes non-concurrently, and I need to create some indexes in order to improve performance. Any ideas how to do that?
Thanks
No problem. First, use CREATE INDEX CONCURRENTLY to create the index on each partition. Then use CREATE INDEX to create the index on the partitioned table. That will be fast, and the indexes on the partitions will become the partitions of the index.
Step 1: Create an index on the partitioned (parent) table
CREATE INDEX foo_idx ON ONLY foo (col1, col2, col3);
This step creates an invalid index. That way, none of the table partitions will get the index applied automatically.
Step 2: Create the index for each partition using CONCURRENTLY and attach to the parent index
CREATE INDEX CONCURRENTLY foo_idx_1
ON foo_1 (col1, col2, col3);
ALTER INDEX foo_idx
ATTACH PARTITION foo_idx_1;
Repeat this step for every partition index.
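Rather than writing the statement pair for every partition by hand, the per-partition commands can be generated from the catalog. The query below is a sketch using the foo/foo_idx naming from the steps above (pg_partition_tree is available from PostgreSQL 12 on; in psql the output can be executed with \gexec):
-- Generate a CREATE INDEX CONCURRENTLY and an ATTACH PARTITION statement
-- for every leaf partition of "foo".
SELECT format('CREATE INDEX CONCURRENTLY %I ON %I (col1, col2, col3);',
              c.relname || '_col_idx', c.relname),
       format('ALTER INDEX foo_idx ATTACH PARTITION %I;',
              c.relname || '_col_idx')
FROM pg_partition_tree('foo') t
JOIN pg_class c ON c.oid = t.relid
WHERE t.isleaf;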
Step 3: Verify that the parent index created at the beginning (Step 1) is valid. Once indexes for all partitions are attached to the parent index, the parent index is marked valid automatically.
SELECT * FROM pg_index WHERE pg_index.indisvalid = false;
The query should return zero rows. If that's not the case, check your script for mistakes.

Adding a partition from another DB to existing partitioned table in Postgres 12

Short description of a problem
I need to build a partitioned table "users" with 2 partitions located on separate servers (Moscow and Hamburg); each partition is a table with columns:
id - integer primary key with auto increment
region - smallint partition key, which equals either 100 for Hamburg or 200 for Moscow
login - unique character varying with length of 100.
I intended to make sequences for id as n*1000 + 100 for Hamburg and n*1000 + 200 for Moscow, so just by looking at the primary key I will know which partition it belongs to.
region is intended to be read only and never change after creation, so no records will move between partitions.
SELECT queries must be able to return records from all partitions and UPDATE queries must be able to modify records on all partitions, INSERT/DELETE queries must be able to add/delete records only to local partition, so data stored in them is not completely isolated.
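A sketch of that design in SQL (the column types and sequence names are assumptions based on the description above):
-- Parent table, list-partitioned by region.
CREATE TABLE users (
    id     integer      NOT NULL,
    region smallint     NOT NULL,
    login  varchar(100) NOT NULL
) PARTITION BY LIST (region);
-- Per-site sequences: Hamburg yields 100, 1100, 2100, ... and Moscow yields
-- 200, 1200, 2200, ..., so the id alone tells which partition a row is in.
CREATE SEQUENCE users_id_seq_hamburg START WITH 100 INCREMENT BY 1000;
CREATE SEQUENCE users_id_seq_moscow  START WITH 200 INCREMENT BY 1000;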
What was done
Using pgAdmin4
I created a "test" table on Hamburg server, added all column info, marked it as partitioned table with partition key region and partition type List.
I created a "hamburg" partition in this table, adding primary key constraint as id,region and unique key constraint as login,region.
I created a "moscow" table on Moscow server with the same column info as "test"
I added postgres_fdw extension to Hamburg server, created Foreign server pointing to DB on Moscow server and User mapping.
I added "moscow" foreign table to Hamburg server pointing to "moscow" table on Moscow server.
What is my problem
I couldn't figure out how to attach this foreign table as second partition to "test" table.
When I tried to attach the partition through the pgAdmin dialog in the "test" table's partition properties, it showed an error: cannot unpack non-iterable Response object
When I tried to add partition with query as follows:
ALTER TABLE public.test ATTACH PARTITION public.moscow FOR VALUES IN (200);
It shows me an error:
ERROR: cannot attach foreign table "moscow" as partition of partitioned table "test"
DETAIL: Table "test" contains unique indexes.
SQL state: 42809
I removed the unique constraint from the login column, but it shows the same error.
When I make a partitioned table with the same properties and both partitions initially located on the same server, everything works well, except that Postgres enforces login uniqueness per partition rather than across the whole table, but I assume this is a limitation of partitioning.
So, how can I attach a table located on the second server as partition to partitioned table located on the first one?
The error message is pretty clear: since you cannot create an index on a foreign table, PostgreSQL cannot create the partition of the unique index on it. But the unique index is required to implement the constraint.
See this source comment:
/*
* If we're attaching a foreign table, we must fail if any of the indexes
* is a constraint index; otherwise, there's nothing to do here. Do this
* before starting work, to avoid wasting the effort of building a few
* non-unique indexes before coming across a unique one.
*/
Either drop the unique constraint or don't use foreign tables as partitions.
OK, I was finally able to add a foreign table as a partition of a partitioned table.
It was necessary to drop the primary key on id and the unique constraint on login for the partitioned table.
After that I was able to attach the foreign table as a partition.
Later I added the primary key on id and the unique constraint on login for each local partition.
So in the end I have a globally unique id, as it is generated by per-DB sequences whose values never intersect. For login uniqueness I have to manually check the global table for an existing record before inserting.
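Roughly, the sequence described above looks like this (the constraint names are assumptions; adjust them to whatever the actual table reports):
-- Drop the constraints from the partitioned table.
ALTER TABLE public.test DROP CONSTRAINT test_pkey;
ALTER TABLE public.test DROP CONSTRAINT test_login_region_key;
-- Now the foreign table can be attached.
ALTER TABLE public.test ATTACH PARTITION public.moscow FOR VALUES IN (200);
-- Re-create the constraints on the local partition only.
ALTER TABLE public.hamburg ADD PRIMARY KEY (id, region);
ALTER TABLE public.hamburg ADD UNIQUE (login, region);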
P.S. Hopefully, this partitioning mechanism in postgres is suitable for geographically distant regions.

Postgres table with thousands of partition

I have a Postgres table (in Postgres 12) which is supposed to have thousands of partitions (200k at least) in the near future.
Here is how I am creating the parent table:
create table if not exists content (
key varchar(20) not NULL,
value json not null default '[]'::json
) PARTITION BY LIST(key)
And then adding any given child tables like:
create table if not exists content_123 PARTITION OF content for VALUES in ('123');
Also I am adding an index on top of the child table for quick access (since I will be accessing the child table directly):
create index if not exists content_123_idx on content_123 using btree(key)
Here is my question: I have never managed this many partitions in a Postgres table before, so I am just wondering: is there any downside to doing what I am doing? Also, as mentioned above, I will not be querying the parent table directly, but will read directly from the individual child tables.
With these table definitions, the index is completely useless: each partition holds exactly one key value, so an index on key cannot narrow anything down within a partition.
With 200000 partitions, query planning will become intolerably slow, and each SQL statement will need very many locks and open files. This won't work well.
Lump several keys together into a single partition (then the index might make sense).
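One reading of that suggestion, keeping LIST partitioning but giving each partition a whole group of keys (the grouping of values is just an example):
create table if not exists content_batch_1 PARTITION OF content
    for VALUES in ('123', '124', '125', '126');
-- With many keys per partition, an index on key can actually narrow the scan.
create index if not exists content_batch_1_idx on content_batch_1 using btree(key);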

Indexing on Timescaledb

I am testing some queries on the PostgreSQL extension TimescaleDB.
The table is called timestampdb, and I run some queries on it that look like this:
select id13 from timestampdb where timestamp1 >= '2010-01-01 00:05:00' and timestamp1 <= '2011-01-01 00:05:00'
select avg(id13)::numeric(10,2) from timestampdb where timestamp1 >= '2015-01-01 00:05:00' and timestamp1 <= '2015-01-01 10:30:00'
When I create a hypertable I do this:
create_hypertable('timestampdb', 'timestamp1')
The thing is that now I want to create an index on id13.
Should I try something like this:
create_hypertable('timestampdb', 'timestamp1'), import the data of the table, and then create index on timestampdb(id13)
or something like this:
create table timestampdb, then create_hypertable('timestampdb', 'timestamp1'), import the data, and then CREATE INDEX ON timestampdb (timestamp1, id13)
What is the correct way to do this?
You can create an index without the time dimension column, since you don't require it to be unique. Including the time dimension column in an index is only needed if the index is UNIQUE or a PRIMARY KEY, since TimescaleDB partitions a hypertable into chunks on the time dimension column, which is timestamp1 in the question. If the partitioning key includes space dimension columns in addition to time, they will need to be included too.
So in your case the following should be sufficient after the migration to hypertable:
create index on timestampdb(id13);
The question contains two queries, and neither of them needs an index on id13. Creating the index on id13 will be valuable if you expect queries different from those in the question, containing a condition or join on the id13 column.
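Putting it together, a minimal sketch of the order of operations (the table and column names come from the question; the column types and any other columns are assumptions):
-- Plain table first, then convert it to a hypertable, then load the data.
CREATE TABLE timestampdb (
    timestamp1 timestamptz NOT NULL,
    id13       double precision
    -- ... other columns ...
);
SELECT create_hypertable('timestampdb', 'timestamp1');
-- COPY / INSERT the data here.
-- Only worthwhile if future queries filter or join on id13:
CREATE INDEX ON timestampdb (id13);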