I wonder if there is a limit for a table partitioned by list where each partition contains only one value.
For example, I have this partitioned table:
CREATE TABLE whatever (
city_id int not null,
country_id int not null
) PARTITION BY LIST (country_id);
And I create millions of partitions:
CREATE TABLE whatever_1 PARTITION OF whatever
FOR VALUES IN (1);
CREATE TABLE whatever_2 PARTITION OF whatever
FOR VALUES IN (2);
-- and so on, up to millions...
CREATE TABLE whatever_10000000 PARTITION OF whatever
FOR VALUES IN (10000000);
Assuming an index on country_id, would that still work?
Or will I hit the 65000 limit as described here?
Even with PostgreSQL v13, anything that goes beyond at most a few thousand partitions won't work well, and it's better to stay lower.
The reason is that when you use a partitioned table in an SQL statement, the optimizer has to consider all partitions separately. It has to figure out which partitions it needs and which it doesn't, and for every partition it does use it has to come up with an execution plan. Consequently, planning time goes up as the number of partitions increases. This may not matter for large analytical queries, where execution time dominates, but it will considerably slow down the execution of small statements.
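You can observe this directly: EXPLAIN with the SUMMARY option reports planning time, so a quick check (a hedged illustration using the table from the question) is to compare the reported figure at different partition counts:
EXPLAIN (SUMMARY ON)
SELECT * FROM whatever WHERE country_id = 42;
-- the "Planning Time" line in the output grows with the number of partitions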
Use longer lists or use range partitioning.
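As a sketch of the second suggestion (partition names and bounds are illustrative), range partitioning lets each partition cover many country IDs instead of exactly one:
CREATE TABLE whatever (
city_id int not null,
country_id int not null
) PARTITION BY RANGE (country_id);
-- each partition now holds 10000 values of country_id
CREATE TABLE whatever_p1 PARTITION OF whatever
FOR VALUES FROM (1) TO (10001);
CREATE TABLE whatever_p2 PARTITION OF whatever
FOR VALUES FROM (10001) TO (20001);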
Related
This question is for a database using PostgreSQL 12.3; we are using declarative partitioning and ON CONFLICT against the partitioned table is possible.
We had a single table representing application event data from client activity. Each row has a client_id int4 field and a dttm timestamp field. There is also an event_id text field and a project_id int4 field which together formed the basis of a composite primary key. (While rare, it was possible for two event records to have the same event_id but different project_id values for the same client_id.)
The table became non-performant, and we saw that queries most often targeted a single client in a specific timeframe. So we shifted the data into a partitioned table: first by LIST (client_id) and then each partition is further partitioned by RANGE(dttm).
We are running into problems shifting our upsert strategy to work with this new table. We used to upsert with a single query: INSERT INTO table SELECT * FROM staging_table ON CONFLICT (event_id, project_id) DO UPDATE ...
But since the columns that determine uniqueness (event_id and project_id) are not part of the partitioning strategy (dttm and client_id), I can't do the same thing with the partitioned table. I thought I could get around this by building UNIQUE indexes on each partition on (project_id, event_id) but the ON CONFLICT is still not firing because there is no such unique index on the parent table (there can't be, since it doesn't contain all partitioning columns). So now a single upsert query appears impossible.
I've found two solutions so far but both require additional changes to the upsert script that seem like they'd be less performant:
I can still do an INSERT INTO table_partition_subpartition ... ON CONFLICT (event_id, project_id) DO UPDATE ... but that requires explicitly determining the name of the partition for each row instead of just INSERT INTO table ... once for the entire dataset.
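For illustration, that first workaround would look roughly like this per target partition (the leaf partition name, filter bounds, and updated column are hypothetical):
INSERT INTO table_client42_202010
SELECT * FROM staging_table
WHERE client_id = 42
AND dttm >= '2020-10-01' AND dttm < '2020-11-01'
ON CONFLICT (event_id, project_id)
DO UPDATE SET dttm = EXCLUDED.dttm;
-- works only because the unique index on (event_id, project_id) exists on that leaf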
I could implement the "old way" UPSERT procedure: https://www.postgresql.org/docs/9.4/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE but this again requires looping through all rows.
Is there anything else I could do to retain the cleanliness of a single, one-and-done INSERT INTO table SELECT * FROM staging_table ON CONFLICT () DO UPDATE ... while still keeping the partitioning strategy as-is?
Edit: if it matters, concurrency is not an issue here; there's just one machine executing the UPSERT into the main table from the staging table on a schedule.
What is the difference between a BRIN index and table partitioning in PostgreSQL? When should I use one instead of the other? They seem to provide very similar benefits and have similar use cases.
Example
Suppose we have the following table structure:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
store_id INT,
client_id INT,
created_at timestamp,
information jsonb
);
that has the following characteristics:
orders can only be inserted; deletions are not allowed, and updates are very rare and never involve the created_at column
the created_at column contains the timestamp of the row's insertion into the database, so the values in the column are strictly increasing
almost every query uses the created_at column in a condition, and some may also use the store_id and client_id columns
the most accessed rows are the most recent ones in terms of the created_at column
some queries may return a few records (for example, analyzing a single record or the records created in a small time interval) while others may scan a vast number of records (for example, aggregate functions for a dashboard)
I have chosen this example because it's very common and, in my opinion, both approaches could be used. In this case, which should I choose: a BRIN index on the whole table, or a partitioned table with perhaps a btree index (or just a simple btree index without partitioning)? Does the table size influence the choice?
I have used both features (although I'll caveat that my experience with partitioning is from back when you had to use inheritance + constraints, before the introduction of CREATE TABLE ... PARTITION BY). You are correct that they seem similar-ish on a surface level, but they function by completely different mechanisms.
Table partitioning basically works as follows: replace all references to table with (select * from table_partition1 union all select * from table_partition2 /* repeat for all partitions */). The partitions will have a constraint on the partition columns, so that if those columns appear in a WHERE clause, the constraints can be applied up-front to prune which partitions are actually scanned. IOW, if table_partition1 has CHECK(client_id=1), and your WHERE has client_id=2, table_partition1 will be skipped, since the table constraint automatically excludes all rows from this partition from passing that WHERE.
BRIN indexes, in contrast, choose a block size for the table and then, for each block, record the min/max bounds of the indexed column. This allows WHERE conditions to skip entire blocks when we can see, say, that the maximum created_at in a particular block of rows is below a created_at >= {some_value} clause in your WHERE.
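For concreteness, a BRIN index on the example table would be created like this (pages_per_range is the block-size knob described above, shown here with its default of 128 heap pages):
CREATE INDEX orders_created_at_brin ON orders
USING brin (created_at) WITH (pages_per_range = 128);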
I can't tell you a definitive answer for your case as to which would work better. Well, that's not true, actually: the definitive answer is, "benchmark it for your own data" ;)
This is kind of fuzzy, but my general feeling is that BRIN is lightweight, and table partitioning is not. BRIN is something that can be added on to an existing table without much trouble, the indexes themselves are very small, and the impact on writes is not major (at least, not without inordinately many indices). Table partitioning, on the other hand, is a different way of representing the data on-disk; you are actually determining into which data files particular rows will be written. This requires a much more involved migration process when introducing it to an existing dataset.
However, the set of query optimizations available for table partitioning is much greater. Not only is there the constraint exclusion I described above, but you can also have indices (even BRIN ones!) on each individual partition. Of course, you can also have BRIN + other indices on a single-big-table, but I'm not sure that is particularly helpful IRL.
A few other thoughts: BRIN is good for monotonic data (timestamps, incrementing IDs, etc.); the more correlated the on-disk ordering is to the indexed value, the more effective a BRIN index can be at pruning blocks to be scanned. Things like customer IDs, however, are unlikely to work well with BRIN; any given block of rows is likely to contain at least one relatively low and one relatively high ID. However, fields like that work quite well for partitioning: a partition-per-client, or partitioning on the modulus of a customer ID (which would more commonly be called sharding), is a good way of scaling horizontally, almost without bound.
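Since PostgreSQL 11, that modulus scheme is built in as hash partitioning; a minimal sketch with hypothetical table and column names:
CREATE TABLE client_events (
client_id int not null,
payload jsonb
) PARTITION BY HASH (client_id);
-- four partitions, each holding the client_ids with the same hash remainder
CREATE TABLE client_events_h0 PARTITION OF client_events
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE client_events_h1 PARTITION OF client_events
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE client_events_h2 PARTITION OF client_events
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE client_events_h3 PARTITION OF client_events
FOR VALUES WITH (MODULUS 4, REMAINDER 3);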
Any update, even if it does not change the indexed column, will make a BRIN index pretty useless (unless it is a HOT update). Even without that, there are differences, for example:
partitioning allows you to get rid of lots of data efficiently (see the sketch below); a BRIN index won't help there
a partitioned table allows one autovacuum worker per partition, which improves autovacuum performance
But if your only concern is to efficiently select all rows for a certain value of the index or partitioning key, both may offer about the same benefit.
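To illustrate the first point above: removing a partition's worth of data is a metadata operation rather than a row-by-row DELETE (table and partition names hypothetical):
-- near-instant, compared to DELETE FROM orders WHERE created_at < '2019-02-01'
ALTER TABLE orders DETACH PARTITION orders_2019_01;
DROP TABLE orders_2019_01;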
I have an application that stores and queries time-series data from multiple sensors. Sensor readings for several months need to be stored, and we will also need to add more and more sensors in the future, so I have to consider scalability in two dimensions: time and sensor ID. We are using a PostgreSQL database for storage. In addition, to simplify the design of the data query layer, we want to use one table name to query all the data.
To improve query efficiency and scalability, I am considering using partitions for this use case, and I want to create the partitions based on two columns: RANGE for the event time of the readings, and LIST for the sensor ID. Under the partitioned table I want to get subtables such as sensor_readings_1week_Oct_2020_ID1, sensor_readings_2week_Oct_2020_ID1, and sensor_readings_1week_Oct_2020_ID2. I know PostgreSQL supports multi-column partitioning, but in most examples I can only see RANGE used for all the columns; one such example is below. How can I create multi-column partitions, one on a time RANGE and the other on the specific sensor ID? Thanks!
CREATE TABLE p1 PARTITION OF tbl_range
FOR VALUES FROM (1, 110, 50) TO (20, 200, 200);
Or are there better solutions besides partitioning for this use case?
Two-level partitioning is a good solution for my use case. It improves efficiency a lot.
CREATE TABLE sensor_readings (
id bigserial NOT NULL,
create_date_time timestamp NULL DEFAULT now(),
value int8 NULL
) PARTITION BY LIST (id);
CREATE TABLE sensor_readings_id1
PARTITION OF sensor_readings
FOR VALUES IN (111) PARTITION BY RANGE (create_date_time);
CREATE TABLE sensor_readings_id1_wk1
PARTITION OF sensor_readings_id1
FOR VALUES FROM ('2020-10-01 00:00:00') TO ('2020-10-20 00:00:00');
I'm working on table partitioning in PostgreSQL.
I created a partition for my master table:
CREATE TABLE head_partition_table PARTITION OF master_table
FOR VALUES FROM (DATE_START) TO (DATE_END)
PARTITION BY RANGE (ENTITY_ID, GROUP_NAME);
After that, I want to divide head_partition_table into smaller partitions, so I wrote code:
CREATE TABLE subpartition_table PARTITION OF head_partition_table
FOR VALUES FROM ('0', 'A') TO ('0', 'Z');
I can't find how I can specify individual values rather than a range.
Something like
CREATE TABLE subpartition_table_1 PARTITION OF head_partition_table
FOR VALUES ('0', 'A');
CREATE TABLE subpartition_table_2 PARTITION OF head_partition_table
FOR VALUES ('0', 'Z');
I get a syntax error at or near "(".
Is this possible?
P.S. I tried PARTITION BY LIST, but in that case, I can use just one field.
You could get the list partitioning you want by introducing another layer of partitions:
CREATE TABLE head_partition_table PARTITION OF master_table
FOR VALUES FROM ('2019-01-01') TO ('2020-01-01')
PARTITION BY LIST (entity_id);
CREATE TABLE subpartition1 PARTITION OF head_partition_table
FOR VALUES IN ('0')
PARTITION BY LIST (group_name);
CREATE TABLE subsubpartition1 PARTITION OF subpartition1
FOR VALUES IN ('A');
But this is more an academic exercise than something useful.
Anything exceeding at most a few hundred partitions will not perform well at all.
I have a table that is initially partitioned by day. At the end of every day no more records will be added to that partition, so I cluster the table on an index and then do a lot of number crunching and aggregation on that table (using the index I clustered on):
CLUSTER table_a_20181104 USING table_a_20181104_index1;
After a few days (typically a week) I merge the partition for one day into a larger partition that contains all the days' data for that month. I use this SQL to achieve that:
WITH moved_rows AS
(
DELETE FROM table_a_20181104
RETURNING *
)
INSERT INTO table_a_201811
SELECT * FROM moved_rows;
After maybe a month or two I change the tablespace to move the data from an SSD to a conventional magnetic hard disk:
ALTER TABLE ... SET TABLESPACE ...
My initial clustering of the index at the end of the day definitely improves the performance of the queries run against it.
I know that clustering is a one-off command and needs to be repeated if new records are added/removed.
My questions are:
Do I need to repeat the clustering after merging the 'day' partition into the 'month' partition?
Do I need to repeat the clustering after altering the tablespace?
Do I need to repeat the clustering if I VACUUM the partition?
Moving the data from one partition to the other will destroy the clustering, so you'll need to re-cluster after it.
ALTER TABLE ... SET TABLESPACE will just copy the table files as they are, so clustering will be preserved.
VACUUM does not move the rows, so clustering will also be preserved.
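So after each merge you would repeat the clustering on the monthly partition, along these lines (the index name is hypothetical, following the naming scheme from the question):
CLUSTER table_a_201811 USING table_a_201811_index1;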