NULL in column used for range partitioning in Postgres - postgresql

I have a table partitioned by range in Postgres 10.6. Is there a way to tell one of its partitions to accept NULL for the column used as partition key?
The reason I need this is: my table size is 200GB and it's actually not yet partitioned. I want to partition it going forward, so I thought I would create an initial partition including all of the current rows, and then at the start of each month I would create another partition for that month's data.
The issue is, currently this table doesn't have the column I'll use for partitioning, so I want to add the column (initially null) and then tell that initial partition to hold all rows that have null in the partitioning key.
Another option would be to not add the column as null but to set an initial date value, but that would be time and space consuming because of the size of that table.

I would upgrade to v11 and initially define the partitioned table with just a default partition that contains all the NULL values.
Then you can add other partitions and gradually move the data by updating the NULL values.

Related

Partitioned table - is adding an index on the partition column unnecessary?

We have a table partitioned on a date column.
Some of my colleagues believe that this means that column is automatically indexed. Having looked for evidence of this I don't believe that is so. Who is right?
The manual https://www.postgresql.org/docs/current/ddl-partitioning.html (section 5.11.2.1. Example) says:
Create an index on the key column(s), as well as any other indexes you
might want, on the partitioned table. (The key index is not strictly
necessary, but in most scenarios it is helpful.) This automatically
creates a matching index on each partition, and any partitions you
create or attach later will also have such an index. An index or
unique constraint declared on a partitioned table is “virtual” in the
same way that the partitioned table is: the actual data is in child
indexes on the individual partition tables.
This suggests to me we should create the index.
Each partition has ~350K rows. Since we often query by date range on that column would each partition get its own index? Or one massive one across all partitions?
Would adding an index on this column improve or degrade performance?
There is not automatically an index on the partition column.
If you did list partitioning and every list only contains one date (i.e. every date has its own partition) then I don't think also having an index on that column would be helpful. There is not extra information in the column beyond what the partitioning already knows about.
If you did range partitioning on quarter or year, but often query by a specific date, then the index would likely be useful as it provides a lot of extra specificity.

Postgres table partition by list limit

I wonder if there is a limit for partition table by list where each subpartition table contains only one element.
For example, I have this partition table:
CREATE TABLE whatever (
city_id int not null,
country_id int not null,
) PARTITION BY LIST (country_id);
And I create millions of subpartition tables:
CREATE TABLE whatever_1 PARTITION OF whatever
FOR VALUES IN (1);
CREATE TABLE whatever_2 PARTITION OF whatever
FOR VALUES IN (2);
# until millions...
CREATE TABLE whatever_10000000 PARTITION OF whatever
FOR VALUES IN (10000000);
Assuming an index on country_id, would that still work?
Or Will I hit the 65000 limit as described here?
Even with PostgreSQL v13, anything that goes beyond at most a few thousand partitions won't work well, and it's better to stay lower.
The reason is that when you use a partitioned table in an SQL statement, the optimizer has to consider all partitions separately. It has to figure out which of the partitions it has to use and which not, and for all partitions that it uses it has to come up with an execution plan. Consequently, planning time will go up as the number of partitions increases. This may not matter for large analytical queries, where execution time dominates, but it will considerably slow down the execution of small statements.
Use longer lists or use range partitioning.

How to create multiple column partitions for postgresql db, one for time range and one for specific sensor ID?

I have one application to store and query the time series data from multiple sensors. The sensor readings of multiple months needed to be stored. And we also need add more and more sensors in the future. So I need consider the scalability in two dimensions, time and sensor ID. We are using postgresql db for data storage. In addition, to simplify the data query layer design, we want to use one table name to query all the data.
In order to improve the query efficiency and scalability, I am considering to use Partitions for this use case. And I want to create the partitions based on two columns. RANGE for the event time for the readings. And VALUE for the sensor ID. So under the paritioned table, I want to get some sub table as sensor_readings_1week_Oct_2020_ID1, sensor_readings_2week_Oct_2020_ID1, sensor_readings_1week_Oct_2020_ID2. I knew PostgreSQL supports multiple columns Partition, but from most examples I can only see RANGE are used for all the columns. One example is as below. How can I create the multiple column partitions, one is for the time RANGE, another is based on the specific sensor ID? Thanks!
CREATE TABLE p1 PARTITION OF tbl_range
FOR VALUES FROM (1, 110, 50) TO (20, 200, 200);
Or are there some better solutions besides Paritions for this use case?
The two level partitions is a good solution for my use case. It improves the efficiency a lot.
CREATE TABLE sensor_readings (
id bigserial NOT NULL,
create_date_time timestamp NULL DEFAULT now(),
value int8 NULL
) PARTITION BY LIST (id);
CREATE TABLE sensor_readings_id1
PARTITION OF sensor_readings
FOR VALUES IN (111) PARTITION BY RANGE(create_date_time);
CREATE TABLE sensor_readings_id1_wk1
PARTITION OF sensor_readings_id1
FOR VALUES FROM (''2020-10-01 00:00:00'') TO (''2020-10-20 00:00:00'');

How to update a hash partitioned column in Oracle 12c?

I have a table that is partitioned by hash on a column.
CREATE TABLE TEST(
ACCOUNT_NUMBER VARCHAR(20)
)
PARTITION BY HASH (ACCOUNT_NUMBER)
PARTITIONS 16
Now, I want to update the account_number column itself in the table because of certain requirements.
As this column is partitioned, I'm not able to issue an update command on the table like
Update test set account_number = new_value
as it results to the below error:
Error is: ORA-14402: UPDATING PARTITION WOULD CAUSE PARTITION CHANGE.
Row movement is set to disable for the table.
The one way I know is to enable row movement but I also want to explore other options.
Could you please advise me on how to solve this?

DB2 Partitioning

I know how partitioning in DB2 works but I am unaware about where this partition values exactly get stored. After writing a create partition query, for example:
CREATE TABLE orders(id INT, shipdate DATE, …)
PARTITION BY RANGE(shipdate)
(
STARTING '1/1/2006' ENDING '12/31/2006'
EVERY 3 MONTHS
)
after running the above query we know that partitions are created on order for every 3 month but when we run a select query the query engine refers this partitions. I am curious to know where this actually get stored, whether in the same table or DB2 has a different table where partition value for every table get stored.
Thanks,
table partitions in DB2 are stored in tablespaces.
For regular tables (if table partitioning is not used) table data is stored in a single tablespace (not considering LOBs).
For partitioned tables multiple tablespaces can used for its partitions.
This is achieved by the "" clause of the CREATE TABLE statement.
CREATE TABLE parttab
...
in TBSP1, TBSP2, TBSP3
In this example the first partition will be stored in TBSP1, the second in TBSP2, The third in TBSP3, the fourth in TBSP1 and so on.
Table partitions are named in DB2 - by default PART1 ..PARTn - and all these details can be looked up in the system catalog view SYSCAT.DATAPARTITIONS including the specified partition ranges.
See also http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0021353.html?cp=SSEPGG_10.5.0%2F2-12-8-27&lang=en
The column used as partitioning key can be looked up in syscat.datapartitionexpression.
There is also a long syntax for creating partitioned tables where partition names can be explizitly specified as well as the tablespace where the partitions will get stored.
For applications partitioned tables look like a single normal table.
Partitions can be detached from a partitioned table. In this case a partition is "disconnected" from the partitioned table and converted to a table without moving the data (or vice versa).
best regards
Michael
After a bit of research I finally figure it out and want to share this information with others, I hope it may come useful to others.
How to see this key values ? => For LUW (Linux/Unix/Windows) you can see the keys in the Table Object Editor or the Object Viewer Script tab. For z/OS there is an Object Viewer tab called "Limit Keys". I've opened issue TDB-885 to create an Object Viewer tab for LUW tables.
A simple query to check this values:
SELECT * FROM SYSCAT.DATAPARTITIONS
WHERE TABSCHEMA = ? AND TABNAME = ?
ORDER BY SEQNO
reference: http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0021353.html?lang=en
DB2 will create separate Physical Locations for each partition. So each partition will have its own Table-space. When you SELECT on this partitioned Table your SQL may directly go to a single partition or it may span across many depending on how your SQL is. Also, this may allow your SQL to run in parallel i.e. many TS can be accessed concurrently to speed up the SELECT.