Use of Postgres table inheritance

Since Postgres also supports partitioned tables, what is the use of child tables?
Suppose there is a table of users which has a column created_date. We can store the data in two ways:
1. We create many child tables of this user table and distribute the user data on the basis of created_date (say, one table for every date, like user_jan01_21).
2. We create a partitioned table with the partitioning key created_date.
Then what is the difference between these solutions?
Basically, I want to know what problem table inheritance can solve that partitioning cannot.
Another doubt I have: if I follow solution 1, and I query the user table without the ONLY keyword, will it scan all the child tables?
For example:
SELECT * FROM users WHERE created_date = current_date - 10;

If the objective is partitioning, as in your example, then there is no advantage in using table inheritance. Declarative partitioning is far superior in ease of use, performance and available features.
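For example, a minimal declarative setup for the users scenario above (a sketch; the table and column names come from the question, the rest is illustrative):

CREATE TABLE users (
    user_id      bigint NOT NULL,
    name         text,
    created_date date   NOT NULL
) PARTITION BY RANGE (created_date);

-- one partition per day, the declarative equivalent of user_jan01_21
CREATE TABLE users_2021_01_01 PARTITION OF users
    FOR VALUES FROM ('2021-01-01') TO ('2021-01-02');

-- queries on the parent are pruned to the matching partitions
SELECT * FROM users WHERE created_date = current_date - 10;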
Table inheritance has uses that are unrelated to partitioning. Features that partitioning doesn't offer are:
the child table can have additional columns
a table can inherit from more than one table
With table inheritance, if you select from the parent table, you will also get all results from the child tables, just as if you had used UNION ALL to combine the results.
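A minimal sketch of those inheritance-only features (all names are illustrative):

CREATE TABLE vehicles (
    id    bigint,
    brand text
);

CREATE TABLE trackable (
    gps_id bigint
);

-- a child table may add its own columns and inherit from several parents
CREATE TABLE trucks (
    payload_kg integer
) INHERITS (vehicles, trackable);

INSERT INTO trucks (id, brand, gps_id, payload_kg)
VALUES (1, 'Acme', 42, 12000);

-- returns the trucks row too, as if combined with UNION ALL
SELECT id, brand FROM vehicles;

-- ONLY restricts the scan to the parent table itself
SELECT id, brand FROM ONLY vehicles;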

Related

How do you manage UPSERTs on PostgreSQL partitioned tables for unique constraints on columns outside the partitioning strategy?

This question is for a database using PostgreSQL 12.3; we are using declarative partitioning and ON CONFLICT against the partitioned table is possible.
We had a single table representing application event data from client activity. Each row has a client_id int4 field and a dttm timestamp field. There is also an event_id text field and a project_id int4 field, which together formed the basis of a composite primary key. (While rare, it was possible for two event records to have the same event_id but different project_id values for the same client_id.)
The table became non-performant, and we saw that queries most often targeted a single client in a specific timeframe. So we shifted the data into a partitioned table: first by LIST (client_id) and then each partition is further partitioned by RANGE(dttm).
We are running into problems shifting our upsert strategy to work with this new table. We used to perform a query like INSERT INTO table SELECT * FROM staging_table ON CONFLICT (event_id, project_id) DO UPDATE ...
But since the columns that determine uniqueness (event_id and project_id) are not part of the partitioning strategy (dttm and client_id), I can't do the same thing with the partitioned table. I thought I could get around this by building UNIQUE indexes on each partition on (project_id, event_id) but the ON CONFLICT is still not firing because there is no such unique index on the parent table (there can't be, since it doesn't contain all partitioning columns). So now a single upsert query appears impossible.
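To make the constraint concrete, here is a rough sketch of the layout described above (the table name events and the exact column list are assumptions):

CREATE TABLE events (
    client_id  int4 NOT NULL,
    dttm       timestamp NOT NULL,
    event_id   text NOT NULL,
    project_id int4 NOT NULL
) PARTITION BY LIST (client_id);

CREATE TABLE events_c1 PARTITION OF events
    FOR VALUES IN (1) PARTITION BY RANGE (dttm);

CREATE TABLE events_c1_2020h1 PARTITION OF events_c1
    FOR VALUES FROM ('2020-01-01') TO ('2020-07-01');

-- legal on a leaf partition ...
CREATE UNIQUE INDEX ON events_c1_2020h1 (project_id, event_id);

-- ... but rejected on the parent: a unique index there
-- would have to include client_id and dttm as well
CREATE UNIQUE INDEX ON events (project_id, event_id);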
I've found two solutions so far but both require additional changes to the upsert script that seem like they'd be less performant:
I can still do an INSERT INTO table_partition_subpartition ... ON CONFLICT (event_id, project_id) DO UPDATE ... but that requires explicitly determining the name of the partition for each row instead of just INSERT INTO table ... once for the entire dataset.
I could implement the "old way" UPSERT procedure: https://www.postgresql.org/docs/9.4/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE but this again requires looping through all rows.
Is there anything else I could do to retain the cleanliness of a single, one-and-done INSERT INTO table SELECT * FROM staging_table ON CONFLICT () DO UPDATE ... while still keeping the partitioning strategy as-is?
Edit: if it matters, concurrency is not an issue here; there's just one machine executing the UPSERT into the main table from the staging table on a schedule.
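For illustration, since concurrency is ruled out above, one single-statement alternative that keeps the one-pass shape is a data-modifying CTE that updates the matches and inserts the rest (a sketch, reusing the assumed names from above; payload stands in for whatever columns the DO UPDATE clause used to set):

WITH updated AS (
    UPDATE events e
       SET payload = s.payload            -- placeholder for the real SET list
      FROM staging_table s
     WHERE e.event_id   = s.event_id
       AND e.project_id = s.project_id
       AND e.client_id  = s.client_id     -- assuming client_id never changes
                                          -- for an event, this allows pruning
    RETURNING e.event_id, e.project_id
)
INSERT INTO events
SELECT s.*
  FROM staging_table s
 WHERE NOT EXISTS (
       SELECT 1
         FROM updated u
        WHERE u.event_id = s.event_id
          AND u.project_id = s.project_id);

Note that this pattern is not safe against concurrent writers, which the edit above rules out.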

Postgresql: many one-to-one tables instead of one big table

Is it viable to use many tables in a one-to-one relationship instead of one big table?
The main goal is to be able to easily change the schema when needed: create or drop a table, instead of adding a new column or changing an existing one.
How scalable will this be with PostgreSQL?
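For reference, a minimal sketch of the layout in question (illustrative names), with each side table sharing the parent's primary key:

CREATE TABLE users (
    user_id bigint PRIMARY KEY,
    name    text NOT NULL
);

-- each optional attribute group is its own table, 1:1 with users;
-- removing the feature later is just DROP TABLE
CREATE TABLE user_preferences (
    user_id bigint PRIMARY KEY REFERENCES users,
    theme   text
);

-- reassembling the "big table" costs one join per side table
SELECT u.user_id, u.name, p.theme
FROM users u
LEFT JOIN user_preferences p USING (user_id);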

Querying across multiple tables with identical schemas

I'm trying to run the same query over multiple tables in my Postgres database that all have the same schema.
This question: Select from multiple tables without a join?
shows that this is possible; however, it hard-codes the set of tables.
I have another query that returns the five specific tables I would like my main query to run on. How can I go about using the result of this with the UNION approach?
In short, I want my query to see the five specific tables (determined by the outcome of another query) as one large table when it runs the query.
I understand that in many cases similar to mine you'd simply want to merge the tables. I cannot do this.
One way of doing this that may satisfy your constraints is using table inheritance. In short, you need to create a parent table with the same schema, and for each child you want to query you run ALTER TABLE that_table INHERIT parent_table. Any query against the parent table will then scan all of the child tables. If you need to query different tables in different circumstances, I think the best way would be to add a column named type or some such, and filter on certain values of that column.
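A sketch of that approach, assuming the tables share columns sensor_id, reading and taken_at (all names are illustrative):

-- empty parent with the shared schema
CREATE TABLE all_readings (
    sensor_id int,
    reading   numeric,
    taken_at  timestamptz
);

-- attach each table the other query returned; its columns must match
ALTER TABLE readings_a INHERIT all_readings;
ALTER TABLE readings_b INHERIT all_readings;

-- now scans readings_a and readings_b as one table
SELECT sensor_id, avg(reading) FROM all_readings GROUP BY sensor_id;

-- detach when the set of tables changes
ALTER TABLE readings_a NO INHERIT all_readings;

Since the five tables come from a query, the INHERIT/NO INHERIT statements can be generated and run dynamically, e.g. with EXECUTE in a DO block.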

Postgres table partitioning with star schema

I have a schema with one table holding the majority of the data, customer, and three other tables with foreign key references to customer.entry_id, which is a BIGSERIAL field. The three other tables are called location, devices and urls, where we store various data related to a specific entry in the customer table.
I want to partition the customer table into monthly child tables, and I have that part worked out: customer will stay as-is, each month will have a table customer_YYYY_MM that inherits from the master table with the right CHECK constraint, and indexes will be created on each individual child table. Data will be moved to the correct child tables while the master table stays empty.
My question is about the other three tables, as I want to partition them as well. However, they have no date information (at all), only the reference to the primary key from the master table. How can I set up the constraints on these tables? Is it even meaningful or possible without date information?
My application logic knows where to insert all the data (it's fairly trivial), but I expect to be able to do simple SELECT queries without specifying which child tables to get it from. So this should work as you would expect from non-partitioned tables:
SELECT l.*
FROM customer c
JOIN location l USING (entry_id)
WHERE c.date_field > '2015-01-01';
I would partition them by the reference key. The foreign key is used in join conditions and is not usually subject to change so it fulfills the following important points:
Partition by the information that is used mostly in the WHERE clauses of the queries or other parts where partitioning can be used to filter out tables that don't need to be scanned. As one guide puts it:
The objective when defining partitions should be to allow as many queries as possible to fetch data from as few partitions as possible - ideally one.
Partition by information that is not going to change, so that rows don't constantly need to be moved from one subtable to another.
This all depends on the size of the tables too, of course. If the sizes stay small, then there is no need to partition.
Read more about partitioning in the PostgreSQL documentation.
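For the location table, that could look like the following, using the same inheritance-plus-CHECK pattern as the customer partitions (the entry_id ranges are illustrative):

CREATE TABLE location_p0 (
    CHECK (entry_id >= 0       AND entry_id < 1000000)
) INHERITS (location);

CREATE TABLE location_p1 (
    CHECK (entry_id >= 1000000 AND entry_id < 2000000)
) INHERITS (location);

-- with constraint_exclusion = partition (the default), child tables whose
-- CHECK cannot match a constant entry_id predicate are skipped
SELECT * FROM location WHERE entry_id = 1500042;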
Use views:
create view customer as
select * from customer_jan_15 union all
select * from customer_feb_15 union all
select * from customer_mar_15;
create view location as
select * from location_jan_15 union all
select * from location_feb_15 union all
select * from location_mar_15;

What should be the strategy to read from many tables having millions of rows each in PostgreSQL?

I have the following scenario while using PostgreSQL:
Number of tables: 100
Number of rows per table: ~10 million
All the tables have the same schema; e.g., each table contains the daily call records of a company, so 100 tables contain the call records of 100 days.
I want to run the following type of query on these tables:
For each column of each table, get the count of records having a null value in that column.
So, considering the above scenario, what are the major optimizations to the table structure? How should I prepare my query, and is there an efficient way of querying in such cases?
If you're using Postgres table inheritance, a simple select count(*) from calls where foo is null will work fine. It will use an index on foo provided null foo rows aren't too common.
Internally, that will do what you'd do manually without table inheritance, i.e., union all the results from each individual child table.
If you need to run this repeatedly, maintain the count in memcached or in another table.
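Since count(column) only counts non-null values, the per-column null counts for the whole inheritance tree can even be collected in a single pass, broken down per child table via tableoid (foo and bar are placeholder column names):

-- null counts for every column in one scan of all children
SELECT tableoid::regclass AS child_table,
       count(*) - count(foo) AS foo_nulls,
       count(*) - count(bar) AS bar_nulls
FROM calls
GROUP BY 1;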