Querying Postgres INHERITED tables directly - postgresql

Postgres allows you to create a table using inheritance. We have a design with 1400 tables that inherit from one main table; each inherited table holds one vendor's inventory.
When I want to query stock for a vendor, I just query the main table. EXPLAIN shows the query going through all 1400 indexes and quite a few of the inherited tables, which makes it run very slowly. If I query only that vendor's stock table directly, the query takes less than half the time it takes through the main table.
We have a join on another table that pulls identifiers for the vendor's partner vendors and we also want to query their stock. Example:
SELECT
    (SELECT m2.company FROM sup.members m2 WHERE m2.id = u.id) AS company,
    u.id,
    u.item,
    DATE_PART('day', CURRENT_TIMESTAMP - u.datein::timestamp) AS daysinstock,
    u.grade AS condition,
    u.stockno AS stocknumber,
    u.ic,
    CASE WHEN u.rprice > 0 THEN u.rprice ELSE NULL END AS price,
    u.qty
FROM pub.net u
LEFT JOIN sup.members m1
    ON m1.id = u.id OR u.id = ANY (regexp_split_to_array(m1.partnerslist, ','))
WHERE u.ic IN ('01036')  -- part to query
  AND m1.id = 'N40'      -- vendor to query
The n40_stock table holds stock for the vendor with id = N40. N40's partner vendors (partnerslist) are G01, G06, G21, K17, N49, V02, and M16, so I also want to query the g01_stock, g06_stock, g21_stock, k17_stock, n49_stock, v02_stock, and m16_stock tables.
I know about the ONLY clause, but is there a way to modify this query to get the data from ONLY the specific inherited tables?
Edit
This decreases the time to under 800 ms, but I'd like it lower:
WITH cte AS (
    SELECT partnerslist AS a FROM sup.members WHERE id = 'N40'
)
SELECT
    (SELECT m2.company FROM sup.members m2 WHERE m2.id = u.id) AS company,
    u.id,
    u.item,
    DATE_PART('day', CURRENT_TIMESTAMP - u.datein::timestamp) AS daysinstock,
    u.grade AS condition,
    u.stockno AS stocknumber,
    u.ic,
    CASE WHEN u.rprice > 0 THEN u.rprice ELSE NULL END AS price,
    u.qty
FROM pub.net u
WHERE u.ic IN ('01036')  -- part to query
  AND u.id = ANY (regexp_split_to_array('N40,' || (SELECT a FROM cte), ','))
I cannot retrieve the company from sup.members inside the CTE because I need the one matching u.id, which changes as the partner changes in the WHERE clause.

Inherited-table lookups are pruned based on the actual WHERE clause, which the planner matches against each child table's CHECK constraint. Simply inheriting tables is not good enough; the children need CHECK constraints and constraint exclusion must be able to apply.
https://www.postgresql.org/docs/9.6/static/ddl-partitioning.html
Caveat: you cannot use a dynamically computed value that does not appear as a literal in the raw query. If the planner cannot see the constant at plan time, it has to check all inherited tables.
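As an illustration of that pattern, here is a minimal sketch. The child-table name, the schema, and the CHECK on id are assumptions modelled on the question; the real DDL of the *_stock tables may differ:
-- Each child table carries a CHECK constraint on the vendor id it stores.
CREATE TABLE pub.n40_stock (
    CHECK (id = 'N40')
) INHERITS (pub.net);

-- constraint_exclusion must be 'partition' (the default) or 'on'.
SET constraint_exclusion = partition;

-- Because the vendor ids are literals in the query text, the planner can
-- exclude every child whose CHECK constraint contradicts the WHERE clause.
-- A subquery or a value computed at run time would not be excluded.
SELECT *
FROM pub.net u
WHERE u.ic = '01036'
  AND u.id IN ('N40', 'G01', 'G06', 'G21', 'K17', 'N49', 'V02', 'M16');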

Related

When is it better to use CTE or temp table postgres

I am doing a query on a very large data set using WITH (CTE) syntax. It seems to take a while, and I was reading online that temp tables could be faster in these cases. Can someone advise me which direction to go? In the CTE we join to a lot of tables and then we filter on the CTE result.
Only interested in Postgres answers.
What version of PostgreSQL are you using? CTEs behave differently in PostgreSQL 11 and older than in 12 and above.
In PostgreSQL 11 and older, CTEs are optimization fences: restrictions from the outer query are not pushed down into the CTE. The database evaluates the query inside the CTE and materializes the result, and the outer WHERE clauses are only applied afterwards, when the outer query is processed. That means a full table scan or full index scan of the CTE's source, which results in horrible performance for large tables. To avoid this, apply as many filters as possible in the WHERE clause inside the CTE:
WITH UserRecord AS (SELECT * FROM Users WHERE Id = 100)
SELECT * FROM UserRecord;
PostgreSQL 12 addresses this problem by introducing the MATERIALIZED and NOT MATERIALIZED modifiers, which let us control whether a CTE is materialized. (By default, PostgreSQL 12+ also inlines a side-effect-free CTE that is referenced only once.)
WITH AllUsers AS NOT MATERIALIZED (SELECT * FROM Users)
SELECT * FROM AllUsers WHERE Id = 100;
Note: Text and code examples are taken from my book Migrating your SQL Server Workloads to PostgreSQL
Summary:
PostgreSQL 11 and older: Use Subquery
PostgreSQL 12 and above: Use CTE with NOT MATERIALIZED clause
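For the PostgreSQL 11 path, the subquery rewrite of the example above would look like this (same hypothetical Users table); a derived table is not an optimization fence, so the planner pushes the outer filter down into the scan of Users:
SELECT *
FROM (SELECT * FROM Users) AS AllUsers
WHERE Id = 100;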
My follow-up is more than I can fit in a comment, so understand this may not be an answer to the OP per se.
Take the following query, which uses a CTE:
with sales as (
    select item, sum(qty) as sales_qty, sum(revenue) as sales_revenue
    from sales_data
    where country = 'USA'
    group by item
),
inventory as (
    select item, sum(on_hand_qty) as inventory_qty
    from inventory_data
    where country = 'USA' and on_hand_qty != 0
    group by item
)
select
    a.item, a.description, s.sales_qty, s.sales_revenue,
    i.inventory_qty, i.inventory_qty * a.cost as inventory_cost
from
    all_items a
    left join sales s on a.item = s.item
    left join inventory i on a.item = i.item
There are times when I cannot explain why the query runs slower than I would expect. Sometimes simply materializing the CTEs makes it run better, as expected. Other times it does not, but when I do this:
drop table if exists sales;
drop table if exists inventory;

create temporary table sales as
select item, sum(qty) as sales_qty, sum(revenue) as sales_revenue
from sales_data
where country = 'USA'
group by item;

create temporary table inventory as
select item, sum(on_hand_qty) as inventory_qty
from inventory_data
where country = 'USA' and on_hand_qty != 0
group by item;

select
    a.item, a.description, s.sales_qty, s.sales_revenue,
    i.inventory_qty, i.inventory_qty * a.cost as inventory_cost
from
    all_items a
    left join sales s on a.item = s.item
    left join inventory i on a.item = i.item;
Suddenly all is right in the world.
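One possible explanation, and this is speculation on my part: unlike a materialized CTE, a temp table can be ANALYZEd, so the planner has real row counts and column statistics before it plans the final join. It is cheap to try between the loads and the final select:
-- Collect statistics on the freshly built temp tables so the planner
-- has row estimates for the final join (speculative, but cheap to try).
analyze sales;
analyze inventory;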
Temp tables are session-scoped: both the structure and the data are visible only to the session that created them, and PostgreSQL drops them automatically when the session ends. Even so, an earlier run in the same session may have left them behind, which is why to be safe I always drop:
drop table if exists sales;
And use "if exists" to avoid any errors about the object not existing.
I rarely use these in common queries for the simple reason that they are not as portable as a simple SQL statement (you can't give the final query to another user without having the temp tables). My most common use case is when I am processing within a procedure/function:
create procedure sales_and_inventory()
language plpgsql
as
$BODY$
BEGIN
    create temp table sales ...;

    insert into sales_inventory
    select ...;

    drop table sales;
END;
$BODY$;
Hopefully this helps.
Also, to answer your question on indexes... typically I don't, but nothing says that's always the right answer. If I put data into a temp table, I assume I'm going to use all or most of it. That said, if you plan to query it multiple times with conditions where an index makes sense, then by all means do it.
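For example, a small sketch against the hypothetical sales temp table from above:
-- Only worth it when the temp table will be filtered or joined on item
-- more than once; the index is temporary too and disappears with the table.
create index sales_item_idx on sales (item);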

Postgres SQL query group by get most recent record instead of an aggregate

This is a current postgres query I have:
sql = """
SELECT
vms.campaign_id,
avg(vms.open_rate_uplift) as open_rate_average,
avg(vms.click_rate_uplift) as click_rate_average,
avg(vms.conversion_rate_uplift) as conversion_rate_average,
avg(cms.incremental_opens),
avg(cms.incremental_clicks),
avg(cms.incremental_conversions)
FROM
experiments.variant_metric_snapshot vms
INNER JOIN experiments.campaign_metric_snapshot cms ON vms.campaign_id = cms.campaign_id
WHERE
vms.campaign_id IN %(campaign_ids)s
GROUP BY
vms.campaign_id
"""
whereby I get the average incremental_opens, incremental_clicks, and incremental_conversions per campaign group from the cms table. However, instead of the average, I want the most recent values for those 3 fields. See the cms table screenshot below: for a given group, I want the values from the record with the greatest (i.e. most recent) event_id, instead of an average over all records.
How can I do this? Thanks
It sounds like you want a lateral join.
FROM
    experiments.variant_metric_snapshot vms
    CROSS JOIN LATERAL (
        select *
        from experiments.campaign_metric_snapshot cms
        where vms.campaign_id = cms.campaign_id
        order by event_id desc
        limit 1
    ) cms
WHERE ...
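Spliced into the original query, that would look roughly like this (a sketch using the column names from the question; the lateral alias is renamed to latest for clarity, and max() is only there to satisfy the GROUP BY, since every row in a group carries the same latest values):
SELECT
    vms.campaign_id,
    avg(vms.open_rate_uplift)           AS open_rate_average,
    avg(vms.click_rate_uplift)          AS click_rate_average,
    avg(vms.conversion_rate_uplift)     AS conversion_rate_average,
    max(latest.incremental_opens)       AS incremental_opens,
    max(latest.incremental_clicks)      AS incremental_clicks,
    max(latest.incremental_conversions) AS incremental_conversions
FROM
    experiments.variant_metric_snapshot vms
    CROSS JOIN LATERAL (
        SELECT cms.incremental_opens, cms.incremental_clicks, cms.incremental_conversions
        FROM experiments.campaign_metric_snapshot cms
        WHERE cms.campaign_id = vms.campaign_id
        ORDER BY cms.event_id DESC
        LIMIT 1
    ) AS latest
WHERE
    vms.campaign_id IN %(campaign_ids)s
GROUP BY
    vms.campaign_id;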
If you are after a quick and dirty solution, you can use the array_agg function with minimal changes to your query.
SELECT
vms.campaign_id,
avg(vms.open_rate_uplift) as open_rate_average,
avg(vms.click_rate_uplift) as click_rate_average,
avg(vms.conversion_rate_uplift) as conversion_rate_average,
(array_agg(cms.incremental_opens ORDER BY cms.event_id DESC))[1] AS incremental_opens,
..
FROM
experiments.variant_metric_snapshot vms
INNER JOIN experiments.campaign_metric_snapshot cms ON vms.campaign_id = cms.campaign_id
WHERE
vms.campaign_id IN %(campaign_ids)s
GROUP BY
vms.campaign_id;

Postgres query filter by non column in table

I have a challenge that consists of filtering a query not by a value stored in the table, but by a value retrieved from a function.
Let's consider a table that contains all sales in the database:
id, description, category, price, col1, ..., coln
I have a function that, for a given sale, returns the similar sales (based on rules and business logic). This function runs a query over all records in the sales table and validates matches on some fields.
similar_sales(sale_id integer) -> returns integer[]
Now I need to list all similar sales for each sale in the sales table.
select s.id, similar_sales (s.id)
from sales s
But similar_sales can return NULL, and I am only interested in returning sales that have at least one similar sale.
select id, similar
from (
select s.id, similar_sales (s.id) as similar
from sales s
) q
where #similar > 1 (Pseudocode)
limit x
I can't apply the limit in the subquery because I don't know in advance which sales have similar sales and which do not.
I just want to run the subquery over a small set of rows rather than the entire table, to gain query performance (pagination strategy).
You can try this:
select id, similar
from sales s
cross join lateral similar_sales(s.id) as similar
where cardinality(similar) > 0  -- isempty() is for ranges; cardinality works on arrays and also drops NULLs
limit x
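To make the pattern concrete, here is a self-contained sketch with a stand-in similar_sales function. The real business logic is the OP's; the body below is just a placeholder that treats sales in the same category as "similar", and the limit value is arbitrary:
-- Placeholder implementation: same-category sales count as "similar".
create function similar_sales(sale_id integer)
returns integer[]
language sql stable
as $$
    select array_agg(s2.id)
    from sales s1
    join sales s2 on s2.category = s1.category and s2.id <> s1.id
    where s1.id = sale_id
$$;

-- The lateral call is evaluated row by row, and LIMIT stops the scan once
-- enough qualifying rows have been produced (no ORDER BY, so which rows
-- you get is arbitrary).
select s.id, similar
from sales s
cross join lateral similar_sales(s.id) as similar
where cardinality(similar) > 0
limit 10;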

HQL update column with COUNT

I'm working with Hibernate, thus HQL, linked to a PostgreSQL database.
I have a table users and a table teams that are linked by a ManyToMany relation through the table teams_users.
I'd like to update or select the teams table so that the property usersCount holds the number of users belonging to each team.
I do not want to add an @Formula to my entity class, because I don't want it to be executed all the time; that's too wasteful on big JOIN FETCH queries where I do not need the count.
In other words, I'd like to find the HQL equivalent of the following PostgreSQL query
UPDATE teams t
SET users_count = (SELECT COUNT(tu.*)
                   FROM teams t1
                   LEFT JOIN teams_users tu
                       ON t1.id = tu.team_id
                   WHERE t1.id = t.id
                   GROUP BY t1.id);
OR
An equivalent of the following
SELECT t.*, count(tu.*) AS users_count
FROM teams t
LEFT JOIN teams_users tu
ON t.id = tu.team_id
GROUP BY t.id;
Unsuccessful tries (to give an idea):
UPDATE Team t SET
t.usersCount = COUNT(t.users)
UPDATE Team t SET
t.usersCount = (SELECT COUNT(t1.users) FROM Team t1 WHERE t1.id = t.id)
SELECT t, count(t.users) AS t.usersCount
FROM Team t
I've found the solution for the UPDATE query.
It simply is
UPDATE Team t
SET t.usersCount = (SELECT COUNT(u) from t.users u)
It makes an extra join on the table users, whereas the table teams_users alone would be enough, but well... it works.
If anyone has the solution for the SELECT one, I'm still curious!
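For the SELECT one, an untested sketch using JPQL's SIZE() function, which Hibernate translates into a correlated count subquery on the join table; each result row comes back as an Object[] holding the Team and its count, rather than as an entity with usersCount populated:
SELECT t, SIZE(t.users)
FROM Team t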

jpa avg(count(*))

Assume I have three tables: User, Product and UserProduct. Once a user buys a product, a record is created in the UserProduct table.
I would like to calculate the average number of products bought per user. I can do this with plain SQL on a MySQL RDBMS as follows:
select avg(c) from (select count(*) as c from user_product up join user u on (up.user_id = u.id) group by u.id) as res
Any chance I can do the same with JPA?
Update: I know that subqueries may be used in the WHERE or HAVING clause only, but still...
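One possible workaround, untested and not from the thread: since every UserProduct row belongs to exactly one user, the average per buying user equals the total number of UserProduct rows divided by the number of distinct users in it, which needs no subquery in the FROM clause. Assuming an entity UserProduct with a user association:
SELECT COUNT(up) * 1.0 / COUNT(DISTINCT up.user.id)
FROM UserProduct up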