Postgres SQL query group by get most recent record instead of an aggregate

This is my current Postgres query:
sql = """
SELECT
vms.campaign_id,
avg(vms.open_rate_uplift) as open_rate_average,
avg(vms.click_rate_uplift) as click_rate_average,
avg(vms.conversion_rate_uplift) as conversion_rate_average,
avg(cms.incremental_opens),
avg(cms.incremental_clicks),
avg(cms.incremental_conversions)
FROM
experiments.variant_metric_snapshot vms
INNER JOIN experiments.campaign_metric_snapshot cms ON vms.campaign_id = cms.campaign_id
WHERE
vms.campaign_id IN %(campaign_ids)s
GROUP BY
vms.campaign_id
"""
This returns the average incremental_opens, incremental_clicks, and incremental_conversions per campaign from the cms table. However, instead of the average, I want the most recent values for these three fields: for each campaign, the values from the cms record with the greatest (i.e. most recent) event_id, rather than an average over all records.
How can I do this? Thanks

It sounds like you want a lateral join.
FROM
experiments.variant_metric_snapshot vms
CROSS JOIN LATERAL (
SELECT *
FROM experiments.campaign_metric_snapshot cms
WHERE vms.campaign_id = cms.campaign_id
ORDER BY event_id DESC
LIMIT 1
) cms
WHERE...
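To see the effect on a small dataset, here is a minimal Python sketch using the standard-library sqlite3 module. SQLite has no LATERAL, so the sketch substitutes a correlated subquery that keeps only the cms row with the greatest event_id per campaign; it is a stand-in for the lateral join, not the same syntax. The shortened table names (vms, cms) and all sample values are made up for illustration. Because each vms row is then paired with exactly one cms row, an aggregate over the cms column simply returns that row's value.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vms (campaign_id INTEGER, open_rate_uplift REAL);
    CREATE TABLE cms (campaign_id INTEGER, event_id INTEGER, incremental_opens REAL);
    INSERT INTO vms VALUES (1, 0.25), (1, 0.75), (2, 0.125);
    INSERT INTO cms VALUES (1, 1, 100), (1, 2, 150), (2, 7, 40), (2, 9, 60);
""")

# Join each vms row only to the most recent cms row of its campaign,
# then average the vms columns as before.
latest_rows = conn.execute("""
    SELECT vms.campaign_id,
           avg(vms.open_rate_uplift)  AS open_rate_average,
           max(cms.incremental_opens) AS incremental_opens
    FROM vms
    JOIN cms
      ON cms.campaign_id = vms.campaign_id
     AND cms.event_id = (SELECT max(c2.event_id)
                         FROM cms c2
                         WHERE c2.campaign_id = vms.campaign_id)
    GROUP BY vms.campaign_id
    ORDER BY vms.campaign_id
""").fetchall()

for row in latest_rows:
    print(row)
```

In this sample, campaign 1 reports the incremental_opens of event_id 2 and campaign 2 that of event_id 9, while the vms averages are unaffected by the number of cms rows.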

If you are after a quick and dirty solution, you can use the array_agg function with minimal changes to your query.
SELECT
vms.campaign_id,
avg(vms.open_rate_uplift) as open_rate_average,
avg(vms.click_rate_uplift) as click_rate_average,
avg(vms.conversion_rate_uplift) as conversion_rate_average,
(array_agg(cms.incremental_opens ORDER BY cms.event_id DESC))[1] AS incremental_opens,
..
FROM
experiments.variant_metric_snapshot vms
INNER JOIN experiments.campaign_metric_snapshot cms ON vms.campaign_id = cms.campaign_id
WHERE
vms.campaign_id IN %(campaign_ids)s
GROUP BY
vms.campaign_id;

Related

DB2 SQL: query for discovery if a group has only one client[CD_CLI]

I need a query that returns which groups have only one client and, if possible, which client [CD_CLI] is alone in its group.

Here is an attempt, given the limited information provided:
SELECT CD_GR_PSZD, CD_CLI
FROM CLI_GR_PSZD
WHERE CD_GR_PSZD in (
SELECT CD_GR_PSZD
FROM CLI_GR_PSZD
GROUP BY CD_GR_PSZD
HAVING count(*) = 1)
The subselect finds the groups that have only a single entry in the CLI_GR_PSZD table, and the outer SELECT is meant to select whatever columns you need.
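The same query can be tried out on a toy table. The following is a minimal Python sketch using the standard-library sqlite3 module rather than DB2; the column types and sample rows are assumptions, since the question does not show the table definition. It also assumes one row per (group, client) pair; if a client can appear more than once in a group, count(DISTINCT CD_CLI) would be the safer condition.

```python
import sqlite3

# Hypothetical sample data: groups 10 and 30 have two clients each,
# groups 20 and 40 have exactly one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CLI_GR_PSZD (CD_GR_PSZD INTEGER, CD_CLI INTEGER);
    INSERT INTO CLI_GR_PSZD VALUES
        (10, 1), (10, 2), (20, 3), (30, 4), (30, 5), (40, 6);
""")

# The inner query keeps groups with a single row; the outer query
# returns the group together with its only client.
single_client_groups = conn.execute("""
    SELECT CD_GR_PSZD, CD_CLI
    FROM CLI_GR_PSZD
    WHERE CD_GR_PSZD IN (
        SELECT CD_GR_PSZD
        FROM CLI_GR_PSZD
        GROUP BY CD_GR_PSZD
        HAVING count(*) = 1)
    ORDER BY CD_GR_PSZD
""").fetchall()

for row in single_client_groups:
    print(row)
```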

Is there a way to optimize this T-SQL query to use less spool space?

I'm running out of spool space and wondering whether the query can be optimized.
I've tried DISTINCT with UNION ALL; GROUP BY doesn't make sense here.
SELECT DISTINCT T1.EMAIL, T2.BILLG_STATE_CD, T2.BILLG_ZIP_CD
FROM
(SELECT EMAIL
FROM CAT
UNION ALL
SELECT EMAIL
FROM DOG
UNION ALL
SELECT email As EMAIL
FROM MOUSE) As T1
LEFT JOIN HAMSTER As T2 ON T1.EMAIL =T2.EMAIL_ADDR;
I will need to do this type of data pull often, so I'm looking for a viable solution other than doing three separate joins.
I need to union multiple tables (T1) and join columns from another table (T2) onto the result.

WHERE T2.ord_creatd_dt > DATE '2019-01-01' and T2.ord_creatd_dt < DATE '2019-11-08'

Querying Postgres INHERITED tables directly

Postgres allows you to create a table using inheritance. We have a design with 1400 tables that inherit from one main table, one table for each vendor's inventory.
When I want to query stock for a vendor, I just query the main table. EXPLAIN shows that the query goes through all 1400 indexes and quite a few of the inherited tables, which makes it run very slowly. If I query only the vendor's stock table, the query takes less than 50% of the time it takes against the main table.
We have a join on another table that pulls identifiers for the vendor's partner vendors and we also want to query their stock. Example:
SELECT
(select m2.company from sup.members m2 where m2.id = u.id) as company,
u.id,
u.item,
DATE_PART('day', CURRENT_TIMESTAMP - u.datein::timestamp) AS daysinstock,
u.grade as condition,
u.stockno AS stocknumber,
u.ic,
CASE WHEN u.rprice > 0 THEN
u.rprice
ELSE
NULL
END AS price,
u.qty
FROM pub.net u
LEFT JOIN sup.members m1
ON m1.id = u.id OR u.id = any(regexp_split_to_array(m1.partnerslist,','))
WHERE u.ic in ('01036') -- part to query
AND m1.id = 'N40' -- vendor to query
The n40_stock table has stock for the vendor with id = N40, and N40's partner vendors (partnerslist) are G01, G06, G21, K17, N49, V02, M16, so I would also want to query the g01_stock, g06_stock, g21_stock, k17_stock, n49_stock, v02_stock, and m16_stock tables.
I know about the ONLY clause, but is there a way to modify this query to get the data from ONLY the specific inherited tables?
Edit
This decreases the time to under 800ms, but I'd like it less:
WITH cte as (
SELECT partnerslist as a FROM sup.members WHERE id = 'N40'
)
SELECT
(select m2.company from sup.members m2 where m2.id = u.id) as company,
u.id,
u.item,
DATE_PART('day', CURRENT_TIMESTAMP - u.datein::timestamp) AS daysinstock,
u.grade as condition,
u.stockno AS stocknumber,
u.ic,
CASE WHEN u.rprice > 0 THEN
u.rprice
ELSE
NULL
END AS price,
u.qty
FROM pub.net u
WHERE u.ic in ('01036') -- part to query
AND u.id = any(regexp_split_to_array('N40,'||(select a from cte), ','))
I cannot retrieve the company from sup.members in the CTE because I need the one that matches u.id, which changes depending on which partner the WHERE clause matches.

The planner decides which inherited tables to scan by comparing the actual WHERE clause against each child table's CHECK constraint. Simply inheriting tables is not enough.
https://www.postgresql.org/docs/9.6/static/ddl-partitioning.html
Caveat: this only works when the filter value appears as a constant in the query itself. If the value is computed at run time (for example, by a subquery or function call), the planner cannot exclude anything and checks all inherited tables.

Difference between subquery and correlated subquery for given code

An example of a correlated subquery given in a book is as follows:
Customers who placed orders on February 12, 2007
SELECT custid, companyname
FROM Sales.Customers AS C
WHERE EXISTS
(SELECT *
FROM Sales.Orders AS O
WHERE O.custid = C.custid
AND O.orderdate = '20070212');
However, I wrote the following code for the same purpose using a simple subquery:
SELECT custid, companyname
FROM Sales.Customers
WHERE custid IN
(SELECT [custid] FROM [Sales].[Orders]
WHERE [orderdate] ='20070212')
Both give identical output. Which method is better, and why? Also, I do not understand the use of EXISTS in the first query.

I tried similar queries on my own data on SQL Server 2016 SP1:
select
*
from EXT.dbo_CUSTTABLE
where ACCOUNTNUM in
(select CUSTACCOUNT from EXT.dbo_SALESLINE b
where b.CREATEDDATETIME between '20170101 00:00' and '20170102 23:59');
select
*
from EXT.dbo_CUSTTABLE a
where exists
(select * from EXT.dbo_SALESLINE b
where a.ACCOUNTNUM=b.CUSTACCOUNT
and b.CREATEDDATETIME between '20170101 00:00' and '20170102 23:59');
Look at the execution plans: they are identical.
If I add a clustered index on the customer table and an index on the sales line table, we get a more efficient query, with an index seek and an inner join instead of table scans and hash joins, but the two plans are still identical.
If you are using another version of SQL Server, your results may vary, since the query optimizer changes between versions.
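As for what EXISTS does: it returns true for a customer as soon as the correlated inner query finds at least one matching order, so a customer with several matching orders is still listed once, just as with IN. The following minimal Python sketch illustrates this with the standard-library sqlite3 module. It only demonstrates that the two forms return the same rows; it says nothing about SQL Server execution plans. The unqualified table names and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: customer 1 has two orders on the target date,
# customer 2 has none on that date, customer 3 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (custid INTEGER PRIMARY KEY, companyname TEXT);
    CREATE TABLE Orders (orderid INTEGER PRIMARY KEY, custid INTEGER, orderdate TEXT);
    INSERT INTO Customers VALUES (1, 'Customer A'), (2, 'Customer B'), (3, 'Customer C');
    INSERT INTO Orders VALUES
        (100, 1, '20070212'), (101, 1, '20070212'),
        (102, 2, '20070301'), (103, 3, '20070212');
""")

# Correlated subquery: the inner query refers to the outer row (C.custid).
exists_rows = conn.execute("""
    SELECT custid, companyname
    FROM Customers AS C
    WHERE EXISTS (SELECT *
                  FROM Orders AS O
                  WHERE O.custid = C.custid
                    AND O.orderdate = '20070212')
    ORDER BY custid
""").fetchall()

# Self-contained subquery: the inner query can be evaluated on its own.
in_rows = conn.execute("""
    SELECT custid, companyname
    FROM Customers
    WHERE custid IN (SELECT custid FROM Orders WHERE orderdate = '20070212')
    ORDER BY custid
""").fetchall()

print(exists_rows)
print(in_rows)
```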

Grouping by attributes and counting, postgreSQL

I have written the following code that counts how many instances of each book_id there are in the table soldBooks.
SELECT book_id, sum(counter) AS no_of_books_sold, sum(retail_price) AS generated_revenue
FROM(
SELECT book_id,1 AS counter, retail_price
FROM shipments
LEFT JOIN editions ON (shipments.isbn = editions.isbn)
LEFT JOIN stock ON (shipments.isbn = stock.isbn)
) AS soldBooks
GROUP BY book_id
As you can see, I used a "counter" in order to solve my problem. But I am sure there must be a better, more built in way of achieving the same result! There must be some way to group a table together by a given attribute, and to create a new column displaying the count of EACH attribute. Can somebody share this with me?
Thanks!

SELECT book_id,
COUNT(book_id) AS no_books_sold,
SUM(retail_price) AS gen_rev
FROM shipments
JOIN editions ON (shipments.isbn=editions.isbn)
JOIN stock ON (shipments.isbn=stock.isbn)
GROUP BY book_id
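To check the query on a small dataset, here is a minimal Python sketch using the standard-library sqlite3 module. The table layouts (which table holds book_id and which holds retail_price) and the sample rows are assumptions, since the question does not show the schema.

```python
import sqlite3

# Hypothetical schema and data: book 1 has two editions (A1, A2),
# book 2 has one (B1); there are four shipments in total.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE editions (isbn TEXT PRIMARY KEY, book_id INTEGER);
    CREATE TABLE stock (isbn TEXT PRIMARY KEY, retail_price REAL);
    CREATE TABLE shipments (shipment_id INTEGER PRIMARY KEY, isbn TEXT);
    INSERT INTO editions VALUES ('A1', 1), ('A2', 1), ('B1', 2);
    INSERT INTO stock VALUES ('A1', 10.0), ('A2', 12.5), ('B1', 20.0);
    INSERT INTO shipments (isbn) VALUES ('A1'), ('A1'), ('A2'), ('B1');
""")

# COUNT and SUM are evaluated once per book_id group, so no helper
# "counter" column is needed.
sales = conn.execute("""
    SELECT book_id,
           COUNT(book_id)    AS no_books_sold,
           SUM(retail_price) AS gen_rev
    FROM shipments
    JOIN editions ON (shipments.isbn = editions.isbn)
    JOIN stock ON (shipments.isbn = stock.isbn)
    GROUP BY book_id
    ORDER BY book_id
""").fetchall()

for row in sales:
    print(row)
```

Note that this answer uses inner joins where the question used LEFT JOIN, so shipments without a matching edition or stock row are dropped rather than grouped under a NULL book_id.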
