Aggregate function with Date on Postgres

I'm kind of rusty on my SQL; maybe you can help me out with this query.
I have these two tables for a ticket system (I'm omitting some fields):
table tickets
    id            - bigint
    subject       - text
    user_id       - bigint
    closed        - boolean
    first_message - bigint (foreign key to the next table's id)
    last_message  - bigint (same as above)
table ticket_messages
    creation_date
I need to query the closed tickets and compute the average time spent between the first message's creation_date and the last message's creation_date. This is what I've done so far:
SELECT t.id, t.subject, tm.creation_date
FROM tickets AS t
INNER JOIN ticket_messages AS tm
ON tm.id = t.first_message
OR tm.id = t.last_message
WHERE t.closed = true
I'm looking for a GROUP BY or aggregate function that gets all the data from the table, calculates the time spent between the first and last message, and also displays the dates of the first and last message.
UPDATE: instead of using "OR", I added a second INNER JOIN with ticket_messages. Now I get both dates, and I can compute the sum in my application:
SELECT t.id, t.subject, tm.creation_date, tm2.creation_date
FROM tickets AS t
INNER JOIN ticket_messages AS tm
ON tm.id = t.first_message
INNER JOIN ticket_messages as tm2
ON tm2.id = t.last_message
WHERE t.closed = true
I think that did it...

Something like this should work for getting the number of days elapsed. You might need to put it in a subquery to easily pull out more fields from 'tickets'.
SELECT t.id,AVG(tlast.creation_date - tfirst.creation_date)
FROM tickets AS t
INNER JOIN ticket_messages AS tfirst
ON tfirst.id = t.first_message
INNER JOIN ticket_messages AS tlast
ON tlast.id = t.last_message
WHERE t.closed = true
GROUP BY t.id
Which might lead to something like this (not tested):
select t.id,t.subject,sub.nr_days
FROM (
SELECT t.id,AVG(tlast.creation_date - tfirst.creation_date) as nr_days
FROM tickets AS t
INNER JOIN ticket_messages AS tfirst
ON tfirst.id = t.first_message
INNER JOIN ticket_messages AS tlast
ON tlast.id = t.last_message
WHERE t.closed = true
GROUP BY t.id ) AS sub
INNER JOIN tickets AS t
ON sub.id = t.id;

You are trying to combine two queries into one, and you are trying to get data from three rows spread across two tables (the ticket plus its first and last messages). Both problems need to be fixed.
First of all, you should not attempt to mix aggregate data (such as averages) with the details for single items; you need separate queries for that. You can do it, but the output is repetitive and therefore wasteful (every item in a group carries the same aggregate data).
Secondly, you need to find the first message and the last message for a given ticket. Hence, that query is:
SELECT t.id, t.subject, tm1.creation_date as start, tm2.creation_date as end,
tm2.creation_date - tm1.creation_date as close_interval
FROM tickets AS t
INNER JOIN ticket_messages AS tm1 ON t.first_message = tm1.id
INNER JOIN ticket_messages AS tm2 ON t.last_message = tm2.id
WHERE t.closed = true
Each result row combines data from three source rows (the ticket and its first and last messages), as required. The computed value is an interval: in PostgreSQL, subtracting one timestamp from another yields an INTERVAL, while subtracting one date from another yields an integer number of days. (In Informix, the type would effectively be INTERVAL DAY(n) for a suitable n, such as 9.)
You can now average those intervals. You can't average dates, because dates cannot be added together or divided, and averaging involves both summing and dividing. Intervals can be added and divided.
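To make the "separate detail query, separate aggregate query" advice concrete, here is a small sketch using Python's built-in sqlite3 module on an in-memory database. The toy rows are hypothetical, and because SQLite has no INTERVAL type, julianday() differences (in days) stand in for PostgreSQL's timestamp subtraction.

```python
import sqlite3


def ticket_durations():
    """Return per-ticket detail rows and, separately, the average duration in days.

    Schema and data are hypothetical; julianday() differences stand in for
    PostgreSQL interval arithmetic.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE ticket_messages (id INTEGER PRIMARY KEY, creation_date TEXT);
        CREATE TABLE tickets (
            id INTEGER PRIMARY KEY, subject TEXT, closed INTEGER,
            first_message INTEGER REFERENCES ticket_messages(id),
            last_message  INTEGER REFERENCES ticket_messages(id));
        INSERT INTO ticket_messages VALUES
            (1, '2024-01-01'), (2, '2024-01-03'),   -- ticket 10: 2 days
            (3, '2024-01-05'), (4, '2024-01-09'),   -- ticket 11: 4 days
            (5, '2024-01-10'), (6, '2024-01-20');   -- ticket 12: still open
        INSERT INTO tickets VALUES
            (10, 'login', 1, 1, 2),
            (11, 'billing', 1, 3, 4),
            (12, 'crash', 0, 5, 6);
    """)
    # Detail query: one row per closed ticket, joining ticket_messages twice.
    detail = con.execute("""
        SELECT t.id, t.subject, tm1.creation_date, tm2.creation_date,
               julianday(tm2.creation_date) - julianday(tm1.creation_date)
        FROM tickets AS t
        JOIN ticket_messages AS tm1 ON tm1.id = t.first_message
        JOIN ticket_messages AS tm2 ON tm2.id = t.last_message
        WHERE t.closed = 1
        ORDER BY t.id
    """).fetchall()
    # Aggregate query: a separate statement, so no detail row repeats the average.
    (avg_days,) = con.execute("""
        SELECT AVG(julianday(tm2.creation_date) - julianday(tm1.creation_date))
        FROM tickets AS t
        JOIN ticket_messages AS tm1 ON tm1.id = t.first_message
        JOIN ticket_messages AS tm2 ON tm2.id = t.last_message
        WHERE t.closed = 1
    """).fetchone()
    con.close()
    return detail, avg_days
```

In PostgreSQL itself, with timestamp columns, you would write tm2.creation_date - tm1.creation_date and pass it to AVG(), which accepts interval arguments.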

Related

SQL left join on maximum date

I have two tables: contracts and contract_descriptions.
contract_descriptions has a column named contract_id that matches the contract_id of records in the contracts table.
I am trying to join only the latest record from contract_descriptions:
SELECT *
FROM contracts c
LEFT JOIN contract_descriptions d ON d.contract_id = c.contract_id
AND d.date_description =
(SELECT MAX(date_description)
FROM contract_descriptions t
WHERE t.contract_id = c.contract_id)
It works, but is it the most performant way to do it? Is there a way to avoid the second SELECT?
Alternatively, you could use DISTINCT ON:
SELECT * FROM contracts c LEFT JOIN (
SELECT DISTINCT ON (cd.contract_id) cd.* FROM contract_descriptions cd
ORDER BY cd.contract_id, cd.date_description DESC
) d ON d.contract_id = c.contract_id
DISTINCT ON keeps only one row per contract_id, while the sort clause cd.date_description DESC ensures that it is always the latest description.
Performance depends on many factors (for example, table size). In any case, you should compare both approaches with EXPLAIN.
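SQLite has no DISTINCT ON, so this sketch (run through Python's sqlite3 module) swaps in ROW_NUMBER() as the "one latest row per contract_id" filter and checks it against the original correlated MAX() query. It assumes an SQLite build with window functions (3.25 or later); the tables are trimmed-down, hypothetical versions of the ones in the question.

```python
import sqlite3


def latest_descriptions():
    """Compare the correlated MAX() join with a ROW_NUMBER() equivalent."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE contracts (contract_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE contract_descriptions (
            contract_id INTEGER, date_description TEXT, description TEXT);
        INSERT INTO contracts VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
        INSERT INTO contract_descriptions VALUES
            (1, '2023-01-01', 'a-old'), (1, '2023-06-01', 'a-new'),
            (2, '2023-03-01', 'b-only');
        -- contract 3 has no descriptions at all
    """)
    # Original approach: correlated MAX() subquery inside the join condition.
    correlated = con.execute("""
        SELECT c.contract_id, d.description
        FROM contracts c
        LEFT JOIN contract_descriptions d ON d.contract_id = c.contract_id
         AND d.date_description = (SELECT MAX(date_description)
                                   FROM contract_descriptions t
                                   WHERE t.contract_id = c.contract_id)
        ORDER BY c.contract_id
    """).fetchall()
    # Stand-in for DISTINCT ON: rank rows per contract, newest first, keep rank 1.
    ranked = con.execute("""
        SELECT c.contract_id, d.description
        FROM contracts c
        LEFT JOIN (
            SELECT cd.*, ROW_NUMBER() OVER (
                       PARTITION BY cd.contract_id
                       ORDER BY cd.date_description DESC) AS rn
            FROM contract_descriptions cd
        ) d ON d.contract_id = c.contract_id AND d.rn = 1
        ORDER BY c.contract_id
    """).fetchall()
    con.close()
    return correlated, ranked
```

One difference to keep in mind: if two descriptions share the same latest date, the MAX() version joins both, while ROW_NUMBER() (like DISTINCT ON) keeps exactly one, chosen arbitrarily unless you add a tiebreaker column to the ORDER BY.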
Your query looks okay to me. One typical way to join only n rows by some order from the other table is a lateral join:
SELECT *
FROM contracts c
LEFT JOIN LATERAL -- keeps contracts that have no descriptions, like the original LEFT JOIN
(
SELECT *
FROM contract_descriptions cd
WHERE cd.contract_id = c.contract_id
ORDER BY cd.date_description DESC
FETCH FIRST 1 ROW ONLY
) cdlast ON true;
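A lateral subquery is re-evaluated for every row of the outer table. Since SQLite cannot run LATERAL, here is a plain-Python model of that per-row evaluation (the data shapes are hypothetical: a list of contract ids plus (contract_id, date_description, description) tuples). It also shows why the join type matters: CROSS JOIN LATERAL drops a contract whose subquery returns no rows, whereas LEFT JOIN LATERAL ... ON true keeps it with NULLs, matching the question's LEFT JOIN.

```python
def lateral_latest(contracts, descriptions, keep_unmatched):
    """Model a LATERAL 'latest description' subquery, evaluated once per contract.

    keep_unmatched=False behaves like CROSS JOIN LATERAL;
    keep_unmatched=True behaves like LEFT JOIN LATERAL (...) ON true.
    """
    result = []
    for contract_id in contracts:
        # WHERE cd.contract_id = c.contract_id: the subquery sees the outer row.
        matching = [d for d in descriptions if d[0] == contract_id]
        # ORDER BY cd.date_description DESC FETCH FIRST 1 ROW ONLY
        top = sorted(matching, key=lambda d: d[1], reverse=True)[:1]
        if top:
            result.append((contract_id, top[0][2]))
        elif keep_unmatched:
            # LEFT JOIN keeps the outer row and fills subquery columns with NULL.
            result.append((contract_id, None))
    return result


if __name__ == "__main__":
    contracts = [1, 2, 3]
    descriptions = [
        (1, "2023-01-01", "a-old"),
        (1, "2023-06-01", "a-new"),
        (2, "2023-03-01", "b-only"),
    ]
    print(lateral_latest(contracts, descriptions, keep_unmatched=False))
    print(lateral_latest(contracts, descriptions, keep_unmatched=True))
```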

Get distinct row by primary key, but use value from another column

I'm trying to get the sum of the total time that was spent sending all emails within a campaign.
Because of the joins in my query, the 'processing_time' column ends up duplicated over many rows, so sum(s.processing_time) as send_time always overstates how long sending took.
select
c.id,
c.sender,
c.subject,
count(*) as total_items,
count(distinct s.id) as sends,
sum(s.processing_time) as send_time
from campaigns c
left join sends s on c.id = s.campaigns_id
left join opens o on s.id = o.sends_id
group by c.id;
I'd ideally like to do something like sum(s.processing_time when distinct s.id) but I can't quite work out how to achieve that.
I have made other attempts using CASE, but I always run into the same issue: I need to get the distinct rows based on the ID column, but work with another column.
Since you want statistics related to distinct s.id as well as c.id, group by both columns. Collect the (intermediate) data that you need and use it as the inner table of a nested sub-select query.
In the outer select, group by c.id alone.
Because the inner select groups by s.id, values that are unique per s.id are not double-counted when you sum and group by c.id.
SELECT id
, sender
, subject
, sum(total_items) as total_items
, sum(sends) as sends
, sum(processing_time) as send_time
FROM (
SELECT
c.id
, s.id as sid
, count(*) as total_items
, count(DISTINCT s.id) as sends -- 1 per send, 0 for a campaign with no sends
, s.processing_time
, c.sender
, c.subject
FROM campaigns c
LEFT JOIN sends s on c.id = s.campaigns_id
LEFT JOIN opens o on s.id = o.sends_id
GROUP BY c.id, c.sender, c.subject, s.processing_time, s.id) t
GROUP BY id, sender, subject
ORDER BY id
Since the final table includes sender and subject, you'll need to group by these columns as well to avoid an error such as:
ERROR: column "c.sender" must appear in the GROUP BY clause or be used in an aggregate function
LINE 14: , c.sender
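The double counting and the fix can be reproduced on toy data with Python's sqlite3 module. In this sketch (hypothetical rows), send 100 is opened three times, so the naive SUM counts its processing_time three times, while the nested query sums it once. Campaign 3 has no sends; the inner query counts sends with COUNT(DISTINCT s.id), which is 1 for each send and 0 when the LEFT JOIN finds no send.

```python
import sqlite3


def campaign_send_times():
    """Return (naive, nested) results for the same campaign data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE campaigns (id INTEGER PRIMARY KEY, sender TEXT, subject TEXT);
        CREATE TABLE sends (id INTEGER PRIMARY KEY, campaigns_id INTEGER,
                            processing_time INTEGER);
        CREATE TABLE opens (id INTEGER PRIMARY KEY, sends_id INTEGER);
        INSERT INTO campaigns VALUES (1, 'a@example.com', 'Hello'),
                                     (2, 'b@example.com', 'Promo'),
                                     (3, 'c@example.com', 'Draft');
        INSERT INTO sends VALUES (100, 1, 5), (101, 1, 3), (200, 2, 7);
        INSERT INTO opens (sends_id) VALUES (100), (100), (100), (101);
    """)
    # Naive version: each send's processing_time repeats once per matching open.
    naive = con.execute("""
        SELECT c.id, SUM(s.processing_time)
        FROM campaigns c
        LEFT JOIN sends s ON c.id = s.campaigns_id
        LEFT JOIN opens o ON s.id = o.sends_id
        GROUP BY c.id ORDER BY c.id
    """).fetchall()
    # Nested version: the inner query has one row per (campaign, send).
    nested = con.execute("""
        SELECT id, SUM(total_items), SUM(sends), SUM(processing_time)
        FROM (
            SELECT c.id, c.sender, c.subject, s.id AS sid,
                   COUNT(*) AS total_items,
                   COUNT(DISTINCT s.id) AS sends,
                   s.processing_time
            FROM campaigns c
            LEFT JOIN sends s ON c.id = s.campaigns_id
            LEFT JOIN opens o ON s.id = o.sends_id
            GROUP BY c.id, c.sender, c.subject, s.processing_time, s.id
        ) t
        GROUP BY id, sender, subject ORDER BY id
    """).fetchall()
    con.close()
    return naive, nested
```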

Finding Detail Based on date in SSRS

I have a table, which I am using to pull order details in SSRS, that records when the price of a product number was changed. It has Date Changed and Updated Cost columns.
I am pairing up two different tables to create a report that shows the cost of the package at the time of the order. Here is how I am pulling my data:
SELECT
WAREHOUSE.ActPkgCostHist.ItemNo AS [ActPkgCostHist ItemNo]
,WAREHOUSE.ActPkgCostHist.ActPkgCostDate
,WAREHOUSE.ActPkgCostHist.ActPkgCost
,ORDER.OrderHist.OrderNo
,ORDER.OrderHist.ItemNo AS [OrderHist ItemNo]
,ORDER.OrderHist.DispenseDt
FROM
WAREHOUSE.ActPkgCostHist
INNER JOIN ORDER.OrderHist
ON WAREHOUSE.ActPkgCostHist.ItemNo = ORDER.OrderHist.ItemNo
Catalog=ShippedOrders
The ActPkgCostHist table has the cost of each item and the date the cost was changed.
The OrderHist table has the complete details of each order except the ActPkgCost at the time of purchase.
I am attempting to create a table that has the order number, the date of the order, and the package cost at the time of the order.
The ROW_NUMBER function is very useful for cases like this.
SELECT ActPkgCostHist.ItemNo AS [ActPkgCostHist ItemNo]
,ActPkgCostHist.ActPkgCostDate
,ActPkgCostHist.ActPkgCost
,ORDER.OrderHist.OrderNo
,ORDER.OrderHist.ItemNo AS [OrderHist ItemNo]
,ORDER.OrderHist.DispenseDt
FROM ORDER.OrderHist
INNER JOIN (
SELECT ItemNo, ActPkgCostDate, ActPkgCost
, ROW_NUMBER() OVER (PARTITION BY ItemNo ORDER BY ActPkgCostDate DESC) as RN
FROM WAREHOUSE.ActPkgCostHist
--if there are future dated changes, limit ActPkgCostDate to be <= the current date
) ActPkgCostHist on ActPkgCostHist.ItemNo = OrderHist.ItemNo
WHERE RN = 1
The subquery partitions the cost history by ItemNo and, within each item, ranks the changes by recency, numbering the most recent change 1. The main query then keeps only the rows where RN = 1.
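Here is a runnable sketch of that ranking step using Python's sqlite3 module (window functions need SQLite 3.25 or later). The table is a hypothetical, flattened stand-in for WAREHOUSE.ActPkgCostHist with the schema prefix dropped, and dates are stored as ISO-8601 text so they sort correctly.

```python
import sqlite3


def latest_cost_per_item():
    """Return (ItemNo, ActPkgCostDate, ActPkgCost) for the newest change per item."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE ActPkgCostHist (ItemNo TEXT, ActPkgCostDate TEXT, ActPkgCost REAL);
        INSERT INTO ActPkgCostHist VALUES
            ('X', '2023-01-01', 10.0),
            ('X', '2023-06-01', 12.0),
            ('Y', '2023-02-01', 5.0);
    """)
    rows = con.execute("""
        SELECT ItemNo, ActPkgCostDate, ActPkgCost
        FROM (
            SELECT ItemNo, ActPkgCostDate, ActPkgCost,
                   -- number each item's changes, newest first
                   ROW_NUMBER() OVER (PARTITION BY ItemNo
                                      ORDER BY ActPkgCostDate DESC) AS RN
            FROM ActPkgCostHist
        ) AS ranked
        WHERE RN = 1          -- keep only the newest change per item
        ORDER BY ItemNo
    """).fetchall()
    con.close()
    return rows
```

Note that RN = 1 is the most recent change overall, which is not necessarily the cost that was in effect on a given order's date.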
For each item in each order, you have to find the latest cost date on or before the order's DispenseDt and use it when joining with the cost table:
SELECT C.ItemNo AS [ActPkgCostHist ItemNo],
C.ActPkgCostDate,
C.ActPkgCost,
O.OrderNo,
O.ItemNo AS [OrderHist ItemNo],
O.DispenseDt
FROM WAREHOUSE.ActPkgCostHist AS C
-- JOIN order detail with cost table in order to define the cost date per item/order
INNER JOIN (SELECT Max(CH.ActPkgCostDate) AS ItemCostDate,
OH.OrderNo,
OH.ItemNo,
OH.DispenseDt
FROM WAREHOUSE.ActPkgCostHist AS CH
INNER JOIN ORDER.OrderHist AS OH
ON CH.ItemNo = OH.ItemNo
-- Get the latest cost date only from dates before order date
WHERE CH.ActPkgCostDate <= OH.DispenseDt
GROUP BY OH.OrderNo,
OH.ItemNo,
OH.DispenseDt) AS O
ON C.ItemNo = O.ItemNo
AND C.ActPkgCostDate = O.ItemCostDate
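The "latest cost on or before the order date" logic can be checked on toy data as well. This sketch (Python's sqlite3, hypothetical rows, schema prefixes dropped) mirrors the query above. Order 4 was dispensed before any cost was recorded for its item, so the inner join drops it.

```python
import sqlite3


def cost_at_order_time():
    """Return (OrderNo, ItemNo, DispenseDt, ActPkgCostDate, ActPkgCost) per order."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE ActPkgCostHist (ItemNo TEXT, ActPkgCostDate TEXT, ActPkgCost REAL);
        CREATE TABLE OrderHist (OrderNo INTEGER, ItemNo TEXT, DispenseDt TEXT);
        INSERT INTO ActPkgCostHist VALUES
            ('X', '2023-01-01', 10.0), ('X', '2023-06-01', 12.0),
            ('Y', '2023-02-01', 5.0);
        INSERT INTO OrderHist VALUES
            (1, 'X', '2023-03-15'),   -- before the June price change
            (2, 'X', '2023-07-01'),   -- after it
            (3, 'Y', '2023-02-01'),   -- same day as Y's only cost row
            (4, 'Y', '2023-01-15');   -- before any Y cost existed
    """)
    rows = con.execute("""
        SELECT O.OrderNo, O.ItemNo, O.DispenseDt, C.ActPkgCostDate, C.ActPkgCost
        FROM ActPkgCostHist AS C
        INNER JOIN (
            -- latest cost date on or before each order's dispense date
            SELECT MAX(CH.ActPkgCostDate) AS ItemCostDate,
                   OH.OrderNo, OH.ItemNo, OH.DispenseDt
            FROM ActPkgCostHist AS CH
            INNER JOIN OrderHist AS OH ON CH.ItemNo = OH.ItemNo
            WHERE CH.ActPkgCostDate <= OH.DispenseDt
            GROUP BY OH.OrderNo, OH.ItemNo, OH.DispenseDt
        ) AS O
          ON C.ItemNo = O.ItemNo
         AND C.ActPkgCostDate = O.ItemCostDate
        ORDER BY O.OrderNo
    """).fetchall()
    con.close()
    return rows
```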

Row counts across aggregated tables

I have multiple tables in a Postgres database that hold perfectly unique information. When properly joined together in a query, they produce every possible combination I'm looking for. The information I'm looking for is complete SKUs.
To generate the complete SKUs, this query produces the desired results:
Functional Query
SELECT
materials.code,
"part_base_parts".code as part_base_parts_id,
shanks.code AS shank_id,
measurements.description
FROM
"part_base_parts"
LEFT JOIN "part_types" ON "part_base_parts"."part_type_id" = "part_types"."id"
RIGHT JOIN "parts_to_shanks" ON "part_base_parts"."id" = "parts_to_shanks"."part_base_part_id"
RIGHT JOIN "parts_to_measurements" ON "part_base_parts"."id" = "parts_to_measurements"."part_base_part_id"
RIGHT JOIN "parts_to_materials" ON "part_base_parts"."id" = "parts_to_materials"."part_base_part_id"
JOIN materials ON "parts_to_materials"."material_id" = materials."id"
JOIN shanks ON "parts_to_shanks"."shank_id" = shanks."id"
JOIN measurements ON "parts_to_measurements"."measurement_id" = measurements."id"
ORDER BY
part_base_parts_id ASC,
materials.code ASC,
shank_id ASC,
measurements.description ASC
This query produces 32,640 records (without indexes applied) in 0.82 seconds. Something like this:
Given Output
code  part_base_parts_id  shank_id  description
AA    5105                A         03.0
...   (32,638 rows in here)
ST    6939                D         9/16
This only gets me halfway there, though. I need to take the query's results and produce the count of each distinct value in each column. So the result I need would be:
Desired Results
code: AA - ###0
...
ST - ###0
part_base_parts_id: 5105 - ###0
...
6939 - ###0
shank_id: A - ###0
...
D - ###0
description: 03.0 - ###0
...
9/16 - ###0
Is there a way to produce the "desired results" from Postgres?
If you want them in rows, then sure:
WITH cte AS(
SELECT
materials.code,
"part_base_parts".code as part_base_parts_id,
shanks.code AS shank_id,
measurements.description
FROM
"part_base_parts"
LEFT JOIN "part_types" ON "part_base_parts"."part_type_id" = "part_types"."id"
RIGHT JOIN "parts_to_shanks" ON "part_base_parts"."id" = "parts_to_shanks"."part_base_part_id"
RIGHT JOIN "parts_to_measurements" ON "part_base_parts"."id" = "parts_to_measurements"."part_base_part_id"
RIGHT JOIN "parts_to_materials" ON "part_base_parts"."id" = "parts_to_materials"."part_base_part_id"
JOIN materials ON "parts_to_materials"."material_id" = materials."id"
JOIN shanks ON "parts_to_shanks"."shank_id" = shanks."id"
JOIN measurements ON "parts_to_measurements"."measurement_id" = measurements."id"
ORDER BY
part_base_parts_id ASC,
materials.code ASC,
shank_id ASC,
measurements.description ASC
)
SELECT key, value, count(*)
FROM(
SELECT 'code' AS key, code AS value
FROM cte
UNION ALL
SELECT 'part_base_parts_id', part_base_parts_id
FROM cte
UNION ALL
SELECT 'shank_id', shank_id
FROM cte
UNION ALL
SELECT 'description', description
FROM cte
) AS q
GROUP BY key, value
ORDER BY key, value
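Here is the unpivot-and-count idea run against a few hypothetical SKU rows in SQLite via Python. A plain table replaces the join-heavy CTE, and the output columns are named col and val rather than key and value to avoid any keyword clash in SQLite.

```python
import sqlite3


def value_counts_per_column():
    """Return (column name, value, count) for every value in every column."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE skus (code TEXT, part_base_parts_id TEXT,
                           shank_id TEXT, description TEXT);
        INSERT INTO skus VALUES
            ('AA', '5105', 'A', '03.0'),
            ('AA', '5105', 'B', '03.0'),
            ('ST', '6939', 'A', '9/16');
    """)
    rows = con.execute("""
        WITH cte AS (SELECT * FROM skus)
        SELECT col, val, COUNT(*)
        FROM (
            -- one (column name, value) pair per cell
            SELECT 'code' AS col, code AS val FROM cte
            UNION ALL
            SELECT 'part_base_parts_id', part_base_parts_id FROM cte
            UNION ALL
            SELECT 'shank_id', shank_id FROM cte
            UNION ALL
            SELECT 'description', description FROM cte
        ) AS q
        GROUP BY col, val
        ORDER BY col, val
    """).fetchall()
    con.close()
    return rows
```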

TSQL Msg 1013 "Use correlation names to distinguish them."

I've looked through many suggestions and can't figure out how to solve this one; I've been at it for the last two hours.
SET DATEFORMAT DMY
DECLARE @Source DATETIME = '01/01/2001'
DECLARE @Destenaition DATETIME = '01/01/2020'
SELECT ST.[Group],
ST.Shop,
SUM(ST.Purchased) AS Total,
CHG.Charged
FROM (SELECT Personals.Groups.[Name] AS 'Group',
Cards.vPurchases.PersonalID,
Personals.Registry.[Name],
SUM(Cards.vPurchases.Ammont) AS Purchased,
Cards.vPurchases.ShopName AS Shop
FROM Cards.vPurchases
INNER JOIN Personals.Registry
ON Personals.Registry.Id = Cards.vPurchases.PersonalID
INNER JOIN Personals.Groups
ON Personals.Registry.[Group] = Personals.Groups.Id
INNER JOIN Personals.Groups
ON Personals.Groups.Id = CHG.GroupID
WHERE Cards.vPurchases.[TimeStamp] >= @Source
AND Cards.vPurchases.[TimeStamp] <= @Destenaition
GROUP BY Cards.vPurchases.PersonalID,
Personals.Registry.[Name],
Personals.Groups.[Name],
Cards.vPurchases.ShopName) ST,
(SELECT PG.Id AS GroupID,
SUM(Cards.vCharges.Amount) AS Charged
FROM Cards.vCharges
INNER JOIN Personals.Registry
ON Personals.Registry.Id = Cards.vCharges.PersonalID
INNER JOIN Personals.Groups AS PG
ON Personals.Registry.[Group] = PG.Id
WHERE Cards.vCharges.[TimeStamp] >= @Source
AND Cards.vCharges.[TimeStamp] <= @Destenaition
GROUP BY Personals.Groups.[Name]) AS CHG
GROUP BY ST.Shop,
ST.[Group]
And then I get this error:
Msg 1013, Level 16, State 1, Line 6 The objects "Personals.Groups" and
"Personals.Groups" in the FROM clause have the same exposed names. Use
correlation names to distinguish them.
Thanks.
You are using the table Personals.Groups twice in the first subquery.
If you really mean to join Personals.Groups twice, you need to give each instance an alias and then use those aliases instead of the table name in the rest of the query:
INNER JOIN Personals.Groups as PG1
and
INNER JOIN Personals.Groups as PG2
If you only need one instance, you can combine the ON clauses into a single join instead:
INNER JOIN Personals.Groups
ON Personals.Registry.[Group] = Personals.Groups.Id and
Personals.Groups.Id = CHG.GroupID