Row counts across aggregated tables - postgresql

I have multiple tables in a PostgreSQL database that each hold unique information. When properly joined together in a query, they produce every possible combination I'm looking for. What I'm looking for are complete SKUs.
To generate the complete SKUs, this query produces the desired results:
Functional Query
SELECT
materials.code,
"part_base_parts".code as part_base_parts_id,
shanks.code AS shank_id,
measurements.description
FROM
"part_base_parts"
LEFT JOIN "part_types" ON "part_base_parts"."part_type_id" = "part_types"."id"
RIGHT JOIN "parts_to_shanks" ON "part_base_parts"."id" = "parts_to_shanks"."part_base_part_id"
RIGHT JOIN "parts_to_measurements" ON "part_base_parts"."id" = "parts_to_measurements"."part_base_part_id"
RIGHT JOIN "parts_to_materials" ON "part_base_parts"."id" = "parts_to_materials"."part_base_part_id"
JOIN materials ON "parts_to_materials"."material_id" = materials."id"
JOIN shanks ON "parts_to_shanks"."shank_id" = shanks."id"
JOIN measurements ON "parts_to_measurements"."measurement_id" = measurements."id"
ORDER BY
part_base_parts_id ASC,
materials.code ASC,
shank_id ASC,
measurements.description ASC
This query returns 32,640 rows (without any indexes applied) in 0.82 seconds. The output looks something like this:
Given Output
code  part_base_parts_id  shank_id  description
AA    5105                A         03.0
...   (32,638 more rows)
ST    6939                D         9/16
This only gets me halfway there, though. I need to take the results of this query and count how often each value occurs in each column. So the result I need would be:
Desired Results
code: AA - ###0
...
ST - ###0
part_base_parts_id: 5105 - ###0
...
6939 - ###0
shank_id: A - ###0
...
D - ###0
description: 03.0 - ###0
...
9/16 - ###0
Is there a way to produce the "desired results" from Postgres?

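Yes, if you want the counts as rows. Wrap your query in a CTE, unpivot the four columns into (key, value) pairs with UNION ALL, and count each pair: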
WITH cte AS(
SELECT
materials.code,
"part_base_parts".code as part_base_parts_id,
shanks.code AS shank_id,
measurements.description
FROM
"part_base_parts"
LEFT JOIN "part_types" ON "part_base_parts"."part_type_id" = "part_types"."id"
RIGHT JOIN "parts_to_shanks" ON "part_base_parts"."id" = "parts_to_shanks"."part_base_part_id"
RIGHT JOIN "parts_to_measurements" ON "part_base_parts"."id" = "parts_to_measurements"."part_base_part_id"
RIGHT JOIN "parts_to_materials" ON "part_base_parts"."id" = "parts_to_materials"."part_base_part_id"
JOIN materials ON "parts_to_materials"."material_id" = materials."id"
JOIN shanks ON "parts_to_shanks"."shank_id" = shanks."id"
JOIN measurements ON "parts_to_measurements"."measurement_id" = measurements."id"
-- no ORDER BY needed inside the CTE; the outer query does the sorting
)
SELECT key, value, count(*)
FROM(
-- ::text gives every branch a common type for UNION ALL
SELECT 'code' AS key, code::text AS value
FROM cte
UNION ALL
SELECT 'part_base_parts_id', part_base_parts_id::text
FROM cte
UNION ALL
SELECT 'shank_id', shank_id::text
FROM cte
UNION ALL
SELECT 'description', description::text
FROM cte
) AS q
GROUP BY key, value
ORDER BY key, value
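To check the unpivot-and-count idea without a PostgreSQL instance, here is a small sketch using Python's built-in sqlite3 module. The skus table and its rows are hypothetical stand-ins for the CTE output (the real joins are not reproduced), and the aliases are renamed to col_name/col_value to stay clear of SQL keywords such as KEY. The shape of the query is the same: one UNION ALL branch per column, then GROUP BY.

```python
import sqlite3

# Hypothetical stand-in for the CTE output (the real query returns 32,640 rows).
SKU_ROWS = [
    ("AA", "5105", "A", "03.0"),
    ("AA", "5105", "D", "03.0"),
    ("ST", "6939", "D", "9/16"),
]


def count_values_per_column(rows):
    """Return (col_name, col_value, count) for every distinct value in every column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE skus (code TEXT, part_base_parts_id TEXT, "
        "shank_id TEXT, description TEXT)"
    )
    conn.executemany("INSERT INTO skus VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH cte AS (SELECT * FROM skus)
        SELECT col_name, col_value, COUNT(*) AS n
        FROM (
            -- unpivot: one (col_name, col_value) row per column per SKU
            SELECT 'code' AS col_name, code AS col_value FROM cte
            UNION ALL
            SELECT 'part_base_parts_id', part_base_parts_id FROM cte
            UNION ALL
            SELECT 'shank_id', shank_id FROM cte
            UNION ALL
            SELECT 'description', description FROM cte
        ) AS q
        GROUP BY col_name, col_value
        ORDER BY col_name, col_value
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for col_name, col_value, n in count_values_per_column(SKU_ROWS):
        print(f"{col_name}: {col_value} - {n}")
```

Each column contributes one row per SKU to the unpivoted set, so the counts within each col_name always add up to the total number of SKU rows.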

Related

How to get unique rows by one column but sort by the second

Here is an example query with several joins:
SELECT DISTINCT ON(a.id_1) a.name, b.task, c.created_at
FROM a
INNER JOIN b ON a.id_2 = b.id
INNER JOIN c ON a.ID_2 = c.id
WHERE a.deleted_at IS NULL
ORDER BY a.id_1 desc
In this case the query works, and the rows are deduplicated on the values of id_1. But I need to sort by the column a.name, and when I change the ORDER BY, PostgreSQL fails with ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions.
The following query can serve as a solution to the problem:
SELECT *
FROM(
SELECT DISTINCT ON(a.id_1) a.name, b.task, c.created_at
FROM a
INNER JOIN b ON a.id_2 = b.id
INNER JOIN c ON a.ID_2 = c.id
WHERE a.deleted_at IS NULL
) AS sub
ORDER BY sub.name desc
But in reality the database is very large and this query does not perform well. Are there other ways to sort by a chosen column while still keeping only one row per id_1?
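One common alternative is to make the two steps explicit: pick one row per id_1 with ROW_NUMBER() in a subquery, then sort the surviving rows by name in the outer query. The sketch below uses Python's sqlite3; SQLite has no DISTINCT ON, so ROW_NUMBER() is a swapped-in technique (PostgreSQL supports it too). The single joined table is a hypothetical stand-in for the result of the joins, and keeping the latest created_at per id_1 is an assumption made only so the kept row is deterministic. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical result of the joins: several rows can share the same id_1.
JOINED_ROWS = [
    (1, "alpha", "2023-01-01"),
    (1, "alpha", "2023-03-01"),
    (2, "charlie", "2023-02-01"),
    (3, "bravo", "2023-01-15"),
    (3, "bravo", "2023-01-20"),
]


def one_row_per_id_sorted_by_name(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE joined (id_1 INTEGER, name TEXT, created_at TEXT)")
    conn.executemany("INSERT INTO joined VALUES (?, ?, ?)", rows)
    query = """
        SELECT id_1, name, created_at
        FROM (
            -- step 1: number the rows inside each id_1 group, latest created_at first
            SELECT id_1, name, created_at,
                   ROW_NUMBER() OVER (PARTITION BY id_1 ORDER BY created_at DESC) AS rn
            FROM joined
        ) AS sub
        WHERE rn = 1          -- keep exactly one row per id_1
        ORDER BY name DESC    -- step 2: sort the survivors by any column
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in one_row_per_id_sorted_by_name(JOINED_ROWS):
        print(row)
```

This does not by itself avoid the extra sort; whether it beats the DISTINCT ON subquery on a large PostgreSQL table is something only EXPLAIN ANALYZE on the real data can tell.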

SQL left join on maximum date

I have two tables: contracts and contract_descriptions.
contract_descriptions has a column named contract_id that matches the contract_id of records in the contracts table.
I am trying to join each contract with its latest record in contract_descriptions:
SELECT *
FROM contracts c
LEFT JOIN contract_descriptions d ON d.contract_id = c.contract_id
AND d.date_description =
(SELECT MAX(date_description)
FROM contract_descriptions t
WHERE t.contract_id = c.contract_id)
It works, but is it a performant way to do it? Is there a way to avoid the second SELECT?
Alternatively, you could use DISTINCT ON:
SELECT * FROM contracts c LEFT JOIN (
SELECT DISTINCT ON (cd.contract_id) cd.* FROM contract_descriptions cd
ORDER BY cd.contract_id, cd.date_description DESC
) d ON d.contract_id = c.contract_id
DISTINCT ON keeps only one row per contract_id, and sorting by cd.date_description DESC ensures that the kept row is always the latest description.
Performance depends on many factors (for example, table size and available indexes). In any case, compare both approaches with EXPLAIN (or EXPLAIN ANALYZE).
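Your query looks okay to me. A typical way to join only the first n rows (by some order) from another table is a lateral join. Using LEFT JOIN LATERAL ... ON true keeps contracts that have no description, just as your LEFT JOIN does (CROSS JOIN LATERAL would drop them):
SELECT *
FROM contracts c
LEFT JOIN LATERAL
(
SELECT *
FROM contract_descriptions cd
WHERE cd.contract_id = c.contract_id
ORDER BY cd.date_description DESC
FETCH FIRST 1 ROW ONLY
) cdlast ON true;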
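A runnable check of the original correlated-subquery approach, sketched with Python's sqlite3 and hypothetical sample data: contract 1 has two descriptions, contract 2 has one, and contract 3 has none. The LEFT JOIN keeps contract 3 with a NULL description, which an inner-join variant (for example CROSS JOIN LATERAL in PostgreSQL) would drop.

```python
import sqlite3


def latest_descriptions():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE contracts (contract_id INTEGER PRIMARY KEY);
        CREATE TABLE contract_descriptions (
            contract_id INTEGER, date_description TEXT, description TEXT
        );
        INSERT INTO contracts VALUES (1), (2), (3);
        INSERT INTO contract_descriptions VALUES
            (1, '2023-01-01', 'draft'),
            (1, '2023-06-01', 'signed'),
            (2, '2023-02-01', 'draft');
    """)
    # Same shape as the question: join only the row whose date equals the per-contract MAX
    query = """
        SELECT c.contract_id, d.description
        FROM contracts c
        LEFT JOIN contract_descriptions d
          ON d.contract_id = c.contract_id
         AND d.date_description = (
               SELECT MAX(t.date_description)
               FROM contract_descriptions t
               WHERE t.contract_id = c.contract_id)
        ORDER BY c.contract_id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(latest_descriptions())
```

If two descriptions of the same contract share the maximum date_description, this form joins both of them, whereas DISTINCT ON or FETCH FIRST 1 ROW ONLY keeps exactly one.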

TSQL show only first row

I have the following TSQL query:
SELECT DISTINCT MyTable1.Date
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John' ORDER BY MyTable1.Date DESC
It retrieves a long list of Dates, but I only need the first one, the one in the first row.
How can I get it?
Thanks a ton!
In SQL Server you can use TOP:
SELECT TOP 1 MyTable1.Date
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John'
ORDER BY MyTable1.Date DESC
If you need to use DISTINCT, then you can use:
SELECT TOP 1 x.Date
FROM
(
SELECT DISTINCT MyTable1.Date
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John'
) x
ORDER BY x.Date DESC
Or even:
SELECT MAX(MyTable1.Date)
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John'
--ORDER BY MyTable1.Date DESC
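The three variants should return the same value. The sketch below checks that with Python's sqlite3 and hypothetical rows (two of John's rows share the latest date, and a later date belongs to someone else); SQLite has no TOP, so LIMIT 1 stands in for TOP 1.

```python
import sqlite3

# Shared join and filter, copied from the question
JOIN_AND_FILTER = """
    FROM MyTable1
    INNER JOIN MyTable2 ON MyTable1.Id = MyTable2.Id
    WHERE Name = 'John'
"""


def latest_date_three_ways():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE MyTable1 (Id INTEGER, Date TEXT);
        CREATE TABLE MyTable2 (Id INTEGER, Name TEXT);
        INSERT INTO MyTable1 VALUES
            (1, '2023-01-05'), (2, '2023-03-10'), (3, '2023-03-10'), (4, '2023-07-01');
        INSERT INTO MyTable2 VALUES (1, 'John'), (2, 'John'), (3, 'John'), (4, 'Jane');
    """)
    # 1) sort and keep the first row (LIMIT 1 plays the role of TOP 1)
    top_one = conn.execute(
        "SELECT MyTable1.Date" + JOIN_AND_FILTER + "ORDER BY MyTable1.Date DESC LIMIT 1"
    ).fetchone()[0]
    # 2) DISTINCT in a derived table, then keep the first row
    distinct_then_top = conn.execute(
        "SELECT x.Date FROM (SELECT DISTINCT MyTable1.Date" + JOIN_AND_FILTER
        + ") x ORDER BY x.Date DESC LIMIT 1"
    ).fetchone()[0]
    # 3) plain aggregate, no ORDER BY needed
    max_only = conn.execute("SELECT MAX(MyTable1.Date)" + JOIN_AND_FILTER).fetchone()[0]
    conn.close()
    return top_one, distinct_then_top, max_only


if __name__ == "__main__":
    print(latest_date_three_ways())
```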
There are several options here. You can use TOP(1) as Taryn mentioned. But for limiting the number of rows returned, the documentation recommends OFFSET and FETCH over TOP:
We recommend that you use the OFFSET and FETCH clauses instead of the TOP clause to implement a query paging solution and limit the number of rows sent to a client application.
Using OFFSET and FETCH as a paging solution requires running the query one time for each "page" of data returned to the client application. For example, to return the results of a query in 10-row increments, you must execute the query one time to return rows 1 to 10 and then run the query again to return rows 11 to 20 and so on.
Applied to your problem, a solution using the OFFSET and FETCH approach could be:
SELECT DISTINCT MyTable1.Date
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Id = MyTable2.Id
WHERE Name = 'John' ORDER BY MyTable1.Date DESC
OFFSET 0 ROWS
FETCH NEXT 1 ROW ONLY
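The paging pattern described in the quoted documentation (run the query once per page, skipping the rows of earlier pages) can be sketched with Python's sqlite3, where LIMIT ... OFFSET ... plays the role of OFFSET ... ROWS FETCH NEXT ... ROWS ONLY. The Events table, its dates and the fetch_page helper are hypothetical.

```python
import sqlite3


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Events (Id INTEGER PRIMARY KEY, Date TEXT)")
    conn.executemany(
        "INSERT INTO Events (Date) VALUES (?)",
        [("2023-01-01",), ("2023-02-01",), ("2023-03-01",),
         ("2023-04-01",), ("2023-05-01",), ("2023-05-01",)],
    )
    return conn


def fetch_page(conn, page_number, page_size):
    """Run the query once for a single 1-based page of distinct dates, newest first."""
    offset = (page_number - 1) * page_size  # rows belonging to earlier pages
    cursor = conn.execute(
        "SELECT DISTINCT Date FROM Events ORDER BY Date DESC LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    conn = build_demo_db()
    page = 1
    while True:
        dates = fetch_page(conn, page, 2)
        if not dates:
            break
        print(page, dates)
        page += 1
```

A deterministic ORDER BY matters here: without it, consecutive pages are not guaranteed to be disjoint or complete.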

How to design a SQL recursive query?

How would I redesign the query below so that it recursively walks the entire tree and returns all descendants from root to leaves? (I'm using SSMS 2008.) We have a President at the root. Under him are the VPs, then upper management, and so on down the line. I need to return the names and titles of each. But this query shouldn't be hard-coded; I need to be able to run it for any selected employee, not just the President. The query below is the hard-coded approach:
select P.staff_name [Level1],
P.job_title [Level1 Title],
Q.license_number [License 1],
E.staff_name [Level2],
E.job_title [Level2 Title],
G.staff_name [Level3],
G.job_title [Level3 Title]
from staff_view A
left join staff_site_link_expanded_view P on P.people_id = A.people_id
left join staff_site_link_expanded_view E on E.people_id = C.people_id
left join staff_site_link_expanded_view G on G.people_id = F.people_id
left join facility_view Q on Q.group_profile_id = P.group_profile_id
Thank you, this was the closest match to what I needed. Here is my CTE query:
with Employee_Hierarchy (staff_name, job_title, id_number, billing_staff_credentials_code, site_name, group_profile_id, license_number, region_description, people_id)
as
(
select C.staff_name, C.job_title, C.id_number, C.billing_staff_credentials_code, C.site_name, C.group_profile_id, Q.license_number, R.region_description, A.people_id
from staff_view A
left join staff_site_link_expanded_view C on C.people_id = A.people_id
left join facility_view Q on Q.group_profile_id = C.group_profile_id
left join regions R on R.regions_id = Q.regions_id
where A.last_name = 'kromer'
)
select C.staff_name, C.job_title, C.id_number, C.billing_staff_credentials_code, C.site_name, C.group_profile_id, Q.license_number, R.region_description, A.people_id
from staff_view A
left join staff_site_link_expanded_view C on C.people_id = A.people_id
left join facility_view Q on Q.group_profile_id = C.group_profile_id
left join regions R on R.regions_id = Q.regions_id
WHERE C.STAFF_NAME IS NOT NULL
GROUP BY C.STAFF_NAME, C.job_title, C.id_number, C.billing_staff_credentials_code, C.site_name, C.group_profile_id, Q.license_number, R.region_description, A.people_id
ORDER BY C.STAFF_NAME
But I am wondering: what is the purpose of "Employee_Hierarchy"? When I replaced "staff_view" in the outer query with "Employee_Hierarchy", it returned only one record ("Kromer"). So when and where can we use "Employee_Hierarchy"?
See:
SQL Server - Simple example of a recursive CTE
MSDN: Recursive Queries using Common Table Expression
SQL Server recursive CTE (this seems pretty much like exactly what you are working on!)
Update:
A proper recursive CTE consists of basically three things:
1. an anchor SELECT to begin with; it can select e.g. the root-level employees (where ReportsTo is NULL), or any arbitrary employee that you define, e.g. by a parameter
2. a UNION ALL
3. a recursive SELECT statement that selects from the same, typically self-referencing, table and joins with the recursive CTE being built up
This gives you the ability to recursively build up a result set that you can then select from.
If you look at the Northwind sample database, it has a table called Employees which is self-referencing: Employees.ReportsTo --> Employees.EmployeeID defines who reports to whom.
Your CTE would look something like this:
;WITH RecursiveCTE AS
(
-- anchor query; get the CEO
SELECT EmployeeID, FirstName, LastName, Title, 1 AS Level, ReportsTo
FROM dbo.Employees
WHERE ReportsTo IS NULL
UNION ALL
-- recursive part; select next Employees that have ReportsTo -> cte.EmployeeID
SELECT
e.EmployeeID, e.FirstName, e.LastName, e.Title,
cte.Level + 1 AS Level, e.ReportsTo
FROM
dbo.Employees e
INNER JOIN
RecursiveCTE cte ON e.ReportsTo = cte.EmployeeID
)
SELECT *
FROM RecursiveCTE
ORDER BY Level, LastName
I don't know if you can translate your sample to a proper recursive CTE, but that's basically the gist of it: anchor query, UNION ALL, recursive query.
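A minimal sketch of the three-part structure, using Python's sqlite3 and a small hypothetical Northwind-style Employees table. Unlike the example above, the anchor selects an employee passed as a parameter, which matches the requirement to start from any employee rather than only the President. SQLite (like PostgreSQL) spells it WITH RECURSIVE, whereas SQL Server uses plain WITH.

```python
import sqlite3

# Hypothetical sample: (EmployeeID, FirstName, Title, ReportsTo)
EMPLOYEES = [
    (1, "Ada", "President", None),
    (2, "Bob", "VP Sales", 1),
    (3, "Cy", "VP Ops", 1),
    (4, "Dee", "Sales Manager", 2),
    (5, "Eve", "Sales Rep", 4),
    (6, "Fay", "Ops Manager", 3),
]


def descendants(start_employee_id):
    """Return (EmployeeID, FirstName, Title, Level) for an employee and everyone below."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, "
        "FirstName TEXT, Title TEXT, ReportsTo INTEGER)"
    )
    conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)", EMPLOYEES)
    query = """
        WITH RECURSIVE RecursiveCTE(EmployeeID, FirstName, Title, Level) AS (
            -- anchor: the chosen employee, supplied as a parameter
            SELECT EmployeeID, FirstName, Title, 1
            FROM Employees
            WHERE EmployeeID = ?
            UNION ALL
            -- recursive part: direct reports of rows already in the CTE
            SELECT e.EmployeeID, e.FirstName, e.Title, cte.Level + 1
            FROM Employees e
            INNER JOIN RecursiveCTE cte ON e.ReportsTo = cte.EmployeeID
        )
        SELECT EmployeeID, FirstName, Title, Level
        FROM RecursiveCTE
        ORDER BY Level, EmployeeID
    """
    rows = conn.execute(query, (start_employee_id,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in descendants(2):
        print(row)
```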

Aggregate function with Date on Postgres

I'm kind of rusty on my SQL, maybe you can help me out on this query.
I have these two tables for a tickets system (I'm omitting some fields):
table tickets
  id - bigint
  subject - text
  user_id - bigint
  closed - boolean
  first_message - bigint (foreign key to the next table's id)
  last_message - bigint (same as before)
table ticket_messages
  creation_date
I need to query the closed tickets and compute the average time between the first message's creation_date and the last message's creation_date. This is what I've done so far:
SELECT t.id, t.subject, tm.creation_date
FROM tickets AS t
INNER JOIN ticket_messages AS tm
ON tm.id = t.first_message
OR tm.id = t.last_message
WHERE t.closed = true
I'm looking for a GROUP BY or aggregate function that gets all the data from the table, calculates the time between the first and last message, and also displays the dates of the first and last message.
UPDATE: Instead of the "OR", I added a second inner join to ticket_messages. Now I get both dates, and I can do the calculation in my application:
SELECT t.id, t.subject, tm.creation_date, tm2.creation_date
FROM tickets AS t
INNER JOIN ticket_messages AS tm
ON tm.id = t.first_message
INNER JOIN ticket_messages as tm2
ON tm2.id = t.last_message
WHERE t.closed = true
I think that did it...
Something like this should work for getting the number of days elapsed. You might need to put this in a subquery to easily pull out more fields from 'tickets'.
SELECT t.id,AVG(tlast.creation_date - tfirst.creation_date)
FROM tickets AS t
INNER JOIN ticket_messages AS tfirst
ON tfirst.id = t.first_message
INNER JOIN ticket_messages AS tlast
ON tlast.id = t.last_message
WHERE t.closed = true
GROUP BY t.id
Which might lead to (untested), e.g.:
select t.id,t.subject,sub.nr_days
FROM (
SELECT t.id,AVG(tlast.creation_date - tfirst.creation_date) as nr_days
FROM tickets AS t
INNER JOIN ticket_messages AS tfirst
ON tfirst.id = t.first_message
INNER JOIN ticket_messages AS tlast
ON tlast.id = t.last_message
WHERE t.closed = true
GROUP BY t.id ) AS sub
INNER JOIN tickets AS t
ON sub.id = t.id;
You are trying to combine two queries into one, and to get data from three rows (one ticket and two messages) across two tables. Both need to be fixed.
First of all, you should not attempt to mix aggregate data (such as averages) with the details for single items - you need separate queries for that. You can do it, but the output is repetitious and therefore wasteful (all the single items in a group will have the same aggregate data).
Secondly, you need to find the first message and the last message for a given ticket. Hence, that query is:
SELECT t.id, t.subject, tm1.creation_date as start, tm2.creation_date as end,
tm2.creation_date - tm1.creation_date as close_interval
FROM tickets AS t
INNER JOIN ticket_messages AS tm1 ON t.first_message = tm1.id
INNER JOIN ticket_messages AS tm2 ON t.last_message = tm2.id
WHERE t.closed = true
This gives you the data from three rows (the ticket plus its first and last messages) per result row, as required. In PostgreSQL, subtracting two timestamp values yields an interval (subtracting two date values yields an integer number of days). (In Informix, the type would effectively be INTERVAL DAY(n) for a suitable n, such as 9.)
Now you can average those intervals. You can't average dates, because dates cannot be added together or divided, and averaging involves both summing and dividing. Intervals can be added and divided.
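As a sketch of the two separate queries (per-ticket detail, then the aggregate), here is a version using Python's sqlite3 with hypothetical tickets and messages. SQLite has no interval type, so julianday() differences (fractional days) stand in for PostgreSQL intervals; in PostgreSQL you would subtract the timestamps directly and AVG() the resulting intervals. Note that tm1 is joined on first_message and tm2 on last_message.

```python
import sqlite3

SETUP = """
    CREATE TABLE ticket_messages (id INTEGER PRIMARY KEY, creation_date TEXT);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, subject TEXT, closed INTEGER,
                          first_message INTEGER, last_message INTEGER);
    INSERT INTO ticket_messages VALUES
        (1, '2023-01-01 09:00:00'), (2, '2023-01-03 09:00:00'),
        (3, '2023-02-01 12:00:00'), (4, '2023-02-02 00:00:00'),
        (5, '2023-03-01 00:00:00');
    INSERT INTO tickets VALUES
        (10, 'Login issue', 1, 1, 2),
        (11, 'Billing', 1, 3, 4),
        (12, 'Still open', 0, 5, 5);
"""

# Each closed ticket joined to its first and last message (a ticket row plus two message rows).
CLOSED_TICKET_JOIN = """
    FROM tickets AS t
    INNER JOIN ticket_messages AS tm1 ON tm1.id = t.first_message
    INNER JOIN ticket_messages AS tm2 ON tm2.id = t.last_message
    WHERE t.closed = 1
"""

# julianday() difference = elapsed time in fractional days (stand-in for an interval)
ELAPSED = "julianday(tm2.creation_date) - julianday(tm1.creation_date)"


def ticket_report():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    # Detail query: per-ticket start, end and elapsed days
    detail = conn.execute(
        "SELECT t.id, tm1.creation_date, tm2.creation_date, " + ELAPSED
        + CLOSED_TICKET_JOIN + "ORDER BY t.id"
    ).fetchall()
    # Aggregate query, kept separate: the average over all closed tickets
    avg_days = conn.execute(
        "SELECT AVG(" + ELAPSED + ")" + CLOSED_TICKET_JOIN
    ).fetchone()[0]
    conn.close()
    return detail, avg_days


if __name__ == "__main__":
    detail, avg_days = ticket_report()
    for row in detail:
        print(row)
    print("average days open:", avg_days)
```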