Grouping by attributes and counting, PostgreSQL

I have written the following code that counts how many instances of each book_id there are in the table soldBooks.
SELECT book_id, sum(counter) AS no_of_books_sold, sum(retail_price) AS generated_revenue
FROM(
SELECT book_id,1 AS counter, retail_price
FROM shipments
LEFT JOIN editions ON (shipments.isbn = editions.isbn)
LEFT JOIN stock ON (shipments.isbn = stock.isbn)
) AS soldBooks
GROUP BY book_id
As you can see, I used a "counter" to solve my problem, but I am sure there must be a better, built-in way of achieving the same result. There must be some way to group a table by a given attribute and create a new column displaying the count for each value of that attribute. Can somebody share this with me?
Thanks!

SELECT book_id,
COUNT(book_id) AS no_books_sold,
SUM(retail_price) AS gen_rev
FROM shipments
JOIN editions ON (shipments.isbn=editions.isbn)
JOIN stock ON (shipments.isbn=stock.isbn)
GROUP BY book_id
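A minimal sketch of the same aggregation, run against an in-memory SQLite database as a stand-in for PostgreSQL. The table layout (book_id living in editions, retail_price in stock) and the sample rows are assumptions made up for illustration; only the COUNT/SUM with GROUP BY pattern comes from the answer above.

```python
import sqlite3

# In-memory SQLite database; schema and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE editions  (isbn TEXT PRIMARY KEY, book_id INTEGER);
    CREATE TABLE stock     (isbn TEXT PRIMARY KEY, retail_price REAL);
    CREATE TABLE shipments (id INTEGER PRIMARY KEY, isbn TEXT);
    INSERT INTO editions  VALUES ('A', 1), ('B', 1), ('C', 2);
    INSERT INTO stock     VALUES ('A', 10.0), ('B', 12.0), ('C', 20.0);
    INSERT INTO shipments (isbn) VALUES ('A'), ('A'), ('B'), ('C');
""")

# COUNT(*) replaces the hand-made "1 AS counter" column plus SUM(counter).
rows = conn.execute("""
    SELECT e.book_id,
           COUNT(*)            AS no_books_sold,
           SUM(s.retail_price) AS gen_rev
    FROM shipments sh
    JOIN editions e ON sh.isbn = e.isbn
    JOIN stock    s ON sh.isbn = s.isbn
    GROUP BY e.book_id
    ORDER BY e.book_id
""").fetchall()

print(rows)  # one (book_id, no_books_sold, gen_rev) tuple per book
```

Book 1 has three shipments across two editions, so it is counted three times and its revenue is the sum of the three shipped prices.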

Related

Postgres SQL query group by get most recent record instead of an aggregate

This is a current postgres query I have:
sql = """
SELECT
vms.campaign_id,
avg(vms.open_rate_uplift) as open_rate_average,
avg(vms.click_rate_uplift) as click_rate_average,
avg(vms.conversion_rate_uplift) as conversion_rate_average,
avg(cms.incremental_opens),
avg(cms.incremental_clicks),
avg(cms.incremental_conversions)
FROM
experiments.variant_metric_snapshot vms
INNER JOIN experiments.campaign_metric_snapshot cms ON vms.campaign_id = cms.campaign_id
WHERE
vms.campaign_id IN %(campaign_ids)s
GROUP BY
vms.campaign_id
"""
whereby I get the average incremental_opens, incremental_clicks, and incremental_conversions per campaign group from the cms table. However, instead of the average, I want the most recent values for the 3 fields. See the cms table screenshot below: for a given group, I want the values from the record with the greatest (i.e. most recent) event_id, instead of an average over all records.
How can I do this? Thanks
It sounds like you want a lateral join.
FROM
experiments.variant_metric_snapshot vms
CROSS JOIN LATERAL (select * from experiments.campaign_metric_snapshot cms where vms.campaign_id = cms.campaign_id order by event_id desc LIMIT 1) cms
WHERE...
If you are after a quick and dirty solution, you can use the array_agg function with minimal changes to your query.
SELECT
vms.campaign_id,
avg(vms.open_rate_uplift) as open_rate_average,
avg(vms.click_rate_uplift) as click_rate_average,
avg(vms.conversion_rate_uplift) as conversion_rate_average,
(array_agg(cms.incremental_opens ORDER BY cms.event_id DESC))[1] AS incremental_opens,
..
FROM
experiments.variant_metric_snapshot vms
INNER JOIN experiments.campaign_metric_snapshot cms ON vms.campaign_id = cms.campaign_id
WHERE
vms.campaign_id IN %(campaign_ids)s
GROUP BY
vms.campaign_id;
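For comparison, here is a sketch of a third way to pick the most recent row per campaign: ROW_NUMBER() in a subquery. This is a different technique from the LATERAL join and array_agg answers above, chosen because it also runs on SQLite (3.25 or newer), which is used here as a stand-in for PostgreSQL. The table is reduced to three columns and the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical version of campaign_metric_snapshot.
    CREATE TABLE campaign_metric_snapshot (
        event_id          INTEGER PRIMARY KEY,
        campaign_id       INTEGER,
        incremental_opens REAL
    );
    INSERT INTO campaign_metric_snapshot VALUES
        (1, 100, 5.0), (2, 100, 7.0), (3, 200, 1.0), (4, 100, 9.0);
""")

# Number the rows of each campaign from newest to oldest event_id,
# then keep only the newest one (rn = 1).
latest = conn.execute("""
    SELECT campaign_id, incremental_opens
    FROM (
        SELECT campaign_id,
               incremental_opens,
               ROW_NUMBER() OVER (PARTITION BY campaign_id
                                  ORDER BY event_id DESC) AS rn
        FROM campaign_metric_snapshot
    ) AS ranked
    WHERE rn = 1
    ORDER BY campaign_id
""").fetchall()

print(latest)
```

The resulting one-row-per-campaign set could then be joined to the grouped vms averages on campaign_id.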

How to fetch data quickly in a join query?

I have 3 tables, users, orders and comments, with 10087250, 24949600 and 26532000 records respectively. I wrote this query to count the comments on every order, but it takes more than half an hour to execute. How can I speed it up?
Note: there is already an index on the foreign key columns.
select users.user_name, orders.id, count(comments.order_id)
from orders
inner join users on users.id=orders.user_id
inner join comments on orders.id=comments.order_id
group by comments.order_id, users.user_name, orders.id
limit 2;
First, you probably need an ORDER BY clause to go with LIMIT.
If you need the most commented pair, you can ORDER BY count DESC.
Second, comments.order_id = orders.id, so why do you use both in the GROUP BY?
group by comments.order_id, users.user_name, orders.id
Maybe something like this will help:
WITH grouped AS (
SELECT order_id AS id, count(*)
FROM comments
GROUP BY 1
ORDER BY 2 DESC
LIMIT 2
)
SELECT u.user_name, g.id, g.count
FROM grouped AS g
JOIN orders AS o ON
o.id = g.id
JOIN users AS u ON
u.id = o.user_id
This avoids joining all the tables before filtering and grouping.
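The aggregate-first idea can be tried on a toy data set. The sketch below uses an in-memory SQLite database in place of PostgreSQL, with a handful of invented rows; the column alias comment_count is an illustrative name, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature versions of the three tables.
    CREATE TABLE users    (id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO users  VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 2);
    INSERT INTO comments (order_id) VALUES (10), (11), (11), (11), (12), (12);
""")

# Group and limit on comments alone, then join only the two surviving rows.
top = conn.execute("""
    WITH grouped AS (
        SELECT order_id AS id, COUNT(*) AS comment_count
        FROM comments
        GROUP BY order_id
        ORDER BY comment_count DESC
        LIMIT 2
    )
    SELECT u.user_name, g.id, g.comment_count
    FROM grouped AS g
    JOIN orders AS o ON o.id = g.id
    JOIN users  AS u ON u.id = o.user_id
    ORDER BY g.comment_count DESC
""").fetchall()

print(top)
```

Only the rows that survive the LIMIT are ever joined to orders and users, which is where the saving on the large tables would be expected to come from.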
You can try to use temporary tables before aggregating the records. This might help to reduce the query time. Something like this...
CREATE TEMPORARY TABLE temp_table(
...
);
INSERT INTO temp_table
SELECT users.user_name, orders.id, comments.order_id
FROM orders INNER JOIN users ON users.id = orders.user_id INNER JOIN comments ON orders.id = comments.order_id;
SELECT user_name, id, count(order_id) FROM temp_table group by order_id, user_name, id;
I think you need to remove the unnecessary join between the orders and comments tables. All you want from the comments table is the number of comments per order, so you can denormalize.
That means adding a comments_count column to the orders table; whenever a comment is added to an order, increment it, and decrement it when a comment is deleted.
After adding the new comments_count column, you need to populate it for each existing order.
Then you can just read the orders table, which already has the comment count for each order.
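One possible way to keep such a counter in sync is a pair of triggers. The answer does not say how the increment and decrement should be implemented, so triggers are an assumption here, as are the trigger names and sample rows; the sketch uses SQLite trigger syntax, and PostgreSQL would need a trigger function instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id             INTEGER PRIMARY KEY,
        comments_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE comments (id INTEGER PRIMARY KEY, order_id INTEGER);

    -- Pre-existing data, inserted before the counter is maintained.
    INSERT INTO orders (id) VALUES (10), (11);
    INSERT INTO comments (order_id) VALUES (10), (10), (11);

    -- One-off backfill of the new column for existing orders.
    UPDATE orders
    SET comments_count = (SELECT COUNT(*) FROM comments c
                          WHERE c.order_id = orders.id);

    -- Hypothetical triggers that keep the counter current from now on.
    CREATE TRIGGER comments_after_insert AFTER INSERT ON comments
    BEGIN
        UPDATE orders SET comments_count = comments_count + 1
        WHERE id = NEW.order_id;
    END;

    CREATE TRIGGER comments_after_delete AFTER DELETE ON comments
    BEGIN
        UPDATE orders SET comments_count = comments_count - 1
        WHERE id = OLD.order_id;
    END;

    INSERT INTO comments (order_id) VALUES (11);  -- order 11: 1 -> 2
    DELETE FROM comments WHERE id = 1;            -- order 10: 2 -> 1
""")

counts = conn.execute(
    "SELECT id, comments_count FROM orders ORDER BY id"
).fetchall()
print(counts)
```

Reading the counts is then a plain scan of orders with no join or GROUP BY, at the cost of extra work on every comment insert and delete.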

Postgres: averaging a column on distinct of another column in already grouped query

Is there a way to average a column only on a distinct of another column when the query is already grouped for another purpose without using a subquery? I know it can be done through subqueries, but trying to avoid restructuring an old query unless it is absolutely necessary.
The existing query, while complex, has more or less the same structure as the example below. As you can see, a library has any number of books, a book has any number of chapters, and a chapter has any number of paragraphs while the query returns the total numbers of books and paragraphs for each library.
SELECT libraries.name,
COUNT(DISTINCT books.id) AS num_books,
COUNT(paragraphs.id) AS num_paragraphs
FROM libraries
LEFT JOIN books ON books.library_id = libraries.id
LEFT JOIN chapters ON chapters.book_id = books.id
LEFT JOIN paragraphs ON paragraphs.chapter_id = chapters.id
GROUP BY libraries.name
Now suppose the table books has a column publish_year and I want the average year in which the library's books were published. Obviously I can't simply add AVG(books.publish_year), since books with more chapters and paragraphs would skew the average.
Is there a good way of averaging books.publish_year based upon distinct books.id again without restructuring the query or is restructuring the query inevitable?
Use a window function before joining:
select
l.name,
count(distinct b.id) as num_books,
count(p.id) as num_paragraphs,
min(year_avg) as year_avg
from
libraries l
left join (
select *, avg(publish_year) over(partition by library_id) as year_avg
from books
) b on b.library_id = l.id
left join chapters c on c.book_id = b.id
left join paragraphs p on p.chapter_id = c.id
group by l.name
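To see why the pre-computed window average is not skewed, the sketch below compares it with a plain AVG over the joined rows. It runs on in-memory SQLite (3.25 or newer) as a stand-in for PostgreSQL, omits the paragraphs table for brevity, and uses invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE libraries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books     (id INTEGER PRIMARY KEY, library_id INTEGER,
                            publish_year INTEGER);
    CREATE TABLE chapters  (id INTEGER PRIMARY KEY, book_id INTEGER);
    INSERT INTO libraries VALUES (1, 'Central');
    INSERT INTO books     VALUES (1, 1, 2000), (2, 1, 2010);
    -- Book 1 has three chapters, book 2 has one.
    INSERT INTO chapters (book_id) VALUES (1), (1), (1), (2);
""")

naive, windowed = conn.execute("""
    SELECT AVG(b.publish_year) AS skewed_avg,   -- weighted by chapter rows
           MIN(b.year_avg)     AS year_avg      -- computed before the join
    FROM libraries l
    LEFT JOIN (
        SELECT *, AVG(publish_year) OVER (PARTITION BY library_id) AS year_avg
        FROM books
    ) b ON b.library_id = l.id
    LEFT JOIN chapters c ON c.book_id = b.id
    GROUP BY l.name
""").fetchone()

print(naive, windowed)
```

The plain average counts book 1 three times because of its three chapters, while year_avg is fixed per library before the join multiplies the rows, so MIN (or MAX) simply picks that constant back out.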

Inner join with count and group by

I have 2 tables
Timetable :
pupil_id, staff_id, subject, lesson_id
Staff_info :
staff_id, surname
The timetable table contains 1000s of rows because each student's ID is listed under each period they do.
I want to list all the teacher's names, and the number of lessons they do (count). So I have to do SELECT with DISTINCT.
SELECT DISTINCT TIMETABLE.STAFF_ID,
COUNT(TIMETABLE.LESSON_ID),
STAFF.SURNAME
FROM STAFF
INNER JOIN TIMETABLE ON TIMETABLE.STAFF_ID = STAFF.STAFF_ID
GROUP BY TIMETABLE.STAFF_ID
However I get the error:
Column 'STAFF.SURNAME' is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.
This should do what you want:
SELECT s.STAFF_ID, COUNT(tt.LESSON_ID),
s.SURNAME
FROM STAFF s INNER JOIN
TIMETABLE tt
ON tt.STAFF_ID = s.STAFF_ID
GROUP BY s.STAFF_ID, s.SURNAME;
Notes:
You don't need DISTINCT unless there are duplicates in either table. That seems unlikely with this data structure, but if a staff member could have two of the same lesson, you would use COUNT(DISTINCT tt.LESSON_ID).
Table aliases make the query easier to write and to read.
You should include STAFF.SURNAME in the GROUP BY as well as the id.
I have a preference for taking the STAFF_ID column from the table where it is the primary key.
If you wanted staff with no lessons, you would change the INNER JOIN to LEFT JOIN.
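The notes about LEFT JOIN and COUNT(DISTINCT ...) can be checked on a small data set. This is a sketch on in-memory SQLite rather than SQL Server, with invented names and rows; since the timetable holds one row per pupil per lesson, the plain count and the distinct count differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff     (staff_id INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE timetable (pupil_id INTEGER, staff_id INTEGER,
                            subject TEXT, lesson_id INTEGER);
    -- Hypothetical data: Patel teaches nothing yet.
    INSERT INTO staff VALUES (1, 'Jones'), (2, 'Smith'), (3, 'Patel');
    -- Two pupils share lesson 100, so it appears twice for Jones.
    INSERT INTO timetable VALUES
        (1, 1, 'Maths', 100), (2, 1, 'Maths', 100),
        (1, 1, 'Maths', 101), (1, 2, 'Art',   200);
""")

rows = conn.execute("""
    SELECT s.staff_id,
           s.surname,
           COUNT(tt.lesson_id)          AS timetable_rows,
           COUNT(DISTINCT tt.lesson_id) AS lessons
    FROM staff s
    LEFT JOIN timetable tt ON tt.staff_id = s.staff_id
    GROUP BY s.staff_id, s.surname
    ORDER BY s.staff_id
""").fetchall()

print(rows)
```

COUNT(tt.lesson_id) ignores the NULL produced by the LEFT JOIN, which is why a staff member with no lessons shows 0 rather than 1.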
SELECT T.STAFF_ID,
T.CNT,
S.SURNAME
FROM STAFF S
JOIN (
SELECT STAFF_ID, CNT = COUNT(/*DISTINCT*/ LESSON_ID)
FROM TIMETABLE
GROUP BY STAFF_ID
) T ON T.STAFF_ID = S.STAFF_ID
Another option:
SELECT DISTINCT si.staff_id, surname, COUNT(lesson_id) OVER(PARTITION BY staff_Id)
FROM Staff_info si
INNER JOIN Timetable tt ON si.staff_id = tt.staff_id
When you use an aggregate function (COUNT, SUM, MIN, MAX, AVG) in the select list, every other column in the select list that is not inside an aggregate function must also appear in the GROUP BY clause. So you need to change your query as follows and add STAFF.SURNAME to the GROUP BY clause:
SELECT TIMETABLE.STAFF_ID,
COUNT(TIMETABLE.LESSON_ID),
STAFF.SURNAME
FROM STAFF
INNER JOIN TIMETABLE ON TIMETABLE.STAFF_ID = STAFF.STAFF_ID
GROUP BY TIMETABLE.STAFF_ID,STAFF.SURNAME
DISTINCT is also unnecessary in your scenario. And since you only want to show the teachers' names and lesson counts, you do not need TIMETABLE.STAFF_ID in the select list, but it should remain in the GROUP BY clause so that different teachers with the same surname are not merged into one row.
SELECT COUNT(TIMETABLE.LESSON_ID),
STAFF.SURNAME
FROM STAFF
INNER JOIN TIMETABLE ON TIMETABLE.STAFF_ID = STAFF.STAFF_ID
GROUP BY TIMETABLE.STAFF_ID,STAFF.SURNAME
You may need to take a look at this W3C post for more info

T-SQL - How to write query to get records that match ALL records in a many to many join

(I don't think I have titled this question correctly - but I don't know how to describe it)
Here is what I am trying to do:
Let's say I have a Person table that has a PersonID field. And let's say that a Person can belong to many Groups. So there is a Group table with a GroupID field and a GroupMembership table that is a many-to-many join between the two tables and the GroupMembership table has a PersonID field and a GroupID field. So far, it is a simple many to many join.
Given a list of GroupIDs I would like to be able to write a query that returns all of the people that are in ALL of those groups (not any one of those groups). And the query should be able to handle any number of GroupIDs. I would like to avoid dynamic SQL.
Is there some simple way of doing this that I am missing?
Thanks,
Corey
select person_id, count(*) from groupmembership
where group_id in ([your list of group ids])
group by person_id
having count(*) = [size of your list of group ids]
Edited: thank you dotjoe!
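A sketch of how the HAVING COUNT approach might be driven from application code with a variable-length list. It uses Python and in-memory SQLite instead of C# and SQL Server; the function name people_in_all_groups and the sample rows are hypothetical. Only the placeholder list is generated, while the IDs themselves are passed as bound parameters. COUNT(DISTINCT ...) is used instead of COUNT(*) as a guard against duplicate membership rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GroupMembership (PersonID INTEGER, GroupID INTEGER);
    -- Hypothetical memberships.
    INSERT INTO GroupMembership VALUES
        (1, 10), (1, 20), (1, 30), (2, 10), (2, 30), (3, 20);
""")

def people_in_all_groups(conn, group_ids):
    """Return PersonIDs that belong to every group in group_ids."""
    wanted = sorted(set(group_ids))          # ignore duplicate IDs in the input
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT PersonID
        FROM GroupMembership
        WHERE GroupID IN ({placeholders})
        GROUP BY PersonID
        HAVING COUNT(DISTINCT GroupID) = ?
        ORDER BY PersonID
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

both = people_in_all_groups(conn, [10, 30])
all_three = people_in_all_groups(conn, [10, 20, 30])
print(both, all_three)
```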
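Basically, you are looking for persons for whom there is no group that they are not a member of, so:
select *
from Person p
where not exists (
select 1
from [Group] g -- Group is a reserved word in T-SQL, so it needs brackets
where not exists (
select 1
from GroupMembership gm
where gm.PersonID = p.PersonID
and gm.GroupID = g.GroupID
)
)
As written, this checks every row of the Group table; to test a specific list, restrict the inner Group query to those GroupIDs.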
You're basically not going to avoid "dynamic" SQL in the sense of dynamically generating the query at query time. There's no way to hand a list around in SQL (well, there is, table variables, but getting them into the system from C# is either impossible (2005 & below) or else annoying (2008)).
One way that you could do it with multiple queries is to insert your list into a work table (probably a process-keyed table) and join against that table. The only other option would be to use a dynamic query such as the ones specified by Jonathan and hongliang.
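The work-table idea could look roughly like this. The sketch uses Python with in-memory SQLite rather than C# with SQL Server, and the temporary table name WantedGroups and the sample rows are assumptions for illustration. The query text stays fixed regardless of how many GroupIDs are supplied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GroupMembership (PersonID INTEGER, GroupID INTEGER);
    INSERT INTO GroupMembership VALUES
        (1, 10), (1, 20), (1, 30), (2, 10), (2, 30), (3, 20);
    -- Hypothetical work table holding the list of wanted groups.
    CREATE TEMP TABLE WantedGroups (GroupID INTEGER PRIMARY KEY);
""")

# Load the list as rows instead of building it into the query text.
conn.executemany("INSERT INTO WantedGroups VALUES (?)", [(10,), (20,)])

# A person qualifies when they match as many wanted groups as there are.
matches = [row[0] for row in conn.execute("""
    SELECT gm.PersonID
    FROM GroupMembership gm
    JOIN WantedGroups w ON w.GroupID = gm.GroupID
    GROUP BY gm.PersonID
    HAVING COUNT(DISTINCT gm.GroupID) = (SELECT COUNT(*) FROM WantedGroups)
    ORDER BY gm.PersonID
""")]

print(matches)
```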