Join 4 tables with group by, 2 having and where clause - postgresql

I have a database consisting of 4 tables:
users(id, "name", surname, birthdate)
friendships(userid1, userid2, "timestamp")
posts(id, userid, "text", "timestamp")
likes(postid, userid, "timestamp")
I need to get a result set of unique usernames for users who have more than 3 friendships within January of 2018 and whose average number of "likes" per "post" is in the range [10; 35).
I wrote this statement for the first step:
select distinct u."name"
from users u
join friendships f on u.id = f.userid1
where f."timestamp" between '2018-01-01'::timestamp and '2018-01-31'::timestamp
group by u.id
having count(f.userid1) > 3;
It's working fine and returns 3 rows. But when I'm adding the second part this way:
select distinct u."name"
from users u
join friendships f on u.id = f.userid1
join posts p on p.userid = u.id
join likes l on p.id = l.postid
where f."timestamp" between '2018-01-01'::timestamp and '2018-01-31'::timestamp
group by u.id
having count(f.userid1) > 3
and ((count(l.postid) / count(distinct l.postid)) >= 10
and (count(l.postid) / count(distinct l.postid)) < 35);
I'm getting crazy 94 rows. I don't know why.
Will be thankful for possible help.

You don't need distinct on u."name" because the group by already removes the duplicates.
select
u."name"
from
users u
inner join friendships f on u.id = f.userid1
inner join posts p on u.id = p.userid
inner join likes l on p.id = l.postid
where
f."timestamp" >= '2018-01-01'::timestamp
and f."timestamp" < '2018-02-01'::timestamp
group by
u."name"
having
count(distinct f.userid2) > 3
and ((count(l.postid) / count(distinct l.postid)) >= 10
and (count(l.postid) / count(distinct l.postid)) < 35);
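As stated in the comments, it is not a good idea to use between for a date range. The upper bound '2018-01-31'::timestamp means midnight at the start of January 31, so almost all of that day is excluded. A half-open range avoids this:
f."timestamp" >= '2018-01-01'::timestamp
and f."timestamp" < '2018-02-01'::timestamp
This gives you the full month of January.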
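To illustrate the difference, here is a small sketch in Python rather than PostgreSQL. It only mimics the two comparisons; the sample timestamp is made up for the example.

```python
from datetime import datetime

# Hypothetical sample: a friendship created on the afternoon of January 31.
ts = datetime(2018, 1, 31, 15, 0)

# BETWEEN '2018-01-01' AND '2018-01-31' is inclusive on both ends, but the
# upper bound is midnight at the START of January 31.
in_between = datetime(2018, 1, 1) <= ts <= datetime(2018, 1, 31)

# Half-open range: >= first day of the month, < first day of the next month.
in_half_open = datetime(2018, 1, 1) <= ts < datetime(2018, 2, 1)

print(in_between, in_half_open)  # False True
```

The between version silently drops the row, while the half-open range keeps it.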

Try the below! The issue with using "count(f.userid1) > 3" is that the joins multiply rows: a user with, e.g., 2 friends and 6 posts gets at least 2 x 6 = 12 joined rows (more when a post has several likes), so at least 12 records with non-null f.userid1. By counting distinct f.userid2 you count distinct friends instead. Similar issues appear for the other counts used for filtering.
select u."name"
from users u
join friendships f on u.id = f.userid1
join posts p on p.userid = u.id
left join likes l on p.id = l.postid
where f."timestamp" >= '2018-01-01'::timestamp and f."timestamp" < '2018-02-01'::timestamp
group by u.id, u."name"
having
--more than three distinct friends
count(distinct f.userid2) > 3
--distinct likes / distinct posts
--we use l.* to count distinct likes since there's no primary key
and (count(distinct l.*) / count(distinct p.id)) >= 10
and (count(distinct l.*) / count(distinct p.id)) < 35;
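The row multiplication described above can be reproduced with a tiny sketch. It uses Python's sqlite3 module instead of PostgreSQL, and the two-table schema and sample rows are assumptions made up for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table friendships(userid1 int, userid2 int);
create table posts(id int, userid int);
insert into friendships values (1, 2), (1, 3);          -- user 1 has 2 friends
insert into posts values (10, 1), (11, 1), (12, 1);     -- and 3 posts
""")

# Joining both child tables to the same user yields friends x posts rows.
rows, naive, distinct_friends = con.execute("""
select count(*), count(f.userid1), count(distinct f.userid2)
from friendships f
join posts p on p.userid = f.userid1
where f.userid1 = 1
""").fetchone()

print(rows, naive, distinct_friends)  # 6 6 2
```

The plain count reports 6 "friendships" for a user who has only 2 friends; the distinct count is unaffected by the extra join.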

Related

How to filter database table by a multiple join records from another one table but different types?

I have a products table and a corresponding ratings table which contains a foreign key product_id, grade(int) and type, which is an enum accepting the values robustness and price_quality_ratio.
The grades accept values from 1 to 10. So, for example, what would the query look like if I wanted to filter the products where the minimum grade for robustness is 7 and the minimum grade for price_quality_ratio is 8?
You can join twice, once per rating. The inner joins eliminate the products that fail any rating criteria:
select p.*
from products p
inner join rating r1
on r1.product_id = p.product_id
and r1.type = 'robustness'
and r1.rating >= 7
inner join rating r2
on r2.product_id = p.product_id
and r2.type = 'price_quality_ratio'
and r2.rating >= 8
Another option is to use conditional aggregation. This requires only one join, then a group by; the rating criteria are checked in the having clause.
select p.product_id, p.product_name
from products p
inner join rating r
on r.product_id = p.product_id
and r.type in ('robustness', 'price_quality_ratio')
group by p.product_id, p.product_name
having
min(case when r.type = 'robustness' then r.rating end) >= 7
and min(case when r.type = 'price_quality_ratio' then r.rating end) >= 8
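As a rough check of the conditional aggregation idea, here is a sketch using Python's sqlite3 module. The rating table is reduced to three columns and the sample rows are invented for the example; the products join is left out for brevity.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table rating(product_id int, type text, rating int);
insert into rating values
  (1, 'robustness', 8), (1, 'price_quality_ratio', 9),  -- passes both
  (2, 'robustness', 9), (2, 'price_quality_ratio', 5),  -- fails price/quality
  (3, 'robustness', 6), (3, 'price_quality_ratio', 10); -- fails robustness
""")

# CASE without ELSE yields NULL for the other type, and min() ignores NULLs,
# so each min() only sees the ratings of one type.
passing = [row[0] for row in con.execute("""
select product_id
from rating
group by product_id
having min(case when type = 'robustness' then rating end) >= 7
   and min(case when type = 'price_quality_ratio' then rating end) >= 8
order by product_id
""")]

print(passing)  # [1]
```

Only product 1 meets both thresholds, which matches what the two-join version would return on the same data.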
The JOIN proposed by @GMB would've been my first suggestion as well. If that gets too complicated because you have to maintain too many rX.rating conditions, you can also use a nested query:
SELECT *
FROM (
SELECT p.*, r1.rating as robustness, r2.rating as price_quality_ratio
FROM products p
JOIN rating r1 ON (r1.product_id = p.product_id AND r1.type = 'robustness')
JOIN rating r2 ON (r2.product_id = p.product_id AND r2.type = 'price_quality_ratio')
) AS tmp
WHERE robustness >= 7
AND price_quality_ratio >= 8
-- ORDER BY price_quality_ratio DESC, robustness DESC -- etc

Subqueries and Combining Queries together

I have a problem I've been working on. I've broken it down to a couple of steps below. I have trouble combining all the queries together to solve the following:
Find members who have spent over $1000 in departments that have
brought in more than $10000 total ordered by the members' id.
Schema:
departments(id, name)
products (id, name, price)
members(id, name, number, email, city, street_name, street_address)
sales(id, department_id, product_id, member_id, transaction_date)
Step 1)
I found the departments that have brought in more than 10,000$
select s.department_id
from sales s join products p on
s.product_id = p.id
group by s.department_id
having sum(price) > '10000'
Step 2) I found the members and the departments that they shop in
select *
from members m
join sales s
on m.id = s.member_id
join departments d
on d.id = s.department_id
Step 3) I combined 1 and 2 to find members that shop in departments that have brought in more than $10,000
select *
from members m
join sales s
on m.id = s.member_id
join departments d
on d.id = s.department_id
where s.department_id in
(select s.department_id
from sales s join products p on
s.product_id = p.id
group by s.department_id
having sum(price) > '10000')
Step 4) I found members and their id, email, total_spending > 1,000$
select m.id, m.name, m.email, sum(price) as total_spending
from members m join sales s on
m.id = s.member_id
join products p on
p.id = s.product_id
group by m.id
having sum(price) > '1000'
Step 5)
All of the steps work individually but when I put them together in my attempt:
select m.id, m.name, m.email, sum(price) as total_spending
from members m join sales s on
m.id = s.member_id
join products p on
p.id = s.product_id
where m.id in (select distinct m.id
from members m
join sales s
on m.id = s.member_id
join departments d
on d.id = s.department_id
where s.department_id in
(select s.department_id
from sales s join products p on
s.product_id = p.id
group by s.department_id
having sum(price) > '10000'))
group by m.id
having sum(price) > '1000'
The output is wrong. (This is on CodeWars) If someone could point me in the right direction that would be really great! Thank you.
Try to group by member_id and department_id:
select s.member_id,s.department_id,sum(p.price) as total_spending
from members m
join sales s on m.id = s.member_id
join products p on p.id = s.product_id
where s.department_id in (
select s.department_id
from sales s
join products p on s.product_id = p.id
group by s.department_id
having sum(p.price) > 10000 -- the departments which brought in more than $10000 total
)
group by s.member_id,s.department_id
having sum(p.price) > 1000 -- who have spent over $1000 in one department
And if you need it, you can then calculate how much each member spent in total:
select member_id,sum(total_spending) total
from
(
-- the first query is here
) q
group by member_id
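The effect of grouping by member and department can be sketched as follows, using Python's sqlite3 module. The schema is simplified (the price is stored directly on sales) and the rows are invented; it follows this answer's reading that the $1000 must be spent within a single qualifying department.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table sales(member_id int, department_id int, price int);
insert into sales values
  (1, 1, 1500),             -- member 1: 1500 in department 1
  (2, 1, 600), (2, 2, 600), -- member 2: 1200 in total, split over two departments
  (3, 1, 9000);             -- member 3 pushes department 1 over 10000
""")

big_departments = ("select department_id from sales "
                   "group by department_id having sum(price) > 10000")

# Question's approach: total spending of anyone who shops in a big department.
per_member = [r[0] for r in con.execute(f"""
select member_id from sales
where member_id in (select member_id from sales
                    where department_id in ({big_departments}))
group by member_id having sum(price) > 1000 order by member_id
""")]

# Answer's approach: spending per member within each big department.
per_member_and_department = [r[0] for r in con.execute(f"""
select member_id from sales
where department_id in ({big_departments})
group by member_id, department_id having sum(price) > 1000 order by member_id
""")]

print(per_member, per_member_and_department)  # [1, 2, 3] [1, 3]
```

Member 2 passes the first query only because spending in a non-qualifying department is included in the sum.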

SUM(CASE WHEN ...) returns a greater number than COUNT(DISTINCT..)

I have written a query in two ways, but I can't figure out why the second query returns a greater number than the first one, while the number that the first one (with COUNT(DISTINCT ...)) returns is correct:
WITH types(id) AS (VALUES('{1, 4, 5, 3}'::INTEGER[])),
date_gen64 AS
(
SELECT CAST (generate_series(date '10/1/2017', date '11/15/2017', interval
'1 day') AS date) as days ORDER BY days)
SELECT cl.class_date AS c_date,
count(DISTINCT (CASE WHEN co.id = 1 THEN p.id END)),
count(DISTINCT (CASE WHEN co.id = 2 THEN p.id END))
FROM person p
JOIN envelope e ON e.personID = p.id
JOIN "class" cl on cl.id = p.classID
JOIN course co ON co.id = cl.course_id AND co.id = 1
JOIN types ON cr.type_id = ANY (types.id)
RIGHT JOIN date_gen64 dg ON dg.days = cl.class_date
GROUP BY cl.class_date
ORDER BY cl.class_date
The above query returns 26, but the following query returns 27!
The reason I rewrote it with SUM is that the first query
was too slow. My question is: why does the second one count more?
WITH types(id) AS (VALUES('{1, 4, 5, 3}'::INTEGER[]))
SELECT tmpcl.days,
SUM(CASE WHEN tmp80.course_id = 1 THEN 1
ELSE 0 END),
SUM(CASE WHEN tmp80.course_id = 2 THEN 1
ELSE 0 END)
FROM (
SELECT CAST (generate_series(date '10/1/2017', date '11/15/2017',
interval '1 day') AS date) as days ORDER BY days) tmpcl
LEFT JOIN (
SELECT DISTINCT p.id AS "person_id",
cl.class_date AS c_date,
co.id AS "course_id"
FROM person p
JOIN envelope e ON e.personID = p.id
JOIN "class" cl on cl.id = p.classID
JOIN course co ON co.id = cl.course_id
JOIN types ON cr.type_id = ANY (types.id)
WHERE co.id IN ( 1 , 2 )
) tmp80 ON tmpcl.days = tmp80.c_date
GROUP BY tmpcl.days
ORDER BY tmpcl.days
You can theoretically have multiple people enrolled in the same class on the same day. Indeed that would seem to be the main point of having classes. So each time there are multiple people assigned to the same class on the same day you can have a higher count than you would in your first query. Does that make sense?
You don't appear to be using p.id in that inner query so simply remove it and your counts should match.
WITH types(id) AS (VALUES('{1, 4, 5, 3}'::INTEGER[]))
SELECT tmpcl.days,
SUM(CASE WHEN tmp80.course_id = 1 THEN 1
ELSE 0 END),
SUM(CASE WHEN tmp80.course_id = 2 THEN 1
ELSE 0 END)
FROM (
SELECT CAST (generate_series(date '10/1/2017', date '11/15/2017',
interval '1 day') AS date) as days ORDER BY days) tmpcl
LEFT JOIN (
SELECT DISTINCT cl.class_date AS c_date,
co.id AS "course_id"
FROM person p
JOIN envelope e ON e.personID = p.id
JOIN "class" cl on cl.id = p.classID
JOIN course co ON co.id = cl.course_id
JOIN types ON cr.type_id = ANY (types.id)
WHERE co.id IN ( 1 , 2 )
) tmp80 ON tmpcl.days = tmp80.c_date
GROUP BY tmpcl.days
ORDER BY tmpcl.days

Postgres: Getting a total related count based on a condition from a related table

My sql-fu is not strong, and I'm sure I'm missing something simple in trying to get this working. I have a fairly standard group of tables:
users
-----
id
name
carts
-----
id
user_id
purchased_at
line_items
----------
id
cart_id
product_id
products
--------
id
permalink
I want to get a total count of purchased carts for each user, but only if that user has purchased a particular product. That is: if at least one of their purchased carts has a product with a particular permalink, I'd like a count of the total number of purchased carts, regardless of their contents.
A purchased cart is defined as one where carts.purchased_at is not null.
select
u.id,
count(c2.*) as purchased_carts
from users u
inner join carts c on u.id = c.user_id
inner join line_items li on c.id = li.cart_id
inner join products p on p.id = li.product_id
left join carts c2 on u.id = c2.user_id
where
c.purchased_at is not NULL
and
c2.purchased_at is not NULL
and
p.permalink = 'product-name'
group by 1
order by 2 desc
The numbers that are coming up for purchased_carts are strangely high, possibly related to the total number of line items multiplied by the number of carts? Maybe? I'm pretty stumped at the result. Any help would be greatly appreciated.
This ought to help:
select u.id,
count(*)
from users u join
carts c on c.user_id = u.id
where c.purchased_at is not NULL and
exists (
select null
from carts c2
join line_items l on l.cart_id = c2.id
join products p on p.id = l.product_id
where c2.user_id = u.id and
c2.purchased_at is not NULL and
p.permalink = 'product-name')
group by u.id
order by count(*) desc;
The exists predicate is a semi-join.
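A small sketch shows how the semi-join avoids the inflated counts from the question. It uses Python's sqlite3 module; the users table is dropped for brevity, count(c2.id) stands in for count(c2.*), and all sample rows are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table carts(id int, user_id int, purchased_at text);
create table line_items(id int, cart_id int, product_id int);
create table products(id int, permalink text);
insert into products values (1, 'product-name'), (2, 'other');
insert into carts values (1, 1, '2020-01-01'), (2, 1, '2020-01-02'), (3, 1, null),
                         (4, 2, '2020-01-03');
insert into line_items values (1, 1, 1), (2, 1, 1), (3, 2, 2), (4, 4, 2);
""")

# Joining carts a second time multiplies matching line items by purchased carts.
double_join = con.execute("""
select c.user_id, count(c2.id)
from carts c
join line_items li on li.cart_id = c.id
join products p on p.id = li.product_id
join carts c2 on c2.user_id = c.user_id
where c.purchased_at is not null and c2.purchased_at is not null
  and p.permalink = 'product-name'
group by c.user_id
""").fetchall()

# EXISTS only tests whether a matching cart exists; it adds no rows.
semi_join = con.execute("""
select c.user_id, count(*)
from carts c
where c.purchased_at is not null
  and exists (select null
              from carts c2
              join line_items li on li.cart_id = c2.id
              join products p on p.id = li.product_id
              where c2.user_id = c.user_id
                and c2.purchased_at is not null
                and p.permalink = 'product-name')
group by c.user_id
""").fetchall()

print(double_join, semi_join)  # [(1, 4)] [(1, 2)]
```

User 1 has two purchased carts; the double join reports 4 because the cart with two matching line items is paired with each purchased cart.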
bool_or is what you need:
select
u.id,
count(distinct c.id) as purchased_carts
from
users u
inner join
carts c on u.id = c.user_id
inner join
line_items li on c.id = li.cart_id
inner join
products p on p.id = li.product_id
where c.purchased_at is not NULL
group by u.id
having bool_or (p.permalink = 'product-name')
order by 2 desc

How to display rollup data in new column?

I have the following query, which returns the number of android questions asked per day on StackOverflow in the year 2011. I want to get the sum of all the questions asked during 2011. For this I am using ROLLUP.
select
year(p.CreationDate) as [Year],
month(p.CreationDate) as [Month],
day(p.CreationDate) as [Day],
count(*) as [QuestionsAskedToday]
from Posts p
inner join PostTags pt on p.id = pt.postid
inner join Tags t on t.id = pt.tagid
where
t.tagname = 'android' and
p.CreationDate > '2011-01-01 00:00:00'
group by year(p.CreationDate), month(p.CreationDate),day(p.CreationDate)
with rollup
order by year(p.CreationDate), month(p.CreationDate) desc,day(p.CreationDate) desc
This is the output:
The sum of all questions asked on each day in 2011 is being displayed in the QuestionsAskedToday column itself.
Is there a way to display the rollup in a new column with an alias?
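To show this as a column rather than a row you can use SUM(COUNT(*)) OVER () instead of ROLLUP. The window function is evaluated after the grouping, so it sums the per-day counts and repeats the grand total on every row.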
SELECT YEAR(p.CreationDate) AS [Year],
MONTH(p.CreationDate) AS [Month],
DAY(p.CreationDate) AS [Day],
COUNT(*) AS [QuestionsAskedToday],
SUM(COUNT(*)) OVER () AS [Total]
FROM Posts p
INNER JOIN PostTags pt
ON p.id = pt.postid
INNER JOIN Tags t
ON t.id = pt.tagid
WHERE t.tagname = 'android'
AND p.CreationDate > '2011-01-01 00:00:00'
GROUP BY YEAR(p.CreationDate),
MONTH(p.CreationDate),
DAY(p.CreationDate)
ORDER BY YEAR(p.CreationDate),
MONTH(p.CreationDate) DESC,
DAY(p.CreationDate) DESC
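The idea of a window sum over the grouped counts can be tried out with a sketch in Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table and rows are invented, and the per-day count is computed in a derived table first, which should be equivalent to writing SUM(COUNT(*)) OVER () directly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table posts(id int, creation_day text);
insert into posts values (1, '2011-01-01'), (2, '2011-01-01'), (3, '2011-01-02'),
                         (4, '2011-01-03'), (5, '2011-01-03'), (6, '2011-01-03');
""")

# The inner query counts questions per day; the empty OVER () makes the
# window span all grouped rows, so every row carries the grand total.
rows = con.execute("""
select creation_day, n, sum(n) over () as total
from (select creation_day, count(*) as n from posts group by creation_day)
order by creation_day
""").fetchall()

print(rows)  # [('2011-01-01', 2, 6), ('2011-01-02', 1, 6), ('2011-01-03', 3, 6)]
```

Unlike ROLLUP, no extra summary row is produced; the total appears as an additional column.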
You could take an approach like this:
SELECT
YEAR(p.CreationDate) AS 'Year'
, CASE
WHEN GROUPING(MONTH(p.CreationDate)) = 0
THEN CAST(MONTH(p.CreationDate) AS VARCHAR(2))
ELSE 'Totals:'
END AS 'Month'
, CASE
WHEN GROUPING(DAY(p.CreationDate)) = 0
THEN CAST(DAY(p.CreationDate) AS VARCHAR(2))
ELSE 'Totals:'
END AS [DAY]
, CASE
WHEN GROUPING(MONTH(p.CreationDate)) = 0
AND GROUPING(DAY(p.CreationDate)) = 0
THEN COUNT(1)
END AS 'QuestionsAskedToday'
, CASE
WHEN GROUPING(MONTH(p.CreationDate)) = 1
OR GROUPING(DAY(p.CreationDate)) = 1
THEN COUNT(1)
END AS 'Totals'
FROM Posts AS p
INNER JOIN PostTags AS pt ON p.id = pt.postid
INNER JOIN Tags AS t ON t.id = pt.tagid
WHERE t.tagname = 'android'
AND p.CreationDate >= '2011-01-01'
GROUP BY ROLLUP(YEAR(p.CreationDate)
, MONTH(p.CreationDate)
, DAY(p.CreationDate))
ORDER BY YEAR(p.CreationDate)
, MONTH(p.CreationDate) DESC
, DAY(p.CreationDate) DESC
If this is what you wanted, the same technique can be applied to Years as well to total them in the new column, or their own column, if you want to query for multiple years and aggregate them.