PostgreSQL RIGHT JOIN: limit returned rows

I have the following schema:
expenses
id  name, varchar  cost, double  date, DATE  category_id, int f_key  user_id, int f_key
1   Pizza          22.9          22/08/2022  1                       1
2   Pool           34.9          23/08/2022  2                       1
categories
id  name, varchar
1   Food
2   Leisure
3   Medicine
4   Fancy food
users_categories
user_id, int f_key  category_id, int f_key
1                   1
1                   2
1                   3
2                   4
And two users with id 1 and 2.
Relation between user and category is many to many.
Problem:
I want to get statistics (total cost amount and count) for all categories. For categories where there are no expenses I want to return 0. Here is my query:
SELECT categories.name AS name, count(expenses.name) AS count, round(SUM(price)::numeric, 2) AS sum
FROM expenses
RIGHT JOIN categories ON expenses.category_id = categories.id
    AND expenses.category_id IN (
        SELECT users_categories.category_id FROM users_categories WHERE users_categories.user_id = 1
    )
    AND expenses.id IN (
        SELECT expenses.id FROM expenses
        JOIN users_categories ON expenses.category_id = users_categories.category_id
            AND expenses.user_id = 1
            AND (extract(year from date) = 2022 OR CAST(2022 AS int) IS NULL)
            AND (extract(month from date) = 8 OR CAST(8 AS int) IS NULL)
    )
GROUP BY categories.id ORDER BY categories.id
The response is:
name        count  sum
Food        1      22.9
Leisure     1      33.9
Medicine    0      null
Fancy food  0      null
How should I edit my query to eliminate the last row? That category doesn't belong to user 1.

In your query, users_categories only appears inside subqueries in the ON clause, so it never filters the categories themselves.
Try this query:
SELECT categories.name AS name,
       count(expenses.name) AS count,
       coalesce(round(SUM(expenses.cost)::numeric, 2), 0) AS sum  -- the schema names this column cost
FROM categories
LEFT JOIN users_categories ON users_categories.category_id = categories.id
LEFT JOIN expenses ON expenses.category_id = categories.id
    AND expenses.user_id = users_categories.user_id  -- count only this user's expenses
    AND (extract(year from expenses.date) = 2022 OR CAST(2022 AS int) IS NULL)
    AND (extract(month from expenses.date) = 8 OR CAST(8 AS int) IS NULL)
WHERE users_categories.user_id = 1
GROUP BY categories.name, categories.id
ORDER BY categories.id
Output:
name      count  sum
Food      1      22.90
Leisure   1      34.90
Medicine  0      0

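You want to move the IN test on the category out of the ON condition and into a WHERE clause.
When it is in the ON clause, rows that fail the IN test are not removed: the outer join still emits them, with NULLs filled in for the expenses columns. You want to remove those rows after the NULL-extension is done, so that they stay removed. Note that the WHERE test has to be on categories.id rather than on expenses.category_id, because the latter is NULL for categories without expenses and those rows would be dropped as well. But why use an IN test at all? It seems it would be much simpler written as another join.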
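The difference between filtering in ON and filtering in WHERE can be checked on the question's sample data. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 module as a stand-in for PostgreSQL, writes the join as a LEFT JOIN from categories (the mirror image of the RIGHT JOIN in the question), stores dates as ISO text, and tests the category's own id against the user's categories.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (an assumption made so the sketch runs anywhere).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users_categories (user_id INTEGER, category_id INTEGER);
CREATE TABLE expenses (id INTEGER PRIMARY KEY, name TEXT, cost REAL,
                       date TEXT, category_id INTEGER, user_id INTEGER);
INSERT INTO categories VALUES (1,'Food'),(2,'Leisure'),(3,'Medicine'),(4,'Fancy food');
INSERT INTO users_categories VALUES (1,1),(1,2),(1,3),(2,4);
INSERT INTO expenses VALUES
  (1,'Pizza',22.9,'2022-08-22',1,1),
  (2,'Pool',34.9,'2022-08-23',2,1);
""")

user_cats = "SELECT category_id FROM users_categories WHERE user_id = 1"

# Filter in ON: categories that fail the test are still NULL-extended,
# so all four categories appear in the result.
in_on = con.execute(f"""
    SELECT c.name, COUNT(e.id), COALESCE(SUM(e.cost), 0)
    FROM categories c
    LEFT JOIN expenses e
           ON e.category_id = c.id AND e.user_id = 1
          AND c.id IN ({user_cats})
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Filter in WHERE: applied after the outer join, so 'Fancy food' is removed
# while 'Medicine' (no expenses, but owned by user 1) stays with zeros.
in_where = con.execute(f"""
    SELECT c.name, COUNT(e.id), COALESCE(SUM(e.cost), 0)
    FROM categories c
    LEFT JOIN expenses e
           ON e.category_id = c.id AND e.user_id = 1
    WHERE c.id IN ({user_cats})
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

print(in_on)
print(in_where)
```

The expense-specific conditions (user, and in the real query the year and month) stay in ON so that categories without matching expenses keep their zero row.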

As I understand it, you are trying to get the count and sum of expenses for all the categories related to user_id 1 within August 2022.
Please try the following query.
WITH statistics AS (
    SELECT e.category_id,
           count(*) AS count,
           round(sum(e.cost)::numeric, 2) AS sum  -- cast needed: round(double precision, int) does not exist
    FROM expenses e
    WHERE e.user_id = 1
      AND e.date BETWEEN DATE '2022-08-01' AND DATE '2022-08-31'  -- ISO dates do not depend on DateStyle
    GROUP BY e.category_id
),
user_category AS (
    SELECT uc.category_id,
           COALESCE(s.count, 0) AS count,
           COALESCE(s.sum, 0) AS sum
    FROM users_categories uc
    LEFT JOIN statistics s ON uc.category_id = s.category_id  -- statistics exposes category_id, not id
    WHERE uc.user_id = 1
)
SELECT c.name, u.count, u.sum
FROM categories c
INNER JOIN user_category u ON u.category_id = c.id;

Related

How to add sum to recursive query

I have this query
The flights table also contains a price column. I'd like to sum the prices and display the total. How can I do this?
Can I do it by taking the values from SELECT * FROM get_cities somehow, or should it be done inside the query?
I am trying to solve this exercise:
Write a query that finds the names of all cities that can be reached by plane from a given city (City name) with 3 stops. Display all the cities where a stop took place, and sum up the total cost of the journey.
WITH RECURSIVE get_cities AS (
    SELECT 0 AS count, city, cid FROM cities WHERE city = 'Agat'
    UNION ALL
    SELECT c.count + 1, b.city, b.cid FROM get_cities c
    JOIN flights t ON t.departure = c.cid
    JOIN cities b ON t.arrival = b.cid
    WHERE count < 3
)
SELECT cid, sum(price) FROM get_cities
JOIN flights f ON f.fid = cid
GROUP BY cid;
You can sum the prices directly in the recursive CTE:
WITH RECURSIVE get_cities AS (
    SELECT 0 AS count, cid, array[city] AS city_path, array[cid] AS cid_path, 0 AS total_price
    FROM cities
    WHERE city = 'Agat'
    UNION ALL
    SELECT c.count + 1, b.cid, c.city_path || b.city, c.cid_path || b.cid, c.total_price + t.price
    FROM get_cities c
    JOIN flights t ON t.departure = c.cid  -- cid carries the current city so this join can resolve
    JOIN cities b ON t.arrival = b.cid
    WHERE c.count < 3
)
SELECT *
FROM get_cities
WHERE count = 2;  -- select only the journeys with 3 stops
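To see the running total build up, here is a small runnable sketch of the same idea. Everything in it beyond the table and column names from the question is an assumption: it runs on SQLite through Python's sqlite3 module, the cities other than 'Agat' and all flight prices are made up, the step counter is called hops, and the path is kept as a text string because SQLite has no array type.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE cities (cid INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE flights (fid INTEGER PRIMARY KEY, departure INTEGER,
                      arrival INTEGER, price INTEGER);
-- Hypothetical data: a single chain Agat -> Bree -> Caro -> Dune.
INSERT INTO cities VALUES (1,'Agat'),(2,'Bree'),(3,'Caro'),(4,'Dune');
INSERT INTO flights VALUES (1,1,2,100),(2,2,3,50),(3,3,4,25);
""")

rows = con.execute("""
WITH RECURSIVE get_cities(hops, cid, city_path, total_price) AS (
    -- Anchor: start in Agat with no flights taken and nothing spent.
    SELECT 0, cid, city, 0 FROM cities WHERE city = 'Agat'
    UNION ALL
    -- Step: take one more flight, extend the path, add its price.
    SELECT c.hops + 1, b.cid, c.city_path || ' > ' || b.city,
           c.total_price + t.price
    FROM get_cities c
    JOIN flights t ON t.departure = c.cid
    JOIN cities b ON t.arrival = b.cid
    WHERE c.hops < 3
)
SELECT hops, city_path, total_price FROM get_cities ORDER BY hops
""").fetchall()

for row in rows:
    print(row)
```

Each row carries the accumulated price of the journey so far, so no second join against flights is needed after the recursion.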

Selecting other columns not in count, group by

So I have a table as follows
product_id sender_id timestamp ...other columns...
1 2 1222
1 2 3423
1 2 1231
2 2 890
3 4 234
2 3 234234
I want to get rows where sender_id = 2, but I want to count and group by product_id and sort by timestamp descending. This means I need the following result
product_id sender_id timestamp count ...other columns...
1 2 3423 3
2 2 890 1
I tried the following query:
SELECT product_id, sender_id, timestamp, count(product_id), ...other columns...
FROM table
WHERE sender_id = 2
GROUP BY product_id
But I get the following error Error in query: ERROR: column "table.sender_id" must appear in the GROUP BY clause or be used in an aggregate function
Seems like I cannot SELECT columns that are not in the GROUP BY. Another method which I found online was to join
SELECT product_id, sender_id, timestamp, count, ...other columns...
FROM table
JOIN (
SELECT product_id, COUNT(product_id) AS count
FROM table
GROUP BY (product_id)
) table1 ON table.product_id = table1.product_id
WHERE sender_id = 2
GROUP BY product_id
But doing this simply lists all rows without grouping or counting. My guess is that the ON part simply extends table again.
Try grouping by both product_id and sender_id:
select product_id, sender_id, count(product_id), max(timestamp) maxtm
from t
where sender_id = 2
group by product_id, sender_id
order by maxtm desc
If you want other columns too:
select t.*, t1.product_count
from t
inner join (
select product_id, sender_id, count(product_id) product_count, max(timestamp) maxtm
from t
where sender_id = 2
group by product_id, sender_id
) t1
on t.product_id = t1.product_id and t.sender_id = t1.sender_id and t.timestamp = t1.maxtm
order by t1.maxtm desc
Here is a worked example with your data:
CREATE TABLE products (
    product_id INTEGER,
    sender_id INTEGER,
    time_stamp INTEGER
);
INSERT INTO products VALUES
    (1, 2, 1222),
    (1, 2, 3423),
    (1, 2, 1231),
    (2, 2, 890),
    (3, 4, 234),
    (2, 3, 234234);
SELECT product_id, sender_id, string_agg(time_stamp::text, ','), count(product_id)
FROM products
WHERE sender_id = 2
GROUP BY product_id, sender_id;
The time_stamp values differ within each group, so you need to apply an aggregate to that column or drop it from the SELECT list.
If you drop time_stamp from the SELECT list, the query becomes very simple:
SELECT product_id, sender_id, count(product_id)
FROM products
WHERE sender_id = 2
GROUP BY product_id, sender_id;
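As a quick check against the result the question asks for, the grouping can be run on the sample rows with MAX as the aggregate for the timestamp. This is a sketch that assumes SQLite (via Python's sqlite3) in place of PostgreSQL; the aliases latest and cnt are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (product_id INTEGER, sender_id INTEGER, time_stamp INTEGER);
INSERT INTO products VALUES
  (1,2,1222),(1,2,3423),(1,2,1231),(2,2,890),(3,4,234),(2,3,234234);
""")

# One row per (product_id, sender_id): the latest timestamp and the row count,
# sorted with the most recent group first.
rows = con.execute("""
    SELECT product_id, sender_id, MAX(time_stamp) AS latest, COUNT(*) AS cnt
    FROM products
    WHERE sender_id = 2
    GROUP BY product_id, sender_id
    ORDER BY latest DESC
""").fetchall()

print(rows)
```

Every selected column is either in the GROUP BY list or wrapped in an aggregate, which is what the error message in the question demands.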

How to normalize group by count results?

How can the results of a "group by" count be normalized by the count's sum?
For example, given:
User Rating (1-5)
----------------------
1 3
1 4
1 2
3 5
4 3
3 2
2 3
The result will be:
User Count Percentage
---------------------------
1 3 .42 (=3/7)
2 1 .14 (=1/7)
3 2 .28 (...)
4 1 .14
So for each user the number of ratings they provided is given as the percentage of the total ratings provided by everyone.
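SELECT DISTINCT ON ("user") "user",
       count(*) OVER (PARTITION BY "user") AS cnt,
       count(*) OVER (PARTITION BY "user")::numeric / count(*) OVER () AS percentage  -- cast avoids integer division
FROM ratings;  -- hypothetical table name, the question does not give one; "user" is quoted because it is a reserved word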
The count(*) OVER (PARTITION BY user) is a so-called window function. Window functions let you perform some operation over a "window" created by some "partition" which is here made over the user id. In plain and simple English: the partitioned count(*) is calculated for each distinct user value, so in effect it counts the number of rows for each user value.
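The two windows can be tried on the ratings from the question. The sketch below is an illustration under assumptions: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), the table is called ratings, the column is named user_id to avoid the reserved word, and plain DISTINCT replaces the PostgreSQL-only DISTINCT ON.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ratings (user_id INTEGER, rating INTEGER);
INSERT INTO ratings VALUES (1,3),(1,4),(1,2),(3,5),(4,3),(3,2),(2,3);
""")

# COUNT(*) OVER (PARTITION BY user_id) counts the rows of the current user;
# COUNT(*) OVER () has an empty window definition, so it counts every row.
# Multiplying by 1.0 forces floating-point division.
rows = con.execute("""
    SELECT DISTINCT
           user_id,
           COUNT(*) OVER (PARTITION BY user_id) AS cnt,
           COUNT(*) OVER (PARTITION BY user_id) * 1.0 / COUNT(*) OVER () AS percentage
    FROM ratings
    ORDER BY user_id
""").fetchall()

for user_id, cnt, percentage in rows:
    print(user_id, cnt, percentage)
```

Because a window function returns a value for every input row, the DISTINCT is what collapses the seven rating rows into one row per user.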
Without using a window function or variables, you need to cross join the grouped subquery with a second subquery that sums the group counts, then select from the result to get a set you can work with.
SELECT
    B.UserID,
    B.UserCount,
    C.CountAll,
    Percentage = 1.0 * B.UserCount / C.CountAll
FROM
(
    SELECT CountAll = SUM(UserCount)
    FROM
    (
        SELECT UserCount = COUNT(*)
        FROM MyTable
        GROUP BY UserID
    ) AS A
) AS C
CROSS JOIN
(
    SELECT UserID, UserCount = COUNT(*)
    FROM MyTable
    GROUP BY UserID
) AS B

Postgres Select Query: SELECT number of appearance in two other tables

I want to select the total number of appearances of a single player in two other tables.
Here is my database structure (postgres):
Table: Player
id integer
Table: World Champions
id integer
year date
player_id integer
Table: European Champions
id integer
year date
player_id integer
The id on table player is also available in table "World Champions" and "European Champions" (player_id).
I want to select the data as following:
player.id worldChampionTitles europeanChampionTitles
1 3 4
2 1 0
3 0 0
4 1 1
But I have no idea how to write my select query for that.
Easy with subqueries:
SELECT p.id
, (SELECT count(*) FROM "World Champions" AS c WHERE c.player_id = p.id)
+ (SELECT count(*) FROM "European Champions" AS c WHERE c.player_id = p.id)
FROM Player AS p
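If the two counts should come back as separate columns, as in the table the question shows, the same correlated subqueries can each be given their own alias instead of being added together. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, the table names are written with underscores rather than spaces, and the title rows are made up so that the counts match the expected output in the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE player (id INTEGER PRIMARY KEY);
CREATE TABLE world_champions (id INTEGER PRIMARY KEY, year TEXT, player_id INTEGER);
CREATE TABLE european_champions (id INTEGER PRIMARY KEY, year TEXT, player_id INTEGER);
INSERT INTO player VALUES (1),(2),(3),(4);
-- Hypothetical title rows chosen to reproduce the question's expected counts.
INSERT INTO world_champions (player_id) VALUES (1),(1),(1),(2),(4);
INSERT INTO european_champions (player_id) VALUES (1),(1),(1),(1),(4);
""")

# One correlated subquery per output column; players without titles get 0
# because COUNT(*) over an empty set is 0, not NULL.
rows = con.execute("""
    SELECT p.id,
           (SELECT COUNT(*) FROM world_champions w
             WHERE w.player_id = p.id) AS world_titles,
           (SELECT COUNT(*) FROM european_champions e
             WHERE e.player_id = p.id) AS european_titles
    FROM player p
    ORDER BY p.id
""").fetchall()

print(rows)
```

Counting in separate subqueries also avoids the row multiplication that joining both title tables to player at once would cause.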

How to count detail rows on nested categories?

Let us consider that we have Categories (with PK as CategoryId) and Products (with PK as ProductId). Also, assume that every Category can relate to its parent category (using ParentCategoryId column in Categories).
How can I get Category wise product count? The parent category should include the count of all products of all of its sub-categories as well.
Is there an easy way to do this?
It sounds like what you are asking for would be a good use for WITH ROLLUP:
select cola, colb, SUM(colc) AS sumc
from table
group by cola, colb
with rollup
This would give a sum for colb and a rollup sum for cola. Example result below. Hope the formatting works. The null values are the rollup sums for the group.
cola colb sumc
1 a 1
1 b 4
1 NULL 5
2 c 2
2 d 3
2 NULL 5
NULL NULL 10
Give it a go and let me know if that has worked.
--EDIT
OK, I think I've got this, as it is working on the small test set I am using. I've started to see a place where I need this myself, so thanks for asking the question. I will admit this is a bit messy, but it should work for any number of levels and will only return the sum at the highest level.
I made an assumption that there is a number field in Products.
with x
as (
    select c.CategoryID, c.parentid, p.number, cast(c.CategoryID as varchar(8000)) as grp, c.CategoryID as thisid
    from Categories as c
    join Products as p on p.Categoryid = c.CategoryID
    union all
    select c.CategoryID, c.parentid, p.number, cast(c.CategoryID as varchar(8000))+'.'+x.grp, x.thisid
    from Categories as c
    join Products as p on p.Categoryid = c.CategoryID
    join x on x.parentid = c.CategoryID
)
select x.CategoryID, SUM(x.number) as Amount
from x
left join Categories a
    on (a.CategoryID = LEFT(x.grp, case when charindex('.',x.grp)-1 > 0 then charindex('.',x.grp)-1 else 0 end))
    or (a.CategoryID = x.thisid)
where a.parentid = 0
group by x.CategoryID
Assuming that Products can only point to a subcategory, here's a probable solution to the problem:
SELECT
cp.CategoryId,
ProductCount = COUNT(*)
FROM Products p
INNER JOIN Categories cc ON p.CategoryId = cc.CategoryId
INNER JOIN Categories cp ON cc.ParentCategoryId = cp.CategoryId
GROUP BY cp.CategoryId
But if the above assumption is wrong and a product can reference a parent category directly as well as a subcategory, then here's how you could count the products in this case:
SELECT
CategoryId = ISNULL(c2.CategoryId, c1.CategoryId),
ProductCount = COUNT(*)
FROM Products p
INNER JOIN Categories c1 ON p.CategoryId = c1.CategoryId
LEFT JOIN Categories c2 ON c1.ParentCategoryId = c2.CategoryId
GROUP BY ISNULL(c2.CategoryId, c1.CategoryId)
EDIT
This should work for 3 levels of hierarchy of categories (category, sub-category, sub-sub-category).
SELECT
    CategoryId = COALESCE(c3.CategoryId, c2.CategoryId, c1.CategoryId),
    ProductCount = COUNT(*)
FROM Products p
INNER JOIN Categories c1 ON p.CategoryId = c1.CategoryId
LEFT JOIN Categories c2 ON c1.ParentCategoryId = c2.CategoryId
LEFT JOIN Categories c3 ON c2.ParentCategoryId = c3.CategoryId
GROUP BY COALESCE(c3.CategoryId, c2.CategoryId, c1.CategoryId)  -- ISNULL takes only two arguments
COALESCE picks the first non-NULL argument. If the product's category is a sub-sub-category, it picks c3.CategoryId, which is its grandparent; if it is a sub-category, c3 is NULL and its parent c2.CategoryId is chosen; otherwise the category is itself top-level and c1.CategoryId is used.
In the end, the query returns only top-level categories, and the product count shown for each includes the products of its subcategories at all levels.
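For hierarchies of unknown depth, one alternative to adding a join per level is a recursive CTE that pairs every category with each of its descendants and then counts products over those pairs. The sketch below is a swapped-in technique, not what the answers above use, and rests on assumptions: it runs on SQLite through Python's sqlite3 module, top-level categories have a NULL ParentCategoryId, and the sample rows are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Categories (CategoryId INTEGER PRIMARY KEY, ParentCategoryId INTEGER);
CREATE TABLE Products (ProductId INTEGER PRIMARY KEY, CategoryId INTEGER);
-- Hypothetical hierarchy: 1 -> 2 -> 3, and 4 as a second top-level category.
INSERT INTO Categories VALUES (1,NULL),(2,1),(3,2),(4,NULL);
INSERT INTO Products VALUES (1,1),(2,2),(3,2),(4,3),(5,4);
""")

rows = con.execute("""
WITH RECURSIVE tree(ancestor_id, category_id) AS (
    -- Every category is its own ancestor, so its direct products are counted.
    SELECT CategoryId, CategoryId FROM Categories
    UNION ALL
    -- Walk one level down: children of a descendant are descendants too.
    SELECT t.ancestor_id, c.CategoryId
    FROM tree t
    JOIN Categories c ON c.ParentCategoryId = t.category_id
)
SELECT t.ancestor_id AS CategoryId, COUNT(p.ProductId) AS ProductCount
FROM tree t
LEFT JOIN Products p ON p.CategoryId = t.category_id
GROUP BY t.ancestor_id
ORDER BY t.ancestor_id
""").fetchall()

print(rows)
```

Unlike the fixed-depth joins, this returns a rolled-up count for every category, not only the top-level ones; adding a filter on the ancestor's ParentCategoryId would restrict it to the roots.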