How to count detail rows on nested categories? - tsql

Let us consider that we have Categories (with PK as CategoryId) and Products (with PK as ProductId). Also, assume that every Category can relate to its parent category (using ParentCategoryId column in Categories).
How can I get Category wise product count? The parent category should include the count of all products of all of its sub-categories as well.
Is there an easy way to do this?

It sounds like what you are asking for would be a good use for WITH ROLLUP:
select cola, colb, SUM(colc) AS sumc
from table
group by cola, colb
with rollup
This gives a sum for each colb value and a rollup subtotal for each cola value, as in the example result below. The rows with NULL values are the rollup sums for the group.
cola colb sumc
1 a 1
1 b 4
1 NULL 5
2 c 2
2 d 3
2 NULL 5
NULL NULL 10
Give it a go and let me know if that has worked.
--EDIT
OK, I think I've got this, as it is working on a small test set I am using. I've started to see a place where I need this myself, so thanks for asking the question. I will admit this is a bit messy, but it should work for any number of levels and will only return the sum at the highest level.
I made an assumption that there is a number field in Products.
with x
as (
select c.CategoryID, c.parentid, p.number, cast(c.CategoryID as varchar(8000)) as grp, c.CategoryID as thisid
from Categories as c
join Products as p on p.Categoryid = c.CategoryID
union all
select c.CategoryID, c.parentid, p.number, cast(c.CategoryID as varchar(8000))+'.'+x.grp , x.thisid
from Categories as c
join Products as p on p.Categoryid = c.CategoryID
join x on x.parentid = c.CategoryID
)
select x.CategoryID, SUM(x.number) as Amount
from x
left join Categories a on (a.CategoryID = LEFT(x.grp, case when charindex('.',x.grp)-1 > 0 then charindex('.',x.grp)-1 else 0 end))
or (a.CategoryID = x.thisid)
where a.parentid = 0
group by x.CategoryID
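
For comparison, here is a minimal sketch of a recursive CTE that counts products per category, including all descendants, at any depth. It is not the query above: it runs on SQLite through Python's sqlite3 module (an assumption made so the example is self-contained), uses the ParentCategoryId column from the question, treats NULL as "no parent", and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (CategoryId INTEGER PRIMARY KEY, ParentCategoryId INTEGER);
CREATE TABLE Products (ProductId INTEGER PRIMARY KEY, CategoryId INTEGER);
-- Hypothetical data: 1 is a root, 2 and 3 are children of 1, 4 is a child of 2.
INSERT INTO Categories VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
INSERT INTO Products VALUES (10, 1), (11, 2), (12, 4), (13, 4), (14, 3);
""")

query = """
WITH RECURSIVE tree(AncestorId, CategoryId) AS (
    -- Anchor: every category is its own ancestor.
    SELECT CategoryId, CategoryId FROM Categories
    UNION ALL
    -- Step: each child inherits the ancestors of its parent.
    SELECT t.AncestorId, c.CategoryId
    FROM tree AS t
    JOIN Categories AS c ON c.ParentCategoryId = t.CategoryId
)
SELECT t.AncestorId AS CategoryId, COUNT(p.ProductId) AS ProductCount
FROM tree AS t
LEFT JOIN Products AS p ON p.CategoryId = t.CategoryId
GROUP BY t.AncestorId
ORDER BY t.AncestorId
"""
counts = conn.execute(query).fetchall()
print(counts)
```

On SQL Server the same idea should carry over by writing WITH instead of WITH RECURSIVE; COUNT(p.ProductId) rather than COUNT(*) keeps categories without products at zero.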

Assuming that Products can only point to a subcategory, here's a probable solution to the problem:
SELECT
cp.CategoryId,
ProductCount = COUNT(*)
FROM Products p
INNER JOIN Categories cc ON p.CategoryId = cc.CategoryId
INNER JOIN Categories cp ON cc.ParentCategoryId = cp.CategoryId
GROUP BY cp.CategoryId
But if the above assumption is wrong and a product can reference a parent category directly as well as a subcategory, then here's how you could count the products in this case:
SELECT
CategoryId = ISNULL(c2.CategoryId, c1.CategoryId),
ProductCount = COUNT(*)
FROM Products p
INNER JOIN Categories c1 ON p.CategoryId = c1.CategoryId
LEFT JOIN Categories c2 ON c1.ParentCategoryId = c2.CategoryId
GROUP BY ISNULL(c2.CategoryId, c1.CategoryId)
EDIT
This should work for 3 levels of hierarchy of categories (category, sub-category, sub-sub-category).
SELECT
CategoryId = COALESCE(c3.CategoryId, c2.CategoryId, c1.CategoryId),
ProductCount = COUNT(*)
FROM Products p
INNER JOIN Categories c1 ON p.CategoryId = c1.CategoryId
LEFT JOIN Categories c2 ON c1.ParentCategoryId = c2.CategoryId
LEFT JOIN Categories c3 ON c2.ParentCategoryId = c3.CategoryId
GROUP BY COALESCE(c3.CategoryId, c2.CategoryId, c1.CategoryId)
COALESCE picks the first non-NULL argument. If the product's category is a sub-sub-category, it picks c3.CategoryId, which is its grand-parent; if it is a sub-category, its parent c2.CategoryId is chosen; otherwise the category is itself a top-level one and c1.CategoryId is used.
In the end, it selects only grand-parent categories, and shows product count for them that includes all the subcategories of all levels.
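
A small runnable check of the three-level version, as a sketch only: it uses SQLite through Python's sqlite3 module instead of SQL Server, the sample rows are hypothetical, and top-level categories are assumed to have a NULL ParentCategoryId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (CategoryId INTEGER PRIMARY KEY, ParentCategoryId INTEGER);
CREATE TABLE Products (ProductId INTEGER PRIMARY KEY, CategoryId INTEGER);
-- Hypothetical data: chain 1 -> 2 -> 3, plus a separate top-level category 4.
INSERT INTO Categories VALUES (1, NULL), (2, 1), (3, 2), (4, NULL);
INSERT INTO Products VALUES (10, 1), (11, 2), (12, 3), (13, 4);
""")

top_level = conn.execute("""
SELECT COALESCE(c3.CategoryId, c2.CategoryId, c1.CategoryId) AS CategoryId,
       COUNT(*) AS ProductCount
FROM Products p
INNER JOIN Categories c1 ON p.CategoryId = c1.CategoryId
LEFT JOIN Categories c2 ON c1.ParentCategoryId = c2.CategoryId
LEFT JOIN Categories c3 ON c2.ParentCategoryId = c3.CategoryId
GROUP BY COALESCE(c3.CategoryId, c2.CategoryId, c1.CategoryId)
ORDER BY CategoryId
""").fetchall()
print(top_level)
```

Products attached at any of the three levels under category 1 are all counted for category 1, while category 4 only counts its own product.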

Related

Postgres count total matches per group

Input data
I have the following association table:
AssociationTable
- Item ID: Integer
- Tag ID: Integer
Referring to the following example data
Item Tag
1 1
1 2
1 3
2 1
and some input list of tags T (e.g. [1, 2])
What I want
For each item, I would like to know how many of its tags were not provided in the input list T.
With our sample data, we'd get:
Item Num missing
1 1
2 0
My thoughts
The best I've done so far is: select "ItemId", count("TagId") as "Num missing" from "AssociationTab" where "TagId" not in (1, 2) group by "ItemId";
The problem here is that items where all tags match will not be included in the output.
You could use a calendar table with anti-join approach:
WITH cte AS (
SELECT t1.Item, t2.Tag
FROM (SELECT DISTINCT Item FROM AssociationTable) t1
CROSS JOIN (SELECT 1 AS Tag UNION ALL SELECT 2) t2
)
SELECT
t1.Item,
COUNT(*) FILTER (WHERE t2.Item IS NULL) AS num_missing
FROM cte t1
LEFT JOIN AssociationTable t2
ON t1.Item = t2.Item AND
t1.Tag = t2.Tag AND
t2.Tag IN (1, 2)
GROUP BY
t1.Item;
The strategy here is to build a calendar/reference table in the first CTE which contains all combinations of items and tags. Then, we left join this CTE to your association table, aggregate by item, and then detect how many tags are missing for each item.
Simplest solution is
SELECT
ItemId,
count(*) FILTER (WHERE TagId NOT IN (1,2))
FROM AssociationTab
GROUP BY ItemId
Alternatively, if you already have an Items table with the item list, you could do this:
SELECT
i.ItemId,
count(a.TagId)
FROM Items i
LEFT JOIN AssociationTab a ON a.ItemId = i.ItemId AND a.TagId NOT IN (1,2)
GROUP BY i.ItemId
The key is that LEFT JOIN does not remove the Items row if no tags match.
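
A minimal sketch of the per-item count on the sample data from the question. It runs on SQLite through Python's sqlite3 module rather than Postgres, and it uses SUM(CASE ...) in place of FILTER only so that it does not depend on the SQLite version; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AssociationTab (ItemId INTEGER, TagId INTEGER);
INSERT INTO AssociationTab VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

provided = [1, 2]  # the input list T
placeholders = ", ".join("?" for _ in provided)

# Count, per item, the tags that are not in the provided list.
# Items whose tags all match still appear, with a count of 0.
query = f"""
SELECT ItemId,
       SUM(CASE WHEN TagId NOT IN ({placeholders}) THEN 1 ELSE 0 END) AS num_missing
FROM AssociationTab
GROUP BY ItemId
ORDER BY ItemId
"""
rows = conn.execute(query, provided).fetchall()
print(rows)
```

Because the condition sits inside the aggregate instead of the WHERE clause, no item is filtered out before grouping.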

How to filter database table by a multiple join records from another one table but different types?

I have a products table and corresponding ratings table which contains a foreign key product_id, grade(int) and type which is an enum accepting values robustness and price_quality_ratio
The grades accept values from 1 to 10. So for example, how would the query look like, if I wanted to filter the products where minimum grade for robustness would be 7 and minimum grade for price_quality_ratio would be 8?
You can join twice, once per rating. The inner joins eliminate the products that fail any rating criteria:
select p.*
from products p
inner join rating r1
on r1.product_id = p.product_id
and r1.type = 'robustness'
and r1.rating >= 7
inner join rating r2
on r2.product_id = p.product_id
and r2.type = 'price_quality_ratio'
and r2.rating >= 8
Another option is to use conditional aggregation. This requires only one join, then a group by; the rating criteria are checked in the having clause.
select p.product_id, p.product_name
from products p
inner join rating r
on r.product_id = p.product_id
and r.type in ('robustness', 'price_quality_ratio')
group by p.product_id, p.product_name
having
min(case when r.type = 'robustness' then r.rating end) >= 7
and min(case when r.type = 'price_quality_ratio' then r.rating end) >= 8
The JOIN proposed by @GMB would've been my first suggestion as well. If that gets too complicated with having to maintain too many rX.rating conditions, you can also use a nested query:
SELECT *
FROM (
SELECT p.*, r1.rating as robustness, r2.rating as price_quality_ratio
FROM products p
JOIN rating r1 ON (r1.product_id = p.product_id AND r1.type = 'robustness')
JOIN rating r2 ON (r2.product_id = p.product_id AND r2.type = 'price_quality_ratio')
) AS tmp
WHERE robustness >= 7
AND price_quality_ratio >= 8
-- ORDER BY price_quality_ratio DESC, robustness DESC -- etc
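
To see how the conditional aggregation variant behaves, here is a sketch on hypothetical data, run on SQLite through Python's sqlite3 module. The table and column names (rating, rating.rating, product_name) follow the answers above, not the question's ratings/grade naming, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE rating (product_id INTEGER, type TEXT, rating INTEGER);
INSERT INTO products VALUES (1, 'drill'), (2, 'saw'), (3, 'hammer'), (4, 'wrench');
INSERT INTO rating VALUES
    (1, 'robustness', 8), (1, 'price_quality_ratio', 9),   -- passes both
    (2, 'robustness', 7), (2, 'price_quality_ratio', 6),   -- fails price/quality
    (3, 'robustness', 6), (3, 'price_quality_ratio', 10),  -- fails robustness
    (4, 'robustness', 9);                                  -- no price/quality rating
""")

matching = conn.execute("""
SELECT p.product_id, p.product_name
FROM products p
INNER JOIN rating r
        ON r.product_id = p.product_id
       AND r.type IN ('robustness', 'price_quality_ratio')
GROUP BY p.product_id, p.product_name
HAVING MIN(CASE WHEN r.type = 'robustness' THEN r.rating END) >= 7
   AND MIN(CASE WHEN r.type = 'price_quality_ratio' THEN r.rating END) >= 8
ORDER BY p.product_id
""").fetchall()
print(matching)
```

Product 4 drops out because MIN over no matching rows is NULL, and NULL >= 8 is not true, so a missing rating counts as a failure.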

How do I do LIMIT within GROUP in the same table?

I can't figure out how to do limit within group although I've read all similar questions here. Reading PSQL doc didn't help either :( Consider the following:
CREATE TABLE article_relationship
(
article_from INT NOT NULL,
article_to INT NOT NULL,
score INT
);
I want to get a list of top 5 related articles per given article IDs sorted by score.
Here is what I tried:
select DISTINCT o.article_from
from article_relationship o
join lateral (
select i.article_from, i.article_to, i.score from article_relationship i
order by score desc
limit 5
) p on p.article_from = o.article_from
where o.article_from IN (18329382, 61913904, 66538293, 66540477, 66496909)
order by o.article_from;
And it returns nothing. I was under the impression that the outer query works like a loop, so I guess I only need the source IDs there.
Also what if I want to join on articles table where there are columns id and title and get titles of related articles in resultset?
I added join in inner query:
select o.id, p.*
from articles o
join lateral (
select a.title, i.article_from, i.article_to, i.score
from article_relationship i
INNER JOIN articles a on a.id = i.article_to
where i.article_from = o.id
order by score desc
limit 5
) p on true
where o.id IN (18329382, 61913904, 66538293, 66540477, 66496909)
order by o.id;
But it made it very very slow.
The problem with no rows returning from your query is that your join condition is wrong: ON p.article_from = o.article_from; this should obviously be ON p.article_from = o.article_to.
That issue aside, your query will not return the top 5 scoring relations per article id; instead it will return the article IDs that reference one of the 5 top rated referenced articles throughout the table and (also) at least 1 of the 5 referenced articles for which you specify the id.
You can get the top 5 rated referenced articles per referencing article with a window function to rank the scores in a sub-select and then select only the top 5 in the main query. Specifying a list of referenced article IDs effectively means that you will rank how these referenced articles are scored for each referencing article:
SELECT article_from, article_to, score
FROM (
SELECT article_from, article_to, score,
rank() OVER (PARTITION BY article_from ORDER BY score DESC) AS rnk
FROM article_relationship
WHERE article_to IN (18329382, 61913904, 66538293, 66540477, 66496909) ) a
WHERE rnk < 6
ORDER BY article_from, score DESC;
This is different from your code in that it returns up to 5 records for each article_from but it is consistent with your initial description.
Adding columns from table articles is trivially done in the main query:
SELECT a.article_from, a.article_to, a.score, articles.*
FROM (
SELECT article_from, article_to, score,
rank() OVER (PARTITION BY article_from ORDER BY score DESC) AS rnk
FROM article_relationship
WHERE article_to IN (18329382, 61913904, 66538293, 66540477, 66496909) ) a
JOIN articles ON articles.id = a.article_to
WHERE a.rnk < 6
ORDER BY a.article_from, a.score DESC;
Version with join lateral
select o.id as from_id, p.article_to as to_id, a.title, a.journal_id, a.pub_date_p from articles o
join lateral (
select i.article_to from article_relationship i
where i.article_from = o.id
order by score desc
limit 5
) p on true
INNER JOIN articles a on a.id = p.article_to
where o.id IN (18329382, 61913904, 66538293, 66540477, 66496909)
order by o.id;
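
As a small illustration of the "top N per group" pattern, here is a sketch that ranks rows per article_from with row_number() and keeps the best two. It runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), filters on article_from as the question describes, and uses hypothetical IDs and scores with N = 2 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article_relationship (
    article_from INT NOT NULL,
    article_to   INT NOT NULL,
    score        INT
);
INSERT INTO article_relationship VALUES
    (1, 10, 5), (1, 11, 9), (1, 12, 7),
    (2, 10, 3), (2, 13, 8), (2, 14, 1),
    (3, 10, 4);  -- article 3 is not requested below
""")

top_related = conn.execute("""
SELECT article_from, article_to, score
FROM (
    SELECT article_from, article_to, score,
           row_number() OVER (PARTITION BY article_from ORDER BY score DESC) AS rn
    FROM article_relationship
    WHERE article_from IN (1, 2)
) AS ranked
WHERE rn <= 2
ORDER BY article_from, score DESC
""").fetchall()
print(top_related)
```

Unlike rank(), row_number() never returns more than N rows per group when scores tie; which tied row is kept is then arbitrary unless the ORDER BY adds a tie-breaker.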

How to sum items from subtable in SQL

Let's say I have table orders
id name
1 order1
2 order2
3 order3
and subtable items
id parent amount price
1 1 1 10
2 1 3 20
3 2 2 5
4 2 5 1
I would like to create a query that returns each order with an added column value, calculated from all of the order's items:
id name value
1 order1 70
2 order2 15
3 order3 0
Is this possible with TSQL?
GROUP BY and SUM would do it. You need a left join and isnull because you don't have items for all orders.
SELECT o.id, o.name, isnull(sum(i.amount*i.price),0) as value
FROM orders o
left join items i
on o.id = i.parent
group by o.id, o.name
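
The same query can be checked against the sample data from the question. This is a sketch on SQLite through Python's sqlite3 module, so COALESCE stands in for T-SQL's isnull; everything else follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items (id INTEGER PRIMARY KEY, parent INTEGER, amount INTEGER, price INTEGER);
INSERT INTO orders VALUES (1, 'order1'), (2, 'order2'), (3, 'order3');
INSERT INTO items VALUES (1, 1, 1, 10), (2, 1, 3, 20), (3, 2, 2, 5), (4, 2, 5, 1);
""")

totals = conn.execute("""
SELECT o.id, o.name, COALESCE(SUM(i.amount * i.price), 0) AS value
FROM orders o
LEFT JOIN items i ON o.id = i.parent
GROUP BY o.id, o.name
ORDER BY o.id
""").fetchall()
print(totals)
```

order3 has no items, so the LEFT JOIN yields a NULL sum that COALESCE turns into 0.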
I think you're looking for something like this
SELECT o.name, i.Value FROM orders o WITH (NOLOCK)
LEFT JOIN (SELECT parent, SUM(amount * price) AS Value FROM items WITH (NOLOCK) GROUP BY parent) i
ON o.id = i.parent
...seems like RADAR beat me to the answer.
EDIT: missing the ON line.

How to get the Customer Detail + whether he has (an) order or not

I have 2 tables. Customers and Orders.
My requirement is...
I would like to get the result like the following
Customer Detail + HasOrders + Count(Orders)
I wrote
SELECT c.*
, CASE WHEN o.CustomerID IS NOT NULL THEN 1 ELSE 0 END HasOrders
FROM Customers c
LEFT JOIN Orders o
ON c.CustomerID = o.CustomerID
But it returns many rows. If the customer has 5 orders, it returns 5 rows for each Customer.
Could you please advise me? Thanks.
You need to do the counting in a derived table.
SELECT c.*
, case when o.CustomerID is not null
then 1
else 0
end HasOrders
, o.NumberOfOrders
FROM Customers c
LEFT JOIN
(
SELECT CustomerID
, count(*) NumberOfOrders
FROM Orders
GROUP BY CustomerID
) o
ON c.CustomerID = o.CustomerID
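
A runnable sketch of the derived-table approach, using SQLite through Python's sqlite3 module. The Name and OrderID columns and the sample rows are hypothetical, and COALESCE is added here (it is not in the query above) so that customers without orders show 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Orders VALUES (100, 1), (101, 1), (102, 1);
""")

result = conn.execute("""
SELECT c.CustomerID,
       c.Name,
       CASE WHEN o.CustomerID IS NOT NULL THEN 1 ELSE 0 END AS HasOrders,
       COALESCE(o.NumberOfOrders, 0) AS NumberOfOrders
FROM Customers c
LEFT JOIN (
    -- One row per customer that has orders, so the outer join cannot multiply rows.
    SELECT CustomerID, COUNT(*) AS NumberOfOrders
    FROM Orders
    GROUP BY CustomerID
) o ON c.CustomerID = o.CustomerID
ORDER BY c.CustomerID
""").fetchall()
print(result)
```

Each customer appears exactly once because the aggregation happens before the join.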