Using Derived Tables and CTEs to Display Details? - tsql

I am teaching myself T-SQL and am struggling to comprehend the following example..
Suppose you want to display several nonaggregated columns along with
some aggregate expressions that apply to the entire result set or to a
larger grouping level. For example, you may need to display several
columns from the Sales.SalesOrderHeader table and calculate the
percent of the TotalDue for each sale compared to the TotalDue for all
the customer’s sales. If you group by CustomerID, you can’t include
other nonaggregated columns from Sales.SalesOrderHeader unless you
group by those columns. To get around this, you can use a derived
table or a CTE.
Here are two examples given...
SELECT c.CustomerID, SalesOrderID, TotalDue, AvgOfTotalDue,
TotalDue/SumOfTotalDue * 100 AS SalePercent
FROM Sales.SalesOrderHeader AS soh
INNER JOIN
(SELECT CustomerID, SUM(TotalDue) AS SumOfTotalDue,
AVG(TotalDue) AS AvgOfTotalDue
FROM Sales.SalesOrderHeader
GROUP BY CustomerID) AS c ON soh.CustomerID = c.CustomerID
ORDER BY c.CustomerID;
WITH c AS
(SELECT CustomerID, SUM(TotalDue) AS SumOfTotalDue,
AVG(TotalDue) AS AvgOfTotalDue
FROM Sales.SalesOrderHeader
GROUP BY CustomerID)
SELECT c.CustomerID, SalesOrderID, TotalDue,AvgOfTotalDue,
TotalDue/SumOfTotalDue * 100 AS SalePercent
FROM Sales.SalesOrderHeader AS soh
INNER JOIN c ON soh.CustomerID = c.CustomerID
ORDER BY c.CustomerID;
Why doesn't this query produce the same result..
SELECT CustomerID, SalesOrderID, TotalDue, AVG(TotalDue) AS AvgOfTotalDue,
TotalDue/SUM(TotalDue) * 100 AS SalePercent
FROM Sales.SalesOrderHeader
GROUP BY CustomerID, SalesOrderID, TotalDue
ORDER BY CustomerID
I'm looking for someone to explain the above examples in another way or step through it logically so I can understand how they work?

The aggregates in this statement (i.e. SUM and AVG) don't do anything:
SELECT CustomerID, SalesOrderID, TotalDue, AVG(TotalDue) AS AvgOfTotalDue,
TotalDue/SUM(TotalDue) * 100 AS SalePercent
FROM Sales.SalesOrderHeader
GROUP BY CustomerID, SalesOrderID, TotalDue
ORDER BY CustomerID
The reason for this is you're grouping by TotalDue, so all records in the same group have the same value for this field. In the case of AVG this means you're guarenteed for AvgOfTotalDue to always equal TotalDue. For SUM it's possible you'd get a different result, but as you're also grouping by SalesOrderID (which I'd imagine is unique in the SalesOrderHeader table) you will only have one record per group, so again this will always equal the TotalDue value.
With the CTE example you're only grouping by CustomerId; as a customer may have many sales orders associated with it, these aggregate values will be different to the TotalDue.
EDIT
Explanation of the aggregate of field included in group by:
When you group by a value, all rows with that same value are collected together and aggregate functions are performed over them. Say you had 5 rows with a total due of 1 and 3 with a total due of 2 you'd get two result lines; one with the 1s and one with the 2s. Now if you perform a sum on these you have 3*1 and 2*2. Now divide by the number of rows in that result line (to get the average) and you have 3*1/3 and 2*2/2; so things cancel out leaving you with 1 and 2.
select totalDue, avg(totalDue)
from (
select 1 totalDue
union all select 1 totalDue
union all select 1 totalDue
union all select 2 totalDue
union all select 2 totalDue
) x
group by totalDue
select uniqueId, totalDue, avg(totalDue), sum(totalDue)
from (
select 1 uniqueId, 1 totalDue
union all select 2 uniqueId, 1 totalDue
union all select 3 uniqueId, 1 totalDue
union all select 4 uniqueId, 2 totalDue
union all select 5 uniqueId, 2 totalDue
) x
group by uniqueId
Runnable Example: http://sqlfiddle.com/#!2/d41d8/21263

Related

How to query two tables with same schema but different time ranges in a date column

I have two tables:
main_products
old_products
They have the same info and schema with only one difference:
main_products has min(date) = 2022-01 and max(date) = 2022-05
and
old_products has min(date) = 2020-01 and max(date) = 2020-12
How can I query to get all records from old_products + all records from main_products to get products from 2020-01 to 2022-05 ?
The product on both tables has and product_id field.
I tried to join both tables on product_id but the output is a table with twice number of columns.
select t1.*, t2.* from t1
inner join t2
one t1.product_id = t2.product_id
I think you are looking for a UNION or UNION ALL:
SELECT *
FROM t1
WHERE ...
UNION ALL
SELECT *
FROM t2
WHERE ...
If the columns in t1 and t2 are the same (same number of columns and same types), this will pull the data from both of them. Use UNION if you want duplicates removed or UNION ALL to include duplicates. (In your case it won't make a functional difference since the tables don't overlap by date, but UNION ALL will be faster.)
In the above example, you can put your condition (to only get 2022-01 to 2022-05) in both WHERE conditions. If you don't like repeating the condition, you can use the UNION ALL query in a subquery with the condition outside:
SELECT *
FROM
(
SELECT *
FROM t1
UNION ALL
SELECT *
FROM t2
) sq
WHERE ...

Selecting other columns not in count, group by

So I have a table as follows
product_id sender_id timestamp ...other columns...
1 2 1222
1 2 3423
1 2 1231
2 2 890
3 4 234
2 3 234234
I want to get rows where sender_id = 2, but I want to count and group by product_id and sort by timestamp descending. This means I need the following result
product_id sender_id timestamp count ...other columns...
1 2 3423 3
2 2 890 1
I tried the following query:
SELECT product_id, sender_id, timestamp, count(product_id), ...other columns...
FROM table
WHERE sender_id = 2
GROUP BY product_id
But I get the following error Error in query: ERROR: column "table.sender_id" must appear in the GROUP BY clause or be used in an aggregate function
Seems like I cannot SELECT columns that are not in the GROUP BY. Another method which I found online was to join
SELECT product_id, sender_id, timestamp, count, ...other columns...
FROM table
JOIN (
SELECT product_id, COUNT(product_id) AS count
FROM table
GROUP BY (product_id)
) table1 ON table.product_id = table1.product_id
WHERE sender_id = 2
GROUP BY product_id
But doing this simply lists all rows without grouping or counting. My guess is that the ON part simply extends table again.
Try grouping using product_id, sender_id
select product_id, sender_id, count(product_id), max(timestamp) maxtm
from t
where sender_id = 2
group by product_id, sender_id
order by maxtm desc
If you want other columns too:
select t.*, t1.product_count
from t
inner join (
select product_id, sender_id, count(product_id) product_count, max(timestamp) maxtm
from t
where sender_id = 2
group by product_id, sender_id
) t1
on t.product_id = t1.product_id and t.sender_id = t1.sender_id and t.timestamp = t1.maxtm
order by t1.maxtm desc
Just do a workout with your data:
CREATE TABLE products (product_id INTEGER,
sender_id INTEGER,
time_stamp INTEGER)
INSERT INTO products VALUES
(1,2,1222),
(1,2,3423),
(1,2,1231),
(2,2,890),
(3,4,234),
(2,3,234234)
SELECT product_id,sender_id,string_agg(time_stamp::text,','),count(product_id)
FROM products
WHERE sender_id=2
GROUP BY product_id,sender_id
Here you have distinct time_stamp ,so you need to apply some aggregate or just remove that column in select statement.
If you remove time_stamp in select statement then it would be very easy like below :
SELECT product_id,sender_id,count(product_id)
FROM products
WHERE sender_id=2
GROUP BY product_id,sender_id

Can't solve this SQL query

I have a difficulty dealing with a SQL query. I use PostgreSQL.
The query says: Show the customers that have done at least an order that contains products from 3 different categories. The result will be 2 columns, CustomerID, and the amount of orders. I have written this code but I don't think it's correct.
select SalesOrderHeader.CustomerID,
count(SalesOrderHeader.SalesOrderID) AS amount_of_orders
from SalesOrderHeader
inner join SalesOrderDetail on
(SalesOrderHeader.SalesOrderID=SalesOrderDetail.SalesOrderID)
inner join Product on
(SalesOrderDetail.ProductID=Product.ProductID)
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
from Product
group by ProductCategoryID
having count(DISTINCT ProductCategoryID)>=3)
group by SalesOrderHeader.CustomerID;
Here are the database tables needed for the query:
where SalesOrderDetail.SalesOrderDetailID in
(select DISTINCT count(ProductCategoryID)
Is never going to give you a result as an ID (SalesOrderDetailID) will never logically match a COUNT (count(ProductCategoryID)).
This should get you the output I think you want.
SELECT soh.CustomerID, COUNT(soh.SalesOrderID) AS amount_of_orders
FROM SalesOrderHeader soh
INNER JOIN SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
INNER JOIN Product p ON sod.ProductID = p.ProductID
HAVING COUNT(DISTINCT p.ProductCategoryID) >= 3
GROUP BY soh.CustomerID
Try this :
select CustomerID,count(*) as amount_of_order from
SalesOrder join
(
select SalesOrderID,count(distinct ProductCategoryID) CategoryCount
from SalesOrderDetail JOIN Product using (ProductId)
group by 1
) CatCount using (SalesOrderId)
group by 1
having bool_or(CategoryCount>=3) -- At least on CategoryCount>=3

T-SQL End of Month sum

I have a table with some transaction fields, primary id is a CUSTomer field and a TXN_DATE and for two of them, NOM_AMOUNT and GRS_AMOUNT I need an EndOfMonth SUM (no rolling, just EOM, can be 0 if no transaction in the month) for these two amount fields. How can I do it? I need also a 0 reported for months with no transactions..
Thank you!
If you group by the expresion month(txn_date) you can calculate the sum. If you use a temporary table with a join on month you can determine which months have no records and thus report a 0 (or null if you don't use the coalesce fiunction).
This will be your end result, I assume you are able to add the other column you need to sum and adapt for your schema.
select mnt as month
, sum(coalesce(NOM_AMOUNT ,0)) as NOM_AMOUNT_EOM
, sum(coalesce(GRS_AMOUNT ,0)) as GRS_AMOUNT_EOM
from (
select 1 as mnt
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9
union all select 10
union all select 11
union all select 12) as m
left outer join Table1 as t
on m.mnt = month(txn_date)
group by mnt
Here is the initial working sqlfiddle

Merging tables in t-sql

I have a table holding periods and prices, something like this
itemid periodid periodstart periodend price
1 1 2011/01/01 2011/05/01 50.00
1 2 2011/05/02 2011/08/01 80.00
1 3 2011/08/02 2011/12/31 50.00
Now I have a second table that can hold single dates or periods
itemid periodid periodstart periodend price
1 8 2011/07/01 2011/07/17 70.00
Now, how can I do a query that would return the following result?
itemid periodid periodstart periodend price
1 1 2011/01/01 2011/05/01 50.00
1 2 2011/05/02 2011/06/30 80.00 ****
1 8 2011/07/01 2011/07/17 70.00 ***
1 2 2011/07/18 2011/08/01 80.00 ****
1 3 2011/08/02 2011/12/31 50.00
EDIT -- Highlight the fact that the merge is modifying the dates around it
How about something like
select
t1.itemid,t1.periodid,t1.periodstart, coalesce(dateadd(d,-1,t2.periodstart),t1.periodend) as periodend, t1.price
from t1
left outer join t2 on t1.periodstart < t2.periodstart and t1.periodend>t2.periodstart and t1.itemid=t2.itemid
union
select
t2.itemid,t2.periodid,t2.periodstart, t2.periodend, t2.price
from t1
inner join t2 on t1.periodstart < t2.periodstart and t1.periodend>t2.periodstart and t1.itemid=t2.itemid
union
select
t1.itemid,t1.periodid,dateAdd(d,1,t2.periodend), t1.periodend, t1.price
from t1
inner join t2 on t1.periodstart < t2.periodend and t1.periodend>t2.periodend and t1.itemid=t2.itemid
order by periodstart
Use a Union?
Select itemid, periodid,periodstart, periodend,price FROM table1
UNION
SELECT itemid, periodid,periodstart, periodend,price FROM table2
Are you trying to do some sort of join though? the result set doesn't match the two tables you supplied.
Are you accounting for entries that line up or are you just trying to combine the rows?
if the latter, you could just do a Union
Select itemid, periodid, periodstart, periodend, price
From Table1
Union
Select itemid, periodid, periodstart, periodend, price
From Table2