t-sql group by category and get top n values


Imagine I have this table:
Month | Person | Value
----------------------
Jan | P1 | 1
Jan | P2 | 2
Jan | P3 | 3
Feb | P1 | 5
Feb | P2 | 4
Feb | P3 | 3
Feb | P4 | 2
...
How can I build a T-SQL query that returns the top 2 rows by value for each month, plus a third row with the sum of the others?
Something like this:
RESULT:
Month | Person | Value
----------------------
Jan | P3 | 3
Jan | P2 | 2
Jan | Others | 1 -(sum of the bottom value - in this case (Jan, P1, 1))
Feb | P1 | 5
Feb | P2 | 4
Feb | Others | 5 -(sum of the bottom values - in this case (Feb, P3, 3) and (Feb, P4, 2))
Thanks

Assuming you are using SQL Server 2005 or later, a CTE will do the trick:
Attach a ROW_NUMBER to each row, starting with the highest value and restarting for each month.
SELECT the top 2 rows for each month from this query (RowNumber <= 2).
UNION ALL these with the remaining rows (RowNumber > 2), summed per month into a single 'Others' row.
SQL Statement
;WITH Months (Month, Person, Value) AS (
SELECT 'Jan', 'P1', 1 UNION ALL
SELECT 'Jan', 'P2', 2 UNION ALL
SELECT 'Jan', 'P3', 3 UNION ALL
SELECT 'Feb', 'P1', 5 UNION ALL
SELECT 'Feb', 'P2', 4 UNION ALL
SELECT 'Feb', 'P3', 3 UNION ALL
SELECT 'Feb', 'P4', 2
),
q AS (
SELECT Month
, Person
, Value
, RowNumber = ROW_NUMBER() OVER (PARTITION BY Month ORDER BY Value DESC)
FROM Months
)
SELECT Month
, Person
, Value
FROM (
SELECT Month
, Person
, Value
, RowNumber
FROM q
WHERE RowNumber <= 2
UNION ALL
SELECT Month
, Person = 'Others'
, SUM(Value)
, MAX(RowNumber)
FROM q
WHERE RowNumber > 2
GROUP BY
Month
) q
ORDER BY
Month DESC
, RowNumber
Kudos go to Andriy for teaching me some new tricks.

;WITH atable (Month, Person, Value) AS (
SELECT 'Jan', 'P1', 1 UNION ALL
SELECT 'Jan', 'P2', 2 UNION ALL
SELECT 'Jan', 'P3', 3 UNION ALL
SELECT 'Feb', 'P1', 5 UNION ALL
SELECT 'Feb', 'P2', 4 UNION ALL
SELECT 'Feb', 'P3', 3 UNION ALL
SELECT 'Feb', 'P4', 2
),
numbered AS (
SELECT
Month, Person, Value,
rownum = ROW_NUMBER() OVER (PARTITION BY Month ORDER BY Value DESC)
FROM atable
),
grouped AS (
SELECT
Month, Person, Value,
Grp = CASE WHEN rownum < 3 THEN rownum ELSE 3 END
FROM numbered
)
SELECT
Month,
Person = CASE Grp WHEN 3 THEN 'Others' ELSE MAX(Person) END,
Value = SUM(Value)
FROM grouped
GROUP BY Month, Grp
ORDER BY Month DESC, Grp

WITH NTable AS
(
SELECT [Month],
Person,
Value,
ROW_NUMBER() OVER (PARTITION BY [Month] ORDER BY Value DESC)
AS Rownumber
FROM MyTable
)
SELECT t.[Month],
CASE Rownumber WHEN 1 THEN t.Person WHEN 2 THEN t.Person ELSE 'Others' END As Person,
SUM(t.Value) As [Sum]
FROM NTable t
GROUP BY t.[Month], CASE Rownumber WHEN 1 THEN t.Person WHEN 2 THEN t.Person ELSE 'Others' END
ORDER BY t.[Month]
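
All three answers above rely on the same idea: number the rows within each month by descending value, keep the first two, and collapse the rest into one 'Others' row. As a rough cross-check outside the database, that logic can be sketched in plain Python. The function name `top_n_with_others` and the tuple layout are assumptions made for illustration, not part of any answer.

```python
from collections import defaultdict

def top_n_with_others(rows, n=2):
    """rows: iterable of (month, person, value).
    Returns the top-n rows per month followed by one 'Others' row."""
    by_month = defaultdict(list)
    for month, person, value in rows:
        by_month[month].append((person, value))

    result = []
    for month, people in by_month.items():
        # Plays the role of ROW_NUMBER() OVER (PARTITION BY Month ORDER BY Value DESC)
        ranked = sorted(people, key=lambda pv: pv[1], reverse=True)
        result.extend((month, person, value) for person, value in ranked[:n])
        rest = ranked[n:]
        if rest:  # like the GROUP BY, emit 'Others' only when rows remain
            result.append((month, "Others", sum(v for _, v in rest)))
    return result

rows = [
    ("Jan", "P1", 1), ("Jan", "P2", 2), ("Jan", "P3", 3),
    ("Feb", "P1", 5), ("Feb", "P2", 4), ("Feb", "P3", 3), ("Feb", "P4", 2),
]
summary = top_n_with_others(rows)
```

For the sample data, `summary` contains the six rows from the expected result in the question. Note that ties in Value are broken by input order here, whereas ROW_NUMBER breaks them arbitrarily unless the ORDER BY includes a tiebreaker.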

Related

BigQuery SQL: Group rows with shared ID that occur within 7 days of each other, and return values from most recent occurrence

I have a table of datestamped events that I need to bundle into 7-day groups, starting with the earliest occurrence of each event_id.
The final output should return each bundle's start and end date and 'value' column of the most recent event from each bundle.
There is no predetermined start date, and the '7-day' windows are arbitrary, not 'week of the year'.
I've tried a ton of examples from other posts, but none quite fit my needs, or they use things I'm not sure how to refactor for BigQuery.
Sample Data;
Event_Id | Event_Date | Value
------------------------------
1 | 2022-01-01 | 010203
1 | 2022-01-02 | 040506
1 | 2022-01-03 | 070809
1 | 2022-01-20 | 101112
1 | 2022-01-23 | 131415
2 | 2022-01-02 | 161718
2 | 2022-01-08 | 192021
3 | 2022-02-12 | 212223
Expected output;
Event_Id | Start_Date | End_Date | Value
-----------------------------------------
1 | 2022-01-01 | 2022-01-03 | 070809
1 | 2022-01-20 | 2022-01-23 | 131415
2 | 2022-01-02 | 2022-01-08 | 192021
3 | 2022-02-12 | 2022-02-12 | 212223

You might consider the approach below.
CREATE TEMP FUNCTION cumsumbin(a ARRAY<INT64>) RETURNS INT64
LANGUAGE js AS """
bin = 0;
a.reduce((c, v) => {
if (c + Number(v) > 6) { bin += 1; return 0; }
else return c += Number(v);
}, 0);
return bin;
""";
WITH sample_data AS (
select 1 event_id, DATE '2022-01-01' event_date, '010203' value union all
select 1 event_id, '2022-01-02' event_date, '040506' value union all
select 1 event_id, '2022-01-03' event_date, '070809' value union all
select 1 event_id, '2022-01-20' event_date, '101112' value union all
select 1 event_id, '2022-01-23' event_date, '131415' value union all
select 2 event_id, '2022-01-02' event_date, '161718' value union all
select 2 event_id, '2022-01-08' event_date, '192021' value union all
select 3 event_id, '2022-02-12' event_date, '212223' value
),
binning AS (
SELECT *, cumsumbin(ARRAY_AGG(diff) OVER w1) bin
FROM (
SELECT *, DATE_DIFF(event_date, LAG(event_date) OVER w0, DAY) AS diff
FROM sample_data
WINDOW w0 AS (PARTITION BY event_id ORDER BY event_date)
) WINDOW w1 AS (PARTITION BY event_id ORDER BY event_date)
)
SELECT event_id,
MIN(event_date) start_date,
ARRAY_AGG(
STRUCT(event_date AS end_date, value) ORDER BY event_date DESC LIMIT 1
)[OFFSET(0)].*
FROM binning GROUP BY event_id, bin;
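
The JavaScript UDF above is fairly dense. As I read it, it accumulates the day gaps since the start of the current bundle and opens a new bundle once that running total exceeds 6 days. The same rule can be sketched in plain Python to make the logic easier to follow. The function name `bundle_events` and the `max_gap_days` parameter are hypothetical names for this illustration, and this is a sketch of the rule, not a translation verified against BigQuery.

```python
from datetime import date

def bundle_events(events, max_gap_days=6):
    """events: iterable of (event_id, event_date, value).
    Returns (event_id, start_date, end_date, latest_value) per bundle."""
    bundles = []
    open_bundle = {}  # event_id -> index of that id's current bundle
    for event_id, event_date, value in sorted(events, key=lambda e: (e[0], e[1])):
        idx = open_bundle.get(event_id)
        if idx is not None and (event_date - bundles[idx][1]).days <= max_gap_days:
            # Still inside the window that opened at the bundle's start date:
            # extend the end date and keep the most recent value.
            bundles[idx] = (event_id, bundles[idx][1], event_date, value)
        else:
            # First event for this id, or the window has elapsed: new bundle.
            open_bundle[event_id] = len(bundles)
            bundles.append((event_id, event_date, event_date, value))
    return bundles

events = [
    (1, date(2022, 1, 1), "010203"), (1, date(2022, 1, 2), "040506"),
    (1, date(2022, 1, 3), "070809"), (1, date(2022, 1, 20), "101112"),
    (1, date(2022, 1, 23), "131415"), (2, date(2022, 1, 2), "161718"),
    (2, date(2022, 1, 8), "192021"), (3, date(2022, 2, 12), "212223"),
]
bundled = bundle_events(events)
```

On the sample data this yields the four bundles listed in the expected output. The window is anchored at each bundle's first event, so a long chain of closely spaced events is still split once it runs more than 6 days past that first event.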

Oracle SQL return value from child table with minimum row number with values in specific list

I need to select all rows from a table (main table) and join them to another table (child table). In the result set, I want to include one column from the child table: the value from the first row (lowest line number) whose column value is in a specified list. If no row matches the specified list, the column should be (null).
Desired Result:
ORDER_NO | ORDER_DATE | ORDER_CUST | ORDER_VALUE | ITEM
---------------------------------------------------------
1 | 02/14/2022 | 12345 | $1,000.00 | APPLES
2 | 02/13/2022 | 67890 | $5,000.00 | (null)
3 | 02/12/2022 | 45678 | $100.00 | PEARS
Example:
Main Table: Order Table
Order Number (Handle)
Order Date,
Order Customer,
Order Value
ORDER_NO | ORDER_DATE | ORDER_CUST | ORDER_VALUE
--------------------------------------------------
1 | 02/14/2022 | 12345 | $1,000.00
2 | 02/13/2022 | 67890 | $5,000.00
3 | 02/12/2022 | 45678 | $100.00
Child Table: Order Details Tbl
Order Number (Handle)
Line Number = Order Line No
Ordered Item,
Ordered Qty
ORDER_NO | LINE_NO | ITEM
---------------------------
1 | 10 | APPLES
1 | 20 | ORANGES
1 | 30 | LETTUCE
2 | 10 | BROCCOLI
2 | 20 | CAULIFLOWER
2 | 30 | LETTUCE
3 | 10 | KALE
3 | 20 | RADISHES
3 | 30 | PEARS
In this example, the returned column is essentially the first line of the order that is a fruit, not a vegetable. And if the order includes no matching fruit, null is returned.
What my code is thus far:
SELECT
MAIN.ORDER_NO,
MAIN.ORDER_DATE,
MAIN.ORDER_CUST,
MAIN.ORDER_VALUE,
B.ITEM
FROM
MAIN
LEFT JOIN
(
SELECT
CHILD.ORDER_NO,
CHILD.LINE_NO,
CHILD.ITEM
FROM
CHILD
WHERE
CHILD.ORDER_NO||'_'||LINE_NO IN
(
SELECT
CHILD.ORDER_NO||'_'||MIN(LINE_NO) AS ORDER_LINE_NO
FROM
CHILD
WHERE
CHILD.ITEM IN ('APPLES','ORANGES','PEACHES','PEARS','GRAPES')
GROUP BY
CHILD.ORDER_NO
)
) B ON MAIN.ORDER_NO = B.ORDER_NO
This code is of course not working as desired, as table 'B' is including all results from CHILD.

From Oracle 12, you can use:
SELECT o.*,
d.item
FROM orders o
LEFT OUTER JOIN LATERAL(
SELECT *
FROM order_details d
WHERE o.order_no = d.order_no
AND item IN ('APPLES','ORANGES','PEACHES','PEARS','GRAPES')
ORDER BY line_no ASC
FETCH FIRST ROW ONLY
) d
ON (1 = 1)
In earlier versions you can use:
SELECT o.*,
d.item
FROM orders o
LEFT OUTER JOIN(
SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY order_no ORDER BY line_no ASC)
AS rn
FROM order_details d
WHERE item IN ('APPLES','ORANGES','PEACHES','PEARS','GRAPES')
) d
ON (o.order_no = d.order_no AND rn = 1)
Which, for the sample data:
CREATE TABLE orders (ORDER_NO, ORDER_DATE, ORDER_CUST, ORDER_VALUE) AS
SELECT 1, DATE '2022-02-14', 12345, 1000.00 FROM DUAL UNION ALL
SELECT 2, DATE '2022-02-13', 67890, 5000.00 FROM DUAL UNION ALL
SELECT 3, DATE '2022-02-12', 45678, 100.00 FROM DUAL;
CREATE TABLE Order_Details (ORDER_NO, LINE_NO, ITEM) AS
SELECT 1, 10, 'APPLES' FROM DUAL UNION ALL
SELECT 1, 20, 'ORANGES' FROM DUAL UNION ALL
SELECT 1, 30, 'LETTUCE' FROM DUAL UNION ALL
SELECT 2, 10, 'BROCCOLI' FROM DUAL UNION ALL
SELECT 2, 20, 'CAULIFLOWER' FROM DUAL UNION ALL
SELECT 2, 30, 'LETTUCE' FROM DUAL UNION ALL
SELECT 3, 10, 'KALE' FROM DUAL UNION ALL
SELECT 3, 20, 'RADISHES' FROM DUAL UNION ALL
SELECT 3, 30, 'PEARS' FROM DUAL;
Both output:
ORDER_NO | ORDER_DATE | ORDER_CUST | ORDER_VALUE | ITEM
---------------------------------------------------------
1 | 2022-02-14 00:00:00 | 12345 | 1000 | APPLES
2 | 2022-02-13 00:00:00 | 67890 | 5000 | null
3 | 2022-02-12 00:00:00 | 45678 | 100 | PEARS
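
Both queries express the same rule: keep every order, and attach the item from the lowest-numbered detail line whose item is in the list, or null when no line qualifies. A minimal Python sketch of that rule follows, purely to illustrate the logic. The names `first_matching_item` and `FRUITS` are assumptions introduced here, not part of the answer.

```python
FRUITS = {"APPLES", "ORANGES", "PEACHES", "PEARS", "GRAPES"}

def first_matching_item(order_nos, details, wanted=FRUITS):
    """order_nos: list of order numbers.
    details: iterable of (order_no, line_no, item).
    Returns (order_no, item) pairs, with item None when nothing matches."""
    best = {}  # order_no -> (line_no, item) with the lowest matching line_no
    for order_no, line_no, item in details:
        if item in wanted and (order_no not in best or line_no < best[order_no][0]):
            best[order_no] = (line_no, item)
    # Every order is kept, as in the LEFT OUTER JOIN; unmatched orders get None.
    return [(o, best[o][1] if o in best else None) for o in order_nos]

details = [
    (1, 10, "APPLES"), (1, 20, "ORANGES"), (1, 30, "LETTUCE"),
    (2, 10, "BROCCOLI"), (2, 20, "CAULIFLOWER"), (2, 30, "LETTUCE"),
    (3, 10, "KALE"), (3, 20, "RADISHES"), (3, 30, "PEARS"),
]
matches = first_matching_item([1, 2, 3], details)
```

For the sample data, `matches` pairs order 1 with APPLES, order 2 with None, and order 3 with PEARS, in line with the output table above.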

Select dates missing data in a range

I have a postgres table test_table that looks like this:
date | test_hour
------------+-----------
2000-01-01 | 1
2000-01-01 | 2
2000-01-01 | 3
2000-01-02 | 1
2000-01-02 | 2
2000-01-02 | 3
2000-01-02 | 4
2000-01-03 | 1
2000-01-03 | 2
I need to select all the dates which don't have test_hour = 1, 2, and 3, so it should return
date
------------
2000-01-03
Here is what I have tried:
SELECT date FROM test_table WHERE test_hour NOT IN (SELECT generate_series(1,3));
But that only returns dates that have extra hours beyond 1, 2, 3

You can use aggregation and conditional HAVING clauses, like so:
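SELECT mydate
FROM mytable
GROUP BY mydate
HAVING
-- ELSE 0 matters: without it a missing hour gives MAX(...) = NULL, and NULL != 1 is never true
MAX(CASE WHEN test_hour = 1 THEN 1 ELSE 0 END) != 1
OR MAX(CASE WHEN test_hour = 2 THEN 1 ELSE 0 END) != 1
OR MAX(CASE WHEN test_hour = 3 THEN 1 ELSE 0 END) != 1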

Another possibility is to join the table against the series (or another subquery containing the hours) and do a [distinct] count of the hours aggregated per date:
select date from tst
inner join (select generate_series(1,3) "hour") hours on hours.hour = tst.hour
group by tst.date
having count(distinct tst.hour) < 3;
or
select date from tst
where hour in (select generate_series(1,3))
group by date
having count(distinct tst.hour) < 3;
[You don't need the distinct if the date/hour combinations in your table are unique.]

A solution using set difference, giving you exactly the rows that are missing:
(SELECT DISTINCT
date, all_hour
FROM test_table
CROSS JOIN generate_series(1,3) all_hour)
EXCEPT
(TABLE test_table)
And a solution using an array aggregate and the array contains operator:
SELECT date
FROM test_table
GROUP BY date
HAVING NOT array_agg(test_hour) @> ARRAY(SELECT generate_series(1,3))
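
The array-contains version reads naturally as a set test: collect the hours seen for each date and report the dates whose set does not cover every required hour. A small Python sketch of the same check is shown here only to illustrate that logic. The function name `dates_missing_hours` is hypothetical.

```python
from collections import defaultdict

def dates_missing_hours(rows, required=(1, 2, 3)):
    """rows: iterable of (date, test_hour).
    Returns the dates that lack at least one of the required hours."""
    hours_by_date = defaultdict(set)
    for day, hour in rows:
        hours_by_date[day].add(hour)
    needed = set(required)
    # `needed <= hours` is the subset test, the counterpart of the
    # array "contains" comparison in the HAVING clause.
    return sorted(day for day, hours in hours_by_date.items() if not needed <= hours)

rows = [
    ("2000-01-01", 1), ("2000-01-01", 2), ("2000-01-01", 3),
    ("2000-01-02", 1), ("2000-01-02", 2), ("2000-01-02", 3), ("2000-01-02", 4),
    ("2000-01-03", 1), ("2000-01-03", 2),
]
missing = dates_missing_hours(rows)
```

With the table from the question, `missing` is `['2000-01-03']`. Like the GROUP BY queries, this only considers dates that appear in the table at least once.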

Get Data Week Wise in SQL Server

I have a table with the columns ProductId, DateofPurchase, and Quantity.
I want a report that shows which week of the month each quantity belongs to.
If I pass March as the month, I can already get the total quantity for March.
What I want, when I pass a date as the parameter, is output like the one below.
Here the quantity available for March on 23/03/2018 is 100:
Material Code | Week1 | Week2 | Week3 | Week4
12475 | - | - | - | 100
The logic is: days 1-7 are the first week, 8-15 the second week, 16-23 the third week, and 24-30 the fourth week.

@Sasi, this can get you started. You will need to use a CTE to build a template table that describes what happens over the year. Then, by joining it to your table, you can link the two and use a pivot to group by week.
Let me know if you need any tweaking.
DECLARE @StartDate DATE='20180101'
DECLARE @EndDate DATE='20180901'
DECLARE @Dates TABLE(
Workdate DATE Primary Key
)
DECLARE @tbl TABLE(ProductId INT, DateofPurchase DATE, Quantity INT);
INSERT INTO @tbl
SELECT 12475, '20180623', 100
;WITH Dates AS(
SELECT Workdate=@StartDate,WorkMonth=DATENAME(MONTH,@StartDate),WorkYear=YEAR(@StartDate), WorkWeek=datename(wk, @StartDate )
UNION ALL
SELECT CurrDate=DateAdd(WEEK,1,Workdate),WorkMonth=DATENAME(MONTH,DateAdd(WEEK,1,Workdate)),YEAR(DateAdd(WEEK,1,Workdate)),datename(wk, DateAdd(WEEK,1,Workdate)) FROM Dates D WHERE Workdate<@EndDate ---AND (DATENAME(MONTH,D.Workdate))=(DATENAME(MONTH,D.Workdate))
)
SELECT *
FROM
(
SELECT
sal.ProductId,
GroupWeek='Week'+
CASE
WHEN WorkWeek BETWEEN 1 AND 7 THEN '1'
WHEN WorkWeek BETWEEN 8 AND 15 THEN '2'
WHEN WorkWeek BETWEEN 16 AND 23 THEN '3'
WHEN WorkWeek BETWEEN 24 AND 30 THEN '4'
WHEN WorkWeek BETWEEN 31 AND 37 THEN '5'
WHEN WorkWeek BETWEEN 38 AND 42 THEN '6'
END,
Quantity
FROM
Dates D
JOIN @tbl sal on
sal.DateofPurchase between D.Workdate and DateAdd(DAY,6,Workdate)
)T
PIVOT
(
SUM(Quantity) FOR GroupWeek IN (Week1, Week2, Week3, Week4, Week5, Week6, Week7, Week8, Week9, Week10, Week11, Week12, Week13, Week14, Week15, Week16, Week17, Week18, Week19, Week20, Week21, Week22, Week23, Week24, Week25, Week26, Week27, Week28, Week29, Week30, Week31, Week32, Week33, Week34, Week35, Week36, Week37, Week38, Week39, Week40, Week41, Week42, Week43, Week44, Week45, Week46, Week47, Week48, Week49, Week50, Week51, Week52
/*add as many as you need*/)
)p
--ORDER BY
--1
option (maxrecursion 0)

Sample Data :
DECLARE @Products TABLE(Id INT PRIMARY KEY,
ProductName NVARCHAR(50))
DECLARE @Orders TABLE(ProductId INT,
DateofPurchase DATETIME,
Quantity BIGINT)
INSERT INTO @Products(Id,ProductName)
VALUES(1,N'Product1'),
(2,N'Product2')
INSERT INTO @Orders( ProductId ,DateofPurchase ,Quantity)
VALUES (1,'2018-01-01',130),
(1,'2018-01-09',140),
(1,'2018-01-16',150),
(1,'2018-01-24',160),
(2,'2018-01-01',30),
(2,'2018-01-09',40),
(2,'2018-01-16',50),
(2,'2018-01-24',60)
Query :
SELECT P.Id,
P.ProductName,
Orders.MonthName,
Orders.Week1,
Orders.Week2,
Orders.Week3,
Orders.Week4
FROM @Products AS P
INNER JOIN (SELECT O.ProductId,
SUM((CASE WHEN DATEPART(DAY,O.DateofPurchase) BETWEEN 1 AND 7 THEN O.Quantity ELSE 0 END)) AS Week1,
SUM((CASE WHEN DATEPART(DAY,O.DateofPurchase) BETWEEN 8 AND 15 THEN O.Quantity ELSE 0 END)) AS Week2,
SUM((CASE WHEN DATEPART(DAY,O.DateofPurchase) BETWEEN 16 AND 23 THEN O.Quantity ELSE 0 END)) AS Week3,
SUM((CASE WHEN DATEPART(DAY,O.DateofPurchase) >= 24 THEN O.Quantity ELSE 0 END)) AS Week4,
DATENAME(MONTH,O.DateofPurchase) AS MonthName
FROM @Orders AS O
GROUP BY O.ProductId,DATENAME(MONTH,O.DateofPurchase)) AS Orders ON P.Id = Orders.ProductId
Result :
-----------------------------------------------------------------------
| Id | ProductName | MonthName | Week1 | Week2 | Week3 | Week4 |
-----------------------------------------------------------------------
| 1 | Product1 | January | 130 | 140 | 150 | 160 |
| 2 | Product2 | January | 30 | 40 | 50 | 60 |
-----------------------------------------------------------------------
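
The conditional sums in this query amount to mapping each purchase's day of the month to one of four buckets and totalling the quantity per product and month. A short Python sketch of that bucketing is given below to make the boundaries explicit. The names `week_of_month` and `weekly_totals` are hypothetical, and keying by year and month number (rather than month name, as the query does) is an assumption made to keep the sketch locale-independent.

```python
from collections import defaultdict
from datetime import date

def week_of_month(day_of_month):
    """Buckets from the question: 1-7, 8-15, 16-23, and 24 onward."""
    if day_of_month <= 7:
        return 1
    if day_of_month <= 15:
        return 2
    if day_of_month <= 23:
        return 3
    return 4

def weekly_totals(orders):
    """orders: iterable of (product_id, purchase_date, quantity).
    Returns {(product_id, year, month): [week1, week2, week3, week4]}."""
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for product_id, purchase_date, quantity in orders:
        key = (product_id, purchase_date.year, purchase_date.month)
        totals[key][week_of_month(purchase_date.day) - 1] += quantity
    return dict(totals)

orders = [
    (1, date(2018, 1, 1), 130), (1, date(2018, 1, 9), 140),
    (1, date(2018, 1, 16), 150), (1, date(2018, 1, 24), 160),
    (2, date(2018, 1, 1), 30), (2, date(2018, 1, 9), 40),
    (2, date(2018, 1, 16), 50), (2, date(2018, 1, 24), 60),
]
report = weekly_totals(orders)
```

For the sample orders, `report` holds 130, 140, 150, 160 for product 1 and 30, 40, 50, 60 for product 2, matching the result table. Under these boundaries day 23 falls in the third bucket and day 24 onward in the fourth.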

First and second time appearing row id in PostgreSQL

Suppose we have a list of ids with dates, and we want to know when each id appeared for the first and for the second time. For the first appearance, I have written this query:
SELECT year, mon, COUNT(id) AS sum_first_id
FROM (
SELECT DISTINCT
ON (id) DATE, id
FROM TABLE
GROUP BY 2, 1
) AS foo
GROUP BY 2, 1
ORDER BY 1, 2;
I think that this works. But how could I find when the ids appear for the second time?

Let's say you have the table table_x:
select *
from table_x
order by 1, 2
id | date
----+------------
1 | 2015-06-04
1 | 2015-06-05
1 | 2015-06-14
2 | 2015-06-05
2 | 2015-06-08
2 | 2015-06-10
2 | 2015-06-17
2 | 2015-06-22
(8 rows)
To select the first n rows in each group, use the row_number() window function:
select id, date
from (
select id, date, row_number() over (partition by id order by date) rn
from table_x
order by 1, 2
) sub
where rn <= 2
id | date
----+------------
1 | 2015-06-04
1 | 2015-06-05
2 | 2015-06-05
2 | 2015-06-08
(4 rows)
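
To separate the first appearance from the second, the row number itself can be kept in the output (rn = 1 versus rn = 2). A minimal Python sketch of that numbering is shown here for illustration only. The function name `first_n_dates` is hypothetical.

```python
from collections import defaultdict
from datetime import date

def first_n_dates(rows, n=2):
    """rows: iterable of (id, date).
    Returns (id, rn, date) for the first n dates of each id, where rn
    mirrors row_number() over (partition by id order by date)."""
    dates_by_id = defaultdict(list)
    for id_, day in rows:
        dates_by_id[id_].append(day)
    result = []
    for id_ in sorted(dates_by_id):
        for rn, day in enumerate(sorted(dates_by_id[id_]), start=1):
            if rn <= n:
                result.append((id_, rn, day))
    return result

rows = [
    (1, date(2015, 6, 4)), (1, date(2015, 6, 5)), (1, date(2015, 6, 14)),
    (2, date(2015, 6, 5)), (2, date(2015, 6, 8)), (2, date(2015, 6, 10)),
    (2, date(2015, 6, 17)), (2, date(2015, 6, 22)),
]
appearances = first_n_dates(rows)
second_only = [(id_, day) for id_, rn, day in appearances if rn == 2]
```

Here `second_only` contains 2015-06-05 for id 1 and 2015-06-08 for id 2. In SQL the equivalent would be filtering on rn = 2 instead of rn <= 2.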
It does not appear that your query is correct.
SELECT year, mon, COUNT(id) AS sum_first_id -- what is year, mon?
FROM (
SELECT DISTINCT
ON (id) DATE, id
FROM TABLE
GROUP BY 2, 1 -- should be order by 2, 1
) AS foo
GROUP BY 2, 1
ORDER BY 1, 2;
