How to group by records in descending order? - group-by

I have set of records that should be group by user id, and then records should show up in descending order from highest to lowest year id. Here is example of the data I have in the table:
SELECT user_id, year_id FROM tbl_users WHERE section_id = 3590 AND year_id < 30
Query output:
User_ID Year_ID
ZK0616 10
ZK0616 11
ZK0616 12
JP0204 13
JP0204 24
JP0204 25
JP0204 26
JP0204 27
JP0204 28
JP0204 29
I wrote this query in hope to get desired output. Here is the query:
SELECT user_id, max(year_id)
FROM tbl_users
WHERE section_id = 3590 AND year_id < 30
GROUP BY user_id
Query output:
User_ID Year_ID
ZK0616 12
JP0204 29
The output above still is not providing desired output. I would like to see Year_ID in descending order (from highest to lowest like this):
User_ID Year_ID
JP0204 29
ZK0616 12
I can't use ORDER BY since this query will be used as a sub query in LEFT JOIN like this:
LEFT JOIN (
SELECT user_id, max(year_id)
FROM tbl_users
WHERE section_id = 3590 AND year_id < 30
GROUP BY user_id
) tbl2
I'm wondering if there is a way to sort the records by year_id with group by. I use sybase database in my project.

Related

Using the SUM OVER clause, how to check sum over period only when output is not greater than a certain value, otherwise use current month value?

Sample data:
select date, agent, sales
from agentsales
date agent sales
2021-01-03 00:00:00.000 Agent A 10
2021-02-05 00:00:00.000 Agent A 15
2021-03-10 00:00:00.000 Agent A 10
2021-01-05 00:00:00.000 Agent B 5
2021-02-06 00:00:00.000 Agent B 28
2021-03-10 00:00:00.000 Agent B 5
2021-01-02 00:00:00.000 Agent C 35
2021-02-04 00:00:00.000 Agent C 25
2021-03-08 00:00:00.000 Agent C 15
2021-01-01 00:00:00.000 Agent D 5
2021-02-02 00:00:00.000 Agent D 35
2021-03-10 00:00:00.000 Agent D 31
I want to get the counts of agents who have crossed 30 sales, such that if they have never crossed a total of 30 sales then consider sum over current and previous months, otherwise only current month.
Expected output:
YrMon Count_Agent_more_than_30_sales
Jan21 1
Feb21 2
Mar21 2
Logic:
Jan21 - 1 since only C has crossed 30 sales
Feb21 - 2 since B and D have crossed 30 sales. Agent D has crossed the 30 mark in the month, and B has crossed over period for first time. C is not considered as it previously crossed the 30 mark.
Mar21 - 2 since A and D have crossed 30 sales. Agent A has crossed mark over period for 1st time. D has crossed for the month. B is not considered as periodic case was already considered in last month. C is not considered as it already crossed 30 mark last month.
As mentioned above, I want to get the counts of agents who have crossed 30 sales, such that if they have never crossed a total of 30 sales then consider sum over current and previous months, otherwise only current month.
My query to calculate sum over period:
;WITH CTE AS (SELECT CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR) YRMON, AGENT, SUM(SALES) SALES
FROM AgentSales
GROUP BY CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR), AGENT
)
SELECT *, SUM(SALES) OVER(PARTITION BY AGENT ORDER BY YRMON) SUMOVERPERIOD FROM CTE
ORDER BY 2,1
YRMON AGENT SALES SUMOVERPERIOD
2021 1 Agent A 10 10
2021 2 Agent A 15 25
2021 3 Agent A 10 35
2021 1 Agent B 5 5
2021 2 Agent B 28 33
2021 3 Agent B 5 38
2021 1 Agent C 35 35
2021 2 Agent C 25 60
2021 3 Agent C 15 75
2021 1 Agent D 5 5
2021 2 Agent D 35 40
2021 3 Agent D 31 71
Now I am trying to apply the logic on the calculated sum:
;WITH CTE AS (SELECT CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR) YRMON, AGENT, SUM(SALES) SALES
FROM AgentSales
GROUP BY CAST(YEAR([DATE]) AS VARCHAR)+' '+CAST(MONTH([DATE]) AS VARCHAR), AGENT
)
SELECT *, SUM(SALES) OVER(PARTITION BY AGENT ORDER BY YRMON) SUMOVERPERIOD,
CASE WHEN SUM(SALES) OVER(PARTITION BY AGENT ORDER BY YRMON)>30 THEN 1 ELSE 0 END AS CALC
FROM CTE
ORDER BY 2,1
YRMON AGENT SALES SUMOVERPERIOD CALC
2021 1 Agent A 10 10 0
2021 2 Agent A 15 25 0
2021 3 Agent A 10 35 1
2021 1 Agent B 5 5 0
2021 2 Agent B 28 33 1
2021 3 Agent B 5 38 1
2021 1 Agent C 35 35 1
2021 2 Agent C 25 60 1
2021 3 Agent C 15 75 1
2021 1 Agent D 5 5 0
2021 2 Agent D 35 40 1
2021 3 Agent D 31 71 1
This query is always considering sum over current and previous period.
How to check whether the sales has previously crossed the 30 sales mark and for such cases to exclude doing the sum over period? For example can we apply LAG on the result of the SUM OVER column?
Please check if one of these fits your needs (I think the description confusion)
Option 1
-- If you want to count only the first time [agent] crossed 30 sales
;With MyCTE01 as (
SELECT
[date] = EOMONTH([date], -1),
[agent],[sales],
S = SUM([sales]) OVER (PARTITION BY [agent] ORDER BY [date] ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW)
FROM [AgentSales]
),
MyCTE02 as (
SELECT [date],[agent],[sales], S
FROM MyCTE01
-- The idea of using "and S - [sales] < 30" instead of ROW_NUMBER came from #Charlieface, but it is better to do the work on DATE data type and not on string
WHERE S > 30 and S - [sales] < 30
)
SELECT DATENAME(month,[Date]), YEAR([Date]), COUNT(*)
FROM MyCTE02
GROUP BY [date]
GO
Option 2
-- If you want to count all the [agent] crossed 30 sales till now
;With MyCTE01 as (
SELECT
[date] = DATEADD(DAY, 1, EOMONTH([date], -1)),
[agent],[sales],
S = SUM([sales]) OVER (PARTITION BY [agent] ORDER BY [date] ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW)
FROM [AgentSales]
)
,MyCTE02 as (
SELECT [date],[agent],[sales], S
FROM MyCTE01
WHERE S > 30
)
SELECT DATENAME(month,[Date]), YEAR([Date]), COUNT(*)
FROM MyCTE02
GROUP BY [date]
GO
Option 3
-- If you want to count only the first time [agent] crossed 30 sales or when the sales or over 30
;With MyCTE01 as (
SELECT
[date] = DATEADD(DAY,1,EOMONTH([date], -1)),
[agent],[sales],
S = SUM([sales]) OVER (PARTITION BY [agent] ORDER BY [date] ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW)
FROM [AgentSales]
)
,MyCTE02 as (
SELECT [date],[agent],[sales], S
FROM MyCTE01
-- The idea of using "and S - [sales] < 30" instead of ROW_NUMBER came from #Charlieface, but it is better to do the work on DATE data type and not on string
WHERE (S > 30 and S - [sales] < 30) or sales > 30
)
SELECT DATENAME(month,[Date]), YEAR([Date]), COUNT(*)
FROM MyCTE02
GROUP BY [date]
GO
DDL+DML
USE tempdb
GO
DROP TABLE IF EXISTS [AgentSales]
GO
CREATE TABLE [AgentSales](id INT IDENTITY(1,1), [date] DATE, agent VARCHAR(100), sales INT)
GO
INSERT [AgentSales]([date],[agent],[sales]) VALUES
('2021-01-03 00:00:00.000','Agent A', 10),
('2021-02-05 00:00:00.000','Agent A', 15),
('2021-03-10 00:00:00.000','Agent A',10),
('2021-01-05 00:00:00.000','Agent B',5 ),
('2021-02-06 00:00:00.000','Agent B',28),
('2021-03-10 00:00:00.000','Agent B',5 ),
('2021-01-02 00:00:00.000','Agent C',35),
('2021-02-04 00:00:00.000','Agent C',25),
('2021-03-08 00:00:00.000','Agent C',15),
('2021-01-01 00:00:00.000','Agent D',5 ),
('2021-02-02 00:00:00.000','Agent D',35),
('2021-03-10 00:00:00.000','Agent D',31)
GO
SELECT [id],[date],[agent],[sales]
FROM [AgentSales]
GO
Looks like this should work for you
You need to pre-aggregate the sales per agent and month, then get a running sum of that aggregate
Then simply check if each row has crossed over in this month by comparing the current data with the running sum
SELECT
YrMon = FORMAT(Month, 'yyyy MM'),
Count_Agent_more_than_30_sales =
COUNT(CASE WHEN SumOverPeriod >= 30 AND SumOverPeriod - sales < 30 OR sales >= 30 THEN 1 END)
FROM (
SELECT
Month = EOMONTH(date),
agent,
sales = SUM(sales),
SumOverPeriod = SUM(SUM(sales)) OVER (PARTITION BY agent ORDER BY EOMONTH(date)
ROWS UNBOUNDED PRECEDING)
FROM AgentSales
GROUP BY EOMONTH(date), agent
) sales
GROUP BY Month;
db<>fiddle

PostgreSQL: first date cumulative score

I have this sample table:
id date score
11 1/1/2017 14:32 25.34
4 1/2/2017 12:14 34.34
25 1/2/2017 18:08 37.15
4 3/2/2017 23:42 47.24
4 4/2/2017 23:42 54.12
25 7/3/2017 22:07 65.21
11 9/3/2017 21:02 74.6
25 10/3/2017 5:15 11.3
4 10/3/2017 7:11 22.45
My aim is to calculates the first(!) date (YYYY-MM-DD) on which an id's cumulative score has reached 100 (>=). For that, I've written the following code:
SELECT date(date),id, score,
sum(score) over (partition by id order by date(date) rows unbounded preceding) as cumulative_score
FROM test_q1
GROUP BY id, date, score
Order by id, date
It returns:
date id score cumulative_score
1/1/2017 11 25.34 25.34
9/3/2017 11 74.6 99.94
1/2/2017 4 34.34 34.34
3/2/2017 4 47.24 81.58
4/2/2017 4 54.12 135.7
10/3/2017 4 22.45 158.15
1/2/2017 25 37.15 37.15
7/3/2017 25 65.21 102.36
10/3/2017 25 11.3 113.66
I tried to add either WHERE cumulative_score >= 100 or HAVING cumulative score >= 100, but it returns_
ERROR: column "cumulative_score" does not exist
LINE 4: WHERE cumulative_score >= 100
^
SQL state: 42703
Character: 206
Anyone knows how to solve this?
Thanks
What I expect is:
date id score cumulative_score
4/2/2017 4 54.12 135.7
7/3/2017 25 65.21 102.36
And the output just id and date.
Try this:
with cumulative_sum AS (
SELECT id,date,sum(score) over( partition by id order by date) as sum from test_q1
),
above_100_score_rank AS (
SELECT *, rank() over (partition by id order by sum) AS rank
FROM cumulative_sum where sum > 100
)
SELECT * FROM above_100_score_rank WHERE rank= 1;

Postgres: Nested records in a Recursive query in depth first manner

I am working on a simple comment system where a user can comment on other comments, thus creating a hierarchy. To get the comments in a hierarchical order I am using Common Table Expression in Postgres.
Below are the fields and the query used:
id
user_id
parent_comment_id
message
WITH RECURSIVE CommentCTE AS (
SELECT id, parent_comment_id, user_id
FROM comment
WHERE parent_comment_id is NULL
UNION ALL
SELECT child.id, child.parent_comment_id, child.user_id
FROM comment child
JOIN CommentCTE
ON child.parent_comment_id = CommentCTE.id
)
SELECT * FROM CommentCTE
The above query returns records in a breadth first manner:
id parent_comment_id user_id
10 null 30
9 null 30
11 9 30
14 10 31
15 10 31
12 11 30
13 12 31
But can it be modified to achieve something like below where records are returned together for that comment set, in a depth first manner? The point is to get the data in this way to make rendering on the Front-end smoother.
id parent_comment_id user_id
9 null 30
11 9 30
12 11 30
13 12 31
10 null 30
14 10 31
15 10 31
Generally I solve this problem by synthesising a "Path" column which can be sorted lexically, e.g. 0001:0003:0006:0009 is a child of 0001:0003:0006. Each child entry can be created by concatenating the path element to the parent's path. You don't have to return this column to the client, just use it for sorting.
id parent_comment_id user_id sort_key
9 null 30 0009
11 9 30 0009:0011
12 11 30 0009:0011:0012
13 12 31 0009:0011:0012:0013
10 null 30 0010
14 10 31 0010:0014
15 10 31 0010:0015
The path element doesn't have to be anything in particular provided it sorts lexically in the order you want children at that level to sort, and is unique at that level. Basing it on an auto-incrementing ID is fine.
Using a fixed length path element is not strictly speaking necessary but makes it easier to reason about.
WITH RECURSIVE CommentCTE AS (
SELECT id, parent_comment_id, user_id,
lpad(id::text, 4) sort_key
FROM comment
WHERE parent_comment_id is NULL
UNION ALL
SELECT child.id, child.parent_comment_id, child.user_id,
concat(CommentCTE.sort_key, ':', lpad(id::text, 4))
FROM comment child
JOIN CommentCTE
ON child.parent_comment_id = CommentCTE.id
)
SELECT * FROM CommentCTE order by sort_key

How do I select the min opendate from a list of duplicates?

I have 3 columns. SSN|AccountNumber|OpenDate
1 SSN may have multiple AccountNumbers
Each AccountNumber has a corresponding OpenDate
In my list I have many SSN's, each containing several account numbers which may have been opened on different days.
I want the results of my query to be SSN|earlest OpenDate|AccountNumber that corresponds with the earliest opendate.
I'm dealing with about 200,000 records.
EDIT: First I did
select SSN, min(OpenDate), AcctNumber from Table Group By SSN, AccountNumber
but that didn't quite give me the correct data.
The raw data gives me something like this:
SSN | AcctNumber | OpenDate
---------------------------
10 101 Jan
10 102 Feb
10 103 Mar
Where I got 10, Jan, and AccNumber 102 which is not the account number that is associated with Jan OpenDate After looking at others, I found that the account number I got was just one of the account numbers associated with that SSN rather than the one that corresponds with the min(OpenDate)
WITH CTE AS ( SELECT SSN, AcctNumber, OpenDate, ROW_NUM() OVER (PARTITION BY SSN ORDER BY OpenDate DESC) AS RN ) SELECT SSN, AcctNumber, OpenDate FROM CTE WHERE RN=1;
If your table is like this:
SSN | AcctNumber | OpenDate
---------------------------
10 101 April
10 101 May
10 102 April
20 201 June
20 201 July
Do you want your query to return this?
SSN | AcctNumber | OpenDate
---------------------------
10 101 April
10 102 April
20 201 June
Then you would use this query:
select ssn, min(OpenDate), acctNumber from tbl group by ssn, acctNumber
You can try this..
select SSN , AcctNumber, OpenDate
from (SELECT SSN , AcctNumber, OpenDate
, ROW_NUMBER() OVER ( PARTITION BY SSN, ORDER BY OpenDate ASC ) AS RN
FROM table) AS temp
WHERE temp.RN= 1

Subselect and Max

Alright, I've been trying to conceptualize this for a better part of the afternoon and still cannot figure out how to structure this subselect.
The data that I need to report are ages for a given student major grouped by the past 3 fiscal years. Each fiscal year has 3 semesters (summer, fall, spring). I need to have my query grouped on the fiscalyear and agerange fields and then count the distinct student id's.
I currently have this for my SQL statement:
Select COUNT(distinct StuID), AgeRange, FiscalYear
from tblStatic
where Campus like 'World%' and (enrl_act like 'REG%' or enrl_act like 'SCH%')
and StuMaj = 'LAWSC' and FiscalYear IN ('09/10', '10/11', '11/12')
group by FiscalYear, AgeRange
order by FiscalYear, AgeRange
So this is all fine and dandy except it doesn't match my headcount of students for the fiscalyear. The reason being, that people may cross over in the age ranges during the fiscal year and is adding them to my count twice.
How can I use a subselect to resolve this duplicate entry? The field I have been trying to get working is my semester field and using a max to find the max semester during a fiscalyear for a given student.
Data Sample:
Count AgeRange FiscalYear
3 1 to 19 09/10
20 20 to 23 09/10
60 24 to 29 09/10
96 30 to 39 09/10
34 40 to 49 09/10
14 50 to 59 09/10
3 60+ 09/10
2 1 to 19 10/11
24 20 to 23 10/11
73 24 to 29 10/11
109 30 to 39 10/11
43 40 to 49 10/11
11 50 to 59 10/11
2 60+ 10/11
1 1 to 19 11/12
17 20 to 23 11/12
75 24 to 29 11/12
123 30 to 39 11/12
44 40 to 49 11/12
14 50 to 59 11/12
2 60+ 11/12
Solution: (Just got this working and produced my headcounts that match what they are suppose to be)
Select COUNT(distinct S.StuID), AR.AgeRange, S.FiscalYear
from tblStatic S
INNER JOIN
( Select S.StuID, MIN(AgeRange) as AgeRange
From tblStatic S
Group By S.StuID) AR on S.StuID=AR.StuID
where Campus like 'World%' and (enrl_act like 'REG%' or
enrl_act like 'SCH%')
and StuMaj = 'LAWSC' and FiscalYear IN ('09/10', '10/11', '11/12')
group by S.FiscalYear, AR.AgeRange
order by S.FiscalYear, AR.AgeRange
Replace each student's age range with its maximum (or minimum, if you like) age range that fiscal year, then count them:
;
WITH sourceData AS (
SELECT
StudID,
MaxAgeRangeThisFiscalYear = MAX(AgeRange) OVER
(PARTITION BY StudID, FiscalYear),
FiscalYear
FROM tblStatic
WHERE Campus LIKE 'World%'
AND (enrl_act LIKE 'REG%' OR enrl_act LIKE 'SCH%')
AND StuMaj = 'LAWSC'
AND FiscalYear IN ('09/10', '10/11', '11/12')
)
SELECT
FiscalYear,
AgeRange = MaxAgeRangeThisFiscalYear,
Count = COUNT(DISTINCT StudID)
FROM sourceData
GROUP BY
FiscalYear,
MaxAgeRangeThisFiscalYear
ORDER BY
FiscalYear,
MaxAgeRangeThisFiscalYear