SQL: find records that appeared for the first time with a value above 5 within the last week - tsql

I have a table similar to this
personId Value createdTime
1991126 19.00 2018-08-05
1991126 16.00 2018-06-15
1991126 18.00 2018-08-06
1991206 32.00 2018-08-02
1991431 6.00 2018-08-06
1991431 7.00 2018-08-07
I am trying to find personIds that appeared for the first time with a value above 5 within the last week.
Here I should show 1991206 and 1991431, because 1991126 already has 16.00 on 2018-06-15.
So the personId should not have any earlier history with a value > 5, which means we have to compare against previous records.
I tried
Select distinct personId,Value,createdTime
from YourTable
where value>=5 and createdtime>= Dateadd(Day,-7,Getdate())

First find all personIds whose first record falls within the last week (HAVING min(createdTime) >= Cast(Dateadd(Day,-7,Getdate()) AS date)), and then check whether they have a value higher than 5 (max(Value) > 5):
SELECT personID
FROM YourTable
GROUP BY personID
HAVING min(createdTime) >= Cast(Dateadd(Day,-7,Getdate()) AS date) AND max(Value) > 5

You have to first restrict the records to Value > 5, then apply a ranking, and finally keep the records with rank 1 that fall within the last week. Restricting the values and applying the ranking can be done in a single query, but the criterion on the rank must be applied in an outer query. Also, the date criterion must not be applied before ranking, because that would rank only the records from the last week. I prefer to use common table expressions for nesting queries:
WITH
Greater5 (personID, Value, createdTime, rnk) AS (
SELECT personID, Value, createdTime,
RANK() OVER (PARTITION BY personID ORDER BY createdTime)
FROM YourTable
WHERE Value > 5
)
SELECT personID, Value, createdTime
FROM Greater5
WHERE rnk = 1
AND createdTime >= Dateadd(Day,-7,Getdate())
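The ranking approach can be tried out on the sample data from the question. The following is only a sketch: it runs the same CTE in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), and it replaces Getdate() with a fixed reference date of 2018-08-08, which is an assumption made so the result is reproducible.

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    (1991126, 19.00, "2018-08-05"),
    (1991126, 16.00, "2018-06-15"),
    (1991126, 18.00, "2018-08-06"),
    (1991206, 32.00, "2018-08-02"),
    (1991431, 6.00, "2018-08-06"),
    (1991431, 7.00, "2018-08-07"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (personID INTEGER, Value REAL, createdTime TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)

# Rank only the rows with Value > 5, then filter on rank and date outside the CTE.
# date(:today, '-7 days') is the SQLite stand-in for Dateadd(Day,-7,Getdate()).
QUERY = """
WITH Greater5 AS (
    SELECT personID, Value, createdTime,
           RANK() OVER (PARTITION BY personID ORDER BY createdTime) AS rnk
    FROM YourTable
    WHERE Value > 5
)
SELECT personID, Value, createdTime
FROM Greater5
WHERE rnk = 1
  AND createdTime >= date(:today, '-7 days')
ORDER BY personID
"""

# "today" is an assumed reference date, not part of the original question.
first_hits = conn.execute(QUERY, {"today": "2018-08-08"}).fetchall()
print(first_hits)
```

With that reference date, 1991126 drops out because its first row above 5 dates from 2018-06-15, which is what the question asks for.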


postgresql - cumulative sum group by type and week (multiple columns)

I want to have a cumulative sum but my condition needs to group by multiple columns
table: customer
type | week | id
A | 2022-01 | abc123
B | 2022-01 | bcd123
B | 2022-02 | efg123
A | 2022-02 | klc123
B | 2022-02 | mad123
My query now:
SELECT week, type, SUM(cnt) OVER (ORDER BY week)
FROM (SELECT week, type, COUNT(*) AS cnt
FROM customer
GROUP BY week, type) t
ORDER BY 1 ASC
and the results:
week | type | Sum
2022-01 | A | 1
2022-01 | B | 1
2022-02 | A | 1
2022-02 | B | 1
The issue is here: the last row of the result should have Sum = 2, but for some reason (I don't know why) it matches the rows above.
Is there another way to calculate the cumulative sum?
Thank you
SELECT week, type,
SUM(cnt) OVER (PARTITION BY week, type
ORDER BY week
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM (SELECT week, type, COUNT(*) AS cnt
FROM customer
GROUP BY week, type) t
ORDER BY 1 ASC
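The clause ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is optional here. With an ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which gives the same result as long as the ORDER BY values are unique within each partition.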
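To see what the PARTITION BY choice does, here is a small sketch that runs the query above on the question's data in SQLite (via Python's sqlite3, SQLite 3.25+ assumed) and adds a second column partitioned by type only. The second column is an assumption about what a running total per type might look like; it is not part of the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (type TEXT, week TEXT, id TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("A", "2022-01", "abc123"),
        ("B", "2022-01", "bcd123"),
        ("B", "2022-02", "efg123"),
        ("A", "2022-02", "klc123"),
        ("B", "2022-02", "mad123"),
    ],
)

# per_group: one row per (week, type) partition, so the sum equals the group count.
# running_per_type: hypothetical variant that accumulates across weeks within a type.
QUERY = """
SELECT week, type,
       SUM(cnt) OVER (PARTITION BY week, type ORDER BY week) AS per_group,
       SUM(cnt) OVER (PARTITION BY type ORDER BY week) AS running_per_type
FROM (SELECT week, type, COUNT(*) AS cnt
      FROM customer
      GROUP BY week, type) t
ORDER BY week, type
"""

totals = conn.execute(QUERY).fetchall()
for row in totals:
    print(row)
```

Partitioning by week and type yields 2 for the last row, as requested in the question, because each partition holds exactly one row. Partitioning by type alone accumulates across weeks instead.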

Using 'over' function results in column "table.id" must appear in the GROUP BY clause or be used in an aggregate function

I'm currently writing an application which shows the growth of the total number of events in my table over time. I currently have the following query to do this:
query = session.query(
count(Event.id).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
This results in the following output:
Count | Year | Month
100 | 2021 | 1
50 | 2021 | 2
75 | 2021 | 3
While this is okay on its own, I want it to display the total number over time, not just the number of events that month, so the desired output should be:
Count | Year | Month
100 | 2021 | 1
150 | 2021 | 2
225 | 2021 | 3
I read in various places that I should use a window function via SQLAlchemy's over function. However, I can't seem to wrap my head around it, and every time I try using it I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "event.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT count(event.id) OVER (PARTITION BY event.date ORDER...
^
[SQL: SELECT count(event.id) OVER (PARTITION BY event.date ORDER BY EXTRACT(year FROM event.date), EXTRACT(month FROM event.date)) AS count, EXTRACT(year FROM event.date) AS year, EXTRACT(month FROM event.date) AS month
FROM event
WHERE event.date IS NOT NULL GROUP BY year, month]
This is the query I used:
session.query(
count(Event.id).over(
order_by=(
extract('year', Event.date),
extract('month', Event.date)
),
partition_by=Event.date
).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
Could someone show me what I'm doing wrong? I've been searching for hours but can't figure out how to get the desired output as adding event.id in the group by would stop my rows from getting grouped by month and year
The final query I ended up using:
query = session.query(
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month'),
func.sum(func.count(Event.id)).over(order_by=(
extract('year', Event.date),
extract('month', Event.date)
)).label('count'),
).filter(
Event.date.isnot(None)
).group_by('year', 'month')
I'm not 100% sure what you want, but I'm assuming you want the number of events up to that month for each month. You first need to calculate the number of events per month and then sum them with a PostgreSQL window function.
You can do that within a single SELECT statement:
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, SUM(COUNT(events.id)) OVER(ORDER BY extract(year FROM events.date), extract(month FROM events.date)) AS total_so_far
FROM events
GROUP BY 1,2
but it might be easier to think about if you split it into two:
SELECT year, month, SUM(events_count) OVER(ORDER BY year, month)
FROM (
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, COUNT(events.id) AS events_count
FROM events
GROUP BY 1,2
) AS monthly
but I'm not sure how to do that in SQLAlchemy.
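The two-step version can be checked quickly without a PostgreSQL server. This sketch runs it in SQLite through Python's sqlite3 module (SQLite 3.25+ assumed); strftime stands in for PostgreSQL's extract, and the handful of event dates are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, date TEXT)")

# Made-up sample: 2 events in January, 1 in February, 3 in March, 1 without a date.
dates = ["2021-01-05", "2021-01-20", "2021-02-11",
         "2021-03-02", "2021-03-15", "2021-03-30", None]
conn.executemany("INSERT INTO events (date) VALUES (?)", [(d,) for d in dates])

# Inner query: events per month. Outer query: running total over those counts.
QUERY = """
SELECT year, month,
       SUM(events_count) OVER (ORDER BY year, month) AS total_so_far
FROM (
    SELECT CAST(strftime('%Y', date) AS INTEGER) AS year,
           CAST(strftime('%m', date) AS INTEGER) AS month,
           COUNT(id) AS events_count
    FROM events
    WHERE date IS NOT NULL
    GROUP BY year, month
) AS monthly
ORDER BY year, month
"""

running = conn.execute(QUERY).fetchall()
print(running)
```

Each row carries the count of all events up to and including that month, which is the shape of the desired output in the question.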

How do I calculate cumulative sum for last 7 rows on a specific date in Postgresql?

I have a table that has these columns: user_id, day, valueA, valueB.
I'd like to calculate the running sum of last 7 rows of valueA and valueB for each user that has data on a specific day, for example '2020-08-01'.
(Note: Users only have a row when their valueA and valueB is not zero so there are some dates not in the table.)
I tried this query:
select user_id, day,
sum(valueA) over(partition by user_id rows between 7 preceding and current row) as last_7_A,
sum(valueB) over(partition by user_id rows between 7 preceding and current row) as last_7_B
from table where day='2020-08-01'
But this query doesn't calculate the running sum; it just returns valueA and valueB for 2020-08-01.
I could calculate it for every day and then select the date I want, but that would be really inefficient. Any ideas how to add the date constraint so that it calculates only the one row per user with the sum over the last 7 rows?
As per question:
sum of last 7 rows for each user for a particular date, this might work
select user_id, sum(valueA) "sum of valueA", sum(valueB) "sum of valueB"
from sample_table
where id in (
select id
from sample_table
where day='2020-08-08'
order by id desc limit 7)
group by user_id;
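A different technique from the query above, shown here only as a sketch: compute the window sum in a subquery and apply the date filter in the outer query, so the WHERE clause no longer removes the preceding rows before the window sees them. The example runs in SQLite via Python's sqlite3 (SQLite 3.25+ assumed). The table name daily and the sample rows are hypothetical, and "last 7 rows" is read as the current row plus the 6 before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (user_id INTEGER, day TEXT, valueA INTEGER, valueB INTEGER)")

# User 1: nine rows with gaps in the dates; valueA = 1..9, valueB = 10..90.
user1_days = ["2020-07-22", "2020-07-23", "2020-07-25", "2020-07-26", "2020-07-27",
              "2020-07-28", "2020-07-30", "2020-07-31", "2020-08-01"]
rows = [(1, d, i, 10 * i) for i, d in enumerate(user1_days, start=1)]
# User 2: two rows. User 3: no row on the target day, so it must not appear.
rows += [(2, "2020-07-30", 5, 50), (2, "2020-08-01", 6, 60), (3, "2020-07-31", 9, 90)]
conn.executemany("INSERT INTO daily VALUES (?, ?, ?, ?)", rows)

# The window is evaluated over all rows up to the target day;
# only afterwards does the outer query keep the target day itself.
QUERY = """
SELECT user_id, day, last_7_A, last_7_B
FROM (
    SELECT user_id, day,
           SUM(valueA) OVER (PARTITION BY user_id ORDER BY day
                             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS last_7_A,
           SUM(valueB) OVER (PARTITION BY user_id ORDER BY day
                             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS last_7_B
    FROM daily
    WHERE day <= :target
) AS windowed
WHERE day = :target
ORDER BY user_id
"""

sums = conn.execute(QUERY, {"target": "2020-08-01"}).fetchall()
print(sums)
```

Note that the frame in the question, 7 PRECEDING AND CURRENT ROW, spans eight rows, and that a window without ORDER BY has no defined row order to count back from.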

Calculate past 3 month average for every past 3rd month

I am using SQL Server 2014. I have a table like this
create table revenue (id varchar(2), trasdate date, revenue int);
insert into revenue(id, trasdate, revenue)
values ('aa', '2018/09/01', 1234.5),
('aa' , '2018/08/04', 450),
('aa', '2018/07/03',500),
('aa', '2018/06/04',600),
('ab', '2018/09/01', 1234.5),
('ab' , '2018/08/04', 450),
('ab', '2018/07/03',500),
('ab', '2018/06/04',600),
('ab', '2018/05/03', 200),
('ab', '2018/04/02', 150),
('ab', '2018/03/01', 350),
('ab', '2018/02/05', 700),
('aa', '2018/01/07', 400)
;
I am preparing a SQL query to create an SSRS report. I want to calculate a past 3-month average for the current and every past 3rd month, with a result like the one below. As we are in September right now, the result should show something like this:
**id Period Revenue_3Mon**
aa March-May 233
aa June-Aug 516
ab March-May 233
ab June-Aug 516
Although I can figure out the Period column, I was mainly focusing on getting Revenue_3Mon. So I initially tried the query below after some googling. But this query throws the error "Incorrect syntax near 'rows'", and if I remove rows from the query, it throws "Incorrect syntax near the keyword 'between'" and "Incorrect syntax near i".
select i.id,i.mon,
avg([i.mon_revenue]) over (partition by i.id, i.mon order by [i.id],
[i.mon] rows between 3 preceding and 1 preceding row) as revenue_3mon --
-- using 3 preceding and 1 preceding row you exclude the current row
from (select a.id, month(a.trasdate) as mon,
sum(a.revenue) as mon_revenue
from revenue a
group by a.id, month(a.trasdate)) i
group by i.id, i.mon
order by i.id,i.mon;
After a few attempts, I gave up on this query and came up with a new solution which was a bit closer to my expectation (after lots of trial and error).
Declare @count as int;
declare @max as int;
set @count = 4
declare @temp as table (id varchar(2), monthoftrasdate int, revenue int,
[3monavg] int);
SET @MAX = (SELECT distinct MAX(a.ROWNUM) FROM (SELECT id, month(trasdate)
as mon, SUM(revenue) TotalRevenue,
-- sum(revenue) as mon_revenue,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY MONTH(TRASDATE)) AS ROWNUM
FROM revenue
GROUP BY ID, MONTH(TRASDATE)
) A GROUP BY A.ID);
while (@count <= @max )
begin
WITH CTE AS (
SELECT id, month(trasdate) as mon, SUM(revenue) TotalRevenue,
-- sum(revenue) as mon_revenue,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY MONTH(TRASDATE)) AS
ROWNUM
FROM revenue
GROUP BY ID, MONTH(TRASDATE)
)
insert into @temp
SELECT A.ID,A.MON, a.TotalRevenue
,( SELECT avg(b.TotalRevenue) as avgrev
FROM CTE B
WHERE B.ROWNUM BETWEEN A.ROWNUM-3 AND A.ROWNUM-1
AND A.ID = B.ID --AND A.mon = B.mon
--and b.ROWNUM < a.ROWNUM
and (a.mon > 3 and a.ROWNUM > 3)
GROUP BY B.id
) AS REVENUE_3MON
FROM CTE A
set @count = @count + 1
end
select distinct a.* from @temp a
The reason I had to use 'distinct' is because the query was showing duplicate records for every id and every month. So far the result shows like below
id MonthofTrasdate Revenue 3MonAvg
aa 1 400 NULL
aa 2 700 NULL
aa 3 350 NULL
aa 4 150 483
aa 5 200 400
aa 6 600 233
aa 7 500 316
aa 8 450 433
aa 9 1234 516
ab 1 400 NULL
ab 2 700 NULL
ab 3 350 NULL
ab 4 150 483
ab 5 200 400
ab 6 600 233
ab 7 500 316
ab 8 450 433
ab 9 1234 516
This pulls out the past 3-month average for every month. I will just manipulate the rest in SSRS the way I want it.
Currently my table has no data for the previous year, so this works for me and shows the appropriate result for the next couple of months. But my concern is that when I have to show my boss the report next year for Jan, Feb and March, it should also be able to pull periods such as Oct-Dec (previous year), Nov-Jan and Dec-Feb. I am struggling to figure out the proper way to put this in my query.
Can you please help me out with this query? And also let me know what is wrong with my former query.
Problems with your first attempt:
You enclosed some of the aliases and column names in square brackets like [i.mon_revenue]. There is no need for square brackets, but if you want to use them, you have to break them up at the dot: [i].[mon_revenue].
In your window function expression, there is one row keyword too many (at the end): the frame should end with 1 PRECEDING, not 1 PRECEDING ROW.
Window functions are applied at the very end (after the rest of the respective query), so you also have to include i.mon_revenue in your GROUP BY clause of the outer query.
Knowing that the inner query will produce one row per id and mon, there will never be preceding rows in an id-mon partition. Therefore, you must not partition by both, but only by id.
To simplify the query after resolving the issues: ordering by a partition column generally makes no sense, and since, as already mentioned, the inner query returns unique id-mon combinations, you don't have to group by these in the outer query. Looking at that query, we see that the outer query just directly selects and uses the values from the inner query, which makes a separation into two queries unnecessary. So, in fact, you wanted to perform the following query, which will produce the rolling 3-month average (I added the monthly TotalRevenue as well):
SELECT id, MONTH(trasdate) AS mon, SUM(revenue) AS TotalRevenue,
AVG(SUM(revenue)) OVER (PARTITION BY id ORDER BY MONTH(trasdate) ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS revenue_3mon
FROM revenue
GROUP BY id, MONTH(trasdate)
ORDER BY id, MONTH(trasdate);
Suggestions on your second attempt:
When calculating the @MAX value, you rely on the fact that each id has revenues for the same number of months. Are you sure?
The code inside the WHILE loop does not depend on @count, so it will add the same data into the @temp table multiple times, which is probably the reason why you thought you needed a DISTINCT. Therefore: no need for the variables, no need for a loop and a @temp, no need for DISTINCT.
The conditions A.mon > 3 and A.rownum > 3 are redundant with your current data. In general, I guess, you don't want to explicitly exclude the months from January to March, so A.mon > 3 should be removed. A.rownum > 3 could be removed, too, unless you really don't want to see a 3-month average when there are only 2 preceding months or fewer.
As the subquery for the average is restricted to only one id, there's no need for a GROUP BY.
Since the ROW_NUMBER function doesn't care about gaps in the months, I suggest to use a different numbering function, for example DATEDIFF(month, MAX(trasdate), GETDATE()) AS mnum. Of course, the comparison in the WHERE clause of the subquery then has to be changed to B.mnum BETWEEN A.mnum+1 AND A.mnum+3.
So, your second attempt can be reduced to this, which will produce the same result as the above, at least with your sample data, where no gaps in the months exist:
WITH CTE AS (
SELECT id, MONTH(trasdate) AS mon, SUM(revenue) AS TotalRevenue,
DATEDIFF(month, MAX(trasdate), GETDATE()) AS mnum
FROM revenue
GROUP BY id, MONTH(trasdate)
)
SELECT id, mon, TotalRevenue
, (SELECT AVG(B.TotalRevenue)
FROM CTE B
WHERE B.mnum BETWEEN A.mnum+1 AND A.mnum+3
AND A.id = B.id
) AS revenue_3mon
FROM CTE A
ORDER BY id, mnum DESC;
Now, guess what, an expression like my mnum using DATEDIFF increases by one every month as we move to the past, regardless of a change of years, so this might be useful for grouping as well, whether you want to (or can?) use Window functions or not:
With OVER()
SELECT id, MONTH(MIN(trasdate)) AS mon, YEAR(MIN(trasdate)) AS yr, SUM(revenue) AS TotalRevenue,
AVG(SUM(revenue)) OVER (PARTITION BY id ORDER BY MIN(trasdate) ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS revenue_3mon
FROM revenue
GROUP BY id, DATEDIFF(month, trasdate, GETDATE())
ORDER BY id, DATEDIFF(month, trasdate, GETDATE()) DESC;
Without OVER()
WITH CTE AS (
SELECT id, MIN(trasdate) AS min_dt, SUM(revenue) AS TotalRevenue,
DATEDIFF(month, trasdate, GETDATE()) AS mnum
FROM revenue
GROUP BY id, DATEDIFF(month, trasdate, GETDATE())
)
SELECT id, MONTH(min_dt) AS mon, YEAR(min_dt) AS yr, TotalRevenue
, (SELECT AVG(B.TotalRevenue)
FROM CTE B
WHERE B.mnum BETWEEN A.mnum+1 AND A.mnum+3
AND A.id = B.id
) AS revenue_3mon
FROM CTE A
ORDER BY id, mnum DESC;
Both queries allow for retrieving the minimum and maximum date for each period (including month and year).
If you instead wanted what you originally posted under The result should show something like this (just grouping by previous 3-months intervals), you just would have to group your original revenue table by id and (DATEDIFF(month, trasdate, GETDATE())-1)/3 (filtering WHERE DATEDIFF(month, trasdate, GETDATE()) > 0). If so, this kind of grouping and aggregation could, of course, be done also by the Report Server.
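The 3-month interval grouping described in the last paragraph can be sketched as follows. This is only an illustration: it runs in SQLite via Python's sqlite3, the month difference is spelled out with strftime because SQLite has no DATEDIFF, GETDATE() is replaced by an assumed reference date of 2018-09-15, and AVG(revenue) equals the average of the monthly totals only because the sample has one row per id and month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (id TEXT, trasdate TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?, ?)",
    [
        ("aa", "2018-09-01", 1234.5), ("aa", "2018-08-04", 450),
        ("aa", "2018-07-03", 500), ("aa", "2018-06-04", 600),
        ("ab", "2018-09-01", 1234.5), ("ab", "2018-08-04", 450),
        ("ab", "2018-07-03", 500), ("ab", "2018-06-04", 600),
        ("ab", "2018-05-03", 200), ("ab", "2018-04-02", 150),
        ("ab", "2018-03-01", 350), ("ab", "2018-02-05", 700),
        ("aa", "2018-01-07", 400),
    ],
)

# mdiff plays the role of DATEDIFF(month, trasdate, GETDATE()).
# (mdiff - 1) / 3 is integer division: months 1-3 back -> 0, 4-6 back -> 1, ...
QUERY = """
SELECT id, (mdiff - 1) / 3 AS bucket,
       MIN(trasdate) AS first_day, MAX(trasdate) AS last_day,
       AVG(revenue) AS revenue_3mon
FROM (
    SELECT id, trasdate, revenue,
           (CAST(strftime('%Y', :today) AS INTEGER) - CAST(strftime('%Y', trasdate) AS INTEGER)) * 12
         + (CAST(strftime('%m', :today) AS INTEGER) - CAST(strftime('%m', trasdate) AS INTEGER)) AS mdiff
    FROM revenue
) AS numbered
WHERE mdiff > 0          -- skip the current month
GROUP BY id, bucket
ORDER BY id, bucket
"""

# "today" is an assumed reference date standing in for GETDATE().
periods = [
    (pid, bucket, first_day, last_day, round(avg, 2))
    for pid, bucket, first_day, last_day, avg in conn.execute(QUERY, {"today": "2018-09-15"})
]
for p in periods:
    print(p)
```

Because the bucket is derived from a month difference rather than from MONTH(trasdate), periods that cross a year boundary (for example Oct-Dec of the previous year) fall into consecutive buckets without special handling.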
I think this should do what you want:
select r.*,
avg(r.mon_revenue) over (partition by r.id
order by r.mon_min
rows between 3 preceding and 1 preceding
) as revenue_3mon
-- using 3 preceding and 1 preceding row you exclude the current row
from (select r.id, month(r.trasdate) as mon,
min(r.trasdate) as mon_min,
sum(r.revenue) as mon_revenue
from revenue r
group by r.id, year(r.trasdate), month(r.trasdate)
) r
order by r.id, r.mon, r.mon_min;
Notes:
I fixed the code so it recognizes years as well as dates.
The expression [i.mon_revenue] is not a valid column reference (in your case). You have no column with the name "i.mon_revenue" (with the . in the name).
I changed the column alias to r to match the table.
I added a date column for each month to make it easier to express the ordering.
The outer group by is not necessary.
There are several syntax errors in your code. This should give you what you need. The inner query is the important bit but hopefully this will be enough to get you on your way.
I swapped out the table for a table variable and changed the revenue column from INT to FLOAT, as you have decimal values in there, but other than that your original sample table is unchanged.
DECLARE @revenue table (id varchar(2), trasdate date, revenue float)
insert into @revenue(id, trasdate, revenue)
values ('aa', '2018/09/01', 1234.5),
('aa' , '2018/08/04', 450),
('aa', '2018/07/03',500),
('aa', '2018/06/04',600),
('ab', '2018/09/01', 1234.5),
('ab' , '2018/08/04', 450),
('ab', '2018/07/03',500),
('ab', '2018/06/04',600),
('ab', '2018/05/03', 200),
('ab', '2018/04/02', 150),
('ab', '2018/03/01', 350),
('ab', '2018/02/05', 700),
('aa', '2018/01/07', 400)
SELECT
*
FROM
(
SELECT
*
, MONTH(trasdate) as MonthNumber
, AVG(revenue) OVER (PARTITION BY id
ORDER BY
id
, MONTH(trasdate) ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) as ThreeMonthAvg
FROM @revenue
) a
WHERE MONTH(GETDATE()) - MonthNumber IN (0, 3, 6, 9)
This gives the following results
aa 2018-06-04 600 6 400
aa 2018-09-01 1234.5 9 516.666666666667
ab 2018-03-01 350 3 700
ab 2018-06-04 600 6 233.333333333333
ab 2018-09-01 1234.5 9 516.666666666667

Count data per day in a specific month in Postgresql

I have a table with a creation date called created_at and a deletion date called deleted_at for each record. If the record was deleted, the field stores that date; it's a logical delete.
I need to count the active records in a specific month. To understand what is an active record for me, let's see an example:
For this example we'll use this hypothetical record:
id | created_at | deleted_at
1 | 23-01-2014 | 05-06-2014
This record is active on every day between its creation date and its deletion date, including the latter. So if I need to count the active records for March, this record must be counted on every day of that month.
I have a query (really easy to write) that shows the active records for a specific month, but my main problem is how to count the active records for each day in that month.
SELECT
date_trunc('day', created_at) AS dia_creacion,
date_trunc('day', deleted_at) AS dia_eliminacion
FROM
myTable
WHERE
created_at < TO_DATE('01-04-2014', 'DD-MM-YYYY')
AND (deleted_at IS NULL OR deleted_at >= TO_DATE('01-03-2014', 'DD-MM-YYYY'))
Here you are:
select
TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i,
count( case (TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i) between created_at and coalesce(deleted_at, TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i)
when true then 1
else null
end)
from generate_series(0, TO_DATE('01-04-2014', 'DD-MM-YYYY') - TO_DATE('01-03-2014', 'DD-MM-YYYY') - 1) as g(i)
left join myTable on true
group by 1
order by 1;
You can add a more specific condition to join only the relevant records from myTable, but even without it this gives you an idea of how to achieve the desired counting.
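The same idea (generate one row per day, then join the records that are active on that day) can be sketched without PostgreSQL. The example below uses SQLite via Python's sqlite3, where a recursive CTE stands in for generate_series. The sample records other than id 1 are made up, and the join condition is moved into the ON clause, which is one possible form of the "more specific condition" mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, created_at TEXT, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        (1, "2014-01-23", "2014-06-05"),  # record from the question: active all of March
        (2, "2014-03-10", "2014-03-12"),  # hypothetical: active on three days
        (3, "2014-03-30", None),          # hypothetical: never deleted
        (4, "2014-04-02", None),          # hypothetical: created after March
    ],
)

# days: one row per calendar day from :first_day to :last_day (inclusive).
# A record counts for a day if created_at <= day <= deleted_at,
# where a missing deleted_at means "still active".
QUERY = """
WITH RECURSIVE days(day) AS (
    SELECT :first_day
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < :last_day
)
SELECT days.day, COUNT(myTable.id) AS active
FROM days
LEFT JOIN myTable
       ON days.day BETWEEN myTable.created_at
                       AND COALESCE(myTable.deleted_at, days.day)
GROUP BY days.day
ORDER BY days.day
"""

per_day = dict(
    conn.execute(QUERY, {"first_day": "2014-03-01", "last_day": "2014-03-31"}).fetchall()
)
print(per_day["2014-03-11"], per_day["2014-03-31"])
```

The LEFT JOIN keeps days on which nothing is active, so they show up with a count of 0 instead of disappearing from the result.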