Monthly count of objects with start and end date using T-SQL

I would like to create a bar chart displaying the number of objects that were available each month. All rows have a start and end date. I know how to do the count for a single month:
SELECT COUNT(*) As NumberOfItems
FROM Items
WHERE DATEPART(MONTH, Items.StartDate) <= @monthNumber
AND DATEPART(MONTH, Items.EndDate) >= @monthNumber
Now I would like to write the SQL that returns the month number and the number of items in a single SELECT statement.
Is there any elegant way of accomplishing this? I am aware I have to take the year number into account.

Assuming SQL Server 2005 or newer.
The CTE returns month numbers (months elapsed since 1900-01-01) spanning the years between @startDate and @endDate. The main body joins those month numbers with Items, performing the same conversion on Items.StartDate and Items.EndDate, and groups by month number so that each calendar month gets one row.
; with months (month) as (
select datediff (m, 0, @startDate)
union all
select month + 1
from months
where month < datediff (m, 0, @endDate)
)
select year (dateadd (m, months.month, 0)) Year,
month (dateadd (m, months.month, 0)) Month,
count (*) NumberOfItems
from months
inner join Items
on datediff (m, 0, Items.StartDate) <= months.month
and datediff (m, 0, Items.EndDate) >= months.month
group by months.month
order by months.month
Note: if you intend to span more than 100 months, you will need to add option (maxrecursion 0) at the end of the query.
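To sanity-check the month-number arithmetic outside the database, here is a minimal Python sketch of the same idea. It assumes each item is a (start_date, end_date) pair; `month_index`, `index_to_year_month` and `monthly_counts` are hypothetical helper names, with `month_index` mimicking T-SQL's `datediff (m, 0, d)`.

```python
from datetime import date

def month_index(d: date) -> int:
    # Months elapsed since 1900-01-01, like T-SQL datediff(m, 0, d)
    return (d.year - 1900) * 12 + (d.month - 1)

def index_to_year_month(m: int) -> tuple:
    # Inverse of month_index, like year()/month() of dateadd(m, m, 0)
    return 1900 + m // 12, m % 12 + 1

def monthly_counts(items, start: date, end: date) -> list:
    """Count items whose [start, end] span touches each month."""
    result = []
    # Equivalent of the recursive "months" CTE
    for m in range(month_index(start), month_index(end) + 1):
        n = sum(1 for s, e in items
                if month_index(s) <= m <= month_index(e))
        if n:  # an INNER JOIN drops months with no matching items
            year, month = index_to_year_month(m)
            result.append((year, month, n))
    return result

items = [
    (date(2012, 11, 5), date(2013, 1, 20)),
    (date(2012, 12, 1), date(2012, 12, 31)),
    (date(2013, 1, 15), date(2013, 3, 2)),
]
print(monthly_counts(items, date(2012, 11, 1), date(2013, 4, 30)))
```

This loop is quadratic and only meant to show the logic; the database does the same work through the join.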

Related

How to get financial year wise periods for a given date range

My financial year runs from 01-Jul to 30-Jun every year.
I want to find all financial-year periods within a given date range.
Let's say the date range is From_Date: 16-Jun-2021, To_Date: 31-Aug-2022. Then my output should be:
Start_Date, End_Date
16-Jun-2021, 30-Jun-2021
01-Jul-2021, 30-Jun-2022
01-Jul-2022, 31-Aug-2022
Please help me with the query. The first record's Start_Date must equal From_Date, and the last record's End_Date must equal To_Date.
This should work for the current century.
with t(fys, fye) as
(
select (y + interval '6 months')::date,
(y + interval '1 year 6 months - 1 day')::date
from generate_series ('2000-01-01'::date, '2100-01-01', interval '1 year') y
),
periods (period_start, period_end) as
(
select
case when fys < '16-Jun-2021'::date then '16-Jun-2021'::date else fys end,
case when fye > '31-Aug-2022'::date then '31-Aug-2022'::date else fye end
from t
)
-- '<=' so that a one-day period (e.g. a range starting on 30-Jun) is kept
select * from periods where period_start <= period_end;
period_start | period_end
2021-06-16   | 2021-06-30
2021-07-01   | 2022-06-30
2022-07-01   | 2022-08-31
It also works well as a parameterized query, with '16-Jun-2021' and '31-Aug-2022' replaced by parameter placeholders.
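For checking the expected rows, here is a plain Python sketch of the same clamp-to-range logic, assuming a fiscal year that starts on 1 July; `fiscal_periods` is a hypothetical helper, not part of either answer.

```python
from datetime import date, timedelta

def fiscal_periods(from_date: date, to_date: date, start_month: int = 7) -> list:
    """Split [from_date, to_date] at fiscal-year boundaries.

    Assumes each fiscal year starts on day 1 of start_month (July here).
    """
    if from_date > to_date:
        return []
    periods = []
    # Calendar year in which the fiscal year containing from_date starts
    fy_year = from_date.year if from_date.month >= start_month else from_date.year - 1
    while True:
        fys = date(fy_year, start_month, 1)
        fye = date(fy_year + 1, start_month, 1) - timedelta(days=1)
        if fys > to_date:
            break
        # Clamp the fiscal year to the requested range (the CASE expressions)
        periods.append((max(fys, from_date), min(fye, to_date)))
        fy_year += 1
    return periods

for start, end in fiscal_periods(date(2021, 6, 16), date(2022, 8, 31)):
    print(start, end)
```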
You want to create multiple records from one record (your date range), so you will need some kind of helper table.
In this example I create that helper table with GENERATE_SERIES and join it to your date range, with some logic to get the dates you want.
--Generate a range of fiscal years
WITH FISCAL_YEARS AS (
SELECT
CONCAT(SEQUENCE.YEAR, '-07-01')::DATE AS FISCAL_START,
CONCAT(SEQUENCE.YEAR + 1, '-06-30')::DATE AS FISCAL_END
FROM GENERATE_SERIES(2000, 2030) AS SEQUENCE (YEAR)
),
--Your date range
DATE_RANGE AS (
SELECT
'2021-06-16'::DATE AS RANGE_START,
'2022-08-31'::DATE AS RANGE_END
)
SELECT
--Case statement in case the range_start is later
--than the start of the fiscal year
CASE
WHEN RANGE_START > FISCAL_START
THEN RANGE_START
ELSE FISCAL_START
END AS START_DATE,
--Case statement in case the range_end is earlier
--than the end of the fiscal year
CASE
WHEN RANGE_END < FISCAL_END
THEN RANGE_END
ELSE FISCAL_END
END AS END_DATE
FROM FISCAL_YEARS
JOIN DATE_RANGE
--Join to get all fiscal years that overlap the range
--(this also catches a range lying entirely inside one fiscal year)
ON FISCAL_YEARS.FISCAL_START <= DATE_RANGE.RANGE_END
AND FISCAL_YEARS.FISCAL_END >= DATE_RANGE.RANGE_START
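The helper-table idea can be tried without a PostgreSQL server. This Python sketch uses the standard-library sqlite3 module, substituting a recursive CTE for GENERATE_SERIES and SQLite's multi-argument max()/min() for the CASE expressions. Its join keeps every fiscal year that overlaps the range (fiscal_start <= range_end AND fiscal_end >= range_start), which also covers a range lying entirely inside one fiscal year. `fiscal_periods_join` is a hypothetical name.

```python
import sqlite3

QUERY = """
WITH RECURSIVE fiscal_years(fiscal_start) AS (
    SELECT '2000-07-01'
    UNION ALL
    SELECT date(fiscal_start, '+1 year')
    FROM fiscal_years
    WHERE fiscal_start < '2030-07-01'
),
fy AS (
    SELECT fiscal_start,
           date(fiscal_start, '+1 year', '-1 day') AS fiscal_end
    FROM fiscal_years
)
SELECT max(fiscal_start, :range_start) AS start_date,
       min(fiscal_end, :range_end) AS end_date
FROM fy
WHERE fiscal_start <= :range_end   -- fiscal year starts before the range ends
  AND fiscal_end >= :range_start   -- and ends after the range starts
ORDER BY fiscal_start
"""

def fiscal_periods_join(range_start: str, range_end: str) -> list:
    # ISO 'YYYY-MM-DD' strings compare correctly as text
    conn = sqlite3.connect(":memory:")
    try:
        params = {"range_start": range_start, "range_end": range_end}
        return conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()

print(fiscal_periods_join("2021-06-16", "2022-08-31"))
```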

Using 'over' function results in column "table.id" must appear in the GROUP BY clause or be used in an aggregate function

I'm currently writing an application that shows the growth of the total number of events in my table over time. I currently have the following query to do this:
query = session.query(
count(Event.id).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
This results in the following output:
Count | Year | Month
100   | 2021 | 1
50    | 2021 | 2
75    | 2021 | 3
While this is okay on its own, I want it to display the running total over time rather than just the number of events in each month, so the desired output should be:
Count | Year | Month
100   | 2021 | 1
150   | 2021 | 2
225   | 2021 | 3
I read in various places that I should use a window function via SQLAlchemy's over() function; however, I can't seem to wrap my head around it, and every time I try using it I get the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError) column "event.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT count(event.id) OVER (PARTITION BY event.date ORDER...
^
[SQL: SELECT count(event.id) OVER (PARTITION BY event.date ORDER BY EXTRACT(year FROM event.date), EXTRACT(month FROM event.date)) AS count, EXTRACT(year FROM event.date) AS year, EXTRACT(month FROM event.date) AS month
FROM event
WHERE event.date IS NOT NULL GROUP BY year, month]
This is the query I used:
session.query(
count(Event.id).over(
order_by=(
extract('year', Event.date),
extract('month', Event.date)
),
partition_by=Event.date
).label('count'),
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month')
).filter(
Event.date.isnot(None)
).group_by('year', 'month').all()
Could someone show me what I'm doing wrong? I've been searching for hours but can't figure out how to get the desired output, since adding event.id to the GROUP BY would stop my rows from being grouped by month and year.
The final query I ended up using:
query = session.query(
extract('year', Event.date).label('year'),
extract('month', Event.date).label('month'),
func.sum(func.count(Event.id)).over(order_by=(
extract('year', Event.date),
extract('month', Event.date)
)).label('count'),
).filter(
Event.date.isnot(None)
).group_by('year', 'month')
I'm not 100% sure what you want, but I'm assuming you want, for each month, the number of events up to and including that month. You first need to calculate the number of events per month and then sum those counts with a PostgreSQL window function.
You can do that in a single SELECT statement:
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, SUM(COUNT(events.id)) OVER(ORDER BY extract(year FROM events.date), extract(month FROM events.date)) AS total_so_far
FROM events
GROUP BY 1,2
but it might be easier to reason about if you split it into two steps:
SELECT year, month, SUM(events_count) OVER(ORDER BY year, month)
FROM (
SELECT extract(year FROM events.date) AS year
, extract(month FROM events.date) AS month
, COUNT(events.id) AS events_count
FROM events
GROUP BY 1,2
) AS monthly_counts
but I'm not sure how to do that in SQLAlchemy.
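The nested aggregate SUM(COUNT(...)) OVER (...) can be tried locally with Python's sqlite3 module (this needs SQLite 3.25 or newer, the release that added window functions). The table layout and the `monthly_running_totals` helper are assumptions that mirror the question's event table and sample counts; strftime('%Y-%m', date) stands in for the two extract() calls.

```python
import sqlite3

def monthly_running_totals(event_dates) -> list:
    """Return (year_month, events_that_month, running_total) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, date TEXT)")
        conn.executemany("INSERT INTO event (date) VALUES (?)",
                         [(d,) for d in event_dates])
        return conn.execute("""
            SELECT strftime('%Y-%m', date) AS year_month,
                   COUNT(id) AS events_count,
                   -- window function applied to the per-month aggregate
                   SUM(COUNT(id)) OVER (ORDER BY strftime('%Y-%m', date)) AS total_so_far
            FROM event
            WHERE date IS NOT NULL
            GROUP BY strftime('%Y-%m', date)
            ORDER BY year_month
        """).fetchall()
    finally:
        conn.close()

# 100 events in January, 50 in February, 75 in March, as in the question
dates = ["2021-01-15"] * 100 + ["2021-02-15"] * 50 + ["2021-03-15"] * 75
print(monthly_running_totals(dates))
```

The GROUP BY runs first, so COUNT(id) is already one value per month when the window SUM walks the months in order; that is why id no longer needs to appear in the GROUP BY.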

calculating last 5 years from current date in hive

I need to calculate a count for a given time frame:
the dates between the current date and 5 years ago.
select count(*) from table where (year(current_date) -year('2015-12-01')) < 5 ;
The query above gives counts for the last 5 years, but it only considers the year part. I need exact counts that take days into account, so if I write
select count(*) from table where datediff(current_date,final_dt) <= 1825 ;
it won't account for any leap years in the last 5 years.
So is there any function in Hive to calculate the exact difference between two dates that handles scenarios like leap years?
Use the add_months function (assuming the dates should go back to 2013-05-25 when the current date is 2018-05-25).
select count(*)
from table
where final_dt >= add_months(current_date,-60) and final_dt <= current_date
I think you are trying to count all records between current_date and the date 5 years before current_date. In that case, you can do something like this (note that BETWEEN needs the earlier date first):
SELECT count(*) FROM table_1 WHERE date_column BETWEEN to_date(CONCAT(YEAR(current_date) - 5, '-', MONTH(current_date), '-', DAY(current_date))) AND current_date;
Also, SELECT datediff(current_date(), to_date(CONCAT(YEAR(current_date) - 5, '-', MONTH(current_date), '-', DAY(current_date))));
gives you 1826 (because 2016 is a leap year).
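A minimal Python sketch of why a calendar-month window handles leap years where a fixed 1825-day window does not. `add_months` here is a hypothetical re-implementation that follows the month-end rule described in the Hive documentation for add_months; it is not Hive's code, and `count_last_5_years` is likewise illustrative.

```python
import calendar
from datetime import date, timedelta

def add_months(d: date, n: int) -> date:
    """Calendar-month arithmetic with the month-end rule documented for
    Hive's add_months: if d is the last day of its month, or the target
    month is shorter than d.day, return the target month's last day."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + n, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    is_month_end = d.day == calendar.monthrange(d.year, d.month)[1]
    day = last_day if is_month_end or d.day > last_day else d.day
    return date(year, month, day)

def count_last_5_years(dates, today: date) -> int:
    # Same predicate as: final_dt >= add_months(current_date, -60)
    #                    and final_dt <= current_date
    lower = add_months(today, -60)
    return sum(1 for d in dates if lower <= d <= today)

today = date(2018, 5, 25)
sample = [date(2013, 5, 24), date(2013, 5, 25), date(2016, 2, 29),
          date(2018, 5, 25), date(2018, 5, 26)]
print(add_months(today, -60))             # 2013-05-25
print(today - timedelta(days=1825))       # 2013-05-26: misses 2013-05-25
print(count_last_5_years(sample, today))  # 3
```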

Counting by Week in Hive

I'm trying to produce a fully refreshed set of numbers each week, pulling from a table in Hive. Right now I'm using this method:
SELECT
COUNT(DISTINCT case when timestamp between TO_DATE("2016-01-28") and TO_DATE("2016-01-30") then userid end) as week_1,
COUNT(DISTINCT case when timestamp between TO_DATE("2016-01-28") and TO_DATE("2016-02-06") then userid end) as week_2
FROM Data;
I'm trying to get something more along the lines of:
SELECT
Month(timestamp), Week(timestamp), COUNT (DISTINCT userid)
FROM Data
Group By Month, Week
But my weeks run from Sunday to Saturday. Is there a smarter way to do this that works in Hive?
Solution found:
You can simply create your own formula instead of using a predefined "week of the year" function. The advantage is that you can define a week as any run of 7 consecutive days.
In your case, since you want weeks to run from Sunday to Saturday, you just need the date of the first Sunday of the year.
For example, in 2016 the first Sunday is 2016-01-03 (this assumes the timestamp column is in 'yyyy-mm-dd' format):
SELECT
count(distinct UserId), floor(datediff(timestamp,'2016-01-03') / 7) + 1 as week_of_the_year
FROM table.data
where timestamp>='2016-01-03'
group by floor(datediff(timestamp,'2016-01-03') / 7) + 1;
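The same week arithmetic, sketched in Python under the assumption that each row is an (event_date, user_id) pair; `first_sunday`, `sunday_week_number` and `distinct_users_per_week` are hypothetical helpers. Python's floor division `//` plays the role of floor(... / 7).

```python
from collections import defaultdict
from datetime import date, timedelta

def first_sunday(year: int) -> date:
    jan1 = date(year, 1, 1)
    # date.weekday(): Monday == 0 ... Sunday == 6
    return jan1 + timedelta(days=(6 - jan1.weekday()) % 7)

def sunday_week_number(d: date, anchor: date) -> int:
    # floor(datediff(d, anchor) / 7) + 1, where anchor is a Sunday
    return (d - anchor).days // 7 + 1

def distinct_users_per_week(rows, year: int) -> dict:
    """Return {week_number: number of distinct users} for Sunday-based weeks."""
    anchor = first_sunday(year)
    users = defaultdict(set)
    for d, user_id in rows:
        if d >= anchor:  # mirrors WHERE timestamp >= '2016-01-03'
            users[sunday_week_number(d, anchor)].add(user_id)
    return {week: len(ids) for week, ids in sorted(users.items())}

rows = [(date(2016, 1, 2), "a"), (date(2016, 1, 3), "a"), (date(2016, 1, 5), "a"),
        (date(2016, 1, 9), "b"), (date(2016, 1, 10), "a")]
print(distinct_users_per_week(rows, 2016))  # {1: 2, 2: 1}
```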
I see that you need the data grouped by week. You can just do this:
SELECT weekofyear(to_date(timestamp)), COUNT (DISTINCT userid) FROM Data Group By weekofyear(to_date(timestamp))

Get First and Last Day of Any Year

I'm currently trying to get the first and last day of any year. I have data going back to 1950, and for each year I want the first and last dates that actually appear in the dataset (note that the last date of a year might not be December 31st, and the first might not be January 1st).
Initially I thought I could use a CTE and call DATEPART with the day of the year selection, but this wouldn't partition appropriately. I also tried a CTE self-join, but since the last day or first day of the year might be different, this also yields inaccurate results.
For instance, the query below actually returns some MIN dates in the MAX column and vice versa, though in theory it should only grab the MAX date and the MIN date for each year:
;WITH CT AS(
SELECT Points
, Date
, DATEPART(DY,Date) DA
FROM Table
WHERE DATEPART(DY,Date) BETWEEN 363 AND 366
OR DATEPART(DY,Date) BETWEEN 1 AND 3
)
SELECT MIN(c.Date) MinYear
, MAX(c.Date) MaxYear
FROM CT c
GROUP BY YEAR(c.Date)
You want something like this for the first day of the year:
dateadd(year, datediff(year,0, c.Date), 0)
and this for the last day of the year:
--first day of next year -1
dateadd(day, -1, dateadd(year, datediff(year,0, c.Date) + 1, 0))
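To see why the dateadd/datediff idiom works, here is a Python sketch that spells out the same arithmetic: in SQL Server the integer 0 converts to 1900-01-01, so datediff(year, 0, d) is the number of year boundaries since 1900, and adding that many years back to day 0 lands on 1 January of d's year. The function names are illustrative only.

```python
from datetime import date, timedelta

BASE_YEAR = 1900  # SQL Server's day 0 is 1900-01-01

def first_day_of_year(d: date) -> date:
    # dateadd(year, datediff(year, 0, d), 0)
    years_since_base = d.year - BASE_YEAR
    return date(BASE_YEAR + years_since_base, 1, 1)

def last_day_of_year(d: date) -> date:
    # dateadd(day, -1, dateadd(year, datediff(year, 0, d) + 1, 0))
    first_of_next = date(BASE_YEAR + (d.year - BASE_YEAR) + 1, 1, 1)
    return first_of_next - timedelta(days=1)

print(first_day_of_year(date(2016, 7, 4)), last_day_of_year(date(2016, 7, 4)))
```

Subtracting one day from the first day of the next year avoids any special-casing of month lengths or leap years.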
Try this for getting the first day of the year, the last day of the year, and the first day of the next year:
SELECT
DATEADD(yy, DATEDIFF(yy,0,getdate()), 0) AS Start_Of_Year,
dateadd(yy, datediff(yy,-1, getdate()), -1) AS Last_Day_Of_Year,
DATEADD(yy, DATEDIFF(yy,0,getdate()) + 1, 0) AS FirstOf_the_NextYear
Putting this into your query:
;WITH CT AS(
SELECT Points
, Date
, DATEPART(DY,Date) DA
FROM Table
WHERE DATEPART(DY,Date) BETWEEN
DATEPART(day,DATEADD(yy, DATEDIFF(yy,0,getdate()), 0)) AND
DATEPART(day,dateadd(yy, datediff(yy,-1, getdate()), -1))
)
SELECT MIN(c.Date) MinYear
, MAX(c.Date) MaxYear
FROM CT c
GROUP BY YEAR(c.Date)
I should refrain from developing in the evenings because I solved it, and it's actually quite simple:
SELECT MIN(Date)
, MAX(Date)
FROM Table
GROUP BY YEAR(Date)
I can put these values into a CTE and then JOIN on the dates and get what I need:
;WITH CT AS(
SELECT MIN(Date) Mi
, MAX(Date) Ma
FROM Table
GROUP BY YEAR(Date)
)
SELECT c.Mi
, m.Points
, c.Ma
, f.Points
FROM CT c
INNER JOIN Table m ON c.Mi = m.Date
INNER JOIN Table f ON c.Ma = f.Date
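A runnable sketch of the MIN/MAX-then-join approach using Python's sqlite3 module. The table name `readings` and the sample rows are hypothetical (Table itself is a reserved word), and strftime('%Y', Date) replaces T-SQL's YEAR(Date). If several rows share the first or last date of a year, the joins return one row per match.

```python
import sqlite3

def first_and_last_points(rows) -> list:
    """rows: (date 'YYYY-MM-DD', points). Returns (Mi, points, Ma, points) per year."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (Date TEXT, Points INTEGER)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        return conn.execute("""
            WITH CT AS (
                SELECT MIN(Date) AS Mi, MAX(Date) AS Ma
                FROM readings
                GROUP BY strftime('%Y', Date)  -- YEAR(Date) in T-SQL
            )
            SELECT c.Mi, m.Points, c.Ma, f.Points
            FROM CT c
            INNER JOIN readings m ON c.Mi = m.Date
            INNER JOIN readings f ON c.Ma = f.Date
            ORDER BY c.Mi
        """).fetchall()
    finally:
        conn.close()

sample_rows = [("1950-01-03", 10), ("1950-06-01", 12), ("1950-12-29", 15),
               ("1951-01-02", 20), ("1951-12-30", 25)]
print(first_and_last_points(sample_rows))
```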