Truncate date by month with custom start day in postgres - postgresql

I am getting some statistics using a query like
SELECT date_trunc('month', created_at) AS time, count(DISTINCT "user_id") AS mau
FROM "session"
GROUP BY time
ORDER BY time;
This works fine if I want monthly active users for each calendar month. But I would like to shift the result to show the last X months counting back from today instead of actual calendar months. How do I do that? Can I add an offset in some way?
EDIT
As an example, I am currently getting results like
time | mau
2022-04-01 | 10
2022-05-01 | 20
2022-06-01 | 30
But I would like it to be something like (where 2022-06-07 is today)
time | mau
2022-04-07 | 10
2022-05-07 | 20
2022-06-07 | 30
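The bucketing the question describes can be sketched in Python to make the intended boundaries concrete. This is only an illustration under assumptions: the helper name `period_start` is hypothetical, each bucket is labelled by its first day, and the anchor day is assumed to be 28 or lower so that it exists in every month.

```python
from datetime import date

def period_start(d: date, anchor_day: int) -> date:
    """Start of the month-long bucket containing d, where buckets begin on anchor_day.

    Assumes anchor_day <= 28 so the day exists in every month.
    """
    if d.day >= anchor_day:
        return date(d.year, d.month, anchor_day)
    # Before the anchor day: the bucket began in the previous month.
    if d.month == 1:
        return date(d.year - 1, 12, anchor_day)
    return date(d.year, d.month - 1, anchor_day)

today = date(2022, 6, 7)
for d in (date(2022, 6, 7), date(2022, 6, 6), date(2022, 1, 3)):
    print(d, "->", period_start(d, today.day))
```

In PostgreSQL the same shift could probably be written as date_trunc('month', created_at - interval '6 days') + interval '6 days' for an anchor day of 7, which is one way to "add an offset" to the original query.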

Related

How to group data weekly for MTD and YTD values

I'm trying to get weekly MTD and YTD values based on hourly data, but I'm having difficulties achieving this.
This is the data I'm working with:
max(Date) - Last day of the week
ISOWeek - Week in question
Value - The data I'm trying to sum
SELECT MAX(ISOWeek) AS [ISOWeek]
,MAX(Date) AS [Date]
,SUM(Value1) AS [MTD]
FROM Table1
GROUP BY ISOWeek, FORMAT(Date,'yyMM')
ORDER BY ISOWeek DESC
This is what that query returns:
ISOWeek Date MTD
29 2020-07-19 367529
28 2020-07-12 367138
27 2020-06-30 103290
27 2020-07-05 266755
26 2020-06-28 346588
25 2020-06-21 337168
This is what I would like to get:
ISOWeek Date MTD
29 2020-07-19 261515
28 2020-07-12 184104
27 2020-07-05 103414
26 2020-06-28 432114
25 2020-06-21 346588
The data has to be grouped by ISOWeek, if it's a week that dips into two months, I'm only interested in the MTD of the month in which the week ends. We have hundreds of values, so the plan is to create a MTD view and a YTD view. If I can get some help with the MTD one, I can get the other one done.
I'm nearly sure that what I'm after has to do with a WHERE clause and DATEADD but I'm not too sure what it should say.
Thank you for taking the time.
I don't really follow the rules you would like to apply, but for each date you can apply the formulas below to get the week start, month end, or whatever you need. Substitute your date column for the current date in the example.
Then group by the modified date.
You could also build a date dimension table where the required dates are stored in columns (first day of month, first day of week, etc.). This way you get a table with all the dates and the matching result for each.
It might be easier/faster to join to it on the required column.
declare @monthstart date,
@monthend date,
@weekstart date
;
-- First and last day of the current month
select @monthstart=datefromparts(year(current_timestamp),month(current_timestamp),1);
select @monthend=EOMONTH(getdate(),0);
select @monthstart,@monthend,EOMONTH(getdate(),1) as next_month, EOMONTH(getdate(),-1) as previous_month;
-- Start of the current week, counted from Sunday or from Monday (assumes the default DATEFIRST 7)
select cast(DATEADD(d,1-DATEPART(WEEKDAY,current_timestamp),CURRENT_TIMESTAMP) as date) as Sunday,
cast(DATEADD(d,2-case when DATEPART(WEEKDAY,current_timestamp)=1 then 8 else DATEPART(WEEKDAY,current_timestamp) end,CURRENT_TIMESTAMP) as date) as Monday
;
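For readers who want to check the date arithmetic outside SQL Server, here is a rough Python equivalent of the same formulas. The function names are hypothetical and chosen for illustration only; the logic mirrors the month start, month end, and Sunday/Monday week start computed above.

```python
import calendar
from datetime import date, timedelta

def month_start(d: date) -> date:
    return d.replace(day=1)

def month_end(d: date) -> date:
    # monthrange returns (weekday of the 1st, number of days in the month)
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

def week_start_sunday(d: date) -> date:
    # date.weekday(): Monday=0 ... Sunday=6, so a Sunday maps to an offset of 0
    return d - timedelta(days=(d.weekday() + 1) % 7)

def week_start_monday(d: date) -> date:
    return d - timedelta(days=d.weekday())

d = date(2020, 7, 1)  # a Wednesday in ISO week 27
print(month_start(d), month_end(d), week_start_sunday(d), week_start_monday(d))
```

Grouping by one of these derived dates instead of the raw date is the "group by the modified date" step the answer refers to.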

ksql - Aggregating data based on last 1 year ( 365 days)

I would like to build sql query to have aggregation done for last 1 year. In this case, window size is last 365 days. Is it possible to do it using ksql ? Like the following query
SELECT regionid, regioncity, COUNT(*) FROM pageviews
WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS)
GROUP BY regionid, regioncity ;
My question is: when we specify 'SIZE 30 seconds' or 'SIZE 3600 seconds', will the window start from the beginning of the stream or from the latest message?
The start date and time of windows is based on the Unix epoch. It has no bearing on the timestamp of the messages on the source topic.
Windows greater than a day will start at the Unix epoch and increment from there. Because there's no YEAR size and you're using 365 days, the window start slips back one day for every leap year since 1970, which is why you'll find that the window starts on 20th December.
SELECT TIMESTAMPTOSTRING(WINDOWSTART(),'yyyy-MM-dd HH:mm:ss','Europe/London') AS WINDOW_START_TS,
CUSTOMER,
SUM(COST)
FROM SOURCE_DATA
WINDOW TUMBLING (SIZE 365 DAYS)
GROUP BY CUSTOMER
EMIT CHANGES ;
+-----------------------+----------+------------+
|WINDOW_START_TS |CUSTOMER |KSQL_COL_2 |
+-----------------------+----------+------------+
|2018-12-20 00:00:00 |A |4 |
|2019-12-20 00:00:00 |A |2 |
I've written this up here:
https://rmoff.net/2020/01/09/exploring-ksqldb-window-start-time/
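The 20th December start can be reproduced with plain arithmetic. The sketch below assumes, as the answer states, that windows are aligned to the Unix epoch in UTC; `window_start` is a hypothetical helper for illustration and not part of ksqlDB.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WINDOW = timedelta(days=365)

def window_start(ts: datetime) -> datetime:
    """Start of the epoch-aligned 365-day window that contains ts."""
    n = (ts - EPOCH) // WINDOW  # number of whole windows elapsed since the epoch
    return EPOCH + n * WINDOW

print(window_start(datetime(2019, 6, 1, tzinfo=timezone.utc)))
print(window_start(datetime(2020, 1, 1, tzinfo=timezone.utc)))
```

The two calls land on 2018-12-20 and 2019-12-20, matching the WINDOW_START_TS values in the output above.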

How to calculate average weekly hours between 2 dates covering multiple weeks?

Postgresql 8.4.
I'm new to this concept so if people could teach me I'd appreciate it.
For Obamacare, anyone that works 30 hours per week or more must be offered the same healthcare as is offered to any other worker. We can't afford that so we have to limit work hours for temp and part-timers. This is affecting the whole country.
I need to calculate the hours worked (doesn't matter if overtime, regular time, double time, etc) between two dates, say Jan 1, 2014, and Nov 1, 2014 (Saturday) for each custom week (which begins on Sunday), not the week as defined by Postgresql (which begins on Monday).
Each of my custom work weeks begins on Sunday and ends on Saturday.
I don't know if I have to include weeks where they did not work at all in the average, but let's assume I do. Zero hours that week would draw down the average.
Table name is 'employeetime', date field is 'employeetime.stopdate', hours worked per day is in the field 'employeetime.hours', employeeid field is 'employeetime.empid'.
I'd prefer to do this in one query per employee and I will execute the query once per employee as I loop through employees. If not I'm open to suggestions. But I'd like to understand the SQL presented in the answer.
Currently EXTRACT(week from '2014-01-01') calculates the start of the week as a Monday, so that doesn't work for me.
How would I do that without doing, say a separate query for each week, per person? We have 200 people to process.
Thank you.
I have set up a table to match your format:
select * from employeetime order by date;
id date hours
1 2014-11-06 10
1 2014-11-07 3
1 2014-11-08 5
1 2014-11-09 3
1 2014-11-10 5
You can get the week starting on Sunday by shifting. Note, here the 9th is a Sunday, so that is where we want the boundary.
select *, extract(week from date + '1 day'::interval) as week
from employeetime
order by week;
id date hours week
1 2014-11-07 3 45
1 2014-11-06 10 45
1 2014-11-08 5 45
1 2014-11-09 3 46
1 2014-11-10 5 46
And now the week shifts on Sunday rather than Monday. From here, the query to get hours by week/employee would be simple:
select id, sum(hours) as hours, extract(week from date + '1 day'::interval) as week
from employeetime
group by id, week
order by id, week;
id hours week
1 18 45
1 8 46
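The shift-by-one-day trick can be checked in Python with the same sample rows. This is a sketch under assumptions: `sunday_week` is a hypothetical helper, and the week key also includes the ISO year so that weeks from different years do not collide.

```python
from collections import defaultdict
from datetime import date, timedelta

# (employee id, date, hours), the sample rows from the answer
rows = [
    (1, date(2014, 11, 6), 10),
    (1, date(2014, 11, 7), 3),
    (1, date(2014, 11, 8), 5),
    (1, date(2014, 11, 9), 3),
    (1, date(2014, 11, 10), 5),
]

def sunday_week(d: date) -> tuple:
    """(ISO year, ISO week) of d + 1 day, so the week rolls over on Sunday."""
    iso = (d + timedelta(days=1)).isocalendar()
    return (iso[0], iso[1])

totals = defaultdict(int)
for emp, day, hours in rows:
    totals[(emp, sunday_week(day))] += hours

weekly = [h for (emp, _), h in totals.items() if emp == 1]
average = sum(weekly) / len(weekly)
print(dict(totals), average)
```

Weeks with no rows do not appear in `totals`, so to count them as zero hours the sum would have to be divided by the number of weeks in the date range rather than by `len(weekly)`.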

Calculating total working hours based on shifts

I would like to calculate how many hours each employee has worked for a certain time period, based on information from this table:
start employee_id
2014-08-10 18:10:00 5
2014-08-10 13:30:00 7
2014-08-10 09:00:00 7
2014-08-09 23:55:00 4
2014-08-09 16:23:00 12
2014-08-09 03:59:00 9
2014-08-08 20:05:00 7
2014-08-08 13:00:00 8
Each employee replaces another employee and that's where his work is done, so there are no empty slots.
The desired format of the result would be the following:
employee_id total_minutes_worked
I'm trying to think of the best way to achieve this, so any help will be appreciated!
You can get the total time as:
select employee_id, sum(stop - start)
from (
select start, lead(start) over (order by start) as stop, employee_id
from t
) as x
group by employee_id;
It remains to format the time, but I assume this is not what puzzles you.
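To see what the lead() approach produces on the sample data, the same pairing of each start with the next start can be sketched in Python. The variable names are illustrative only; the last shift has no following start, so it contributes nothing, just as lead() returns NULL for the final row.

```python
from collections import defaultdict
from datetime import datetime

# (start, employee_id), the rows from the question
shifts = [
    ("2014-08-10 18:10:00", 5),
    ("2014-08-10 13:30:00", 7),
    ("2014-08-10 09:00:00", 7),
    ("2014-08-09 23:55:00", 4),
    ("2014-08-09 16:23:00", 12),
    ("2014-08-09 03:59:00", 9),
    ("2014-08-08 20:05:00", 7),
    ("2014-08-08 13:00:00", 8),
]

# Sort by start time, like ORDER BY start inside the window function
parsed = sorted((datetime.strptime(s, "%Y-%m-%d %H:%M:%S"), emp) for s, emp in shifts)

minutes = defaultdict(int)
# Pair every shift with the next one; the next start acts as this shift's stop
for (start, emp), (stop, _) in zip(parsed, parsed[1:]):
    minutes[emp] += int((stop - start).total_seconds() // 60)

for emp, total in sorted(minutes.items()):
    print(emp, total)
```

Employee 5 is absent from the result because the end of the most recent shift is unknown.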
You should use a GROUP BY clause to first group the rows by employee id, then calculate the time from the start time and end time of work in each slot.
(NOTE - you should store both the start time and the end time of each employee's shift slot.)

How can I use the PERIOD feature of Temporal Postgres / Postgres 9.2 on Heroku?

I am building an app that deals with times and durations, and intersections between given units of time and start/end times in a database, for example:
Database:
Row # | start - end
Row 1 | 1:00 - 4:00
Row 2 | 3:00 - 6:00
I want to be able to select sums of time between two certain times, or GROUP BY an INTERVAL such that the returned recordset will have one row for each sum during a given interval, something like:
SELECT length( (start, end) ) WHERE (start, end) INTERSECTS (2:00,4:00)
(in this case (start,end) is a PERIOD which is a new data type in Postgres Temporal and pg9.2)
which would return
INTERVAL 3 HOURS
since Row 1 has two hours between 2:00 - 4:00 and Row 2 has one hour during that time.
Further, I'd like to be able to:
SELECT "lower bound of start", length( (start, end) ) GROUP BY INTERVAL 1 HOUR
which I would like to return:
1:00 | 1
2:00 | 1
3:00 | 2
4:00 | 2
5:00 | 1
which shows one row for each hour during the given interval and the sum of time at the beginning of that interval
I think that the PERIOD type can be used for this, which is in Postgres Temporal and Postgres 9.2. However, these are not available on Heroku at this time as far as I can tell - So,
How can I enable these sorts of maths on Heroku?
Try running:
heroku addons:add heroku-postgresql:dev --version=9.2
That should give you the 9.2 version, which has range types supported. As this is currently very alpha, any feedback would be greatly appreciated at dod-feedback@heroku.com
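For reference, the intersection arithmetic from the question (3 hours of overlap with 2:00 - 4:00) can be verified in a few lines of Python. This only illustrates the calculation and is not the range type API; times are represented as offsets from midnight and `overlap` is a hypothetical helper.

```python
from datetime import timedelta

def hours(h: int) -> timedelta:
    return timedelta(hours=h)

def overlap(a_start, a_end, b_start, b_end) -> timedelta:
    """Length of the intersection of two intervals, or zero if they do not meet."""
    return max(min(a_end, b_end) - max(a_start, b_start), timedelta(0))

rows = [(hours(1), hours(4)), (hours(3), hours(6))]  # Row 1 and Row 2
query = (hours(2), hours(4))

total = sum((overlap(s, e, *query) for s, e in rows), timedelta(0))
print(total)
```

Row 1 contributes two hours and Row 2 one hour, which gives the INTERVAL 3 HOURS the question expects.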