PostgreSQL - 24 hour rolling window

I am currently using Metabase to put together a live dashboard of some internal company metrics, and one of the things I am trying to query is a 24-hour rolling window for transactions on our mobile app. Metabase has a useful visualization called "Smart Number", which lets you compare how a value changes over a defined time period.
I am having trouble writing a query that outputs data in 24-hour intervals so I can compare the past 24 hours to the 24 hours before that. I have tried using the date_trunc function to group the transactions by hour and then limiting the results to the last 24 rows, but hours without transactions don't appear in the output, so 24 rows are not the same as 24 hours. I also tried using the FILTER clause, but the data needs to be transposed for "Smart Numbers" to work. Does anyone have any suggestions as to how I should approach this problem?
Example of one of my approaches:
SELECT
(DATE_TRUNC('hour', (reservations.created_at::timestamptz))) as hour,
SUM(reservations.covers) as total_covers
FROM reservations
JOIN restaurants on restaurants.id = reservations.restaurant_id
WHERE reservations.origin = 'mobile'
and restaurants.relationship_type in ('listing_only', 'difficult', 'ipad')
GROUP BY hour
ORDER BY hour desc
Which outputs something like this:
hour total_covers
"2019-02-19 15:00:00+00" 4
"2019-02-19 13:00:00+00" 15
"2019-02-19 12:00:00+00" 4
"2019-02-19 11:00:00+00" 4
"2019-02-19 10:00:00+00" 26
"2019-02-19 09:00:00+00" 5
"2019-02-19 08:00:00+00" 8
"2019-02-19 07:00:00+00" 12
"2019-02-19 03:00:00+00" 2
I would like to get something like this:
Time_Interval Total_Covers
24 Hours 389
48 Hours 254
72 Hours 459
96 Hours 239

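This query uses the DATE() function to group results by calendar day. The restaurants table has to be joined before its columns can be used in WHERE.
SELECT
DATE(reservations.created_at) AS day,
SUM(reservations.covers) AS total_covers
FROM reservations
JOIN restaurants ON restaurants.id = reservations.restaurant_id
WHERE
reservations.origin = 'mobile' AND
restaurants.relationship_type IN ('listing_only', 'difficult', 'ipad')
GROUP BY day
ORDER BY day DESC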
This query calculates the total covers relative to the time the query runs and groups them by the number of 24-hour periods ago: days_ago = 1 is the last 24 hours, 2 is the 24 hours before that, and so on.
SELECT
CEIL(EXTRACT(EPOCH FROM NOW() - reservations.created_at) / 86400) AS days_ago,
SUM(reservations.covers) AS total_covers
FROM reservations
JOIN restaurants ON restaurants.id = reservations.restaurant_id
WHERE
reservations.origin = 'mobile' AND
restaurants.relationship_type IN ('listing_only', 'difficult', 'ipad')
GROUP BY days_ago
ORDER BY days_ago;
See it in action on rextester.
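The days_ago bucketing above can be checked outside the database. Below is a minimal Python sketch of the same arithmetic on in-memory (created_at, covers) pairs; the function names and sample values are hypothetical, and it assumes naive datetimes in a single time zone. Window 1 is the last 24 hours and window 2 is the 24 hours before that, which is the pair a "Smart Number" would compare.

```python
import math
from collections import defaultdict
from datetime import datetime


def covers_by_days_ago(rows, now: datetime):
    """Sum covers per 24-hour window counted back from `now`.

    Mirrors CEIL(EXTRACT(EPOCH FROM NOW() - created_at) / 86400):
    window 1 is (now - 24h, now], window 2 is (now - 48h, now - 24h], and so on.
    A row stamped exactly at `now` lands in window 0, as it would in SQL.
    """
    totals = defaultdict(int)
    for created_at, covers in rows:
        seconds = (now - created_at).total_seconds()
        totals[math.ceil(seconds / 86400)] += covers
    return dict(sorted(totals.items()))


def as_time_intervals(totals):
    """Label each window like the question's desired output ("24 Hours", "48 Hours", ...)."""
    return [(f"{days_ago * 24} Hours", total) for days_ago, total in totals.items()]
```

For example, with now = 2019-02-19 16:00, a row from 15:00 that day is labelled "24 Hours" and a row from 25 hours earlier is labelled "48 Hours".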

Here is an example that uses a generated series in the FROM clause and then left-joins the data you want to count. The idea is that every hour should appear in the result and every matching reservation should be counted, but the match is optional.
Note: I only checked that the query compiles; I don't have a mock schema on hand to test it. The filters belong in the join condition rather than in WHERE, because a WHERE condition on reservations or restaurants discards the NULL rows produced for empty hours and turns the LEFT JOIN back into an inner join.
SELECT
hours.x AS hour,
COALESCE(SUM(reservations.covers), 0) AS total_covers
FROM generate_series(date_trunc('hour', now() - INTERVAL '24 hours'), now(), INTERVAL '1 hour') AS hours(x)
LEFT JOIN reservations
ON hours.x = DATE_TRUNC('hour', reservations.created_at::timestamptz)
AND reservations.origin = 'mobile'
AND reservations.restaurant_id IN (
SELECT restaurants.id FROM restaurants
WHERE restaurants.relationship_type IN ('listing_only', 'difficult', 'ipad')
)
GROUP BY hours.x
ORDER BY hours.x DESC
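To show what the generate_series plus LEFT JOIN pattern buys, here is a rough Python equivalent: build every hour start first, then add each reservation to its truncated hour, so hours with no reservations still appear with 0. The function name is hypothetical, and the origin and relationship_type filtering is assumed to have been applied to `rows` already.

```python
from datetime import datetime, timedelta


def hourly_covers(rows, now: datetime, hours: int = 24):
    """Return (hour_start, total_covers) for every hour in the window, newest first."""
    # like generate_series(date_trunc('hour', now() - 24h), now(), '1 hour')
    start = (now - timedelta(hours=hours)).replace(minute=0, second=0, microsecond=0)
    buckets = {}
    t = start
    while t <= now:
        buckets[t] = 0
        t += timedelta(hours=1)
    # like the LEFT JOIN: a row only counts if its truncated hour is in the series
    for created_at, covers in rows:
        hour = created_at.replace(minute=0, second=0, microsecond=0)
        if hour in buckets:
            buckets[hour] += covers
    # like ORDER BY hour DESC
    return sorted(buckets.items(), reverse=True)
```

As with the SQL version, a non-round `now` yields 25 hour starts, because both the truncated start hour and the current hour are included.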

Related

Count distinct dates between two timestamps

I want to calculate the percentage of days on which a user was active. A query like this
select
a.id,
a.created_at,
CURRENT_DATE - a.created_at::date as days_since_registration,
NOW() as current_d
from public.accounts a where a.id = 3257
returns
id created_at days_since_registration current_d tot_active
3257 2022-04-01 22:59:00.000 1 2022-04-02 12:00:0.000 +0400 2
The person registered less than 24 hours ago, but there are two distinct dates between the registration and now. Hence, if a user was active one hour before midnight and one hour after midnight, they were active on two days within less than one day (active on 200% of days).
What is the right way to count distinct dates and get 2 for a user who registered at 23:00:00, two hours ago?
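Count the distinct calendar dates and divide by the number of dates spanned (inclusive). Multiply by 100 before dividing, so that integer division does not truncate a result such as 50% to 0:
WITH cte AS (
SELECT 42 AS userID, '2022-04-01 23:00:00'::timestamp AS d
UNION
SELECT 42, '2022-04-02 01:00:00'::timestamp
)
SELECT
userID,
count(DISTINCT d::date),
max(d)::date - min(d)::date + 1 AS NrOfDays,
count(DISTINCT d::date) * 100 / (max(d)::date - min(d)::date + 1) AS PercentageOnline
FROM cte
GROUP BY userID;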
output:
userid  count  nrofdays  percentageonline
42      2      2         100
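The same counting rule, sketched in Python for a single user. The function name is hypothetical and it assumes a non-empty list of naive datetimes. Python's / is true division, so unlike the integer SQL expression it keeps fractions such as 66.67%.

```python
from datetime import datetime


def active_day_stats(timestamps: list[datetime]):
    """Return (active_days, span_days, percentage) for one user's activity timestamps.

    active_days corresponds to count(DISTINCT d::date),
    span_days to max(d)::date - min(d)::date + 1.
    """
    dates = {ts.date() for ts in timestamps}
    span_days = (max(dates) - min(dates)).days + 1
    return len(dates), span_days, len(dates) * 100 / span_days
```

For the two timestamps in the question (23:00 and 01:00 the next day) it returns (2, 2, 100.0).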

PostgreSQL select statement to return rows after where condition

I am working on a query that returns the next 7 days' worth of data every time an event happens, as indicated by "where event = 1". The goal is then to group all the data by user id and apply aggregate functions to the data after the event happens; the event is encoded as binary [0, 1].
So far, I have been attempting to use nested select statements to structure the data how I would like to have it, but using the window functions is starting to restrict me. I am now thinking a self join could be more appropriate but need help in constructing such a query.
The query currently first creates daily aggregate values grouped by user and date (the third-level nested select). Then the second level sums "value_x" to obtain an aggregate value grouped by user. Then the first-level nested select uses the lead function to grab the next row's value, partitioned by user, which acts as selecting the next day's value when event = 1. Lastly, the outer select statement uses an aggregate function to calculate the average "sum_next_day_value_after_event" grouped by user where event = 1. Put together, where event = 1, the query returns the avg(value_x) of the next row's total value_x.
However, this doesn't follow my time rule: "where event = 1", return the next 7 days' worth of data after the event happens. If there are fewer than 7 days of data, return whatever data falls within those 7 days. Yes, I currently only have one lead with an offset of 1, and I could add 6 more of these functions to grab the next 6 rows. But the lead function just grabs the next row without regard to date, so the next row's "value_x" could actually be 15 days after the row where "event = 1". Also, as the data below shows, a user may have more than one row per day.
Here is the following query I have so far:
select
f.user_id,
avg(f.sum_next_day_value_after_event) as sum_next_day_values
from (
select
bld.user_id,
lead(bld.sum_daily_value_x, 1) over(partition by bld.user_id order by bld.daily) as sum_next_day_value_after_event
from (
select
l.user_id,
l.daily,
sum(l.value_x) as sum_daily_value_x
from (
select
user_id, value_x, date_part('day', day_ts) as daily
from table_1
group by date_part('day', day_ts), user_id, value_x) l
group by l.user_id, l.daily
order by l.user_id) bld) f
group by f.user_id
Below is a snippet of the data from table_1:
user_id  day_ts         value_x  event
50       4/2/21 07:37   25       0
50       4/2/21 07:42   45       0
50       4/2/21 09:14   67       1
50       4/5/21 10:09   8        0
50       4/5/21 10:24   75       0
50       4/8/21 11:08   34       0
50       4/15/21 13:09  32       1
50       4/16/21 14:23  12       0
50       4/29/21 14:34  90       0
55       4/4/21 15:31   12       0
55       4/5/21 15:23   34       0
55       4/17/21 18:58  32       1
55       4/17/21 19:00  66       1
55       4/18/21 19:57  54       0
55       4/23/21 20:02  34       0
55       4/29/21 20:39  57       0
55       4/30/21 21:46  43       0
Technical details:
PostgreSQL, supported by EDB, version = 14.1
pgAdmin4, version 5.7
Thanks for the help!
"The query currently first creates daily aggregate values"
I don't see any aggregate function in your innermost query, so its GROUP BY clause does nothing useful (at most it discards duplicate rows, as DISTINCT would):
select
user_id, value_x, date_part('day', day_ts) as daily
from table_1
group by date_part('day', day_ts), user_id, value_x
could be simplified as
select
user_id, value_x, date_part('day', day_ts) as daily
from table_1
which in turn adds no real value, so this innermost query can be removed and the second-level query becomes:
select user_id
, day_ts::date as daily
, sum(value_x) as sum_daily_value_x
from table_1
group by user_id, day_ts::date
Note that I replaced date_part('day', day_ts) with day_ts::date: date_part('day', ...) returns only the day of the month (1 to 31), so 2021-04-02 and 2021-05-02 would be grouped together. The order by user_id clause can also be removed at this step.
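Now, if you want to calculate the average of sum_daily_value_x over the 7 days following each event (I'm referring to the avg() function in your top query), you can use avg() as a window function whose frame is restricted to the 7 days after the current row, and then keep only the days on which an event occurred:
select user_id, daily, sum_next_day_values
from (
select f.user_id
, f.daily
, f.has_event
, avg(f.sum_daily_value_x) over (partition by f.user_id order by f.daily range between current row and '7 days' following) as sum_next_day_values
from (
select user_id
, day_ts::date as daily
, sum(value_x) as sum_daily_value_x
, bool_or(event = 1) as has_event
from table_1
group by user_id, day_ts::date
) AS f
) AS w
where has_event
The ordering column must be a real date (day_ts::date) for the '7 days' interval offset to be valid in the frame clause. The partition by f.user_id clause is required: window functions are evaluated after GROUP BY, so the outer query cannot simply group by user_id (daily and sum_daily_value_x are neither grouped nor aggregated there), and without the partition the 7-day frame would mix rows from different users. The event filter is applied in an outer query because a WHERE clause at the same level would remove rows before the window function sees them.
You can replace the avg() window function with any other aggregate, for instance sum(), which would better fit the alias sum_next_day_values.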
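As a cross-check of the window frame, here is a plain-Python sketch of the same logic: sum value_x per user and calendar day, then, for each day with an event, average the daily sums from that day through 7 days later (matching range between current row and '7 days' following). The function name and the (user_id, day_ts, value_x, event) tuple layout are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def avg_after_events(rows: list[tuple[int, datetime, int, int]], days: int = 7):
    """Return {(user_id, event_date): average daily sum over [event_date, event_date + days]}."""
    daily = defaultdict(int)  # (user_id, date) -> sum(value_x) for that day
    event_days = set()        # (user_id, date) pairs with at least one event = 1
    for user_id, ts, value, event in rows:
        key = (user_id, ts.date())
        daily[key] += value
        if event == 1:
            event_days.add(key)

    result = {}
    for user_id, day in sorted(event_days):
        last = day + timedelta(days=days)
        window = [total for (u, d), total in daily.items() if u == user_id and day <= d <= last]
        result[(user_id, day)] = sum(window) / len(window)
    return result
```

On the question's sample rows, user 55's event day 4/17/21 has daily sums 98, 54 and 34 in its window, so its average is 62.0.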

postgresql query for hour minutes and seconds

I have a PostgreSQL expression like the one below that calculates the difference between the timestamps {1} and {2} in minutes:
CAST(ROUND(EXTRACT(EPOCH from (({2}::timestamp) - ({1}::timestamp)))/60) AS INT)
I want to calculate the difference in hours, minutes and seconds displayed like:
3 hrs 31 minutes 42 secs
What manipulation do I need to display it like that?
SELECT to_char(col1 - col0, 'HH24 "hrs" MI "minutes" SS "seconds"') FROM T1;
to_char accepts an interval as its first argument (an interval is the time span between two timestamps, and subtracting one timestamp from another gives you one) and a format template as its second, so you can lay out the result pretty much however you want.
Formatting functions in PostgreSQL
Try this SQL:
SELECT to_char(column2 - column1, 'DD" days "HH24" hours "MI" minutes "SS" seconds"') FROM your_table;
The subtraction of two timestamp or timestamptz values produces an interval. (While subtracting two date values produces an integer!)
Details about date/time types in the manual.
The default text representation of an interval may be sufficient:
SELECT timestamp '2017-1-6 12:34:56' - timestamp '2017-1-1 0:0';
Result is an interval, displayed as:
5 days 12:34:56
If you need the format in the question, precisely, you need to specify how to deal with intervals >= 24 hours. Add 'days'? Or just increase hours accordingly?
Or use to_char(), but account for days one way or another:
SELECT to_char(ts_col2 - ts_col1, 'DD" days "HH24" hours "MI" minutes "SS" seconds"');
Result:
05 days 12 hours 34 minutes 56 seconds
The days field covers the rest: subtracting timestamps never yields months or years in the resulting interval, only days plus a time part.
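If the difference is computed in application code instead, the same days/hours/minutes/seconds split can be done with divmod. A minimal Python sketch; the helper name is hypothetical, and it assumes naive datetimes with end >= start and drops fractional seconds.

```python
from datetime import datetime


def format_difference(start: datetime, end: datetime) -> str:
    """Render end - start as 'D days H hrs M minutes S secs'."""
    total = int((end - start).total_seconds())  # fractional seconds are dropped
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days} days {hours} hrs {minutes} minutes {seconds} secs"
```

format_difference(datetime(2017, 1, 1), datetime(2017, 1, 6, 12, 34, 56)) returns '5 days 12 hrs 34 minutes 56 secs', the same split as the interval example above.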
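A simple approach: age() splits the difference into years, months and days (plain timestamp subtraction only ever yields days and time, so its year and month fields would always be 0), and EXTRACT pulls out each field:
SELECT
EXTRACT(year FROM age(LOCALTIMESTAMP(0), yourFieldTime))||' year '||
EXTRACT(month FROM age(LOCALTIMESTAMP(0), yourFieldTime))||' month '||
EXTRACT(day FROM age(LOCALTIMESTAMP(0), yourFieldTime))||' day '||
EXTRACT(hour FROM age(LOCALTIMESTAMP(0), yourFieldTime))||' hour '||
EXTRACT(minute FROM age(LOCALTIMESTAMP(0), yourFieldTime))||' minute '||
EXTRACT(second FROM age(LOCALTIMESTAMP(0), yourFieldTime))||' second '
AS full_time_as_you_wish FROM your_table;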
Result
full_time_as_you_wish
---------------------------------
0 year 0 month 0 day 0 hour 0 minute 0 second

Group every N days

I've been searching for a way to do something like this in PostgreSQL, but haven't found one yet.
Let's suppose I have the following table:
id order amount created_at
2 527837 10.0 2014-12-01T...
3 527838 50.0 2014-12-02T...
4 527839 30.0 2014-12-02T...
5 527840 40.0 2014-12-10T...
6 527841 80.0 2014-12-13T...
And I want a query that returns the sum of all amounts for each full 7-day week (even if some days had no orders):
Example:
week total_amount
Dec/01 - Dec/07 90.0
Dec/08 - Dec/15 120.0
Dec/16 - Dec/23 0.0
//...and so on until current date
Also, supposing January has 30 days and February has 28 days, I want the weeks to be grouped like this:
Jan/01-Jan/08
Jan/09-Jan/16
Jan/17-Jan/24
Jan/25-Feb/02 (there's no problem crossing months)
Feb/03-Feb/10
What is the best way to do this?
EDIT 1:
I have found a way to write a query that generates a temporary table with the days I need for my grouping; I'm just having difficulty grouping by it and joining it with my original table...
(SELECT TO_CHAR(generate_series, 'YYYY-MM-DD') as "day" FROM generate_series('2013-06-01 00:00'::timestamp,
'2015-06-01 00:00'::timestamp, '1 Day'))
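date_trunc('week', ...) buckets rows into Monday-to-Sunday weeks. Generating the weeks with generate_series and left-joining the orders keeps weeks without orders, with a total of 0:
SELECT w.week_start,
w.week_start + INTERVAL '6' DAY AS week_end,
COALESCE(SUM(t.amount), 0) AS total_amount
FROM generate_series(date_trunc('week', (SELECT MIN(created_at) FROM t)), LOCALTIMESTAMP, INTERVAL '1 week') AS w(week_start)
LEFT JOIN t ON date_trunc('week', t.created_at) = w.week_start
GROUP BY w.week_start
ORDER BY w.week_start;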
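If you need weeks anchored at an arbitrary start date rather than on Mondays, the bucketing rule is just (day - start) // 7. Here is a Python sketch of that rule, including empty weeks; the function name and argument layout are hypothetical. (1 December 2014 was a Monday, so for the sample data this and date_trunc('week') produce the same weeks.)

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def weekly_totals(orders: list[tuple[datetime, float]], start: date, end: date):
    """Sum order amounts per 7-day week beginning at `start`, up to the week containing `end`.

    Weeks without orders are kept with a total of 0.0; orders before `start` are ignored.
    """
    totals = defaultdict(float)
    for created_at, amount in orders:
        offset = (created_at.date() - start).days
        if offset >= 0:
            totals[offset // 7] += amount

    result = []
    week_start, index = start, 0
    while week_start <= end:
        result.append((week_start, week_start + timedelta(days=6), totals.get(index, 0.0)))
        week_start += timedelta(days=7)
        index += 1
    return result
```

With start = 2014-12-01 and the five sample orders, the first three weeks total 90.0, 120.0 and 0.0.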

Retrieving the start and end hour queries correctly in PostgreSQL Query

I have a CTE-based query in which I retrieve hourly intervals between two given timestamps. My query works as follows:
First, it takes the start and end datetimes (let's say 07-13-2011 00:21:09 and 07-31-2011 21:11:21). Then, for each day, it gets the hourly query totals within the given hour range (here from hour 00 to hour 21, but this is parametric and depends on the hours I give as input).
This query works well, but there is a problem. It displays hourly amounts, but for the start time it counts all queries between 00:00:00 and 00:59:59 on each day instead of 00:21:09 to 00:59:59. The same applies to the end time: it counts all queries between 21:00:00 and 22:00:00 on each day instead of 21:00:00 to 21:11:21. (The other hourly intervals, e.g. 03:00 to 04:00, are currently retrieved correctly as flat one-hour intervals with no minutes or seconds.) How can I fix that? The query is below, thanks.
WITH cal AS (
SELECT generate_series('2011-02-02 00:00:00'::timestamp , '2012-04-01 05:00:00'::timestamp , '1 hour'::interval) AS stamp
)
, qqq AS (
SELECT date_trunc('hour', calltime) AS stamp
, count(*) AS zcount
FROM mytable
WHERE calltime >= '07-13-2011 00:21:09' AND calltime <='07-31-2011 21:11:21' AND date_part('hour', calltime) >= 0 AND date_part('hour', calltime) <= 21
GROUP BY date_trunc('hour', calltime)
)
SELECT cal.stamp
, COALESCE (qqq.zcount, 0) AS zcount
FROM cal
LEFT JOIN qqq ON cal.stamp = qqq.stamp
WHERE cal.stamp >= '07-13-2011 00:00:00' AND cal.stamp<='07-31-2011 21:11:21' AND date_part('hour', cal.stamp) >= 0 AND date_part('hour', cal.stamp) <= 21
ORDER BY stamp ASC;
EDIT:
What I mean is this: even though I give 00:21:09 as the start time on the first day, the following days count the first hourly interval as all queries between 00:00:00 and 01:00:00 instead of 00:21:09 to 01:00:00. This should apply to the first hourly interval of every day; if I give 04:30:21 as the start time, each day should start counting queries hourly from there. The same applies to the end hour: only the LAST day in the results uses the 21:00:00 to 21:11:21 interval, while the days before it count all queries between 21:00:00 and 22:00:00.
For example, if there are 200 queries between 00:00:00 and 01:00:00 on July 14, 2011 (the day after the start date, July 13), but only 159 between 00:21:09 and 01:00:00, I should get 159 instead of 200. Likewise, if there are 300 queries between 21:00:00 and 22:00:00 on some day and 123 of them are between 21:00:00 and 21:11:21, I should get 123 instead of 300. This applies to every single day; the other hourly intervals, such as 01:00 to 02:00 or 20:00 to 21:00, should be counted as usual. The hourly intervals and the start and end times are parametric and depend on user input.
Adding AND calltime::time >= '00:21:09' AND calltime::time <= '21:11:21' to the WHERE clause of the qqq CTE (next to calltime >= '07-13-2011 00:21:09' AND calltime <= '07-31-2011 21:11:21') solved the issue.
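The effect of that extra time-of-day condition can be sketched in Python. A call is counted only if it lies inside the overall range and its clock time lies inside the daily window, and it is then bucketed by hour. This is an illustrative helper with a hypothetical name; it assumes the daily window does not cross midnight (start time earlier than end time).

```python
from collections import Counter
from datetime import datetime


def hourly_counts(calltimes: list[datetime], start: datetime, end: datetime) -> Counter:
    """Count calls per hour between `start` and `end`, keeping only calls whose
    time of day also lies within [start.time(), end.time()] on every day."""
    counts = Counter()
    for ts in calltimes:
        in_range = start <= ts <= end                                # calltime >= ... AND calltime <= ...
        in_daily_window = start.time() <= ts.time() <= end.time()  # calltime::time >= ... AND <= ...
        if in_range and in_daily_window:
            # equivalent of date_trunc('hour', calltime)
            counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return counts
```

With the question's bounds, a call at 00:10 on July 14 is no longer counted in that day's first hour, while one at 00:30 still is.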