Postgres SELECT Across Time Zones on a Specific Day

I have two tables (orders and regions) that, when joined, return:
orders.date         | regions.tz
--------------------+-----------
2016-01-01 02:00:00 | PST
2016-01-01 02:00:00 | EST
2016-01-01 02:00:00 | EST
...
I can select my different times in the corresponding time zones using:
SELECT date::timestamp at time zone regions.tz
FROM orders INNER JOIN regions ON orders.region_id = regions.id;
Now I'm trying to find a way to SELECT all orders that fall on a specific day in their local time zone. That is:
Orders in the PST region between Jan 1, 2016 12:00 AM PST and Jan 2, 2016 12:00 AM PST.
Orders in the EST region between Jan 1, 2016 12:00 AM EST and Jan 2, 2016 12:00 AM EST.
...
I'm guessing this is going to rely on some use of timestamp without time zone, but I'm unsure of how to proceed.

Your best bet is probably to cast to ::date:
SELECT orders.*
FROM orders INNER JOIN regions ON orders.region_id = regions.id
WHERE (date::timestamp at time zone regions.tz)::date = '2016-01-01'::date;
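Note that filtering on an expression like this cannot use a plain index on the timestamp column. If the table is large, a minimal sketch of a sargable alternative (assuming orders.date is a timestamptz column) is to compare against per-zone day boundaries instead, so that an index on orders.date can be used, e.g. in a nested loop over regions:
SELECT orders.*
FROM orders INNER JOIN regions ON orders.region_id = regions.id
WHERE orders.date >= TIMESTAMP '2016-01-01' AT TIME ZONE regions.tz  -- local midnight, start of day
  AND orders.date <  TIMESTAMP '2016-01-02' AT TIME ZONE regions.tz; -- local midnight, next day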

While I don't know if it is the most efficient way to do things, this could just be done with a bunch of OR statements.
SELECT date::timestamp at time zone regions.tz
FROM orders INNER JOIN regions ON orders.region_id = regions.id
WHERE (date::timestamp BETWEEN pst_start AND pst_end AND regions.tz = 'PST')
OR (date::timestamp BETWEEN est_start AND est_end AND regions.tz = 'EST')
You'll of course have to figure out all the start and end times for each individual timezone.
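For the example day, a sketch of what those per-zone boundaries could look like as literals (assuming date is stored as timestamptz; half-open ranges with >= and < avoid the boundary issues BETWEEN has at the upper end):
SELECT date::timestamp at time zone regions.tz
FROM orders INNER JOIN regions ON orders.region_id = regions.id
WHERE (date >= TIMESTAMPTZ '2016-01-01 00:00 PST' AND date < TIMESTAMPTZ '2016-01-02 00:00 PST' AND regions.tz = 'PST')
OR (date >= TIMESTAMPTZ '2016-01-01 00:00 EST' AND date < TIMESTAMPTZ '2016-01-02 00:00 EST' AND regions.tz = 'EST');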

Related

Postgres tsrange, filter by date and time

I have an events table with a field called duration that's of type tsrange and that captures the beginning and end time of an event as timestamps. What I want is to be able to filter all events across a certain date range and then filter those events by time. So, for instance, a user should be able to filter for all events happening between (inclusive) 12-15-2019 and 12-17-2019 that are playing at 9PM. To do this, the user submits a date range, which filters all events in that date range:
WHERE lower(duration)::date <@ '[start, finish]'::daterange
In the above, start and finish are user-submitted parameters.
Then I want to filter those events by events that are playing during a specific time, e.g. 9PM; essentially, only show events that have 9PM between their start and end time.
So if I have the following table:
id | duration
---+-------------------------------------
A  | 2019-12-21 19:00...2019-12-22 01:00
B  | 2019-12-17 16:00...2019-12-17 18:00
C  | 2019-12-23 19:00...2019-12-23 21:00
D  | 2019-12-23 19:00...2019-12-24 01:00
E  | 2019-12-27 14:00...2019-12-27 16:00
If the user submits a date range of 2019-12-21 to 2019-12-27, then event B will be filtered out. If the user then submits a time of 9:00 PM (21:00), A, C, and D will be returned.
EDIT
I was able to get it to work using the following:
WHERE duration @> (lower(duration)::date || ' 21:00:00')::timestamp
The 21:00 above is the user-submitted time, but this seems a bit hackish.
A tsrange contains a timestamp at 9 p.m. if and only if 9 p.m. on the starting day or 9 p.m. on the following day are part of the range.
You can use that to write your condition.
An example:
lower(r)::date + TIME '21:00' <@ r OR
(lower(r)::date + 1) + TIME '21:00' <@ r
tests whether r contains some timestamp at 9 p.m.
The user input from 2019-12-21 to 2019-12-27 at 21:00 means that they are interested in:
select generate_series(timestamp '2019-12-21 21:00', '2019-12-27 21:00', '1 day') as t
t
---------------------
2019-12-21 21:00:00
2019-12-22 21:00:00
2019-12-23 21:00:00
2019-12-24 21:00:00
2019-12-25 21:00:00
2019-12-26 21:00:00
2019-12-27 21:00:00
(7 rows)
Hence you should check whether the duration column contains one of these timestamps:
select distinct e.*
from events e
cross join generate_series(timestamp '2019-12-21 21:00', '2019-12-27 21:00', '1 day') as t
where duration @> t;
id | duration
----+-----------------------------------------------
A | ["2019-12-21 19:00:00","2019-12-22 01:10:00")
C | ["2019-12-23 19:00:00","2019-12-23 21:10:00")
D | ["2019-12-23 19:00:00","2019-12-24 01:10:00")
(3 rows)
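Putting both filters together, a minimal sketch of the parameterized query (here :start_date, :finish_date and :event_time stand in for the user-submitted values; bind-parameter syntax varies by driver):
select distinct e.*
from events e
cross join generate_series((:start_date)::date + (:event_time)::time,
                           (:finish_date)::date + (:event_time)::time,
                           interval '1 day') as t
where e.duration @> t;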

7 Day Return/Retention Rate

I've been trying to calculate the 7 Day Return Rate (also known as Classic Retention Rate, as described here: https://www.braze.com/blog/calculate-retention-rate/) and then take a 30-day average to reduce noise, in PostgreSQL.
However, I'm sure I'm doing something wrong. First of all, the numbers look way higher than I intuitively feel they should be (generally around 5% for the rest of the sector). Also, I believe the first 7 days should show 0, as theoretically users should take at least 7 days to count as a "return". However, I get around 40-70%, as shown below.
Would someone mind taking a look at the code below and seeing if there are any errors? 7 Day Return Rate is a really common metric for apps, and I haven't found any questions using postgresql that calculate it to this level of sophistication on Stack Exchange (or even the rest of the web), so I feel like a solid response could be very useful to a lot of people.
Sample data
Wednesday, August 1, 2018 12:00 AM 71.14
Thursday, August 2, 2018 12:00 AM 55.44
Friday, August 3, 2018 12:00 AM 50.09
Saturday, August 4, 2018 12:00 AM 45.81
Sunday, August 5, 2018 12:00 AM 43.27
Monday, August 6, 2018 12:00 AM 40.61
Tuesday, August 7, 2018 12:00 AM 39.38
Wednesday, August 8, 2018 12:00 AM 38.46
Thursday, August 9, 2018 12:00 AM 36.81
Friday, August 10, 2018 12:00 AM 35.94
with
user_first_event as (
select distinct id, min(timestamp)::date as first_event_date
from log
where
timestamp <= current_date
and timestamp >= {{start_date}} and timestamp <= {{end_date}}
group by id),
event as (
select distinct id, timestamp::date as user_event_date
from log
where timestamp <= current_date and timestamp >= {{start_date}}),
gap as (
select
user_first_event.id,
user_first_event.first_event_date,
event.user_event_date,
event.user_event_date - user_first_event.first_event_date as days_since_signup
from user_first_event
join event on user_first_event.id = event.id
where user_first_event.first_event_date <= event.user_event_date),
conversion_rate as (
select
first_event_date,
(sum(case when days_since_signup = 7 then 1 else 0 end) * 100.0 /
count(distinct id)
) as seven_day_retention_rate
from gap
group by first_event_date
)
SELECT first_event_date,
AVG(seven_day_retention_rate)
OVER(ORDER BY first_event_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_avg_retention_rate
FROM conversion_rate
The problem is a bit easier than your query makes it seem; you can actually do it with just one subquery and one outer query, as follows:
select first_event_date
, avg(seven_day_return) as seven_day_return_day_only
, avg( avg(seven_day_return) ) OVER(ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW ) AS thirty_day_rolling_retention
from (
--inner query to get value for user, 1 if they retain and 0 if they do not
select min(timestamp)::date as first_event_date
, case when array_agg(timestamp::date) @> ARRAY[ (min(timestamp)::date + 7) ] then 1 else 0 end as seven_day_return
from log
group by id ) t
group by t.first_event_date;
Note that this weights each day equally rather than each user equally across days. If you want to weight the average by user across days, then you can update the outer calculation using more aggregates and window functions to compute the value with weightings; see the sketch after the reference link.
Reference: http://sqlfiddle.com/#!17/ee17e/1/0
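For that user-weighted variant, a minimal sketch (reusing the same inner query; dividing total returners by total users over the 30-day window is my reading of the weighting, not the original author's code):
select first_event_date
, avg(seven_day_return) as seven_day_return_day_only
, ( sum( sum(seven_day_return) ) over w )::numeric
  / sum( count(*) ) over w as thirty_day_user_weighted_retention
from (
--inner query to get value for user, 1 if they retain and 0 if they do not
select min(timestamp)::date as first_event_date
, case when array_agg(timestamp::date) @> ARRAY[ (min(timestamp)::date + 7) ] then 1 else 0 end as seven_day_return
from log
group by id ) t
group by t.first_event_date
window w as (order by first_event_date asc rows between 29 preceding and current row);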
If you don't have access to array_agg (but have access to window functions) you can use:
select first_event_date
, avg(seven_day_return) as day_seven_day_return
, avg( avg(seven_day_return) ) OVER(ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW ) AS thirty_day_rolling_retention
from (
--inner query to get value for user
select min(timestamp)::date as first_event_date
, case when exists(select 1 from log l2 where l2.id = log.id and l2.timestamp::date = min(log.timestamp)::date + 7) then 1 else 0 end as seven_day_return
from log
group by id ) t
group by t.first_event_date;

How to sort attendance date along with the month?

Attendance is sorted according to the date, which is fine, but I want to sort by the date along with the month: January should come at the bottom, and December at the top.
Table
Attendance Date
---------------
26 Feb 2018
19 Dec 2018
18 Dec 2018
14 Dec 2018
12 June 2018
7 Dec 2018
5 Feb 2018
Query
select distinct
(select ARRAY_TO_STRING(ARRAY_AGG(ARRAY[to_char(t1.l_time,'HH12:mi AM')]::text), ',')
from
(select (al1.create_time AT TIME ZONE 'UTC+5:30')::time as l_time
from users.access_log as al1
where al1.user_id = al.user_id
and al1.login_status = 1
and al1.create_time::date = al.create_time::date
order by al1.create_time::time ASC
) as t1
) as login_time,
(select ARRAY_TO_STRING(ARRAY_AGG(ARRAY[to_char(t2.o_time,'HH:mi AM')]::text), ',')
from
(select (al2.create_time AT TIME ZONE 'UTC+5:30')::time as o_time
from users.access_log as al2
where al2.user_id = al.user_id
and al2.login_status = 0
and al2.create_time::date = al.create_time::date
order by al2.create_time::time ASC
) as t2
) as logout_time,
al.create_time::date
from users.access_log as al
where al.user_id = ?;
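For what it's worth, a minimal sketch of the ordering being asked for (assuming the displayed attendance date comes from al.create_time, the fix is to sort on the underlying date value, newest first, rather than on its formatted text):
select distinct
    ..., -- the login_time and logout_time subqueries from above, unchanged
    al.create_time::date
from users.access_log as al
where al.user_id = ?
order by al.create_time::date desc; -- December rows first, February rows last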

Creating sequence of dates and inserting each date into query

I need to find certain data within first day of current month to the last day of current month.
select count(*) from q_aggr_data as a
where a.filial_='fil1'
and a.operator_ like 'unit%'
and date_trunc('day',a.s_end_)='"+ date_to_search+ "'
group by a.s_name_,date_trunc('day',a.s_end_)
date_to_search here is 01.09.2014, 02.09.2014, 03.09.2014, ..., 30.09.2014.
I've tried to loop through i = 0...30 and make 30 queries, but that takes too long and is extremely naive. Also, for days where there is no entry it should return 0. I've seen how to generate date sequences, but can't get my head around how to inject those days one by one into the query.
By creating not only a series but a set of 1-day ranges, any timestamp data can be joined to the ranges using >= with <.
Note in particular that this approach avoids functions on the data (such as truncating to date), and because of this it permits the use of indexes to assist query performance.
If some data looked like this:
CREATE TABLE my_data
("data_dt" timestamp)
;
INSERT INTO my_data
("data_dt")
VALUES
('2014-09-01 08:24:00'),
('2014-09-01 22:48:00'),
('2014-09-02 13:12:00'),
('2014-09-03 03:36:00'),
('2014-09-03 18:00:00'); -- more sample rows omitted here
Then that can be joined to a generated set of ranges (dt_start & dt_end pairs), using an outer join so unmatched ranges are still reported:
SELECT
r.dt_start
, count(d.data_dt)
FROM (
SELECT
dt_start
, dt_start + INTERVAL '1 Day' dt_end
FROM
generate_series('2014-09-01 00:00'::timestamp,
'2014-09-30 00:00', '1 Day') AS dt_start
) AS r
LEFT OUTER JOIN my_data d ON d.data_dt >= r.dt_start
AND d.data_dt < r.dt_end
GROUP BY
r.dt_start
ORDER BY
r.dt_start
;
and a result such as this is produced:
| DT_START | COUNT |
|----------------------------------|-------|
| September, 01 2014 00:00:00+0000 | 2 |
| September, 02 2014 00:00:00+0000 | 1 |
| September, 03 2014 00:00:00+0000 | 2 |
| September, 04 2014 00:00:00+0000 | 2 |
...
| September, 29 2014 00:00:00+0000 | 0 |
| September, 30 2014 00:00:00+0000 | 0 |
See this SQLFiddle demo
One way to solve this problem is to group by truncated date.
select count(*)
from q_aggr_data as a
where a.filial_='fil1'
and a.operator_ like 'unit%'
group by date_trunc('day',a.s_end_), a.s_name_;
The other way is to use a window function, for example to get the count over the truncated date; a sketch of that idea follows.
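A minimal sketch of that window-function idea (my illustration, not code from the original answer):
select a.*,
       count(*) over (partition by date_trunc('day', a.s_end_)) as day_count
from q_aggr_data as a
where a.filial_ = 'fil1'
  and a.operator_ like 'unit%';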
Please check if this query satisfies your requirements:
select sum(matched) -- include s_name_, s_end_ if you want to verify the results
from
(select a.filial_
, a.operator_
, a.s_name_
, generate_series s_end_
, (case when a.filial_ = 'fil1' then 1 else 0 end) as matched
from q_aggr_data as a
right join generate_series('2014-09-01', '2014-09-30', interval '1 day')
on date_trunc('day', a.s_end_) = generate_series
and a.filial_ = 'fil1'
and a.operator_ like 'unit%') aa
group by s_name_, s_end_
order by s_end_, s_name_
http://sqlfiddle.com/#!15/e8edf/3

PostgreSQL UTC to CET/CEST

I'm having some big trouble with time zones and Benjamin Franklin.
I have a table with a UTC timestamp field; no time zone is stored.
My goal is to group the rows of this table by day of week, or by slices of n hours.
This is how I have proceeded until now:
-- CET => UTC/GMT + 1 => Winter
-- CEST => UTC/GMT + 2 => Summer
--
-- Hour change 2014: March 30th, 2h CET => 3h CEST
--
--
-- Output for "d" Wanted output for "d"
-- "2014-03-28 23:00:00+00" "2014-03-28 23:00:00+00" (23h UTC because CET)
-- "2014-03-29 23:00:00+00" "2014-03-29 23:00:00+00" (23h UTC because CET)
-- "2014-03-30 23:00:00+00" "2014-03-30 22:00:00+00" (22h UTC because CEST : fail because start timestamp is CET)
SELECT dates.d AS d, archives_d, range_begin_date, sequentialid
FROM generate_series('2014-03-28T23:00:00+00:00'::timestamp, -- 23h UTC because CET -- 2014-03-28T22:00:00+00:00
'2014-03-31T21:59:59+00:00'::timestamp, -- 22h UTC because CEST -- 2014-03-31T21:59:59+00:00
'86400 seconds') AS dates(d)
LEFT JOIN (
-- Select archive date floored by a step (here 86400 seconds -> 24 hours)
SELECT *, (to_timestamp(((floor(extract(epoch from (archives.range_begin_date - '2014-03-28T23:00:00+00:00'::timestamp)) / 86400)) * 86400) + extract(epoch from '2014-03-28T23:00:00+00:00'::timestamp))) AS archives_d
FROM archives
) AS archives
ON dates.d = archives.archives_d
ORDER BY dates.d
Do you have any idea how to get the wanted output (at least for the generate_series)?
Note that my step for the generate_series is not fixed at one day but is an arbitrary interval.
Thanks
There is a complete list of timezone names and offsets in pg_timezone_names.
I'm guessing you want a city-based name rather than a zone-based name. That way you get daylight-saving adjustments thrown in.
=> SELECT now() AT TIME ZONE 'America/New_York' AS ny, now() AT TIME ZONE 'Europe/London' AS lon;
ny | lon
---------------------------+---------------------------
2014-08-04 09:01:06.08988 | 2014-08-04 14:01:06.08988
(1 row)
The above was posted at 13:01 UTC. I'm guessing New York is what you want, but geography isn't my strong point :-)
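Applied to the question, a minimal sketch (assuming the CET/CEST region can be represented by a city-based zone such as Europe/Paris, and that the series is generated as timestamptz rather than timestamp):
SET TIME ZONE 'Europe/Paris';
SELECT dates.d
FROM generate_series('2014-03-29 00:00'::timestamptz,
                     '2014-04-01 00:00'::timestamptz,
                     '1 day') AS dates(d);
-- Adding '1 day' to a timestamptz is done in the session time zone, so the
-- series keeps local midnight across the DST change on 2014-03-30.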