How to bin timestamp data into buckets of n minutes in postgres


I have the following query which works, binning timestamped "observations" into buckets whose boundaries are defined by the bins table:
SELECT
count(id),
width_bucket(
time :: TIMESTAMP,
(SELECT ARRAY(SELECT start_time
FROM bins
WHERE owner_id = 'some id'
ORDER BY start_time ASC) :: TIMESTAMP[])
) bucket
FROM observations
WHERE owner_id = 'some id'
GROUP BY bucket
ORDER BY bucket;
I would like to modify this to allow for querying arbitrary n-minute bins starting from a specified timestamp, rather than having to pull from an actual "bins" table.
That is, given a start time, a "bin width" in minutes, and a number of bins, is there a way I can generate the array of timestamps to pass into the width_bucket function?
Alternatively, is there a different/simpler approach to get the same results?

Use the function generate_series(start, stop, step interval), e.g.
select array(
select generate_series(
timestamp '2018-04-15 00:00',
'2018-04-15 01:00',
'30 minutes'))
array
---------------------------------------------------------------------
{"2018-04-15 00:00:00","2018-04-15 00:30:00","2018-04-15 01:00:00"}
(1 row)
Example in Db<>fiddle.
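Plugged into the query from the question, this might look like the sketch below. The start timestamp, the 5-minute bin width, and the count of 12 bins are placeholder values, not something taken from the question; the observations table and its columns are the asker's.

```sql
-- Sketch: bin observations into 12 five-minute buckets from a chosen start time.
-- The start timestamp, bin width, and bin count are placeholders.
SELECT
    count(id),
    width_bucket(
        time::timestamp,
        ARRAY(
            SELECT generate_series(
                timestamp '2018-04-15 00:00',                              -- lower edge of bin 1
                timestamp '2018-04-15 00:00' + 12 * interval '5 minutes',  -- upper edge of bin 12
                interval '5 minutes')                                      -- bin width
        )
    ) AS bucket
FROM observations
WHERE owner_id = 'some id'
GROUP BY bucket
ORDER BY bucket;
```

Because the series includes the end of the last bin, the array holds 13 thresholds. Bucket 0 then collects rows before the start time and bucket 13 collects rows at or after the end, so both can be filtered out if only the 12 bins are of interest.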

The above answers seem to do what you want, but as of PostgreSQL 14, there is now a function date_bin just for binning timestamps.
Quoting the documentation:
date_bin(stride,source,origin)
source is a value expression of type timestamp or timestamp with time zone. (Values of type date are cast automatically to timestamp.) stride is a value expression of type interval. The return value is likewise of type timestamp or timestamp with time zone, and it marks the beginning of the bin into which the source is placed.
Examples:
SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-01-01');
Result: 2020-02-11 15:30:00
SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-01-01 00:02:30');
Result: 2020-02-11 15:32:30
In the case of full units (1 minute, 1 hour, etc.), it gives the same result as the analogous date_trunc call, but the difference is that date_bin can truncate to an arbitrary interval.
The stride interval must be greater than zero and cannot contain units of month or larger.
I would like to call special attention to the line
The return value [...] marks the beginning of the bin into which the source is placed.
This means that input timestamps will always be binned by "rounding down", rather than binning to whichever bin is closest. E.g. if you do:
SELECT date_bin('1 hour', '2021-10-13 00:59:59', '2021-10-13 00:00:00');
Then the result will be 2021-10-13 00:00:00 (rounded down by 59 minutes and 59 seconds), NOT 2021-10-13 01:00:00 (which is only one second away from the supplied timestamp). So the date_bin function does something slightly different from exactly what you ask for, but I figure this is good to post for anyone coming here in the future.
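Applied to the table from the question, a date_bin version might look like the sketch below. It assumes PostgreSQL 14 or later; the 5-minute stride and the origin timestamp are placeholder values.

```sql
-- Sketch: count observations per 5-minute bin using date_bin (PostgreSQL 14+).
-- The stride and the origin timestamp are placeholders.
SELECT
    date_bin(interval '5 minutes',
             time::timestamp,
             timestamp '2019-06-04 00:00') AS bin_start,  -- start of the bin the row falls into
    count(id)
FROM observations
WHERE owner_id = 'some id'
  AND time::timestamp >= timestamp '2019-06-04 00:00'
GROUP BY bin_start
ORDER BY bin_start;
```

Unlike width_bucket, this returns the bin's start time rather than a bucket number, and bins with no observations simply do not appear in the result.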

A different approach without a series:
Divide the difference of time and start by the width of the bin (5 minutes in the example) and add 1 because the first bucket of width_bucket(...) is 1 not 0.
floor(extract(epoch from (time - '2019-06-04 00:00'::timestamp)) / (5 * 60) ) + 1 as bucket
Getting the start of the bin is also possible
to_timestamp(floor(extract(epoch from time) / (5 * 60)) * (5 * 60)) as bin_start
Putting this all together:
SELECT
count(id),
floor(extract(epoch from (time - '2019-06-04 00:00'::timestamp)) / (5 * 60) ) + 1 as bucket,
to_timestamp(floor(extract(epoch from time) / (5 * 60)) * (5 * 60)) as bin_start
FROM observations
WHERE owner_id = 'some id'
GROUP BY bucket, bin_start
ORDER BY bucket;

Related

Postgres generate date series with exactly 100 steps

Let's say we have the dates
'2017-01-01'
and
'2017-01-04'
and I would like to get a series of exactly N timestamps between these dates, inclusive, in this case 7:
SELECT * FROM
generate_series_n(
'2017-01-01'::timestamp,
'2017-01-04'::timestamp,
7
)
Which I would like to return something like this:
2017-01-01-00:00:00
2017-01-01-12:00:00
2017-01-02-00:00:00
2017-01-02-12:00:00
2017-01-03-00:00:00
2017-01-03-12:00:00
2017-01-04-00:00:00
How can I do this in postgres?

This may be useful: use generate_series and do the arithmetic in the SELECT list.
select '2022-01-01'::date + generate_series *('2022-05-31'::date - '2022-01-01'::date)/15
FROM generate_series(1, 15)
;
output
?column?
------------
2022-01-11
2022-01-21
2022-01-31
2022-02-10
2022-02-20
2022-03-02
2022-03-12
2022-03-22
2022-04-01
2022-04-11
2022-04-21
2022-05-01
2022-05-11
2022-05-21
2022-05-31
(15 rows)

WITH seconds AS
(
SELECT EXTRACT(epoch FROM('2017-01-04'::timestamp - '2017-01-01'::timestamp))::integer AS sec
),
step_seconds AS
(
SELECT sec / (7 - 1) AS step FROM seconds -- 7 values are 6 steps apart
)
SELECT generate_series('2017-01-01'::timestamp, '2017-01-04'::timestamp, (step || 'S')::interval)
FROM step_seconds
Converting this to a function is easy; let me know if you have trouble with it.
One caveat: extracting the epoch from an interval that contains months counts each month as 30 days. If that is a problem for your use case (long intervals), you can tweak the logic for getting the seconds from the interval.

You can divide the difference between the end and the start value by the number of steps, which is one less than the number of values you want:
SELECT *
FROM generate_series('2017-01-01'::timestamp,
'2017-01-04'::timestamp,
('2017-01-04'::timestamp - '2017-01-01'::timestamp) / (7 - 1))
This could be wrapped into a function if you want to avoid repeating the start and end value.
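One possible shape for such a wrapper is sketched below. The name generate_series_n comes from the question; the parameter names are made up for illustration. It multiplies an index rather than passing a step to generate_series, so that rounding in the interval division cannot drop the final value. It assumes n is at least 2, since n values are separated by n - 1 steps.

```sql
-- Hypothetical helper: n evenly spaced timestamps from start_ts to end_ts, inclusive.
-- Assumes n >= 2; n = 1 would divide by zero.
CREATE OR REPLACE FUNCTION generate_series_n(
    start_ts timestamp,
    end_ts   timestamp,
    n        integer
)
RETURNS SETOF timestamp
LANGUAGE sql
IMMUTABLE STRICT
AS $$
    -- i runs from 0 to n - 1, so the first value is start_ts and the last is end_ts
    SELECT start_ts + (end_ts - start_ts) * i / (n - 1)
    FROM generate_series(0, n - 1) AS g(i)
$$;

-- Usage with the values from the question
SELECT * FROM generate_series_n('2017-01-01', '2017-01-04', 7);
```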

Recurring future date at every 3 month after create date in PostgreSQL

I am looking for a function in PostgreSQL that helps me generate recurring dates every 90 days from the created date.
For example, here is a demo table of mine:
id date name
1 "2020-09-08" "abc"
2 "2020-09-08" "xyz"
3 "2020-09-08" "def"
I need future dates like 2020-12-08, 2021-03-08, 2021-06-08, and so on.

First it's important to note that, if you happen to have a date represented as text, then you can convert it to a date via:
SELECT TO_DATE('2017-01-03','YYYY-MM-DD');
So, if you happen to have a text as an input, then you will need to convert it to date. Next, you need to know that if you have a date, you can add days to it, like
SELECT CURRENT_DATE + INTERVAL '90 day';
Now, you need to understand that you can multiply an interval by a number, so the offset can be computed dynamically:
select now() + interval '1 day' * 180;
Finally, you will need a temporary table to generate several values described as above. Read more here: How to return temp table result in postgresql function
Summary:
create a function
that generates a temporary table
where you insert as many records as you like
having the date shifted
and converting text to date if needed

You can create a function that returns a SETOF dates/timestamps. The function below takes 3 parameters: a timestamp, an interval, and the num_of_periods desired. It returns num_of_periods + 1 timestamps: the original timestamp plus num_of_periods further timestamps, each the specified interval after the previous one.
create or replace
function generate_periodic_time_intervals
( start_date timestamp
, period_length interval
, num_of_periods integer
, out gen_timestamp timestamp
)
returns setof timestamp
language sql
immutable strict
as $$
select (start_date + n * period_length)::timestamp
from generate_series(0,num_of_periods) gs(n)
$$;
For your particular case, cast to timestamp/date as necessary. The same function works with the interval specified as '3 months' or as '90 days'. Note that the interval can be any valid INTERVAL value. See here for a demo, which also shows the difference between 3 months and 90 days.
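A usage sketch for the function above follows. The table name demo is a placeholder for the asker's table, and the count of 4 periods is arbitrary; the column alias future_ts is likewise made up for illustration.

```sql
-- The start date plus the next four dates, each 3 calendar months apart
SELECT * FROM generate_periodic_time_intervals(timestamp '2020-09-08', interval '3 months', 4);

-- The same with a fixed 90-day step, which drifts away from the 8th of the month
SELECT * FROM generate_periodic_time_intervals(timestamp '2020-09-08', interval '90 days', 4);

-- Per-row usage; "demo" is a placeholder table with the id, date, and name columns
SELECT d.id, d.name, g.future_ts::date AS future_date
FROM demo AS d
CROSS JOIN LATERAL generate_periodic_time_intervals(
    d.date::timestamp, interval '3 months', 4) AS g(future_ts);
```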

PostgreSQL: Date Difference with fractions

SELECT cu.user_id, cu.last_activity, cu.updated_time,
DATE_PART('day', cu.last_activity - cu.updated_time), to_char(end_date - start_date, 'DD.HH24')
FROM stats.core_users cu
WHERE cu.user_id = '117132014' or cu.user_id = '117132012';
Get the result like:
117132014 2017-12-11 10:34:51.349905 2017-12-09 12:00:38.503518 1 01.22
117132012 2017-12-11 05:18:20.312283 2017-12-08 15:46:51.914085 2 02.13
Is it feasible to get the day difference with fractions, like 1.91 days in the first case instead of 1 day and 22 hours, to be more precise and easier to fit into a machine learning model?

date_part() does what its name says: it returns one part of several elements from a date, interval or timestamp. In your case it's one part of an interval (because timestamp - timestamp returns an interval).
If you want the result as a fraction, you need to extract the total seconds of the interval and then divide that by 86400 (the number of seconds in a day):
extract(epoch from cu.last_activity - cu.updated_time) / 86400
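Placed into the query from the question, that expression might be used as in the sketch below. The rounding to two decimals and the column alias days_diff are additions for illustration.

```sql
-- Sketch: day difference as a fractional number, rounded to two decimals.
-- The cast to numeric lets round(value, 2) work regardless of what type extract() returns.
SELECT cu.user_id,
       cu.last_activity,
       cu.updated_time,
       round((extract(epoch FROM cu.last_activity - cu.updated_time) / 86400)::numeric, 2) AS days_diff
FROM stats.core_users cu
WHERE cu.user_id IN ('117132014', '117132012');
```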

DB2 - get the average time between set of dates

I have a list of events and each one has a startDate and endDate. I need to know the average time taken for each event.
I need something like this:
select sum ( (timestamp(startDate) - timestamp(endDate)) for each event )
/ (count of events)

It only makes mathematical sense to take the AVG() of a numeric value, not datetime values or durations. Since you want your answer to be in minutes precision, you want to get your difference in minutes, then convert back to days, hours, minutes. (There are 24*60=1440 minutes in a standard day.)
with q as
(select avg(
timestampdiff(4, char(endDate - startDate) )
) as avgmns
from yourChosenData
)
select int(avgmns / 1440) as avg_days,
int( mod(avgmns,1440) / 60) as avg_hours,
mod(avgmns, 60) as avg_mins
from q
As mentioned below, timestampdiff() is an estimate. To avoid this issue, one could use a more accurate calculation.
with q as
(select avg(
( days(endDate) - days(startDate) ) * 1440
+ ( midnight_seconds(endDate) - midnight_seconds(startDate) ) / 60
) as avgmns
from yourChosenData
)
select int(avgmns / 1440) as avg_days,
int( mod(avgmns,1440) / 60) as avg_hours,
mod(avgmns, 60) as avg_mins
from q
In order to address the DST issue, if needed, one might choose either of:
include a UTC offset column corresponding to each timestamp field. This would also be useful if timestamps were being recorded in more than one timezone. The difference in offsets could then be fed into the calculation along with the timestamps.
provide a deterministic UDF which could return a UTC or DST adjustment offset for a given timestamp. If multiple timezones are involved, then the zone should also be a parameter to the function. Depending on the geographic areas involved, the logic may also need to consider areas which observe alternative DST rules.

You have to be careful of the denominator to prevent a 0 division: SQL0802 - Data Conversion or Data Mapping Error
Depending on the precision you need, pick the matching timestampdiff interval code. Suppose you need seconds (code 2):
select
sum ( timestampdiff(2, char(endDate - startDate)) )
/
nullif(count(*), 0)
from yourTable
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0000861.html

How can I compare two datetime fields but ignore the year?

I get to dust off my VBScript hat and write some classic ASP to query a SQL Server 2000 database.
Here's the scenario:
I have two datetime fields called fieldA and fieldB.
fieldB will never have a year value that's greater than the year of fieldA
It is possible that the two fields will have the same year.
What I want is all records where fieldA >= fieldB, independent of the year. Just pretend that each field is just a month & day.
How can I get this? My knowledge of T-SQL date/time functions is spotty at best.

You may want to use the built in time functions such as DAY and MONTH. e.g.
SELECT * from table where
MONTH(fieldA) > MONTH(fieldB) OR(
MONTH(fieldA) = MONTH(fieldB) AND DAY(fieldA) >= DAY(fieldB))
This selects all rows where either fieldA's month is greater, or the months are the same and fieldA's day is greater than or equal to fieldB's.

select *
from t
where datepart(month,t.fieldA) > datepart(month,t.fieldB)
or (datepart(month,t.fieldA) = datepart(month,t.fieldB)
and datepart(day,t.fieldA) >= datepart(day,t.fieldB))
If you care about hours, minutes, seconds, you'll need to extend this to cover the cases, although it may be faster to cast to a suitable string, remove the year and compare.
select *
from t
where substring(convert(varchar,t.fieldA,21),5,20)
>= substring(convert(varchar,t.fieldB,21),5,20)

SELECT *
FROM SOME_TABLE
WHERE MONTH(fieldA) > MONTH(fieldB)
OR ( MONTH(fieldA) = MONTH(fieldB) AND DAY(fieldA) >= DAY(fieldB) )

I would approach this from a Julian date perspective, convert each field into the Julian date (number of days after the first of year), then compare those values.
This may or may not produce desired results with respect to leap years.
If you were worried about hours, minutes, seconds, etc., you could adjust the DateDiff functions to calculate the number of hours (or minutes or seconds) since the beginning of the year.
SELECT *
FROM SOME_Table
WHERE DateDiff(d, '1/01/' + Cast(DatePart(yy, fieldA) AS VarChar(5)), fieldA) >=
DateDiff(d, '1/01/' + Cast(DatePart(yy, fieldB) AS VarChar(5)), fieldB)

Temp table for testing
Create table #t (calDate date)
Declare @curDate date = '2010-01-01'
while @curDate < '2021-01-01'
begin
insert into #t values (@curDate)
Set @curDate = dateadd(dd,1,@curDate)
end
Example of any date greater than or equal to today
Declare @testDate date = getdate()
SELECT *
FROM #t
WHERE datediff(dd,dateadd(yy,1900 - year(@testDate),@testDate),dateadd(yy,1900 - year(calDate),calDate)) >= 0
One more example with any day less than today
Declare @testDate date = getdate()
SELECT *
FROM #t
WHERE datediff(dd,dateadd(yy,1900 - year(@testDate),@testDate),dateadd(yy,1900 - year(calDate),calDate)) < 0
