Get Data From Postgres Table At every nth interval - postgresql

Below is my table. I am inserting data into it from my Windows .NET application every 1 second. I want to write a query that fetches data from the table at every nth interval, for example every 5 seconds. Below is the query I am using, but it is not returning the required result. Please help me.
CREATE TABLE table_1
(
timestamp_col timestamp without time zone,
value_1 bigint,
value_2 bigint
)
This is the query I am using:
select timestamp_col,value_1,value_2
from (
select timestamp_col,value_1,value_2,
INTERVAL '5 Seconds' * (row_number() OVER(ORDER BY timestamp_col) - 1 )
+ timestamp_col as r
from table_1
) as dt
Where r = 1

Use the date_part() function with the modulo operator:
select timestamp_col, value_1, value_2
from table_1
where date_part('second', timestamp_col)::int % 5 = 0
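If the rows do not fall exactly on whole seconds (e.g. fractional-second timestamps), a variation of the same idea is to take the modulo of the full epoch value instead; this is only a sketch, not part of the original answer:
select timestamp_col, value_1, value_2
from table_1
-- keep rows whose epoch second is divisible by 5, regardless of minute boundaries or fractions
where floor(extract(epoch from timestamp_col))::bigint % 5 = 0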

Related

Extract() from Postgres to calculate minutes between 2 columns

I want to calculate the minutes between two columns, a start time and an end time (both timestamp without time zone), for two customer types, and then average the result for each.
I tried to use extract() with the following statement, but can't get the right result:
select avg(duration_minutes)
from (
select started_at,ended_at, extract('minute' from (started_at - ended_at)) as duration_minutes
from my_data
where customer_type = 'member'
) avg_duration;
Result:
avg
0.000
This runs successfully in BigQuery using the following:
select avg(duration_minutes) from
(
select started_at,ended_at,
datetime_diff(ended_at,started_at, minute) as duration_minutes
from my_table
where customer_type = "member"
) avg_duration
Result:
f0_
21.46
Wondering what might be failing in Postgres?
extract(minute from ...) extracts the minutes field from the interval. So if the interval is "1 hour 26 minutes and 45 seconds", the result would be 26, not 86.
To convert an interval to the equivalent number of minutes, extract the total number of seconds using extract(epoch from ...) and divide that by 60.
select avg(duration_minutes)
from (
select started_at,
ended_at,
extract(epoch from (ended_at - started_at)) / 60 as duration_minutes
from my_data
where customer_type = 'member'
) avg_duration;
Note that you can calculate the average of an interval without the need to convert it to minutes:
select avg(duration)
from (
select started_at,
ended_at,
ended_at - started_at as duration
from my_data
where customer_type = 'member'
) avg_duration;
Depending on how you use the result, returning an interval might be more useful. You can also convert that to minutes using:
extract(epoch from avg(duration)) / 60 as average_minutes
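Putting both pieces together, a minimal end-to-end example could look like this (a sketch against the same my_data table):
select extract(epoch from avg(ended_at - started_at)) / 60 as average_minutes -- interval average converted to minutes
from my_data
where customer_type = 'member';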

Postgresql generate series with interval '15 minutes' longer than 29092 items

Setup:
create table meter.materialized_quarters
(
id int4 not null generated by default as identity,
tm timestamp without time zone
,constraint pk_materialized_quarters primary key (id)
--,constraint uq_materialized_quarters unique (tm)
);
Then setup data:
insert into meter.materialized_quarters (tm)
select GENERATE_SERIES ('1999-01-01', '2030-10-30', interval '15 minute');
And check data:
select count(*),tm
from meter.materialized_quarters
group by tm
having count(*)> 1
Some results:
count|tm |
-----+-----------------------+
2|1999-10-31 02:00:00.000|
2|1999-10-31 02:15:00.000|
2|1999-10-31 02:30:00.000|
2|1999-10-31 02:45:00.000|
2|2000-10-29 02:00:00.000|
2|2000-10-29 02:15:00.000|
2|2000-10-29 02:30:00.000|
2|2000-10-29 02:45:00.000|
2|2001-10-28 02:00:00.000|
2|2001-10-28 02:15:00.000|
2|2001-10-28 02:30:00.000|
....
Details:
select * from meter.materialized_quarters where tm = '1999-10-31 01:45:00';
Result:
id |tm |
-----+-----------------------+
29092|1999-10-31 01:45:00.000|
As I see it, 29092 is the maximum length of a non-duplicated series generated by GENERATE_SERIES with a 15-minute interval.
How can I fill the table (meter.materialized_quarters) from 1999 to 2030?
One solution is:
insert into meter.materialized_quarters (tm)
select GENERATE_SERIES ('1999-01-01', '1999-10-31 01:45:00', interval '15 minute');
then:
insert into meter.materialized_quarters (tm)
select GENERATE_SERIES ('1999-10-31 02:00:00.000', '2000-10-29 00:00:00.000', interval '15 minute');
and again, and again.
Or
with bad as (
select count(*),tm
from meter.materialized_quarters
group by tm
having count(*)> 1
)
, ids as (
select mq1.id, mq2.id as iddel
from meter.materialized_quarters mq1 inner join bad on bad.tm = mq1.tm inner join meter.materialized_quarters mq2 on bad.tm = mq2.tm
where mq1.id<mq2.id
)
delete from meter.materialized_quarters
where id in (select iddel from ids);
Is there a more 'elegant' way?
EDIT.
I see the problem.
xxxx-10-29 02:00:00 - summer time becomes winter time.
select GENERATE_SERIES ('1999-10-31 01:45:00', '1999-10-31 02:00:00', interval '15 minute');
Your problem is the conversion from timestamp WITH time zone, which is returned by generate_series(), to your column, which is defined as timestamp WITHOUT time zone.
1999-10-31 is the day when daylight saving time changes (at least in some countries).
If you change your column to timestamp WITH time zone your code works without any modification.
Example
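A minimal sketch of that variant (the table name materialized_quarters_tz is hypothetical; only the column type changes):
create table meter.materialized_quarters_tz
(
id int4 not null generated by default as identity,
tm timestamp with time zone -- WITH time zone, so both 02:00 occurrences stay distinct instants
,constraint pk_materialized_quarters_tz primary key (id)
);
insert into meter.materialized_quarters_tz (tm)
select GENERATE_SERIES ('1999-01-01', '2030-10-30', interval '15 minute');
-- the duplicate check from the question should now return no rows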
If you want to stick with timestamp WITHOUT time zone, you need to convert the value returned by generate_series():
insert into materialized_quarters (tm)
select g.tm at time zone 'UTC' --<< change to the time zone you need
from GENERATE_SERIES ('1999-01-01', '2030-10-30', interval '15 minute') as g(tm)
Example

SQL partition by query optimization

I have the prices table below and I want to obtain a last_30_days price array and last_year_price from it, as shown below.
CREATE TABLE prices
(
id integer NOT NULL,
"time" timestamp without time zone NOT NULL,
close double precision NOT NULL,
CONSTRAINT prices_pkey PRIMARY KEY (id, "time")
)
select id,time,first_value(close) over (partition by id order by time range between '1 year' preceding and CURRENT ROW) as prev_year_close,
array_agg(p.close) OVER (PARTITION BY id ORDER BY time ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) as prices_30
from prices p
However, I want to place a WHERE clause on the prices table so I get the last_30_days price array and last_year_price only for some specific rows, e.g. where time is within the last 1 week (so I run this query only for the last week's values as opposed to the entire table).
But a WHERE clause pre-filters the rows, and the window partition then runs only on those filtered rows, which gives wrong results. It produces results like:
time, id, last_30_days
-1day, X, [A,B,C,D,E, F,G]
-2day, X, [A,B,C,D,E,F]
-3day, X, [A,B,C,D,E]
-4day, X, [A,B,C,D]
-5day, X, [A,B,C]
-6day, X, [A,B]
-7day, X, [A]
How do I fix this so that the window frame always sees 30 values irrespective of the WHERE condition, without having to run the query on the entire table and then select a subset of rows with a WHERE clause? My prices table is huge and running this over the entire table is very expensive.
EDIT
CREATE TABLE prices
(
id integer NOT NULL,
"time" timestamp without time zone NOT NULL,
close double precision NOT NULL,
prev_30 double precision[],
prev_year double precision,
CONSTRAINT prices_pkey PRIMARY KEY (id, "time")
)
Use a subquery: filter the inner query back one year plus one week, so every row kept by the outer one-week filter still has its full window history available, then apply the final one-week filter outside:
SELECT *
FROM (select id, time,
first_value(close) over (partition by id order by time range between '1 year' preceding and CURRENT ROW) as prev_year_close,
array_agg(p.close) OVER (PARTITION BY id ORDER BY time ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) as prices_30
from prices p
WHERE time >= current_timestamp - INTERVAL '1 year 1 week') AS q
WHERE time >= current_timestamp - INTERVAL '1 week' ;

Postgres: difference between two timestamps (hours:minutes:seconds)

I'm creating a select that calculates the difference between two timestamps.
Here is the code (it isn't necessary to understand the tables below, just follow the thread):
((select value from demo.data where id=q.id and key='timestampend')::timestamp
- (select value from demo.data where id=q.id and key='timestampstart')::timestamp) as durata
Or look at this example, if you want something simpler:
select timestamp_end::timestamp - timestamp_start as duration
Here is the result ("durata" is the duration).
The problem is that the first timestamp is 2017-06-21 and the second is 2017-06-22, so we have 1 day and some hours of difference.
How can I show the result not as "1 day 02:06:41.993657" but as "26:06:41.993657", and without milliseconds (26:06:41)?
Update
I'm testing this query:
select id as ticketid,
(select value from demo.data where id=q.id and key = 'timestampstart')::timestamp as TEnd,
(select value from demo.data where id=q.id and key = 'timestampend')::timestamp as TStart,
(select
make_interval
(
0,0,0,0, -- years, months, weeks, days
extract(days from duration1)::int * 24 + extract(hours from duration1)::int, -- calculated hours (days * 24 + hours)
extract(mins from duration1)::int, -- minutes
floor(extract(secs from duration1))::int -- seconds, without miliseconds, thus FLOOR()
) as duration1
from
(
(select value from demo.data where id=q.id and key='timestampstart')::timestamp - (select value from demo.data where id=q.id and key='timestampend')::timestamp
) t(duration) as dur
from (select distinct id from demo.data) q
The error is the same: [Err] ERROR: syntax error at or near "::"
There is an error on id = q.id.
The data table is like this:
You could use the EXTRACT function and wrap it up with MAKE_INTERVAL and some math. It's pretty straightforward, since you pass each part of the interval to it:
select
make_interval(
0,0,0,0, -- years, months, weeks, days
extract(days from durdata)::int * 24 + extract(hours from durdata)::int, -- calculated hours (days * 24 + hours)
extract(mins from durdata)::int, -- minutes
floor(extract(secs from durdata))::int -- seconds, without miliseconds, thus FLOOR()
) as durdata
from (
select '2017-06-22 02:06:41.993657'::timestamp - '2017-06-21'::timestamp
) t(durdata);
Output:
durdata
----------
26:06:41
You could wrap it up within a function to make it easy to work with.
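For example, such a wrapper could look like this (a sketch only; the function name format_duration is an assumption, not part of the answer):
create or replace function format_duration(durdata interval) returns interval as $$
select make_interval(
0,0,0,0, -- years, months, weeks, days
extract(days from durdata)::int * 24 + extract(hours from durdata)::int, -- hours (days * 24 + hours)
extract(mins from durdata)::int, -- minutes
floor(extract(secs from durdata))::int -- seconds, without milliseconds
);
$$ language sql immutable;
-- usage:
-- select format_duration('2017-06-22 02:06:41.993657'::timestamp - '2017-06-21'::timestamp);
-- returns 26:06:41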
There is no need to worry about timestamp - timestamp returning output with units larger than days (and thereby losing information), because even a calculation spanning different years still returns days plus a time part.
Example:
postgres=# select ('2019-06-22 01:03:05.993657'::timestamp - '2017-06-21'::timestamp) as durdata;
durdata
------------------------
731 days 01:03:05.993657
In Postgres, although the interval data type allows an hours value greater than 23 (see https://www.postgresql.org/docs/9.6/static/functions-formatting.html), the to_char() function will cut out the days and take only the "hours within a day" if you pass it the delta value and ask for 'HH24'.
So I ended up with this trick, combining to_char(...) with extract('epoch' from ...) and then putting the concatenated value into another to_char():
with timestamps(ts1, ts2) as (
select
'2017-06-21'::timestamptz,
'2017-06-22 01:03:05.1212'::timestamptz
), res as (
select
floor(extract('epoch' from ts2 - ts1) / 3600) as hours,
to_char(ts2 - ts1, 'MI:SS') as min_sec
from timestamps
)
select hours, min_sec, to_char(format('%s:%s', hours, min_sec)::interval, 'HH24:MI:SS')
from res;
The result is:
hours | min_sec | to_char
-------+---------+----------
25 | 03:05 | 25:03:05
(1 row)
You can define an SQL function to make using it easier:
create or replace function extract_hhmmss(timestamptz, timestamptz) returns interval as $$
with delta(i) as (
select
case when $2 > $1 then $2 - $1
else $1 - $2
end
), res as (
select
floor(extract('epoch' from i) / 3600) as hours,
to_char(i, 'MI:SS') as min_sec
from delta
)
select
(
case when $2 < $1 then '-' else '' end
|| to_char(format('%s:%s', hours, min_sec)::interval, 'HH24:MI:SS')
)::interval
from res;
$$ language sql stable;
Example of usage:
[local]:5432 nikolay@test=# select extract_hhmmss('2017-06-21'::timestamptz, '2017-06-22 01:03:05.1212'::timestamptz);
extract_hhmmss
----------------
25:03:05
(1 row)
Time: 0.882 ms
[local]:5432 nikolay@test=# select extract_hhmmss('2017-06-22 01:03:05.1212'::timestamptz, '2017-06-21'::timestamptz);
extract_hhmmss
----------------
-25:03:05
(1 row)
Notice that it will give an error if the timestamps are provided in reverse order, but it's not really hard to fix. // Update: already fixed.

generate_series function in Amazon Redshift

I tried the below:
SELECT * FROM generate_series(2,4);
generate_series
-----------------
2
3
4
(3 rows)
SELECT * FROM generate_series(5,1,-2);
generate_series
-----------------
5
3
1
(3 rows)
But when I try,
select * from generate_series('2011-12-31'::timestamp, '2012-12-31'::timestamp, '1 day');
It generated an error:
ERROR: function generate_series(timestamp without time zone, timestamp without time zone, "unknown") does not exist
HINT: No function matches the given name and argument types. You may need to add explicit type casts.
I use PostgreSQL 8.0.2 on Redshift 1.0.757.
Any idea why it happens?
UPDATE:
generate_series is working with Redshift now.
SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') as date_datetime
FROM generate_series(1,31) i
ORDER BY 1
This generates one row per day for the previous 31 days.
The version of generate_series() that supports dates and timestamps was added in Postgres 8.4.
As Redshift is based on Postgres 8.0, you need to use a different way:
select timestamp '2011-12-31 00:00:00' + (i * interval '1 day')
from generate_series(1, (date '2012-12-31' - date '2011-12-31')) i;
If you "only" need dates, this can be abbreviated to:
select date '2011-12-31' + i
from generate_series(1, (date '2012-12-31' - date '2011-12-31')) i;
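If the start date itself should be included in the series, the counter can simply start at 0 instead of 1 (a small variation, assumed rather than taken from the answer above):
select date '2011-12-31' + i
from generate_series(0, (date '2012-12-31' - date '2011-12-31')) i;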
generate_series is working with Redshift now.
SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') as date_datetime
FROM generate_series(1,31) i
ORDER BY 1
This generates one row per day for the previous 31 days.
I found a solution here for my problem of not being able to generate a time dimension table on Redshift using generate_series(). You can generate a temporary sequence by using the following SQL snippet.
with digit as (
select 0 as d union all
select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9
),
seq as (
select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
from digit a
cross join
digit b
cross join
digit c
cross join
digit d
order by 1
)
select (getdate()::date - seq.num)::date as "Date"
from seq;
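As a usage note, the seq CTE above yields 10,000 numbers (0 through 9999); if fewer rows are needed, the final select can simply be capped, for example (a sketch keeping roughly two years of dates):
select (getdate()::date - seq.num)::date as "Date"
from seq
where seq.num < 731;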
The generate_series() function, it seems, is not completely supported on Redshift yet. If I run the SQL mentioned in the answer by DJo, it works, because that SQL runs only on the leader node. If I prepend insert into dim_time to the same SQL, it doesn't work.
There is no generate_series() function in Redshift for date ranges, but you can generate the series with the steps below.
Step 1: Create a table genid and insert the constant value 1 as many times as you need elements in the series. If you need the series to cover 12 months, you can insert 12 rows. Better, insert more rows than you need (e.g. 100) so that you do not run into problems.
create table genid(id int)
------------ for number of months
insert into genid values(1)
Step 2: The table for which you need to generate the series.
create table pat(patid varchar(10),stdt timestamp, enddt timestamp);
insert into pat values('Pat01','2018-03-30 00:00:00.0','2018-04-30 00:00:00.0')
insert into pat values('Pat02','2018-02-28 00:00:00.0','2018-04-30 00:00:00.0')
insert into pat values('Pat03','2017-10-28 00:00:00.0','2018-04-30 00:00:00.0')
Step 3: This query will generate the series for you.
with cte as
(
select max(enddt) as maxdt
from pat
) ,
cte2 as(
select dateadd('month', -1 * row_number() over(order by 1), maxdt::date ) as gendt
from genid , cte
) select *
from pat, cte2
where gendt between stdt and enddt