I have a table with a column abc carrying a Unix timestamp in milliseconds (e.g. 13898161481435), and I want to run a between-dates select.
It would not be efficient to do
where TO_CHAR(TO_TIMESTAMP(abc / 1000), 'DD/MM/YYYY') > '14/01/2014 00:00:00' and ..;
which would convert every record.
Rather, I'd like to do something like
where abc > ('14/01/2014 00:00:00' tobigint()) and abc < ...
But I can't find any reference for this, only for the reverse case.
Try this (scaling by 1000, since abc holds milliseconds):
WHERE abc > extract(epoch from timestamp '2014-01-28 00:00:00') * 1000
PostgreSQL Docs
You do not need to convert it to char to compare it.
WHERE to_timestamp(abc/1000) > timestamp '2014-01-28 00:00:00'
I don't think that conversion would be very inefficient, because timestamps are stored internally in a format similar to epoch seconds (admittedly with a different origin and resolution).
If you really want to go the other way (again scaling by 1000, since abc holds milliseconds):
WHERE abc > extract(epoch from timestamp '2014-01-28 00:00:00') * 1000
Interesting observation though: while
select count(*) from cb.logs where to_timestamp(timestmp/1000) > timestamp '2014-01-15 00:00:00' and to_timestamp(timestmp/1000) < timestamp '2014-01-15 23:59:59';
takes almost 10 seconds (my DB has 1.5 million records), the query below takes only 1.5 seconds
select count(*) from cb.logs where (timestmp > (select extract(epoch from timestamp '2014-01-15 00:00:00') * 1000) and timestmp < (select extract(epoch from timestamp '2014-01-15 23:59:59') * 1000));
and the one below about 1 second
select count(*) from cb.logs where (timestmp > extract(epoch from timestamp '2014-01-15 00:00:00') * 1000) and (timestmp < extract(epoch from timestamp '2014-01-15 23:59:59') * 1000);
to count ~40,000 records.
Most likely because of the division and the to_timestamp() call, I would say: in the first query that expression has to be evaluated for every row, while in the other two the right-hand side is computed once as a constant.
Query 1:
select count(*) from cb.logs where to_timestamp(timestmp/1000) > timestamp '2014-01-15 00:00:00' and to_timestamp(timestmp/1000) < timestamp '2014-01-15 23:59:59';
8600 ms
"Aggregate (cost=225390.52..225390.53 rows=1 width=0)"
" -> Seq Scan on logs (cost=0.00..225370.34 rows=8073 width=0)"
" Filter: ((to_timestamp(((timestmp / 1000))::double precision) > '2014-01-15 00:00:00'::timestamp without time zone) AND (to_timestamp(((timestmp / 1000))::double precision) < '2014-01-15 23:59:59'::timestamp without time zone))"
Query 2:
select count(*) from cb.logs where (timestmp > (select extract(epoch from timestamp '2014-01-15 00:00:00') * 1000) and timestmp < (select extract(epoch from timestamp '2014-01-15 23:59:59') * 1000));
1199 ms
"Aggregate (cost=209245.94..209245.95 rows=1 width=0)"
" InitPlan 1 (returns $0)"
" -> Result (cost=0.00..0.01 rows=1 width=0)"
" InitPlan 2 (returns $1)"
" -> Result (cost=0.00..0.01 rows=1 width=0)"
" -> Seq Scan on logs (cost=0.00..209225.74 rows=8073 width=0)"
" Filter: (((timestmp)::double precision > $0) AND ((timestmp)::double precision < $1))"
Related
I have a query below to get the max and min of the day interval in a range of time (current_date - 2 to current_date - 1). Now, I need to query the day shift and the extra shift separately (day shift from 5am to 3pm, extra shift is the rest).
select sum(gap) from (
select to_char(time_stamp, 'yyyy/mm/dd') as day,
EXTRACT(EPOCH FROM (max(time_stamp) - min(time_stamp))) /3600 as gap
from group_table_debarker
where time_stamp >= (current_date - 2)
and time_stamp <= (current_date - 1)
and to_char(time_stamp, 'hh:mi') > '03:00' and to_char(time_stamp, 'hh:mi') < '15:00'
group by to_char(time_stamp, 'yyyy/mm/dd')
) as xxx
I've tried this, but the result wasn't what I expected.
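One likely culprit: in PostgreSQL's to_char(), 'hh' is the 12-hour clock (01-12), so to_char(time_stamp, 'hh:mi') can never exceed '12:59' and the comparison with '15:00' never matches; 'hh24' gives the 24-hour clock. A sketch of the day-shift query with that fixed, using the stated 5am-3pm window (the extra shift would be the negated condition):
select sum(gap) from (
select to_char(time_stamp, 'yyyy/mm/dd') as day,
EXTRACT(EPOCH FROM (max(time_stamp) - min(time_stamp))) /3600 as gap
from group_table_debarker
where time_stamp >= (current_date - 2)
and time_stamp <= (current_date - 1)
-- 'hh24:mi' is the 24-hour clock, so '05:00'..'15:00' is 5am to 3pm
and to_char(time_stamp, 'hh24:mi') >= '05:00' and to_char(time_stamp, 'hh24:mi') < '15:00'
group by to_char(time_stamp, 'yyyy/mm/dd')
) as xxx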
I need to get the difference in minutes excluding weekends (Saturday, Sunday), between 2 timestamps in postgres, but I'm not getting the expected result.
Examples:
Get the diff in minutes; however, weekends are included:
SELECT EXTRACT(EPOCH FROM (NOW() - '2021-08-01 08:00:00') / 60)::BIGINT as diff_in_minutes;
$ diff_in_minutes = 17566
Get the diff in weekdays, excluding Saturday and Sunday:
SELECT COUNT(*) as diff_in_days
FROM generate_series('2021-08-01 08:00:00', NOW(), interval '1d') d
WHERE extract(isodow FROM d) < 6;
$ diff_in_days = 10
Expected:
From '2021-08-12 08:00:00' to '2021-08-13 08:00:00' = 1440
From '2021-08-13 08:00:00' to '2021-08-16 08:00:00' = 1440
From '2021-08-13 08:00:00' to '2021-08-17 08:00:00' = 2880
and so on ...
The solution is:
SELECT GREATEST(COUNT(*) - 1, 0)
FROM generate_series(from_ts, to_ts, interval '1 minute') AS x
WHERE extract(isodow FROM x) <= 5
so
SELECT GREATEST(COUNT(*) - 1, 0)
FROM generate_series('2021-08-13 08:00:00'::timestamp, '2021-08-17 08:00:00', '1 minute') AS x
WHERE extract(isodow FROM x) <= 5
returns 2880
This is not an optimal solution, but I will leave finding the optimal one as homework for you.
First, create an SQL function
CREATE OR REPLACE FUNCTION public.time_overlap (
b_1 timestamptz,
e_1 timestamptz,
b_2 timestamptz,
e_2 timestamptz
)
RETURNS interval AS
$body$
SELECT GREATEST(interval '0 second',e_1 - b_1 - GREATEST(interval '0 second',e_1 - e_2) - GREATEST(interval '0 second',b_2 - b_1));
$body$
LANGUAGE sql
IMMUTABLE
RETURNS NULL ON NULL INPUT
SECURITY INVOKER
PARALLEL SAFE
COST 100;
Then, call it like this:
WITH frame AS (SELECT generate_series('2021-08-13 00:00:00', '2021-08-17 23:59:59', interval '1d') AS d)
SELECT SUM(EXTRACT(epoch FROM time_overlap('2021-08-13 08:00:00', '2021-08-17 08:00:00',d,d + interval '1 day'))/60) AS total
FROM frame
WHERE extract(isodow FROM d) < 6
In the CTE you should round the earlier of the 2 timestamps down and the later of the 2 timestamps up. The idea is to generate the series over whole days, not starting in the middle of a day.
When calling the time_overlap function you should pass the exact values of your 2 timestamps, so that it properly calculates the overlap in minutes between each day of the generated series and the timeframe between your 2 timestamps.
In the end, when you sum over all the overlaps, you get the total number of minutes excluding the weekends.
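That rounding can be done with date_trunc; a sketch with the same endpoints as above (the extra day added by rounding up has zero overlap with the timeframe, so it is harmless):
WITH frame AS (
  SELECT generate_series(
           date_trunc('day', timestamptz '2021-08-13 08:00:00'),                    -- round the start down
           date_trunc('day', timestamptz '2021-08-17 08:00:00') + interval '1 day', -- round the end up
           interval '1 day') AS d
)
SELECT SUM(EXTRACT(epoch FROM time_overlap('2021-08-13 08:00:00', '2021-08-17 08:00:00', d, d + interval '1 day'))/60) AS total
FROM frame
WHERE extract(isodow FROM d) < 6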
I am trying to figure out a way to report how many people are in a location at the same time, down to the second.
I have a table with the id for the person, the date they entered, the time they entered, the date they left and the time they left.
example:
select unique_id, start_date, start_time, end_date, end_time
from My_Table
where start_date between '09/01/2019' and '09/02/2019'
limit 3
"unique_id" "start_date" "start_time" "end_date" "end_time"
989179 "2019-09-01" "06:03:13" "2019-09-01" "06:03:55"
995203 "2019-09-01" "11:29:27" "2019-09-01" "11:30:13"
917637 "2019-09-01" "11:06:46" "2019-09-01" "11:06:59"
I've concatenated start_date & start_time, as well as end_date & end_time, so they become 2 fields:
select unique_id, ((start_date + start_time)::timestamp without time zone) as start_date,
((end_date + end_time)::timestamp without time zone) as end_date
from My_Table
result example:
"start_date"
"2019-09-01 09:28:54"
So I'm making that a CTE, then using a second CTE that generates a series between the dates down to the second.
The goal being: the generate_series will have a row for every second between the two dates. Then when I join my data sets, I can count how many records exist in my_table where the start_date (plus time) is equal or greater than the generate_series date_time field, and the end_date (plus time) is less than or equal to the generate_series date_time field.
I feel that was harder to explain than it needed to be.
In theory, if a person was in the room from 2019-09-01 00:01:01 and left at 2019-09-01 00:01:03, I would count that record in the generate_series rows 2019-09-01 00:01:01, 2019-09-01 00:01:02 & 2019-09-01 00:01:03.
When I look at the data I can see that I should be returning hundreds of people in the room at specific peak periods, but the query returns all 0's.
Is this possibly a field formatting issue I need to adjust?
Here is the query:
with CTE as (
select unique_id, ((start_date+start_time)::timestamp without time zone) as start_date,
((end_date+end_time)::timestamp without time zone) as end_date
from My_table
where start_date between '09/01/2019' and '09/02/2019'
),
time_series as (
select generate_series( (date '2019-09-01')::timestamp, (date '2019-09-02')::timestamp, interval '1 second') as date_time
)
/*FINAL SELECT*/
select date_time, count(B.unique_id) as NumPpl
FROM (
select A.date_time
FROM time_series a
)x
left join CTE b on b.start_date >= x.date_time AND b.end_date <= x.date_time
GROUP BY 1
ORDER BY 1
(partial) result screenshot
Thank you in advance.
I should also add that I have read-only access to this database, so I'm not able to create functions.
Simple version: b.start_date >= x.date_time AND b.end_date <= x.date_time will never be true, assuming end_date is always after start_date. You have the comparisons backwards.
Longer version: You also do not need a CTE for the generate_series(), and there is no reason to select all columns and all rows of that CTE in a subquery. I would also drop the CTE for your original data and just join it to the seconds. (NOTE: this does somewhat change the query, since you might now take into account entries whose start_date is earlier than 2019-09-01. If you do not want this, you can add your condition to the join condition. But I guess this is what you really wanted.) I also removed some casts which were not needed. Try this:
SELECT gs.second, COUNT(my.unique_id)
FROM generate_series('2019-09-01'::timestamp, '2019-09-02'::timestamp, interval '1 second') gs (second)
LEFT JOIN my_table my ON (my.start_date + my.start_time) <= gs.second
AND (my.end_date + my.end_time) >= gs.second
GROUP BY 1
ORDER BY 1
I have a PostgreSQL 9.6 'prices' table with hundreds of millions of records and only four columns: uid, price, unit, dt. dt is a datetime in standard format like '2017-05-01 00:00:00.585', with fractions of a second; there might be anywhere from none to dozens of records each second.
I want to find the MAX and MIN price record in some time period, and I can quite easily select a period using
SELECT date_trunc('second', dt) as time, min(price), max(price)
FROM prices
WHERE dt >= '2017-05-01 00:00:00' AND dt < '2017-05-01 00:00:59'
GROUP BY time
ORDER BY time;
But date_trunc does not have the flexibility to set an arbitrary period, for example 5 seconds or 10 minutes. Is there a way to solve this?
Use generate_series to get the ranges over the interval of time you need to search. Then use dd + '5 seconds'::interval to get the upper bound of each range.
In this example we look at one day of data in 5-second buckets:
WITH ranges as (
SELECT dd as start_range,
dd + '5 seconds'::interval as end_range,
ROW_NUMBER() over () as grp
FROM generate_series
( '2017-05-01 00:00:00'::timestamp
, '2017-05-02 00:00:00'::timestamp
, '5 seconds'::interval) dd
), create_grp as (
SELECT r.grp, r.start_range, r.end_range, p.price
FROM prices p
JOIN ranges r
ON p.dt >= r.start_range
AND p.dt < r.end_range
)
SELECT grp, start_range, end_range, MIN(price), MAX(price)
FROM create_grp
GROUP BY grp, start_range, end_range
ORDER BY grp
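An alternative that avoids the join entirely is to compute each bucket arithmetically from the epoch value; a sketch for 5-second buckets (for 10-minute buckets, replace 5 with 600):
SELECT to_timestamp(floor(extract(epoch FROM dt) / 5) * 5) AS bucket_start,
       MIN(price), MAX(price)
FROM prices
WHERE dt >= '2017-05-01 00:00:00' AND dt < '2017-05-02 00:00:00'
GROUP BY bucket_start
ORDER BY bucket_start;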
I am using a SQL query for a monthly extraction of data from a huge PostgreSQL replica database which stores location data. Currently I have split it into 3 parts (10 days each), and each part takes roughly 21 hours to complete. I was wondering if there is any way to optimize the query and process the data more quickly.
select
asset_dcs.registration_number,
date_trunc('day', transmitter_received_dttm + '08:00:00' + '-04:00:00') AS business_date,
min(seq_num) as min_seq_num,
max(seq_num) as max_seq_num,
count(*) as row_count
from dcs_posn
LEFT OUTER JOIN asset_dcs on (asset_id = asset_dcs.id)
where 1=1
and date_trunc('day', transmitter_received_dttm + '08:00:00' + '-04:00:00') > '2015-12-31'
and date_trunc('day', transmitter_received_dttm + '08:00:00' + '-04:00:00') <= '2016-01-10'
group by asset_id, business_date, asset_dcs.registration_number;
The most obvious improvement is in your filter:
where 1=1
and date_trunc('day', transmitter_received_dttm + '08:00:00' + '-04:00:00') > '2015-12-31'
and date_trunc('day', transmitter_received_dttm + '08:00:00' + '-04:00:00') <= '2016-01-10'
should be rewritten as:
WHERE transmitter_received_dttm >= '2015-12-31 20:00:00'::timestamp
AND transmitter_received_dttm < '2016-01-10 20:00:00'::timestamp
(The '+08:00:00' and '-04:00:00' literals shift each timestamp by a net +4 hours, so a business-day boundary falls at 20:00 of the previous day in transmitter_received_dttm; the >= and < keep the rewritten range exactly equivalent to the original date_trunc() conditions.)
The date_trunc() function is very wasteful the way that you use it: it has to be evaluated for every row, and wrapping the column in an expression prevents any plain index on transmitter_received_dttm from being used.
Otherwise you should add an EXPLAIN ... to your question so that we can see the query plan, as well as other performance-related information such as any indexes.
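With the filter in that sargable form, a plain btree index on the raw column can then serve the range scan; a sketch (index name hypothetical):
CREATE INDEX dcs_posn_received_idx ON dcs_posn (transmitter_received_dttm);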