Query for distinct set of non-overlapping ranges in PostgreSQL

Given a column of overlapping and/or discontinuous ranges:
WITH tbl (active_dates) AS
(
VALUES
('["2015-05-21","2018-10-01")'::TSRANGE),
('["2016-08-13","2018-09-01")'::TSRANGE),
('["2019-03-01","2019-05-01")'::TSRANGE)
)
SELECT *
FROM tbl;
How can we generate an output that identifies all the discrete time periods like so:
active_dates
------------
["2015-05-21 00:00:00","2016-08-13 00:00:00")
["2016-08-13 00:00:00","2018-09-01 00:00:00")
["2018-09-01 00:00:00","2018-10-01 00:00:00")
["2019-03-01 00:00:00","2019-05-01 00:00:00")

As always, you can do that with window functions:
WITH tbl (active_dates) AS
(
VALUES
('["2015-05-21","2018-10-01")'::TSRANGE),
('["2016-08-13","2018-09-01")'::TSRANGE),
('["2019-03-01","2019-05-01")'::TSRANGE)
),
/* get all time points where something changes */
points AS (
SELECT upper(active_dates) AS p
FROM tbl
UNION SELECT lower(active_dates)
FROM tbl
),
/*
* Get all date ranges between these time points.
* The first time range will start with NULL,
* but that will be excluded in the next CTE anyway.
*/
inter AS (
SELECT tsrange(
lag(p) OVER (ORDER BY p),
p
) i
FROM points
)
/*
* Get all date ranges that are contained
* in at least one of the intervals.
*/
SELECT DISTINCT i
FROM inter
CROSS JOIN tbl
WHERE i <@ active_dates
ORDER BY i;
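To cross-check the query's logic outside the database, here is a minimal Python sketch of the same three steps (collect the boundary points, pair up consecutive points, keep the pairs contained in some input range). The function name split_ranges is made up for illustration, and ranges are modelled as plain (start, end) tuples that are assumed to be half-open like the TSRANGE values above.

```python
from datetime import datetime


def split_ranges(ranges):
    """Split half-open [start, end) ranges into the discrete periods they cover."""
    # Every lower or upper bound is a point where coverage may change.
    points = sorted({p for r in ranges for p in r})
    # Consecutive points form the candidate periods (the analogue of lag()).
    candidates = zip(points, points[1:])
    # Keep a candidate only if some input range contains it (the analogue of <@).
    return [
        (lo, hi)
        for lo, hi in candidates
        if any(start <= lo and hi <= end for start, end in ranges)
    ]


d = datetime.fromisoformat
tbl = [
    (d("2015-05-21"), d("2018-10-01")),
    (d("2016-08-13"), d("2018-09-01")),
    (d("2019-03-01"), d("2019-05-01")),
]
for lo, hi in split_ranges(tbl):
    print(f'["{lo}","{hi}")')
```

The gap between 2018-10-01 and 2019-03-01 is dropped because no input range contains it, which is the job of the final WHERE clause in the SQL.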

Related

Finding the timeslot with the maximum decrease in count of nearby points

For each entry in the loc_of_interest table, I want to find the 15 minute timeslot (from the data in the other cte) with the maximum decrease in count of nearby points. I do not know how to proceed beyond the 'pseudocode' part, and indeed, am uncertain if I am going in the right direction with this existing code as well.
Here is my code:
-- I have two cte's already made
subset_cr -- many rows of data
(device_id, points_geom, time_created)
loc_of_interest -- 2 rows of data
(loc_id, points_geom)
-- here is how I wish to proceed:
with temp as (
SELECT loi.loc_id AS loc_id,
routes.fifteen_min_slot,
routes.count_of_near_points
FROM loc_of_interest as loi
CROSS JOIN LATERAL (
-- bucket each point into its 15 minute slot
SELECT date_trunc('hour', cr.time_created) + date_part('minute', cr.time_created)::int / 15 * interval '15 min' as fifteen_min_slot,
-- count only the points within 100 m; count(expr) alone would count every non-null result
count(*) FILTER (WHERE ST_DWithin(
loi.points_geom::geography,
st_transform(cr.points_geom,4326)::geography,
100)) as count_of_near_points
FROM subset_cr as cr
GROUP BY 1
) routes
)
--pseudocode below
for each loc_id
select fifteen_min_slot
from temp
where difference in count_of_near_points is max
Code update:
I have added the following code for the pseudocode I wrote earlier:
tempy as (
select loc_id, fifteen_min_slot,
count_of_near_points - lag(count_of_near_points) over (partition by loc_id order by fifteen_min_slot) as lagging_diff
from temp
)
select t.loc_id, t.fifteen_min_slot
from tempy t
-- the largest decrease is the most negative difference, so take min() per loc_id
where t.lagging_diff = (select min(m.lagging_diff) from tempy m where m.loc_id = t.loc_id)
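For comparison, a rough Python sketch of the lag-and-compare idea, assuming the per-slot counts have already been computed. The helper names and the sample rows are invented for illustration. Since the difference is current minus previous, the largest decrease is the most negative value, and it has to be searched for per loc_id rather than over the whole set.

```python
from datetime import datetime, timedelta
from itertools import groupby


def fifteen_min_slot(ts):
    """Truncate a timestamp to the start of its 15-minute slot."""
    return ts.replace(minute=ts.minute // 15 * 15, second=0, microsecond=0)


def max_decrease_slots(rows):
    """rows: (loc_id, slot, count) tuples. Returns {loc_id: slot with the largest drop}."""
    best = {}
    for loc_id, group in groupby(sorted(rows), key=lambda r: r[0]):
        group = list(group)
        # Pair each row with its predecessor, like lag() within a partition.
        diffs = [(cur[2] - prev[2], cur[1]) for prev, cur in zip(group, group[1:])]
        if diffs:
            # Smallest (most negative) difference; if counts never drop,
            # this is simply the smallest increase.
            best[loc_id] = min(diffs)[1]
    return best


# Hypothetical sample data: two locations, three slots each.
t0 = fifteen_min_slot(datetime(2020, 1, 1, 10, 7, 30))
step = timedelta(minutes=15)
rows = [
    (1, t0, 10), (1, t0 + step, 4), (1, t0 + 2 * step, 3),
    (2, t0, 5), (2, t0 + step, 7), (2, t0 + 2 * step, 1),
]
result = max_decrease_slots(rows)
for loc_id, slot in result.items():
    print(loc_id, slot)
```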

Finding the percentage (%) range of average value in SQL

I want to return the values that lie within 20% of the average value of the Duration column in my database.
I want to build on the code below, but instead of returning the rows where Duration is less than the average, I want to return all rows whose Duration lies within 20% of AVG(Duration).
Select * From table
Where Duration < (Select AVG(Duration) from table)
Here is one way...
Select * From table
Where Duration between (Select AVG(Duration)*0.8 from table)
and (Select AVG(Duration)*1.2 from table)
perhaps this to avoid repeated scans:
with cte as ( Select AVG(Duration) as AvgDuration from table )
Select * From table
Where Duration between (Select AvgDuration*0.8 from cte)
and (Select AvgDuration*1.2 from cte)
or
Select table.* From table
cross join ( Select AVG(Duration) as AvgDuration from table ) cj
Where Duration between cj.AvgDuration*0.8 and cj.AvgDuration*1.2
or using a window function:
Select d.*
from (
SELECT table.*
, AVG(Duration) OVER() as AvgDuration
From table
) d
Where d.Duration between d.AvgDuration*0.8 and d.AvgDuration*1.2
The last one might be the most efficient method.
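As a quick sanity check of the arithmetic, here is a small Python sketch of the same filter. The function name and the sample durations are made up, and it assumes a positive average (with a negative average the two bounds would swap, just as BETWEEN would return nothing).

```python
from statistics import mean


def within_pct_of_mean(values, pct=0.20):
    """Return the values lying within +/- pct of the average, bounds inclusive."""
    avg = mean(values)
    lo, hi = avg * (1 - pct), avg * (1 + pct)
    # Inclusive on both ends, like SQL BETWEEN.
    return [v for v in values if lo <= v <= hi]


durations = [10, 45, 50, 55, 90]  # hypothetical data, average is 50
print(within_pct_of_mean(durations))
```

The average is computed once and reused for both bounds, which is what the CTE, cross join and window function variants achieve on the SQL side.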

Grouping consecutive dates in PostgreSQL

I have two tables which I need to combine, as some dates are found in table A and not in table B and vice versa. My desired result is that ranges which overlap or fall on consecutive days are combined into one.
I'm using PostgreSQL.
Table A
id startdate enddate
--------------------------
101 12/28/2013 12/31/2013
Table B
id startdate enddate
--------------------------
101 12/15/2013 12/15/2013
101 12/16/2013 12/16/2013
101 12/28/2013 12/28/2013
101 12/29/2013 12/31/2013
Desired Result
id startdate enddate
-------------------------
101 12/15/2013 12/16/2013
101 12/28/2013 12/31/2013
Right. I have a query that I think works. It certainly works on the sample records you provided. It uses a recursive CTE.
First, you need to merge the two tables. Next, use a recursive CTE to get the sequences of overlapping dates. Finally, get the start and end dates, and join back to the "merged" table to get the id.
with recursive allrecords as -- this merges the input tables. Add a unique row identifier
(
select *, row_number() over (ORDER BY startdate) as rowid from
(select * from table1
UNION
select * from table2) a
),
path as ( -- the recursive CTE. This gets the sequences
select rowid as parent,rowid,startdate,enddate from allrecords a
union
select p.parent,b.rowid,b.startdate,b.enddate from allrecords b join path p on (p.enddate + interval '1 day')>=b.startdate and p.startdate <= b.startdate
)
SELECT id,g.startdate,g.enddate FROM -- outer query to get the id
-- inner query to get the start and end of each sequence
(select parent,min(startdate) as startdate, max(enddate) as enddate from
(
select *, row_number() OVER (partition by rowid order by parent,startdate) as row_number from path
) a
where row_number = 1 -- We only want the first occurrence of each record
group by parent)g
INNER JOIN allrecords a on a.rowid = parent
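The merge step can also be pictured without recursion. The following Python sketch is not the recursive CTE above but a plain sort-and-sweep over the combined rows, shown only to illustrate which ranges end up in the same sequence. It assumes a single id and inclusive end dates, and the function name is invented.

```python
from datetime import date, timedelta


def merge_consecutive(ranges):
    """Merge inclusive (start, end) date ranges that overlap or touch on consecutive days."""
    merged = []
    for start, end in sorted(set(ranges)):
        if merged and start <= merged[-1][1] + timedelta(days=1):
            # Overlaps or directly follows the previous sequence: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


table_a = [(date(2013, 12, 28), date(2013, 12, 31))]
table_b = [
    (date(2013, 12, 15), date(2013, 12, 15)),
    (date(2013, 12, 16), date(2013, 12, 16)),
    (date(2013, 12, 28), date(2013, 12, 28)),
    (date(2013, 12, 29), date(2013, 12, 31)),
]
for start, end in merge_consecutive(table_a + table_b):
    print(start, end)
```

The test start <= previous end + 1 day mirrors the join condition (p.enddate + interval '1 day') >= b.startdate in the recursive part.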
The fragment below does what you intend (but it will probably be very slow). The problem is that detecting (non-)overlapping date ranges is impossible with the standard range operators alone, since a range could be split into two parts.
So, my code does the following:
split the date ranges from table_A into atomic records, with one date per record
[the same for table_b]
full outer join these two sets (we are only interested in A_not_in_B and B_not_in_A), remembering which wing of the outer join each record came from
re-aggregate the resulting records into date ranges.
-- EXPLAIN ANALYZE
--
WITH RECURSIVE ranges AS (
-- Chop up the a-table into atomic date units
WITH ar AS (
SELECT generate_series(a.startdate,a.enddate , '1day'::interval)::date AS thedate
, 'A'::text AS which
, a.id
FROM a
)
-- Same for the b-table
, br AS (
SELECT generate_series(b.startdate,b.enddate, '1day'::interval)::date AS thedate
, 'B'::text AS which
, b.id
FROM b
)
-- combine the two sets, retaining a_not_in_b plus b_not_in_a
, moments AS (
SELECT COALESCE(ar.id,br.id) AS id
, COALESCE(ar.which, br.which) AS which
, COALESCE(ar.thedate, br.thedate) AS thedate
FROM ar
FULL JOIN br ON br.id = ar.id AND br.thedate = ar.thedate
WHERE ar.id IS NULL OR br.id IS NULL
)
-- use a recursive CTE to re-aggregate the atomic moments into ranges
SELECT m0.id, m0.which
, m0.thedate AS startdate
, m0.thedate AS enddate
FROM moments m0
WHERE NOT EXISTS ( SELECT * FROM moments nx WHERE nx.id = m0.id AND nx.which = m0.which
AND nx.thedate = m0.thedate -1
)
UNION ALL
SELECT rr.id, rr.which
, rr.startdate AS startdate
, m1.thedate AS enddate
FROM ranges rr
JOIN moments m1 ON m1.id = rr.id AND m1.which = rr.which AND m1.thedate = rr.enddate +1
)
SELECT * FROM ranges ra
WHERE NOT EXISTS (SELECT * FROM ranges nx
-- suppress partial subassemblies
WHERE nx.id = ra.id AND nx.which = ra.which
AND nx.startdate = ra.startdate
AND nx.enddate > ra.enddate
)
;
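The explode, compare and re-aggregate steps can be sketched in a few lines of Python, which may make it easier to see what the query returns. This is only an illustration with set arithmetic standing in for the full join; the helper names are hypothetical, a single id is assumed, and end dates are inclusive.

```python
from datetime import date, timedelta


def explode(ranges):
    """Expand inclusive (start, end) ranges into the set of individual dates."""
    return {
        start + timedelta(days=n)
        for start, end in ranges
        for n in range((end - start).days + 1)
    }


def regroup(days):
    """Collapse a set of dates back into inclusive ranges of consecutive days."""
    ranges = []
    for day in sorted(days):
        if ranges and day == ranges[-1][1] + timedelta(days=1):
            ranges[-1] = (ranges[-1][0], day)  # extend the current run
        else:
            ranges.append((day, day))  # start a new run
    return ranges


table_a = [(date(2013, 12, 28), date(2013, 12, 31))]
table_b = [
    (date(2013, 12, 15), date(2013, 12, 15)),
    (date(2013, 12, 16), date(2013, 12, 16)),
    (date(2013, 12, 28), date(2013, 12, 28)),
    (date(2013, 12, 29), date(2013, 12, 31)),
]
a_days, b_days = explode(table_a), explode(table_b)
only_a = regroup(a_days - b_days)  # dates in A that are missing from B
only_b = regroup(b_days - a_days)  # dates in B that are missing from A
print(only_a, only_b)
```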

Fill table with two datetime columns with random dates

I have table T1 with two datetime columns (StartDate, EndDate) which I must populate with random dates subject to one condition:
EndDate must be at least one day later than StartDate.
Example:
StartDate EndDate
===========================
2001-04-04 2001-04-06 (2 days)
2001-01-05 2001-01-15 (10 days)
.
.
.
Can I do that in one statement?
P.S. My first idea was to make the EndDate column nullable, populate StartDate in a first step leaving EndDate as NULL, and then in a second statement write some mechanism to update EndDate with dates greater than StartDate (by a different number of days for every record).
Here's a solution that populates the table in one step:
insert into T1 (StartDate, EndDate)
select
X.StartDate,
dateadd(day, 1 + abs(checksum(newid())) % 10, X.StartDate) EndDate -- 1 + ... keeps EndDate at least one day after StartDate
from (
select top 20
dateadd(day, -abs(checksum(newid())) % 100, convert(date, getDate())) StartDate
from sys.columns c1, sys.columns c2
) X
The query above uses some tricks that I personally often use in ad-hoc SQL queries:
NEWID() generates a different random value for each row, as opposed to RAND(), which would be evaluated once per query. The expression abs(checksum(newid())) % N generates random integer values in the range 0 to N-1.
the TOP X ... FROM sys.columns c1, sys.columns c2 trick lets you generate X rows whose values can be composed of scalar expressions, like the ones in our example.
Obviously, you can modify the hardcoded values in the above query to:
generate more rows
increase the range of random start dates
increase the maximum duration of each row.
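To make the arithmetic behind those knobs concrete, here is a small Python sketch of the same idea: a random start date some days back plus a random positive duration. The function and parameter names are made up. The 1 + ... on the duration is my addition, so that EndDate is always at least one day after StartDate, as the question requires.

```python
import random
from datetime import date, timedelta


def random_rows(n, today, max_back=100, max_len=10, rng=random):
    """Generate n (start, end) pairs with end at least one day after start."""
    rows = []
    for _ in range(n):
        # Start date: 0 .. max_back-1 days before `today`.
        start = today - timedelta(days=rng.randrange(max_back))
        # Duration: 1 .. max_len days, never zero.
        end = start + timedelta(days=1 + rng.randrange(max_len))
        rows.append((start, end))
    return rows


# Seeded generator so the run is repeatable.
rows = random_rows(20, date(2001, 4, 4), rng=random.Random(0))
for start, end in rows[:3]:
    print(start, end, (end - start).days)
```

Changing n, max_back and max_len corresponds to the three modifications listed above.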
INSERT T1 (StartDate, EndDate)
select T1, T1 + add_days
from
(select DATEADD(day, (ABS(CHECKSUM(NEWID())) % 65530), 0) T1,
ROW_NUMBER() OVER(ORDER BY number) add_days
from master..spt_values) X;
Something simple using rand() function:
declare @records int = 100, --Number of records needed
@count int = 0, @start int, @days int
while (@records > @count)
begin
-- @days is at least 1, so the end date always falls after the start date
select @start = rand() * 10, @days = 1 + rand() * 100, @count += 1
insert into mytable
select dateadd(day, @start, getdate()), dateadd(day, @start + @days, getdate())
end
select * from mytable

Postgresql - get closest datetime row relative to given datetime value

I have a postgres table with a unique datetime field.
I would like to use/create a function that takes as argument a datetime value and returns the row id having the closest datetime relative (but not equal) to the passed datetime value. A second argument could specify before or after the passed value.
Ideally, some combination of native datetime functions could handle this requirement. Otherwise it'll have to be a custom function.
Question: What are methods for querying relative datetime over a collection of rows?
select id, passed_ts - ts_column difference
from t
where
passed_ts > ts_column and positive_interval
or
passed_ts < ts_column and not positive_interval
order by abs(extract(epoch from passed_ts - ts_column))
limit 1
passed_ts is the timestamp parameter and positive_interval is a boolean parameter. If it is true, only rows where the timestamp column is lower than the passed timestamp are considered. If it is false, the inverse applies.
Simply use the - operator.
Assuming you have a table with attributes Key, Attr and T (timestamp with or without time zone):
you can search with
select min(T - TimeValue) from Table where (T - TimeValue) > interval '0';
this will give you the minimum difference. You can combine this value with a join to the same table to get the tuple you are interested in:
select * from (select *, T - TimeValue as diff from Table) as T1 NATURAL JOIN
( select min(T - TimeValue) as diff from Table where (T - TimeValue) > interval '0') as T2;
that should do it
--dmg
You want the first row of a select statement producing all the rows below (or above) the given datetime in descending (or ascending) order.
Pseudo code for the function body:
SELECT id
FROM table
WHERE IF(@above, datecol > @param, datecol < @param)
ORDER BY IF(@above, datecol ASC, datecol DESC)
LIMIT 1
However, this does not work: one cannot condition the ordering direction.
The second idea is to do both queries, and select afterwards:
SELECT *
FROM (
(
SELECT 'below' AS dir, id
FROM table
WHERE datecol < @param
ORDER BY datecol DESC
LIMIT 1
) UNION (
SELECT 'above' AS dir, id
FROM table
WHERE datecol > @param
ORDER BY datecol ASC
LIMIT 1)
) AS t
WHERE dir = @dir
That should be pretty fast with an index on the datetime column.
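The reason an index helps is that each branch becomes a single lookup in sorted data. As a loose analogy (not how PostgreSQL is implemented), here is a Python sketch that does the same two lookups with binary search on a sorted list; the function name and the sample timestamps are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


def closest(sorted_ts, target, above):
    """Return the nearest timestamp strictly above or strictly below target, or None."""
    if above:
        i = bisect_right(sorted_ts, target)  # index of the first element > target
        return sorted_ts[i] if i < len(sorted_ts) else None
    i = bisect_left(sorted_ts, target)  # everything before index i is < target
    return sorted_ts[i - 1] if i > 0 else None


timestamps = [
    datetime(2013, 4, 30, 11, 50),
    datetime(2013, 4, 30, 12, 0),
    datetime(2013, 4, 30, 12, 2),
]
target = datetime(2013, 4, 30, 12, 0)
print(closest(timestamps, target, above=True))
print(closest(timestamps, target, above=False))
```

A row equal to the target is skipped in both directions, matching the "closest but not equal" requirement in the question.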
-- test rig
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE lutser
( dt timestamp NOT NULL PRIMARY KEY
);
-- populate it
INSERT INTO lutser(dt)
SELECT gs
FROM generate_series('2013-04-30', '2013-05-01', '1 min'::interval) gs
;
DELETE FROM lutser WHERE random() < 0.9;
--
-- The query:
WITH xyz AS (
SELECT dt AS hh
, LAG (dt) OVER (ORDER by dt ) AS ll
FROM lutser
)
SELECT *
FROM xyz bb
WHERE '2013-04-30 12:00' BETWEEN bb.ll AND bb.hh
;
Result:
NOTICE: drop cascades to table tmp.lutser
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "lutser_pkey" for table "lutser"
CREATE TABLE
INSERT 0 1441
DELETE 1288
hh | ll
---------------------+---------------------
2013-04-30 12:02:00 | 2013-04-30 11:50:00
(1 row)
Wrapping it into a function is left as an exercise for the reader.
UPDATE: here is a second one with the sandwiched-not-exists-trick (TM):
SELECT lo.dt AS ll
FROM lutser lo
JOIN lutser hi ON hi.dt > lo.dt
AND NOT EXISTS (
SELECT * FROM lutser nx
WHERE nx.dt < hi.dt
AND nx.dt > lo.dt
)
WHERE '2013-04-30 12:00' BETWEEN lo.dt AND hi.dt
;
You have to join the table to itself with the where condition looking for the smallest nonzero (negative or positive) interval between the base table row's datetime and the joined table row's datetime. It would be good to have an index on that datetime column.
P.S. You could also look for the max() of the previous or the min() of the subsequent.
Try something like:
SELECT *
FROM your_table
WHERE (dt_time > argument_time and search_above = 'true')
OR (dt_time < argument_time and search_above = 'false')
ORDER BY CASE WHEN search_above = 'true'
THEN dt_time - argument_time
ELSE argument_time - dt_time
END
LIMIT 1;