PostgreSQL - get closest datetime row relative to given datetime value

I have a postgres table with a unique datetime field.
I would like to use/create a function that takes as argument a datetime value and returns the row id having the closest datetime relative (but not equal) to the passed datetime value. A second argument could specify before or after the passed value.
Ideally, some combination of native datetime functions could handle this requirement. Otherwise it'll have to be a custom function.
Question: What are methods for querying relative datetime over a collection of rows?

select id, passed_ts - ts_column as difference
from t
where
(passed_ts > ts_column and positive_interval)
or
(passed_ts < ts_column and not positive_interval)
order by abs(extract(epoch from passed_ts - ts_column))
limit 1
passed_ts is the timestamp parameter and positive_interval is a boolean parameter. If it is true, only rows where the timestamp column is earlier than the passed timestamp are considered. If it is false, only rows where it is later are considered.

Simply use the - operator.
Assuming you have a table with attributes Key, Attr and T (timestamp with or without time zone), where Table below is a placeholder for the real table name:
you can search with
select min(T - TimeValue) from Table where (T - TimeValue) > interval '0';
this will give you the minimum difference. You can combine this value with a join to the same table to get the tuple you are interested in:
select * from (select *, T - TimeValue as diff from Table) as T1 NATURAL JOIN
( select min(T - TimeValue) as diff from Table where (T - TimeValue) > interval '0') as T2;
that should do it
--dmg

You want the first row of a select statement producing all the rows below (or above) the given datetime in descending (or ascending) order.
Pseudo code for the function body:
SELECT id
FROM table
WHERE IF(@above, datecol > @param, datecol < @param)
ORDER BY IF(@above, datecol ASC, datecol DESC)
LIMIT 1
However, this does not work: one cannot condition the ordering direction.
The second idea is to do both queries, and select afterwards:
SELECT *
FROM (
(
SELECT 'below' AS dir, id
FROM table
WHERE datecol < @param
ORDER BY datecol DESC
LIMIT 1
) UNION (
SELECT 'above' AS dir, id
FROM table
WHERE datecol > @param
ORDER BY datecol ASC
LIMIT 1)
) AS t
WHERE dir = @dir
That should be pretty fast with an index on the datetime column.
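A minimal sketch of the "pick one of two index-friendly queries" idea, written against Python's built-in sqlite3 module instead of PostgreSQL so that it runs standalone. The table name events, the columns id and ts, and the helper closest_id are all hypothetical, and timestamps are stored as ISO-8601 text so that string order matches time order.

```python
import sqlite3


def closest_id(conn, ts, direction):
    """Return the id of the row nearest to ts, strictly before or after it."""
    if direction == "below":
        # Latest row strictly earlier than ts.
        sql = "SELECT id FROM events WHERE ts < ? ORDER BY ts DESC LIMIT 1"
    elif direction == "above":
        # Earliest row strictly later than ts.
        sql = "SELECT id FROM events WHERE ts > ? ORDER BY ts ASC LIMIT 1"
    else:
        raise ValueError("direction must be 'below' or 'above'")
    row = conn.execute(sql, (ts,)).fetchone()
    return row[0] if row else None  # None when no such neighbour exists


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, ts TEXT NOT NULL UNIQUE)"
)
conn.executemany(
    "INSERT INTO events (id, ts) VALUES (?, ?)",
    [
        (1, "2013-04-30 11:50:00"),
        (2, "2013-04-30 12:00:00"),
        (3, "2013-04-30 12:02:00"),
    ],
)

below = closest_id(conn, "2013-04-30 12:00:00", "below")
above = closest_id(conn, "2013-04-30 12:00:00", "above")
print(below, above)
```

Choosing the statement in the caller sidesteps the conditional ORDER BY problem, and the row equal to the passed value is excluded because both comparisons are strict.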

-- test rig
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE lutser
( dt timestamp NOT NULL PRIMARY KEY
);
-- populate it
INSERT INTO lutser(dt)
SELECT gs
FROM generate_series('2013-04-30', '2013-05-01', '1 min'::interval) gs
;
DELETE FROM lutser WHERE random() < 0.9;
--
-- The query:
WITH xyz AS (
SELECT dt AS hh
, LAG (dt) OVER (ORDER by dt ) AS ll
FROM lutser
)
SELECT *
FROM xyz bb
WHERE '2013-04-30 12:00' BETWEEN bb.ll AND bb.hh
;
Result:
NOTICE: drop cascades to table tmp.lutser
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "lutser_pkey" for table "lutser"
CREATE TABLE
INSERT 0 1441
DELETE 1288
hh | ll
---------------------+---------------------
2013-04-30 12:02:00 | 2013-04-30 11:50:00
(1 row)
Wrapping it into a function is left as an exercise for the reader.
UPDATE: here is a second one with the sandwiched-not-exists-trick (TM):
SELECT lo.dt AS ll
FROM lutser lo
JOIN lutser hi ON hi.dt > lo.dt
AND NOT EXISTS (
SELECT * FROM lutser nx
WHERE nx.dt < hi.dt
AND nx.dt > lo.dt
)
WHERE '2013-04-30 12:00' BETWEEN lo.dt AND hi.dt
;

You have to join the table to itself with the where condition looking for the smallest nonzero (negative or positive) interval between the base table row's datetime and the joined table row's datetime. It would be good to have an index on that datetime column.
P.S. You could also look for the max() of the previous or the min() of the subsequent.
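The max()/min() variant from the P.S. could look roughly like the sketch below. It uses Python's sqlite3 module only so the example is runnable on its own; the table readings, its columns, and the helper neighbours are made-up names, and timestamps are assumed to be ISO-8601 text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, ts TEXT NOT NULL UNIQUE)"
)
conn.executemany(
    "INSERT INTO readings (id, ts) VALUES (?, ?)",
    [
        (10, "2013-04-30 11:50:00"),
        (20, "2013-04-30 12:02:00"),
        (30, "2013-04-30 12:30:00"),
    ],
)


def neighbours(conn, ts):
    """Return (previous_id, next_id) around ts; either may be None."""
    prev_row = conn.execute(
        "SELECT id FROM readings "
        "WHERE ts = (SELECT MAX(ts) FROM readings WHERE ts < ?)",
        (ts,),
    ).fetchone()
    next_row = conn.execute(
        "SELECT id FROM readings "
        "WHERE ts = (SELECT MIN(ts) FROM readings WHERE ts > ?)",
        (ts,),
    ).fetchone()
    return (
        prev_row[0] if prev_row else None,
        next_row[0] if next_row else None,
    )


result = neighbours(conn, "2013-04-30 12:00:00")
print(result)
```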

Try something like:
SELECT *
FROM your_table
WHERE (dt_time > argument_time and search_above = 'true')
OR (dt_time < argument_time and search_above = 'false')
ORDER BY CASE WHEN search_above = 'true'
THEN dt_time - argument_time
ELSE argument_time - dt_time
END
LIMIT 1;

Related

Express Nearest Neighbor Join in Postgresql?

I have two tables Q and T, both containing a column of float numbers.
What I want to do is, for each number in Q, I want to find a number in T that has the smallest distance to it.
For example, for T={1,7,9} and Q={2,6,10}, I want to return Q,T pairs as {(2,1),(6,7),(10,9)}.
How should I express this query with SQL?
In addition, is it possible to accelerate this join with an index, e.g. by adding an operator class which binds "FOR ORDER BY <->" to an fabs calculation?
create table t (val_t integer);
create table q (val_q integer);
insert into t values (1),(7),(9);
insert into q values (2),(6),(10);
Start with a query that cross joins the two tables and adds a rank based on the difference:
SELECT val_q, val_t, rank() OVER (PARTITION BY val_q ORDER BY abs(val_t - val_q))
FROM t
JOIN q ON true ;
Use this query in a cte or subquery and filter by rank:
WITH src AS(
SELECT val_q, val_t, rank() OVER (PARTITION BY val_q ORDER BY abs(val_t - val_q))
FROM t
JOIN q ON true )
SELECT val_q, val_t FROM src
WHERE rank = 1;
val_q | val_t
-------+-------
2 | 1
6 | 7
10 | 9
See https://www.postgresql.org/docs/12/tutorial-window.html
Given this schema:
create table t (tn float);
insert into t values (1), (7), (9);
create table q (qn float);
insert into q values (2), (6), (10);
DISTINCT ON is the most straightforward way:
select distinct on (qn) qn, tn
from q
cross join t
order by qn, abs(qn - tn);
Exploiting a numeric range may perform better depending on your data sizes. If performance is an issue, then you can create an actual temp table for the range_tn CTE and put a gist index on it:
with all_tn as (
select tn
from t
union select null
), range_tn as (
select numrange(tn::numeric, (lead(tn) over w)::numeric, '[]') as tr
from all_tn
window w as (order by tn nulls first)
)
select qn,
case
when lower_inf(tr) then upper(tr)
when upper_inf(tr) then lower(tr)
when 2 * qn - lower(tr) - upper(tr) > 0 then upper(tr)
else lower(tr)
end as tn
from q
join range_tn
on qn::numeric <@ tr;
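The intuition behind the range approach is that each value in Q only needs to be compared with the two values of T that bracket it. Outside the database, the same idea can be sketched in plain Python with a sorted list and binary search. The helper nearest is a hypothetical name, the list is assumed to be non-empty, and ties are resolved toward the smaller value.

```python
from bisect import bisect_left


def nearest(sorted_t, q):
    """Return the element of the non-empty sorted list closest to q."""
    i = bisect_left(sorted_t, q)
    # Only the neighbour just below and the one at/after q can be closest.
    candidates = sorted_t[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: abs(t - q))


t_values = sorted([1.0, 7.0, 9.0])
q_values = [2.0, 6.0, 10.0]

pairs = [(q, nearest(t_values, q)) for q in q_values]
print(pairs)
```

Each lookup costs O(log n) instead of scanning all of T, which is roughly what an ordered index buys for this kind of join.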

Finding the percentage (%) range of average value in SQL

I want to return the values that lie within 20% of the average value of the Duration column in my database.
I want to build on the code below, but instead of returning rows where Duration is less than the average duration, I want to return all values which lie within 20% of the AVG(Duration) value.
Select * From table
Where Duration < (Select AVG(Duration) from table)
Here is one way...
Select * From table
Where Duration between (Select AVG(Duration)*0.8 from table)
and (Select AVG(Duration)*1.2 from table)
perhaps this to avoid repeated scans:
with cte as ( Select AVG(Duration) as AvgDuration from table )
Select * From table
Where Duration between (Select AvgDuration*0.8 from cte)
and (Select AvgDuration*1.2 from cte)
or
Select table.* From table
cross join ( Select AVG(Duration) as AvgDuration from table ) cj
Where Duration between cj.AvgDuration*0.8 and cj.AvgDuration*1.2
or using a window function:
Select d.*
from (
SELECT table.*
, AVG(Duration) OVER() as AvgDuration
From table
) d
Where d.Duration between d.AvgDuration*0.8 and d.AvgDuration*1.2
The last one might be the most efficient method.
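For a quick check of the cross join variant, here is a small sketch run through Python's sqlite3 module. The table calls, its columns, and the sample durations are invented for illustration; the average of the sample is 100, so the accepted band is 80 to 120.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (id INTEGER PRIMARY KEY, duration REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO calls (duration) VALUES (?)",
    [(50,), (90,), (100,), (110,), (150,)],
)

# Compute the average once in a derived table, then filter against it.
rows = conn.execute(
    """
    SELECT c.duration
    FROM calls AS c
    CROSS JOIN (SELECT AVG(duration) AS avg_duration FROM calls) AS a
    WHERE c.duration BETWEEN a.avg_duration * 0.8 AND a.avg_duration * 1.2
    ORDER BY c.duration
    """
).fetchall()

within = [r[0] for r in rows]
print(within)
```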

Postgresql create function index on dynamic values

I have a query like this:
select r.timestamp, r.value
from result_table r
where timestamp > ( NOW() - INTERVAL '120 hour' )
and r.id%10=1
where id is the auto-incrementing primary key.
The values 120 and 10 can be any other numbers (chosen by the user depending on their needs). Basically, the user wants data for some time interval with some decimation.
Obviously, this is too slow on a large amount of data. What index(es) should I create here?
PostgreSQL supports indexes on expressions (function indexes). The condition
where
timestamp > ( NOW() - INTERVAL '120 hour' )
and r.id % 10 = 1
needs an index on (timestamp, (id % 10)) to perform better.
Query
CREATE INDEX
timestamp__idmod10
ON
result_table
(timestamp, (id % 10))
see demos
with index http://sqlfiddle.com/#!17/8e63b/6
without index http://sqlfiddle.com/#!17/9be99/3
Edited because of a comment:
Thanks, Raymond. However, (id % 10) is not that good, since instead of
10 it can be any other number: 9, 11, 100, 1, etc.
Another approach is to use generate_series() and a derived table to generate a list of ids matching id % number = 1,
and use that result set with an IN clause.
P.S. This statement assumes an id column of type SERIAL and a table with 1 million records or fewer. Also keep in mind that the generate_series() function takes some time.
SQL statement
SELECT
numbers.number FROM (
SELECT
generate_series(1, 1000000) as number
) AS numbers
WHERE
numbers.number % 10 = 1 -- 10 stands for the user-chosen modulus
Then you can use the index
CREATE INDEX timestamp_id ON result_table(timestamp, id);
And the query
SELECT
*
FROM
result_table
WHERE
timestamp > ( NOW() - INTERVAL '120 hour' )
AND
id IN (
SELECT
numbers.number FROM (
SELECT
generate_series(1, 1000000) as number
) AS numbers
WHERE
numbers.number % 10 = 1
)
see demo http://sqlfiddle.com/#!17/5013c0/6 with example data.

db2 - How to get the min date and the next from the same table

I have a table with a date attribute and I need to write a query that gets the MIN date and the next date after the MIN date.
I tried this:
select min(SC.TIMESTAMP) as minDate, result.TIMESTAMP
from Event SC
INNER JOIN
(SELECT TIMESTAMP from Event
HAVING TIMESTAMP > min(SC.TIMESTAMP)
) as result on result.BUSINESSID1 = SC.BUSINESSID1
where SC.BUSINESSSTEP = 'CONTAINER_PLACING_EVENT'
and SC.LOCATIONCODE = '1';
Could you please advise how to do that?
Thanks in Advance
Perhaps you can rearrange your query into this form:
select
min(TS), min(TS2)
from
event,
(select TS as TS2 from event where TS > (select min(TS) from event))
Add extra criteria as desired. I would try to rewrite yours, but it isn't entirely clear what the criteria for the count are supposed to be. If you are expecting more than one row (for example, the min and min2 of each LOCATIONCODE) then you will probably want a GROUP BY in there.
Also, I wouldn't call a column TIMESTAMP as it is a reserved word.
You can use the ROW_NUMBER() OLAP Function:
SELECT *
FROM (
SELECT
TIMESTAMP
,ROW_NUMBER() OVER (
PARTITION BY BUSINESSSTEP, LOCATIONCODE
ORDER BY TIMESTAMP ASC
) AS RN
FROM EVENT
WHERE BUSINESSSTEP = 'CONTAINER_PLACING_EVENT'
AND LOCATIONCODE = '1'
) A
WHERE RN < 3
This will return as rows instead of columns, but it should get you what you want. If you think your original query would have returned multiple rows (for multiple entities), you can change the PARTITION BY clause to include the column that makes them distinct.
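If only one group is needed, a simpler alternative to numbering the rows is to sort and keep the first two (ORDER BY with LIMIT 2 here; DB2 spells this FETCH FIRST 2 ROWS ONLY). The sketch below runs on Python's sqlite3 module rather than DB2, with a hypothetical column ts standing in for TIMESTAMP and invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (ts TEXT, businessstep TEXT, locationcode TEXT)"
)
conn.executemany(
    "INSERT INTO event (ts, businessstep, locationcode) VALUES (?, ?, ?)",
    [
        ("2014-01-03 09:00:00", "CONTAINER_PLACING_EVENT", "1"),
        ("2014-01-01 09:00:00", "CONTAINER_PLACING_EVENT", "1"),
        ("2014-01-02 09:00:00", "CONTAINER_PLACING_EVENT", "1"),
        ("2013-12-31 09:00:00", "OTHER_EVENT", "1"),
    ],
)

# Sort ascending and keep the two earliest rows of the filtered set.
first_two = [
    r[0]
    for r in conn.execute(
        "SELECT ts FROM event "
        "WHERE businessstep = ? AND locationcode = ? "
        "ORDER BY ts ASC LIMIT 2",
        ("CONTAINER_PLACING_EVENT", "1"),
    )
]

# Pad with None so unpacking also works when fewer than two rows match.
min_date, next_date = (first_two + [None, None])[:2]
print(min_date, next_date)
```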

query for a range of records in result

I am wondering if there is some easy way, a function, or other method to return data from a query with the following results.
I have a SQL Express DB 2008 R2, a table that contains numerical data in a given column, say col T.
I am given a value X in code and would like to return up to three records. The record where col T equals my value X, and the record before and after, and nothing else. The sort is done on col T. The record before may be beginning of file and therefore not exist, likewise, if X equals the last record then the record after would be non existent, end of file/table.
The value of X may not exist in the table.
This I think is similar to get a range of results in numerical order.
Any help or direction in solving this would be greatly appreciated.
Thanks again,
It might not be the most optimal solution, but:
SELECT T
FROM theTable
WHERE T = X
UNION ALL
SELECT *
FROM
(
SELECT TOP 1 T
FROM theTable
WHERE T > X
ORDER BY T
) blah
UNION ALL
SELECT *
FROM
(
SELECT TOP 1 T
FROM theTable
WHERE T < X
ORDER BY T DESC
) blah2
DECLARE @x int = 100
;WITH t as
(
select ROW_NUMBER() OVER (ORDER BY T ASC) AS row_nm,*
from YourTable
)
, t1 as
(
select *
from t
WHERE T = @x
)
)
select *
from t
CROSS APPLY t1
WHERE t.row_nm BETWEEN t1.row_nm -1 and t1.row_nm + 1
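Note that the ROW_NUMBER() version returns nothing when X is not in the table, because t1 is then empty. The three-part UNION ALL approach does not have that limitation. Here is a rough, runnable sketch of it using Python's sqlite3 module (LIMIT 1 plays the role of TOP 1). The table the_table, the column t, and the helper around are hypothetical names, and the values in t are assumed to be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (t INTEGER NOT NULL UNIQUE)")
conn.executemany(
    "INSERT INTO the_table (t) VALUES (?)", [(10,), (20,), (30,), (40,)]
)


def around(conn, x):
    """Return the value before x, x itself if present, and the value after."""
    sql = """
        SELECT t FROM (
            SELECT t FROM the_table WHERE t < :x ORDER BY t DESC LIMIT 1
        )
        UNION ALL
        SELECT t FROM the_table WHERE t = :x
        UNION ALL
        SELECT t FROM (
            SELECT t FROM the_table WHERE t > :x ORDER BY t ASC LIMIT 1
        )
    """
    # Sort in Python; the order of UNION ALL branches is not relied upon.
    return sorted(r[0] for r in conn.execute(sql, {"x": x}))


print(around(conn, 20))
print(around(conn, 25))
```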