Closest datetime for PostgreSQL 9.5 - postgresql

I'm using PostgreSQL 9.5 and I have a table like this:
CREATE TABLE tracks (
track bigserial NOT NULL,
time_track timestamp,
CONSTRAINT pk_aircraft_tracks PRIMARY KEY ( track )
);
I want to obtain the track whose time_track is closest to a given datetime, using a SELECT statement.
For example, if I have:
track | time_track
1 | 2016-12-01 21:02:47
2 | 2016-11-01 21:02:47
3 | 2016-12-01 22:02:47
For the input datetime 2016-12-01 21:00, the track is 2.
I found a similar question for integers, "Is there a postgres CLOSEST operator?", but its solution does not work with timestamps or with PostgreSQL 9.5:
SELECT * FROM
(
(SELECT time_track, track FROM tracks WHERE time_track >= now() ORDER BY time_track LIMIT 1) AS above
UNION ALL
(SELECT time_track, track FROM tracks WHERE time_track < now() ORDER BY time_track DESC LIMIT 1) AS below
)
ORDER BY abs(?-time_track) LIMIT 1;
The error:
ERROR: syntax error at or near "UNION"
LINE 4: UNION ALL

Track 1 is the closest to '2016-12-01 21:00':
with tracks(track, datatime) as (
values
(1, '2016-12-01 21:02:47'::timestamp),
(2, '2016-11-01 21:02:47'),
(3, '2016-12-01 22:02:47')
)
select *
from tracks
order by
case when datatime > '2016-12-01 21:00' then datatime - '2016-12-01 21:00'
else '2016-12-01 21:00' - datatime end
limit 1;
track | datatime
-------+---------------------
1 | 2016-12-01 21:02:47
(1 row)
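The question's UNION idea can also be sketched outside the database: take the nearest row at or after the input and the nearest row before it, then keep whichever is closer. The sketch below is a minimal Python illustration, not SQL. `TRACKS` and `closest_track` are hypothetical names, and the rows are the sample data from the question. Sorting followed by `bisect` loosely mirrors what an ordered index on time_track would provide.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical in-memory copy of the sample rows: (track, time_track).
TRACKS = [
    (1, datetime(2016, 12, 1, 21, 2, 47)),
    (2, datetime(2016, 11, 1, 21, 2, 47)),
    (3, datetime(2016, 12, 1, 22, 2, 47)),
]


def closest_track(rows, target):
    """Return the (track, time_track) row nearest to target (rows must be non-empty)."""
    ordered = sorted(rows, key=lambda r: r[1])
    times = [t for _, t in ordered]
    i = bisect_left(times, target)  # index of the first row at or after target
    # Neighbours: the last row before target and the first row at or after it.
    candidates = ordered[max(i - 1, 0):i + 1]
    # abs() on a timedelta gives the distance regardless of direction.
    return min(candidates, key=lambda r: abs(r[1] - target))


if __name__ == "__main__":
    print(closest_track(TRACKS, datetime(2016, 12, 1, 21, 0)))
```

For the input 2016-12-01 21:00 this returns track 1, which matches the result of the SQL answer above.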

Related

Gaps and Islands - get a list of dates unemployed over a date range with PostgreSQL

I have a table called Position. Below is a simplified view of the employment dates it contains. Dates are inclusive and formatted yyyy-mm-dd:
id, person_id, start_date, end_date , title
1 , 1 , 2001-12-01, 2002-01-31, 'admin'
2 , 1 , 2002-02-11, 2002-03-31, 'admin'
3 , 1 , 2002-02-15, 2002-05-31, 'sales'
4 , 1 , 2002-06-15, 2002-12-31, 'ops'
I'd like to calculate the gaps in employment, taking into account that some of the date ranges overlap, to produce the following output for the person with id=1:
person_id, start_date, end_date , last_position_id, gap_in_days
1 , 2002-02-01, 2002-02-10, 1 , 10
1 , 2002-06-01, 2002-06-14, 3 , 14
I have looked at numerous solutions (UNIONs, materialized views, tables with generated calendar date ranges, etc.), but I am really not sure what the best way to do this is. Is there a single query that can get this done?
You just need the lead() window function. With it you can bring a value from the next row (start_date in this case) into the current row.
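SELECT
person_id,
end_date + 1 AS start_date, -- first day without a position
next_start - 1 AS end_date, -- last day before the next position starts
id AS last_position_id,
next_start - (end_date + 1) AS gap_in_days
FROM (
SELECT
*,
-- start_date of the following position of the same person
lead(start_date) OVER (PARTITION BY person_id ORDER BY start_date) AS next_start
FROM
positions
) s
WHERE next_start - (end_date + 1) > 0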
After getting the next start_date, you can compare it with the current end_date + 1. If the next start_date is later, there is a gap. The WHERE clause keeps only these positive differences.
(If two positions overlap, the difference is negative. If they are adjacent, it is zero. Either way, the row is filtered out.)
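The same pairwise comparison can be sketched in Python, as a minimal illustration rather than a replacement for the query. `zip(rows, rows[1:])` pairs each position with its successor, which is what lead() does per person. `POSITIONS` and `gaps_via_lead` are hypothetical names, and the rows are the sample data from the question.

```python
from datetime import date, timedelta

# Hypothetical rows from the question: (id, person_id, start_date, end_date).
POSITIONS = [
    (1, 1, date(2001, 12, 1), date(2002, 1, 31)),
    (2, 1, date(2002, 2, 11), date(2002, 3, 31)),
    (3, 1, date(2002, 2, 15), date(2002, 5, 31)),
    (4, 1, date(2002, 6, 15), date(2002, 12, 31)),
]


def gaps_via_lead(rows):
    """Compare each position with the next one of the same person, like lead()."""
    by_person = {}
    for row in sorted(rows, key=lambda r: (r[1], r[2])):
        by_person.setdefault(row[1], []).append(row)

    gaps = []
    for person_id, person_rows in by_person.items():
        # zip(rows, rows[1:]) pairs each row with its successor (lead).
        for cur, nxt in zip(person_rows, person_rows[1:]):
            gap_start = cur[3] + timedelta(days=1)  # end_date + 1
            gap_days = (nxt[2] - gap_start).days    # lead - (end_date + 1)
            if gap_days > 0:  # overlapping or adjacent positions give <= 0
                gaps.append((person_id, gap_start,
                             nxt[2] - timedelta(days=1), cur[0], gap_days))
    return gaps


if __name__ == "__main__":
    for gap in gaps_via_lead(POSITIONS):
        print(gap)
```

On the sample data this yields the two gaps from the question: 10 days after position 1 and 14 days after position 3.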
First, find which date ranges overlap (see "Determine Whether Two Date Ranges Overlap").
Then merge the overlapping ranges into a single range and keep the last id.
Finally, calculate the number of days between one end_date and the next start_date - 1.
with find_overlap as (
SELECT t1."id" as t1_id, t1."person_id", t1."start_date", t1."end_date",
t2."id" as t2_id, t2."start_date" as t2_start_date, t2."end_date" as t2_end_date
FROM Table1 t1
LEFT JOIN Table1 t2
ON t1."person_id" = t2."person_id"
AND t1."start_date" <= t2."end_date"
AND t1."end_date" >= t2."start_date"
AND t1.id < t2.id
), merge_overlap as (
SELECT
person_id,
start_date,
COALESCE(t2_end_date, end_date) as end_date,
COALESCE(t2_id, t1_id) as last_position_id
FROM find_overlap
WHERE t1_id NOT IN (SELECT t2_id FROM find_overlap WHERE t2_ID IS NOT NULL)
), cte as (
SELECT *,
LEAD(start_date) OVER (partition by person_id order by start_date) next_start
FROM merge_overlap
)
SELECT *,
DATE_PART('day',
(next_start::timestamp - INTERVAL '1 DAY') - end_date::timestamp
) as days
FROM cte
WHERE next_start IS NOT NULL
OUTPUT
| person_id | start_date | end_date | last_position_id | next_start | days |
|-----------|------------|------------|------------------|------------|------|
| 1 | 2001-12-01 | 2002-01-31 | 1 | 2002-02-11 | 10 |
| 1 | 2002-02-11 | 2002-05-31 | 3 | 2002-06-15 | 14 |
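Taken together, the three steps above (find overlaps, merge them, measure the gaps) amount to merging intervals. The following is a minimal Python sketch, assuming each person's rows fit in memory. `POSITIONS` and `gaps_via_merge` are hypothetical names. Instead of a pairwise self-join, it keeps a running maximum end_date. This also handles a long position that spans several later ones.

```python
from datetime import date, timedelta

# Hypothetical rows: (id, person_id, start_date, end_date), dates inclusive.
POSITIONS = [
    (1, 1, date(2001, 12, 1), date(2002, 1, 31)),
    (2, 1, date(2002, 2, 11), date(2002, 3, 31)),
    (3, 1, date(2002, 2, 15), date(2002, 5, 31)),
    (4, 1, date(2002, 6, 15), date(2002, 12, 31)),
]


def gaps_via_merge(rows):
    """Merge overlapping positions per person, then report gaps between islands."""
    by_person = {}
    for row in sorted(rows, key=lambda r: (r[1], r[2])):
        by_person.setdefault(row[1], []).append(row)

    gaps = []
    for person_id, person_rows in by_person.items():
        # The current island ends at the latest end_date seen so far.
        last_id, island_end = person_rows[0][0], person_rows[0][3]
        for pos_id, _, start, end in person_rows[1:]:
            gap_start = island_end + timedelta(days=1)
            if start > gap_start:
                # A real gap: report it and start a new island.
                gaps.append((person_id, gap_start, start - timedelta(days=1),
                             last_id, (start - gap_start).days))
                last_id, island_end = pos_id, end
            elif end >= island_end:
                # An overlapping or adjacent position extends the island.
                last_id, island_end = pos_id, end
            # Otherwise the position lies inside the island and changes nothing.
    return gaps


if __name__ == "__main__":
    for gap in gaps_via_merge(POSITIONS):
        print(gap)
```

On the sample data this produces the same two gaps (10 and 14 days). A position nested inside a longer one does not open a false gap.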

PostgreSQL: how do I group rows by 'nearby' timestamps

Consider the following simplified situation:
create table trans
(
id integer not null
, tm timestamp without time zone not null
, val integer not null
, cus_id integer not null
);
insert into trans
(id, tm, val, cus_id)
values
(1, '2017-12-12 16:42:00', 2, 500) --
,(2, '2017-12-12 16:42:02', 4, 501) -- <--+---------+
,(3, '2017-12-12 16:42:05', 7, 502) -- |dt=54s |
,(4, '2017-12-12 16:42:56', 3, 501) -- <--+ |dt=59s
,(5, '2017-12-12 16:43:00', 2, 503) -- |
,(6, '2017-12-12 16:43:01', 5, 501) -- <------------+
,(7, '2017-12-12 16:43:15', 6, 502) --
,(8, '2017-12-12 16:44:50', 4, 501) --
;
I want to group rows by cus_id, but only where the interval between the timestamps of consecutive rows for the same cus_id is less than 1 minute.
In the example above this applies to the rows with ids 2, 4 and 6. These rows have the same cus_id (501) and intervals below 1 minute. The interval id{2,4} is 54s and the interval id{2,6} is 59s. The interval id{4,6} is also below 1 minute, but it is covered by the larger interval id{2,6}.
I need a query that gives me the output:
cus_id | tm | val
--------+---------------------+-----
501 | 2017-12-12 16:42:02 | 12
(1 row)
The tm value would be the tm of the first row, i.e. with the lowest tm. The val would be the sum(val) of the grouped rows.
In the example 3 rows are grouped, but that could also be 2, 4, 5, ...
For simplicity, I only let the rows for cus_id 501 have nearby time stamps, but in my real table, there would be a lot more of them. It contains 20M+ rows.
Is this possible?
Naive (suboptimal) solution using a CTE
(a faster approach would avoid the CTE, replacing it with a joined subquery, or maybe even use a window function):
-- Step one: find the start of a cluster
-- (the start is everything after a 60 second silence)
WITH starters AS (
SELECT * FROM trans tr
WHERE NOT EXISTS (
SELECT * FROM trans nx
WHERE nx.cus_id = tr.cus_id
AND nx.tm < tr.tm
AND nx.tm >= tr.tm -'60sec'::interval
)
)
-- SELECT * FROM starters ; \q
-- Step two: join everything within 60sec to the starter
-- and aggregate the clusters
SELECT st.cus_id
, st.id AS id
, MAX(tr.id) AS max_id
, MIN(tr.tm) AS first_tm
, MAX(tr.tm) AS last_tm
, SUM(tr.val) AS val
FROM trans tr
JOIN starters st ON st.cus_id = tr.cus_id
AND st.tm <= tr.tm AND st.tm > tr.tm -'60sec'::interval
GROUP BY 1,2
ORDER BY 1,2
;
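The answer suggests a window function could replace the CTE. One common form of that idea is sketched here in Python rather than SQL. A row starts a new cluster when it is 60 s or more after the previous row of the same cus_id (the value lag(tm) would supply), and clusters are numbered by a running count of those starts. `TRANS`, `GAP` and `clusters` are hypothetical names. This version chains consecutive rows, as the question describes. The CTE answer, by contrast, only joins rows within 60 s of each cluster's starter. The two agree on the sample data but can differ on longer chains.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Hypothetical copy of the trans rows: (id, tm, val, cus_id).
TRANS = [
    (1, datetime(2017, 12, 12, 16, 42, 0), 2, 500),
    (2, datetime(2017, 12, 12, 16, 42, 2), 4, 501),
    (3, datetime(2017, 12, 12, 16, 42, 5), 7, 502),
    (4, datetime(2017, 12, 12, 16, 42, 56), 3, 501),
    (5, datetime(2017, 12, 12, 16, 43, 0), 2, 503),
    (6, datetime(2017, 12, 12, 16, 43, 1), 5, 501),
    (7, datetime(2017, 12, 12, 16, 43, 15), 6, 502),
    (8, datetime(2017, 12, 12, 16, 44, 50), 4, 501),
]

GAP = timedelta(seconds=60)


def clusters(rows, gap=GAP, min_rows=2):
    """Return (cus_id, first tm, sum of val) for runs of rows less than gap apart."""
    runs = []
    ordered = sorted(rows, key=lambda r: (r[3], r[1]))
    for _, cus_rows in groupby(ordered, key=lambda r: r[3]):
        current = []
        for row in cus_rows:
            # Same test as "tm - lag(tm) >= gap": close the run, start a new one.
            if current and row[1] - current[-1][1] >= gap:
                runs.append(current)
                current = []
            current.append(row)
        runs.append(current)
    # Keep only real clusters, reporting the first tm and the summed val.
    return [(run[0][3], run[0][1], sum(r[2] for r in run))
            for run in runs if len(run) >= min_rows]


if __name__ == "__main__":
    print(clusters(TRANS))
```

On the sample rows this returns a single cluster for cus_id 501, starting at 16:42:02 with val 12, matching the desired output.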

Difference between the max date and the penultimate max for specific employee - postgresql

Bit stuck on a problem. I'm trying to find the difference between two dates in PostgreSQL.
I have a table emp with many employees in it:
emp_id, date
1, 31-10-2017
1, 08-08-2017
1, 02-06-2017
I want it to look like this:
emp_id, max_date, penultimate_date, difference
1, 31-10-2017, 08-08-2017, 84 days
Obviously you can use max(date) and group by emp_id, but how do you retrieve the penultimate date? I have tried clauses like:
order by date desc limit 1 offset 1
I have also tried putting these in subqueries, but that hasn't worked, as there are many employee numbers and I need one row for each employee.
Can anyone help???
Thanks,
pp84
As @Haleemur Ali kindly pointed out, order by date desc limit 1 offset 1 does not work with several emp_ids. Window functions handle that:
t=# with d(emp_id, date) as (values (1, '2017-10-31'::date), (1, '2017-08-08'), (1, '2017-06-02'), (2, '2016-01-01'), (2, '2016-02-02'), (2, '2016-03-03'))
select distinct emp_id
, max(date) over w max_date
, nth_value(date, 2) over w penultimate_date
, max(date) over w - nth_value(date, 2) over w diff
from d
-- nth_value() needs an ordered window whose frame spans the whole partition
window w as (partition by emp_id order by date desc rows between unbounded preceding and unbounded following)
;
emp_id | max_date | penultimate_date | diff
--------+------------+------------------+------
2 | 2016-03-03 | 2016-02-02 | 30
1 | 2017-10-31 | 2017-08-08 | 84
(2 rows)
Time: 0.756 ms
WITH emps (emp_id, date) AS (
VALUES (1, '2017-10-31'::DATE)
, (1, '2017-08-08'::DATE)
, (1, '2017-06-02'::DATE)
)
SELECT DISTINCT ON (emp_id)
emp_id
, "date" max_date
, LEAD("date") OVER w penultimate_date
, "date" - LEAD("date") OVER w difference
FROM emps
WINDOW w AS (PARTITION BY emp_id ORDER BY "date" DESC) -- LEAD() needs an ordered window
ORDER BY emp_id, date DESC
When rows are ordered by date descending, LEAD("date") OVER w returns the date from the next row, i.e. the next-older date.
The DISTINCT ON limits the resultset to 1 row (the first row encountered) per emp_id.
With our ordering this first row must contain the greatest date, and the LEAD(...) over w therefore returns the penultimate date. This gives us the following result:
emp_id | max_date | penultimate_date | difference
--------+------------+------------------+------------
1 | 2017-10-31 | 2017-08-08 | 84
(1 row)
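Both answers take, per emp_id, the newest date and the one right after it in descending order. The following is a minimal Python sketch of that idea, assuming the rows fit in memory. `EMPS` and `max_and_penultimate` are hypothetical names. The emp 2 rows come from the nth_value example above. As with LEAD(), an employee with a single date gets None (NULL) for the penultimate date and the difference.

```python
from datetime import date
from itertools import groupby

# Hypothetical rows: (emp_id, date).
EMPS = [
    (1, date(2017, 10, 31)),
    (1, date(2017, 8, 8)),
    (1, date(2017, 6, 2)),
    (2, date(2016, 1, 1)),
    (2, date(2016, 2, 2)),
    (2, date(2016, 3, 3)),
]


def max_and_penultimate(rows):
    """Per emp_id: (emp_id, max_date, penultimate_date, difference in days)."""
    result = []
    # emp_id ascending, date descending (like ORDER BY emp_id, date DESC).
    ordered = sorted(rows, key=lambda r: (r[0], -r[1].toordinal()))
    for emp_id, emp_rows in groupby(ordered, key=lambda r: r[0]):
        dates = [d for _, d in emp_rows]
        newest = dates[0]  # first row per emp_id, as DISTINCT ON keeps
        previous = dates[1] if len(dates) > 1 else None  # what LEAD() returns
        diff = (newest - previous).days if previous is not None else None
        result.append((emp_id, newest, previous, diff))
    return result


if __name__ == "__main__":
    for row in max_and_penultimate(EMPS):
        print(row)
```

For emp_id 1 this gives 84 days, and for emp_id 2 it gives 30 days, the same differences as the SQL output.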

Get Data From Postgres Table At every nth interval

Below is my table. My Windows .NET application inserts data into it at 1-second intervals. I want to write a query that fetches data from the table at every nth interval, for example every 5 seconds. Below is the query I am using, but it does not return the result I need. Please help me.
CREATE TABLE table_1
(
timestamp_col timestamp without time zone,
value_1 bigint,
value_2 bigint
)
This is the query I am using:
select timestamp_col,value_1,value_2
from (
select timestamp_col,value_1,value_2,
INTERVAL '5 Seconds' * (row_number() OVER(ORDER BY timestamp_col) - 1 )
+ timestamp_col as r
from table_1
) as dt
Where r = 1
Use the date_part() function with the modulo operator:
select timestamp_col, value_1, value_2
from table_1
-- floor() drops fractional seconds; a plain ::int cast would round them
where floor(date_part('second', timestamp_col))::int % 5 = 0
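The filter keeps rows whose whole-second field is a multiple of n. Below is a minimal Python sketch of the same test on hypothetical sample data: one row per second, with a fractional part, as a 1-second insert loop might produce. `START`, `ROWS` and `every_nth_second` are illustrative names. Python's `datetime.second` ignores microseconds, which matches taking the floor of the seconds. The result is only evenly spaced when n divides 60. With n = 7, for example, the last match in a minute is at second 56 and the next at second 0, only 4 seconds later.

```python
from datetime import datetime, timedelta

# Hypothetical sample: (timestamp_col, value_1, value_2), one row per second.
START = datetime(2018, 1, 1, 12, 0, 0, 300000)
ROWS = [(START + timedelta(seconds=i), i, i * 10) for i in range(12)]


def every_nth_second(rows, n=5):
    """Keep rows whose whole-second value is a multiple of n (n should divide 60)."""
    # .second excludes microseconds, like floor(date_part('second', ...)).
    return [row for row in rows if row[0].second % n == 0]


if __name__ == "__main__":
    for ts, v1, v2 in every_nth_second(ROWS):
        print(ts, v1, v2)
```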

Update Redshift table from query

I'm trying to update a table in Redshift from query:
update mr_usage_au au
inner join(select mr.UserId,
date(mr.ActionDate) as ActionDate,
count(case when mr.EventId in (32) then mr.UserId end) as Moods,
count(case when mr.EventId in (33) then mr.UserId end) as Activities,
sum(case when mr.EventId in (10) then mr.Duration end) as Duration
from mr_session_log mr
where mr.EventTime >= current_date - interval '1 days' and mr.EventTime < current_date
Group By mr.UserId,
date(mr.ActionDate)) slog on slog.UserId=au.UserId
and slog.ActionDate=au.Date
set au.Moods = slog.Moods,
au.Activities=slog.Activities,
au.Durarion=slog.Duration
But I receive the following error:
ERROR: syntax error at or near "au".
This is completely invalid syntax for Redshift (or Postgres). Reminds me of SQL Server ...
Should work like this (at least on current Postgres):
UPDATE mr_usage_au
SET Moods = slog.Moods
, Activities = slog.Activities
, Durarion = slog.Duration
FROM (
select UserId
, ActionDate::date
, count(CASE WHEN EventId = 32 THEN UserId END) AS Moods
, count(CASE WHEN EventId = 33 THEN UserId END) AS Activities
, sum(CASE WHEN EventId = 10 THEN Duration END) AS Duration
FROM mr_session_log
WHERE EventTime >= current_date - 1 -- just subtract integer from a date
AND EventTime < current_date
GROUP BY UserId, ActionDate::date
) slog
WHERE slog.UserId = mr_usage_au.UserId
AND slog.ActionDate = mr_usage_au.Date;
This is generally the case for Postgres and Redshift:
Use a FROM clause to join in additional tables.
You cannot table-qualify target columns in the SET clause.
Also, Redshift was forked from PostgreSQL 8.0.2, a very old release, and only some later Postgres updates were applied to it.
For instance, Postgres 8.0 did not yet allow a table alias in an UPDATE statement, which is the reason behind the error you see.
I simplified some other details.
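The corrected statement combines two ideas: conditional aggregation (count/sum over CASE expressions, where NULLs are ignored) and UPDATE ... FROM, which changes only the target rows that match a row of the subquery. The following is a minimal Python sketch of those semantics on hypothetical in-memory data. `SESSION_LOG`, `USAGE`, `aggregate` and `update_from` are illustrative names, and the EventTime date filter is omitted for brevity. The column name Durarion is kept as it appears in the question.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical mr_session_log rows: (UserId, ActionDate, EventId, Duration).
SESSION_LOG = [
    (7, datetime(2018, 3, 1, 9, 0), 32, None),
    (7, datetime(2018, 3, 1, 9, 5), 33, None),
    (7, datetime(2018, 3, 1, 9, 6), 10, 120),
    (7, datetime(2018, 3, 1, 9, 7), 10, 30),
    (8, datetime(2018, 3, 1, 10, 0), 32, None),
]
# Hypothetical mr_usage_au rows keyed by (UserId, Date).
USAGE = {
    (7, date(2018, 3, 1)): {"Moods": 0, "Activities": 0, "Durarion": None},
    (9, date(2018, 3, 1)): {"Moods": 5, "Activities": 1, "Durarion": 40},
}


def aggregate(log):
    """Mimic count(CASE ...) / sum(CASE ...) grouped by UserId and ActionDate::date."""
    agg = defaultdict(lambda: {"Moods": 0, "Activities": 0, "Duration": None})
    for user_id, action_ts, event_id, duration in log:
        a = agg[(user_id, action_ts.date())]
        if event_id == 32:
            a["Moods"] += 1
        elif event_id == 33:
            a["Activities"] += 1
        elif event_id == 10 and duration is not None:
            # sum() skips NULLs and stays NULL (None) when nothing matched.
            a["Duration"] = (a["Duration"] or 0) + duration
    return agg


def update_from(target, slog):
    """Like UPDATE ... FROM slog WHERE keys match: unmatched target rows stay as-is."""
    for key, row in target.items():
        if key in slog:
            s = slog[key]
            row["Moods"] = s["Moods"]
            row["Activities"] = s["Activities"]
            row["Durarion"] = s["Duration"]


if __name__ == "__main__":
    update_from(USAGE, aggregate(SESSION_LOG))
    print(USAGE)
```

User 8 has no target row, so that aggregate is simply unused. User 9 has no log rows, so that target row keeps its old values.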