Postgres: find the minimum difference between timestamp rows - postgresql

I have users; they run laps, and I have a timestamp record of each lap.
I'd like to sort users by fastest lap.
The idea is to use the "lag" function of Postgres with a "GROUP BY user_id".
I tried something, but I don't know "lag" well; moreover, I can't get a window function (lag) to work with an aggregate function (min):
http://sqlfiddle.com/#!15/ec9dc/6
Do you have an idea?
Thanks

If you want to order the users by their lap durations, you have to partition the rows by the column user_id. Otherwise the lap times of different users will be mixed.
I've taken your example code and modified it a bit: I've removed one timestamp, which was inserted twice, and I introduced a view which contains the window functions rank() and lag(). The former is used to calculate the lap number per user and the latter to determine the preceding timestamp for the current one.
BEGIN;

CREATE TABLE lap
(
    user_id text,
    timestamp timestamp without time zone
);

INSERT INTO lap VALUES
    ('a', '2015-08-20 16:14:30.568'),
    ('a', '2015-08-20 16:16:13.06'),
    ('b', '2015-08-20 16:16:18.06'),
    ('b', '2015-08-20 16:16:25.63'),
    ('b', '2015-08-20 16:17:10.568'),
    ('a', '2015-08-20 16:17:25.63'),
    --('a', '2015-08-20 16:17:34.087'), -- Timestamp was inserted twice.
    ('a', '2015-08-20 16:17:34.087');

CREATE OR REPLACE VIEW user_lap_duration AS
SELECT
    user_id,
    -1 + rank() OVER w AS lap_nr,
    lag(timestamp) OVER w AS timestamp_prev,
    timestamp,
    timestamp - lag(timestamp) OVER w AS lap_duration
FROM
    lap
WINDOW w AS (PARTITION BY user_id ORDER BY timestamp);

SELECT
    user_id,
    lap_nr,
    lap_duration
FROM
    user_lap_duration
WHERE
    lap_nr <> 0
ORDER BY
    lap_duration;

ROLLBACK;
Running the above code yields the following output.
 user_id | lap_nr | lap_duration
---------+--------+--------------
 b       |      1 | 00:00:07.57
 a       |      3 | 00:00:08.457
 b       |      2 | 00:00:44.938
 a       |      2 | 00:01:12.57
 a       |      1 | 00:01:42.492
(5 rows)
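Since the original goal was to sort users by their fastest lap, the view can be aggregated one step further. A minimal sketch, assuming the user_lap_duration view defined above:
SELECT
    user_id,
    min(lap_duration) AS fastest_lap
FROM
    user_lap_duration
WHERE
    lap_nr <> 0
GROUP BY
    user_id
ORDER BY
    fastest_lap;
With the sample data this orders b (00:00:07.57) ahead of a (00:00:08.457).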

Related

How can I rank a table in postgresql and then find the rank of a specific row?

I have a PostgreSQL table:
cubing=# SELECT * FROM times;
 count | name   | time
-------+--------+--------
     4 | sean   | 32.97
     5 | Austin | 15.64
     6 | Kirk   | 117.02
I retrieve all rows with SELECT * FROM times ORDER BY time ASC. But now I want to give the user the option to search for a specific value (say, WHERE name = 'Austin') and have it tell them what rank they hold in the table. Right now, I have SELECT name, time, RANK () OVER ( ORDER BY time ASC) rank_number FROM times. From how I understand it, that is ranking the entire table. I would like the rank, name, and time of whom I am searching for. I am afraid that if I added a WHERE clause with the name Austin to my last SELECT statement, it would only find the rows where the name equals Austin and rank those, rather than give Austin's rank within the rest of the table.
Thanks for reading.
I think the behavior you want here is to first rank your current data, then query it with some WHERE filter:
WITH cte AS (
    SELECT *, RANK() OVER (ORDER BY time) rank_number
    FROM times
)
SELECT count, name, time, rank_number
FROM cte
WHERE name = 'Austin';
The point here is that at the time we do a query searching for Austin, the ranks for each row in your original table have already been generated.
Edit:
If you're running this query from an application, it would probably be best to avoid CTE syntax. Instead, just inline the CTE as a subquery:
SELECT count, name, time, rank_number
FROM
(
    SELECT *, RANK() OVER (ORDER BY time) rank_number
    FROM times
) t
WHERE name = 'Austin';
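With the sample data above, both forms should return a single row; Austin has the lowest time, so his rank is 1:
 count | name   | time  | rank_number
-------+--------+-------+-------------
     5 | Austin | 15.64 | 1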

Multiple UPDATE ... FROM same row is not working

I'm trying to do multiple updates, but only the first one is applied.
I have a table "users" with 2 records:
create table users
(
    uid serial not null
        constraint users_pkey
            primary key,
    balance numeric default 0 not null
);
INSERT INTO public.users (uid, balance) VALUES (2, 100);
INSERT INTO public.users (uid, balance) VALUES (1, 100);
I try to UPDATE user "1" twice with the query below, but it updates only once:
the balance for user "1" becomes "105", not "115"
update users as u
set balance = balance + c.bal
from (
    values (1, 5),
           (1, 10)
) as c(uid, bal)
where c.uid = u.uid;
Why is it not updated for all rows from the subquery?
The PostgreSQL documentation gives no reason for this behaviour, but it does specify it.
Relevant quote:
When a FROM clause is present, what essentially happens is that the target table is joined to the tables mentioned in the from_list, and each output row of the join represents an update operation for the target table. When using FROM you should ensure that the join produces at most one output row for each row to be modified. In other words, a target row shouldn't join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.
Use a SELECT with a GROUP BY to combine the rows before performing the update.
You need to aggregate in the inner query before joining:
update users as u
set balance = balance + d.bal
from (
    select uid, sum(bal) as bal
    from ( values (1, 5), (1, 10) ) as c(uid, bal)
    group by uid
) d
where d.uid = u.uid;
Demo on DB Fiddle:
| uid | balance |
| --- | ------- |
| 2   | 100     |
| 1   | 115     |

Postgres 9.3 count rows matching a column relative to row's timestamp

I've used window functions before, but only when working with data that has a fixed cadence/interval. I'm likely missing something simple in aggregation, but I've never had a scenario where I'm not working with fixed intervals.
I have a table that records samples at arbitrary timestamps. A sample is only recorded when it is a delta from the previous sample, and the sample rate is completely irregular due to a large number of conditions. The table is very simple:
id (int)
happened_at (timestamp)
sensor_id (int)
new_value (float)
I'm trying to construct a query that will include a count of all of the samples up to and including the happened_at of a given result row. So given an ultra-simple 2-row sample data set:
id | happened_at      | sensor_id | new_value
---+------------------+-----------+----------
 1 | 2019-06-07:21:41 | 134679    | 123.331
 2 | 2019-06-07:19:00 | 134679    | 100.009
I'd like the result set to look like this:
happened_at      | sensor_id | new_value | sample_count
-----------------+-----------+-----------+--------------
2019-06-07:21:41 | 134679    | 123.331   | 2
2019-06-07:19:00 | 134679    | 100.009   | 1
I've tried:
SELECT *,
    (SELECT count(sensor_history.id) OVER (PARTITION BY sensor_history.sensor_id
        ORDER BY sensor_history.happened_at DESC))
FROM sensor_history
ORDER BY happened_at DESC
and the (duh) not going to work:
(SELECT count(*)
 FROM sensor_history
 WHERE sensor_history.happened_at <= sample_timestamp)
Insights greatly appreciated.
Get rid of the SELECT (sub-query) when using the window function. Note that the window itself should order by happened_at ascending, so that each row counts itself and all earlier samples:
SELECT *,
       count(*) OVER (PARTITION BY sensor_id ORDER BY happened_at) AS sample_count
FROM sensor_history
ORDER BY happened_at DESC
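With the two-row sample above, this should produce the requested running count:
happened_at      | sensor_id | new_value | sample_count
-----------------+-----------+-----------+--------------
2019-06-07:21:41 | 134679    | 123.331   | 2
2019-06-07:19:00 | 134679    | 100.009   | 1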

How to rewrite SQL joins into window functions?

Database is HP Vertica 7 or PostgreSQL 9.
create table test (
    id int,
    card_id int,
    tran_dt date,
    amount int
);
insert into test values (1, 1, '2017-07-06', 10);
insert into test values (2, 1, '2017-06-01', 20);
insert into test values (3, 1, '2017-05-01', 30);
insert into test values (4, 1, '2017-04-01', 40);
insert into test values (5, 2, '2017-07-04', 10);
Of the payment cards used in the last 1 day, what is the maximum amount charged on each card in the last 90 days?
select t.card_id, max(t2.amount) as max
from test t
join test t2 on t2.card_id = t.card_id and t2.tran_dt >= '2017-04-06'
where t.tran_dt >= '2017-07-06'
group by t.card_id
order by t.card_id;
The results are correct:
card_id | max
--------+-----
      1 |  30
I want to rewrite the query using SQL window functions.
select card_id,
       max(amount) over (partition by card_id
                         order by tran_dt
                         range between '90 days' preceding and current row) as max
from test
where card_id in (select card_id from test where tran_dt >= '2017-07-06')
order by card_id;
But the result set does not match; how can this be done?
Test data here:
http://sqlfiddle.com/#!17/db317/1
I can't try PostgreSQL, but in Vertica you can apply the ANSI-standard OLAP window function.
You'll need to nest two queries, though: the window function only returns sensible results if all rows that need to be evaluated are in the result set, while you only want the row from '2017-07-06' to be displayed.
So you'll have to filter for that date in an outer query:
WITH olap_output AS (
    SELECT
        card_id
        , tran_dt
        , MAX(amount) OVER (
              PARTITION BY card_id
              ORDER BY tran_dt
              RANGE BETWEEN '90 DAYS' PRECEDING AND CURRENT ROW
          ) AS the_max
    FROM test
)
SELECT
    card_id
    , the_max
FROM olap_output
WHERE tran_dt = '2017-07-06'
;
card_id | the_max
--------+--------
      1 |      30
As far as I know, PostgreSQL's window functions don't support a bounded RANGE with an interval offset, so range between '90 days' preceding won't work on PostgreSQL 9 (support for RANGE with an offset was only added in PostgreSQL 11). They do support bounded ROWS preceding, such as rows between 90 preceding, but then you would need to assemble a time-series query similar to the following so that the window function operates on one row per day:
SELECT c.card_id, t.amount, g.d AS d_series
FROM generate_series(
         '2017-04-06'::timestamp, '2017-07-06'::timestamp, '1 day'::interval
     ) g(d)
CROSS JOIN (SELECT DISTINCT card_id FROM test) c
LEFT JOIN test t ON t.card_id = c.card_id AND t.tran_dt = g.d
ORDER BY c.card_id, d_series
For what you need (based on your question description), I would stick to using group by.
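That said, if you do want a window-function version on PostgreSQL 9.x, here is a rough, untested sketch that combines the generated series above with a bounded ROWS frame; since the series has exactly one row per card per day, ROWS BETWEEN 90 PRECEDING AND CURRENT ROW approximates the 90-day range:
WITH series AS (
    SELECT c.card_id, t.amount, g.d AS d_series
    FROM generate_series(
             '2017-04-06'::timestamp, '2017-07-06'::timestamp, '1 day'::interval
         ) g(d)
    CROSS JOIN (SELECT DISTINCT card_id FROM test) c
    LEFT JOIN test t ON t.card_id = c.card_id AND t.tran_dt = g.d
)
SELECT card_id, the_max
FROM (
    SELECT card_id, d_series,
           -- max() ignores the NULLs produced by days without transactions
           max(amount) OVER (PARTITION BY card_id
                             ORDER BY d_series
                             ROWS BETWEEN 90 PRECEDING AND CURRENT ROW) AS the_max
    FROM series
) s
WHERE d_series = '2017-07-06'::timestamp
  -- keep only cards actually used in the last day, as in the original query
  AND card_id IN (SELECT card_id FROM test WHERE tran_dt >= '2017-07-06');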

Make postgresql timestamps unique

I have a dataset with 6M+ rows including timestamps from about 2003 to the present. In 2014 the database was migrated to PostgreSQL, and from then on the timestamp column was effectively unique due to the higher precision of the timestamps. The original ID column was not migrated. About 300k of the older timestamps are repeated at least once. I want to modify the timestamp column so that the values are unique, by adding precision (all non-unique timestamps only go to the second).
I have this:
ts                  | message
--------------------+---------------
2014-02-01 07:40:37 | message1
2014-02-01 07:40:37 | message2
I want this:
ts                       | message
-------------------------+---------------
2014-02-01 07:40:37.0000 | message1
2014-02-01 07:40:37.0001 | message2
This should work, but it will be horribly slow I guess:
update the_table
set ts = ts + '1 millisecond'::interval * x.rn
from (
    select ctid, row_number() over (order by ts) as rn
    from the_table
) x
where the_table.ctid = x.ctid;
The column ctid is an internal unique identifier (actually the physical address of the row) maintained by Postgres.
You might want to add another where condition to pick out only those rows that need to be modified.
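For example, a hedged variant of the above that touches only the duplicates (assuming no timestamp is repeated more than a thousand times, so the added offsets stay below one second):
update the_table
set ts = ts + '1 millisecond'::interval * (x.rn - 1)
from (
    -- number the rows within each group of identical timestamps
    select ctid, row_number() over (partition by ts order by ctid) as rn
    from the_table
) x
where the_table.ctid = x.ctid
  and x.rn > 1;  -- leave the first row of each group untouched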
One simple solution is to try adding a random interval to the timestamp:
update the_table
set ts = ts + random() * interval '1000000 microsecond'
where ts = date_trunc('second', ts);
The chance of a collision is very low. If one occurs, use @a_horse's answer above.
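Either way, a quick sanity check afterwards (a sketch, assuming the table and column names used above):
select ts, count(*)
from the_table
group by ts
having count(*) > 1;  -- should return no rows once the timestamps are unique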