Compare Consecutive Records and Get Earliest Start Date - tsql
A Customer can be in multiple positions over their lifetime and can only have one active position at a time (marked by start date and end date). A position is part of a cost centre.
If a customer had, say, 18 positions over their lifetime, I have to check whether any of those positions, in consecutive order, were part of the same cost centre. If they were, then I have to use the start date from the earliest position (same cost centre). I have written something like this:
By using two row_number() calculations over slightly different partitions it is possible to get a calculation (rn3) that allows us to group each consecutive run of positions in the same cost centre. You already have one such row_number when you set up the temp table. I included rn1 and rn2 in the output so you can investigate how it works.
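To see why the difference works: within a consecutive run of the same cost centre both row numbers step by 1 per row, so their difference (rn3) stays constant; as soon as the cost centre changes, only rn1 keeps counting, so the difference jumps to a new value. Here is an excerpt of the values the query below produces for cost centre 3 (copied from the detailed result at the end of this answer):
| StartDate            | CostCentreID | rn1 | rn2 | rn3 = rn1 - rn2 |
|----------------------|--------------|-----|-----|-----------------|
| 2007-11-27T00:00:00Z | 3            | 1   | 1   | 0               |
| 2009-07-10T00:00:00Z | 3            | 2   | 2   | 0               |
| 2009-07-29T00:00:00Z | 3            | 3   | 3   | 0               |
| 2010-03-04T00:00:00Z | 3            | 4   | 4   | 0               |
| 2010-08-16T00:00:00Z | 13           | 5   | 1   | 4               |
| 2015-03-06T11:51:09Z | 3            | 16  | 5   | 11              |
Note that when cost centre 3 reappears much later it gets a different rn3 (11), so the two runs are grouped separately.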
SQL Fiddle
MS SQL Server 2014 Schema Setup:
CREATE TABLE TempTbl
([ConsecutivePositions] int, [CustomerID] int, [PositionID] int, [CustomerPositionId] int, [StartDate] datetime, [EndDate] varchar(23), [CostCentreID] int)
;
INSERT INTO TempTbl
([ConsecutivePositions], [CustomerID], [PositionID], [CustomerPositionId], [StartDate], [EndDate], [CostCentreID])
VALUES
(1, 2734, 195, 31860, '2013-10-17 16:08:53', '2015-03-06 11:51:09.440', 5),
(2, 2734, 29, 39405, '2015-03-06 11:51:09', '2016-01-27 13:10:19.720', 3),
(3, 2734, 271, 23599, '2012-04-05 16:21:41', '2012-12-04 11:32:47.433', 13),
(4, 2734, 107, 26479, '2012-12-04 11:32:47', '2013-03-19 09:07:13.633', 14),
(5, 2734, 297, 28497, '2013-03-19 09:07:13', '2013-10-17 16:08:53.120', 14),
(6, 2734, 154, 2723, '2007-11-27 00:00:00', '2009-07-10 15:44:16.640', 3),
(7, 2734, 145, 19436, '2011-03-15 00:00:00', '2011-10-18 15:42:36.877', 906),
(8, 2734, 146, 17453, '2010-09-12 00:00:00', '2010-11-11 15:58:25.043', 13),
(9, 2734, 8, 18180, '2010-11-11 00:00:00', '2011-03-15 17:57:48.027', 13),
(10, 2734, 8, 21606, '2011-10-18 15:42:36', '2011-11-11 16:42:54.787', 13),
(11, 2734, 8, 21982, '2011-11-14 11:18:24', '2012-04-05 16:21:41.230', 13),
(12, 2734, 264, 21958, '2011-11-11 16:42:54', '2011-11-14 11:18:24.057', 906),
(13, 2734, 5, 12785, '2009-07-10 00:00:00', '2009-07-29 09:30:52.430', 3),
(14, 2734, 5, 12999, '2009-07-29 00:00:00', '2010-03-04 13:00:30.223', 3),
(15, 2734, 149, 15165, '2010-03-04 00:00:00', '2010-08-16 12:13:30.703', 3),
(16, 2734, 8, 17044, '2010-08-16 00:00:00', '2010-09-12 16:29:01.203', 13),
(17, 2734, 891, 45453, '2016-01-27 13:10:19', NULL, 906)
;
Query 1:
with cte as (
select
*
, row_number() over(partition by CustomerID order by StartDate) rn1
, row_number() over(partition by CustomerID, CostCentreID order by StartDate) rn2
, row_number() over(partition by CustomerID order by StartDate)
- row_number() over(partition by CustomerID, CostCentreID order by StartDate) rn3
from temptbl
)
select
CustomerID
, CostCentreID
, rn3
, count(*) c
, min(StartDate) StartDate
, max(EndDate) EndDate
from cte
group by
CustomerID, CostCentreID, rn3
order by
CustomerID, StartDate
Results:
| CustomerID | CostCentreID | rn3 | c | StartDate | EndDate |
|------------|--------------|-----|---|----------------------|-------------------------|
| 2734 | 3 | 0 | 4 | 2007-11-27T00:00:00Z | 2010-08-16 12:13:30.703 |
| 2734 | 13 | 4 | 3 | 2010-08-16T00:00:00Z | 2011-03-15 17:57:48.027 |
| 2734 | 906 | 7 | 1 | 2011-03-15T00:00:00Z | 2011-10-18 15:42:36.877 |
| 2734 | 13 | 5 | 1 | 2011-10-18T15:42:36Z | 2011-11-11 16:42:54.787 |
| 2734 | 906 | 8 | 1 | 2011-11-11T16:42:54Z | 2011-11-14 11:18:24.057 |
| 2734 | 13 | 6 | 2 | 2011-11-14T11:18:24Z | 2012-12-04 11:32:47.433 |
| 2734 | 14 | 12 | 2 | 2012-12-04T11:32:47Z | 2013-10-17 16:08:53.120 |
| 2734 | 5 | 14 | 1 | 2013-10-17T16:08:53Z | 2015-03-06 11:51:09.440 |
| 2734 | 3 | 11 | 1 | 2015-03-06T11:51:09Z | 2016-01-27 13:10:19.720 |
| 2734 | 906 | 14 | 1 | 2016-01-27T13:10:19Z | (null) |
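If what you actually need is the full detail row of the earliest position in each run (so you can take its StartDate from there), one option is a second CTE with a row number per run and a filter on it. This is only a minimal sketch building on the same CTE; the runs / PosInRun names are illustrative, not from the original question:
with cte as (
  select
    *
    , row_number() over(partition by CustomerID order by StartDate)
    - row_number() over(partition by CustomerID, CostCentreID order by StartDate) rn3
  from temptbl
)
, runs as (
  select
    *
    -- 1 = the earliest position of each consecutive run in the same cost centre
    , row_number() over(partition by CustomerID, CostCentreID, rn3 order by StartDate) PosInRun
  from cte
)
select *
from runs
where PosInRun = 1
order by
  CustomerID, StartDate;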
----
Instead of a GROUP BY query, use more window functions, and you can keep all the details of the temp table on every row as well as the wanted cost-centre-related dates.
with cte as (
select
*
, row_number() over(partition by CustomerID order by StartDate) rn1
, row_number() over(partition by CustomerID, CostCentreID order by StartDate) rn2
, row_number() over(partition by CustomerID order by StartDate)
- row_number() over(partition by CustomerID, CostCentreID order by StartDate) rn3
from temptbl
)
, cte2 as (
select
*
, min(StartDate) over(partition by CustomerID, CostCentreID, rn3) MinStartDate
, max(EndDate) over(partition by CustomerID, CostCentreID, rn3) MaxEndDate
from cte
)
select
*
from cte2
;
Results:
| ConsecutivePositions | CustomerID | PositionID | CustomerPositionId | StartDate | EndDate | CostCentreID | rn1 | rn2 | rn3 | MinStartDate | MaxEndDate |
|----------------------|------------|------------|--------------------|----------------------|-------------------------|--------------|-----|-----|-----|----------------------|-------------------------|
| 6 | 2734 | 154 | 2723 | 2007-11-27T00:00:00Z | 2009-07-10 15:44:16.640 | 3 | 1 | 1 | 0 | 2007-11-27T00:00:00Z | 2010-08-16 12:13:30.703 |
| 13 | 2734 | 5 | 12785 | 2009-07-10T00:00:00Z | 2009-07-29 09:30:52.430 | 3 | 2 | 2 | 0 | 2007-11-27T00:00:00Z | 2010-08-16 12:13:30.703 |
| 14 | 2734 | 5 | 12999 | 2009-07-29T00:00:00Z | 2010-03-04 13:00:30.223 | 3 | 3 | 3 | 0 | 2007-11-27T00:00:00Z | 2010-08-16 12:13:30.703 |
| 15 | 2734 | 149 | 15165 | 2010-03-04T00:00:00Z | 2010-08-16 12:13:30.703 | 3 | 4 | 4 | 0 | 2007-11-27T00:00:00Z | 2010-08-16 12:13:30.703 |
| 16 | 2734 | 8 | 17044 | 2010-08-16T00:00:00Z | 2010-09-12 16:29:01.203 | 13 | 5 | 1 | 4 | 2010-08-16T00:00:00Z | 2011-03-15 17:57:48.027 |
| 8 | 2734 | 146 | 17453 | 2010-09-12T00:00:00Z | 2010-11-11 15:58:25.043 | 13 | 6 | 2 | 4 | 2010-08-16T00:00:00Z | 2011-03-15 17:57:48.027 |
| 9 | 2734 | 8 | 18180 | 2010-11-11T00:00:00Z | 2011-03-15 17:57:48.027 | 13 | 7 | 3 | 4 | 2010-08-16T00:00:00Z | 2011-03-15 17:57:48.027 |
| 10 | 2734 | 8 | 21606 | 2011-10-18T15:42:36Z | 2011-11-11 16:42:54.787 | 13 | 9 | 4 | 5 | 2011-10-18T15:42:36Z | 2011-11-11 16:42:54.787 |
| 11 | 2734 | 8 | 21982 | 2011-11-14T11:18:24Z | 2012-04-05 16:21:41.230 | 13 | 11 | 5 | 6 | 2011-11-14T11:18:24Z | 2012-12-04 11:32:47.433 |
| 3 | 2734 | 271 | 23599 | 2012-04-05T16:21:41Z | 2012-12-04 11:32:47.433 | 13 | 12 | 6 | 6 | 2011-11-14T11:18:24Z | 2012-12-04 11:32:47.433 |
| 7 | 2734 | 145 | 19436 | 2011-03-15T00:00:00Z | 2011-10-18 15:42:36.877 | 906 | 8 | 1 | 7 | 2011-03-15T00:00:00Z | 2011-10-18 15:42:36.877 |
| 12 | 2734 | 264 | 21958 | 2011-11-11T16:42:54Z | 2011-11-14 11:18:24.057 | 906 | 10 | 2 | 8 | 2011-11-11T16:42:54Z | 2011-11-14 11:18:24.057 |
| 2 | 2734 | 29 | 39405 | 2015-03-06T11:51:09Z | 2016-01-27 13:10:19.720 | 3 | 16 | 5 | 11 | 2015-03-06T11:51:09Z | 2016-01-27 13:10:19.720 |
| 4 | 2734 | 107 | 26479 | 2012-12-04T11:32:47Z | 2013-03-19 09:07:13.633 | 14 | 13 | 1 | 12 | 2012-12-04T11:32:47Z | 2013-10-17 16:08:53.120 |
| 5 | 2734 | 297 | 28497 | 2013-03-19T09:07:13Z | 2013-10-17 16:08:53.120 | 14 | 14 | 2 | 12 | 2012-12-04T11:32:47Z | 2013-10-17 16:08:53.120 |
| 1 | 2734 | 195 | 31860 | 2013-10-17T16:08:53Z | 2015-03-06 11:51:09.440 | 5 | 15 | 1 | 14 | 2013-10-17T16:08:53Z | 2015-03-06 11:51:09.440 |
| 17 | 2734 | 891 | 45453 | 2016-01-27T13:10:19Z | (null) | 906 | 17 | 3 | 14 | 2016-01-27T13:10:19Z | (null) |
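For completeness: on SQL Server 2012 and later (the setup above is 2014) the same consecutive grouping can also be built with lag() and a running sum() instead of the two row_number() calls. This is only a sketch of that alternative; the IsNewRun and RunID aliases are mine:
with marked as (
  select
    *
    -- 1 whenever the cost centre differs from the previous position (or there is no previous position)
    , case when lag(CostCentreID) over(partition by CustomerID order by StartDate) = CostCentreID
           then 0 else 1 end IsNewRun
  from temptbl
)
, grouped as (
  select
    *
    -- running total of the flags gives a stable id per consecutive run
    , sum(IsNewRun) over(partition by CustomerID order by StartDate rows unbounded preceding) RunID
  from marked
)
select
  *
  , min(StartDate) over(partition by CustomerID, RunID) MinStartDate
  , max(EndDate) over(partition by CustomerID, RunID) MaxEndDate
from grouped
order by
  CustomerID, StartDate;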
Related
partition by for interval of 2 seconds
Return unique grouped rows with the latest timestamp [duplicate]
Create trip number on tracking data
Sort using auxiliary fields, start and end
DB2 Query multiple select and sum by date