SQL query help: Calculate max of previous rows in the same query - postgresql

For each row where B = C = D = 1, I want the maximum of A over its previous rows (in chronological order) that also have B = C = D = 1, excluding the row itself.
Data in table looks like this:
+-------+-----+-----+-----+------+------+
|Grp id | B | C | D | A | time |
+-------+-----+-----+-----+------+------+
| 111 | 1 | 0 | 0 | 52 | t |
| 111 | 1 | 1 | 1 | 33 | t+1 |
| 111 | 0 | 1 | 0 | 34 | t+2 |
| 111 | 1 | 1 | 1 | 22 | t+3 |
| 111 | 0 | 0 | 0 | 12 | t+4 |
| 222 | 1 | 1 | 1 | 16 | t |
| 222 | 1 | 0 | 0 | 18 | t2+1 |
| 222 | 1 | 1 | 0 | 13 | t2+2 |
| 222 | 1 | 1 | 1 | 12 | t2+3 |
| 222 | 1 | 1 | 1 | 09 | t2+4 |
| 222 | 1 | 1 | 1 | 22 | t2+5 |
| 222 | 1 | 1 | 1 | 19 | t2+6 |
+-------+-----+-----+-----+------+------+
The table above is the result of the query below, which uses left joins. The joins are required by my project.
SELECT "Grp id", B, C, D, A, time, xxx
FROM "DCR" dcr
LEFT JOIN "DCM" dcm ON dcr."Id" = dcm."DCRID"
LEFT JOIN "DC" dc ON dc."Id" = dcm."DCID"
ORDER BY dcr."time"
The Result column must be computed with the rule described above. It has to be calculated in the same pass, because only the previous rows are considered. The xxx above needs to be replaced by a subquery or expression that produces the result.
And the result table should look like this:
+-------+-----+-----+-----+------+------+------+
|Grp id | B | C | D | A | time |Result|
+-------+-----+-----+-----+------+------+------+
| 111 | 1 | 0 | 0 | 52 | t | - |
| 111 | 1 | 1 | 1 | 33 | t+1 | - |
| 111 | 1 | 1 | 1 | 34 | t+2 | 33 |
| 111 | 1 | 1 | 1 | 22 | t+3 | 34 |
| 111 | 0 | 0 | 0 | 12 | t+4 | - |
| 222 | 1 | 1 | 1 | 16 | t | - |
| 222 | 1 | 0 | 0 | 18 | t2+1 | - |
| 222 | 1 | 1 | 0 | 13 | t2+2 | - |
| 222 | 1 | 1 | 1 | 12 | t2+3 | 16 |
| 222 | 1 | 1 | 1 | 09 | t2+4 | 16 |
| 222 | 1 | 1 | 1 | 22 | t2+5 | 16 |
| 222 | 1 | 1 | 1 | 19 | t2+6 | 22 |
+-------+-----+-----+-----+------+------+------+

The column could be computed with a window function:
CASE WHEN b = 1 AND c = 1 AND d = 1
     THEN max(a) FILTER (WHERE b = 1 AND c = 1 AND d = 1)
          OVER (PARTITION BY "grp id"
                ORDER BY time
                -- frame ends one row before the current row, so the row itself is excluded
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
     ELSE NULL
END
I didn't test it.
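The following is an untested-in-PostgreSQL illustration only: it tries the same idea in an in-memory database. The sketch assumes Python's standard sqlite3 module backed by SQLite 3.25 or later, which is needed for window functions. It loads the group 222 rows from the question into a hypothetical table joined_rows. The column names grp_id and ts are illustrative, and the time labels are replaced by 0 to 6. The filter is written as max(CASE ... END) instead of FILTER, which gives the same result because max() ignores NULLs.

```python
import sqlite3

# Group 222 rows from the question: (grp_id, b, c, d, a, ts).
# ts replaces the labels t, t2+1 ... t2+6 with 0..6 (an assumption).
ROWS = [
    (222, 1, 1, 1, 16, 0),
    (222, 1, 0, 0, 18, 1),
    (222, 1, 1, 0, 13, 2),
    (222, 1, 1, 1, 12, 3),
    (222, 1, 1, 1, 9, 4),
    (222, 1, 1, 1, 22, 5),
    (222, 1, 1, 1, 19, 6),
]

# Non-qualifying rows feed NULL into max(), so they are ignored; the frame
# stops at 1 PRECEDING, so the current row never counts toward its own result.
QUERY = """
SELECT a,
       CASE WHEN b = 1 AND c = 1 AND d = 1
            THEN max(CASE WHEN b = 1 AND c = 1 AND d = 1 THEN a END)
                 OVER (PARTITION BY grp_id
                       ORDER BY ts
                       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
       END AS result
FROM joined_rows
ORDER BY grp_id, ts
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE joined_rows (grp_id INTEGER, b INTEGER, c INTEGER, "
    "d INTEGER, a INTEGER, ts INTEGER)"
)
conn.executemany("INSERT INTO joined_rows VALUES (?, ?, ?, ?, ?, ?)", ROWS)
results = [row[1] for row in conn.execute(QUERY)]
print(results)  # [None, None, None, 16, 16, 16, 22]
```

The printed list lines up with the Result column expected for group 222, with None standing for "-".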

Related

Return unique grouped rows with the latest timestamp [duplicate]

This question already has answers here:
Select first row in each GROUP BY group?
At the moment I'm struggling with a problem that looks very easy.
Table content:
Primary key: (Timestamp, COL_A, COL_B, COL_C, COL_D)
+------------------+-------+-------+-------+-------+--------+--------+
| Timestamp | COL_A | COL_B | COL_C | COL_D | Data_A | Data_B |
+------------------+-------+-------+-------+-------+--------+--------+
| 31.07.2019 15:12 | - | - | - | - | 1 | 2 |
| 31.07.2019 15:32 | 1 | 1 | 100 | 1 | 5000 | 20 |
| 10.08.2019 09:33 | - | - | - | - | 1000 | 7 |
| 31.07.2019 15:38 | 1 | 1 | 100 | 1 | 33 | 5 |
| 06.08.2019 08:53 | - | - | - | - | 0 | 7 |
| 06.08.2019 09:08 | - | - | - | - | 0 | 7 |
| 06.08.2019 16:06 | 3 | 3 | 3 | 3 | 0 | 23 |
| 07.08.2019 10:43 | - | - | - | - | 0 | 42 |
| 07.08.2019 13:10 | - | - | - | - | 0 | 24 |
| 08.08.2019 07:19 | 11 | 111 | 111 | 12 | 0 | 2 |
| 08.08.2019 10:54 | 2334 | 65464 | 565 | 76 | 1000 | 19 |
| 08.08.2019 11:15 | 232 | 343 | 343 | 43 | 0 | 2 |
| 08.08.2019 11:30 | 2323 | rtttt | 3434 | 34 | 0 | 2 |
| 10.08.2019 14:47 | - | - | - | - | 123 | 23 |
+------------------+-------+-------+-------+-------+--------+--------+
Needed query output:
+------------------+-------+-------+-------+-------+--------+--------+
| Timestamp | COL_A | COL_B | COL_C | COL_D | Data_A | Data_B |
+------------------+-------+-------+-------+-------+--------+--------+
| 31.07.2019 15:38 | 1 | 1 | 100 | 1 | 33 | 5 |
| 06.08.2019 16:06 | 3 | 3 | 3 | 3 | 0 | 23 |
| 08.08.2019 07:19 | 11 | 111 | 111 | 12 | 0 | 2 |
| 08.08.2019 10:54 | 2334 | 65464 | 565 | 76 | 1000 | 19 |
| 08.08.2019 11:15 | 232 | 343 | 343 | 43 | 0 | 2 |
| 08.08.2019 11:30 | 2323 | rtttt | 3434 | 34 | 0 | 2 |
| 10.08.2019 14:47 | - | - | - | - | 123 | 23 |
+------------------+-------+-------+-------+-------+--------+--------+
As you can see, I'm trying to get a single row per (COL_A, COL_B, COL_C, COL_D) combination: the one with the latest timestamp, which is also part of the primary key.
Currently I use a query like this:
SELECT Timestamp, COL_A, COL_B, COL_C, COL_D, Data_A, Data_B
FROM XY op
WHERE op.Timestamp = (
    -- latest timestamp among rows with the same four key columns
    SELECT MAX(tsRow.Timestamp) FROM XY AS tsRow
    WHERE op.COL_A = tsRow.COL_A
      AND op.COL_B = tsRow.COL_B
      AND op.COL_C = tsRow.COL_C
      AND op.COL_D = tsRow.COL_D
);
It gives a result that looks fine at first glance.
Is there a better or safer way to get the result I want?
You can use the DISTINCT ON clause, which returns the first record of each ordered group. Here the group is (COL_A, COL_B, COL_C, COL_D), and it is ordered by the Timestamp column in descending order so that the most recent record comes first.
SELECT DISTINCT ON ("COL_A", "COL_B", "COL_C", "COL_D")
*
FROM
mytable
ORDER BY "COL_A", "COL_B", "COL_C", "COL_D", "Timestamp" DESC
To get the rows in your expected order, wrap this query and apply a second ORDER BY:
SELECT
*
FROM (
SELECT DISTINCT ON ("COL_A", "COL_B", "COL_C", "COL_D")
*
FROM
mytable
ORDER BY "COL_A", "COL_B", "COL_C", "COL_D", "Timestamp" DESC
) s
ORDER BY "Timestamp"
Note: if the Timestamp column is part of the PK, are you sure you really need the other four columns in the PK as well? The Timestamp column already seems to be unique.
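DISTINCT ON is specific to PostgreSQL. The sketch below illustrates a swapped-in, more portable technique: ROW_NUMBER() keeps the newest row of each (COL_A, COL_B, COL_C, COL_D) group. It runs through Python's standard sqlite3 module against the question's sample data. The table name mytable comes from the answer. The timestamps are rewritten in ISO-8601 form (an assumption), so sorting the text also sorts chronologically, and "-" is kept as a literal value.

```python
import sqlite3

# Sample rows from the question, timestamps converted to ISO-8601 text.
ROWS = [
    ("2019-07-31 15:12", "-", "-", "-", "-", 1, 2),
    ("2019-07-31 15:32", "1", "1", "100", "1", 5000, 20),
    ("2019-08-10 09:33", "-", "-", "-", "-", 1000, 7),
    ("2019-07-31 15:38", "1", "1", "100", "1", 33, 5),
    ("2019-08-06 08:53", "-", "-", "-", "-", 0, 7),
    ("2019-08-06 09:08", "-", "-", "-", "-", 0, 7),
    ("2019-08-06 16:06", "3", "3", "3", "3", 0, 23),
    ("2019-08-07 10:43", "-", "-", "-", "-", 0, 42),
    ("2019-08-07 13:10", "-", "-", "-", "-", 0, 24),
    ("2019-08-08 07:19", "11", "111", "111", "12", 0, 2),
    ("2019-08-08 10:54", "2334", "65464", "565", "76", 1000, 19),
    ("2019-08-08 11:15", "232", "343", "343", "43", 0, 2),
    ("2019-08-08 11:30", "2323", "rtttt", "3434", "34", 0, 2),
    ("2019-08-10 14:47", "-", "-", "-", "-", 123, 23),
]

# rn = 1 marks the newest row of each group; the outer ORDER BY restores
# chronological order, like the second ORDER BY in the answer.
QUERY = """
SELECT "Timestamp", "COL_A", "COL_B", "COL_C", "COL_D", "Data_A", "Data_B"
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY "COL_A", "COL_B", "COL_C", "COL_D"
                              ORDER BY "Timestamp" DESC) AS rn
    FROM mytable
) s
WHERE rn = 1
ORDER BY "Timestamp"
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE mytable ("Timestamp" TEXT, "COL_A" TEXT, "COL_B" TEXT, '
    '"COL_C" TEXT, "COL_D" TEXT, "Data_A" INTEGER, "Data_B" INTEGER)'
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
latest = conn.execute(QUERY).fetchall()
for row in latest:
    print(row)
```

The seven rows printed correspond to the "Needed query output" table. Unlike DISTINCT ON, the ROW_NUMBER() form also works on databases that lack that PostgreSQL extension.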

How to fetch the last record from a transactional table that has multiple records per id, using a join in PostgreSQL?

I have two tables: a master table named complaints and a transactional table named complaintstatus.
This is the main table, complaints:
| Complaintid | Status | Reopen | Parent_complaint_id |
|-------------|--------|--------|---------------------|
| 102 | 5 | 1 | 102 |
| 103 | 0 | 0 | 103 |
| 106 | 3 | 0 | 106 |
| 154 | 5 | 1 | 154 |
| 123 | 5 | 1 | 123 |
| 132 | 5 | 1 | 132 |
| 167 | 2 | 0 | 167 |
This is the second table, complaintstatus:
| Parent_id | currentstatus | openstatus |
|-----------|---------------|------------|
| 102 | 2 | 0 |
| 102 | 5 | 0 |
| 102 | 5 | 1 |
| 102 | 0 | 0 |
| 103 | 0 | 0 |
| 106 | 3 | 0 |
| 154 | 2 | 0 |
| 154 | 5 | 0 |
| 154 | 5 | 1 |
| 154 | 0 | 0 |
| 123 | 2 | 0 |
| 123 | 5 | 0 |
| 123 | 5 | 1 |
| 123 | 0 | 0 |
| 167 | 2 | 0 |
The result should be:
| Parent_id | currentstatus | openstatus |
|-----------|---------------|------------|
| 102 | 0 | 0 |
| 154 | 0 | 0 |
| 123 | 0 | 0 |
The totals I need are the number of reopened complaints (4) and the pending count (3).
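The question leaves two details open, so this is a minimal sketch under stated assumptions. First, complaintstatus has no column that says which record is the last one. The sketch therefore adds a hypothetical auto-incrementing id and treats the highest id per parent_id as the latest. Second, it reads "pending" as a reopened complaint (Reopen = 1) whose latest currentstatus is 0, which reproduces the three expected rows. It runs through Python's standard sqlite3 module and uses ROW_NUMBER() to pick the latest status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE complaints (
    complaintid INTEGER, status INTEGER, reopen INTEGER, parent_complaint_id INTEGER);
-- id is a HYPOTHETICAL ordering column; the question's table has no such column.
CREATE TABLE complaintstatus (
    id INTEGER PRIMARY KEY, parent_id INTEGER, currentstatus INTEGER, openstatus INTEGER);
""")
conn.executemany("INSERT INTO complaints VALUES (?, ?, ?, ?)", [
    (102, 5, 1, 102), (103, 0, 0, 103), (106, 3, 0, 106), (154, 5, 1, 154),
    (123, 5, 1, 123), (132, 5, 1, 132), (167, 2, 0, 167),
])
# Inserted in the order shown in the question, so id grows with each record.
conn.executemany(
    "INSERT INTO complaintstatus (parent_id, currentstatus, openstatus) VALUES (?, ?, ?)",
    [
        (102, 2, 0), (102, 5, 0), (102, 5, 1), (102, 0, 0),
        (103, 0, 0), (106, 3, 0),
        (154, 2, 0), (154, 5, 0), (154, 5, 1), (154, 0, 0),
        (123, 2, 0), (123, 5, 0), (123, 5, 1), (123, 0, 0),
        (167, 2, 0),
    ],
)

# rn = 1 is the latest status record of each parent_id.
LATEST_PENDING = """
WITH latest AS (
    SELECT parent_id, currentstatus, openstatus,
           ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY id DESC) AS rn
    FROM complaintstatus
)
SELECT l.parent_id, l.currentstatus, l.openstatus
FROM complaints c
JOIN latest l ON l.parent_id = c.parent_complaint_id AND l.rn = 1
WHERE c.reopen = 1 AND l.currentstatus = 0
ORDER BY l.parent_id
"""

pending = conn.execute(LATEST_PENDING).fetchall()
reopen_count = conn.execute(
    "SELECT count(*) FROM complaints WHERE reopen = 1"
).fetchone()[0]
print(pending)                     # [(102, 0, 0), (123, 0, 0), (154, 0, 0)]
print(reopen_count, len(pending))  # 4 3
```

In a real schema, the ORDER BY id DESC would be replaced by whatever column records when each status row was written.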

Check previous and next record

I'm trying to compare costs across periods, but I don't know how to compare a single record with the records before and after it. What I need is a Yes/No column that says Yes when a record's costs are the same as those of both the previous and the next record.
My dataset looks like this:
+--------+-----------+----------+------------+-------+-----------+
| Client | Provision | CAK Year | CAK Period | Costs | Serial Nr |
+--------+-----------+----------+------------+-------+-----------+
| 1 | 210 | 2017 | 13 | 150 | 1 |
+--------+-----------+----------+------------+-------+-----------+
| 1 | 210 | 2018 | 1 | 200 | 2 |
+--------+-----------+----------+------------+-------+-----------+
| 1 | 210 | 2018 | 2 | 170 | 3 |
+--------+-----------+----------+------------+-------+-----------+
| 1 | 210 | 2018 | 3 | 150 | 4 |
+--------+-----------+----------+------------+-------+-----------+
| 1 | 210 | 2018 | 4 | 150 | 5 |
+--------+-----------+----------+------------+-------+-----------+
| 1 | 210 | 2018 | 5 | 150 | 6 |
+--------+-----------+----------+------------+-------+-----------+
| 1 | 689 | 2018 | 1 | 345 | 1 |
+--------+-----------+----------+------------+-------+-----------+
| 1 | 689 | 2018 | 2 | 345 | 1 |
+--------+-----------+----------+------------+-------+-----------+
| 1 | 689 | 2018 | 3 | 345 | 1 |
+--------+-----------+----------+------------+-------+-----------+
What I've tried so far:
CASE
    WHEN Provision = Provision
     AND Costs = LEAD(Costs, 1, 0) OVER(ORDER BY [CAK Year], [CAK Period])
     AND Costs = LAG(Costs, 1, 0) OVER(ORDER BY [CAK Year], [CAK Period])
    THEN 'Yes'
    ELSE 'No'
END
My expected result:
+--------+-----------+----------+------------+-------+-----------+--------+
| Client | Provision | CAK Year | CAK Period | Costs | Serial Nr | Result |
+--------+-----------+----------+------------+-------+-----------+--------+
| 1 | 210 | 2017 | 13 | 150 | 1 | No |
| 1 | 210 | 2018 | 1 | 200 | 2 | No |
| 1 | 210 | 2018 | 2 | 170 | 3 | No |
| 1 | 210 | 2018 | 3 | 150 | 4 | No |
| 1 | 210 | 2018 | 4 | 150 | 5 | Yes |
| 1 | 210 | 2018 | 5 | 150 | 6 | No |
| 1 | 689 | 2018 | 1 | 345 | 1 | No |
| 1 | 689 | 2018 | 2 | 345 | 1 | Yes |
| 1 | 689 | 2018 | 3 | 345 | 1 | No |
+--------+-----------+----------+------------+-------+-----------+--------+
Can you help me further? I don't get the expected result.
You need to add PARTITION BY Provision; otherwise the LAG and LEAD ordering runs across all Provision values:
declare @d table(Client int,Provision int,CAKYear int, CAKPeriod int, Costs int, SerialNr int);
insert into @d values
(1,210,2017,13,150,1)
,(1,210,2018,1,200,2)
,(1,210,2018,2,170,3)
,(1,210,2018,3,150,4)
,(1,210,2018,4,150,5)
,(1,210,2018,5,150,6)
,(1,689,2018,1,345,1)
,(1,689,2018,2,345,1)
,(1,689,2018,3,345,1);
select *
,case when Provision = Provision
and Costs = lead(Costs, 1, 0) over(partition by Provision order by CAKYear, CAKPeriod)
and Costs = lag(Costs, 1, 0) over(partition by Provision order by CAKYear, CAKPeriod)
then 'Yes'
else 'No'
end as Result
from @d
order by Provision
,CAKYear
,CAKPeriod;
Output
+--------+-----------+---------+-----------+-------+----------+--------+
| Client | Provision | CAKYear | CAKPeriod | Costs | SerialNr | Result |
+--------+-----------+---------+-----------+-------+----------+--------+
| 1 | 210 | 2017 | 13 | 150 | 1 | No |
| 1 | 210 | 2018 | 1 | 200 | 2 | No |
| 1 | 210 | 2018 | 2 | 170 | 3 | No |
| 1 | 210 | 2018 | 3 | 150 | 4 | No |
| 1 | 210 | 2018 | 4 | 150 | 5 | Yes |
| 1 | 210 | 2018 | 5 | 150 | 6 | No |
| 1 | 689 | 2018 | 1 | 345 | 1 | No |
| 1 | 689 | 2018 | 2 | 345 | 1 | Yes |
| 1 | 689 | 2018 | 3 | 345 | 1 | No |
+--------+-----------+---------+-----------+-------+----------+--------+
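The same logic can also be tried outside SQL Server. Below is a sketch using Python's standard sqlite3 module (SQLite 3.25+ provides LAG and LEAD with a default argument). The table d is a stand-in for the answer's table variable, and the redundant Provision = Provision test is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE d (Client INTEGER, Provision INTEGER, CAKYear INTEGER, "
    "CAKPeriod INTEGER, Costs INTEGER, SerialNr INTEGER)"
)
conn.executemany(
    "INSERT INTO d VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 210, 2017, 13, 150, 1),
        (1, 210, 2018, 1, 200, 2),
        (1, 210, 2018, 2, 170, 3),
        (1, 210, 2018, 3, 150, 4),
        (1, 210, 2018, 4, 150, 5),
        (1, 210, 2018, 5, 150, 6),
        (1, 689, 2018, 1, 345, 1),
        (1, 689, 2018, 2, 345, 1),
        (1, 689, 2018, 3, 345, 1),
    ],
)

# PARTITION BY Provision keeps LAG/LEAD inside one Provision; the default 0
# makes a row with no neighbour compare unequal (assuming Costs is never 0).
QUERY = """
SELECT Provision, CAKYear, CAKPeriod, Costs,
       CASE WHEN Costs = LEAD(Costs, 1, 0) OVER (PARTITION BY Provision ORDER BY CAKYear, CAKPeriod)
             AND Costs = LAG(Costs, 1, 0) OVER (PARTITION BY Provision ORDER BY CAKYear, CAKPeriod)
            THEN 'Yes' ELSE 'No'
       END AS Result
FROM d
ORDER BY Provision, CAKYear, CAKPeriod
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Removing the two PARTITION BY clauses in this sketch lets the window run across both Provision values, which is the behaviour the question ran into.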

Retrieve additional columns on aggregation and date operator

I have the following PostgreSQL table, which stores a temperature reading every second:
+----+--------+-------------------------------+---------+
| id | value | date | station |
+----+--------+-------------------------------+---------+
| 1 | 0 | 2017-08-22 14:01:09.314625+02 | 1 |
| 2 | 0 | 2017-08-22 14:01:09.347758+02 | 1 |
| 3 | 25.187 | 2017-08-22 14:01:10.315413+02 | 1 |
| 4 | 24.937 | 2017-08-22 14:01:10.322528+02 | 1 |
| 5 | 25.187 | 2017-08-22 14:01:11.347271+02 | 1 |
| 6 | 24.937 | 2017-08-22 14:01:11.355005+02 | 1 |
| 18 | 24.875 | 2017-08-22 14:01:17.35265+02 | 1 |
| 19 | 25.187 | 2017-08-22 14:01:18.34673+02 | 1 |
| 20 | 24.875 | 2017-08-22 14:01:18.355082+02 | 1 |
| 21 | 25.187 | 2017-08-22 14:01:19.361491+02 | 1 |
| 22 | 24.875 | 2017-08-22 14:01:19.371154+02 | 1 |
| 23 | 25.187 | 2017-08-22 14:01:20.354576+02 | 1 |
| 30 | 24.937 | 2017-08-22 14:01:23.372612+02 | 1 |
| 31 | 0 | 2017-08-22 15:58:53.576238+02 | 1 |
| 32 | 0 | 2017-08-22 15:58:53.590872+02 | 1 |
| 33 | 26.625 | 2017-08-22 15:58:54.59986+02 | 1 |
| 38 | 26.375 | 2017-08-22 15:58:56.593205+02 | 1 |
| 39 | 0 | 2017-08-21 15:59:40.181317+02 | 1 |
| 40 | 0 | 2017-08-21 15:59:40.190221+02 | 1 |
| 41 | 26.562 | 2017-08-21 15:59:41.182622+02 | 1 |
| 42 | 26.375 | 2017-08-21 15:59:41.18905+02 | 1 |
+----+--------+-------------------------------+---------+
Now I want to retrieve the maximum value for every hour, along with the data associated with that entry (id, date). I tried the following:
select max(value) as m, (date_trunc('hour', date)) as d
from temperature
where station='1'
group by (date_trunc('hour', date));
This works, but I only get the columns m and d. If I add the date or id column to the SELECT list, I get the usual error: column "temperature.id" must appear in the GROUP BY clause or be used in an aggregate function.
I have already tried approaches like the ones described here, without success; for instance, I can't seem to join on the date_trunc-generated columns.
The result I am aiming for is this:
+----+--------+-------------------------------+---------+
| id | value | date | station |
+----+--------+-------------------------------+---------+
| 3 | 25.187 | 2017-08-22 14:01:10.315413+02 | 1 |
| 33 | 26.625 | 2017-08-22 15:58:54.59986+02 | 1 |
| 41 | 26.562 | 2017-08-21 15:59:41.182622+02 | 1 |
+----+--------+-------------------------------+---------+
It does not matter which record was retrieved in case two or more entries have the same value.
Use DISTINCT ON:
select distinct on (date_trunc('hour', date)) *
from temperature
where station = '1'
order by date_trunc('hour', date), value desc
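DISTINCT ON and date_trunc are PostgreSQL features. For illustration only, the sketch below swaps in ROW_NUMBER() and uses substr(date, 1, 13), which keeps the text up to the hour, as a stand-in for date_trunc('hour', date). That shortcut assumes every reading is stored as text with the same UTC offset, as in the sample. It runs through Python's standard sqlite3 module on the question's rows. When several readings tie for the maximum (such as the five 25.187 readings in the 14:00 hour), any of them may be returned, which the question allows.

```python
import sqlite3

# (id, value, date) for station 1, copied from the question's sample table.
READINGS = [
    (1, 0.0, "2017-08-22 14:01:09.314625+02"),
    (2, 0.0, "2017-08-22 14:01:09.347758+02"),
    (3, 25.187, "2017-08-22 14:01:10.315413+02"),
    (4, 24.937, "2017-08-22 14:01:10.322528+02"),
    (5, 25.187, "2017-08-22 14:01:11.347271+02"),
    (6, 24.937, "2017-08-22 14:01:11.355005+02"),
    (18, 24.875, "2017-08-22 14:01:17.35265+02"),
    (19, 25.187, "2017-08-22 14:01:18.34673+02"),
    (20, 24.875, "2017-08-22 14:01:18.355082+02"),
    (21, 25.187, "2017-08-22 14:01:19.361491+02"),
    (22, 24.875, "2017-08-22 14:01:19.371154+02"),
    (23, 25.187, "2017-08-22 14:01:20.354576+02"),
    (30, 24.937, "2017-08-22 14:01:23.372612+02"),
    (31, 0.0, "2017-08-22 15:58:53.576238+02"),
    (32, 0.0, "2017-08-22 15:58:53.590872+02"),
    (33, 26.625, "2017-08-22 15:58:54.59986+02"),
    (38, 26.375, "2017-08-22 15:58:56.593205+02"),
    (39, 0.0, "2017-08-21 15:59:40.181317+02"),
    (40, 0.0, "2017-08-21 15:59:40.190221+02"),
    (41, 26.562, "2017-08-21 15:59:41.182622+02"),
    (42, 26.375, "2017-08-21 15:59:41.18905+02"),
]

# ROW_NUMBER() ranks each hour's readings by value, highest first, and the
# outer query keeps rank 1: one full row per hour.
QUERY = """
SELECT id, value, date, station
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY substr(date, 1, 13)
                              ORDER BY value DESC) AS rn
    FROM temperature
    WHERE station = 1
) s
WHERE rn = 1
ORDER BY date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temperature (id INTEGER, value REAL, date TEXT, station INTEGER)")
conn.executemany("INSERT INTO temperature VALUES (?, ?, ?, 1)", READINGS)
hourly = conn.execute(QUERY).fetchall()
for row in hourly:
    print(row)
```

Because the rows are ordered by date, the 2017-08-21 hour comes first here, whereas the question lists it last.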

left join 2 tables not working

I have 2 tables:
Table1: 'op_ats'
| ID1 | numero | id_cofre | id_chave | estadoAT |
| 1 | 111 | 1 | 3 | 1 |
| 2 | 222 | 3 | 3 | 2 |
| 3 | 333 | 1 | 4 | 2 |
| 4 | 444 | 1 | 2 | 3 |
Table_2: 'op_ats_cofres_chaves'
| ID2 | num_chave |
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
I have this SQL:
SELECT chaves.*, ats.numero numAT, ats.estadoAT
FROM op_ats_cofres_chaves chaves
LEFT JOIN op_ats ats ON ats.id_chave_cofre = chaves.id AND ats.id_cofre = 1
With this I get the following result:
| ID2 | num_chave | numAT | estadoAT |
| 1 | A | 444 | 3 |
| 2 | B | NULL | NULL |
| 3 | C | 111 | 1 |
| 4 | D | 333 | 2 |
| 5 | E | NULL | NULL |
Now I want to restrict the joined Table1 rows to those whose estadoAT is 1 or 2. I tried adding this line:
WHERE ats.estadoAT = 1 OR ats.estadoAT = 2
But this gives the following result:
| ID2 | num_chave | numAT | estadoAT |
| 1 | A | 444 | 3 |
| 3 | C | 111 | 1 |
| 4 | D | 333 | 2 |
In short:
I want ALL rows from Table2, joined with the Table1 rows that have id_cofre = 1 AND (estadoAT = 1 OR estadoAT = 2).
Any help is appreciated.
You have to move the condition from the WHERE clause into the JOIN condition:
SELECT chaves.*, ats.numero numAT, ats.estadoAT
FROM op_ats_cofres_chaves chaves
LEFT JOIN op_ats ats ON ats.id_chave_cofre = chaves.id AND ats.id_cofre = 1
AND ats.estadoAT IN (1, 2);  -- same as (ats.estadoAT = 1 OR ats.estadoAT = 2)
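To see why the filter's placement and its parentheses matter, the sketch below runs three variants against the sample rows with Python's standard sqlite3 module. The first puts the filter in ON, the second in WHERE, and the third uses an unparenthesized OR in ON. The sketch uses the column names shown in the sample tables (id_chave and ID2) in place of the query's id_chave_cofre and chaves.id. This is an assumption, because the two sets of names do not match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE op_ats (ID1 INTEGER, numero INTEGER, id_cofre INTEGER,
                     id_chave INTEGER, estadoAT INTEGER);
CREATE TABLE op_ats_cofres_chaves (ID2 INTEGER, num_chave TEXT);
INSERT INTO op_ats VALUES
    (1, 111, 1, 3, 1), (2, 222, 3, 3, 2), (3, 333, 1, 4, 2), (4, 444, 1, 2, 3);
INSERT INTO op_ats_cofres_chaves VALUES
    (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');
""")


def run(on_clause, where_clause=""):
    """Left-join every key to op_ats with the given ON (and optional WHERE) text."""
    sql = (
        "SELECT chaves.ID2, chaves.num_chave, ats.numero, ats.estadoAT "
        "FROM op_ats_cofres_chaves chaves "
        f"LEFT JOIN op_ats ats ON {on_clause} {where_clause} "
        "ORDER BY chaves.ID2, ats.numero"
    )
    return conn.execute(sql).fetchall()


# Filter inside ON: all five keys survive; keys without a match get NULLs.
in_on = run("ats.id_chave = chaves.ID2 AND ats.id_cofre = 1 AND ats.estadoAT IN (1, 2)")
# Filter in WHERE: a NULL estadoAT fails the test, so unmatched keys are dropped.
in_where = run("ats.id_chave = chaves.ID2 AND ats.id_cofre = 1",
               "WHERE ats.estadoAT IN (1, 2)")
# Unparenthesized OR: AND binds tighter, so every estadoAT = 2 row joins every key.
bad_or = run("ats.id_chave = chaves.ID2 AND ats.id_cofre = 1 "
             "AND ats.estadoAT = 1 OR ats.estadoAT = 2")

print(in_on)
print(in_where)
print(len(bad_or))
```

Only the first variant returns every key from op_ats_cofres_chaves exactly once.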