Return unique grouped rows with the latest timestamp [duplicate] - postgresql

This question already has answers here:
Select first row in each GROUP BY group?
At the moment I'm struggling with a problem that looks very easy.
Table content:
Primary keys: Timestamp, COL_A, COL_B, COL_C, COL_D
+------------------+-------+-------+-------+-------+--------+--------+
| Timestamp | COL_A | COL_B | COL_C | COL_D | Data_A | Data_B |
+------------------+-------+-------+-------+-------+--------+--------+
| 31.07.2019 15:12 | - | - | - | - | 1 | 2 |
| 31.07.2019 15:32 | 1 | 1 | 100 | 1 | 5000 | 20 |
| 10.08.2019 09:33 | - | - | - | - | 1000 | 7 |
| 31.07.2019 15:38 | 1 | 1 | 100 | 1 | 33 | 5 |
| 06.08.2019 08:53 | - | - | - | - | 0 | 7 |
| 06.08.2019 09:08 | - | - | - | - | 0 | 7 |
| 06.08.2019 16:06 | 3 | 3 | 3 | 3 | 0 | 23 |
| 07.08.2019 10:43 | - | - | - | - | 0 | 42 |
| 07.08.2019 13:10 | - | - | - | - | 0 | 24 |
| 08.08.2019 07:19 | 11 | 111 | 111 | 12 | 0 | 2 |
| 08.08.2019 10:54 | 2334 | 65464 | 565 | 76 | 1000 | 19 |
| 08.08.2019 11:15 | 232 | 343 | 343 | 43 | 0 | 2 |
| 08.08.2019 11:30 | 2323 | rtttt | 3434 | 34 | 0 | 2 |
| 10.08.2019 14:47 | - | - | - | - | 123 | 23 |
+------------------+-------+-------+-------+-------+--------+--------+
Needed query output:
+------------------+-------+-------+-------+-------+--------+--------+
| Timestamp | COL_A | COL_B | COL_C | COL_D | Data_A | Data_B |
+------------------+-------+-------+-------+-------+--------+--------+
| 31.07.2019 15:38 | 1 | 1 | 100 | 1 | 33 | 5 |
| 06.08.2019 16:06 | 3 | 3 | 3 | 3 | 0 | 23 |
| 08.08.2019 07:19 | 11 | 111 | 111 | 12 | 0 | 2 |
| 08.08.2019 10:54 | 2334 | 65464 | 565 | 76 | 1000 | 19 |
| 08.08.2019 11:15 | 232 | 343 | 343 | 43 | 0 | 2 |
| 08.08.2019 11:30 | 2323 | rtttt | 3434 | 34 | 0 | 2 |
| 10.08.2019 14:47 | - | - | - | - | 123 | 23 |
+------------------+-------+-------+-------+-------+--------+--------+
As you can see, I'm trying to get single rows for my primary keys, using the latest timestamp, which is also a primary key.
Currently, I tried a query like:
SELECT Timestamp, COL_A, COL_B, COL_C, COL_D, Data_A, Data_B
FROM XY AS op
-- keep only the row whose timestamp is the newest within its key group
WHERE Timestamp = (
SELECT MAX(Timestamp) FROM XY AS tsRow
WHERE op.COL_A = tsRow.COL_A
AND op.COL_B = tsRow.COL_B
AND op.COL_C = tsRow.COL_C
AND op.COL_D = tsRow.COL_D
);
which gives me a result that looks fine at first glance.
Is there a better or safer way to get my preferred result?

You can use the DISTINCT ON clause, which gives you the first record of an ordered group. Here your group is your (A, B, C, D). This is ordered by the Timestamp column, in descending order, to get the most recent record to be the first.
SELECT DISTINCT ON ("COL_A", "COL_B", "COL_C", "COL_D")
*
FROM
mytable
ORDER BY "COL_A", "COL_B", "COL_C", "COL_D", "Timestamp" DESC
If you want to get your expected order, you need a second ORDER BY after this operation:
SELECT
*
FROM (
SELECT DISTINCT ON ("COL_A", "COL_B", "COL_C", "COL_D")
*
FROM
mytable
ORDER BY "COL_A", "COL_B", "COL_C", "COL_D", "Timestamp" DESC
) s
ORDER BY "Timestamp"
Note: If the Timestamp column is part of the PK, are you sure you really need the four other columns in the PK as well? It seems that the Timestamp column is already unique.
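For databases without DISTINCT ON, the same "latest row per group" result can be sketched with a window function. This is only an illustration: it assumes the table is called mytable with the quoted column names used above, and that (COL_A, COL_B, COL_C, COL_D) is the grouping key. The names t, ranked and rn are placeholders.

```sql
-- Rank the rows inside each (COL_A, COL_B, COL_C, COL_D) group, newest first,
-- then keep only the newest row of every group.
SELECT "Timestamp", "COL_A", "COL_B", "COL_C", "COL_D", "Data_A", "Data_B"
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY "COL_A", "COL_B", "COL_C", "COL_D"
               ORDER BY "Timestamp" DESC
           ) AS rn
    FROM mytable AS t
) AS ranked
WHERE rn = 1
ORDER BY "Timestamp";
```

The rn column is only a helper, which is why the outer SELECT lists the real columns explicitly instead of using *.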

Related

postgreSQL question: get data by last date of each record and subtract from last date number of days

Please help me write a query. I'm at a dead end.
There are 2 tables:
“Trains”:
+----+---------+
| id | numbers |
+----+---------+
| 1 | 101 |
| 2 | 102 |
| 3 | 103 |
| 4 | 104 |
| 5 | 105 |
+----+---------+
“Passages”:
+----+--------------+-------+---------------------+
| id | train_number | speed | date_time |
+----+--------------+-------+---------------------+
| 1 | 101 | 26 | 2021-11-10 16:26:30 |
| 2 | 101 | 28 | 2021-11-12 16:26:30 |
| 3 | 102 | 24 | 2021-11-14 16:26:30 |
| 4 | 103 | 27 | 2021-11-15 16:26:30 |
| 5 | 101 | 29 | 2021-11-16 16:26:30 |
+----+--------------+-------+---------------------+
The goal is to go through the train numbers in the Trains table and, for each one that appears in the Passages table, take its latest date (date_time) and the number of passages in the N days before that latest date. As I understand it, that means date_time - interval 'N days'. The result should look something like:
+----+--------+---------------------+----------------+
| id | train | last_passage | count_passages |
+----+--------+---------------------+----------------+
| 1 | 101 | 2021-11-10 16:26:30 | 2 |
| 2 | 102 | 2021-11-14 16:26:30 | 1 |
| 3 | 103 | 2021-11-15 16:26:30 | 1 |
| 4 | 104 | null | 0 |
| 5 | 105 | null | 0 |
+----+--------+---------------------+----------------+
PS: "count_passages" counts, for example, the passages since the last passage date minus 4 days.
I tried using WHERE ... IN, but I can't come up with a correct query.
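One way to read the requirement above is: for every train, find its latest passage, then count that train's passages within the N days up to and including that latest passage. A minimal sketch under that reading, assuming unquoted lowercase table names trains and passages, a timestamp-typed date_time column, and N = 4 as in the example (the aliases t, lp and p are placeholders):

```sql
SELECT t.id,
       t.numbers       AS train,
       lp.last_passage,
       COUNT(p.id)     AS count_passages   -- counts only matched passages, so 0 when none
FROM trains AS t
-- latest passage per train; LEFT JOIN keeps trains that never passed
LEFT JOIN (
    SELECT train_number, MAX(date_time) AS last_passage
    FROM passages
    GROUP BY train_number
) AS lp
       ON lp.train_number = t.numbers
-- passages of the same train inside the 4-day window ending at the latest passage
LEFT JOIN passages AS p
       ON p.train_number = t.numbers
      AND p.date_time >= lp.last_passage - INTERVAL '4 days'
GROUP BY t.id, t.numbers, lp.last_passage
ORDER BY t.id;
```

With the sample rows, this reading gives train 101 a last_passage of 2021-11-16 16:26:30 and a count of 2, while trains 104 and 105 get NULL and 0 because the LEFT JOINs keep trains without passages.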

How to group MVA field for faceted in sphinx

I have an index in which some records are duplicated: all fields are identical except for latitude, longitude and id (the id field is not really an ID, it is just generated with row_number() OVER () AS id).
Here is an example:
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy;
+------+------------+---------------+----------+-----------+
| id | vacancy_id | prof_area_ids | latitude | longitude |
+------+------------+---------------+----------+-----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 |
| 2 | 916 | 17,283,288 | 0.973178 | 0.743566 |
| 3 | 915 | 17,288 | 0.973178 | 0.743566 |
| 4 | 914 | 30,482 | 0.973178 | 0.743566 |
| 5 | 919 | 15,243 | 0.825153 | 0.692837 |
| 6 | 919 | 15,243 | 0.825162 | 0.692828 |
| 7 | 918 | 8,154 | 0.825153 | 0.692837 |
| 8 | 918 | 8,154 | 0.825162 | 0.692828 |
| 9 | 920 | 17,283,288 | 0.958914 | 1.282161 |
| 10 | 920 | 17,283,288 | 0.958915 | 1.282215 |
| 11 | 924 | 12,208 | 0.97333 | 0.658246 |
| 12 | 924 | 12,208 | 0.973336 | 0.658237 |
| 13 | 923 | 21,365 | 0.97333 | 0.658246 |
| 14 | 923 | 21,365 | 0.973336 | 0.658237 |
| 15 | 922 | 20,359 | 0.97333 | 0.658246 |
| 16 | 922 | 20,359 | 0.973336 | 0.658237 |
| 17 | 921 | 19,346 | 0.97333 | 0.658246 |
| 18 | 921 | 19,346 | 0.973336 | 0.658237 |
| 19 | 926 | 12,17,208,292 | 0.88396 | 2.389868 |
| 20 | 925 | 12,208 | 0.88396 | 2.389868 |
+------+------------+---------------+----------+-----------+
20 rows in set (0.00 sec)
Now I want to group data by vacancy_id
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy group by vacancy_id;
+------+------------+---------------+----------+-----------+
| id | vacancy_id | prof_area_ids | latitude | longitude |
+------+------------+---------------+----------+-----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 |
| 2 | 916 | 17,283,288 | 0.973178 | 0.743566 |
| 3 | 915 | 17,288 | 0.973178 | 0.743566 |
| 4 | 914 | 30,482 | 0.973178 | 0.743566 |
| 5 | 919 | 15,243 | 0.825153 | 0.692837 |
| 7 | 918 | 8,154 | 0.825153 | 0.692837 |
| 9 | 920 | 17,283,288 | 0.958914 | 1.282161 |
| 11 | 924 | 12,208 | 0.97333 | 0.658246 |
| 13 | 923 | 21,365 | 0.97333 | 0.658246 |
| 15 | 922 | 20,359 | 0.97333 | 0.658246 |
| 17 | 921 | 19,346 | 0.97333 | 0.658246 |
| 19 | 926 | 12,17,208,292 | 0.88396 | 2.389868 |
| 20 | 925 | 12,208 | 0.88396 | 2.389868 |
| 21 | 961 | 4,105 | 0.959217 | 1.280721 |
| 23 | 960 | 8,155 | 0.959217 | 1.280721 |
| 25 | 959 | 12,208 | 0.959217 | 1.280721 |
| 27 | 928 | 1,60 | 0.963734 | 1.070297 |
| 29 | 927 | 32,513 | 0.963734 | 1.070297 |
| 31 | 929 | 6,140 | 0.786553 | 0.678649 |
| 33 | 932 | 1,40,46 | 0.824627 | 0.694182 |
+------+------------+---------------+----------+-----------+
20 rows in set (0.00 sec)
The result is great! But the problem begins when I want to get all the grouped data together with a facet:
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy where prof_area_ids=199 group by vacancy_id facet prof_area_ids;
+------+------------+-----------------+----------+-----------+
| id | vacancy_id | prof_area_ids | latitude | longitude |
+------+------------+-----------------+----------+-----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 |
| 191 | 1004 | 11,196,199 | 0.925335 | 2.768874 |
| 313 | 1072 | 1,11,60,197,199 | 0.963968 | 1.070624 |
| 318 | 1136 | 11,196,199 | 0.96071 | 1.448998 |
| 374 | 1097 | 11,199 | 0.785255 | 0.678504 |
+------+------------+-----------------+----------+-----------+
5 rows in set (0.00 sec)
+---------------+----------+
| prof_area_ids | count(*) |
+---------------+----------+
| 202 | 1 |
| 199 | 12 |
| 11 | 12 |
| 196 | 5 |
| 197 | 3 |
| 60 | 3 |
| 1 | 3 |
+---------------+----------+
7 rows in set (0.02 sec)
The facet result is incorrect: the count for prof_area_ids=199 should be 5, not 12. So how can I group the field for faceting?
Additionally:
I found http://sphinxsearch.com/blog/2013/06/21/faceted-search-with-sphinx/, but it only says "If you have a MVA facet, you need to use the GROUPBY() function which returns the actual value on which the grouping was made." and gives no example.
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude,GROUPBY() as selected,COUNT(*) from jobVacancy where prof_area_ids=199 group by vacancy_id facet prof_area_ids;
+------+------------+-----------------+----------+-----------+----------+----------+
| id | vacancy_id | prof_area_ids | latitude | longitude | selected | count(*) |
+------+------------+-----------------+----------+-----------+----------+----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 | 917 | 1 |
| 191 | 1004 | 11,196,199 | 0.925335 | 2.768874 | 1004 | 2 |
| 313 | 1072 | 1,11,60,197,199 | 0.963968 | 1.070624 | 1072 | 3 |
| 318 | 1136 | 11,196,199 | 0.96071 | 1.448998 | 1136 | 3 |
| 374 | 1097 | 11,199 | 0.785255 | 0.678504 | 1097 | 3 |
+------+------------+-----------------+----------+-----------+----------+----------+
5 rows in set (0.00 sec)
+---------------+----------+
| prof_area_ids | count(*) |
+---------------+----------+
| 202 | 1 |
| 199 | 12 |
| 11 | 12 |
| 196 | 5 |
| 197 | 3 |
| 60 | 3 |
| 1 | 3 |
+---------------+----------+
7 rows in set (0.02 sec)
The facet result is still wrong.
It seems you effectively want COUNT(DISTINCT vacancy_id) on the FACET rather than the default COUNT(*), but alas it turns out that
... FACET prof_area_ids,COUNT(DISTINCT vacancy_id) AS vacancies BY prof_area_ids
doesn't work. The part before BY only supports attributes, not custom functions.
So you will just have to write it out the long way, with full queries:
select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy
where prof_area_ids=199 group by vacancy_id;
SELECT GROUPBY() AS prof_area_id, COUNT(DISTINCT vacancy_id) FROM jobVacancy
WHERE prof_area_ids=199 GROUP BY prof_area_ids;
Same results, just slightly more verbose. That is, rather than using the FACET shorthand, write it out in full as multiple separate queries.
The facet result is incorrect: the count for prof_area_ids=199 should be 5, not 12. So how can I group the field for faceting?
It looks like you misunderstand how FACET works. You seem to think it takes the main query's result as its base, but it actually just does another grouping. For example:
mysql> select g, t from idx_mva where t = 11 group by g facet t;
+------+----------+
| g | t |
+------+----------+
| 1 | 11,12 |
| 2 | 11,13,15 |
| 3 | 9,11 |
| 5 | 11,12,15 |
+------+----------+
4 rows in set (0.00 sec)
+------+----------+
| t | count(*) |
+------+----------+
| 12 | 2 |
| 11 | 6 |
| 15 | 4 |
| 13 | 1 |
| 9 | 1 |
| 3 | 1 |
+------+----------+
6 rows in set (0.00 sec)
For t=11 you can see that, as in your case, it is found 4 times in the first query's result, but its count in the FACET result is 6. This is because it actually occurs 6 times in the index:
mysql> select * from idx_mva where t = 11;
+------+------+----------+
| id | g | t |
+------+------+----------+
| 2 | 1 | 11,12 |
| 3 | 1 | 11,15 |
| 3 | 2 | 11,13,15 |
| 6 | 3 | 9,11 |
| 8 | 5 | 11,12,15 |
| 11 | 2 | 3,11,15 |
+------+------+----------+
6 rows in set, 1 warning (0.00 sec)
It shows up only 4 times in the first result because the value of t is returned only once for each group. You can use group_concat() to see more values from the same group:
mysql> select g, group_concat(to_string(t)) from idx_mva where t = 11 group by g facet t;
+------+----------------------------+
| g | group_concat(to_string(t)) |
+------+----------------------------+
| 1 | 11,12,11,15 |
| 2 | 11,13,15,3,11,15 |
| 3 | 9,11 |
| 5 | 11,12,15 |
+------+----------------------------+
4 rows in set (0.00 sec)
+------+----------+
| t | count(*) |
+------+----------+
| 12 | 2 |
| 11 | 6 |
| 15 | 4 |
| 13 | 1 |
| 9 | 1 |
| 3 | 1 |
+------+----------+
6 rows in set (0.00 sec)
If you want to learn more about faceting here's an interactive course about that - https://play.manticoresearch.com/faceting/

DB2 Query multiple select and sum by date

I have 3 tables: ITEMS, ODETAILS, OHIST.
ITEMS - a list of products, ID is the key field
ODETAILS - line items of every order, no key field
OHIST - a view showing last year's order totals by month
ITEMS
+----+----------+
| ID | NAME     |
+----+----------+
| 10 | Widget10 |
| 11 | Widget11 |
| 12 | Widget12 |
| 13 | Widget13 |
+----+----------+
ODETAILS
+-----+---------+---------+----------+
| OID | ODUE    | ITEM_ID | ITEM_QTY |
+-----+---------+---------+----------+
| A33 | 1180503 | 10      | 100      |
| A33 | 1180504 | 11      | 215      |
| A34 | 1180505 | 10      | 500      |
| A34 | 1180504 | 11      | 320      |
| A34 | 1180504 | 12      | 450      |
| A34 | 1180505 | 13      | 125      |
+-----+---------+---------+----------+
OHIST
+---------+-------+
| ITEM_ID | M5QTY |
+---------+-------+
| 10      | 1000  |
| 11      | 1500  |
| 12      | 2251  |
| 13      | 4334  |
+---------+-------+
Assuming today is May 2, 2018 (1180502).
I want my results to show ID, NAME, M5QTY, and SUM(ITEM_QTY) grouped by day
over the next 3 days (D1, D2, D3)
Desired Result
+----+----------+--------+------+------+------+
| ID | NAME | M5QTY | D1 | D2 | D3 |
+----+----------+--------+------+------+------+
| 10 | Widget10 | 1000 | 100 | | 500 |
+----+----------+--------+------+------+------+
| 11 | Widget11 | 1500 | | 535 | |
+----+----------+--------+------+------+------+
| 12 | Widget12 | 2251 | | 450 | |
+----+----------+--------+------+------+------+
| 13 | Widget13 | 4334 | | | 125 |
+----+----------+--------+------+------+------+
This is how I convert ODUE to a date
DATE(concat(concat(concat(substr(char((ODETAILS.ODUE-1000000)+20000000),1,4),'-'), concat(substr(char((ODETAILS.ODUE-1000000)+20000000),5,2), '-')), substr(char((ODETAILS.ODUE-1000000)+20000000),7,2)))
Try this (you can add the joins you need)
SELECT ITEM_ID
, SUM(CASE WHEN ODUE = INT(CURRENT DATE) - 19000000 + 1 THEN ITEM_QTY ELSE 0 END) AS D1
, SUM(CASE WHEN ODUE = INT(CURRENT DATE) - 19000000 + 2 THEN ITEM_QTY ELSE 0 END) AS D2
, SUM(CASE WHEN ODUE = INT(CURRENT DATE) - 19000000 + 3 THEN ITEM_QTY ELSE 0 END) AS D3
FROM
ODETAILS
GROUP BY
ITEM_ID
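The query above leaves the joins out, and adding 1, 2 or 3 to an integer date would misbehave across a month boundary (for example, 1180531 + 1 is not a valid date). A possible fuller sketch, assuming ODUE is stored as an integer in CYYMMDD form as in the question, does the day arithmetic on a real date before converting it and joins in ITEMS and OHIST. The aliases I, H and D are placeholders.

```sql
SELECT I.ID, I.NAME, H.M5QTY, D.D1, D.D2, D.D3
FROM ITEMS I
LEFT JOIN OHIST H
       ON H.ITEM_ID = I.ID
LEFT JOIN (
    -- One row per item. A CASE without ELSE yields NULL, so days with no
    -- orders stay empty instead of showing 0.
    SELECT ITEM_ID,
           SUM(CASE WHEN ODUE = INT(CURRENT DATE + 1 DAY)  - 19000000 THEN ITEM_QTY END) AS D1,
           SUM(CASE WHEN ODUE = INT(CURRENT DATE + 2 DAYS) - 19000000 THEN ITEM_QTY END) AS D2,
           SUM(CASE WHEN ODUE = INT(CURRENT DATE + 3 DAYS) - 19000000 THEN ITEM_QTY END) AS D3
    FROM ODETAILS
    -- only look at order lines due in the next three days
    WHERE ODUE BETWEEN INT(CURRENT DATE + 1 DAY)  - 19000000
                   AND INT(CURRENT DATE + 3 DAYS) - 19000000
    GROUP BY ITEM_ID
) D
       ON D.ITEM_ID = I.ID
ORDER BY I.ID
```

With today taken as May 2, 2018, this should reproduce the desired table for the sample rows, for example D2 = 215 + 320 = 535 for item 11.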

Retrieve additional columns on aggregation and date operator

I have the following PostgreSQL table structure, which gathers temperature records for every second:
+----+--------+-------------------------------+---------+
| id | value | date | station |
+----+--------+-------------------------------+---------+
| 1 | 0 | 2017-08-22 14:01:09.314625+02 | 1 |
| 2 | 0 | 2017-08-22 14:01:09.347758+02 | 1 |
| 3 | 25.187 | 2017-08-22 14:01:10.315413+02 | 1 |
| 4 | 24.937 | 2017-08-22 14:01:10.322528+02 | 1 |
| 5 | 25.187 | 2017-08-22 14:01:11.347271+02 | 1 |
| 6 | 24.937 | 2017-08-22 14:01:11.355005+02 | 1 |
| 18 | 24.875 | 2017-08-22 14:01:17.35265+02 | 1 |
| 19 | 25.187 | 2017-08-22 14:01:18.34673+02 | 1 |
| 20 | 24.875 | 2017-08-22 14:01:18.355082+02 | 1 |
| 21 | 25.187 | 2017-08-22 14:01:19.361491+02 | 1 |
| 22 | 24.875 | 2017-08-22 14:01:19.371154+02 | 1 |
| 23 | 25.187 | 2017-08-22 14:01:20.354576+02 | 1 |
| 30 | 24.937 | 2017-08-22 14:01:23.372612+02 | 1 |
| 31 | 0 | 2017-08-22 15:58:53.576238+02 | 1 |
| 32 | 0 | 2017-08-22 15:58:53.590872+02 | 1 |
| 33 | 26.625 | 2017-08-22 15:58:54.59986+02 | 1 |
| 38 | 26.375 | 2017-08-22 15:58:56.593205+02 | 1 |
| 39 | 0 | 2017-08-21 15:59:40.181317+02 | 1 |
| 40 | 0 | 2017-08-21 15:59:40.190221+02 | 1 |
| 41 | 26.562 | 2017-08-21 15:59:41.182622+02 | 1 |
| 42 | 26.375 | 2017-08-21 15:59:41.18905+02 | 1 |
+----+--------+-------------------------------+---------+
I want now to retrieve the maximum value for every hour, along with the data associated to that entry (id, date). As such, I tried the following:
select max(value) as m, (date_trunc('hour', date)) as d
from temperature
where station='1'
group by (date_trunc('hour', date));
This works fine, but I only get the columns m and d as a result. If I now try to add the date or id columns to the SELECT list, I get the usual error: column "temperature.id" must appear in the GROUP BY clause or be used in an aggregate function.
I have already tried approaches such as the ones described here, unfortunately to no avail, as for instance I seem to be unable to perform a join on the date_trunc-generated columns.
The result I am aiming for is this:
+----+--------+-------------------------------+---------+
| id | value | date | station |
+----+--------+-------------------------------+---------+
| 3 | 25.187 | 2017-08-22 14:01:10.315413+02 | 1 |
| 33 | 26.625 | 2017-08-22 15:58:54.59986+02 | 1 |
| 41 | 26.562 | 2017-08-21 15:59:41.182622+02 | 1 |
+----+--------+-------------------------------+---------+
It does not matter which record was retrieved in case two or more entries have the same value.
Use distinct on, ordering the rows of each hour by value in descending order so that the maximum comes first:
select distinct on (date_trunc('hour', date)) *
from temperature
where station = '1'
order by date_trunc('hour', date), value desc
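Because the question allows any row when several share the hourly maximum, the query above picks one of them arbitrarily. If a repeatable choice is preferred, a tie-breaker can be appended to the ORDER BY. A small sketch, assuming the same temperature table and choosing the earliest reading as the (arbitrary) tie-break rule:

```sql
SELECT DISTINCT ON (date_trunc('hour', date)) *
FROM temperature
WHERE station = '1'
ORDER BY date_trunc('hour', date),  -- one output row per hour
         value DESC,                -- highest value first
         date;                      -- among equal values, the earliest reading wins
```

For the sample rows shown in the question, this should select the rows with ids 3, 33 and 41.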

How to preserve additional keys when using "SELECT DISTINCT"?

I'm looking to preserve the sid and cid pairs that link my tables when using SELECT DISTINCT in my query. The columns signature, ip_src and ip_dst are what make a row distinct. I just want the output to also include the corresponding sid and cid pairs.
QUERY:
SELECT DISTINCT signature, ip_src, ip_dst FROM
(SELECT *
FROM event
INNER JOIN sensor ON (sensor.sid = event.sid)
INNER JOIN iphdr ON (iphdr.cid = event.cid) AND (iphdr.sid = event.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL
ORDER BY timestamp DESC)
as d_dup;
OUTPUT:
signature | ip_src | ip_dst
-----------+------------+------------
29177 | 3244829114 | 2887777034
29177 | 2960340989 | 2887777034
29179 | 2887777893 | 2887777556
29178 | 1208608738 | 2887777034
29178 | 1211607091 | 2887777034
29177 | 776526845 | 2887777034
29177 | 1332731268 | 2887777034
(7 rows)
SUB QUERY:
SELECT *
FROM event
INNER JOIN sensor ON (sensor.sid = event.sid)
INNER JOIN iphdr ON (iphdr.cid = event.cid) AND (iphdr.sid = event.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL
ORDER BY timestamp DESC;
OUTPUT:
sid | cid | signature | timestamp | sid | hostname | interface | filter | detail | encoding | last_cid | sid | cid | ip_src | ip_dst | ip_ver | ip_hlen | ip_tos | ip_len | ip_id | ip_flags | ip_off | ip_ttl | ip_proto | ip_csum
-----+-------+-----------+-------------------------+-----+---------------------+-----------+--------+--------+----------+----------+-----+-------+------------+------------+--------+---------+--------+--------+-------+----------+--------+--------+----------+---------
3 | 13123 | 29177 | 2014-11-15 20:53:14.656 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13123 | 3244829114 | 2887777034 | 4 | 5 | 0 | 344 | 19301 | 0 | 0 | 122 | 6 | 8686
3 | 13122 | 29177 | 2014-11-15 20:53:14.43 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13122 | 3244829114 | 2887777034 | 4 | 5 | 0 | 69 | 19071 | 0 | 0 | 122 | 6 | 9191
3 | 13121 | 29177 | 2014-11-15 18:45:13.461 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13121 | 3244829114 | 2887777034 | 4 | 5 | 0 | 366 | 25850 | 0 | 0 | 122 | 6 | 2115
3 | 13120 | 29177 | 2014-11-15 18:45:13.23 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13120 | 3244829114 | 2887777034 | 4 | 5 | 0 | 69 | 25612 | 0 | 0 | 122 | 6 | 2650
3 | 13119 | 29177 | 2014-11-15 18:45:01.887 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13119 | 3244829114 | 2887777034 | 4 | 5 | 0 | 352 | 13697 | 0 | 0 | 122 | 6 | 14282
3 | 13118 | 29177 | 2014-11-15 18:45:01.681 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13118 | 3244829114 | 2887777034 | 4 | 5 | 0 | 69 | 13464 | 0 | 0 | 122 | 6 | 14798
4 | 51 | 29179 | 2014-11-15 18:44:02.06 | 4 | VS-101-Z1:dna2:dna3 | dna2:dna3 | | 1 | 0 | 51 | 4 | 51 | 2887777893 | 2887777556 | 4 | 5 | 0 | 80 | 18830 | 0 | 0 | 63 | 17 | 40533
3 | 13117 | 29177 | 2014-11-15 18:41:46.418 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13117 | 1332731268 | 2887777034 | 4 | 5 | 0 | 261 | 15393 | 0 | 0 | 119 | 6 | 62131
...
(30 rows)
How do I keep the sid, and cid when using SELECT DISTINCT?
This is shorter and probably faster:
SELECT DISTINCT ON (signature, ip_src, ip_dst)
signature, ip_src, ip_dst, sid, cid
FROM event e
JOIN sensor s USING (sid)
JOIN iphdr i USING (cid, sid)
WHERE timestamp >= NOW() - '1 day'::interval
ORDER BY signature, ip_src, ip_dst, timestamp DESC;
Assuming you want the latest row (greatest timestamp) from each set of dupes.
Detailed explanation:
Select first row in each GROUP BY group?
Sounds like you are looking for a window function:
SELECT *
FROM (
SELECT *,
row_number() over (partition by signature, ip_src, ip_dst order by timestamp desc) as rn
FROM event
JOIN sensor ON sensor.sid = event.sid
JOIN iphdr ON iphdr.cid = event.cid AND iphdr.sid = event.sid
WHERE timestamp >= NOW() - interval '1' day
) as d_dup
where rn = 1
order by timestamp desc;
Maybe something like this?
SELECT DISTINCT e.sid, e.cid, ip_src, ip_dst
FROM event e
INNER JOIN sensor s ON (s.sid = e.sid)
INNER JOIN iphdr i ON (i.cid = e.cid) AND (i.sid = e.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL;
If you want the combination of (signature, ip_src, ip_dst) to be unique in the result (one row for each combination) then you can try something like this:
SELECT max(e.cid), max(e.sid), signature, ip_src, ip_dst
FROM event e
INNER JOIN sensor s ON (s.sid = e.sid)
INNER JOIN iphdr i ON (i.cid = e.cid) AND (i.sid = e.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL
GROUP BY signature, ip_src, ip_dst;
But this returns the maximum cid and the maximum sid for each combination, and those two values may not come from the same row.