group by breaks order by - postgresql

Sorry for the long post; I'm new to this and want to make sure I'm fully understood.
I'm trying to write a query that uses both ORDER BY and GROUP BY.
I started with the ORDER BY part:
SELECT "tId", "mId","sId","tr", "tg","tp", "date"
FROM table
WHERE "tId" =1
ORDER BY "date" DESC, "mId","sId";
The output:
tId | mId | sId | tr | tg | tp | date
-----+-------+------+-----+----+-------+------------------------
1 | 5 | 2 | -73 | 1 | 122 | 2007-01-01 02:03:01+02
1 | 5 | 1 | -72 | 1 | 122 | 2007-01-01 02:02:01+02
1 | 4 | 1 | -70 | 1 | 120 | 2007-01-01 01:01:01+02
1 | 1 | 1 | -30 | 0 | 0 | 2004-10-19 10:23:54+02
1 | 1 | 2 | -31 | 0 | 0 | 2004-10-19 10:23:54+02
1 | 1 | 3 | -32 | 0 | 0 | 2004-10-19 10:23:54+02
1 | 2 | 1 | -40 | 0 | 0 | 2004-10-19 10:23:54+02
1 | 2 | 2 | -41 | 0 | 0 | 2004-10-19 10:23:54+02
1 | 2 | 3 | -42 | 0 | 0 | 2004-10-19 10:23:54+02
1 | 3 | 1 | -50 | 0 | 0 | 2004-10-19 10:23:54+02
1 | 3 | 3 | -50 | 0 | 0 | 2004-10-19 10:23:54+02
The query I would like to write groups the previous result to get:
mId | agg_r | agg_tg | agg_tp | agg_sid | agg_date
-----+--------------+---------+-----------+----------+------------------------------------------------------------------------------
5 | {-73,-72} | {1,1} | {122,122} | {2,1} | {"2007-01-01 02:03:01+02","2007-01-01 02:02:01+02"}
4 | {-70} | {1} | {120} | {1} | {"2007-01-01 01:01:01+02"}
1 | {-30,-31,-32} | {0,0,0} | {0,0,0} | {1,2,3} | {"2004-10-19 10:23:54+02","2004-10-19 10:23:54+02","2004-10-19 10:23:54+02"}
2 | {-40,-41,-42} | {0,0,0} | {0,0,0} | {1,2,3} | {"2004-10-19 10:23:54+02","2004-10-19 10:23:54+02","2004-10-19 10:23:54+02"}
3 | {-50,-50} | {0,0} | {0,0} | {1,3} | {"2004-10-19 10:23:54+02","2004-10-19 10:23:54+02"}
So I assumed this would work:
SELECT "mId", array_agg("tr") AS agg_r, array_agg("tg") AS agg_tg, array_agg("tp") AS agg_tp, array_agg("sId") AS agg_sid ,array_agg("date") AS agg_date
FROM (
SELECT "tId", "mId","sId","tr", "tg","tp", "date"
FROM table
WHERE "tId" =1
ORDER BY "date" DESC, "mId","sId"
)AS qRes
GROUP BY qRes."mId";
But I'm getting:
mId | agg_r | agg_tg | agg_tp | agg_sid | agg_date
-----+--------------+---------+-----------+----------+------------------------------------------------------------------------------
1 | {-30,-31,-32} | {0,0,0} | {0,0,0} | {1,2,3} | {"2004-10-19 10:23:54+02","2004-10-19 10:23:54+02","2004-10-19 10:23:54+02"}
4 | {-70} | {1} | {120} | {1} | {"2007-01-01 01:01:01+02"}
2 | {-40,-41,-42} | {0,0,0} | {0,0,0} | {1,2,3} | {"2004-10-19 10:23:54+02","2004-10-19 10:23:54+02","2004-10-19 10:23:54+02"}
3 | {-50,-50} | {0,0} | {0,0} | {1,3} | {"2004-10-19 10:23:54+02","2004-10-19 10:23:54+02"}
5 | {-73,-72} | {1,1} | {122,122} | {2,1} | {"2007-01-01 02:03:01+02","2007-01-01 02:02:01+02"}
What am I missing? Why does grouping change the order?

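As the comment on the question says, the outer query has no ORDER BY, so the order of the grouped rows is unspecified: GROUP BY may return groups in any order (for example, when the planner uses a hash aggregate), regardless of how the subquery was sorted.
Add an ORDER BY to the outer query; notice the last line.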
SELECT "mId", array_agg("tr") AS agg_r, array_agg("tg") AS agg_tg, array_agg("tp") AS agg_tp, array_agg("sId") AS agg_sid ,array_agg("date") AS agg_date
FROM (
SELECT "tId", "mId","sId","tr", "tg","tp", "date"
FROM table
WHERE "tId" =1
ORDER BY "date" DESC, "mId","sId"
)AS qRes
GROUP BY qRes."mId"
order by max("date") desc, qRes."mId";
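The outer ORDER BY fixes the order of the groups, but the order of the elements inside each array still depends on the subquery's ORDER BY, which usually works but is not guaranteed to survive grouping. A more robust sketch puts the ordering inside each aggregate call (`array_agg(x ORDER BY ...)`), which PostgreSQL supports directly. The table name `readings` is a hypothetical stand-in for the question's table, loaded with the rows shown above; `"mId"` is added as a tie-breaker because groups 1, 2 and 3 share the same max date.

```sql
-- Hypothetical table "readings" holding the tId = 1 rows from the question.
CREATE TEMP TABLE readings (
  "tId" int, "mId" int, "sId" int, tr int, tg int, tp int, "date" timestamptz
);
INSERT INTO readings VALUES
  (1, 5, 2, -73, 1, 122, '2007-01-01 02:03:01+02'),
  (1, 5, 1, -72, 1, 122, '2007-01-01 02:02:01+02'),
  (1, 4, 1, -70, 1, 120, '2007-01-01 01:01:01+02'),
  (1, 1, 1, -30, 0,   0, '2004-10-19 10:23:54+02'),
  (1, 1, 2, -31, 0,   0, '2004-10-19 10:23:54+02'),
  (1, 1, 3, -32, 0,   0, '2004-10-19 10:23:54+02'),
  (1, 2, 1, -40, 0,   0, '2004-10-19 10:23:54+02'),
  (1, 2, 2, -41, 0,   0, '2004-10-19 10:23:54+02'),
  (1, 2, 3, -42, 0,   0, '2004-10-19 10:23:54+02'),
  (1, 3, 1, -50, 0,   0, '2004-10-19 10:23:54+02'),
  (1, 3, 3, -50, 0,   0, '2004-10-19 10:23:54+02');

-- Element order inside each array comes from ORDER BY in the aggregate call;
-- the order of the groups comes from the outer ORDER BY.
SELECT "mId",
       array_agg(tr     ORDER BY "date" DESC, "sId") AS agg_r,
       array_agg(tg     ORDER BY "date" DESC, "sId") AS agg_tg,
       array_agg(tp     ORDER BY "date" DESC, "sId") AS agg_tp,
       array_agg("sId"  ORDER BY "date" DESC, "sId") AS agg_sid,
       array_agg("date" ORDER BY "date" DESC, "sId") AS agg_date
FROM readings
WHERE "tId" = 1
GROUP BY "mId"
ORDER BY max("date") DESC, "mId";
```

With this form the subquery and its ORDER BY are no longer needed.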

Related

tableau calculate cumulative value with condition

I have a Tableau table with columns like this:
| ID | ww | count_flag |
| 1 | ww1 | 0 |
| 1 | ww2 | 1 |
| 1 | ww3 | 1 |
| 1 | ww4 | 0 |
| 1 | ww5 | 1 |
| 2 | ww1 | 1 |
| 2 | ww2 | 1 |
| 2 | ww3 | 1 |
| 2 | ww4 | 0 |
| 2 | ww5 | 1 |
...
Now I'd like to add a new column that shows the consistent status for each ID across all ww (workweeks). The consistent status resets every time count_flag is 0 or the ID changes, so the result should look like this:
| ID | ww | count_flag | consistent status |
| 1 | ww1 | 0 | 0 |
| 1 | ww2 | 1 | 1 |
| 1 | ww3 | 1 | 2 |
| 1 | ww4 | 0 | 0 |
| 1 | ww5 | 1 | 1 |
| 2 | ww1 | 1 | 1 |
| 2 | ww2 | 1 | 2 |
| 2 | ww3 | 1 | 3 |
| 2 | ww4 | 0 | 0 |
| 2 | ww5 | 1 | 1 |
...
How should I create a calculated field that adds such a column to the table?
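For comparison with the SQL answers elsewhere on this page, the same reset-on-zero running count can be sketched in PostgreSQL with window functions; this is not a Tableau calculated field. The idea: count the zero flags seen so far per ID to label each run, then take a running sum of count_flag within each run. The table name `flags_demo` is hypothetical, and the query assumes ww sorts correctly as text (true for ww1 to ww5, but ww10 would sort before ww2, so real data would need a numeric week column).

```sql
-- Hypothetical table holding the sample rows from the question.
CREATE TEMP TABLE flags_demo (id int, ww text, count_flag int);
INSERT INTO flags_demo VALUES
  (1, 'ww1', 0), (1, 'ww2', 1), (1, 'ww3', 1), (1, 'ww4', 0), (1, 'ww5', 1),
  (2, 'ww1', 1), (2, 'ww2', 1), (2, 'ww3', 1), (2, 'ww4', 0), (2, 'ww5', 1);

SELECT id, ww, count_flag,
       -- running total of 1-flags inside the current run; a 0 row contributes 0
       sum(count_flag) OVER (PARTITION BY id, grp ORDER BY ww) AS consistent_status
FROM (
  SELECT *,
         -- number of 0 flags seen so far for this ID: each 0 starts a new run
         count(*) FILTER (WHERE count_flag = 0)
           OVER (PARTITION BY id ORDER BY ww) AS grp
  FROM flags_demo
) s
ORDER BY id, ww;
-- consistent_status: 0, 1, 2, 0, 1, 1, 2, 3, 0, 1
```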

count continuously postgresql data

I need help counting some data.
This is what I want:
| user_id | action_id | count |
-------------------------------------
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 2 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |
| 6 | 3 | 3 |
| 7 | 4 | 1 |
| 8 | 5 | 1 |
| 9 | 5 | 2 |
| 10 | 6 | 1 |
This is what I have:
| user_id | action_id | count |
-------------------------------
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 3 | 1 |
| 6 | 3 | 1 |
| 7 | 4 | 1 |
| 8 | 5 | 1 |
| 9 | 5 | 1 |
| 10 | 6 | 1 |
I really need it for some research on users' second actions.
How do I do it?
Thank you.
Using ROW_NUMBER should work here:
SELECT
user_id,
action_id,
ROW_NUMBER() OVER (PARTITION BY action_id ORDER BY user_id) count
FROM yourTable
ORDER BY
user_id;
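A self-contained version of the ROW_NUMBER query, with the question's sample data loaded into a hypothetical temporary table `actions_demo` standing in for `yourTable`. ROW_NUMBER restarts at 1 in each action_id partition and counts upward in user_id order, which produces the desired count column.

```sql
-- Hypothetical stand-in for yourTable, filled with the question's rows.
CREATE TEMP TABLE actions_demo (user_id int, action_id int);
INSERT INTO actions_demo VALUES
  (1, 1), (2, 2), (3, 2), (4, 3), (5, 3),
  (6, 3), (7, 4), (8, 5), (9, 5), (10, 6);

SELECT user_id,
       action_id,
       -- numbering restarts for every action_id, ordered by user_id
       ROW_NUMBER() OVER (PARTITION BY action_id ORDER BY user_id) AS count
FROM actions_demo
ORDER BY user_id;
-- count (in user_id order): 1, 1, 2, 1, 2, 3, 1, 1, 2, 1
```

Because user_id is unique here, `count(*) OVER (PARTITION BY action_id ORDER BY user_id)` would give the same numbers; ROW_NUMBER stays distinct even if the ordering column had ties.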

SQL query help: Calculate max of previous rows in the same query

For each row where B = C = D = 1, I want to find the maximum of A among the previous rows (also where B = C = D = 1), excluding the row itself, with the rows ordered chronologically.
Data in table looks like this:
+-------+-----+-----+-----+------+------+
|Grp id | B | C | D | A | time |
+-------+-----+-----+-----+------+------+
| 111 | 1 | 0 | 0 | 52 | t |
| 111 | 1 | 1 | 1 | 33 | t+1 |
| 111 | 0 | 1 | 0 | 34 | t+2 |
| 111 | 1 | 1 | 1 | 22 | t+3 |
| 111 | 0 | 0 | 0 | 12 | t+4 |
| 222 | 1 | 1 | 1 | 16 | t |
| 222 | 1 | 0 | 0 | 18 | t2+1 |
| 222 | 1 | 1 | 0 | 13 | t2+2 |
| 222 | 1 | 1 | 1 | 12 | t2+3 |
| 222 | 1 | 1 | 1 | 09 | t2+4 |
| 222 | 1 | 1 | 1 | 22 | t2+5 |
| 222 | 1 | 1 | 1 | 19 | t2+6 |
+-------+-----+-----+-----+------+------+
The table above is the result of the query below. It is produced by left joins, which my project requires.
SELECT Grp id, B, C, D, A, time, xxx
FROM "DCR" dcr
LEFT JOIN "DCM" dcm ON "Id" = dcm."DCRID"
LEFT JOIN "DC" dc ON dc."Id" = dcm."DCID"
ORDER BY dcr."time"
The Result column needs to be computed with the formula described above. It has to be calculated in the same pass, since only the previous rows should be considered. The xxx above needs to be replaced by a subquery or expression that produces the result.
And the result table should look like this:
+-------+-----+-----+-----+------+------+------+
|Grp id | B | C | D | A | time |Result|
+-------+-----+-----+-----+------+------+------+
| 111 | 1 | 0 | 0 | 52 | t | - |
| 111 | 1 | 1 | 1 | 33 | t+1 | - |
| 111 | 1 | 1 | 1 | 34 | t+2 | 33 |
| 111 | 1 | 1 | 1 | 22 | t+3 | 34 |
| 111 | 0 | 0 | 0 | 12 | t+4 | - |
| 222 | 1 | 1 | 1 | 16 | t | - |
| 222 | 1 | 0 | 0 | 18 | t2+1 | - |
| 222 | 1 | 1 | 0 | 13 | t2+2 | - |
| 222 | 1 | 1 | 1 | 12 | t2+3 | 16 |
| 222 | 1 | 1 | 1 | 09 | t2+4 | 16 |
| 222 | 1 | 1 | 1 | 22 | t2+5 | 16 |
| 222 | 1 | 1 | 1 | 19 | t2+6 | 22 |
+-------+-----+-----+-----+------+------+------+
The column could be computed with a window function:
CASE WHEN b = 1 AND c = 1 AND d = 1
THEN max(a) FILTER (WHERE b = 1 AND c = 1 AND d = 1)
OVER (PARTITION BY "grp id"
ORDER BY time
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
ELSE NULL
END
I didn't test it.
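A quick self-contained check of that window expression, as a sketch: the rows come from the expected-result table (where the t+2 row of group 111 has B = C = D = 1, unlike the input table), the times t, t+1, ... and t2+1, ... are replaced by integer steps in a hypothetical column `t`, and `prev_max_demo` and `grp_id` are illustrative names. The frame `ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING` excludes the current row, and FILTER keeps only qualifying rows inside that frame.

```sql
-- Hypothetical table with the rows from the expected-result table.
CREATE TEMP TABLE prev_max_demo (grp_id int, b int, c int, d int, a int, t int);
INSERT INTO prev_max_demo VALUES
  (111, 1, 0, 0, 52, 0), (111, 1, 1, 1, 33, 1), (111, 1, 1, 1, 34, 2),
  (111, 1, 1, 1, 22, 3), (111, 0, 0, 0, 12, 4),
  (222, 1, 1, 1, 16, 0), (222, 1, 0, 0, 18, 1), (222, 1, 1, 0, 13, 2),
  (222, 1, 1, 1, 12, 3), (222, 1, 1, 1,  9, 4), (222, 1, 1, 1, 22, 5),
  (222, 1, 1, 1, 19, 6);

SELECT grp_id, b, c, d, a, t,
       CASE WHEN b = 1 AND c = 1 AND d = 1
            -- max of A over earlier qualifying rows in the same group
            THEN max(a) FILTER (WHERE b = 1 AND c = 1 AND d = 1)
                 OVER (PARTITION BY grp_id
                       ORDER BY t
                       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
       END AS result
FROM prev_max_demo
ORDER BY grp_id, t;
-- result for 111: NULL, NULL, 33, 34, NULL
-- result for 222: NULL, NULL, NULL, 16, 16, 16, 22
```

In the real query the same CASE expression would take the place of xxx, with the joined tables in the FROM clause.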

Retrieve additional columns on aggregation and date operator

I have the following PostgreSQL table structure, which gathers temperature records for every second:
+----+--------+-------------------------------+---------+
| id | value | date | station |
+----+--------+-------------------------------+---------+
| 1 | 0 | 2017-08-22 14:01:09.314625+02 | 1 |
| 2 | 0 | 2017-08-22 14:01:09.347758+02 | 1 |
| 3 | 25.187 | 2017-08-22 14:01:10.315413+02 | 1 |
| 4 | 24.937 | 2017-08-22 14:01:10.322528+02 | 1 |
| 5 | 25.187 | 2017-08-22 14:01:11.347271+02 | 1 |
| 6 | 24.937 | 2017-08-22 14:01:11.355005+02 | 1 |
| 18 | 24.875 | 2017-08-22 14:01:17.35265+02 | 1 |
| 19 | 25.187 | 2017-08-22 14:01:18.34673+02 | 1 |
| 20 | 24.875 | 2017-08-22 14:01:18.355082+02 | 1 |
| 21 | 25.187 | 2017-08-22 14:01:19.361491+02 | 1 |
| 22 | 24.875 | 2017-08-22 14:01:19.371154+02 | 1 |
| 23 | 25.187 | 2017-08-22 14:01:20.354576+02 | 1 |
| 30 | 24.937 | 2017-08-22 14:01:23.372612+02 | 1 |
| 31 | 0 | 2017-08-22 15:58:53.576238+02 | 1 |
| 32 | 0 | 2017-08-22 15:58:53.590872+02 | 1 |
| 33 | 26.625 | 2017-08-22 15:58:54.59986+02 | 1 |
| 38 | 26.375 | 2017-08-22 15:58:56.593205+02 | 1 |
| 39 | 0 | 2017-08-21 15:59:40.181317+02 | 1 |
| 40 | 0 | 2017-08-21 15:59:40.190221+02 | 1 |
| 41 | 26.562 | 2017-08-21 15:59:41.182622+02 | 1 |
| 42 | 26.375 | 2017-08-21 15:59:41.18905+02 | 1 |
+----+--------+-------------------------------+---------+
I now want to retrieve the maximum value for every hour, along with the data associated with that entry (id, date). So I tried the following:
select max(value) as m, (date_trunc('hour', date)) as d
from temperature
where station='1'
group by (date_trunc('hour', date));
This works fine, but I only get the columns m and d as a result. If I try to add the date or id column to the SELECT list, I get the usual error: column "temperature.id" must appear in the GROUP BY clause or be used in an aggregate function.
I have already tried approaches such as the ones described here, unfortunately to no avail; for instance, I can't seem to perform a join on the date_trunc-generated columns.
The result I am aiming for is this:
+----+--------+-------------------------------+---------+
| id | value | date | station |
+----+--------+-------------------------------+---------+
| 3 | 25.187 | 2017-08-22 14:01:10.315413+02 | 1 |
| 33 | 26.625 | 2017-08-22 15:58:54.59986+02 | 1 |
| 41 | 26.562 | 2017-08-21 15:59:41.182622+02 | 1 |
+----+--------+-------------------------------+---------+
It does not matter which record is retrieved when two or more entries share the same value.
Use DISTINCT ON:
select distinct on (date_trunc('hour', date)) *
from temperature
where station = '1'
order by date_trunc('hour', date), value desc
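A runnable sketch of the DISTINCT ON answer, using a temporary `temperature` table filled with the question's rows. DISTINCT ON keeps the first row of each hour according to the ORDER BY, so sorting by `value DESC` inside each hour keeps the maximum. The extra `id` sort key is optional (the question accepts any tied record); it just makes the tie-break deterministic by keeping the lowest id.

```sql
-- Temporary copy of the question's table and sample rows.
CREATE TEMP TABLE temperature (id int, value numeric, date timestamptz, station int);
INSERT INTO temperature VALUES
  (1, 0, '2017-08-22 14:01:09.314625+02', 1),
  (2, 0, '2017-08-22 14:01:09.347758+02', 1),
  (3, 25.187, '2017-08-22 14:01:10.315413+02', 1),
  (4, 24.937, '2017-08-22 14:01:10.322528+02', 1),
  (5, 25.187, '2017-08-22 14:01:11.347271+02', 1),
  (6, 24.937, '2017-08-22 14:01:11.355005+02', 1),
  (18, 24.875, '2017-08-22 14:01:17.35265+02', 1),
  (19, 25.187, '2017-08-22 14:01:18.34673+02', 1),
  (20, 24.875, '2017-08-22 14:01:18.355082+02', 1),
  (21, 25.187, '2017-08-22 14:01:19.361491+02', 1),
  (22, 24.875, '2017-08-22 14:01:19.371154+02', 1),
  (23, 25.187, '2017-08-22 14:01:20.354576+02', 1),
  (30, 24.937, '2017-08-22 14:01:23.372612+02', 1),
  (31, 0, '2017-08-22 15:58:53.576238+02', 1),
  (32, 0, '2017-08-22 15:58:53.590872+02', 1),
  (33, 26.625, '2017-08-22 15:58:54.59986+02', 1),
  (38, 26.375, '2017-08-22 15:58:56.593205+02', 1),
  (39, 0, '2017-08-21 15:59:40.181317+02', 1),
  (40, 0, '2017-08-21 15:59:40.190221+02', 1),
  (41, 26.562, '2017-08-21 15:59:41.182622+02', 1),
  (42, 26.375, '2017-08-21 15:59:41.18905+02', 1);

-- One row per hour: the highest value, lowest id among ties.
SELECT DISTINCT ON (date_trunc('hour', date)) *
FROM temperature
WHERE station = '1'
ORDER BY date_trunc('hour', date), value DESC, id;
-- returns ids 41, 3 and 33 (ordered by hour)
```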

How to preserve additional keys when using "SELECT DISTINCT"?

I'm looking to preserve the sid and cid pairs that link my tables when using SELECT DISTINCT in my query. signature, ip_src, and ip_dst are what make each row distinct; I just want the output to also include the corresponding sid and cid pairs.
QUERY:
SELECT DISTINCT signature, ip_src, ip_dst FROM
(SELECT *
FROM event
INNER JOIN sensor ON (sensor.sid = event.sid)
INNER JOIN iphdr ON (iphdr.cid = event.cid) AND (iphdr.sid = event.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL
ORDER BY timestamp DESC)
as d_dup;
OUTPUT:
signature | ip_src | ip_dst
-----------+------------+------------
29177 | 3244829114 | 2887777034
29177 | 2960340989 | 2887777034
29179 | 2887777893 | 2887777556
29178 | 1208608738 | 2887777034
29178 | 1211607091 | 2887777034
29177 | 776526845 | 2887777034
29177 | 1332731268 | 2887777034
(7 rows)
SUB QUERY:
SELECT *
FROM event
INNER JOIN sensor ON (sensor.sid = event.sid)
INNER JOIN iphdr ON (iphdr.cid = event.cid) AND (iphdr.sid = event.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL
ORDER BY timestamp DESC;
OUTPUT:
sid | cid | signature | timestamp | sid | hostname | interface | filter | detail | encoding | last_cid | sid | cid | ip_src | ip_dst | ip_ver | ip_hlen | ip_tos | ip_len | ip_id | ip_flags | ip_off | ip_ttl | ip_proto | ip_csum
-----+-------+-----------+-------------------------+-----+---------------------+-----------+--------+--------+----------+----------+-----+-------+------------+------------+--------+---------+--------+--------+-------+----------+--------+--------+----------+---------
3 | 13123 | 29177 | 2014-11-15 20:53:14.656 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13123 | 3244829114 | 2887777034 | 4 | 5 | 0 | 344 | 19301 | 0 | 0 | 122 | 6 | 8686
3 | 13122 | 29177 | 2014-11-15 20:53:14.43 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13122 | 3244829114 | 2887777034 | 4 | 5 | 0 | 69 | 19071 | 0 | 0 | 122 | 6 | 9191
3 | 13121 | 29177 | 2014-11-15 18:45:13.461 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13121 | 3244829114 | 2887777034 | 4 | 5 | 0 | 366 | 25850 | 0 | 0 | 122 | 6 | 2115
3 | 13120 | 29177 | 2014-11-15 18:45:13.23 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13120 | 3244829114 | 2887777034 | 4 | 5 | 0 | 69 | 25612 | 0 | 0 | 122 | 6 | 2650
3 | 13119 | 29177 | 2014-11-15 18:45:01.887 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13119 | 3244829114 | 2887777034 | 4 | 5 | 0 | 352 | 13697 | 0 | 0 | 122 | 6 | 14282
3 | 13118 | 29177 | 2014-11-15 18:45:01.681 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13118 | 3244829114 | 2887777034 | 4 | 5 | 0 | 69 | 13464 | 0 | 0 | 122 | 6 | 14798
4 | 51 | 29179 | 2014-11-15 18:44:02.06 | 4 | VS-101-Z1:dna2:dna3 | dna2:dna3 | | 1 | 0 | 51 | 4 | 51 | 2887777893 | 2887777556 | 4 | 5 | 0 | 80 | 18830 | 0 | 0 | 63 | 17 | 40533
3 | 13117 | 29177 | 2014-11-15 18:41:46.418 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13117 | 1332731268 | 2887777034 | 4 | 5 | 0 | 261 | 15393 | 0 | 0 | 119 | 6 | 62131
...
(30 rows)
How do I keep the sid, and cid when using SELECT DISTINCT?
This is shorter and probably faster:
SELECT DISTINCT ON (signature, ip_src, ip_dst)
signature, ip_src, ip_dst, sid, cid
FROM event e
JOIN sensor s USING (sid)
JOIN iphdr i USING (cid, sid)
WHERE timestamp >= NOW() - '1 day'::interval
ORDER BY signature, ip_src, ip_dst, timestamp DESC;
Assuming you want the latest row (greatest timestamp) from each set of dupes.
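To try the DISTINCT ON query without the real database, here is a sketch with three tiny hypothetical temporary tables shaped like event, sensor and iphdr (only the columns the query uses). The sid/cid/signature/IP values are taken from the sample output above, but the timestamps are generated relative to now() so they fall inside the one-day window. ip_src and ip_dst are bigint because values such as 3244829114 exceed the int range.

```sql
-- Minimal stand-ins for the real tables (only the columns the query needs).
CREATE TEMP TABLE sensor (sid int PRIMARY KEY, hostname text);
CREATE TEMP TABLE event  (sid int, cid int, signature int, "timestamp" timestamptz);
CREATE TEMP TABLE iphdr  (sid int, cid int, ip_src bigint, ip_dst bigint);

INSERT INTO sensor VALUES (3, 'VS-101-Z0:dna0:dna1'), (4, 'VS-101-Z1:dna2:dna3');
INSERT INTO event VALUES
  (3, 13123, 29177, now() - interval '1 minute'),
  (3, 13122, 29177, now() - interval '2 minutes'),
  (4,    51, 29179, now() - interval '3 minutes'),
  (3, 13117, 29177, now() - interval '4 minutes');
INSERT INTO iphdr VALUES
  (3, 13123, 3244829114, 2887777034),
  (3, 13122, 3244829114, 2887777034),
  (4,    51, 2887777893, 2887777556),
  (3, 13117, 1332731268, 2887777034);

-- One row per (signature, ip_src, ip_dst): the most recent one, with its sid/cid.
SELECT DISTINCT ON (signature, ip_src, ip_dst)
       signature, ip_src, ip_dst, sid, cid
FROM event e
JOIN sensor s USING (sid)
JOIN iphdr i USING (cid, sid)
WHERE "timestamp" >= now() - interval '1 day'
ORDER BY signature, ip_src, ip_dst, "timestamp" DESC;
-- cid per row: 13117, 13123, 51 (13123 wins over 13122 as the newer event)
```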
Detailed explanation:
Select first row in each GROUP BY group?
Sounds like you are looking for a window function:
SELECT *
FROM (
SELECT *,
row_number() over (partition by signature, ip_src, ip_dst order by timestamp desc) as rn
FROM event
JOIN sensor ON sensor.sid = event.sid
JOIN iphdr ON iphdr.cid = event.cid AND iphdr.sid = event.sid
WHERE timestamp >= NOW() - interval '1' day
) as d_dup
where rn = 1
order by timestamp desc;
Maybe something like this?
SELECT DISTINCT e.sid, e.cid, ip_src, ip_dst
FROM event e
INNER JOIN sensor s ON (s.sid = e.sid)
INNER JOIN iphdr i ON (i.cid = e.cid) AND (i.sid = e.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL;
If you want the combination of (signature, ip_src, ip_dst) to be unique in the result (one row for each combination) then you can try something like this:
SELECT max(e.cid), max(e.sid), signature, ip_src, ip_dst
FROM event e
INNER JOIN sensor s ON (s.sid = e.sid)
INNER JOIN iphdr i ON (i.cid = e.cid) AND (i.sid = e.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL
GROUP BY signature, ip_src, ip_dst;
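But it returns the maximum cid and the maximum sid for each combination; the two maxima are computed independently, so they may not come from the same row.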