SELECT the most occuring value PostgreSQL - postgresql

I have a table structure using postgres:
tran_id (int), cust_id(int), movie(varchar(255))
Approximately the data is like this:
tran_id cust_id movie
------- ------- --------
1 1 Zootopia
2 1 Zootopia
3 1 Saw
4 2 Fight Club
5 2 Fight Club
6 3 Avengers: End Game
7 3 Avengers: End Game
8 3 Avengers: End Game
9 3 Iron Man
10 4 Fight Club
11 4 Fight Club
12 4 Percy Jackson
13 4 Space Jam
14 5 Inception
15 5 Inception
I want to query so that it will show favorite movie for each customer. The expected output is below:
cust_id fav_movie
------- -------------------
1 Zootopia
2 Fight Club
3 Avengers: End Game
4 Fight Club
5 Inception
This is my current script:
SELECT cust_id, movie, MAX(fav_movie) FROM
(SELECT cust_id, movie, COUNT(movie) AS fav_movie FROM transaction
GROUP BY cust_id, movie
ORDER BY cust_id) A
GROUP BY cust_id, movie;

Using DISTINCT ON:
SELECT DISTINCT ON (cust_id) cust_id, movie AS fav_movie
FROM "transaction"
ORDER BY cust_id, COUNT(*) OVER (PARTITION BY cust_id, movie) DESC;
Demo

I would first get the counts for each movie and then output the one with the highest count per customer:
SELECT DISTINCT ON (cust_id)
cust_id, movie
FROM (SELECT cust_id, movie,
count(*) AS c
FROM transaction
GROUP BY cust_id, movie) AS grouped
ORDER BY cust_id, c DESC;

Related

Group data based on higest value of a partcular column

I have two tables department and category.In my query result listing same department multiple times.I want to group departments based on category table priority column.
Eg Data:
id dept_name dept_category priority
1 Cardio category 2 2
2 Ortho category 3 3
3 Ortho category 1 1
4 ENT category 1 1
5 Ortho category 2 2
I wannt the reslt like:
id dept_name dept_category priority
1 Cardio category 2 2
3 Ortho category 1 1
4 ENT category 1 1
How can i fetch the resul like this.In my category table cantain priority 1,2,3 then 1 is highest priority.I need to group department based on highest priority.
We can use DISTINCT ON with Postgres:
SELECT DISTINCT ON (dept_name) *
FROM yourTable
ORDER BY dept_name, priority;

How to calculate the number of messages within 10 seconds before the previous one?

I have a table with messages and I need to find chats where were two or more messages in period of 10 seconds. table
id message_id time
1 1 2021.11.10 13:09:00
1 2 2021.11.10 13:09:01
1 3 2021.11.10 13:09:50
2 1 2021.11.10 15:18:00
2 2 2021.11.10 15:20:00
3 1 2021.11.12 15:00:00
3 2 2021.11.12 15:10:00
3 2 2021.11.12 15:10:10
So the result looks like
id
1
3
I can't come up with the idea how to group by a period or maybe it can be done other way?
select id
from t
group by id, ?
having count(message_id) > 1
You can join the table with itself, matching them on the chat id and your timeframe.
create table messages (chat_id integer,message_id integer,"time" timestamp);
insert into messages values
(1,1,'2021.11.10 13:09:00'),
(1,2,'2021.11.10 13:09:01'),
(1,3,'2021.11.10 13:09:50'),
(2,1,'2021.11.10 15:18:00'),
(2,2,'2021.11.10 15:20:00'),
(3,1,'2021.11.12 15:00:00'),
(3,2,'2021.11.12 15:10:00'),
(3,2,'2021.11.12 15:10:10');
select target_chat,
target_message,
count(*) "number of messages preceding by no more than 10 seconds"
from
(select t1.chat_id target_chat,
t1.message_id target_message,
t1.time,
t2.chat_id,
t2.message_id,
t2.time
from messages t1
inner join messages t2
on t1.chat_id=t2.chat_id
and t1.message_id<>t2.message_id
and (t2.time<=t1.time-'10 seconds'::interval and t2.time<=t1.time)) a
group by 1,2;
-- target_chat | target_message | number of messages preceding by no more than 10 seconds
---------------+----------------+---------------------------------------------------------
-- 1 | 3 | 2
-- 2 | 2 | 1
-- 3 | 2 | 2
--(3 rows)
From that you can select the records with your desired number of preceding messages.
this is a simple query that finds every previous value that is included in our interval
select id from test_table t where
t.time + interval '10 second' >=
(select time from test_table where id=t.id and time>t.time limit 1)
group by id;
results
id
----
1
3
To find rows within an period of time, you can tipically use a window function which avoids a self join on the table :
SELECT id, count(*) OVER (ORDER BY time RANGE BETWEEN CURRENT ROW AND '10 minutes' FOLLOWING)
FROM t
GROUP BY id
Then you can use this query as a sub-query if you only want the id with count(*) > 1 :
SELECT DISTINCT ON (l.id) l.id
FROM
( SELECT id, count(*) OVER (ORDER BY time RANGE BETWEEN CURRENT ROW AND '10 minutes' FOLLOWING) AS ct
FROM t
GROUP BY id
) AS l
WHERE l.ct > 1 ;

How to optimize query

I have the same problem as mentioned in In SQL, how to select the top 2 rows for each group. The answer is working fine. But it takes too much time. How to optimize this query?
Example:
sample_table
act_id: act_cnt:
1 1
2 1
3 1
4 1
5 1
6 3
7 3
8 3
9 4
a 4
b 4
c 4
d 4
e 4
Now i want to group it (or using some other ways). And i want to select 2 rows from each group. Sample Output:
act_id: act_cnt:
1 1
2 1
6 3
7 3
9 4
a 4
I am new to SQL. How to do it?
The answer you linked to uses an inefficient workaround for MySQL's lack of window functions.
Using a window function is most probably much faster as you only need to read the table once:
select name,
score
from (
select name,
score,
dense_rank() over (partition by name order by score desc) as rnk
from the_table
) t
where rnk <= 2;
SQLFiddle: http://sqlfiddle.com/#!15/b0198/1
Having an index on (name, score) should speed up this query.
Edit after the question (and the problem) has been changed
select act_id,
act_cnt
from (
select act_id,
act_cnt,
row_number() over (partition by act_cnt order by act_id) as rn
from sample_table
) t
where rn <= 2;
New SQLFiddle: http://sqlfiddle.com/#!15/fc44b/1

Counting dates that fall between two dates in the same column

I have two tables and for each ID and Level combination in table1, I need to get a count of times matching ID appears in table2 in between sequential times for levels in table1.
So for example, for ID = 1 and Level=1 in table1, two Time entries from table2 for ID=1 fall between Time of Level=1 and Level=2 in table1, so result will be 2 in the result table.
table1:
ID Level Time
1 1 6/7/13 7:03
1 2 6/9/13 7:05
1 3 6/12/13 12:02
1 4 6/17/13 5:01
2 1 6/18/13 8:38
2 3 6/20/13 9:38
2 4 6/23/13 10:38
2 5 6/28/13 1:38
table2:
ID Time
1 6/7/13 11:51
1 6/7/13 14:15
1 6/9/13 16:39
1 6/9/13 19:03
2 6/20/13 11:02
2 6/20/13 15:50
Result would be
ID Level Count
1 1 2
1 2 2
1 3 0
1 4 0
2 1 0
2 3 2
2 4 0
2 5 0
select transformed_tab1.id, transformed_tab1.level, count(tab2.id)
from
(select tab1.id, tab1.level, tm, lead(tm) over (partition by id order by tm) as next_tm
from
(
select 1 as id, 1 as level, '2013-06-07 07:03'::timestamp as tm union
select 1 as id, 2 as level, '2013-06-09 07:05 '::timestamp as tm union
select 1 as id, 3 as level, '2013-06-12 12:02'::timestamp as tm union
select 1 as id, 4 as level, '2013-06-17 05:01'::timestamp as tm union
select 2 as id, 1 as level, '2013-06-18 08:38'::timestamp as tm union
select 2 as id, 3 as level, '2013-06-20 09:38'::timestamp as tm union
select 2 as id, 4 as level, '2013-06-23 10:38'::timestamp as tm union
select 2 as id, 5 as level, '2013-06-28 01:38'::timestamp as tm) tab1
) transformed_tab1
left join
(select 1 as id, '2013-06-07 11:51'::timestamp as tm union
select 1 as id, '2013-06-07 14:15'::timestamp as tm union
select 1 as id, '2013-06-09 16:39'::timestamp as tm union
select 1 as id, '2013-06-09 19:03'::timestamp as tm union
select 2 as id, '2013-06-20 11:02'::timestamp as tm union
select 2 as id, '2013-06-20 15:50'::timestamp as tm) tab2
on transformed_tab1.id=tab2.id and tab2.tm between transformed_tab1.tm and transformed_tab1.next_tm
group by transformed_tab1.id, transformed_tab1.level
order by transformed_tab1.id, transformed_tab1.level
;
SQL Fiddle
select t1.id, level, count(t2.id)
from
(
select id, level,
tsrange(
"time",
lead("time", 1, 'infinity') over(
partition by id order by level
),
'[)'
) as time_range
from t1
) t1
left join
t2 on t1.id = t2.id and t1.time_range #> t2."time"
group by t1.id, level
order by t1.id, level
The solution starts creating a range of timestamps using the lead window function. Notice the [) parameter to the tsrange constructor. It means to include the lower and exclude the upper bound.
Then it joins the two tables with the #> range operator. It means the range includes the element.
It is necessary to left join t1 to have the zero counts.

SQL Server Multiple Running Totals

I have a table like this
UserID Score Date
5 6 2010-1-1
7 8 2010-1-2
5 4 2010-1-3
6 3 2010-1-4
7 4 2010-1-5
6 1 2010-1-6
I would like to get a table like this
UserID Score RunningTotal Date
5 6 6 2010-1-1
5 4 10 2010-1-3
6 3 3 2010-1-4
6 1 4 2010-1-6
7 8 8 2010-1-2
7 4 12 2010-1-5
Thanks!
Unlike Oracle, PostgreSQL and even MySQL, SQL Server has no efficient way to calculate running totals.
If you have few scores per UserID, you can use this:
SELECT userId,
(
SELECT SUM(score)
FROM scores si
WHERE si.UserID = so.UserID
AND si.rn <= so.rn
)
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY UserID) AS rn
FROM scores
) so
, however, this will be very inefficient for larger tables.
For larger tables, you could benefit from using (God help me) a cursor.
Would something like this work for you...?
SELECT UserID, Score,
(SELECT SUM(Score)
FROM TableName innerTable
WHERE innerTable.UserID = outerTable.userID
AND innerTable.Date <= outerTable.date) AS RunningTotal
FROM TableName outerTable
This assumes, though, that a user cannot have more than one score per day. (What is your PK?)