PostgreSQL: Filtering duplicate pair

I am asking this from mobile, so apologies for the bad formatting. I have the following table.
Table players
| ID | name | matches_won |
---------------------------
| 1 | bob | 3 |
| 2 | Paul | 2 |
| 3 | John | 4 |
| 4 | Jim | 1 |
| 5 | hal | 0 |
| 6 | fin | 0 |
I want to pair players together in a query, matching those who have the same or nearly the same number of matches won. So the query should display the following result.
| ID | NAME | ID | NAME |
-------------------------
| 3 | John | 1 | bob |
| 2 | Paul | 4 | Jim |
| 5 | hal | 6 | fin |
So far I have tried this query, but it returns repeated pairs.
Select player1.ID, player1.name, player2.ID, player2.name
From player as player1,
player as player2
Where
player1.matches_won >= player2.matches_won
And player1.ID <> player2.ID;
The query pairs the player with the most won matches with every one of the other players, whereas I want each player to appear only once in the result, paired with the player whose number of wins is nearest to his own.
I have tried subqueries, but I don't know how to go about it, since a subquery only returns one result. Aggregates also don't work in the WHERE clause, so I am not sure how to achieve this.

An easier way, IMHO, to achieve this would be to order the players by their number of wins, divide these ranks by two to create matches and self join. CTEs (with expressions) allow you to do this relatively elegantly:
WITH wins AS (
SELECT id, name, ROW_NUMBER() OVER (ORDER BY matches_won DESC) AS rn
FROM players
)
SELECT w1.id, w1.name, w2.id, w2.name
FROM (SELECT id, name, rn / 2 AS rn
FROM wins
WHERE rn % 2 = 1) w1
LEFT JOIN (SELECT id, name, (rn - 1) / 2 AS rn
FROM wins
WHERE rn % 2 = 0) w2 ON w1.rn = w2.rn

Add row numbers in descending order by won matches to the table and join odd row numbers with adjacent even row numbers:
with players as (
select *, row_number() over (order by matches_won desc) rn
from player)
select a.id, a.name, b.id, b.name
from players a
join players b
on a.rn = b.rn - 1
where a.rn % 2 = 1
id | name | id | name
----+------+----+------
3 | John | 1 | bob
2 | Paul | 4 | Jim
5 | hal | 6 | fin
(3 rows)
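If you want to try the adjacent-row pairing without a PostgreSQL server, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. SQLite (3.25 or newer, for window functions) is only a stand-in here, the function name pair_players is made up for illustration, and the extra "id" tie-breaker in the ORDER BY is my assumption so that players with equal wins (hal and fin) are ranked deterministically.

```python
import sqlite3

def pair_players(rows):
    """Pair adjacent players after ranking them by matches_won, highest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT, matches_won INTEGER)"
    )
    conn.executemany("INSERT INTO player VALUES (?, ?, ?)", rows)
    query = """
        WITH ranked AS (
            -- id is an assumed tie-breaker so equal win counts rank deterministically
            SELECT id, name,
                   ROW_NUMBER() OVER (ORDER BY matches_won DESC, id) AS rn
            FROM player
        )
        SELECT a.id, a.name, b.id, b.name
        FROM ranked a
        JOIN ranked b ON a.rn = b.rn - 1   -- odd row joined to the next (even) row
        WHERE a.rn % 2 = 1
        ORDER BY a.rn
    """
    pairs = conn.execute(query).fetchall()
    conn.close()
    return pairs

players = [
    (1, "bob", 3), (2, "Paul", 2), (3, "John", 4),
    (4, "Jim", 1), (5, "hal", 0), (6, "fin", 0),
]
for pair in pair_players(players):
    print(pair)
```

Note that with an odd number of players the inner join drops the last (lowest-ranked) player; switch to a LEFT JOIN if that player should still be listed without a partner.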

Related

Distinct Count Dates by timeframe

I am trying to find the daily count of frequent visitors from a very large data-set. Frequent visitors in this case are visitor IDs used on 2 distinct days in a rolling 3 day period.
My data set looks like the below:
ID | Date | Location | State | Brand |
1 | 2020-01-02 | A | CA | XYZ |
1 | 2020-01-03 | A | CA | BCA |
1 | 2020-01-04 | A | CA | XYZ |
1 | 2020-01-06 | A | CA | YQR |
1 | 2020-01-06 | A | WA | XYZ |
2 | 2020-01-02 | A | CA | XYZ |
2 | 2020-01-05 | A | CA | XYZ |
This is the result I am going for. The count in the Visits column is the number of distinct days in the Date column for each ID, looking at the current date and the 2 days before it. So for ID 1 on 2020-01-05, there was a visit on the 3rd and 4th, so the count is 2.
Date | ID | Visits | Frequent Prior 3 Days
2020-01-01 |Null| Null | Null
2020-01-02 | 1 | 1 | No
2020-01-02 | 2 | 1 | No
2020-01-03 | 1 | 2 | Yes
2020-01-03 | 2 | 1 | No
2020-01-04 | 1 | 3 | Yes
2020-01-04 | 2 | 1 | No
2020-01-05 | 1 | 2 | Yes
2020-01-05 | 2 | 1 | No
2020-01-06 | 1 | 2 | Yes
2020-01-06 | 2 | 1 | No
2020-01-07 | 1 | 1 | No
2020-01-07 | 2 | 1 | No
2020-01-08 | 1 | 1 | No
2020-01-09 | 1 | null | Null
I originally tried to use the following line to get the result for the visits column, but I end up with 3 in every row after the date on which the count first reached 3 for that ID.
, count(ID) over (Partition by ID order by Date ASC rows between 3 preceding and current row) as visits
I've scoured the forum, but every somewhat similar question seems to involve counting the values rather than the dates, and I haven't been able to figure out how to tweak them to get what I need. Any help is much appreciated.

You can aggregate the dataset by user and date, then use window functions with a range frame to look back over the preceding three days.
You did not say which database you are running, and not all databases support range frames or share the same syntax for interval literals. In standard SQL, you would go:
select
id,
date,
count(*) cnt_visits,
case
when sum(count(*)) over(
partition by id
order by date
range between interval '3' day preceding and current row
) >= 2
then 'Yes'
else 'No'
end is_frequent_visitor
from mytable
group by id, date
On the other hand, if you want a record for every user and every day (even when there is no visit), then it is a bit different. You can generate the dataset first, then bring in the table with a left join:
select
i.id,
d.date,
count(t.id) cnt_visits,
case
when sum(count(t.id)) over(
partition by i.id
order by d.date
range between interval '3' day preceding and current row
) >= 2
then 'Yes'
else 'No'
end is_frequent_visitor
from (select distinct id from mytable) i
cross join (select distinct date from mytable) d
left join mytable t
on t.date = d.date
and t.id = i.id
group by i.id, d.date

I would be inclined to approach this by expanding out the days and visitors using a cross join and then just window functions. Assuming you have all dates in the data:
select i.id, d.date,
count(t.id) over (partition by i.id
order by d.date
rows between 2 preceding and current row
) as cnt_visits,
(case when count(t.id) over (partition by i.id
order by d.date
rows between 2 preceding and current row
) >= 2
then 'Yes' else 'No'
end) as is_frequent_visitor
from (select distinct id from t) i cross join
(select distinct date from t) d left join
(select distinct id, date from t) t
on t.date = d.date and
t.id = i.id;
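To sanity-check whichever SQL variant you pick, it can help to compute the expected numbers independently. Below is a small plain-Python sketch of the rule as I read it from the question: for each visitor and each calendar day, count the distinct visit days in a window covering that day and the two days before it. The function name rolling_distinct_days and the window_days parameter are made up for illustration, and the calendar is assumed to run from the first to the last date present in the data.

```python
from datetime import date, timedelta

def rolling_distinct_days(visits, window_days=3):
    """Return {(day, visitor_id): distinct visit days in the window ending on day}."""
    days_by_id = {}
    for visitor_id, day in visits:
        # A set keeps one entry per day, so repeated visits on a day count once.
        days_by_id.setdefault(visitor_id, set()).add(day)

    first = min(day for _, day in visits)
    last = max(day for _, day in visits)
    result = {}
    for visitor_id, days in days_by_id.items():
        current = first
        while current <= last:
            window_start = current - timedelta(days=window_days - 1)
            result[(current, visitor_id)] = sum(
                1 for d in days if window_start <= d <= current
            )
            current += timedelta(days=1)
    return result

# (ID, Date) pairs from the question; location, state and brand are not needed.
visits = [
    (1, date(2020, 1, 2)), (1, date(2020, 1, 3)), (1, date(2020, 1, 4)),
    (1, date(2020, 1, 6)), (1, date(2020, 1, 6)),
    (2, date(2020, 1, 2)), (2, date(2020, 1, 5)),
]
counts = rolling_distinct_days(visits)
for (day, visitor_id), n in sorted(counts.items()):
    print(day, visitor_id, n, "Yes" if n >= 2 else "No")
```

For ID 1 this gives 3 on 2020-01-04 and 2 on 2020-01-05, which agrees with the rows for those dates in the question's expected output.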

How to display the max value of a column with respect to only one other column?

I have a dataset like so:
story_name | users | age | reading_counts
-------------------+--------+-----+----------------
Humpty Dumpty | Elaine | 5 | 10
Wheels on the Bus | Simon | 3 | 15
Dr.Seuss | Simon | 3 | 12
asd | Simon | 3 | 10
dsf | Simon | 3 | 6
Dr.Seuss | Elaine | 5 | 3
asd | Elaine | 5 | 7
(7 rows)
I want to be able to write a query that displays the MAX reading count for each unique user. So something like this:
story_name | users | reading_counts
-------------------+--------+----------------
Humpty Dumpty | Elaine | 10
Wheels on the Bus | Simon | 15
So far I have this query:
SELECT story_name, users, reading_counts
FROM story
WHERE reading_counts IN (SELECT MAX(reading_counts) FROM story GROUP BY users);
and I get this result:
story_name | users | reading_counts
-------------------+--------+----------------
Humpty Dumpty | Elaine | 10
Wheels on the Bus | Simon | 15
asd | Simon | 10
(3 rows)

You can use the window function rank() in a subquery to assign a rank to each record within groups of records having the same user, ordered by descending reading count, and then filter on the top record in each group in the outer query:
select story_name, users, reading_counts
from (
select
t.*,
rank() over(partition by users order by reading_counts desc) rn
from mytable t
) x
where rn = 1
Note: if there are top ties for a given user (i.e. several records that share the same maximum reading count), they will all be returned.
Another solution, that could possibly offer better performance, is to use a correlated subquery to do the filtering, like so:
select story_name, users, reading_counts
from mytable t
where reading_counts = (
select max(reading_counts) from mytable t1 where t1.users = t.users
)
This query would take advantage of an index on users.

You could use a subquery that computes the max for each user, and join to it:
SELECT s.story_name, s.users, s.reading_counts
FROM story s
INNER JOIN (
SELECT users, MAX(reading_counts) AS max_x_user
FROM story
GROUP BY users
) t ON t.users = s.users AND s.reading_counts = t.max_x_user
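To see why the original IN query returns the extra "asd | Simon | 10" row, here is a rough sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. It runs the question's query next to the correlated-subquery version on the sample data; the variable names and the added ORDER BY clauses are mine, only there to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE story (story_name TEXT, users TEXT, age INTEGER, reading_counts INTEGER)"
)
conn.executemany(
    "INSERT INTO story VALUES (?, ?, ?, ?)",
    [
        ("Humpty Dumpty", "Elaine", 5, 10),
        ("Wheels on the Bus", "Simon", 3, 15),
        ("Dr.Seuss", "Simon", 3, 12),
        ("asd", "Simon", 3, 10),
        ("dsf", "Simon", 3, 6),
        ("Dr.Seuss", "Elaine", 5, 3),
        ("asd", "Elaine", 5, 7),
    ],
)

# The question's query: IN compares against the set of all per-user maxima
# (10 and 15), so Simon's 10 matches Elaine's maximum and slips through.
in_query = """
    SELECT story_name, users, reading_counts
    FROM story
    WHERE reading_counts IN (SELECT MAX(reading_counts) FROM story GROUP BY users)
    ORDER BY users, reading_counts DESC
"""

# Correlated subquery: each row is compared only with its own user's maximum.
correlated_query = """
    SELECT story_name, users, reading_counts
    FROM story t
    WHERE reading_counts = (
        SELECT MAX(reading_counts) FROM story t1 WHERE t1.users = t.users
    )
    ORDER BY users
"""

loose_rows = conn.execute(in_query).fetchall()
strict_rows = conn.execute(correlated_query).fetchall()
conn.close()

print(loose_rows)   # three rows, including ('asd', 'Simon', 10)
print(strict_rows)  # one row per user
```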

How to find the number of matches a player has played?

I have two tables namely match and player. I am trying to find the total number of matches played by each player by adding no_of_wins and no_of_loses columns.
player:
id | name
----|----
1 | Suhas
2 | Srivats
3 | James
4 | Watson
match:
id | winner | loser
----|--------|-------
1 | 1 | 2
2 | 1 | 3
3 | 1 | 4
4 | 2 | 4
5 | 4 | 3
6 | 3 | 2
I tried the following SQL command:
select p.id, p.name, count(m.winner) as no_of_wins,count(m.loser) as no_of_loses from player as p left join match as m on p.id=m.winner group by p.id order by p.id;
This command shows the wrong output for the number of loses.
id | name | no_of_wins | no_of_loses
----|---------|------------|-------------
1 | Suhas | 3 | 3
2 | Srivats | 1 | 1
3 | James | 1 | 1
4 | Watson | 1 | 1
Kindly help.

Calculate aggregated numbers of wins and loses for a player in two queries and (full) join them by a player id:
select
name,
coalesce(wins, 0) as no_of_wins,
coalesce(loses, 0) as no_of_loses,
coalesce(wins, 0) + coalesce(loses, 0) as total
from (
select winner as id, count(*) as wins
from match
group by 1
) w
full join (
select loser as id, count(*) as loses
from match
group by 1
) l using (id)
full join player using(id)
order by id;
name | no_of_wins | no_of_loses | total
---------+------------+-------------+-------
Suhas | 3 | 0 | 3
Srivats | 1 | 2 | 3
James | 1 | 2 | 3
Watson | 1 | 2 | 3
(4 rows)

Your query will cause an error unless player.id is declared as the primary key, because p.name is not in the GROUP BY clause.
You'll have to join match twice, because these are two independent joins:
SELECT p.id,
p.name,
COALESCE(w.wins, 0) no_of_wins,
COALESCE(l.losses, 0) no_of_losses
FROM player p
LEFT JOIN
(SELECT winner id,
count(*) wins
FROM match
GROUP BY winner
) w
USING (id)
LEFT JOIN
(SELECT loser id,
count(*) losses
FROM match
GROUP BY loser
) l
USING (id);
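For anyone who wants to check the numbers locally, this is a small sketch of the two-independent-joins approach, run through Python's built-in sqlite3 module rather than PostgreSQL. The function name win_loss_totals is made up for illustration, and the table name is written as "match" in double quotes because MATCH is also a keyword in SQLite.

```python
import sqlite3

def win_loss_totals(players, matches):
    """Return (id, name, wins, losses) per player, with 0 where there are none."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        'CREATE TABLE "match" (id INTEGER PRIMARY KEY, winner INTEGER, loser INTEGER)'
    )
    conn.executemany("INSERT INTO player VALUES (?, ?)", players)
    conn.executemany('INSERT INTO "match" VALUES (?, ?, ?)', matches)
    query = """
        SELECT p.id, p.name,
               COALESCE(w.wins, 0)   AS no_of_wins,
               COALESCE(l.losses, 0) AS no_of_losses
        FROM player p
        -- wins and losses are aggregated separately, then joined to each player
        LEFT JOIN (SELECT winner AS id, COUNT(*) AS wins
                   FROM "match" GROUP BY winner) w ON w.id = p.id
        LEFT JOIN (SELECT loser AS id, COUNT(*) AS losses
                   FROM "match" GROUP BY loser) l ON l.id = p.id
        ORDER BY p.id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

players = [(1, "Suhas"), (2, "Srivats"), (3, "James"), (4, "Watson")]
matches = [(1, 1, 2), (2, 1, 3), (3, 1, 4), (4, 2, 4), (5, 4, 3), (6, 3, 2)]
totals = win_loss_totals(players, matches)
for row in totals:
    print(row)
```

Because player is the left side of both joins, a player who has not played any match still shows up, with 0 wins and 0 losses.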

Join tables and count instances of different values

user
---------------------------
| ID | Name |
---------------------------
| 1 | Jim Rice |
| 2 | Wade Boggs |
| 3 | Bill Buckner |
---------------------------
at_bats
----------------------
| ID | User | Bases |
----------------------
| 1 | 1 | 2 |
| 2 | 2 | 1 |
| 3 | 1 | 2 |
| 4 | 3 | 0 |
| 5 | 1 | 3 |
----------------------
What I want my query to do is get the count of the different base values in a join table like:
count_of_hits
---------------------
| ID | 1B | 2B | 3B |
---------------------
| 1 | 0 | 2 | 1 |
| 2 | 1 | 0 | 0 |
| 3 | 0 | 0 | 0 |
---------------------
I had a query where I was able to get the bases individually, but not them all unless I did some complicated Joins and I'd imagine there is a better way. This was the foundational query though:
SELECT id, COUNT(ab.*)
FROM user
LEFT OUTER JOIN (SELECT * FROM at_bats WHERE at_bats.bases=2) ab ON ab.user=user.id

PostgreSQL 9.4+ provides a much cleaner way to do this:
SELECT
users,
count(*) FILTER (WHERE bases=1) As B1,
count(*) FILTER (WHERE bases=2) As B2,
count(*) FILTER (WHERE bases=3) As B3
FROM at_bats
GROUP BY users
ORDER BY users;

I think the following query would solve your problem. However, I am not sure if it is the best approach:
select distinct a.users, coalesce(b.B1, 0) As B1, coalesce(c.B2, 0) As B2 ,coalesce(d.B3, 0) As B3
FROM at_bats a
LEFT JOIN (SELECT users, count(bases) As B1 FROM at_bats WHERE bases = 1 GROUP BY users) as b ON a.users=b.users
LEFT JOIN (SELECT users, count(bases) As B2 FROM at_bats WHERE bases = 2 GROUP BY users) as c ON a.users=c.users
LEFT JOIN (SELECT users, count(bases) As B3 FROM at_bats WHERE bases = 3 GROUP BY users) as d ON a.users=d.users
Order by users
The coalesce() function just replaces the nulls with zeros. I hope this query helps you :D
UPDATE 1
I found a better way to do it; see the following:
SELECT users,
count(case bases when 1 then 1 else null end) As B1,
count(case bases when 2 then 1 else null end) As B2,
count(case bases when 3 then 1 else null end) As B3
FROM at_bats
GROUP BY users
ORDER BY users;
It is more efficient than my first query. You can check the performance by putting EXPLAIN ANALYSE before the query.
Thanks to Guffa from this post: https://stackoverflow.com/a/1400115/4453190
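As a quick way to try the conditional-count idea on the sample rows, here is a minimal sketch using Python's built-in sqlite3 module in place of PostgreSQL. The function name count_hits is made up for illustration, and the users column name follows the answers above rather than the question's table layout.

```python
import sqlite3

def count_hits(at_bats):
    """Return (users, singles, doubles, triples) per user via conditional counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE at_bats (id INTEGER PRIMARY KEY, users INTEGER, bases INTEGER)"
    )
    conn.executemany("INSERT INTO at_bats VALUES (?, ?, ?)", at_bats)
    query = """
        SELECT users,
               -- CASE yields NULL when the condition fails, and COUNT skips NULLs
               COUNT(CASE WHEN bases = 1 THEN 1 END) AS b1,
               COUNT(CASE WHEN bases = 2 THEN 1 END) AS b2,
               COUNT(CASE WHEN bases = 3 THEN 1 END) AS b3
        FROM at_bats
        GROUP BY users
        ORDER BY users
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

# (ID, User, Bases) rows from the question
at_bats = [(1, 1, 2), (2, 2, 1), (3, 1, 2), (4, 3, 0), (5, 1, 3)]
hits = count_hits(at_bats)
for row in hits:
    print(row)
```

User 3 appears with all zeros only because he has an at-bat with 0 bases; a user with no rows in at_bats at all would need a left join from the user table to show up.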

How to list the train operators that use the second oldest trains (PostgreSQL)

train_operators:
| train_operator_id | name |
------------------------------
| 1 | Virgin |
| 2 | First |
journeys:
| journey_id | train_operator | train_type |
--------------------------------------------
| 1 | 2 | 2 |
| 2 | 2 | 1 |
| 3 | 1 | 3 |
| 4 | 1 | 2 |
train_types:
| train_type_id | date_made |
------------------------------
| 1 | 1999-02-15 |
| 2 | 2001-03-11 |
| 3 | 2000-12-05 |
How would you write a query to find all the train operators that use the second oldest type of train?
With the given schema the query should result with just Virgin since it is the only train operator that uses the second oldest train type

Try this:
select distinct train_operator from journeys
inner join (Select * from train_types order by date_made LIMIT 1 OFFSET 1) sectrain
on sectrain.train_type_id = journeys.train_type
You're into the UK Rail Network are you? I used to work for Funkwerk IT, who in turn used to provide the timetable planning software for Network Rail...

It can be pretty easy using the power of window functions in PostgreSQL:
SELECT DISTINCT train_operator_id,
name
FROM (SELECT t.train_operator_id,
t.name,
DENSE_RANK() OVER (ORDER BY tt.date_made) AS rank
FROM train_operators AS t
JOIN journeys AS j
ON j.train_operator = t.train_operator_id
JOIN train_types AS tt
ON tt.train_type_id = j.train_type) AS q
WHERE rank = 2;
http://sqlfiddle.com/#!12/98816/8

select op.name
from
train_operators op
inner join
journeys j on op.train_operator_id = j.train_operator
where
j.train_type = (
select train_type_id
from train_types
order by date_made
limit 1 offset 1
)
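Here is a short sketch of the LIMIT 1 OFFSET 1 approach on the sample schema, run with Python's built-in sqlite3 module as a stand-in for PostgreSQL. The function name second_oldest_operators is made up for illustration, and the DISTINCT is my addition so that an operator with several journeys on that train type is listed once.

```python
import sqlite3

def second_oldest_operators(operators, journeys, train_types):
    """Return names of operators that run the second oldest train type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE train_operators (train_operator_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE journeys (journey_id INTEGER PRIMARY KEY, "
        "train_operator INTEGER, train_type INTEGER)"
    )
    conn.execute(
        "CREATE TABLE train_types (train_type_id INTEGER PRIMARY KEY, date_made TEXT)"
    )
    conn.executemany("INSERT INTO train_operators VALUES (?, ?)", operators)
    conn.executemany("INSERT INTO journeys VALUES (?, ?, ?)", journeys)
    conn.executemany("INSERT INTO train_types VALUES (?, ?)", train_types)
    query = """
        SELECT DISTINCT op.name
        FROM train_operators op
        JOIN journeys j ON op.train_operator_id = j.train_operator
        WHERE j.train_type = (
            -- ISO dates sort correctly as text; skip the oldest, take the next
            SELECT train_type_id
            FROM train_types
            ORDER BY date_made
            LIMIT 1 OFFSET 1
        )
        ORDER BY op.name
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names

operators = [(1, "Virgin"), (2, "First")]
journeys = [(1, 2, 2), (2, 2, 1), (3, 1, 3), (4, 1, 2)]
train_types = [(1, "1999-02-15"), (2, "2001-03-11"), (3, "2000-12-05")]
result = second_oldest_operators(operators, journeys, train_types)
print(result)
```

One caveat: if two train types share the same date_made, OFFSET 1 picks one of them arbitrarily, so a ranking function may be the safer choice in that case.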
