How can I join a table to delete duplicate entries in PostgreSQL?

I currently have this, which is working:
DELETE FROM "WU_MatchingUsers" WHERE "id" IN (SELECT "id" FROM (SELECT "id", ROW_NUMBER() OVER( PARTITION BY "IDWU_User1", "IDWU_User2" ORDER BY "id" ASC) AS row_num FROM "WU_MatchingUsers") t WHERE t.row_num >1);
This deletes all duplicate entries in "WU_MatchingUsers", keeping the oldest row of each pair. But now I have another table, "WU_UsersSpheres", which contains a sphere id associated with each user id.
Now I would like my query to delete duplicates only for users from a specific sphere.
So "WU_UsersSpheres" looks like this:
id | idSpheres | IDUser
---|-----------|-------
 1 |         1 |      1
 2 |         1 |      2
 3 |         2 |      3
 4 |         2 |      4
 5 |         2 |      5
So the goal is to delete duplicate matchings only where the users' ids belong to a specific sphere.
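One possible approach (a sketch, not a tested solution): keep the ROW_NUMBER() deduplication, but restrict the scanned rows with EXISTS subqueries against "WU_UsersSpheres" so only matchings where both users belong to the target sphere are considered. The sphere id 2 below is a placeholder; the column names are taken from the table shown above:
DELETE FROM "WU_MatchingUsers"
WHERE "id" IN (
    SELECT "id"
    FROM (
        SELECT m."id",
               ROW_NUMBER() OVER (PARTITION BY m."IDWU_User1", m."IDWU_User2"
                                  ORDER BY m."id" ASC) AS row_num
        FROM "WU_MatchingUsers" m
        WHERE EXISTS (SELECT 1 FROM "WU_UsersSpheres" s
                      WHERE s."IDUser" = m."IDWU_User1"
                        AND s."idSpheres" = 2)   -- placeholder sphere id
          AND EXISTS (SELECT 1 FROM "WU_UsersSpheres" s
                      WHERE s."IDUser" = m."IDWU_User2"
                        AND s."idSpheres" = 2)   -- placeholder sphere id
    ) t
    WHERE t.row_num > 1
);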

Related

Get all elements which have both values grouped by ParentId

I have the following problem:
I have a relation table like this:
ParentId | ValueId
---------|--------
       1 |       1
       1 |       2
       2 |       3
       2 |       4
       2 |       1
I want to get the ParentIds that have exactly the values the query specifies, no more, no less.
I currently have this query:
SELECT "ParentId" FROM public."ParentValueRelation"
WHERE "ValueId" = 1 AND "ValueId" = 2
GROUP BY "ParentId"
I expected to receive 1 but I'm getting nothing.
An answer in Sequelize would be great, but it's not necessary.
There are a number of ways to do this in Postgres. Like this, for instance:
SELECT "ParentId" FROM public."ParentValueRelation"
WHERE "ValueId" = 1 OR "ValueId" = 2
GROUP BY "ParentId"
HAVING COUNT("ValueID")=2
If there are duplicates in the table, you need to replace the having clause with
HAVING COUNT(DISTINCT "ValueID")=2
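Note that this query returns any ParentId that has both values, even if it also has other values. If "no more, no less" is meant strictly, one way (a sketch against the same table) is to check the whole value set per ParentId:
SELECT "ParentId" FROM public."ParentValueRelation"
GROUP BY "ParentId"
HAVING bool_and("ValueId" IN (1, 2))   -- no values outside {1, 2}
   AND COUNT(DISTINCT "ValueId") = 2   -- both values present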

Delete matching with already matched user in all ways in PostgreSQL

I need to delete matchings of my users that already exist in either direction. I have a table "MatchingUser" that looks like this:
id | idUser1 | idUser2
---|---------|--------
 1 |       1 |       2
 2 |       1 |       3
 3 |       1 |       2
 4 |       2 |       1
In this example I would like to delete entries 3 and 4: entry 3 because it is the same matching as entry 1, and entry 4 because 2 matching 1 is the same as 1 matching 2.
I already have this:
DELETE FROM "WU_MatchingUsers" WHERE "id" IN (SELECT "id" FROM (SELECT "id", ROW_NUMBER() OVER( PARTITION BY "IDWU_User1", "IDWU_User2" ORDER BY "id" DESC) AS row_num FROM "WU_MatchingUsers") t WHERE t.row_num >1);
This already deletes identical matchings, so in our example it takes care of entry 3 but not entry 4. I would like to add something to this query so it also deletes entry 4.
I would use an EXISTS condition:
delete from MatchingUser mu1
where exists (select *
              from MatchingUser mu2
              where mu2.id < mu1.id
                and least(mu2.iduser1, mu2.iduser2) = least(mu1.iduser1, mu1.iduser2)
                and greatest(mu2.iduser1, mu2.iduser2) = greatest(mu1.iduser1, mu1.iduser2))
This deletes every row whose iduser1/iduser2 combination, regardless of order, already occurs in a row with a lower ID. So in this case it removes the rows with ID = 3 and ID = 4.
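To double-check before deleting, the same condition can be run as a plain SELECT first; a sketch using the table and column names from the question:
select mu1.*
from MatchingUser mu1
where exists (select *
              from MatchingUser mu2
              where mu2.id < mu1.id
                and least(mu2.iduser1, mu2.iduser2) = least(mu1.iduser1, mu1.iduser2)
                and greatest(mu2.iduser1, mu2.iduser2) = greatest(mu1.iduser1, mu1.iduser2))
For the sample data this returns exactly the rows with id 3 and 4.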

how to list records that conform to a sequentially incrementing id in postgres

Is there a way to select records that are sequentially incremented?
For example, given a list of records:
id 0
id 1
id 3
id 4
id 5
id 8
a command like:
select id incrementally from 3
will return the values 3, 4 and 5. It won't return 8 because it doesn't increment sequentially from 5.
step-by-step demo: db<>fiddle
WITH groups AS (                                           -- 2
    SELECT
        *,
        id - row_number() OVER (ORDER BY id) AS group_id   -- 1
    FROM mytable
)
SELECT *
FROM groups
WHERE group_id = (                                         -- 4
    SELECT group_id FROM groups WHERE id = 3               -- 3
)
1. The row_number() window function creates a consecutive row count. Subtracting it from id yields a value that is constant within each run of consecutive ids, which lets you group records whose id values increase by 1.
2. This query is put into a WITH clause because the result is reused twice in the next step.
3. Select the group_id of the record with id = 3.
4. Filter the table for that group.
Additionally: if you want to start your output at id = 4, for example, you need to add an AND id >= 4 filter to the WHERE clause.
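For the sample ids above, the intermediate result of step 1 looks like this (the absolute group_id values don't matter, only that consecutive ids share the same one):
id | row_number | group_id
---|------------|---------
 0 |          1 |       -1
 1 |          2 |       -1
 3 |          3 |        0
 4 |          4 |        0
 5 |          5 |        0
 8 |          6 |        2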

delete duplicates in a table and update references

I have a table with an id. We recently added a new field containing a unique value calculated from an external source, which made us realize we actually have duplicates in the database:
Main Table
id | unique_id | ...
---|-----------|----
 4 | A         |
 5 | A         |
 6 | B         |
We can see: 5 is actually a duplicate of 4, as they both have the same unique_id.
Now this needs to be cleaned up.
Sadly I cannot simply delete those duplicates (5), as other tables depend on them:
Other Table (OtherTable.main_id REFERENCES MainTable.id)
id | main_id | ...
---|---------|-----
 1 |       4 | Blah
 2 |       5 |
 3 |       6 |
Now I have to clean up the duplicate references, i.e.:
UPDATE OtherTable SET main_id = 4 WHERE main_id = 5
How can I do that in an efficient update?
I tried simply updating every reference to the first row with the same unique_id; however, that didn't complete in a day.
UPDATE "OtherTable"
SET "main_id" = (SELECT "id"
                 FROM "MainTable"
                 WHERE "unique_id" = (SELECT "unique_id"
                                      FROM "MainTable"
                                      WHERE "id" = "OtherTable"."main_id")
                 LIMIT 1)
If it helps, the MainTable contains about 750,000 entries, the OtherTable contains 12,000,000 rows.
That's probably because the triple-nested select is quite inefficient.
For the simpler part, deleting the duplicates (once the references point to the first row of each kind), I found this query to work swiftly enough:
DELETE FROM MainTable
WHERE id IN
(SELECT id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY unique_id
ORDER BY id ) AS row_num
FROM MainTable ) t
WHERE t.row_num > 1 );
However, I need a way to update the references so they point to the surviving row of each set of duplicates.
Instead of an UPDATE with a nested query, I'd suggest using UPDATE ... FROM for a join, together with a window function similar to the one in your DELETE statement:
UPDATE "OtherTable" AS other
SET main_id = main.min_id
FROM (SELECT
id,
first_value(id) OVER (PARTITION BY unique_id ORDER BY id) AS min_id
FROM "MainTable"
) AS main
WHERE main.id = other.main_id
AND main.id <> main.min_id
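Since OtherTable references MainTable, it is safest to run the reference update and the duplicate deletion in a single transaction so that no other session ever sees dangling references. A sketch combining this UPDATE with the DELETE from the question:
BEGIN;

-- 1. repoint references to the surviving (lowest-id) row of each unique_id
UPDATE "OtherTable" AS other
SET main_id = main.min_id
FROM (SELECT id,
             first_value(id) OVER (PARTITION BY unique_id ORDER BY id) AS min_id
      FROM "MainTable") AS main
WHERE main.id = other.main_id
  AND main.id <> main.min_id;

-- 2. the duplicates are now unreferenced and can be deleted
DELETE FROM "MainTable"
WHERE id IN (SELECT id
             FROM (SELECT id,
                          ROW_NUMBER() OVER (PARTITION BY unique_id ORDER BY id) AS row_num
                   FROM "MainTable") t
             WHERE t.row_num > 1);

COMMIT;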

Postgresql selecting with limit equal values?

I have one PostgreSQL table where I store stories from different sites.
This table has story_id and site_id fields,
where story_id is the primary key and site_id is the id of the site the story came from.
I need to SELECT the latest 30 added stories from this table.
But I don't want to get more than 2 stories coming from the same site...
So if I have something like this:
story_id | site_id
---------|--------
       1 |       1
       2 |       1
       3 |       2
       4 |       1
       5 |       3
My result must be story_ids 1, 2, 3 and 5.
Story 4 must be skipped because I have already picked two ids with site_id 1.
select story_id,
       site_id
from (
    select story_id,
           site_id,
           row_number() over (partition by site_id order by story_id desc) as rn
    from the_table
) t
where rn <= 2
order by story_id desc
limit 30
If you want more or fewer than 2 entries "per group", you have to adjust the value in the outer WHERE clause.
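One caveat: order by story_id desc inside the window keeps the two newest stories per site, so for the sample data it would keep stories 4 and 2 for site 1. To reproduce the expected output of 1, 2, 3 and 5, which keeps the two oldest stories per site, order the window ascending instead:
row_number() over (partition by site_id order by story_id asc) as rn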