Remove duplicate records in Postgres where all records are duplicated

My Postgres table model has exact duplicate records, and I need to write a query to delete them.
id | model | model_id | dependent_on_model
-----+-------+----------+--------------------
1 | Card | 72 | Metric
1 | Card | 72 | Metric
2 | Card | 79 | Metric
2 | Card | 79 | Metric
3 | Card | 83 | Metric
3 | Card | 83 | Metric
5 | Card | 86 | Metric
Using a CTE is not helping, as I am getting the error
relation "cte" does not exist.
Please suggest a query that deletes the duplicate rows, so that I end up with just 4 distinct records.

My suggestion is to duplicate the table in a TEMPORARY TABLE WITH OIDS. This way you have some other id to distinguish the two identical rows.
Idea:
Duplicate the data, with an additional ID, in a temporary table.
Remove the duplicates in the temporary table.
Delete the rows from the actual table.
Copy the data back into the actual table from the temporary table.
Drop the temporary table.
You'll have to perform some destructive action on your actual table so make sure your TEMPORARY TABLE has what you want remaining before deleting anything from your actual table.
This is how you would create the TEMPORARY TABLE:
CREATE TEMPORARY TABLE dups_with_oids
( id integer
, model text
, model_id integer
, dependent_on_model text
) WITH OIDS;
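The step that fills the temporary table is not shown above; a minimal sketch could look like this. The name my_table is a placeholder, since the real table name is not clear from the question.

```sql
-- my_table is a hypothetical name; substitute the real table.
INSERT INTO dups_with_oids (id, model, model_id, dependent_on_model)
SELECT id, model, model_id, dependent_on_model
FROM my_table;

-- Sanity check: both counts should match before anything is deleted.
SELECT (SELECT count(*) FROM my_table)       AS original_rows,
       (SELECT count(*) FROM dups_with_oids) AS copied_rows;
```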
Here is the DELETE query:
WITH temp AS
(
SELECT d.id AS keep
, d.oid AS keep_oid
, d2.id AS del
, d2.oid AS del_oid
FROM dups_with_oids d
JOIN dups_with_oids d2 ON (d.id = d2.id AND d.oid < d2.oid)
)
DELETE FROM dups_with_oids d
WHERE d.oid IN (SELECT temp.del_oid FROM temp);
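The remaining steps (emptying the actual table, copying the data back, dropping the temporary table) might be sketched as follows. Again my_table is a placeholder name. Everything has to run in the same session that created the temporary table, because a temporary table is dropped automatically when the session ends.

```sql
-- my_table is a hypothetical name; substitute the real table.
BEGIN;

-- Empty the actual table, then refill it from the de-duplicated copy.
DELETE FROM my_table;

INSERT INTO my_table (id, model, model_id, dependent_on_model)
SELECT id, model, model_id, dependent_on_model
FROM dups_with_oids;

COMMIT;

-- The temporary copy is no longer needed.
DROP TABLE dups_with_oids;
```

Wrapping the delete and the insert in one transaction means a failure in between leaves the actual table unchanged.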
I should add that if id were a PRIMARY KEY or UNIQUE these duplicates wouldn't have been possible.
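Note that WITH OIDS was removed in PostgreSQL 12, so the temporary-table approach above only runs on version 11 or older. One alternative that needs no OIDs is the system column ctid. It identifies the physical location of each row version and therefore differs even between otherwise identical rows. The following is only a sketch: my_table is a placeholder name, and it assumes none of the four columns contains NULL, because NULLs never compare equal with =.

```sql
-- my_table is a hypothetical name; substitute the real table.
-- For every group of identical rows, keep the one with the smallest ctid
-- and delete the others.
DELETE FROM my_table a
USING my_table b
WHERE a.id = b.id
  AND a.model = b.model
  AND a.model_id = b.model_id
  AND a.dependent_on_model = b.dependent_on_model
  AND a.ctid > b.ctid;
```

Running it inside a transaction first (BEGIN, then the DELETE, then a SELECT to inspect the result, then ROLLBACK or COMMIT) is a cheap way to check that only the intended rows disappear.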

Related

How can I ensure that a join table is referencing two tables with a composite FK, one of the two column being in common on both tables?

I have 3 tables: employee, event, and, because these two are N-N, a third table employee_event.
The trick is that they can only be linked N-N within the same group.
employee
+----+-------+
| id | group |
+----+-------+
| 1  | A     |
| 2  | B     |
+----+-------+
event
+----+-------+
| id | group |
+----+-------+
| 43 | A     |
| 44 | B     |
+----+-------+
employee_event
+-------------+----------+
| employee_id | event_id |
+-------------+----------+
| 1           | 43       |
| 2           | 44       |
+-------------+----------+
So the combination employee_id=1 event_id=44 should not be possible, because employee from group A can not attend an event from group B. How can I secure my DB with this?
My first idea is to add the column employee_event.group so that I can make my two FK (composite) with employee_id + group and event_id + group respectively to the table employee and event. But is there a way to avoid adding a column in the join table for the only purpose of FKs?
Thx!
You may create a function and use it as a check constraint on table employee_event.
create or replace function groups_match (employee_id integer, event_id integer)
returns boolean language sql as
$$
-- "group" is a reserved word in PostgreSQL, so it must be double-quoted
select
(select "group" from employee where id = employee_id) =
(select "group" from event where id = event_id);
$$;
and then add a check constraint on table employee_event.
ALTER TABLE employee_event
ADD CONSTRAINT groups_match_check
CHECK (groups_match(employee_id, event_id));
Bear in mind that PostgreSQL evaluates a check constraint only when a row in employee_event is inserted or updated. Rows that used to be valid may therefore become invalid, yet remain in place, if the group of an employee or an event is changed later.
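For comparison, the composite-key idea from the question can be sketched as follows. It does add the extra column, but in exchange the database enforces the rule on every change, including later changes to employee or event. The constraint names and the text type of the new column are assumptions; the column type has to be compatible with the "group" columns in employee and event.

```sql
-- Composite foreign keys need a unique constraint on the referenced column pair.
ALTER TABLE employee ADD CONSTRAINT employee_id_group_key UNIQUE (id, "group");
ALTER TABLE event    ADD CONSTRAINT event_id_group_key    UNIQUE (id, "group");

-- Extra column in the join table (type assumed to be text).
ALTER TABLE employee_event ADD COLUMN "group" text;

-- Backfill existing rows from the employee side.
UPDATE employee_event ee
SET "group" = e."group"
FROM employee e
WHERE e.id = ee.employee_id;

ALTER TABLE employee_event ALTER COLUMN "group" SET NOT NULL;

-- Both keys share the same "group" column, so employee and event must agree on it.
ALTER TABLE employee_event
    ADD CONSTRAINT employee_event_employee_fk
        FOREIGN KEY (employee_id, "group") REFERENCES employee (id, "group"),
    ADD CONSTRAINT employee_event_event_fk
        FOREIGN KEY (event_id, "group") REFERENCES event (id, "group");
```

If existing rows already pair an employee and an event from different groups, the last statement fails, which points at the rows that need cleaning up first.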

Join Same Column from Same Table Twice

I am new to the SQL world. I would like to replace Games.home_team_id and Games.away_team_id with the corresponding entries in the Teams.name column.
First I start by initializing a small table of data:
CREATE TABLE Games (id INT, away_team_id INT, away_team_score INT, home_team_id INT, home_team_score INT);
CREATE TABLE
INSERT INTO Games (id,away_team_id,away_team_score,home_team_id,home_team_score)
VALUES
(1,1,1,2,4),
(2,1,3,3,2),
(3,1,1,4,1),
(4,2,0,3,2),
(5,2,3,4,1),
(6,3,5,4,2)
;
INSERT 0 6
Then I create a template of a reference table
CREATE TABLE Teams (id INT, name VARCHAR(63));
CREATE TABLE
INSERT INTO Teams (id, name)
VALUES
(1, 'Oogabooga FC'),
(2, 'FC Milawnchair'),
(3, 'Ron''s Footy United'),
(4, 'Pylon City FC')
;
INSERT 0 4
I would like to have the table displayed as such:
| id | away_team_name | away_team_score | home_team_name | home_team_score |
-----+----------------+-----------------+----------------+------------------
| 1 | Oogabooga FC | 1 | FC Milawnchair | 4 |
...
I managed to get a join query to show the first value from Teams.name in the away_team_name field using this JOIN:
SELECT
Games.id,
Teams.name AS away_team_name,
Games.away_team_score,
Teams.name AS home_team_name,
Games.home_team_score
FROM Games
JOIN Teams ON Teams.id = Games.away_team_id
;
| id | away_team_name | away_team_score | home_team_name | home_team_score |
-----+----------------+-----------------+----------------+------------------
| 1 | Oogabooga FC | 1 | Oogabooga FC | 4 |
...
But now I am stuck: when I join Teams twice, I get this error:
SELECT
Games.id,
Teams.name AS away_team_name,
Games.away_team_score,
Teams.name AS home_team_name,
Games.home_team_score
FROM Games
JOIN Teams ON Teams.id = Games.away_team_id
JOIN Teams ON Teams.id = Games.home_team_id
;
ERROR: table name "teams" specified more than once
How do you reference the same column of the same table twice in a join?
You need to specify an alias for at least one of the instances of the table; preferably both.
SELECT
Games.id,
Away.name AS away_team_name,
Games.away_team_score,
Home.name AS home_team_name,
Games.home_team_score
FROM Games
JOIN Teams AS Away ON Away.id = Games.away_team_id
JOIN Teams AS Home ON Home.id = Games.home_team_id
Explanation: Because you are joining to the same table twice, the DBMS (in your case, PostgreSQL) cannot tell which instance of the table you mean when you use its columns. The way to solve this is to assign an alias to each joined table, the same way you assign aliases to your columns. You can then state which of the joined instances you are referring to in your SELECT, JOIN and WHERE clauses.

Is it possible to select columns from table, update it and copy updated data to another table with postgres DB?

I want to select data from a table, modify that data without changing the original table, and copy the modified data to another table.
For example, I have a table named student:
student
id | name |isPresent
1 | xyz | 1
2 | lmn | 1
and I want the copied data to look like this:
id | name |isPresent
1 | xyz | 0
2 | lmn | 0
How could I do this with a query?
You could use an INSERT INTO ... SELECT:
INSERT INTO copyTable (id, name, isPresent)
SELECT id, name, 0
FROM yourTable;
This assumes that you want to copy all data in your original table (yourTable) into a new table (copyTable), with the requirement that all isPresent values be set to zero for that original data in the new table.
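If the target table does not exist yet, a possible variant is CREATE TABLE ... AS, which creates and fills the table in one statement. This is a sketch using the same placeholder names as above (copyTable, yourTable). Note that it copies only column names, types and data, not constraints, indexes or defaults, and that unquoted identifiers such as isPresent are folded to lower case by PostgreSQL.

```sql
-- copyTable and yourTable are the placeholder names used in the answer above.
-- Creates copyTable from the query result; fails if copyTable already exists.
CREATE TABLE copyTable AS
SELECT id, name, 0 AS isPresent
FROM yourTable;
```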

Delete Duplicate Data on PostgreSQL

How can I delete duplicate data from a table that holds data like the following?
I want to keep the row with the latest updated_at for each attribute id.
Like this:
attribute_id | created_at | product_id
1 | 2020-04-28 15:31:11 | 112235
4 | 2020-04-28 15:30:25 | 112235
1 | 2020-04-29 15:30:25 | 112236
4 | 2020-04-29 15:30:25 | 112236
You can use an EXISTS condition.
delete from the_table t1
where exists (select *
from the_table t2
where t2.created_at > t1.created_at
and t2.attribute_id = t1.attribute_id);
This deletes every row for which another row with the same attribute_id and a greater created_at exists, so only the row with the highest created_at is kept for each attribute_id. Note that if several rows share the highest created_at for an attribute_id, all of them are kept.
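If ties on created_at are possible, one way to still end up with exactly one row per attribute_id is to break ties with the system column ctid. This is a sketch rather than part of the answer above; which of the tied rows survives is arbitrary (the one with the largest ctid), and it assumes created_at is never NULL.

```sql
-- Delete a row if a "better" row exists for the same attribute_id:
-- either a newer one, or an equally new one with a larger ctid.
DELETE FROM the_table t1
WHERE EXISTS (
    SELECT 1
    FROM the_table t2
    WHERE t2.attribute_id = t1.attribute_id
      AND (t2.created_at > t1.created_at
           OR (t2.created_at = t1.created_at AND t2.ctid > t1.ctid))
);
```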

Postgresql remove values from foreign key that has a cyclic reference and also is referenced in a primary table

There are 2 tables:
The first one is the father table:
create table win_folder_principal(
id_folder_principal serial primary key not null,
folder_name varchar(300)not null
);
and the table that has a cyclic reference
create table win_folder_dependency(
id_folder_dependency serial primary key not null,
id_folder_father int not null,
id_folder_son int not null,
foreign key(id_folder_father)references win_folder_principal(id_folder_principal),
foreign key(id_folder_son)references win_folder_principal(id_folder_principal)
);
However, I found a very interesting situation: if I want to remove a row from the father table that has a child, and that child has more children, is there any way to remove the rows from the last to the first, and also remove those rows from the father table?
**WIN_FOLDER_PRINCIPAL**
| Id | Folder_Name|
| 23 | new2 |
| 24 | new3 |
| 13 | new0 |
| 22 | new1 |
| 12 | nFol |
And these are the values stored in Win_Folder_Dependency:
**WIN_FOLDER_DEPENDENCY**
| Id_Father | Id_Son |
| 12 | 13 |
| 13 | 22 |
| 22 | 23 |
| 23 | 24 |
And this is the query that I use to see the values in the dependency and principal tables:
SELECT m2.id_folder_principal AS "Principal",
m.folder_name AS "Dependency",
m2.id_folder_principal AS id_principal,
m.id_folder_principal AS id_dependency
FROM ((win_folder_dependency md
JOIN win_folder_principal m ON ((m.id_folder_principal = md.id_folder_son)))
JOIN win_folder_principal m2 ON ((m2.id_folder_principal = md.id_folder_father)))
If I want to remove the folder with Id_Principal 13, I need to remove the other relations that exist in the Folder_Dependency table, and also remove the rows from the Folder_Principal table.
Is there any way to achieve that cyclic delete?
This anonymous code block collects every folder in the dependency tree rooted at ID 13 into an array variable named l_principals. It then deletes all dependency records where the father or the son (or both) is contained in l_principals, and finally deletes the matching records from win_folder_principal:
DO $$
DECLARE
    l_principals int[];
BEGIN
    -- Walk down the dependency tree from folder 13, tracking each path.
    WITH RECURSIVE t1(root, child, principals) AS (
        SELECT id_folder_father
             , id_folder_son
             , ARRAY[id_folder_father, id_folder_son]
        FROM win_folder_dependency
        WHERE id_folder_father = 13
        UNION ALL
        SELECT t1.root
             , d.id_folder_son
             , t1.principals || d.id_folder_son
        FROM win_folder_dependency d
        JOIN t1
          ON d.id_folder_father = t1.child
         AND NOT d.id_folder_son = ANY (t1.principals) -- avoid cycles
    )
    -- Collect every folder found on any path, so branching trees are covered.
    SELECT array_agg(DISTINCT u.p) INTO l_principals
    FROM t1, unnest(t1.principals) AS u(p);

    DELETE FROM win_folder_dependency
    WHERE id_folder_father = ANY (l_principals)
       OR id_folder_son = ANY (l_principals);

    DELETE FROM win_folder_principal
    WHERE id_folder_principal = ANY (l_principals);
END $$;
With your provided sample data, the end result will be only one record remaining in the win_folder_principal and no records in the win_folder_dependency table.
If you want to delete a record from win_folder_principal, you must first remove the references to it in win_folder_dependency, like so:
delete from win_folder_dependency where 13 in (id_folder_father, id_folder_son);
before you delete the record from win_folder_principal like so:
delete from win_folder_principal where id_folder_principal = 13;
Alternatively if you build your second table like this:
create table win_folder_dependency(
id_folder_dependency serial primary key not null,
id_folder_father int not null,
id_folder_son int not null,
foreign key(id_folder_father)references win_folder_principal(id_folder_principal) on delete cascade,
foreign key(id_folder_son)references win_folder_principal(id_folder_principal) on delete cascade
);
Note the on delete cascade directives. With them in place, you can just delete from the principal table, and the rows in the dependency table that reference the deleted row will be deleted as well.
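If the dependency table already exists, the same effect can probably be reached by replacing its foreign keys instead of recreating the table. The sketch below assumes the constraints still carry PostgreSQL's default names (table, column, then the suffix _fkey); the actual names can be checked with \d win_folder_dependency in psql.

```sql
BEGIN;

-- Constraint names are assumed defaults; adjust them to what \d shows.
ALTER TABLE win_folder_dependency
    DROP CONSTRAINT win_folder_dependency_id_folder_father_fkey,
    ADD CONSTRAINT win_folder_dependency_id_folder_father_fkey
        FOREIGN KEY (id_folder_father)
        REFERENCES win_folder_principal (id_folder_principal)
        ON DELETE CASCADE;

ALTER TABLE win_folder_dependency
    DROP CONSTRAINT win_folder_dependency_id_folder_son_fkey,
    ADD CONSTRAINT win_folder_dependency_id_folder_son_fkey
        FOREIGN KEY (id_folder_son)
        REFERENCES win_folder_principal (id_folder_principal)
        ON DELETE CASCADE;

COMMIT;
```

The cascade only removes rows in win_folder_dependency; the son folders themselves stay in win_folder_principal unless they are deleted as well.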