Delete duplicate rows matching with other table - PostgreSQL

I have two tables
Table A     Table B
--------    ---------
a b c       a b c
a b c       a b c
a b c       a b c
e f g       a b c
h i j       e f g
k l m       k l m
            k l m
            x y z
            s t u
            a b c
            a b c
Now I want to remove rows from Table B that match Table A on columns 1, 2 and 3, so that the count of each duplicated row in Table B is less than or equal to its count in Table A.
So the output should be
Table A     Table B
--------    ---------
a b c       a b c
a b c       a b c
a b c       a b c
e f g       e f g
h i j       k l m
k l m       x y z
            s t u
I have tried using inner join and intersect but failed to get the desired result.

Try:
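DELETE FROM tableb
WHERE ctid IN (
    SELECT bb.ctid
    FROM (
        -- how many copies of each row exist in tablea
        SELECT a, b, c, count(*) AS cnt
        FROM tablea
        GROUP BY a, b, c
    ) aa
    JOIN (
        -- number the copies of each row in tableb: 1, 2, 3, ...
        SELECT ctid, a, b, c,
               row_number() OVER (PARTITION BY a, b, c) AS rn
        FROM tableb
    ) bb
      ON aa.a = bb.a
     AND aa.b = bb.b
     AND aa.c = bb.c
     AND aa.cnt < bb.rn   -- copy number exceeds the count in tablea
);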
demo: http://sqlfiddle.com/#!12/73e99/1
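To make the rule that this DELETE enforces easier to check, here is a minimal Python sketch of the same logic on the sample data from the question. It is only a model of the semantics, not something to run against the database; the function name `cap_duplicates` and the in-memory lists are assumptions made for illustration.

```python
from collections import Counter

def cap_duplicates(table_a, table_b):
    """Return the rows of table_b that would survive the DELETE.

    A row of table_b that also occurs in table_a is kept at most as many
    times as it occurs in table_a; rows that never occur in table_a are
    left alone (the join in the SQL does not match them).
    """
    allowed = Counter(table_a)   # plays the role of count(*) ... GROUP BY
    seen = Counter()             # plays the role of row_number()
    kept = []
    for row in table_b:
        seen[row] += 1
        if row in allowed and seen[row] > allowed[row]:
            continue             # this copy would be deleted
        kept.append(row)
    return kept

# Sample data from the question (hypothetical in-memory representation).
table_a = [("a", "b", "c")] * 3 + [("e", "f", "g"), ("h", "i", "j"), ("k", "l", "m")]
table_b = ([("a", "b", "c")] * 4
           + [("e", "f", "g"), ("k", "l", "m"), ("k", "l", "m"),
              ("x", "y", "z"), ("s", "t", "u")]
           + [("a", "b", "c")] * 2)

result = cap_duplicates(table_a, table_b)
for row in result:
    print(*row)
```

The sketch drops the later copies of each row, while the SQL numbers the copies in no particular order. Since the copies within a group are identical in columns a, b and c, the remaining rows are the same either way.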

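If the table isn't big, I think the simplest way is to delete all rows from TableB that exist in TableA and then insert TableA into TableB. Other approaches, IMHO, require at least a primary key in TableB. Note that this also copies into TableB any rows of TableA that TableB never had, such as h i j in the example. The columns of TableB must be qualified inside the subquery; an unqualified C1 would resolve to TableA.C1 and the condition would match every row.
DELETE FROM TableB
WHERE EXISTS (SELECT *
              FROM TableA
              WHERE TableB.C1 = TableA.C1
                AND TableB.C2 = TableA.C2
                AND TableB.C3 = TableA.C3);
INSERT INTO TableB SELECT * FROM TableA;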

Related

Finding set of rows in table based on matching rows from another table

I know the topic is a bit vague at best, but cannot find a way to describe my problem better...
An example, I have the following two tables:
TableA
IdA  Code  Value
123  A     1
123  B     2
123  C     3
456  A     4
456  F     6
456  E     7
...
TableB
IdB  Code  Value
X    A     1
X    B     2
X    C     3
Y    G     2
Y    D     8
Y    C     3
Z    A     1
Z    B     2
Z    C     3
Z    D     5
...
A set of records for a given IdA in TableA correlates to an equivalent set of records in TableB having a specific IdB.
For instance, for IdA = 123 in TableA, I have exactly three rows with certain codes and values, this would "map" to rows with IdB = X in TableB because it has the same combination of Codes and Values and the same number of rows. Note that it would not map to IdB = Z in TableB, because it has an additional row for Code D which IdA = 123 doesn't have in TableA.
Given only IdA, how to best write a query to find IdB?
If the codes and values were known, I could have done something similar to this:
SELECT b.IdB FROM TableB b
WHERE
EXISTS(SELECT * FROM TableB x WHERE x.IdB = b.IdB AND x.Code = 'A' AND x.Value = '1') AND
EXISTS(SELECT * FROM TableB x WHERE x.IdB = b.IdB AND x.Code = 'B' AND x.Value = '2') AND
EXISTS(SELECT * FROM TableB x WHERE x.IdB = b.IdB AND x.Code = 'C' AND x.Value = '3') AND
(SELECT COUNT(*) FROM TableB x WHERE x.IdB = b.IdB) = 3
But now I'm only given a value for IdA, so I need to look up values from TableA and combine that in the query for TableB. Any clever ideas on how to tackle this?
This is a question of Relational Division Without Remainder.
There are many solutions, here is one:
Take TableB and left join TableA to it
But calculate a total over the whole set of values from A
Group by IdB
Filter so that only groups remain where the row count for the IdB equals the number of rows matched in A (COUNT(a.IdA) counts only non-nulls) and also equals the total number of rows we want to match.
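DECLARE @idA int = 123;
SELECT
    b.IdB
FROM TableB b
LEFT JOIN (
    SELECT *,
        total = COUNT(*) OVER ()   -- number of rows in A for this IdA
    FROM TableA a
    WHERE a.IdA = @idA
) a ON b.Code = a.Code AND b.Value = a.Value
GROUP BY
    b.IdB
HAVING COUNT(*) = COUNT(a.IdA)     -- every row of this IdB matched a row in A
   AND COUNT(*) = MIN(a.total);    -- and the group is as large as the set in A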
db<>fiddle
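The same idea can be checked outside the database. The following Python sketch, using the sample rows from the question, groups TableB by IdB and keeps the groups whose (Code, Value) pairs are exactly those of the given IdA. The function name `find_matching_idb` and the in-memory lists are assumptions for illustration, and the sketch assumes that a (Code, Value) pair is not repeated within one id.

```python
from collections import defaultdict

def find_matching_idb(table_a, table_b, id_a):
    """Return every IdB whose (Code, Value) rows equal those of id_a."""
    wanted = {(code, value) for ida, code, value in table_a if ida == id_a}
    if not wanted:
        return []            # unknown IdA: nothing to match against

    groups = defaultdict(set)
    for idb, code, value in table_b:
        groups[idb].add((code, value))

    # Set equality rules out both missing rows and extra rows (no remainder).
    return sorted(idb for idb, pairs in groups.items() if pairs == wanted)

# Sample data from the question (hypothetical in-memory representation).
table_a = [(123, "A", 1), (123, "B", 2), (123, "C", 3),
           (456, "A", 4), (456, "F", 6), (456, "E", 7)]
table_b = [("X", "A", 1), ("X", "B", 2), ("X", "C", 3),
           ("Y", "G", 2), ("Y", "D", 8), ("Y", "C", 3),
           ("Z", "A", 1), ("Z", "B", 2), ("Z", "C", 3), ("Z", "D", 5)]

print(find_matching_idb(table_a, table_b, 123))   # ['X']
print(find_matching_idb(table_a, table_b, 456))   # []
```

IdB = Z is rejected because of its extra D row, and IdB = Y because only one of its rows matches, which mirrors the two conditions in the HAVING clause.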

KDB get only the rows present only in one table

I have two tables, table1 and table2. Both have 4 columns with the same column names. table1 has 50 rows and table2 has 100 rows. How can I get only those rows from table2 that are not in table1? I tried performing a left join, but could not get it to work, since we can't do a left join using all columns.
Since tables are lists of dictionaries, you could use the except keyword to exclude all rows from table2 which are found in table1.
For example:
q)table1:([]a:til 3;b:3#.Q.a;c:3#.Q.A)
q)table1
a b c
-----
0 a A
1 b B
2 c C
q)table2:([]a:til 6;b:6#.Q.a;c:6#.Q.A)
q)table2
a b c
-----
0 a A
1 b B
2 c C
3 d D
4 e E
5 f F
q)table2 except table1
a b c
-----
3 d D
4 e E
5 f F

How to get distinct values from table in Postgres

I have a table with 20 columns, and I would like to get the distinct values of each column.
So if I have
A B C D ....
----------
z c c d
z f c f
a c f d
z c c d
b f b d
z c a d
I want to get back
{ 'A': [z,a,b],
'B': [c,f],
'C': [c,f,b,a],
'D': [d,f],
....
}
What would the query look like?
Maybe you need something like this?
SELECT array_agg(DISTINCT a) AS a,
       array_agg(DISTINCT b) AS b,
       array_agg(DISTINCT c) AS c,
       array_agg(DISTINCT d) AS d
FROM test;
fiddle
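For comparison, the result the asker describes can be modelled in a few lines of Python. This sketch is only illustrative: the function name `distinct_per_column` and the in-memory rows are assumptions, and it keeps values in order of first appearance, as in the question's expected output, whereas the order of the elements returned by the query may differ.

```python
def distinct_per_column(columns, rows):
    """Map each column name to its distinct values, in first-seen order."""
    result = {name: [] for name in columns}
    for row in rows:
        for name, value in zip(columns, row):
            if value not in result[name]:
                result[name].append(value)
    return result

# Sample data from the question (first four columns only).
columns = ["A", "B", "C", "D"]
rows = [
    ("z", "c", "c", "d"),
    ("z", "f", "c", "f"),
    ("a", "c", "f", "d"),
    ("z", "c", "c", "d"),
    ("b", "f", "b", "d"),
    ("z", "c", "a", "d"),
]

print(distinct_per_column(columns, rows))
```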

Full Join on multiple tables (postgresql)

In PostgreSQL I have 4 tables:
Table A:
-----------
a_id  a_date
Table B:
-----------
a_id  b_id
Table C:
-------------------
c_id  b_id  invoice_number
Table D:
-------------------
d_id  invoice_number  value_D
Multiple records have value_D
I would like to select from Table A, Table B, Table C and Table D, where a_date BETWEEN X AND Y.
However, I would also like to select all the other value_D that are not included in my selection (so A inner join B inner join C full outer join D).
My code:
SELECT
Table A, Table B, Table C, Table D
FROM
Table A
JOIN
Table B ON A.a_id = B.a_id
JOIN
Table C ON B.b_id = C.b_id
FULL OUTER JOIN
Table D ON C.invoice_number = D.invoice_number
WHERE
A.a_date BETWEEN X AND Y;
It only shows D.value_D for the A.a_id rows where A.a_date is BETWEEN X AND Y.
However, I would like D.value_D to be shown as well for the A.a_id rows where A.a_date falls outside that range.
I am kinda a newbie, so hopefully it is understandable and you could help me.
Thanks in advance
You can also add more conditions to the WHERE clause, for example:
"a_date BETWEEN X AND Y OR a_date > '2015-04-21'". This will retrieve the union of the rows matching either condition.
Regards
I think I solved it.
SELECT Table A, Table B, Table C, Table D
FROM Table A
JOIN Table B ON A.a_id = B.a_id
JOIN Table C ON B.b_id = C.b_id
JOIN Table D ON C.invoice_number = D.invoice_number
WHERE A.a_date BETWEEN X AND Y OR
D.value_D IN (SELECT D.value_D
FROM Table D
JOIN Table C ON D.invoice_number = C.invoice_number
JOIN Table B ON C.b_id = B.b_id
JOIN Table A ON B.a_id = A.a_id
WHERE A.a_date BETWEEN X AND Y);
Thank you all for the help guys!

JOIN triangle relationship

The relationship:
Profil>Branch>City
Profil>Hotel>City
command:
from p in Profil.getData()
join b in Branch.getData() on p equals b
join h in Hotel.getData() on p equals h
join c in City.getData()
^how to reuse the equals join
Can I join City to Branch and Hotel table?
Can I just clone the c without City.getData()?
Sure, just do (following your pseudo code):
from p in Profil.getData()
join b in Branch.getData() on p equals b
join h in Hotel.getData() on p equals h
join c1 in City.getData() on b equals c1
join c2 in City.getData() on h equals c2
EF will translate this into two aliases for the City table. So you don't reuse or clone the cities, but in SQL that's not possible either. SQL Server will be able to optimize it in the query's execution plan, though.