Self join in Django - django-orm

To perform a query with self-join on the following table:
> select * from test_users
id email
--- -----------
1 "a#abc.com"
2 "a#abc.com"
3 "b#abc.com"
I can use SQL:
> select u1.id u1id, u2.id u2id from test_users u1 inner join test_users u2 on u1.email=u2.email and u1.id !=u2.id
u1id u2id
----- ------
1 2
2 1
Question:
How can I write this in Django ORM?
If I want to remove the duplicate so that I get only 1 row in above example, how can I achieve that in Django?

You can retrieve the Users for which there is another item with a primary key that is greater than (or less than) the primary key of the user with an Exists subquery [Django-doc]:
from django.db.models import Exists, OuterRef

# Users that have a "twin" with the same email and a greater primary key
Users.objects.filter(
    Exists(
        Users.objects.filter(
            pk__gt=OuterRef('pk'),
            email=OuterRef('email')
        )
    )
)
If you thus call .delete() on these, you will remove all Users for which another Users object with the same email and a greater primary key exists, so only the row with the greatest primary key per email remains.
Prior to django-3.0, one should move the Exists subquery to an .annotate(…) clause, and then filter on this:
from django.db.models import Exists, OuterRef

# Same condition, but annotated first and filtered afterwards
Users.objects.annotate(
    has_other=Exists(
        Users.objects.filter(
            pk__gt=OuterRef('pk'),
            email=OuterRef('email')
        )
    )
).filter(has_other=True)
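For the first part of the question (one row per duplicate pair instead of two), a common SQL trick is to replace u1.id != u2.id with u1.id < u2.id in the join condition. The sketch below checks this on the sample data with Python's built-in sqlite3 module, and also runs an EXISTS query of the same shape as the ORM filter above. Both statements are written by hand for illustration; they are not necessarily the SQL that Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO test_users (id, email) VALUES (?, ?)",
    [(1, "a#abc.com"), (2, "a#abc.com"), (3, "b#abc.com")],
)

# Self-join that keeps each pair once: u1.id < u2.id instead of u1.id != u2.id.
pairs = conn.execute(
    """
    SELECT u1.id, u2.id
    FROM test_users u1
    INNER JOIN test_users u2
        ON u1.email = u2.email AND u1.id < u2.id
    ORDER BY u1.id, u2.id
    """
).fetchall()

# Hand-written EXISTS query mirroring pk__gt=OuterRef('pk'), email=OuterRef('email').
with_later_duplicate = conn.execute(
    """
    SELECT u.id
    FROM test_users u
    WHERE EXISTS (
        SELECT 1 FROM test_users v
        WHERE v.id > u.id AND v.email = u.email
    )
    ORDER BY u.id
    """
).fetchall()

print(pairs)
print(with_later_duplicate)
```

With the three sample rows, the join yields a single pair and the EXISTS query selects only the user with the lower id.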

Related

How can I flatten a case sensitive column that is also used in a relation

Hello postgres experts,
I have a users table
id email
-- ----------------
1  john#example.com
2  John#example.com
And a posts table
id userId content
-- ------ -------
1  1      foo
2  2      bar
The emails of the two users are the same when ignoring case so I am looking for a way to drop the rows that have duplicated emails and update the userId in the posts table to point to the user that remains.
So the final result will be:
id email
-- ----------------
1  john#example.com

id userId content
-- ------ -------
1  1      foo
2  1      bar
I'm not concerned which version of the email address I end up with (i.e. doesn't have to be the one that's all lowercase).
What's the best way to get this done?
You can update the posts table by taking the smallest id:
update posts p
set userid = u.user_id
from (
select min(id) user_id, array_agg(id) as user_ids
from users u
group by lower(email)
having count(*) > 1
) u
where p.userid = any(u.user_ids)
and p.userid <> u.user_id
;
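The SELECT in the derived table returns, for each email address (compared case-insensitively) that is shared by more than one user, the smallest user ID and an array of all user IDs with that address. The UPDATE then repoints the posts of the other IDs to the smallest one. Once that is done, you can delete the users that are no longer used. Note that this DELETE removes every user without posts, not only the duplicates, so it is safe only if every user you want to keep has at least one post: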
delete from users
where not exists (select *
from posts p
where users.id = p.userid);
The key to deduplicating rows is breaking ties by some kind of row ID, which you already have. We're going to keep the user with the lowest ID for each case-insensitive email, keeping in mind that there may be more than 2 duplicates for some.
First, for each post, set the user to any user with an equivalent email, for which there exists no other user also with an equivalent email but a lower ID. If I'm doing this right, that should match exactly one user row every time, either the original user or another one.
UPDATE posts p SET "userId" = u2.id
FROM users u, users u2
WHERE u.id = p."userId"
AND lower(u2.email) = lower(u.email)
AND NOT EXISTS (
SELECT 1
FROM users u3
WHERE u3.id < u2.id
AND lower(u3.email) = lower(u2.email)
);
(You could also do this with a MIN or DISTINCT subquery, but in my experience this is the fastest.)
Then delete any users for which there exists a user with an equivalent email and a lower ID:
DELETE FROM users u
WHERE EXISTS (
SELECT 1 FROM users u2
WHERE u2.id < u.id
AND lower(u2.email) = lower(u.email)
);
Optionally, seal the deal with a uniqueness constraint on lower-case email. I don't remember the exact syntax, but this should be close:
CREATE UNIQUE INDEX user_lower_email ON users (lower(email));
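The three steps (repoint posts, delete the duplicate users, add a unique index) can be tried end to end on the sample data. This is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL, so step 1 is written as a correlated MIN subquery instead of the UPDATE ... FROM form shown above; the table layout is assumed from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, userId INTEGER, content TEXT);
    INSERT INTO users VALUES (1, 'john#example.com'), (2, 'John#example.com');
    INSERT INTO posts VALUES (1, 1, 'foo'), (2, 2, 'bar');

    -- Step 1: repoint each post to the lowest user id sharing its author's email.
    UPDATE posts
    SET userId = (
        SELECT MIN(u2.id)
        FROM users u
        JOIN users u2 ON lower(u2.email) = lower(u.email)
        WHERE u.id = posts.userId
    );

    -- Step 2: delete every user that has a lower-id twin.
    DELETE FROM users
    WHERE EXISTS (
        SELECT 1 FROM users u2
        WHERE u2.id < users.id AND lower(u2.email) = lower(users.email)
    );

    -- Step 3: prevent new case-insensitive duplicates.
    CREATE UNIQUE INDEX user_lower_email ON users (lower(email));
    """
)

users = conn.execute("SELECT id, email FROM users ORDER BY id").fetchall()
posts = conn.execute("SELECT id, userId, content FROM posts ORDER BY id").fetchall()

# The unique index should now reject another spelling of the same address.
try:
    conn.execute("INSERT INTO users VALUES (3, 'JOHN#example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(users, posts, duplicate_rejected)
```

The order matters: the posts have to be repointed before the duplicate users are deleted, and the unique index can only be created once the duplicates are gone.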

Merge two tables in Postgresql giving preference to one particular table

I have two tables, Users and Masters. Users holds user-specific setting key-value pairs. Masters holds the master setting key-value pairs. I want to display key-value pairs from the two tables, where:
1. if the user does not have a particular key, it is taken from Masters
2. if the user does not exist in the Users table at all, all key-value pairs from Masters are displayed
3. if the user has a value for a key, the user's value is displayed
Example:
Inputs being - UserID and appID = 1.
I tried a combination of left joins, but I do not get the desired result if the user does not exist at all in the Users table.
Could you please give me some advice?
SELECT
COALESCE(m.app_id, u.app_id) as app_id,
COALESCE(m.setting_key, u.setting_key) as setting_key,
COALESCE(u.setting_value, m.setting_value) as setting_value -- 2
FROM
master_table m
FULL OUTER JOIN -- 1
user_table u
ON m.app_id = u.app_id AND m.setting_key = u.setting_key
WHERE COALESCE(m.app_id, u.app_id) = 1 -- 3
AND (u.user_id = 1 OR u.user_id IS NULL)
1. You need a FULL OUTER JOIN to also keep the rows of either table that have no match in the other table.
2. COALESCE(a, b) gives you the first non-null value. So, if a (here the user value) is available, it will be returned. Otherwise b (here the master value) is returned.
3. Filter by app_id and user_id; the user_id filter must also accept user_id IS NULL to get all setting_keys. Of course, you could use COALESCE here as well: COALESCE(u.user_id, 1), where 1 is the specific user_id you are asking for.
Edit: If User does not exist, give out the Masters values for app_id:
SELECT DISTINCT ON (app_id, setting_key)                            -- 3
    *
FROM (
    SELECT
        COALESCE(user_app_id, master_app_id) AS app_id,             -- 2
        COALESCE(user_setting_key, master_setting_key) AS setting_key,
        COALESCE(user_setting_value, master_setting_value) AS setting_value,
        user_id
    FROM (
        SELECT
            app_id as master_app_id,
            setting_key as master_setting_key,
            setting_value as master_setting_value,
            null as user_id,
            null as user_app_id,
            null as user_setting_key,
            null as user_setting_value
        FROM
            master_table m
        UNION                                                       -- 1
        SELECT
            *
        FROM
            master_table m
        FULL OUTER JOIN
            user_table u
        ON m.app_id = u.app_id AND m.setting_key = u.setting_key
    ) s
) s
WHERE app_id = 1
    AND (user_id = 2 OR user_id IS NULL)
ORDER BY app_id, setting_key, user_id NULLS LAST                    -- 3
This is a little more complicated. You need a separate data set with user_id IS NULL that can be fetched as a fallback. So, the NULL user represents the unknown user.
1. You can achieve this by adding the master table with NULL values in the user columns using a UNION.
2. Now you can create the expected columns with the COALESCE() functions as described above.
3. The third trick is using the DISTINCT ON clause on the app_id and setting_key columns. Because the rows with a NULL user_id from the UNION part in (1) are ordered last, DISTINCT ON picks the user record when one exists. When the user does not exist, DISTINCT ON picks the default master record.
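If every setting key is guaranteed to exist in the master table, a simpler variant than the queries above is a LEFT JOIN from master_table with the user_id filter moved into the join condition. The following is a minimal sketch of that variant with Python's built-in sqlite3 module; the column layout follows the queries above, while the sample rows and the helper name settings_for are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE master_table (app_id INTEGER, setting_key TEXT, setting_value TEXT);
    CREATE TABLE user_table (user_id INTEGER, app_id INTEGER,
                             setting_key TEXT, setting_value TEXT);
    -- Hypothetical sample data: user 1 overrides 'theme' only; user 2 has no rows.
    INSERT INTO master_table VALUES (1, 'theme', 'light'), (1, 'lang', 'en');
    INSERT INTO user_table VALUES (1, 1, 'theme', 'dark');
    """
)


def settings_for(user_id, app_id):
    """Return {key: value}, preferring the user's value over the master value."""
    rows = conn.execute(
        """
        SELECT m.setting_key, COALESCE(u.setting_value, m.setting_value)
        FROM master_table m
        LEFT JOIN user_table u
            ON u.app_id = m.app_id
           AND u.setting_key = m.setting_key
           AND u.user_id = ?   -- filter in ON, so unmatched master rows survive
        WHERE m.app_id = ?
        """,
        (user_id, app_id),
    ).fetchall()
    return dict(rows)


print(settings_for(1, 1))  # existing user with one override
print(settings_for(2, 1))  # unknown user falls back to master values
```

Because the user filter sits in the ON clause rather than in WHERE, a user without any rows still gets one result row per master key. Keys that exist only in user_table are not returned by this variant, which is why the FULL OUTER JOIN is needed when that case can occur.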

Show records that have only one matching row in another table

I need to write some SQL that is probably very simple, but I am very new to it.
I need to find all the records from one table that have a matching id (but no more than one) in the other table. For example, one table contains the employee records and the second one contains the employees' telephone numbers. I need to find all employees with only one telephone number.
Sample data would be nice. In absence of:
SELECT
employees.employee_id
FROM
employees
LEFT JOIN
(SELECT employee_id FROM emp_phone GROUP BY employee_id HAVING count(*) = 1) AS phone -- employees with exactly one phone row
ON
employees.employee_id = phone.employee_id
WHERE
phone.employee_id IS NOT NULL;
You need a join of the 2 tables, group by employee and the condition in the having clause:
SELECT e.employee_id, e.name
FROM employees e INNER JOIN numbers n
ON e.employee_id = n.employee_id
GROUP BY e.employee_id, e.name
HAVING COUNT(*) = 1;
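As a quick check of the GROUP BY / HAVING approach, here is a minimal sketch with Python's built-in sqlite3 module. The table and column names follow the query above; the three employees and their numbers are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE numbers (employee_id INTEGER, tel_number TEXT);
    -- Hypothetical data: Ann has one number, Bob has two, Cy has none.
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO numbers VALUES (1, '555-0100'), (2, '555-0101'), (2, '555-0102');
    """
)

# The inner join drops employees without numbers; HAVING keeps groups of size 1.
one_number = conn.execute(
    """
    SELECT e.employee_id, e.name
    FROM employees e
    INNER JOIN numbers n ON e.employee_id = n.employee_id
    GROUP BY e.employee_id, e.name
    HAVING COUNT(*) = 1
    """
).fetchall()

print(one_number)
```

Only Ann is returned: Bob is filtered out by HAVING, and Cy never appears because the inner join finds no matching number.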
If there can be more than a few numbers per employee in the table with the employees' telephone numbers (calling it tel), then it's cheaper to avoid GROUP BY and HAVING which has to process all rows. Find employees with "unique" numbers using a self-anti-join with NOT EXISTS.
While you don't need more than the employee_id and their unique phone number, you don't even have to involve the employee table at all:
SELECT *
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
);
If you need additional columns from the employee table:
SELECT * -- or any columns you need
FROM (
SELECT employee_id AS id, tel_number -- or any columns you need
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
)
) t
JOIN employee e USING (id);
The column alias in the subquery (employee_id AS id) is just for convenience. Then the join condition in the outer query can be USING (id), and the ID column is only included once in the result, even with SELECT * ...
This would be simpler with a naming convention that uses employee_id for the employee ID everywhere, but it's a widespread anti-pattern to use employee.id instead.
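The NOT EXISTS anti-join can be sketched the same way with Python's built-in sqlite3 module. The tel table follows the answer; the rows are made-up sample data. One deviation is an assumption about SQLite: the empty select list (SELECT FROM ...) used above is PostgreSQL syntax, so the subquery here uses SELECT 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tel (employee_id INTEGER, tel_number TEXT);
    -- Hypothetical data: employee 1 has one number, employee 2 has two.
    INSERT INTO tel VALUES (1, '555-0100'), (2, '555-0101'), (2, '555-0102');
    """
)

# Keep a row only if no other row exists for the same employee
# with a different number.
single = conn.execute(
    """
    SELECT t.employee_id, t.tel_number
    FROM tel t
    WHERE NOT EXISTS (
        SELECT 1 FROM tel t2
        WHERE t2.employee_id = t.employee_id
          AND t2.tel_number <> t.tel_number
    )
    ORDER BY t.employee_id
    """
).fetchall()

print(single)
```

Note that if the same number were stored twice for one employee, both rows would pass the tel_number <> test; comparing on a primary key column instead, as the comment in the answer suggests, avoids that.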

Returning NULL if a value is not found in an array

I'd like to check if a value exists in an array, and if it does not, I'd like to return a NULL row for it, instead of no row.
SELECT
users.id
FROM
users
WHERE
users.name = ANY('{ John, avocado, Carl }'::text[]);
Currently returns
id
1
2
I'd like it to return
id
1
NULL
2
Since avocado is not present in our users table.
SELECT users.id
FROM unnest('{John, avocado, Carl}'::text[]) WITH ORDINALITY AS a(name, num)
LEFT JOIN users USING (name)
ORDER BY a.num;
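id
----
2

1
(3 rows)
The empty second row is the NULL returned for avocado (psql prints NULL as an empty string by default).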
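The same idea (turn the searched values into rows with a position, then LEFT JOIN the table onto them) can be tried outside PostgreSQL. The sketch below uses Python's built-in sqlite3 module, which has no unnest(), so a temporary table named wanted stands in for unnest(...) WITH ORDINALITY; that table name and the ids of the two sample users are assumptions taken from the question's expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users VALUES (1, 'John'), (2, 'Carl');
    """
)

names = ["John", "avocado", "Carl"]

# One (position, name) row per requested name, like unnest ... WITH ORDINALITY.
conn.execute("CREATE TEMP TABLE wanted (num INTEGER, name TEXT)")
conn.executemany("INSERT INTO wanted VALUES (?, ?)", list(enumerate(names, start=1)))

# LEFT JOIN keeps every requested name; missing users show up as NULL (None).
ids = [
    row[0]
    for row in conn.execute(
        "SELECT users.id FROM wanted LEFT JOIN users USING (name) ORDER BY wanted.num"
    )
]

print(ids)
```

The list has one entry per requested name, in the order the names were given, with None in the position of the name that has no matching user.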

Identifying rows with multiple IDs linked to a unique value

Using ms-sql 2008 r2; am sure this is very straightforward. I am trying to identify where a unique value {ISIN} has been linked to more than 1 Identifier. An example output would be:
isin entity_id
XS0276697439 000BYT-E
XS0276697439 000BYV-E
This is actually an error and I want to look for other instances where there may be more than one entity_id linked to a unique ISIN.
This is my current working but it's obviously not correct:
select isin, entity_id from edm_security_entity_map
where isin is not null
--and isin = ('XS0276697439')
group by isin, entity_id
having COUNT(entity_id) > 1
order by isin asc
Thanks for your help.
Elliot,
I don't have a copy of SQL in front of me right now, so apologies if my syntax isn't spot on.
I'd start by finding the duplicates:
select
x.isin
,count(*)
from edm_security_entity_map as x
group by x.isin
having count(*) > 1
Then join that back to the full table to find where those duplicates come from:
;with DuplicateList as
(
select
x.isin
--,count(*) -- not used elsewhere
from edm_security_entity_map as x
group by x.isin
having count(*) > 1
)
select
map.isin
,map.entity_id
from edm_security_entity_map as map
inner join DuplicateList as dup
on dup.isin = map.isin;
HTH,
Michael
So you're saying that if isin-1 has a row for both entity-1 and entity-2 that's an error, but isin-3, say, linked to entity-3 in two separate rows is OK? The ugly-but-readable solution to that is to prepend another CTE to the previous solution:
;with UniqueValues as
(select distinct
y.isin
,y.entity_id
from edm_security_entity_map as y
)
,DuplicateList as
(
select
x.isin
--,count(*) -- not used elsewhere
from UniqueValues as x
group by x.isin
having count(*) > 1
)
select
map.isin
,map.entity_id
from edm_security_entity_map as map -- or from UniqueValues, depending on your objective.
inner join DuplicateList as dup
on dup.isin = map.isin;
There are better solutions with additional GROUP BY clauses in the final query. If this is going into production I'd be recommending that. Or if your table has a bajillion rows. If you just need to do some analysis the above should suffice, I hope.
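One possible alternative, not necessarily the one the answer has in mind, is to count distinct entity_id values per ISIN directly with HAVING COUNT(DISTINCT entity_id) > 1. The sketch below tries this with Python's built-in sqlite3 module rather than SQL Server; the first two rows come from the question, while the second ISIN and its entity_id are invented to cover the "same pair stored twice" case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE edm_security_entity_map (isin TEXT, entity_id TEXT);
    INSERT INTO edm_security_entity_map VALUES
        ('XS0276697439', '000BYT-E'),
        ('XS0276697439', '000BYV-E'),
        -- Hypothetical rows: the same (isin, entity_id) pair stored twice.
        ('XS0000000001', '000AAA-E'),
        ('XS0000000001', '000AAA-E');
    """
)

# ISINs linked to more than one distinct entity_id, with the entities involved.
conflicts = conn.execute(
    """
    SELECT DISTINCT m.isin, m.entity_id
    FROM edm_security_entity_map AS m
    WHERE m.isin IN (
        SELECT isin
        FROM edm_security_entity_map
        WHERE isin IS NOT NULL
        GROUP BY isin
        HAVING COUNT(DISTINCT entity_id) > 1
    )
    ORDER BY m.isin, m.entity_id
    """
).fetchall()

print(conflicts)
```

Counting distinct values means an ISIN that merely repeats the same entity_id is not reported, which matches the distinction drawn in the second answer.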