Postgres - Find duplicate values after lowering the values

Hello StackOverflow users... I have a tricky situation and I have yet to find an answer. Maybe you can help me.
Database: PostgreSQL 8.4 (can't upgrade)
In this database there is a users table. Sadly, the usernames that users provide when they create a profile are case-sensitive, so a username of Alex is not the same as a username of alex.
A new system is going out in which usernames are no longer case-sensitive. I'm trying to find all of the usernames in the old system that would be considered duplicates. That way we can reach out, have those users update their usernames manually, and then migrate them to the new system without username conflicts.
I have the following query, which shows how many users share each username once it is lowercased with the lower() function:
select count(*), lower(username)
from users
where deleted = false
group by lower(username) having count(*) > 1
This returns results like the following:
|count|lower |
|-----+--------+
|3 |alex |
|2 |george |
What I need to do is get this data into a temp table and display all of those duplicate users and other details so that we have a list to go through.
I have part of the temp table figured out, but my main issue is: How do I get the distinct values of all of these duplicates? So in the long run, I get results that look like the following (and maybe even without a temp table if possible):
|lower |username|
|-------+--------+
|alex |Alex |
|alex |alex |
|george |georGe |
|george |George |
Restrictions:
I can't change the version of postgres from 8.4
Some duplicates will have more than 2 hits (the most I've seen so far is 3)
Since the users must be informed, we can't change the data without contacting them first (which is why the list is needed)
I appreciate any suggestions/feedback you may be able to provide.

How about this. Just generate your above list as a CTE, then join with it in the main query:
WITH dups AS (
SELECT lower(username) uname, count(*) ucount
FROM users
WHERE deleted = false
GROUP BY lower(username) HAVING count(*) > 1)
SELECT username, uname, ucount
FROM users INNER JOIN dups ON lower(username) = uname
WHERE deleted = false
ORDER BY ucount DESC, uname ASC;
username | uname | ucount
----------+--------+--------
Alex | alex | 3
alex | alex | 3
ALEX | alex | 3
GeorGe | george | 2
george | george | 2
(5 rows)
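If you want to sanity-check the CTE-plus-join query without an 8.4 server, here is a minimal runnable sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table layout (an integer id primary key), the sample rows, and storing deleted as 0/1 (so `deleted = false` becomes `deleted = 0`) are assumptions for illustration; id is added to the ORDER BY only to make the order within each group deterministic.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the query only uses portable SQL
# (lower(), WITH, GROUP BY ... HAVING, INNER JOIN). deleted is stored as 0/1.
DUPLICATES_SQL = """
WITH dups AS (
    SELECT lower(username) AS uname, count(*) AS ucount
    FROM users
    WHERE deleted = 0
    GROUP BY lower(username) HAVING count(*) > 1)
SELECT username, uname, ucount
FROM users INNER JOIN dups ON lower(username) = uname
WHERE deleted = 0
ORDER BY ucount DESC, uname ASC, id ASC
"""


def make_db():
    """Build an in-memory users table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, deleted INTEGER)")
    conn.executemany(
        "INSERT INTO users (id, username, deleted) VALUES (?, ?, ?)",
        [
            (1, "Alex", 0),
            (2, "alex", 0),
            (3, "ALEX", 0),
            (4, "GeorGe", 0),
            (5, "george", 0),
            (6, "bob", 0),  # only one live "bob", so it is not a duplicate
            (7, "Bob", 1),  # deleted rows are ignored by both WHERE clauses
        ],
    )
    return conn


def find_duplicates(conn):
    """Return (username, lowercased name, group size) for every clashing user."""
    return conn.execute(DUPLICATES_SQL).fetchall()


if __name__ == "__main__":
    for row in find_duplicates(make_db()):
        print(row)
```
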
Or even simpler if you only want a bare list of the affected users:
SELECT username
FROM users
WHERE deleted = false AND lower(username) IN (
SELECT lower(username)
FROM users
WHERE deleted = false
GROUP BY lower(username) HAVING count(*) > 1)
ORDER BY lower(username) ASC;
username
----------
Alex
alex
ALEX
GeorGe
george
(5 rows)

I would usually use string_agg, but it isn't available in 8.4 (it was added in 9.0). A workaround is array_to_string(array_agg(...), ','), since array_agg does exist in 8.4. Note that I haven't tested this because I don't have a local copy of 8.4 handy, but something like this should work:
select
(max(u1.username)),
array_to_string(array_agg(u2.username), ',') as duplicates
from users u1
inner join users u2 on u1.id < u2.id
and lower(u1.username) = lower(u2.username)
left join users u3 on u1.id > u3.id
and lower(u1.username) = lower(u3.username)
and u3.deleted = false
where u1.deleted = false
and u2.deleted = false
and u3.id is null
group by u1.id;
This gets the "earliest" user by ID (assuming there is a primary key other than username) and lists the later ones in the duplicates column. It could be modified to show the actual lowercase username, and then the rest in the duplicates column.
Edit: to show a row for each duplicate:
select
lower(u1.username),
u2.username
from users u1
inner join users u2 on u1.id < u2.id
and lower(u1.username) = lower(u2.username)
left join users u3 on u1.id > u3.id
and lower(u1.username) = lower(u3.username)
and u3.deleted = false
where u1.deleted = false
and u2.deleted = false
and u3.id is null
order by u1.username;
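The piece that picks the "earliest" user is the anti-join: the LEFT JOIN on u3 looks for an earlier live user with the same lowercased name, and `u3.id is null` keeps only the u1 rows that have none. Here is a minimal sketch using Python's sqlite3 as a stand-in for PostgreSQL, with hypothetical sample data and deleted stored as 0/1. Unlike the query above, it also selects u1.username, which makes visible that the u2 column only ever holds the later IDs, so the earliest spelling in each group (Alex, GeorGe) is not among the listed duplicates.

```python
import sqlite3

# Self-join version: u1 is the earliest live user for each lowercased name
# (no earlier live u3 exists), u2 is every later live user with that name.
EARLIEST_SQL = """
SELECT lower(u1.username), u1.username, u2.username
FROM users u1
INNER JOIN users u2 ON u1.id < u2.id
    AND lower(u1.username) = lower(u2.username)
LEFT JOIN users u3 ON u1.id > u3.id
    AND lower(u1.username) = lower(u3.username)
    AND u3.deleted = 0
WHERE u1.deleted = 0
  AND u2.deleted = 0
  AND u3.id IS NULL
ORDER BY lower(u1.username), u2.id
"""


def make_db():
    """In-memory users table with hypothetical sample rows (0 = not deleted)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, deleted INTEGER)")
    conn.executemany(
        "INSERT INTO users (id, username, deleted) VALUES (?, ?, ?)",
        [(1, "Alex", 0), (2, "alex", 0), (3, "ALEX", 0),
         (4, "GeorGe", 0), (5, "george", 0),
         (6, "bob", 0), (7, "Bob", 1)],
    )
    return conn


rows = make_db().execute(EARLIEST_SQL).fetchall()
for row in rows:
    print(row)
```

If you need every spelling in the list, including the earliest one, the CTE-plus-join query from the other answer returns all of them.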

Related

Jsonb_object_keys() does not return any rows in left join if the right side table does not have any matching records

This is the DB query:
select users.userId, jsonb_object_keys(orders.metadata::jsonb) from users left join orders on users.userId=orders.userId where users.userId=2;
users table orders table
------------------- -----------------------------------------------------
|userId| name | | userId|orderId|metadata |
| 1 | john | | 1 | 1 | {"orderName":"chess","quantity":1}|
| 2 | doe | | 1 | 2 | {"orderName":"cube" ,"quantity":1}|
------------------- -----------------------------------------------------
Why are no rows returned by the query?
Very nice and tricky question. To achieve what you want, try the query below:
select
t1.userid,
t2.keys
from
users t1
left join (select userid, orderid, jsonb_object_keys(metadata::jsonb) as keys from orders) t2
on t1.userid=t2.userid
Your query seems correct, but there is a catch. If you left join both tables without jsonb_object_keys(metadata), it works as you expect. But jsonb_object_keys() is a set-returning function: in the SELECT list it produces one output row per key it returns, so a row whose metadata is NULL (the user without orders) produces no output rows at all. That's why the row that would have NULL in the second column disappears.
You should left join to the result of the jsonb_object_keys() call instead:
select users.userid, meta.*
from users
left join orders on users.userid = orders.userid
left join jsonb_object_keys(orders.metadata::jsonb) as meta on true
where users.userid = 2;
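To see why user 2 disappears, here is a plain-Python model of the two behaviours. It does not run PostgreSQL; the data mirrors the question's tables, and it assumes (as for any strict function) that jsonb_object_keys() returns no rows for a NULL argument.

```python
# Plain-Python model (not PostgreSQL) of why the original query loses user 2.
# Assumption: jsonb_object_keys() is strict, so a NULL argument yields no rows.

users = [{"userid": 1, "name": "john"}, {"userid": 2, "name": "doe"}]
orders = [
    {"userid": 1, "orderid": 1, "metadata": {"orderName": "chess", "quantity": 1}},
    {"userid": 1, "orderid": 2, "metadata": {"orderName": "cube", "quantity": 1}},
]


def left_join_orders(users, orders):
    """users LEFT JOIN orders: a user without orders gets metadata = None."""
    joined = []
    for u in users:
        matches = [o for o in orders if o["userid"] == u["userid"]]
        if not matches:
            joined.append({"userid": u["userid"], "metadata": None})
        for o in matches:
            joined.append({"userid": u["userid"], "metadata": o["metadata"]})
    return joined


def object_keys(metadata):
    """Model of jsonb_object_keys: NULL in, no rows out."""
    return [] if metadata is None else list(metadata)


def srf_in_select(rows):
    """SELECT ..., jsonb_object_keys(...): one output row per key returned."""
    return [(r["userid"], k) for r in rows for k in object_keys(r["metadata"])]


def lateral_left_join(rows):
    """LEFT JOIN jsonb_object_keys(...) ON true: keep key-less rows with NULL."""
    out = []
    for r in rows:
        keys = object_keys(r["metadata"])
        out.extend((r["userid"], k) for k in keys)
        if not keys:
            out.append((r["userid"], None))
    return out


user2 = [r for r in left_join_orders(users, orders) if r["userid"] == 2]
print(srf_in_select(user2))      # []  (the original query's shape)
print(lateral_left_join(user2))  # [(2, None)]  (the LEFT JOIN ... ON true shape)
```
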

Transforming information in postgresql

So, I have 2 tables.
The first table holds the users' information:
user_id | name
1 | Albert
2 | Anthony
The other table holds address information; a user can have a home address, an office address, or both:
user_id| address_type | address
1 | home | a
1 | office | b
2 | home | c
The final result I want is this:
user_id | name | home_address | office_address
1 | Albert | a | b
2 | Anthony | c | null
I have tried using a left join with json_agg, but the result isn't readable that way.
Any suggestions on how I can do this?
You can use two outer joins, one for the office address and one for the home address.
select t1.user_id, t1.name,
ha.address as home_address,
oa.address as office_address
from table1 t1
left join table2 ha on ha.user_id = t1.user_id and ha.address_type = 'home'
left join table2 oa on oa.user_id = t1.user_id and oa.address_type = 'office';
A solution using JSON could look like this
select t1.user_id, t1.name,
a.addresses ->> 'home' as home_address,
a.addresses ->> 'office' as office_address
from table1 t1
left join (
select user_id, jsonb_object_agg(address_type, address) as addresses
from table2
group by user_id
) a on a.user_id = t1.user_id;
This might be a bit more flexible, because you don't need to add a new join for each address type. The first query is likely to be faster if you need to retrieve a large number of rows.
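A quick way to check the two-outer-join version is to run it against the question's sample data. This minimal sketch uses Python's sqlite3 as a stand-in for PostgreSQL and filters the second join on oa.address_type, since it is the oa join that has to pick the office row. It assumes each user has at most one address per type; a second 'home' row for the same user would duplicate that user's output row.

```python
import sqlite3

# Two outer joins, one per address type. Assumes at most one address of each
# type per user; extra rows of the same type would multiply the output rows.
PIVOT_SQL = """
SELECT t1.user_id, t1.name,
       ha.address AS home_address,
       oa.address AS office_address
FROM table1 t1
LEFT JOIN table2 ha ON ha.user_id = t1.user_id AND ha.address_type = 'home'
LEFT JOIN table2 oa ON oa.user_id = t1.user_id AND oa.address_type = 'office'
ORDER BY t1.user_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE table2 (user_id INTEGER, address_type TEXT, address TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "Albert"), (2, "Anthony")])
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [(1, "home", "a"), (1, "office", "b"), (2, "home", "c")],
)

rows = conn.execute(PIVOT_SQL).fetchall()
print(rows)  # Anthony has no office row, so office_address comes back as None
```
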

How to use COUNT() in more than one column?

Let's say I have these 3 tables:
Countries ProvOrStates MajorCities
-----+------------- -----+----------- -----+-------------
Id | CountryName Id | CId | Name Id | POSId | Name
-----+------------- -----+----------- -----+-------------
1 | USA 1 | 1 | NY 1 | 1 | NYC
How do I get something like this:
---------------------------------------------
CountryName | ProvinceOrState | MajorCities
| (Count) | (Count)
---------------------------------------------
USA | 50 | 200
---------------------------------------------
Canada | 10 | 57
So far, the way I see it:
Run the first SELECT COUNT (GROUP BY Countries.Id) on Countries JOIN ProvOrStates,
store the result in a table variable,
Run the second SELECT COUNT (GROUP BY Countries.Id) on ProvOrStates JOIN MajorCities,
Update the table variable based on the Countries.Id
Join the table variable with Countries table ON Countries.Id = Id of the table variable.
Is there a possibility to run just one query instead of multiple intermediary queries? I don't know if it's even feasible as I've tried with no luck.
Thanks for helping
Use subqueries or derived tables, and views.
Basically, if you have 3 tables:
select * from [TableOne] as T1
join
(
select T2.DepName, T2.Column AS T2Column, T3.Column AS T3Column
from [TableTwo] as T2
join [TableThree] as T3
on T2.ConditionColumn = T3.ConditionColumn
) AS DerivedTable
on T1.DepName = DerivedTable.DepName
And when you are 100% sure it's working, you can create a view that contains your three-table join and query it whenever you want.
PS: if some column names are identical, or when you get this message:
"The column 'ColumnName' was specified multiple times for 'Table'."
you can use aliases to solve the problem.
This answer comes from @lotzInSpace.
SELECT ct.[CountryName], COUNT(DISTINCT p.[Id]), COUNT(DISTINCT c.[Id])
FROM dbo.[Countries] ct
LEFT JOIN dbo.[Provinces] p
ON ct.[Id] = p.[CountryId]
LEFT JOIN dbo.[Cities] c
ON p.[Id] = c.[ProvinceId]
GROUP BY ct.[CountryName]
It's working. I'm using LEFT JOIN instead of INNER JOIN because with an INNER JOIN, a country without provinces would not be displayed, and a province without cities would not be counted.
Thanks again @lotzInSpace.
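The DISTINCT in COUNT(DISTINCT ...) matters because the second LEFT JOIN repeats each province once per city. Here is a minimal sketch with Python's sqlite3 standing in for the real database, using the question's table and column names (Countries, ProvOrStates.CId, MajorCities.POSId) rather than the answer's dbo.[Provinces]/[Cities], with made-up sample rows. NAIVE_SQL is a hypothetical variant that drops DISTINCT only to show the over-count.

```python
import sqlite3

# COUNT(DISTINCT ...) undoes the fan-out of the second LEFT JOIN: a state
# with two cities appears on two joined rows but must be counted once.
COUNTS_SQL = """
SELECT ct.CountryName,
       COUNT(DISTINCT p.Id) AS ProvOrStateCount,
       COUNT(DISTINCT c.Id) AS MajorCityCount
FROM Countries ct
LEFT JOIN ProvOrStates p ON ct.Id = p.CId
LEFT JOIN MajorCities c ON p.Id = c.POSId
GROUP BY ct.CountryName
ORDER BY ct.CountryName
"""

# Same joins without DISTINCT, only to show the over-count it prevents.
NAIVE_SQL = COUNTS_SQL.replace("COUNT(DISTINCT ", "COUNT(")


def make_db():
    """In-memory copy of the question's schema with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Countries (Id INTEGER PRIMARY KEY, CountryName TEXT);
        CREATE TABLE ProvOrStates (Id INTEGER PRIMARY KEY, CId INTEGER, Name TEXT);
        CREATE TABLE MajorCities (Id INTEGER PRIMARY KEY, POSId INTEGER, Name TEXT);
        INSERT INTO Countries VALUES (1, 'USA'), (2, 'Canada'), (3, 'Monaco');
        INSERT INTO ProvOrStates VALUES
            (1, 1, 'NY'), (2, 1, 'CA'), (3, 2, 'Ontario'), (4, 2, 'Yukon');
        INSERT INTO MajorCities VALUES
            (1, 1, 'NYC'), (2, 1, 'Buffalo'), (3, 2, 'Los Angeles'), (4, 3, 'Toronto');
        """
    )
    return conn


conn = make_db()
counts = conn.execute(COUNTS_SQL).fetchall()
naive = conn.execute(NAIVE_SQL).fetchall()
print(counts)  # Monaco (no provinces) still appears, with 0 and 0
print(naive)   # USA's province count is inflated by NY's two cities
```
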

Fetch matching results from the integer array satisfying the condition which is given as text

I have an array of integers stored in a field of the user table. This array represents the groups the user belongs to; a user can belong to any number of groups.
i.e.,
Table: user
user_id | user_name | user_groups
---------+-------------+-------------
1 | harry | {1,2,3}
2 | John | {4,5,6}
Table: Groups
group_id | group_name
------------+--------------
1 | Arts
2 | Science
3 | Security
4 | Sports
(Pardon me, it should have been a 1-N relationship.) I need to execute a query like this:
SELECT * from user where user_groups = ANY(x);
where x is one of the text values Arts, Science, Security, Sports.
So when x = Arts, harry's row is returned. The database I'm using is PostgreSQL 8.4.
You can use the @> (contains) operator:
SELECT *
FROM Users
WHERE user_groups @> (SELECT ARRAY[group_id]
FROM Groups
WHERE group_name = 'Arts')
EDIT:
Is there any way I could display user_groups like
{Arts,Science,Security} instead of {1,2,3}?
You could use a correlated subquery:
SELECT user_id, user_name, (SELECT array_agg(g.group_name)
FROM Groups g
WHERE ARRAY[g.group_id] <@ u.user_groups) AS user_groups
FROM Users u
WHERE user_groups @> (SELECT ARRAY[group_id]
FROM Groups
WHERE group_name = 'Arts')
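The array operators can be hard to read at first, so here is a plain-Python model of what the two queries compute. It does not run PostgreSQL, uses the question's sample rows, and assumes group names are unique (a scalar subquery matching two groups would raise an error in PostgreSQL). Keep in mind that array_agg without an ORDER BY does not guarantee element order; the model simply follows the Groups table order.

```python
# Plain-Python model (not PostgreSQL) of the two queries:
#   a @> b is true when every element of b is in a  ->  set(a) >= set(b)
#   a <@ b is the same test with the operands swapped.
users = [(1, "harry", [1, 2, 3]), (2, "John", [4, 5, 6])]
groups = [(1, "Arts"), (2, "Science"), (3, "Security"), (4, "Sports")]


def contains(a, b):
    """Model of the array operator a @> b."""
    return set(a) >= set(b)


def users_in_group(group_name):
    """WHERE user_groups @> (SELECT ARRAY[group_id] FROM Groups WHERE group_name = ...)."""
    ids = [gid for gid, name in groups if name == group_name]
    if not ids:
        return []  # the scalar subquery yields NULL, and @> NULL is never true
    return [u for u in users if contains(u[2], [ids[0]])]


def with_group_names(user):
    """Correlated subquery: names of groups whose ARRAY[id] is <@ user_groups.

    PostgreSQL's array_agg would return NULL (not an empty list) if no
    group matched; the model returns [] in that case.
    """
    user_id, user_name, user_groups = user
    names = [name for gid, name in groups if contains(user_groups, [gid])]
    return (user_id, user_name, names)


print([with_group_names(u) for u in users_in_group("Arts")])
```
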

How can I feed back a subquery as a boolean column in PostgreSQL?

We store our accounts information in a PostgreSQL database.
Accounts are in the "accounts" table, groups in the "grp" table, and they're tied together by the "account_grp" table, which maps account_id to grp_id.
I'm trying to craft a query which will give me a view which lets me search for whether members of one group are members of another group, i.e. I want an "is_in_foobar_group" column in the view, so I can SELECT * FROM my_view WHERE grp_id = 1234; and get back
username | is_in_foobar_group | grp_id
---------+--------------------+-------
bob | true | 1234
alice | false | 1234
The foobar bit is hardcoded, and will not need to change.
Any suggestions?
Simpler, faster, more convenient:
WITH x AS (SELECT 1234 AS foobar) -- optional, to enter value only once
SELECT a.username
,EXISTS (
SELECT 1 FROM account_grp g
WHERE g.account_id = a.account_id
AND g.grp_id = x.foobar
) AS is_in_foobar_group
,x.foobar AS grp_id
FROM accounts a, x
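EXISTS can indeed be used directly as a column in the select list, which is what this query relies on. Here is a minimal sketch with Python's sqlite3 standing in for PostgreSQL and hypothetical accounts (bob in group 1234, alice not). SQLite returns 1/0 where PostgreSQL returns true/false, so the flag is converted with bool().

```python
import sqlite3

# EXISTS (...) used directly as a select-list column; foobar = 1234 mirrors
# the CTE above. SQLite yields 1/0 here, PostgreSQL would yield true/false.
FLAG_SQL = """
WITH x AS (SELECT 1234 AS foobar)
SELECT a.username,
       EXISTS (SELECT 1 FROM account_grp g
               WHERE g.account_id = a.account_id
                 AND g.grp_id = x.foobar) AS is_in_foobar_group,
       x.foobar AS grp_id
FROM accounts a, x
ORDER BY a.username
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("CREATE TABLE account_grp (account_id INTEGER, grp_id INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, "bob"), (2, "alice")])
conn.executemany("INSERT INTO account_grp VALUES (?, ?)", [(1, 1234), (2, 99)])

rows = [(name, bool(flag), grp) for name, flag, grp in conn.execute(FLAG_SQL)]
print(rows)
```
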
Maybe using the EXISTS operator would help:
http://www.postgresql.org/docs/9.2/static/functions-subquery.html#FUNCTIONS-SUBQUERY-EXISTS
I'm not sure you can use it in a SELECT statement, and I don't have a PostgreSQL instance to check it.
Worst case, you'll have to combine 2 queries with UNION, something like:
SELECT username, true, grp_id
FROM accounts a INNER JOIN account_grp g1 on a.account_id = g1.account_id
WHERE EXISTS (SELECT 1 FROM account_grp g2
WHERE g2.account_id = a.account_id and g2.grp_id = [foobar])
UNION
SELECT username, false, grp_id
FROM accounts a INNER JOIN account_grp g1 on a.account_id = g1.account_id
WHERE NOT EXISTS (SELECT 1 FROM account_grp g2
WHERE g2.account_id = a.account_id and g2.grp_id = [foobar])