Merge two tables in PostgreSQL, giving preference to one particular table - postgresql

I have two tables, Users and Masters. Users holds user-specific setting key-value pairs, and Masters holds the master setting key-value pairs. I want to display the key-value pairs from the two tables, where:
1. if the user does not have a particular key, it is taken from Masters
2. if the user does not exist in the Users table at all, all key-value pairs from Masters are displayed
3. if the user has a key-value pair, the user's key-value pair is displayed
Example:
Inputs are UserID and appID = 1.
I tried a combination of left joins, but I do not get the desired result if the user does not exist at all in the Users table.
Could you please give me some advice?

step-by-step demo:db<>fiddle
SELECT
    COALESCE(m.app_id, u.app_id) as app_id,
    COALESCE(m.setting_key, u.setting_key) as setting_key,
    COALESCE(u.setting_value, m.setting_value) as setting_value -- 2
FROM
    master_table m
FULL OUTER JOIN -- 1
    user_table u
ON m.app_id = u.app_id AND m.setting_key = u.setting_key
WHERE COALESCE(m.app_id, u.app_id) = 1 -- 3
    AND (u.user_id = 1 OR u.user_id IS NULL)
1. You need a FULL OUTER JOIN so that rows which exist in only one of the two tables are included as well.
2. COALESCE(a, b) gives you the first non-null value. So, if a (here the user value) is available, it will be returned; otherwise b (here the master value) is returned.
3. Filter by app_id and user_id. The user_id filter must also allow user_id IS NULL to get all setting_keys. Of course, you could use COALESCE here as well: COALESCE(u.user_id, 1) = 1, where the last 1 is the specific user_id you are asking for.
Edit: If User does not exist, give out the Masters values for app_id:
step-by-step demo:db<>fiddle:
SELECT DISTINCT ON (app_id, setting_key) -- 3
    *
FROM (
    SELECT
        COALESCE(user_app_id, master_app_id) AS app_id, -- 2
        COALESCE(user_setting_key, master_setting_key) AS setting_key,
        COALESCE(user_setting_value, master_setting_value) AS setting_value,
        user_id
    FROM (
        SELECT
            app_id as master_app_id,
            setting_key as master_setting_key,
            setting_value as master_setting_value,
            null as user_id,
            null as user_app_id,
            null as user_setting_key,
            null as user_setting_value
        FROM
            master_table m
        UNION -- 1
        SELECT
            *
        FROM
            master_table m
        FULL OUTER JOIN
            user_table u
        ON m.app_id = u.app_id AND m.setting_key = u.setting_key
    ) s
) s
WHERE app_id = 1
    AND (user_id = 2 OR user_id IS NULL)
ORDER BY app_id, setting_key, user_id NULLS LAST -- 3
This is a little more complicated:
1. You need a separate data set with user_id IS NULL that can be fetched as a fallback, so the NULL user represents the unknown user. You can achieve this by adding the master table with NULL values for the user columns using a UNION.
2. Now you can create the expected columns with the COALESCE() function as described above.
3. The third trick is the DISTINCT ON clause on the app_id and setting_key columns. Because the rows with a NULL user_id from the default UNION part in (1) are ordered last, DISTINCT ON picks the user record when one exists. When the user does not exist, DISTINCT ON picks the default master record.
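As a side note, when every setting key is guaranteed to exist in master_table, the same preference rule can be sketched with a plain LEFT JOIN by moving the user filter into the join condition. This is a different technique from the queries above, not a rewrite of them. The temp-table definitions, column types, and sample rows are assumptions made for illustration; the question does not show them.

```sql
-- Hypothetical schema and sample data (assumed, not from the question)
CREATE TEMP TABLE master_table (app_id int, setting_key text, setting_value text);
CREATE TEMP TABLE user_table   (user_id int, app_id int, setting_key text, setting_value text);

INSERT INTO master_table VALUES (1, 'lang', 'en'), (1, 'theme', 'light');
INSERT INTO user_table   VALUES (1, 1, 'theme', 'dark');

-- The user filter lives in the ON clause, so master rows without a
-- matching user row survive the join with NULLs in the u columns.
SELECT m.app_id,
       m.setting_key,
       COALESCE(u.setting_value, m.setting_value) AS setting_value
FROM master_table m
LEFT JOIN user_table u
       ON u.app_id = m.app_id
      AND u.setting_key = m.setting_key
      AND u.user_id = 1
WHERE m.app_id = 1
ORDER BY m.setting_key;
-- lang  -> en    (taken from master)
-- theme -> dark  (user value wins)
```

With a user_id that has no rows at all, every u column is NULL and the query falls back to the master values. Keys that exist only in user_table are not returned by this variant.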

Related

Restrict string_agg order by in postgres

While working with postgres db, I came across a situation where I will have to display column names based on their ids stored in a table with comma separated. Here is a sample:
table1 name: labelprint
id field_id
1 1,2
table2 name: datafields
id field_name
1 Age
2 Name
3 Sex
Now in order to display the field name by picking ids from table1 i.e. 1,2 from field_id column, I want the field_name to be displayed in same order as their respective ids as
Expected result:
id field_id field_name
1 2,1 Name,Age
To achieve the above result, I have written the following query:
select l.id,l.field_id ,string_agg(d.field_name,',') as field_names
from labelprint l
join datafields d on d.id = ANY(string_to_array(l.field_id::text,','))
group by l.id
order by l.id
However, the string_agg() functions sort the final string in ascending order and displays the output as shown below:
id field_id field_name
1 2,1 Age, Name
As you can see the order is not maintained in the field_name column which I want to display as per field_id value order.
Any suggestion/help is highly appreciated.
Thanks in advance!
While this will probably be horrible for performance, as well as readability and maintainability, you can dynamically compute the order you want:
select l.id, l.field_id,
       string_agg(d.field_name, ','
                  -- d.id is cast to text so it can be looked up in the text[] array
                  order by array_position(string_to_array(l.field_id::text, ','), d.id::text)
       ) as field_names
from labelprint l
join datafields d on d.id::text = ANY(string_to_array(l.field_id::text, ','))
group by l.id, l.field_id
order by l.id;
You should at least store your array as an actual array, not as a comma delimited string. Or maybe use an intermediate table and don't store arrays at all.
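A minimal sketch of the "actual array" suggestion, assuming the ids are stored in an int[] column. The table name labelprint_arr, the column name field_ids, and the sample rows are hypothetical and only serve the illustration.

```sql
-- Hypothetical tables: the ids are stored as int[] instead of a comma-separated string
CREATE TEMP TABLE labelprint_arr (id int PRIMARY KEY, field_ids int[]);
CREATE TEMP TABLE datafields     (id int PRIMARY KEY, field_name text);

INSERT INTO labelprint_arr VALUES (1, ARRAY[2, 1]);
INSERT INTO datafields     VALUES (1, 'Age'), (2, 'Name'), (3, 'Sex');

-- No string splitting or casting is needed; array_position gives the
-- position of each id inside the stored array, which drives the order.
SELECT l.id,
       l.field_ids,
       string_agg(d.field_name, ',' ORDER BY array_position(l.field_ids, d.id)) AS field_names
FROM labelprint_arr l
JOIN datafields d ON d.id = ANY (l.field_ids)
GROUP BY l.id          -- id is the primary key, so field_ids may be selected as well
ORDER BY l.id;
-- 1 | {2,1} | Name,Age
```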
With a small modification to your existing query you could do it as follows :
select l.id, l.field_id, string_agg(d.field_name,',') as field_names
from labelprint l
join datafields d on d.id::varchar = ANY(string_to_array(l.field_id,','))
group by l.id, l.field_id
order by l.id
Demo here

How can I flatten a case sensitive column that is also used in a relation

Hello postgres experts,
I have a users table
id | email
1 | john#example.com
2 | John#example.com
And a posts table
id | userId | content
1 | 1 | foo
2 | 2 | bar
The emails of the two users are the same when ignoring case so I am looking for a way to drop the rows that have duplicated emails and update the userId in the posts table to point to the user that remains.
So the final result will be:
users:
id | email
1 | john#example.com
posts:
id | userId | content
1 | 1 | foo
2 | 1 | bar
I'm not concerned which version of the email address I end up with (i.e. doesn't have to be the one that's all lowercase).
What's the best way to get this done?
You can update the posts table by taking the smallest id:
update posts p
set userid = u.user_id
from (
select min(id) user_id, array_agg(id) as user_ids
from users u
group by lower(email)
having count(*) > 1
) u
where p.userid = any(u.user_ids)
and p.userid <> u.user_id
;
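The SELECT in the derived table returns, for every email address (compared case-insensitively) that is used by more than one user, the smallest user id and the list of all ids sharing that address. The UPDATE then points the posts of those users to that smallest id. Once that is done, you can delete the users that are no longer referenced by any post. Be aware that this statement removes every user without posts, not only the duplicates: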
delete from users
where not exists (select *
from posts p
where users.id = p.userid);
Online example
The key to deduplicating rows is breaking ties by some kind of row ID, which you already have. We're going to keep the user with the lowest ID for each case-insensitive email, keeping in mind that there may be more than 2 duplicates for some.
First, for each post, set the user to any user with an equivalent email, for which there exists no other user also with an equivalent email but a lower ID. If I'm doing this right, that should match exactly one user row every time, either the original user or another one.
UPDATE posts p SET "userId" = u2.id
FROM users u, users u2
WHERE u.id = p."userId"
AND lower(u2.email) = lower(u.email)
AND NOT EXISTS (
SELECT 1
FROM users u3
WHERE u3.id < u2.id
AND lower(u3.email) = lower(u2.email)
);
(You could also do this with a MIN or DISTINCT subquery, but in my experience this is the fastest.)
Then delete any users for which there exists a user with an equivalent email and a lower ID:
DELETE FROM users u
WHERE EXISTS (
SELECT 1 FROM users u2
WHERE u2.id < u.id
AND lower(u2.email) = lower(u.email)
);
Optionally, seal the deal with a uniqueness constraint on lower-case email. I don't remember the exact syntax, but this should be close:
CREATE UNIQUE INDEX user_lower_email ON users (lower(email));
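A small sketch of what a unique expression index on lower(email) does once it is in place. The temp table and the sample addresses are assumptions made for the illustration.

```sql
-- Hypothetical table, reduced to the two relevant columns
CREATE TEMP TABLE users (id int PRIMARY KEY, email text);

-- Uniqueness is enforced on the lower-cased value, not on the stored text
CREATE UNIQUE INDEX user_lower_email ON users (lower(email));

INSERT INTO users VALUES (1, 'john@example.com');

-- This insert is expected to be rejected with a unique violation,
-- because lower('John@example.com') equals lower('john@example.com').
INSERT INTO users VALUES (2, 'John@example.com');
```

The index can only be created after the duplicates have been removed; otherwise CREATE UNIQUE INDEX itself fails.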

Query users on filter applied to a one-to-many relationship table postgresql

We currently have a users table with a one-to-many relationship on a table called steps. Each user can have either four steps or seven steps. The steps table schema is as follows:
id | user_id | order | status
-----------------------------
# | # |1-7/1-4| 0 or 1
I am trying to query all of the users who have a status of 1 on all of their steps. So if they have either 4 or 7 steps, they must all have a status of 1.
I tried a join with a check on step 4 (since a step cannot be complete without the previous one being complete as well) but this has issues if someone with 7 steps completed step 4 but not 7.
select u.first_name, u.last_name, u.email, date(s.updated_at) as completed_date
from users u
join steps s on u.id = s.user_id
where s.order = 4 and s.status = 1;
The bool_and aggregate function should help you to identify the users with all their steps at status = 1 whatever the number of steps.
Then the array_agg aggregate function can help to find the updated_at date associated to the last step for each user by ordering the dates according to order DESC and selecting the first value in the resulting array [1] :
SELECT u.first_name, u.last_name, u.email
     , s.completed_date
FROM users u
INNER JOIN
     ( SELECT user_id
            -- "order" is a reserved word and has to be double-quoted as a column name
            , (array_agg(updated_at ORDER BY "order" DESC))[1] :: date as completed_date
       FROM steps
       GROUP BY user_id
       HAVING bool_and(status :: boolean) -- filter the users with all their steps status = 1
     ) AS s
  ON u.id = s.user_id;
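To see how bool_and behaves per user, here is a minimal sketch with made-up data. The reduced steps table (only three columns) and the sample rows are assumptions; the comparison status = 1 is used instead of a cast so that it works for any integer type.

```sql
-- Hypothetical, reduced steps table
CREATE TEMP TABLE steps (user_id int, "order" int, status int);

INSERT INTO steps VALUES
    (1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 1),            -- user 1: 4 steps, all done
    (2, 1, 1), (2, 2, 1), (2, 3, 1), (2, 4, 1),
    (2, 5, 0), (2, 6, 0), (2, 7, 0);                       -- user 2: 7 steps, 3 open

-- bool_and is true only if the condition holds for every row of the group
SELECT user_id,
       bool_and(status = 1) AS all_done
FROM steps
GROUP BY user_id
ORDER BY user_id;
-- 1 | t
-- 2 | f
```

User 2 has completed step 4 but is still reported as not done, which is exactly the case the check on step 4 alone gets wrong.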

Show records that have only one matching row in another table

I need to write SQL code that is probably very simple, but I am very new to it.
I need to find all the records from one table that have a matching id (but no more than one) in the other table. E.g. one table contains the records of the employees and the second one contains the employees' telephone numbers. I need to find all employees with only one telephone number.
Sample data would be nice. In absence of:
SELECT
    employees.employee_id
FROM
    employees
LEFT JOIN
    (SELECT employee_id
     FROM emp_phone
     GROUP BY employee_id
     HAVING count(*) = 1) AS phone -- keep only employees with exactly one phone row
ON
    employees.employee_id = phone.employee_id
WHERE
    phone.employee_id IS NOT NULL;
You need a join of the 2 tables, group by employee and the condition in the having clause:
SELECT e.employee_id, e.name
FROM employees e INNER JOIN numbers n
ON e.employee_id = n.employee_id
GROUP BY e.employee_id, e.name
HAVING COUNT(*) = 1;
If there can be more than a few numbers per employee in the table with the employees' telephone numbers (calling it tel), then it's cheaper to avoid GROUP BY and HAVING which has to process all rows. Find employees with "unique" numbers using a self-anti-join with NOT EXISTS.
While you don't need more than the employee_id and their unique phone number, you don't even have to involve the employee table at all:
SELECT *
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
);
If you need additional columns from the employee table:
SELECT * -- or any columns you need
FROM (
SELECT employee_id AS id, tel_number -- or any columns you need
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
)
) t
JOIN employee e USING (id);
The column alias in the subquery (employee_id AS id) is just for convenience. Then the outer join condition can be USING (id), and the ID column is only included once in the result, even with SELECT * ...
Simpler with a smart naming convention that uses employee_id for the employee ID everywhere. But it's a widespread anti-pattern to use employee.id instead.
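A minimal sketch of the NOT EXISTS anti-join on made-up data. The column types and the sample numbers are assumptions; only the table and column names follow the answer.

```sql
-- Hypothetical phone table
CREATE TEMP TABLE tel (employee_id int, tel_number text);

INSERT INTO tel VALUES
    (1, '555-0100'),
    (2, '555-0101'),
    (2, '555-0102'),   -- employee 2 has two numbers
    (3, '555-0103');

-- Keep a row only if the same employee has no other, different number
SELECT *
FROM tel t
WHERE NOT EXISTS (
    SELECT FROM tel
    WHERE employee_id = t.employee_id
      AND tel_number <> t.tel_number
)
ORDER BY employee_id;
-- 1 | 555-0100
-- 3 | 555-0103
```

If the same number can be stored twice for one employee, both copies pass this filter, which is why comparing on a primary key column is the safer choice.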
Related:
JOIN table if condition is satisfied, else perform no join

Getting NULL values in JOINED table with LIMIT

There are many similar questions which I've learned from, but my result set isn't returning the expected results.
My Objective:
Build a query that will return a result set containing all rows in table demo1 with user_id = "admin", and the only row of table demo2 with user_id = "admin". Each row in demo2 has a unique user_id so there's always only one row with "admin" as user_id.
However, I don't want demo2 data to wastefully repeat on every subsequent row of demo1. I only want the first row of the result set to contain demo2 data as non-null values. Null values for demo2 columns should only be returned for rows 2+ in the result set.
Current Status:
Right now my query is returning the appropriate columns (all demo1 and all demo2) but
all the data returned from demo2 is null.
Demo1:
id user_id product quantity warehouse
1 admin phone 3 A
2 admin desk 1 D
3 k45 chair 5 B
Demo2:
id user_id employee job country
1 admin james tech usa
2 c39 cindy tech spain
Query:
SELECT *
from demo1
left join (SELECT * FROM demo2 WHERE demo2.user_id = 'X' LIMIT 1) X
on (demo1.user_id = x.user_id)
WHERE demo1.user_id = 'admin'
Rationale:
The subquery's LIMIT 1 was my attempt to retrieve demo2 values for row 1 only, thinking the rest would be null. Instead, all values are null.
Current Result:
id user_id product quantity warehouse id employee job country
1 admin phone 3 A null null null null
2 admin desk 1 D null null null null
Desired Result:
id user_id product quantity warehouse id employee job country
1 admin phone 3 A 1 james tech usa
2 admin desk 1 D null null null null
I've tried substituting left join for left inner join, right join, full join, but nothing returns the desired result.
Your join is going to bring through any records that satisfy the join condition for your two tables. There is no changing that.
But you could suppress subsequent records in your result set from displaying the matching demo2 record that satisfied the join condition AFTER it's joined:
SELECT demo1.id,
       demo1.user_id,
       demo1.product,
       demo1.quantity,
       demo1.warehouse, -- this comma was missing
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY demo1.user_id ORDER BY demo1.id) = 1 THEN demo2.id END as demo2_id,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY demo1.user_id ORDER BY demo1.id) = 1 THEN demo2.employee END AS demo2_employee,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY demo1.user_id ORDER BY demo1.id) = 1 THEN demo2.job END as demo2_job,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY demo1.user_id ORDER BY demo1.id) = 1 THEN demo2.country END as demo2_country
from demo1
left join demo2
       on demo1.user_id = demo2.user_id
      AND demo2.user_id = 'X'
WHERE demo1.user_id = 'admin'
That's just a quick rewrite of your original SQL with the additional CASE expressions included.
That being said, this SQL will still return only NULLs for the demo2 columns, since demo2.user_id can't satisfy both conditions in this query:
1. The join condition demo1.user_id = demo2.user_id combined with the WHERE predicate demo1.user_id = 'admin', which requires the value 'admin'.
2. The join condition demo2.user_id = 'X', which requires the value 'X'.
It's either 'admin', which satisfies the first condition but fails the second, or it's 'X', which satisfies the second condition but not the first.
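For completeness, here is a sketch of a variant without the conflicting 'X' condition. It numbers the demo1 rows first and lets only row number 1 take part in the join, which is a slightly different technique from the CASE expressions above. The column types are assumptions; the sample rows are taken from the question.

```sql
-- Column types are assumed; rows are the ones shown in the question
CREATE TEMP TABLE demo1 (id int, user_id text, product text, quantity int, warehouse text);
CREATE TEMP TABLE demo2 (id int, user_id text, employee text, job text, country text);

INSERT INTO demo1 VALUES (1, 'admin', 'phone', 3, 'A'),
                         (2, 'admin', 'desk',  1, 'D'),
                         (3, 'k45',   'chair', 5, 'B');
INSERT INTO demo2 VALUES (1, 'admin', 'james', 'tech', 'usa'),
                         (2, 'c39',   'cindy', 'tech', 'spain');

SELECT d1.id, d1.user_id, d1.product, d1.quantity, d1.warehouse,
       d2.id AS demo2_id, d2.employee, d2.job, d2.country
FROM (
    -- number the rows of the requested user
    SELECT *, row_number() OVER (PARTITION BY user_id ORDER BY id) AS rn
    FROM demo1
    WHERE user_id = 'admin'
) d1
LEFT JOIN demo2 d2
       ON d2.user_id = d1.user_id
      AND d1.rn = 1            -- only the first demo1 row may match
ORDER BY d1.id;
-- 1 | admin | phone | 3 | A | 1    | james | tech | usa
-- 2 | admin | desk  | 1 | D | NULL | NULL  | NULL | NULL
```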
Here is another nice approach:
sqlfiddle