Map column value to table name and join - postgresql

I have a composite type that looks like
CREATE TYPE member AS (
id BIGINT,
type CHAR(1)
);
I have a table that relies on this member type with an array.
CREATE TABLE relation (
id BIGINT PRIMARY KEY,
members member[]
);
I have three other tables each with a different schema (but having common id field)
CREATE TABLE table_x (
id BIGINT PRIMARY KEY,
some_text TEXT
);
CREATE TABLE table_y (
id BIGINT PRIMARY KEY,
some_int INT
);
CREATE TABLE table_z (
id BIGINT PRIMARY KEY,
some_date TIMESTAMP
);
The type field in the member type is a single character that identifies which table a given member belongs to. A row in the relation table can have a mix of different types.
I have a scenario that requires returning the ids of relations with at least one member fulfilling a certain condition based on its type (let's say for x => some_text is not empty, or y => some_int is greater than 10, or z => some_date is a week from now).
I can implement this scenario on the application side by making multiple requests to the database:
unnest relation table
collect member data per relation
make new requests to find out relations
I am wondering if there is a way to map column values to table names and join them.

Assumption
I'm assuming that the relation.members array does not have more than one member element of the same type. Correct?
Query to try
with unnested_members as (
-- Unnest members array
select id, unnest(members) members
from relation
)
, members_joined as (
-- left join on a per type basis with table_x, table_y and table_z.
select r.id, (r.members).id idext, (r.members).type,
x.some_text, y.some_int, z.some_date -- more types, more columns here
from unnested_members r
left join table_x x on (x.id = (r.members).id and (r.members).type = 'x')
left join table_y y on (y.id = (r.members).id and (r.members).type = 'y')
left join table_z z on (z.id = (r.members).id and (r.members).type = 'z')
-- More types, more tables to left join
)
select id,
max(some_text) some_text, -- use max() to get not null value for this id
max(some_int) some_int, -- use max() to get not null value for this id
max(some_date) some_date -- use max() to get not null value for this id
-- more types, more max() columns here
from members_joined
group by id -- get one row per relation.id with data from joined table_* columns
If you need to include more tables then you have to include these tables in the left join part, include the column in the select list and in the max() section as well.
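If only the relation ids are needed, the same unnest-and-left-join idea can be folded into an EXISTS test. This is a sketch rather than a tested solution: the three predicates are one reading of the example conditions in the question (in particular, "a week from now" is taken to mean later than now() plus 7 days), so adjust them as needed.

```sql
SELECT r.id
FROM relation r
WHERE EXISTS (
    SELECT 1
    FROM unnest(r.members) AS m                    -- exposes m.id and m.type
    LEFT JOIN table_x x ON m.type = 'x' AND x.id = m.id
    LEFT JOIN table_y y ON m.type = 'y' AND y.id = m.id
    LEFT JOIN table_z z ON m.type = 'z' AND z.id = m.id
    WHERE x.some_text <> ''                        -- assumed meaning of "not empty"
       OR y.some_int > 10
       OR z.some_date > now() + interval '7 days'  -- assumed meaning of "a week from now"
);
```

Because each member is tested on its own inside the subquery, this form does not depend on the one-member-per-type assumption.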

@JNevill had a good point about this database design. Although this approach may not seem optimal, it keeps the table definitions clearly separate without any relations in between them. Also, the size of the relation table is fairly small compared to the other three tables.
I solved the problem by simply fetching rows per type and merging them:
SELECT relation.* FROM relation, UNNEST(relation.members) member INNER JOIN table_x ON member.id = table_x.id WHERE member.type = 'x' AND table_x.some_text = 'some text value'
UNION
SELECT relation.* FROM relation, UNNEST(relation.members) member INNER JOIN table_y ON member.id = table_y.id WHERE member.type = 'y' AND table_y.some_int = 123
UNION
SELECT relation.* FROM relation, UNNEST(relation.members) member INNER JOIN table_z ON member.id = table_z.id WHERE member.type = 'z' AND table_z.some_date > '2017-01-11 00:00:00';

Related

filter taking too much time in PostgreSQL on gender field

I have one table with 100M plus rows which looks like this
Create table member (
id bigint,
gender text,
-- ...other fields
primary key (id)
);
The gender field has two possible values, 'M' or 'F'.
Whenever I filter on the gender field, the query takes too much time. I have indexes on other fields such as id, member details, and mobile number.
select
count(1) filter (where mod.is_active and m.gender = 'M') as male,
count(1) filter (where mod.is_active and m.gender = 'F') as female
from member_other_details mod
inner join member m on m.id = mod.member_id
This query takes hours to complete.
How can I optimize this?
Personally, I would run this query instead:
select m.gender,count(*)
from member_other_details mod inner join member m on m.id = mod.member_id
where mod.is_active
group by m.gender
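If the two counts are wanted as columns in a single row, as in the original query, the shared is_active test can still be moved into WHERE so that inactive rows are discarded instead of being joined and then ignored by both filters. The index below is an assumption and not something from the question: its name is made up, and whether the planner uses it depends on how many rows are active.

```sql
-- Hypothetical partial index covering only active rows (name is illustrative).
CREATE INDEX member_other_details_active_idx
    ON member_other_details (member_id)
    WHERE is_active;

-- Same result shape as the original query: one row with two counts.
SELECT count(*) FILTER (WHERE m.gender = 'M') AS male,
       count(*) FILTER (WHERE m.gender = 'F') AS female
FROM member_other_details mod
JOIN member m ON m.id = mod.member_id
WHERE mod.is_active;
```

Prefixing the SELECT with EXPLAIN (ANALYZE, BUFFERS) shows which scans and join method are actually chosen. Note that it executes the query.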

Optional filter on a column of an outer joined table in the where clause

I have got two tables:
create table student
(
studentid bigint primary key not null,
name varchar(200) not null
);
create table courseregistration
(
studentid bigint not null,
coursenamename varchar(200) not null,
isfinished boolean default false
);
--insert some data
insert into student values(1,'Dave');
insert into courseregistration values(1,'SQL',true);
The student is fetched by id, so it should always be returned in the result. An entry in courseregistration is optional: matching rows should be returned if they exist, filtered on isfinished=false. In other words, I want the course registrations that are not finished yet. I tried to outer join student with courseregistration and filter courseregistration on isfinished=false. Note that I still want to retrieve the student.
Trying this returns no rows:
select * from student
left outer join courseregistration using(studentid)
where studentid = 1
and courseregistration.isfinished = false
What I want in the example above is a result set with one row for the student and the course columns null (because the only registration has isfinished=true). One more constraint: if there is no corresponding row in courseregistration, there should still be a result for the student entry.
This is an adjusted example. I can tweak my code to solve the problem, but I really wonder, what is the "correct/smart way" of solving this in postgresql?
PS I have used the (+) in Oracle previously to solve similar issues.
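Isn't this what you are looking for? With the isfinished filter in the ON clause instead of WHERE, it only restricts which courseregistration rows are joined, so the student row is kept even when nothing matches: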
select * from student s
left outer join courseregistration cr
on s.studentid = cr.studentid
and cr.isfinished = false
where s.studentid = 1
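To see the effect on the sample data, a second, unfinished registration can be added (the 'Postgres' course row is made up for illustration). With the filter in the ON clause, the finished course is dropped while the student is kept.

```sql
-- Hypothetical extra row: an unfinished course for the same student.
insert into courseregistration values (1, 'Postgres', false);

select s.studentid, s.name, cr.coursenamename, cr.isfinished
from student s
left outer join courseregistration cr
       on s.studentid = cr.studentid
      and cr.isfinished = false
where s.studentid = 1;
-- returns one row: 1, Dave, Postgres, false

-- Without the extra row (only the finished 'SQL' registration exists),
-- the same query returns: 1, Dave, NULL, NULL
```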

show records that have only one matching row in another table

I need to write some SQL that is probably very simple, but I am very new to it.
I need to find all the records from one table that have exactly one matching id in another table. E.g. one table contains records of employees and the second one contains the employees' telephone numbers. I need to find all employees with only one telephone number.
Sample data would be nice. In its absence, something like:
SELECT
employees.employee_id
FROM
employees
LEFT JOIN
(SELECT employee_id FROM emp_phone GROUP BY employee_id HAVING count(*) = 1) AS phone -- only employees with exactly one phone row
ON
employees.employee_id = phone.employee_id
WHERE
phone.employee_id IS NOT NULL;
You need a join of the 2 tables, group by employee and the condition in the having clause:
SELECT e.employee_id, e.name
FROM employees e INNER JOIN numbers n
ON e.employee_id = n.employee_id
GROUP BY e.employee_id, e.name
HAVING COUNT(*) = 1;
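A small made-up data set shows what the GROUP BY / HAVING query returns. The table layouts and the tel_number column are assumptions for illustration, since the question gives no schema.

```sql
-- Assumed minimal schema; only employee_id and name appear in the query above.
CREATE TABLE employees (employee_id INT PRIMARY KEY, name TEXT);
CREATE TABLE numbers   (employee_id INT REFERENCES employees, tel_number TEXT);

INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO numbers   VALUES (1, '555-0100'),
                             (2, '555-0101'), (2, '555-0102');

SELECT e.employee_id, e.name
FROM employees e
INNER JOIN numbers n ON e.employee_id = n.employee_id
GROUP BY e.employee_id, e.name
HAVING COUNT(*) = 1;
-- returns only (1, 'Ann'): Bob has two numbers, Cy has none
```

The inner join is what excludes employees without any number. COUNT(*) = 1 then excludes those with more than one.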
If there can be more than a few numbers per employee in the table with the employees' telephone numbers (calling it tel), then it's cheaper to avoid GROUP BY and HAVING which has to process all rows. Find employees with "unique" numbers using a self-anti-join with NOT EXISTS.
While you don't need more than the employee_id and their unique phone number, you don't even have to involve the employee table at all:
SELECT *
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
);
If you need additional columns from the employee table:
SELECT * -- or any columns you need
FROM (
SELECT employee_id AS id, tel_number -- or any columns you need
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
)
) t
JOIN employee e USING (id);
The column alias in the subquery (employee_id AS id) is just for convenience. Then the outer join condition can be USING (id), and the ID column is only included once in the result, even with SELECT * ...
Simpler with a smart naming convention that uses employee_id for the employee ID everywhere. But it's a widespread anti-pattern to use employee.id instead.
Related:
JOIN table if condition is satisfied, else perform no join

Postgres: Need to match records from two tables based on key value and earliest dates in each table

I'm dealing with a pretty unique record matching problem within postgres right now. Essentially I have a table (A) with a lot of records in it, including a key value that I need to match on and the date of the record. Then I have this other table (B) that I want to match the first table on that key value. However, there can be multiple of the same 'key values' in both tables. To get around this I need to match the earliest key value from table A to the earliest key value to table B, the second earliest to the second earliest, and so on... However, if table B runs out of key value matches in table B then I want to default to the latest key value match in A, even though something else already matched on it.
My initial thought is to use something like this on both tables:
ROW_NUMBER() OVER ( PARTITION BY key_value ORDER BY date) AS rank
And then join on the rank and key_value field. However, I'm not exactly sure how to get that default scenario to work with this method. And if records are added to one table and not the other and I try the join again, I feel like it might get out of sync.
My other thought was to use a cursor, but I'm really struggling to see how I'd implement that.
Any help would be greatly appreciated!
First you need to number all your rows, then find the ones with matching ranks.
After that, match the rows without a matching rank to the row with the latest date:
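with cteA as (
SELECT *, ROW_NUMBER() OVER ( PARTITION BY key_value ORDER BY date) AS rank
FROM tableA
), cteB as (
SELECT *, ROW_NUMBER() OVER ( PARTITION BY key_value ORDER BY date) AS rank
FROM tableB
), ranked_match as (
-- alias the columns so the CTE has no duplicate names; add any other columns you need
SELECT cteA.key_value, cteA.date AS a_date, cteA.rank AS a_rank,
cteB.date AS b_date, cteB.rank AS b_rank
FROM cteA
LEFT JOIN cteB
ON cteA.key_value = cteB.key_value
AND cteA.rank = cteB.rank
), latest_row as (
SELECT *, ROW_NUMBER() OVER ( PARTITION BY key_value ORDER BY date DESC) AS rank
FROM tableB
)
-- rows of A that found a row of B with the same rank
SELECT key_value, a_date, a_rank, b_date
FROM ranked_match
WHERE b_rank IS NOT NULL
UNION ALL
-- rows of A left over once B ran out: fall back to the latest row of B
SELECT ranked_match.key_value, ranked_match.a_date, ranked_match.a_rank, latest_row.date
FROM ranked_match
JOIN latest_row
ON ranked_match.key_value = latest_row.key_value
WHERE ranked_match.b_rank IS NULL
AND latest_row.rank = 1;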
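The same pairing can also be sketched without the UNION ALL by clamping the rank from table A to the number of rows table B has for that key. This is an alternative formulation, not the answerer's query. The aliases rn and cnt are made up for illustration, and only the key_value and date columns named in the question are assumed to exist.

```sql
WITH a AS (
    SELECT key_value, date,
           ROW_NUMBER() OVER (PARTITION BY key_value ORDER BY date) AS rn
    FROM tableA
), b AS (
    SELECT key_value, date,
           ROW_NUMBER() OVER (PARTITION BY key_value ORDER BY date) AS rn,
           COUNT(*)     OVER (PARTITION BY key_value)               AS cnt
    FROM tableB
)
SELECT a.key_value, a.date AS a_date, b.date AS b_date
FROM a
JOIN b
  ON b.key_value = a.key_value
 AND b.rn = LEAST(a.rn, b.cnt);   -- past the end of B, reuse B's last (latest) row
```

For a key with three rows in tableA and two in tableB, the first two rows pair up by rank and the third row of A is paired with the second (latest) row of B. Keys that do not occur in tableB at all produce no rows.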

Is there a way to have an "automatic" join in postgresql?

I mean the following:
I have 2 parent tables :
table1
id PRIMARY KEY
name TEXT
table2
id PRIMARY KEY
...
and a child table, used for n-n relations:
table_child
id PRIMARY KEY
id_1 INT
id_2 INT
where id_1 and id_2 in table_child refer to the column id in table1 and table2.
Now: I often run queries with a join between table1 and table_child ON table1.id = table_child.id_1, only because I need the value of the column table1.name.
I'm wondering if there is a way to avoid these joins and somehow declare a "pseudo" column name in table_child, which would not be a real column but a link to the corresponding column in table1, so that:
* I can access the value through table_child.name
* But it is always synchronized with the value table1.name
I hope my explanation was understandable...
Further to my comment above, the answer you're really looking for is something like:
CREATE VIEW
table1_child_view AS
SELECT
table1.name,
table_child.*
FROM
table_child
INNER JOIN
table1 ON
table1.id = table_child.id_1;
Then you can run your queries on the new view, such as:
SELECT name FROM table1_child_view WHERE ...
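A self-contained sketch of the view approach, using the table names from the question. The column types, the foreign keys, and the sample rows are assumptions for illustration. It shows that the name seen through the view follows table1.name without any copy being stored.

```sql
CREATE TABLE table1 (id INT PRIMARY KEY, name TEXT);
CREATE TABLE table2 (id INT PRIMARY KEY);
CREATE TABLE table_child (
    id   INT PRIMARY KEY,
    id_1 INT REFERENCES table1 (id),
    id_2 INT REFERENCES table2 (id)
);

CREATE VIEW table1_child_view AS
SELECT table1.name, table_child.*
FROM table_child
INNER JOIN table1 ON table1.id = table_child.id_1;

INSERT INTO table1 VALUES (1, 'first');
INSERT INTO table2 VALUES (10);
INSERT INTO table_child VALUES (100, 1, 10);

SELECT name FROM table1_child_view WHERE id = 100;  -- first
UPDATE table1 SET name = 'renamed' WHERE id = 1;
SELECT name FROM table1_child_view WHERE id = 100;  -- renamed
```

The join still happens each time the view is queried; the view only saves writing it out. With INNER JOIN, child rows whose id_1 is NULL do not appear in the view. A LEFT JOIN would keep them, with a NULL name.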