More Efficient Way to Join Three Tables Together in Postgres - postgresql

I am attempting to link three tables together in postgres.
All three tables are generated from subqueries. The first table is linked to the second table by the variable call_sign as a FULL JOIN (because I want the superset of entries from both tables). The third table has an INNER JOIN with the second table also on call_sign (but theoretically could have been linked to the first table)
The query runs but is quite slow and I feel will become even slower as I add more data. I realize that there are certain things that I can do to speed things up - like not pulling unnecessary data in the subqueries and not converting text to numbers on the fly. But is there a better way to structure the JOINs between these three tables?
Any advice would be appreciated because I am a novice in postgres.
Here is the code:
select
(CASE
WHEN tmp1.frequency_assigned is NULL
THEN tmp2.lower_frequency
ELSE tmp1.frequency_assigned END) as master_frequency,
(CASE
WHEN tmp1.call_sign is NULL
THEN tmp2.call_sign
ELSE tmp1.call_sign END) as master_call_sign,
(CASE
WHEN tmp1.entity_type is NULL
THEN tmp2.entity_type
ELSE tmp1.entity_type END) as master_entity_type,
(CASE
WHEN tmp1.licensee_id is NULL
THEN tmp2.licensee_id
ELSE tmp1.licensee_id END) as master_licensee_id,
(CASE
WHEN tmp1.entity_name is NULL
THEN tmp2.entity_name
ELSE tmp1.entity_name END) as master_entity_name,
tmp3.market_name
FROM
(select cast(replace(frequency_assigned, ',','.') as decimal) AS frequency_assigned,
frequency_upper_band,
f.uls_file_number,
f.call_sign,
entity_type,
licensee_id,
entity_name
from combo_fr f INNER JOIN combo_en e
ON f.call_sign=e.call_sign
ORDER BY frequency_assigned DESC) tmp1
FULL JOIN
(select cast(replace(lower_frequency, ',','.') as decimal) AS lower_frequency,
upper_frequency,
e.uls_file_number,
mf.call_sign,
entity_type,
licensee_id,
entity_name
FROM market_mf mf INNER JOIN combo_en e
ON mf.call_sign=e.call_sign
ORDER BY lower_frequency DESC) tmp2
ON tmp1.call_sign=tmp2.call_sign
INNER JOIN
(select en.call_sign,
mk.market_name
FROM combo_mk mk
INNER JOIN combo_en en
ON mk.call_sign=en.call_sign) tmp3
ON tmp2.call_sign=tmp3.call_sign
ORDER BY master_frequency DESC;

you'll want to unwind those queries and do it all in one join, if you can. Soemthing like:
select <whatever you need>
from combo_fr f
JOIN combo_en e ON f.call_sign=e.call_sign
JOIN market_mf mf mf ON mf.call_sign=e.call_sign
JOIN combo_mk mk ON mk.call_sign=en.call_sign
I can't completely grok what you're doing, but some of the join clauses might have to become LEFT JOINs in order to deal with places where the call sign does or does not appear.

After creating indexes on call_sign for all four involved tables, try this:
WITH nodup AS (
SELECT call_sign FROM market_mf
EXCEPT SELECT call_sign FROM combo_fr
) SELECT
CAST(REPLACE(u.master_frequency_string, ',','.') AS DECIMAL)
AS master_frequency,
u.call_sign AS master_call_sign,
u.entity_type AS master_entity_type,
u.licensee_id AS master_licensee_id,
u.entity_name AS master_entity_name,
combo_mk.market_name
FROM (SELECT frequency_assigned AS master_frequency_string, call_sign,
entity_type, licensee_id, entity_name
FROM combo_fr
UNION ALL SELECT lower_frequency, call_sign,
entity_type, licensee_id, entity_name
FROM market_mf INNER JOIN nodup USING (call_sign)
) AS u
INNER JOIN combo_en USING (call_sign)
INNER JOIN combo_mk USING (call_sign)
ORDER BY 1 DESC;
I post this because this is the simplest way to understand what you need.
If there are no call_sign values which appear in both market_mf and
combo_fr, WITH nodup ... and INNER JOIN nodup ... can be omitted.
I am making the assumption that call_sign is unique in both combo_fr and market_mf ( = there are no two records in each table with the same value), even if there can be values which can appear in both tables.
It is very unfortunate that you order by a computed column, and that the computation is so silly. A certain optimization would be to convert the frequency strings once and for all in the table itself. The steps would be:
(1) add numeric frequncy columns to your tables (2) populate them with the values converted from the current text columns (3) convert new values directly into the new columns, by inputting them with a locale which has the desired decimal separator.

Related

Join two tables on all columns to determine if they contain identical information

I want to check if tables table_a and table_b are identical. I thought I could full outer join both tables on all columns and count the number of rows and missing values. However, both tables have many columns and I do not want to explicitly type out every column name.
Both tables have the same number of columns as well as names. How can I full outer join both of them on all columns without explicitly typing every column name?
I would like to do something along this syntax:
select
count(1)
,sum(case when x.id is null then 1 else 0 end) as x_nulls
,sum(case when y.id is null then 1 else 0 end) as y_nulls
from
x
full outer join
y
on
*
;
You can use NATURAL FULL OUTER JOIN here. The NATURAL key word will join on all columns that have the same name.
Just testing if the tables are identical could then be:
SELECT *
FROM x NATURAL FULL OUTER JOIN y
WHERE x.id IS NULL OR y.id IS NULL
This will show "orphaned" rows in either table.
You might use except operators.
For example the following would return an empty set if both tables contain the same rows:
select * from t1
except
select * from t2;
If you want to find rows in t1 that are different to those in t2 you could do
select * from t1
where not exists (select * from t1 except select * from t2);
Provided the number and types of columns match you can use select *, the tables' columns can vary in names; you could also invert the above and union to return combined differences.

How to get unique rows by one column but sort by the second

There is an example request in which there are several joins.
SELECT DISTINCT ON(a.id_1) 1, a.name, b.task, c.created_at
FROM a
INNER JOIN b ON a.id_2 = b.id
INNER JOIN c ON a.ID_2 = c.id
WHERE a.deleted_at IS NULL
ORDER BY a.id_1 desc
In this case, the query will work, sorting by unique values ​​of id_1 will take place. But I need to sort by the column a.name. In this case, postresql will swear with the words ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions.
The following query can serve as a solution to the problem:
SELECT *
FROM(
SELECT DISTINCT ON(a.id_1) a.name, b.task, c.created_at
FROM a
INNER JOIN b ON a.id_2 = b.id
INNER JOIN c ON a.ID_2 = c.id
WHERE a.deleted_at IS NULL
)
ORDER_BY a.name desc
But in reality the database is very large and such a query is not optimal. Are there other ways to sort by the selected column while keeping one uniqueness?

How to join vertical and horizontal table together table

I have two table with one of them is vertical i.e store only key value pair with ref id from table 1. i want to join both table and dispaly key value pair as a column in select. and also perform sorting on few keys.
T1 having (id,empid,dpt)
T2 having (empid,key,value)
select
T1.*,
t21.value,
t22.value,
t23.value,
t24.value
from Table1 t1
join Table2 t21 on t1.empid = t21.empid
join Table2 t22 on t1.empid = t22.empid
join Table2 t23 on t1.empid = t23.empid
where
t21.key = 'FNAME'
and t22.key = 'LNAME'
and t23.key='AGE'
The query you demonstrate is very inefficient (another join for each additional column) and also has a potential problem: if there isn't a row in T2 for every key in the WHERE clause, the whole row is excluded.
The second problem can be avoided with LEFT [OUTER] JOIN instead of [INNER] JOIN. But don't bother, the solution to the first problem is a completely different query. "Pivot" T2 using crosstab() from the additional module tablefunc:
SELECT * FROM crosstab(
'SELECT empid, key, value FROM t2 ORDER BY 1'
, $$VALUES ('FNAME'), ('LNAME'), ('AGE')$$ -- more?
) AS ct (empid int -- use *actual* data types
, fname text
, lname text
, age text);
-- more?
Then just join to T1:
select *
from t1
JOIN (<insert query from above>) AS t2 USING (empid);
This time you may want to use [INNER] JOIN.
The USING clause conveniently removes the second instance of the empid column.
Detailed instructions:
PostgreSQL Crosstab Query

Selecting non-repeating values in Postgres

SELECT DISTINCT a.s_id, select2Result.s_id, select2Result."mNrPhone",
select2Result."dNrPhone"
FROM "Table1" AS a INNER JOIN
(
SELECT b.s_id, c."mNrPhone", c."dNrPhone" FROM "Table2" AS b, "Table3" AS c
WHERE b.a_id = 1001 AND b.s_id = c.s_id
ORDER BY b.last_name) AS select2Result
ON a.a_id = select2Result.student_id
WHERE a.k_id = 11211
It returns:
1001;1001;"";""
1002;1002;"";""
1002;1002;"2342342232123";"2342342"
1003;1003;"";""
1004;1004;"";""
1002 value is repeated twice, but it shouldn't because I used DISTINCT and no other table has an id repeated twice.
You can use DISTINCT ON like this:
SELECT DISTINCT ON (a.s_id)
a.s_id, select2Result.s_id, select2Result."mNrPhone",
select2Result."dNrPhone"
...
But like other persons have told you, the "repeated records" are different really.
The qualifier DISTINCT applies to the entire row, not to the first column in the select-list. Since columns 3 and 4 (mNrPhone and dNrPhone) are different for the two rows with s_id = 1002, the DBMS correctly lists both rows. You have to write your query differently if you only want the s_id = 1002 to appear once, and you have to decide which auxilliary data you want shown.
As an aside, it is strongly recommended that you always use the explicit JOIN notation (which was introduced in SQL-92) in all queries and sub-queries. Do not use the old implicit join notation (which is all that was available in SQL-86 or SQL-89), and especially do not use a mixture of explicit and implicit join notations (where your sub-query uses the implicit join, but the main query uses explicit join). You need to know the old notation so you can understand old queries. You should write new queries in the new notation.
First of all, the query displayed does not work at all, student_id is missing in the sub-query. You use it in the JOIN later.
More interestingly:
Pick a certain row out of a set with DISTINCT
DISTINCT and DISTINCT ON return distinct values by sorting all rows according to the set of columns to be distinct, then it picks the first row from every set. It sorts by all rows for a general DISTINCT and only the specified rows for DISTINCT ON. Here lies the opportunity to pick certain rows out of a set over other.
For instance if you prefer rows with not-empty "mNrPhone" in your example:
SELECT DISTINCT ON (a.s_id) -- sure you didn't want a.a_id?
,a.s_id AS a_s_id -- use aliases to avoid dupe name
,s.s_id AS s_s_id
,s."mNrPhone"
,s."dNrPhone"
FROM "Table1" a
JOIN (
SELECT b.s_id, c."mNrPhone", c."dNrPhone", ??.student_id -- misssing!
FROM "Table2" b
JOIN "Table3" c USING (s_id)
WHERE b.a_id = 1001
-- ORDER BY b.last_name -- pointless, DISTINCT will re-order
) s ON a.a_id = s.student_id
WHERE a.k_id = 11211
ORDER BY a.s_id -- first col must agree with DISTINCT ON, could add DESC though
,("mNrPhone" <> '') DESC -- non-empty first
ORDER BY cannot disagree with DISTINCT on the same query level. To get around this you can either use GROUP BY instead or put the whole query in a sub-query and run another SELECT with ORDER BY on it.
The ORDER BY you had in the sub-query is voided now.
In this particular case, if - as it seems - the dupes come only from the sub-query (you'd have to verify), you could instead:
SELECT a.a_id, s.s_id, s."mNrPhone", s."dNrPhone" -- picking a.a_id over s_id
FROM "Table1" a
JOIN (
SELECT DISTINCT ON (b.s_id)
,b.s_id, c."mNrPhone", c."dNrPhone", ??.student_id -- misssing!
FROM "Table2" b
JOIN "Table3" c USING (s_id)
WHERE b.a_id = 1001
ORDER BY b.s_id, (c."mNrPhone" <> '') DESC -- pick non-empty first
) s ON a.a_id = s.student_id
WHERE a.k_id = 11211
ORDER BY a.a_id -- now you can ORDER BY freely

Join table variable vs join view

I have a stored procedure which is running quite slow. Therefore I want to extract some of the query in a separate view.
My code looks something like this:
DECLARE #tmpTable TABLE(..)
INSERT INTO #tmpTable (..) *query* (returns 3000 rows)
Select ... from table1
inner join table2
inner join table3
inner join #tmpTable
...
I then extract (copy-paste) the *query* and put it in a view - i.e. vView.
Doing this will then give me a different result:
Select ... from table1
inner join table2
inner join table3
inner join vView
...
Why? I can see that the vView and the #tmpTable both returns 3000 rows, so they should match (also did a except query to check).
Any comments would be much appriciated as I feel quite stuck with this..
EDITED:
This is the full query for getting the result (using #tmpTable or vView gives me different results, although the appear the same):
select dep.sid as depsid, dep.[name], COUNT(b.sid) as possiblelogins, count(ls.clientsid) as logins
from department dep
inner join relationship r on dep.sid=r.primarysid and r.relationshiptypeid=27 and r.validto is null
inner join [user] u on r.secondarysid=u.sid
inner join relationship r2 on u.sid=r2.secondarysid and r2.validto is null and r2.relationshiptypeid in (1,37)
inner join client c on r2.primarysid=c.sid
inner join ***#tmpTable or vView*** b on b.sid = c.sid
left outer join (select distinct clientsid from logonstatistics) as ls on b.sid=ls.clientsid
GROUP BY dep.sid, dep.[name],dep.isdepartment
HAVING dep.isdepartment=1
You maybe don't need the view/table if you change to this.
It joins on to client c and appears to be there only to JOIN onto logonstatistics
--remove inner join ***#tmpTable or vView*** b on b.sid = c.sid
--change JOIN
left outer join (select distinct clientsid from logonstatistics) as ls on c.sid=ls.clientsid
And change COUNT(b.sid) to COUNT(c.sid) in the SELECT clause
Otherwise, if you get different results you have two options I can see:
Table and view have different data. Have you run a line by line comparsion?
One has NULL, one has a value (especially for the sid column which will affect the JOIN)
Finally, when you says "different results" do you mean you get x2 or x3 rows? A different COUNT? What?