Join two tables on all columns to determine if they contain identical information - postgresql

I want to check if tables table_a and table_b are identical. I thought I could full outer join both tables on all columns and count the number of rows and missing values. However, both tables have many columns and I do not want to explicitly type out every column name.
Both tables have the same number of columns, with the same names. How can I full outer join them on all columns without explicitly typing every column name?
I would like to do something along the lines of this syntax:
select
count(1)
,sum(case when x.id is null then 1 else 0 end) as x_nulls
,sum(case when y.id is null then 1 else 0 end) as y_nulls
from
x
full outer join
y
on
*
;

You can use NATURAL FULL OUTER JOIN here. The NATURAL keyword joins on all columns that have the same name in both tables.
Testing whether the tables are identical could then be:
SELECT *
FROM x NATURAL FULL OUTER JOIN y
WHERE x.id IS NULL OR y.id IS NULL;
This shows the "orphaned" rows, i.e. rows in either table that have no exact counterpart in the other.
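A minimal self-contained sketch of the count the question asks for, using hypothetical sample tables x and y (the columns id, name and price are assumptions). Two caveats apply: NULL never equals NULL in a join condition, so a row with a NULL in any shared column shows up as orphaned on both sides even if it is identical; and x.id IS NULL only identifies a missing side if id itself is never NULL.

```sql
-- Hypothetical sample data; x and y stand in for the real tables.
CREATE TEMP TABLE x (id int, name text, price numeric);
CREATE TEMP TABLE y (id int, name text, price numeric);
INSERT INTO x VALUES (1, 'a', 10), (2, 'b', 20), (3, 'c', 30);
INSERT INTO y VALUES (1, 'a', 10), (2, 'b', 25), (3, 'c', 30);

-- NATURAL joins on every column name x and y share, so two rows
-- match only when all of their shared columns are equal.
SELECT count(*) AS total_rows,
       count(*) FILTER (WHERE x.id IS NULL) AS only_in_y,
       count(*) FILTER (WHERE y.id IS NULL) AS only_in_x
FROM x NATURAL FULL OUTER JOIN y;
```

With this data the differing price for id 2 produces one orphan on each side, so total_rows is 4 and both only_in_x and only_in_y are 1. FILTER needs PostgreSQL 9.4 or later; sum(case when ... then 1 else 0 end) works the same way on older versions.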

You could use the EXCEPT operator.
For example, the following returns an empty set if every row of t1 also appears in t2 (in other words, it lists the rows of t1 that are missing from t2):
select * from t1
except
select * from t2;
If you just want a single true/false answer, check both directions:
select not exists (select * from t1 except select * from t2)
and not exists (select * from t2 except select * from t1) as identical;
Provided the number, order, and types of the columns match, you can use select * even if the column names differ. You could also invert the first query (t2 except t1) and union both results to return the combined differences.
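One detail worth knowing: plain EXCEPT removes duplicates, so a table holding the same row twice compares equal to one holding it once. EXCEPT ALL keeps the row counts. Unlike a join condition, set operations also treat two NULLs as equal. A minimal sketch with hypothetical sample tables t1 and t2:

```sql
-- Hypothetical sample data: t1 holds the row (2, 'b') twice, t2 only once.
CREATE TEMP TABLE t1 (id int, name text);
CREATE TEMP TABLE t2 (id int, name text);
INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (2, 'b');
INSERT INTO t2 VALUES (1, 'a'), (2, 'b');

-- Combined differences in both directions; EXCEPT ALL respects duplicates,
-- whereas plain EXCEPT would report no differences for this data.
SELECT 't1 only' AS side, d1.*
FROM (SELECT * FROM t1 EXCEPT ALL SELECT * FROM t2) AS d1
UNION ALL
SELECT 't2 only', d2.*
FROM (SELECT * FROM t2 EXCEPT ALL SELECT * FROM t1) AS d2;
```

Here the result is a single row ('t1 only', 2, 'b').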

Related

dynamically choose fields from different table based on existence

I have two tables, A and B.
Both tables have the same number of columns.
Table A always contains all ids of table B.
I need to fetch a row from table B first; if it does not exist there, I have to fetch it from table A.
I was trying to do this dynamically:
select
CASE
WHEN b.id is null THEN
a.*
ELSE
b.*
END
from A a
left join B b on b.id = a.id
I think this syntax is not correct.
Can someone suggest how to proceed?
It looks like you want to select all columns from table A except when a matching ID exists in table B. In that case you want to select all columns from table B.
That can be done with this query as long as the number and types of columns in both tables are compatible:
select * from a where not exists (select 1 from b where b.id = a.id)
union all
select * from b
If the number, types, or order of the columns differ, you will need to list the columns to return explicitly in each subquery.
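The CASE in the question cannot work because a CASE expression returns a single value, not a whole row. When the columns must be listed explicitly, a LEFT JOIN with one CASE per column is one way to do it. A minimal sketch, assuming hypothetical columns name and price in both tables:

```sql
-- Hypothetical schema and data; only the id column comes from the question.
CREATE TEMP TABLE a (id int PRIMARY KEY, name text, price numeric);
CREATE TEMP TABLE b (id int PRIMARY KEY, name text, price numeric);
INSERT INTO a VALUES (1, 'a-one', 10), (2, 'a-two', 20), (3, 'a-three', 30);
INSERT INTO b VALUES (2, 'b-two', NULL);

-- Take every column from b when a matching b row exists, otherwise from a.
-- Testing b.id (rather than each column) means a NULL stored in b
-- is returned as NULL instead of silently falling back to a's value.
SELECT a.id,
       CASE WHEN b.id IS NULL THEN a.name  ELSE b.name  END AS name,
       CASE WHEN b.id IS NULL THEN a.price ELSE b.price END AS price
FROM a
LEFT JOIN b ON b.id = a.id
ORDER BY a.id;
```

COALESCE(b.name, a.name) is shorter, but it falls back to a whenever b's value is NULL, which may or may not be what you want.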

SQL left join on maximum date

I have two tables: contracts and contract_descriptions.
In contract_descriptions there is a column named contract_id, which matches the contract_id of the records in contracts.
I am trying to join only the latest record from contract_descriptions:
SELECT *
FROM contracts c
LEFT JOIN contract_descriptions d ON d.contract_id = c.contract_id
AND d.date_description =
(SELECT MAX(date_description)
FROM contract_descriptions t
WHERE t.contract_id = c.contract_id)
It works, but is it the most performant way to do it? Is there a way to avoid the second SELECT?
Alternatively, you could use DISTINCT ON:
SELECT * FROM contracts c LEFT JOIN (
SELECT DISTINCT ON (cd.contract_id) cd.* FROM contract_descriptions cd
ORDER BY cd.contract_id, cd.date_description DESC
) d ON d.contract_id = c.contract_id
DISTINCT ON keeps only one row per contract_id, and the sort key cd.date_description DESC ensures that the row kept is the latest description.
Performance depends on many factors (for example, table size and the available indexes). In any case, you should compare both approaches with EXPLAIN ANALYZE.
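A sketch of such a comparison, assuming a minimal hypothetical schema (the description column and the index name cd_contract_date_idx are made up). A composite index on (contract_id, date_description DESC) matches both the correlated MAX() subquery and the ORDER BY used by DISTINCT ON; on a table this small the planner will probably ignore it, so run the comparison on realistic data.

```sql
-- Hypothetical minimal schema and data.
CREATE TEMP TABLE contracts (contract_id int PRIMARY KEY);
CREATE TEMP TABLE contract_descriptions (
    contract_id      int,
    date_description date,
    description      text
);
INSERT INTO contracts VALUES (1), (2);
INSERT INTO contract_descriptions VALUES
    (1, '2023-01-01', 'old'),
    (1, '2023-06-01', 'new');

-- Index ordered the same way as the DISTINCT ON sort.
CREATE INDEX cd_contract_date_idx
    ON contract_descriptions (contract_id, date_description DESC);

-- Run this for each candidate query and compare plans and actual times.
EXPLAIN ANALYZE
SELECT * FROM contracts c LEFT JOIN (
    SELECT DISTINCT ON (cd.contract_id) cd.*
    FROM contract_descriptions cd
    ORDER BY cd.contract_id, cd.date_description DESC
) d ON d.contract_id = c.contract_id;
```

Contract 1 gets its 'new' description, and contract 2, which has none, is still returned with NULLs thanks to the LEFT JOIN.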
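Your query looks okay to me. One typical way to join only the first n rows (by some order) from another table is a lateral join. Using LEFT JOIN LATERAL ... ON true keeps contracts that have no descriptions, as your LEFT JOIN does (CROSS JOIN LATERAL would drop them):
SELECT *
FROM contracts c
LEFT JOIN LATERAL
(
SELECT *
FROM contract_descriptions cd
WHERE cd.contract_id = c.contract_id
ORDER BY cd.date_description DESC
FETCH FIRST 1 ROW ONLY
) cdlast ON true;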

More Efficient Way to Join Three Tables Together in Postgres

I am attempting to link three tables together in postgres.
All three tables are generated from subqueries. The first table is linked to the second table on the column call_sign with a FULL JOIN (because I want the superset of entries from both tables). The third table has an INNER JOIN with the second table, also on call_sign (but theoretically it could have been linked to the first table).
The query runs but is quite slow and I feel will become even slower as I add more data. I realize that there are certain things that I can do to speed things up - like not pulling unnecessary data in the subqueries and not converting text to numbers on the fly. But is there a better way to structure the JOINs between these three tables?
Any advice would be appreciated because I am a novice in postgres.
Here is the code:
select
(CASE
WHEN tmp1.frequency_assigned is NULL
THEN tmp2.lower_frequency
ELSE tmp1.frequency_assigned END) as master_frequency,
(CASE
WHEN tmp1.call_sign is NULL
THEN tmp2.call_sign
ELSE tmp1.call_sign END) as master_call_sign,
(CASE
WHEN tmp1.entity_type is NULL
THEN tmp2.entity_type
ELSE tmp1.entity_type END) as master_entity_type,
(CASE
WHEN tmp1.licensee_id is NULL
THEN tmp2.licensee_id
ELSE tmp1.licensee_id END) as master_licensee_id,
(CASE
WHEN tmp1.entity_name is NULL
THEN tmp2.entity_name
ELSE tmp1.entity_name END) as master_entity_name,
tmp3.market_name
FROM
(select cast(replace(frequency_assigned, ',','.') as decimal) AS frequency_assigned,
frequency_upper_band,
f.uls_file_number,
f.call_sign,
entity_type,
licensee_id,
entity_name
from combo_fr f INNER JOIN combo_en e
ON f.call_sign=e.call_sign
ORDER BY frequency_assigned DESC) tmp1
FULL JOIN
(select cast(replace(lower_frequency, ',','.') as decimal) AS lower_frequency,
upper_frequency,
e.uls_file_number,
mf.call_sign,
entity_type,
licensee_id,
entity_name
FROM market_mf mf INNER JOIN combo_en e
ON mf.call_sign=e.call_sign
ORDER BY lower_frequency DESC) tmp2
ON tmp1.call_sign=tmp2.call_sign
INNER JOIN
(select en.call_sign,
mk.market_name
FROM combo_mk mk
INNER JOIN combo_en en
ON mk.call_sign=en.call_sign) tmp3
ON tmp2.call_sign=tmp3.call_sign
ORDER BY master_frequency DESC;
You'll want to unwind those subqueries and do it all in one join, if you can. Something like:
select <whatever you need>
from combo_fr f
JOIN combo_en e ON f.call_sign=e.call_sign
JOIN market_mf mf ON mf.call_sign=e.call_sign
JOIN combo_mk mk ON mk.call_sign=e.call_sign
I can't completely grok what you're doing, but some of the join clauses might have to become LEFT JOINs in order to deal with places where the call sign does or does not appear.
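Independently of how the joins are restructured, two details of the original query can be simplified. Each CASE WHEN tmp1.col IS NULL THEN tmp2.col ELSE tmp1.col END is exactly COALESCE(tmp1.col, tmp2.col). Also, because the final INNER JOIN matches on tmp2.call_sign, every row that exists only in tmp1 is discarded, which undoes part of the FULL JOIN; joining on the coalesced call sign keeps those rows, if that is what you want. The sketch below uses small hypothetical temp tables named tmp1, tmp2 and tmp3 in place of the question's subqueries, with most columns left out.

```sql
-- Hypothetical stand-ins for the three subqueries in the question.
CREATE TEMP TABLE tmp1 (call_sign text, frequency_assigned numeric);
CREATE TEMP TABLE tmp2 (call_sign text, lower_frequency numeric);
CREATE TEMP TABLE tmp3 (call_sign text, market_name text);
INSERT INTO tmp1 VALUES ('A1', 100.5), ('B2', 200.25);
INSERT INTO tmp2 VALUES ('B2', 199.0), ('C3', 300.0);
INSERT INTO tmp3 VALUES ('A1', 'North'), ('B2', 'South'), ('C3', 'East');

SELECT COALESCE(tmp1.frequency_assigned, tmp2.lower_frequency) AS master_frequency,
       COALESCE(tmp1.call_sign, tmp2.call_sign) AS master_call_sign,
       tmp3.market_name
FROM tmp1
FULL JOIN tmp2 ON tmp1.call_sign = tmp2.call_sign
-- Matching on tmp2.call_sign alone would drop A1, which exists only in tmp1.
INNER JOIN tmp3 ON tmp3.call_sign = COALESCE(tmp1.call_sign, tmp2.call_sign)
ORDER BY master_frequency DESC;
```

This returns C3, B2 and A1; with the original ON tmp2.call_sign=tmp3.call_sign only C3 and B2 would survive.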
After creating indexes on call_sign for all four involved tables, try this:
WITH nodup AS (
SELECT call_sign FROM market_mf
EXCEPT SELECT call_sign FROM combo_fr
) SELECT
CAST(REPLACE(u.master_frequency_string, ',','.') AS DECIMAL)
AS master_frequency,
u.call_sign AS master_call_sign,
u.entity_type AS master_entity_type,
u.licensee_id AS master_licensee_id,
u.entity_name AS master_entity_name,
combo_mk.market_name
FROM (SELECT frequency_assigned AS master_frequency_string, call_sign,
entity_type, licensee_id, entity_name
FROM combo_fr
UNION ALL SELECT lower_frequency, call_sign,
entity_type, licensee_id, entity_name
FROM market_mf INNER JOIN nodup USING (call_sign)
) AS u
INNER JOIN combo_en USING (call_sign)
INNER JOIN combo_mk USING (call_sign)
ORDER BY 1 DESC;
I post this because it is the simplest way I can see to express what you need.
If no call_sign value appears in both market_mf and combo_fr, the WITH nodup ... clause and the INNER JOIN nodup ... can be omitted.
I am assuming that call_sign is unique within combo_fr and within market_mf (no two rows in the same table share a value), even though the same value can appear in both tables.
It is unfortunate that you order by a computed column, and that the computation (replacing the decimal comma and casting the string, for every row of every query) is so wasteful. One clear optimization would be to convert the frequency strings once and for all in the table itself. The steps would be:
(1) add numeric frequency columns to your tables, (2) populate them with values converted from the current text columns, and (3) write new values directly into the new columns, entering them with a locale that uses the desired decimal separator.
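Steps (1) and (2) might look like this. A sketch assuming a hypothetical new column named frequency_assigned_num and a cut-down copy of combo_fr; market_mf.lower_frequency would get the same treatment, and step (3) depends on how your data is loaded, so it is not shown.

```sql
-- Hypothetical stand-in for combo_fr, storing frequencies as text with a decimal comma.
CREATE TEMP TABLE combo_fr (call_sign text, frequency_assigned text);
INSERT INTO combo_fr VALUES ('A1', '100,5'), ('B2', '200,25');

-- (1) Add a numeric column next to the existing text column.
ALTER TABLE combo_fr ADD COLUMN frequency_assigned_num numeric;

-- (2) Convert the existing strings once, instead of in every query.
UPDATE combo_fr
SET frequency_assigned_num = CAST(REPLACE(frequency_assigned, ',', '.') AS numeric);

-- Queries can now sort on the stored numbers without any per-row conversion.
SELECT call_sign, frequency_assigned_num
FROM combo_fr
ORDER BY frequency_assigned_num DESC;
```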

Full outer join on multiple tables in PostgreSQL

In PostgreSQL, I have N tables, each consisting of two columns: id and value. Within each table, id is a unique identifier and value is numeric.
I would like to join all the tables using id and, for each id, create a sum of the values from all the tables where the id is present (the id may be present in only a subset of the tables).
I was trying the following query:
SELECT COALESCE(a.id, b.id, c.id) AS id,
COALESCE(a.value,0) + COALESCE(b.value,0) + COALESCE(c.value,0) AS value
FROM
a
FULL OUTER JOIN
b
ON (a.id=b.id)
FULL OUTER JOIN
c
ON (b.id=c.id)
But it doesn't work for cases when the id is present in a and c, but not in b.
I suppose I would have to do some bracketing like:
SELECT COALESCE(x.id, c.id) AS id, COALESCE(x.value,0) + COALESCE(c.value,0) AS value
FROM
(SELECT COALESCE(a.id, b.id) AS id, COALESCE(a.value,0) + COALESCE(b.value,0) AS value
FROM
a
FULL OUTER JOIN
b
ON (a.id=b.id)
) AS x
FULL OUTER JOIN
c
ON (x.id = c.id)
It is only 3 tables and the code is already ugly enough, imho. Is there an elegant, systematic way to do the join for N tables, so that I don't get lost in my code?
I would also like to point out that I did some simplifications in my example. Tables a, b, c, ..., are actually results of quite complex queries over several materialized views. But the syntactical problem remains the same.
If I understand correctly, you need to sum the values from N tables, grouped by id.
For that I would do this:
Select x.id, sum(x.value) as value from (
Select * from a
Union all
Select * from b
Union all
Select * from c -- ...and so on, one branch per table
) as x group by x.id;
Since the N tables all have the same columns, you can UNION ALL them into one big set of all the (id, value) pairs from every table. Use UNION ALL rather than UNION, because UNION removes duplicates and would silently drop identical (id, value) pairs coming from different tables!
Then just sum all the values grouped by id.
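A self-contained sketch with hypothetical sample data, including the case from the question where an id is present in a and c but not in b:

```sql
-- Hypothetical sample tables with the (id, value) layout from the question.
CREATE TEMP TABLE a (id int PRIMARY KEY, value numeric);
CREATE TEMP TABLE b (id int PRIMARY KEY, value numeric);
CREATE TEMP TABLE c (id int PRIMARY KEY, value numeric);
INSERT INTO a VALUES (1, 10), (2, 20);
INSERT INTO b VALUES (2, 5);
INSERT INTO c VALUES (1, 1), (3, 7);  -- id 1 is in a and c, but not in b

-- Stack all (id, value) pairs, then add them up per id.
SELECT x.id, sum(x.value) AS value
FROM (
    SELECT id, value FROM a
    UNION ALL SELECT id, value FROM b
    UNION ALL SELECT id, value FROM c
) AS x
GROUP BY x.id
ORDER BY x.id;
```

This yields 11 for id 1, 25 for id 2 and 7 for id 3. Adding another table is one more UNION ALL line, and no COALESCE is needed because ids missing from a table simply contribute no rows.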

Firebird: get the list of all available ids

In a table I have records with ids 2, 4, 5, 8. How can I get a list with the values 1, 3, 6, 7? I have tried this:
SELECT t1.id + 1
FROM table t1
WHERE NOT EXISTS (
SELECT *
FROM table t2
WHERE t2.id = t1.id + 1
)
but it's not working correctly: it doesn't return all available positions.
Is it possible without another table?
You can get all the missing IDs from a recursive CTE, like this:
with recursive numbers as (
select 1 number
from rdb$database
union all
select number+1
from rdb$database
join numbers on numbers.number < 1024
)
select n.number
from numbers n
where n.number < (select max(t2.id) from table t2) -- only gaps below the highest id
and not exists (select 1
from table t
where t.id = n.number)
The number < 1024 condition in my example keeps the query within Firebird's maximum recursion depth of 1024; going deeper makes the query end with an error. If you need more than 1024 consecutive IDs, you have to either run the query multiple times, adjusting the range of numbers generated, or write a different query that produces consecutive numbers without reaching that recursion depth, which is not too difficult.
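One such query, sketched for Firebird 2.1 or later (needed for WITH) and using a hypothetical table my_table with an integer id column, builds the numbers from a cross join of ten digits instead of recursion, so the recursion limit does not apply:

```sql
-- Ten digits built without recursion; rdb$database always has exactly one row.
with digits as (
  select 0 as d from rdb$database union all
  select 1 from rdb$database union all
  select 2 from rdb$database union all
  select 3 from rdb$database union all
  select 4 from rdb$database union all
  select 5 from rdb$database union all
  select 6 from rdb$database union all
  select 7 from rdb$database union all
  select 8 from rdb$database union all
  select 9 from rdb$database
),
-- Four digit positions give the numbers 1..10000 with no recursion at all.
numbers as (
  select 1 + d1.d + 10 * d2.d + 100 * d3.d + 1000 * d4.d as number
  from digits d1
  cross join digits d2
  cross join digits d3
  cross join digits d4
)
select n.number
from numbers n
where n.number < (select max(t2.id) from my_table t2)
  and not exists (select 1 from my_table t where t.id = n.number)
order by n.number;
```

Each additional cross join with digits multiplies the range by ten.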