Join postgres table on two columns? - postgresql

I can't find a straightforward answer. My query is spitting out the wrong result, and I think it's because it's not seeing the "AND" as an actual join.
Can you do something like this and if not, what is the correct approach:
SELECT * from X
LEFT JOIN Y
ON
y.date = x.date AND y.code = x.code
?

This is possible:
The ON clause is the most general kind of join condition: it takes a Boolean value expression of the same kind as is used in a WHERE clause. A pair of rows from T1 and T2 match if the ON expression evaluates to true for them.
http://www.postgresql.org/docs/9.1/static/queries-table-expressions.html#QUERIES-FROM
Your SQL looks OK.

It's fine. In fact, you can put any condition in the ON clause, even one
not related to the key columns or even the the tables at all, eg:
SELECT * from X
LEFT JOIN Y
ON y.date = x.date
AND y.code = x.code
AND EXTRACT (dow from current_date) = 1

Another, arguably more readable way of writing the join is to use tuples of columns:
SELECT * from X
LEFT JOIN Y
ON
(y.date, y.code) = (x.date, x.code)
;
, which clearly indicates that the join is based on the equality on several columns.

This solution has good performance:
select * from(
select md5(concat(date, code)) md5_x from x ) as x1
left join (select md5(concat(date, code)) md5_y from y) as y1
on x1.md5_x = y1.md5_y

Related

Join two tables on all columns to determine if they contain identical information

I want to check if tables table_a and table_b are identical. I thought I could full outer join both tables on all columns and count the number of rows and missing values. However, both tables have many columns and I do not want to explicitly type out every column name.
Both tables have the same number of columns as well as names. How can I full outer join both of them on all columns without explicitly typing every column name?
I would like to do something along this syntax:
select
count(1)
,sum(case when x.id is null then 1 else 0 end) as x_nulls
,sum(case when y.id is null then 1 else 0 end) as y_nulls
from
x
full outer join
y
on
*
;
You can use NATURAL FULL OUTER JOIN here. The NATURAL key word will join on all columns that have the same name.
Just testing if the tables are identical could then be:
SELECT *
FROM x NATURAL FULL OUTER JOIN y
WHERE x.id IS NULL OR y.id IS NULL
This will show "orphaned" rows in either table.
You might use except operators.
For example the following would return an empty set if both tables contain the same rows:
select * from t1
except
select * from t2;
If you want to find rows in t1 that are different to those in t2 you could do
select * from t1
where not exists (select * from t1 except select * from t2);
Provided the number and types of columns match you can use select *, the tables' columns can vary in names; you could also invert the above and union to return combined differences.

How can I trigger a [42P09] error (ambiguous alias)

I have tried several ways to trigger a 42P09 (ambiguous alias) error, but none of them succeeded.
This just "works", even when combined with group by x or order by x:
select 1x, 1x;
This also "works":
with t as (select 1x, 1x, 1x) select * from t;
select x from (with x as (select 1x) select x.x x from x group by x order by x) as x;
These only raise a [42712] (duplicate alias):
with t as (select 1x), t as (select 1x) select * from t;
with t as (select 1x, 1x) select * from t, t;
These only raise a [42702] (ambiguous column):
select x from (select 1x, 1x) as t;
with t1 as (select 1x), t2 as (select 1x) select x from t1, t2;
My google-foo didn't bring up any relevant results, either. I could only find mentions of this error existing, but noone seems to encounter it.
Is there any way to trigger a 42P09 error?
Based on #Adrian Klaver's answer, here's a SQL statement that actually yields a 42P09 error:
with t1 as (select 1x), t2 as (select 1y)
select * from t1 x CROSS JOIN
(t2 x CROSS JOIN lateral ( select * from t2 where t1.x = t2.y ) y) z;
If it helps, from source(backend/parser/parse_relation.c):
"
Search the query's table namespace for an RTE matching the
given unqualified refname. Return the RTE if a unique match, or NULL
if no match. Raise error if multiple matches.
Note: it might seem that we shouldn't have to worry about the possibility
of multiple matches; after all, the SQL standard disallows duplicate table
aliases within a given SELECT level. Historically, however, Postgres has
been laxer than that. For example, we allow
SELECT ... FROM tab1 x CROSS JOIN (tab2 x CROSS JOIN tab3 y) z
on the grounds that the aliased join (z) hides the aliases within it,
therefore there is no conflict between the two RTEs named "x". However,
if tab3 is a LATERAL subquery, then from within the subquery both "x"es
are visible. Rather than rejecting queries that used to work, we allow
this situation, and complain only if there's actually an ambiguous
reference to "x".
"

More Efficient Way to Join Three Tables Together in Postgres

I am attempting to link three tables together in postgres.
All three tables are generated from subqueries. The first table is linked to the second table by the variable call_sign as a FULL JOIN (because I want the superset of entries from both tables). The third table has an INNER JOIN with the second table also on call_sign (but theoretically could have been linked to the first table)
The query runs but is quite slow and I feel will become even slower as I add more data. I realize that there are certain things that I can do to speed things up - like not pulling unnecessary data in the subqueries and not converting text to numbers on the fly. But is there a better way to structure the JOINs between these three tables?
Any advice would be appreciated because I am a novice in postgres.
Here is the code:
select
(CASE
WHEN tmp1.frequency_assigned is NULL
THEN tmp2.lower_frequency
ELSE tmp1.frequency_assigned END) as master_frequency,
(CASE
WHEN tmp1.call_sign is NULL
THEN tmp2.call_sign
ELSE tmp1.call_sign END) as master_call_sign,
(CASE
WHEN tmp1.entity_type is NULL
THEN tmp2.entity_type
ELSE tmp1.entity_type END) as master_entity_type,
(CASE
WHEN tmp1.licensee_id is NULL
THEN tmp2.licensee_id
ELSE tmp1.licensee_id END) as master_licensee_id,
(CASE
WHEN tmp1.entity_name is NULL
THEN tmp2.entity_name
ELSE tmp1.entity_name END) as master_entity_name,
tmp3.market_name
FROM
(select cast(replace(frequency_assigned, ',','.') as decimal) AS frequency_assigned,
frequency_upper_band,
f.uls_file_number,
f.call_sign,
entity_type,
licensee_id,
entity_name
from combo_fr f INNER JOIN combo_en e
ON f.call_sign=e.call_sign
ORDER BY frequency_assigned DESC) tmp1
FULL JOIN
(select cast(replace(lower_frequency, ',','.') as decimal) AS lower_frequency,
upper_frequency,
e.uls_file_number,
mf.call_sign,
entity_type,
licensee_id,
entity_name
FROM market_mf mf INNER JOIN combo_en e
ON mf.call_sign=e.call_sign
ORDER BY lower_frequency DESC) tmp2
ON tmp1.call_sign=tmp2.call_sign
INNER JOIN
(select en.call_sign,
mk.market_name
FROM combo_mk mk
INNER JOIN combo_en en
ON mk.call_sign=en.call_sign) tmp3
ON tmp2.call_sign=tmp3.call_sign
ORDER BY master_frequency DESC;
you'll want to unwind those queries and do it all in one join, if you can. Soemthing like:
select <whatever you need>
from combo_fr f
JOIN combo_en e ON f.call_sign=e.call_sign
JOIN market_mf mf mf ON mf.call_sign=e.call_sign
JOIN combo_mk mk ON mk.call_sign=en.call_sign
I can't completely grok what you're doing, but some of the join clauses might have to become LEFT JOINs in order to deal with places where the call sign does or does not appear.
After creating indexes on call_sign for all four involved tables, try this:
WITH nodup AS (
SELECT call_sign FROM market_mf
EXCEPT SELECT call_sign FROM combo_fr
) SELECT
CAST(REPLACE(u.master_frequency_string, ',','.') AS DECIMAL)
AS master_frequency,
u.call_sign AS master_call_sign,
u.entity_type AS master_entity_type,
u.licensee_id AS master_licensee_id,
u.entity_name AS master_entity_name,
combo_mk.market_name
FROM (SELECT frequency_assigned AS master_frequency_string, call_sign,
entity_type, licensee_id, entity_name
FROM combo_fr
UNION ALL SELECT lower_frequency, call_sign,
entity_type, licensee_id, entity_name
FROM market_mf INNER JOIN nodup USING (call_sign)
) AS u
INNER JOIN combo_en USING (call_sign)
INNER JOIN combo_mk USING (call_sign)
ORDER BY 1 DESC;
I post this because this is the simplest way to understand what you need.
If there are no call_sign values which appear in both market_mf and
combo_fr, WITH nodup ... and INNER JOIN nodup ... can be omitted.
I am making the assumption that call_sign is unique in both combo_fr and market_mf ( = there are no two records in each table with the same value), even if there can be values which can appear in both tables.
It is very unfortunate that you order by a computed column, and that the computation is so silly. A certain optimization would be to convert the frequency strings once and for all in the table itself. The steps would be:
(1) add numeric frequncy columns to your tables (2) populate them with the values converted from the current text columns (3) convert new values directly into the new columns, by inputting them with a locale which has the desired decimal separator.

TSQL -- Where Statements on Multiple columns in Update

My basic question has to do with updating multiple columns at once from specified values in my query. The reason I want to do this is that I am updating my values from a ginormous table so I only want to query it once in order to reduce run time. Here is an example of an example select statement that returns the value I want for just one of the columns I need to update:
select a.Value
from Table1
left outer join
(
select ID, FilterCol1, FilterCol2, Value
from Table2
) a on a.ID = Table1.ID
where {Condition1a on FilterCol1}
and {Condition2a on FilterCol2}
In order to update multiple columns at once I would like to be able do something like this (but it returns NULL):
Update T1
set T1Value1 = (select a.Value where {Condition1a on FilterCol1}
and {Condition2a on FilterCol2)
,T1Value2 = (select a.Value where {Condition1b on FilterCol1}
and {Condition2b on FilterCol2})
from Table1 T1
left outer join
(
select ID, FilterCol1, FilterCol2, Value
from Table2
) a on a.ID = Table1.ID
Any help figuring this out would be greatly appreciated, let me know if you have any questions or if I made any errors. Thanks!
EDIT: I think I have identified the problem, but I'm not sure of a solution yet. I think seeing the issue requires a little more context: The select from table 2 is actually an unpivot on a wide table. This means that when the left outer join is applied, there will be multiple rows for a given ID. What the case statement that Earl suggested seems to be doing (and I assume this is happening with the where clause as well) is comparing my Conditions to only the first row of the columns from a. Since my conditions are meant to help determine which of the rows from a is chosen, they will always evaluate false for the first row (I know this just from what I know about the data), hence my perpetual NULL values. Does anyone know of a workaround to look at the other rows in a?
UPDATE T1
SET T1Value1 = CASE WHEN (FilterCol1 = Condition1a AND FilterCol2 = Condition2a) THEN a.Value END,
T1Value2 = CASE WHEN (FilterCol1 = Condition1b AND FilterCol2 = Condition2b) THEN a.Value END
FROM Table1 T1
left outer join
(
select ID, FilterCol1, FilterCol2, Value
) a on a.ID = Table1.ID

TSQL, join to multiple fields of which one could be NULL

I have a simple query:
SELECT * FROM Products p
LEFT JOIN SomeTable st ON st.SomeId = p.SomeId AND st.SomeOtherId = p.SomeOtherId
So far so good.
But the first join to SomeId can be NULL, In that case the check should be IS NULL, and that's where the join fails. I tried to use a CASE, but can't get that to work also.
Am I missing something simple here?
From Undocumented Query Plans: Equality Comparisons.
SELECT *
FROM Products p
LEFT JOIN SomeTable st
ON st.SomeOtherId = p.SomeOtherId
AND EXISTS (SELECT st.SomeId INTERSECT SELECT p.SomeId)