Postgresql :joining on fields using CASE expression - postgresql

I am trying to join two tables on two fields with a below condition
If condition 1 is satisfied then join ON a.field_1 = b.field_1
If condition 2 is satisfied then join ON a.field_2 = b.field_2
In order to do so, I am writing the below query
SELECT
a.field_1,a.field_2,
b.field_1,b.field_2 FROM table a
INNER JOIN table b
CASE WHEN COALESCE(TRIM(a.field_1),'') = '' THEN a.field_1 = b.field_1
ELSE a.field_2 = b.field_2 END
I am not sure whether this would run.

From the manual:
T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 ON boolean_expression
A JOIN accepts an arbitrary "boolean expression", which may reference columns from both joined relations. While the trivial form is t1.col = t2.col, the boolean expression is not limited to it and so comparing two sets of columns based on some other column is totally fine.
Please note your syntax error #sticky bit pointed out.

Related

how can i combine tables on condition base in postgres functions

SELECT status
from orders (CASE WHEN true THEN 'INNER JOIN runningmenus ON orders.runningmenu_id = runningmenus.id' ELSE '' END);
I am getting error near 'CASE', How can i combine table base on condition?
You want to use a join only if a condition is true.
The trick is to always do the join but to use the parameterized condition in the join condition. Fields from both tables will always be included in the output (which is a good thing, the output is clearly defined) but the data may be set or null.
SELECT *
from orders
INNER JOIN runningmenus
ON my_parameter_condition = true AND orders.runningmenu_id = runningmenus.id
Edit : Following #404 comment, the behavior is a bit more complex. You would have to make a left join and keep rows when the parameter condition is false (== no join) or when the joined table has a matching rown (== an inner join):
SELECT *
from orders
LEFT JOIN runningmenus
ON my_parameter_condition = true AND orders.runningmenu_id = runningmenus.id
WHERE (my_parameter_condition = false OR runningmenus.id IS NOT NULL)

convert SQL join query to pyspark syntax

I'm working to convert a known working SQL query to work in pyspark, given two dataframes, using methods such as: .join, .where, filter, etc.
Here are examples of SQL queries that work (only selecting r.id where I will normally select more columns):
# "invalid" records, where there is a matching `record_id` for rv_df
SELECT DISTINCT(r.id) FROM core_record AS r LEFT OUTER JOIN core_recordvalidation rv ON r.id = rv.record_id WHERE r.job_id = 41 AND rv.record_id is not null;
# "valid" records, where there is no matching `record_id` for rv_df
SELECT DISTINCT(r.id) FROM core_record AS r LEFT OUTER JOIN core_recordvalidation rv ON r.id = rv.record_id WHERE r.job_id = 41 AND rv.record_id is not null;
I'm 80/20 close, but having trouble wrapping my head around the the last few steps, and/or how to do this most efficiently.
I've got a Dataframe r_df with column id that I'd like to join with Dataframe rv_df on column record_id. As output, I'd like only distinct r.id, and only columns from r_df, none from rv_df. Finally, I'd like two different calls where there is a match (what will be "invalid" records for me), and where there is not a match (what I consider "valid" records).
I have pyspark queries that get close, but not terribly clear on how to ensure that r_df.id is distinct, and select only columns from r_df, none from rv_df.
Any help would be much appreciated!
Just had to walk away for a couple hours. Found a solution that works for my use case.
First, selecting only distinct record_id from rv_df:
rv_df = rv_df.select('record_id').distinct()
Then use that for intersection and disjoints:
# Intersection:
j_df = r_df.join(rv_df, r_df.id == rv_df.record_id, 'leftsemi').select(r_df['*'])
# Disjoint:
j_df = r_df.join(rv_df, r_df.id == rv_df.record_id, 'leftanti').select(r_df['*'])

T-SQL: Joining on two separate fields based on specific criteria in a query

I have a query in which I am trying to get additional fields from another table through a join field that I manually create. The issue is when the field I create is null, then I want to use another field to join on. I am not sure how to do that without getting duplicate results. I tried a UNION query, but that just displays everything where the values are null when the manually created field value is null. Here is the query:
SELECT
BU = m.BU,
BUFBA = m.BUFBA,
a.CostCenter,
Delegate = m.Delegate,
a.DistrictLookup,
PCOwner = m.PCOwner,
a.PGr,
a.POrg,
PrimaryContact = m.PrimaryContact,
WarehouseManager = m.WarehouseManager,
Zone = m.Zone,
ZoneFBA = m.ZoneFBA
FROM
(SELECT
e.CostCenter,
e.District,
DistrictLookup =
CASE
WHEN e.PGr IN ('N01','BQE','BQA') THEN 'GSS'
WHEN e.PGr = 'BQB' THEN 'BG'
WHEN e.PGr = 'BQF' THEN 'FP'
ELSE e.District
END,
e.PGr,
e.POrg
FROM dbo.E1P e (NOLOCK)
WHERE
e.CoCd = '4433'
) a
LEFT JOIN dbo.Mapping m (NOLOCK) ON m.District = a.DistrictLookup
When the DistrictLookup field is NULL, I need a different join to occur so that the additional fields populate. That join would be:
LEFT JOIN dbo.Mapping m (NOLOCK) ON m.CostCenter = a.CostCenter
How can I write in this second join and not get duplicate results? This is a separate join on different fields and I think it differs from the other methods of doing a conditional join. If it, can someone please explain how to implement that logic into my query?
I believe this is what you are after...
LEFT JOIN dbo.Mapping m (NOLOCK)
ON (a.DistrictLookup IS NOT NULL AND m.District = a.DistrictLookup)
OR (a.DistrictLookup IS NULL AND m.CostCenter = a.CostCenter)

What does a column assignment using an aggregate in the columns area of a select do?

I'm trying to decipher another programmer's code who is long-gone, and I came across a select statement in a stored procedure that looks like this (simplified) example:
SELECT #Table2.Col1, Table2.Col2, Table2.Col3, MysteryColumn = CASE WHEN y.Col3 IS NOT NULL THEN #Table2.MysteryColumn - y.Col3 ELSE #Table2.MysteryColumn END
INTO #Table1
FROM #Table2
LEFT OUTER JOIN (
SELECT Table3.Col1, Table3.Col2, Col3 = SUM(#Table3.Col3)
FROM Table3
INNER JOIN #Table4 ON Table4.Col1 = Table3.Col1 AND Table4.Col2 = Table3.Col2
GROUP BY Table3.Col1, Table3.Col2
) AS y ON #Table2.Col1 = y.Col1 AND #Table2.Col2 = y.Col2
WHERE #Table2.Col2 < #EnteredValue
My question, what does the fourth column of the primary selection do? does it produce a boolean value checking to see if the values are equal? or does it set the #Table2.MysteryColumn equal to some value and then inserts it into #Table1? Or does it just update the #Table2.MysteryColumn and not output a value into #Table1?
This same thing seems to happen inside of the sub-query on the third column, and I am equally at a loss as to what that does as well.
MysteryColumn = gives the expression a name also called a column alias. The fact that a column in the table#2 also has the same name is besides the point.
Since it uses INTO syntax it also gives the column its name in the resulting temporary table. See the SELECT CLAUSE and note | column_alias = expression and the INTO CLAUSE

Difference in query joins

I have a query which is dynamically generated.
SELECT '' + CAST(GalleryGallery_tGallery._Name AS VARCHAR(4000)) + '' AS NewName
FROM Photographers_tGalleries
LEFT OUTER JOIN Gallery_tGallery AS GalleryGallery_tGallery
ON BaseContent_tGalleries.[Gallery] = GalleryGallery_tGallery._Guid
LEFT OUTER JOIN BaseContent_tGalleries
ON Photographers_tGalleries._Guid =
BaseContent_tGalleries._Guid_Structure_Content
The joins appear correct to me. However, the query errors with The multi-part identifier "BaseContent_tGalleries.Gallery" could not be bound.
The following query does work. While the joins are matching the correct fields, they are in a different order. I am wondering why this one works and the other does not. We would like to fix the top one but since it is dynamic, I am looking for the least amount of change.
SELECT '' + CAST(GalleryGallery_tGallery._Name AS VARCHAR(4000)) + '' AS NewName
FROM Gallery_tGallery AS GalleryGallery_tGallery
LEFT OUTER JOIN BaseContent_tGalleries
ON GalleryGallery_tGallery._Guid = BaseContent_tGalleries.Gallery
LEFT OUTER JOIN Photographers_tGalleries
ON BaseContent_tGalleries._Guid_Structure_Content =
Photographers_tGalleries._Guid
Your join ordering for the first query is wrong. You need to reference BaseContent_tGalleries before Gallery_tGallery.
SELECT '' + CAST(g._Name AS VARCHAR(4000)) + '' AS NewName
FROM Photographers_tGalleries AS g
LEFT OUTER JOIN BaseContent_tGalleries AS b
ON g._Guid = b._Guid_Structure_Content
LEFT OUTER JOIN Gallery_tGallery AS gg
ON b.[Gallery] = gg._Guid;
Who named your tables and aliases by the way? GalleryGallery_tGallery, really? I've converted to shorter aliases to compensate for whoever really likes typing. A LOT.
The first query doesn't work since you're trying to use the table BaseContent_tGalleries in an ON statement, but it has not been joined yet. In other words, you're using a table as a join condition, but the table itself hasn't been joined yet.