Left Join with multiple criteria which are partly empty or null

I have a table with tracking data. Among other values, the table has the columns trafficSource_medium, trafficSource_source and trafficSource_campaign. These columns sometimes contain (none) or null as a value.
I would like to match the sum of visitors from another table using a left join with medium, source and campaign as matching criteria.
This works fine if all columns contain data. It does not work if one column has (none) or null as a value.
I use BigQuery and legacy SQL.
SELECT
A.id,
A.trafficSource_medium,
A.trafficSource_source,
A.trafficSource_campaign,
B.sum_visitor AS sum_visitor
FROM [table] AS A
left outer join (Select
count(distinct fullvisitorID) as sum_visitor,
trafficSource_medium,
trafficSource_source,
trafficSource_campaign
FROM [table2]
GROUP BY trafficSource_medium,
trafficSource_source,
trafficSource_campaign)
AS B
on A.trafficSource_medium=B.trafficSource_medium AND
A.trafficSource_source=B.trafficSource_source AND
A.trafficSource_campaign=B.trafficSource_campaign
Thanks for your help!

Try something like the query below.
It assumes the respective fields are of STRING type. If they are INT, replace 'n/a' with, say, -999. It is important to choose a constant that does not occur as a real value in the respective field.
#legacySQL
SELECT
A.id,
CASE WHEN A.trafficSource_medium = 'n/a' THEN NULL ELSE A.trafficSource_medium END AS trafficSource_medium,
CASE WHEN A.trafficSource_source = 'n/a' THEN NULL ELSE A.trafficSource_source END AS trafficSource_source,
CASE WHEN A.trafficSource_campaign = 'n/a' THEN NULL ELSE A.trafficSource_campaign END AS trafficSource_campaign,
B.sum_visitor AS sum_visitor
FROM (
SELECT
id,
IFNULL(trafficSource_medium, 'n/a') AS trafficSource_medium,
IFNULL(trafficSource_source, 'n/a') AS trafficSource_source,
IFNULL(trafficSource_campaign, 'n/a') AS trafficSource_campaign
FROM [table]
) AS A
LEFT OUTER JOIN (
SELECT
COUNT(DISTINCT fullvisitorID) AS sum_visitor,
IFNULL(trafficSource_medium, 'n/a') AS trafficSource_medium,
IFNULL(trafficSource_source, 'n/a') AS trafficSource_source,
IFNULL(trafficSource_campaign, 'n/a') AS trafficSource_campaign
FROM [table2]
GROUP BY
trafficSource_medium,
trafficSource_source,
trafficSource_campaign
) AS B
ON A.trafficSource_medium = B.trafficSource_medium
AND A.trafficSource_source = B.trafficSource_source
AND A.trafficSource_campaign = B.trafficSource_campaign
The idea here is to "transform" NULLs into some value so that they are JOIN'able, and then "transform" them back to NULL in the final SELECT.
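The effect of the sentinel can be reproduced on a small scale. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for BigQuery, and the tables a and b with their rows are made up for illustration. NULL comparison in a join condition behaves the same way in both systems, which is all the sketch relies on.

```python
import sqlite3

# Hypothetical miniature tables; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, medium TEXT, source TEXT);
    CREATE TABLE b (medium TEXT, source TEXT, sum_visitor INTEGER);
    INSERT INTO a VALUES (1, 'cpc', 'google'), (2, NULL, 'direct');
    INSERT INTO b VALUES ('cpc', 'google', 10), (NULL, 'direct', 7);
""")

# Plain equality: NULL = NULL is not true, so row 2 finds no partner.
plain = conn.execute("""
    SELECT a.id, b.sum_visitor
    FROM a LEFT JOIN b
      ON a.medium = b.medium AND a.source = b.source
    ORDER BY a.id
""").fetchall()

# Sentinel on both sides: NULL becomes 'n/a', which compares equal.
sentinel = conn.execute("""
    SELECT a.id, b.sum_visitor
    FROM a LEFT JOIN b
      ON IFNULL(a.medium, 'n/a') = IFNULL(b.medium, 'n/a')
     AND IFNULL(a.source, 'n/a') = IFNULL(b.source, 'n/a')
    ORDER BY a.id
""").fetchall()

print(plain)     # [(1, 10), (2, None)]
print(sentinel)  # [(1, 10), (2, 7)]
```

The sentinel must not occur as a real value in the column, otherwise genuine 'n/a' rows and NULL rows would be joined to each other.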
If you can migrate to Standard SQL, you can try the query below instead. It needs fewer changes, mostly in the ON clause.
#standardSQL
SELECT
A.id,
A.trafficSource_medium,
A.trafficSource_source,
A.trafficSource_campaign,
B.sum_visitor AS sum_visitor
FROM `table` AS A
LEFT OUTER JOIN (
SELECT
COUNT(DISTINCT fullvisitorID) AS sum_visitor,
trafficSource_medium,
trafficSource_source,
trafficSource_campaign
FROM `table2`
GROUP BY
trafficSource_medium,
trafficSource_source,
trafficSource_campaign
) AS B
ON IFNULL(A.trafficSource_medium, 'n/a') = IFNULL(B.trafficSource_medium, 'n/a')
AND IFNULL(A.trafficSource_source, 'n/a') = IFNULL(B.trafficSource_source, 'n/a')
AND IFNULL(A.trafficSource_campaign, 'n/a') = IFNULL(B.trafficSource_campaign, 'n/a')

Strange Behaviour on Postgresql query

We created a view in Postgres and I am getting strange results.
View Name: event_puchase_product_overview
When I select all columns with *, I get the correct result, but when I select specific columns, I get wrong values.
I hope the screens attached here can explain the problem well.
select *
from event_puchase_product_overview
where id = 15065;
select id, departure_id
from event_puchase_product_overview
where id = 15065;
VIEW definition:
CREATE OR REPLACE VIEW public.event_puchase_product_overview AS
SELECT row_number() OVER () AS id,
e.id AS departure_id,
e.type AS event_type,
e.name,
p.id AS product_id,
pc.name AS product_type,
product_date.attribute AS option,
p.upcomming_date AS supply_date,
pr.date_end AS bid_deadline,
CASE
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_hotel'::text) tt)) THEN e.maximum_rooms
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_flight'::text) tt)) THEN e.maximum_seats
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_bike'::text) tt)) THEN e.maximum_bikes
ELSE e.maximum_seats
END AS departure_qty,
CASE
WHEN now()::date > pr.date_end AND po.state::text = 'draft'::text THEN true
ELSE false
END AS is_deadline,
pl.product_qty::integer AS purchased_qty,
pl.comments,
pl.price_unit AS unit_price,
rp.id AS supplier,
po.id AS po_ref,
po.state AS po_state,
po.date_order AS po_date,
po.user_id AS operator,
pl.po_state_line AS line_status
FROM event_event e
LEFT JOIN product_product p ON p.related_departure = e.id
LEFT JOIN product_template pt ON pt.id = p.product_tmpl_id
LEFT JOIN product_category pc ON pc.id = pt.categ_id
LEFT JOIN purchase_order_line pl ON pl.product_id = p.id
LEFT JOIN purchase_order po ON po.id = pl.order_id
LEFT JOIN purchase_order_purchase_requisition_rel prr ON prr.purchase_order_id = po.id
LEFT JOIN purchase_requisition pr ON pr.id = prr.purchase_requisition_id
LEFT JOIN res_partner rp ON rp.id = po.partner_id
LEFT JOIN ( SELECT p_1.id AS product_id,
pav.name AS attribute
FROM product_product p_1
LEFT JOIN product_attribute_value_product_product_rel pa ON pa.prod_id = p_1.id
LEFT JOIN product_attribute_value pav ON pav.id = pa.att_id
LEFT JOIN product_attribute pat ON pat.id = pav.attribute_id
WHERE pat.name::text <> ALL (ARRAY['Date'::character varying, 'Departure'::character varying]::text[])) product_date ON product_date.product_id = p.id
WHERE (p.id IN ( SELECT DISTINCT mrp_bom_line.product_id
FROM mrp_bom_line)) AND p.active
ORDER BY e.id, pt.categ_id, p.id;
If I add a new event_event or a new product_product row, the row_number values in the view get reassigned, so the ID column of the view is not stable.
At the very least, you can't use row_number as the ID of the view.
If you insist on using row_number, you can order it by a creation date. That way all new records appear as the last rows in the view, and the correspondence between ID (row_number) and the other columns does not change.
Hope that helps!
Very likely the execution plan of your query depends on the columns you select. Compare the execution plans!
Your id is generated using the row_number window function. Now window functions are executed before the ORDER BY clause, so the order will depend on the execution plan and hence on the columns you select.
Using row_number without an explicit ordering doesn't make any sense.
To fix that, don't use
row_number() OVER ()
but
row_number() OVER (ORDER BY e.id, pt.categ_id, p.id)
so that you have a reliable ordering.
In addition, you should omit the ORDER BY clause at the end.
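A small sketch of the fix, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (it assumes an SQLite build with window functions, 3.25 or newer). The table event, the view overview and the column row_id are hypothetical names chosen for the example. With an explicit ORDER BY inside OVER, the generated id points at the same row no matter which columns are selected.

```python
import sqlite3

# Hypothetical miniature schema; SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER, name TEXT);
    INSERT INTO event VALUES (30, 'c'), (10, 'a'), (20, 'b');

    -- The ordering lives inside the window, not in a trailing ORDER BY.
    CREATE VIEW overview AS
    SELECT row_number() OVER (ORDER BY id) AS row_id,
           id AS departure_id,
           name
    FROM event;
""")

all_cols = conn.execute(
    "SELECT * FROM overview WHERE row_id = 2"
).fetchone()
two_cols = conn.execute(
    "SELECT row_id, departure_id FROM overview WHERE row_id = 2"
).fetchone()

print(all_cols)  # (2, 20, 'b')
print(two_cols)  # (2, 20)
```

Both queries resolve row_id 2 to departure_id 20 because the numbering is defined by the data, not by the execution plan.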

T-SQL check to see if date in one table is between two dates in another table then set value

I have the two tables shown below. I want to create a new variable (VALUE) based on the logic below and show the results in a third table. How can I do this in T-SQL?
TABLE_1
ID, DATE
TABLE_2
ID, DATE1, DATE2
Logic to set VALUE:
FOR ALL TABLE_1.ID
IF TABLE_1.DATE IS BETWEEN TABLE_2.DATE1 AND TABLE_2.DATE2
THEN VALUE = 1
ELSE VALUE = 0
IF TABLE_1.ID NOT IN TABLE_2
THEN VALUE = NULL
If you want to see the results for all rows where table_1.id = table_2.id (and table_1 rows that do not have a match on id), then we can use a left join and a case expression:
select
t.id
, t.date
, IsBetween = case
when t.date between t2.Date1 and t2.Date2
then 1
when t2.id is null
then null
else 0
end
, t2.*
from table_1 as t
left join table_2 as t2
on t.id = t2.id
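To see the three outcomes of the case expression side by side, here is a rough sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows are invented, dates are stored as ISO strings so that BETWEEN compares them correctly, and the alias is written with AS because SQLite does not support the alias = expression form.

```python
import sqlite3

# Invented sample data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, date TEXT);
    CREATE TABLE table_2 (id INTEGER, date1 TEXT, date2 TEXT);
    INSERT INTO table_1 VALUES
        (1, '2017-01-15'), (2, '2017-03-01'), (3, '2017-02-01');
    INSERT INTO table_2 VALUES
        (1, '2017-01-01', '2017-01-31'),
        (2, '2017-01-01', '2017-01-31');
""")

rows = conn.execute("""
    SELECT t.id,
           CASE
               WHEN t.date BETWEEN t2.date1 AND t2.date2 THEN 1
               WHEN t2.id IS NULL THEN NULL   -- no row in table_2 at all
               ELSE 0
           END AS is_between
    FROM table_1 AS t
    LEFT JOIN table_2 AS t2 ON t.id = t2.id
    ORDER BY t.id
""").fetchall()

print(rows)  # [(1, 1), (2, 0), (3, None)]
```

Id 1 falls inside its range, id 2 has a range but falls outside it, and id 3 has no matching row in table_2.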
If you only want one row for each row in table_1, and want to know whether table_1.date is between the dates of any corresponding row in table_2, then we can use an outer apply to select top 1 and a case expression:
select
t.id
, t.date
, IsBetween = case
when t.date between x.Date1 and x.Date2
then 1
when x.id is null
then null
else 0
end
from table_1 t
outer apply (
select top 1 t2.*
from table_2 t2
where t2.id = t.id
order by case
when t.date between t2.Date1 and t2.Date2
then 0
else 1
end
) as x

Avoiding Order By in T-SQL

The sample query below is part of my main query. I found that the SORT operator in it accounts for 30% of the cost.
Avoiding the SORT would require creating indexes. Is there any other way to optimize this code?
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA
WHERE ID = r.ID
AND Status = 3
AND TableA_ID >ISNULL((
SELECT TOP 1 TableA_ID
FROM TableA
WHERE ID = r.ID
AND Status <> 3
ORDER BY T_Date DESC
), 0)
ORDER BY T_Date ASC
It looks like you can use not exists rather than the sorts. I think you'll probably get a bigger performance boost by using a CTE or derived table instead of a scalar subquery.
select *
from r ... left outer join
(
select ID, min(t_date) as min_date from TableA t1
where status = 3 and not exists (
select 1 from TableA t2
where t2.ID = t1.ID
and t2.status <> 3 and t2.t_date > t1.t_date
)
group by ID
) as md on md.ID = r.ID ...
or
select *
from r ... left outer join
(
select t1.ID, min(t1.t_date) as min_date
from TableA t1 left outer join TableA t2
on t2.ID = t1.ID and t2.status <> 3 and t2.t_date > t1.t_date
where t1.status = 3 and t2.ID is null
group by t1.ID
) as md on md.ID = r.ID ...
It also appears that you're relying on an identity column but it's not clear what those values mean. I'm basically ignoring it and using the date column instead.
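A quick way to check the not exists form on sample data, sketched with Python's built-in sqlite3 module as a stand-in for SQL Server. The rows are invented: ID 1 has a status 3 row after its last other-status row, ID 2 has only status 3 rows, and ID 3 has an other-status row after its only status 3 row.

```python
import sqlite3

# Invented sample data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (TableA_ID INTEGER, ID INTEGER,
                         Status INTEGER, T_Date TEXT);
    INSERT INTO TableA VALUES
        (1, 1, 3, '2020-01-01'),
        (2, 1, 1, '2020-01-05'),
        (3, 1, 3, '2020-01-10'),
        (4, 1, 3, '2020-01-20'),
        (5, 2, 3, '2020-02-01'),
        (6, 2, 3, '2020-02-03'),
        (7, 3, 3, '2020-03-01'),
        (8, 3, 2, '2020-03-05');
""")

# Earliest status-3 date per ID that has no later non-3 row.
rows = conn.execute("""
    SELECT t1.ID, MIN(t1.T_Date) AS min_date
    FROM TableA AS t1
    WHERE t1.Status = 3
      AND NOT EXISTS (
          SELECT 1 FROM TableA AS t2
          WHERE t2.ID = t1.ID
            AND t2.Status <> 3
            AND t2.T_Date > t1.T_Date
      )
    GROUP BY t1.ID
    ORDER BY t1.ID
""").fetchall()

print(rows)  # [(1, '2020-01-10'), (2, '2020-02-01')]
```

ID 3 drops out of the result, which is why the surrounding query joins this derived table with a left outer join.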
Try this:
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA a1
LEFT JOIN (
SELECT ID, MAX(TableA_ID) AS MaxAID
FROM TableA
WHERE Status <> 3
GROUP BY ID
) a2 ON a2.ID = a1.ID
WHERE a1.ID = r.ID AND a1.Status = 3
AND a1.TableA_ID > COALESCE(a2.MaxAID, 0)
ORDER BY T_Date ASC
The use of TOP 1 in combination with the unexplained r alias concerns me. There's almost certainly a MUCH better way to get this data into your results that doesn't involve doing this in a subquery (unless this is for an APPLY operation).

Error "Invalid column name" in CTE

I'm having an issue using a column alias for a join in a CTE. I get "Invalid column name" on the line with RowNumber2 >= (t1.RowNumber - 20). Does anyone have a suggestion? Thanks.
DECLARE @latestDate Date = dbo.LatestDateWithPricingVolCountOver4k()
;WITH AllSymbsAndDates AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY Symbol ORDER BY TradingDate) AS RowNumber,
Symbol, TradingDate
FROM tblSymbolsMain
CROSS JOIN tblTradingDays
WHERE TradingDate <= @latestDate
),
SymbsDatesGrouped AS
(
SELECT * FROM
(
SELECT
t1.Symbol, t1.TradingDate, t2.TradingDate AS TradingDate2, t1.RowNumber,
t2.RowNumber AS RowNumber2
FROM AllSymbsAndDates t1
JOIN AllSymbsAndDates t2 ON t1.Symbol = t2.Symbol
AND RowNumber2 >= (t1.RowNumber - 20)
) t
)
SELECT
Symbol, TradingDate, TradingDate2, RowNumber, RowNumber2
FROM
SymbsDatesGrouped
ORDER BY
Symbol, TradingDate, TradingDate2
You can't reference a column alias in the WHERE or JOIN clauses. The only places where you can reference an alias from the SELECT list are the ORDER BY clause and an outer scope (e.g. when selecting from a subquery or CTE).
In this case, the solution is pretty trivial. Why not just say:
AND t2.RowNumber >= (t1.RowNumber - 20)
?

Get current record for a subquery in T-SQL

I'm trying to select all records from a table "Table1", but I want a new column called "HasException" that contains a "0" or a "1". "HasException" must be "0" if the count of rows in "Table2" matching the current Id is equal to 0; otherwise it must be "1".
Here's what I've done so far, but it doesn't work:
SELECT *,
CONVERT(bit, (CASE WHEN (SELECT count(Id) FROM Table2 WHERE Table1.Id=Table2.Id) = 0 THEN 0 ELSE 1 END)) AS HasException
FROM Table1
You want to join the tables (and group on ID) before you can compare the two values like this:
-- only grouped columns may be selected; add further Table_1 columns to both SELECT and GROUP BY
SELECT dbo.Table_1.ID,
CASE WHEN COUNT(dbo.Table_2.ID) = 0 THEN
0
ELSE
1
END
AS HasException
FROM dbo.Table_1 LEFT OUTER JOIN
dbo.Table_2 ON dbo.Table_1.ID = dbo.Table_2.ID
GROUP BY dbo.Table_1.ID
Perhaps something like this, assuming you meant Table2?
SELECT Table1.Id,
CAST(CASE WHEN COUNT(table2.id) = 0 THEN 0 ELSE 1 END AS bit) AS HasException
FROM
Table1
LEFT JOIN
Table2 ON Table1.Id=Table2.Id
GROUP BY
Table1.id
select
T1.*,
case when T2.Id is null then 0 else 1 end as HasException
from Table1 as T1
left outer join
(
select distinct Id
from Table2
) as T2
on T1.Id = T2.Id
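The join against a distinct list of Ids can be checked on sample data. This is a sketch only, using Python's built-in sqlite3 module as a stand-in for SQL Server, with invented rows. For comparison it also runs an EXISTS variant, which is not taken from the answers above but is a common alternative way to express the same flag.

```python
import sqlite3

# Invented sample data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER, Name TEXT);
    CREATE TABLE Table2 (Id INTEGER);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Table2 VALUES (1), (1), (3);  -- Id 1 appears twice
""")

# DISTINCT keeps the join from duplicating Table1 rows.
rows_join = conn.execute("""
    SELECT T1.Id, T1.Name,
           CASE WHEN T2.Id IS NULL THEN 0 ELSE 1 END AS HasException
    FROM Table1 AS T1
    LEFT JOIN (SELECT DISTINCT Id FROM Table2) AS T2
           ON T1.Id = T2.Id
    ORDER BY T1.Id
""").fetchall()

# Alternative (an assumption, not from the thread): EXISTS per row.
rows_exists = conn.execute("""
    SELECT T1.Id, T1.Name,
           CASE WHEN EXISTS (SELECT 1 FROM Table2 AS T2
                             WHERE T2.Id = T1.Id)
                THEN 1 ELSE 0 END AS HasException
    FROM Table1 AS T1
    ORDER BY T1.Id
""").fetchall()

print(rows_join)    # [(1, 'a', 1), (2, 'b', 0), (3, 'c', 1)]
print(rows_exists)  # [(1, 'a', 1), (2, 'b', 0), (3, 'c', 1)]
```

Neither form needs a GROUP BY, so all columns of Table1 can be selected freely.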