Avoid duplication in SQL Server - select

I got the below result when i run this query.
SELECT DISTINCT PT.F_PRO AS F_PRODUCT, PT.F_TEXT_CODE AS F_TEXT_CODE, PHT.F_PHRASE AS F_PHRASE FROM T_PROD_TEXT PT
LEFT JOIN T_P_LINKAGE PHL
ON PT.F_TEXT_CODE = PHL.F_TEXT_CODE
INNER JOIN T_P_TRANSLATIONS PHT
ON PHL.F_PHRASE_ID = PHT.F_PHRASE_ID
WHERE PT.F_DATA_CODE = 'MANU' AND PHT.F_LANGUAGE = 'EN'
OUTPUT
F_PRODUCT F_TEXT_CODE F_PHRASE
294264_B MANU0008 Alcoa, Inc
294264_B MANU0012 BioSensory
00091A MANU0006 3M Company
00094A MANU0006 4M Company
00094A MANU0006 5M Company
The above query returns duplication in F_PRODUCT COLUMN.i want to display F_product without duplication. only one record should display for each F_product.(First record) without using top command
Required Output
F_PRODUCT F_TEXT_CODE F_PHRASE
294264_B MANU0008 Alcoa, Inc.
00091A MANU0006 3M Company|par

You can use row_number() to assign a number to each row within a group of f_pro. Then retrieve only rows that are number 1. You can change the order by if something else determines the order.
SELECT *
FROM
(SELECT PT.F_PRO AS F_PRODUCT, PT.F_TEXT_CODE AS F_TEXT_CODE, PHT.F_PHRASE AS F_PHRASE, ROW_NUMBER() OVER (PARTITION BY PT.F_PRO ORDER BY PHT.F_PHRASE ASC) AS RowNum
FROM T_PROD_TEXT PT
LEFT JOIN T_P_LINKAGE PHL
ON PT.F_TEXT_CODE = PHL.F_TEXT_CODE
INNER JOIN T_P_TRANSLATIONS PHT
ON PHL.F_PHRASE_ID = PHT.F_PHRASE_ID
WHERE PT.F_DATA_CODE = 'MANU' AND PHT.F_LANGUAGE = 'EN') dt
WHERE RowNum = 1

SELECT PT.F_PRO AS F_PRODUCT,
MIN(PT.F_TEXT_CODE) AS F_TEXT_CODE,
MIN(PHT.F_PHRASE) AS F_PHRASE FROM T_PROD_TEXT PT
LEFT JOIN T_P_LINKAGE PHL
ON PT.F_TEXT_CODE = PHL.F_TEXT_CODE
INNER JOIN T_P_TRANSLATIONS PHT
ON PHL.F_PHRASE_ID = PHT.F_PHRASE_ID
WHERE PT.F_DATA_CODE = 'MANU' AND PHT.F_LANGUAGE = 'EN'
group By PT.F_PRO;
is one way to do that. It doesn't do it for the "FIRST" since it is vague how would you define the "FIRST".

Related

How can I use value from main select in Left Outer Join select - T-SQL

In my below T-SQL Query I need to use EFP_MessageCenter.MessageSender from my main SELECT in the SELECT in my LEFT OUTER JOIN as the value where I have placed <MessageSenderInitials>.
When I set (EFP_MessageCenter_1.MessageSender = EFP_MessageCenter.MessageSender) or (EFP_MessageCenter_1.MessageSender = MessageSenderInitials) I get the error The multi-part identifier "EFP_MessageCenter.MessageSender" could not be bound.
How can I get this to work?
SELECT LOWER(EFP_MessageCenter.MessageSender) AS MessageSenderInitials
, MAX(SenderInfo.FullName) AS SenderFullName
, MAX(SenderInfo.ProfilePicture) AS SenderProfilePicture
, MAX(EFP_MessageCenter_Receiver.UserID) AS ReceiverID
, MAX(EFP_MessageCenter.MessageTimestamp) AS ChangeDate
, COUNT(DisplayCountSelect.Displayed) AS CountNonReadMessages
FROM EFP_MessageCenter_Receiver
INNER JOIN EFP_MessageCenter ON EFP_MessageCenter_Receiver.MessageID = EFP_MessageCenter.id
INNER JOIN EFP_EmploymentUser AS SenderInfo ON EFP_MessageCenter.MessageSender = SenderInfo.Initials
LEFT OUTER JOIN
(SELECT EFP_MessageCenter_Receiver_1.Displayed, EFP_MessageCenter_Receiver_1.UserID, EFP_MessageCenter_1.MessageSender
FROM EFP_MessageCenter AS EFP_MessageCenter_1
INNER JOIN EFP_MessageCenter_Receiver AS EFP_MessageCenter_Receiver_1 ON EFP_MessageCenter_1.id = EFP_MessageCenter_Receiver_1.MessageID
WHERE (EFP_MessageCenter_Receiver_1.Displayed = 0) AND (EFP_MessageCenter_Receiver_1.UserID = 65) AND (EFP_MessageCenter_1.MessageSender = '<MessageSenderInitials>'))
AS DisplayCountSelect
ON DisplayCountSelect.UserID = EFP_MessageCenter_Receiver.UserID
WHERE (EFP_MessageCenter_Receiver.UserID = 65) AND (EFP_MessageCenter.MessageType = 'SPECIFIC')
GROUP BY EFP_MessageCenter.MessageSender
ORDER BY ChangeDate DESC
I've made a slight refactor of your query and changed the outer join to an an outer apply
It's not going to be 100% working I'm sure but should allow you to tweak it and include the correlation you need to.
I suspect you could move the CountNonReadMessages to a count(*) in the apply and possibly remove the aggregation, but that's just a guess.
select Lower(mc.MessageSender) as MessageSenderInitials
, Max(s.FullName) as SenderFullName
, Max(s.ProfilePicture) as SenderProfilePicture
, Max(mr.UserID) as ReceiverID
, Max(mc.MessageTimestamp) as ChangeDate
, Count(s.Displayed) as CountNonReadMessages
from EFP_MessageCenter_Receiver mr
join EFP_MessageCenter mc on mr.MessageID = mc.id
join EFP_EmploymentUser eu on mc.MessageSender = eu.Initials
outer apply (
select mr.Displayed
from EFP_MessageCenter mcx
join EFP_MessageCenter_Receiver mrx on mcx.id = mrx.MessageID
where mrx.Displayed = 0
and mrx.UserId=mr.UserId
and mcx.UserID = 65 /* this should probably be correlated */
and mcx.MessageSender = '<MessageSenderInitials>'
) s
where mr.UserID = 65 and mc.MessageType = 'SPECIFIC'
group by mc.MessageSender
order by ChangeDate desc

Is it possible to do a "LIMIT 1" on a left join in Postgres?

I have two tables: one for money and attributes surrounding it (e.g. who earnt it) and a child table for the "ledger" - this contains one or more entries that represent the history of money that has moved.
SELECT SUM(pl.achieved)
FROM payout p
LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id
This query works well when there is only one ledger item, but when more are added the SUM will increase. I want to join only the latest row. So hypothetically:
SELECT SUM(pl.achieved)
FROM payout p
LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id ORDER BY pl.ts DESC LIMIT 1
WHERE ...
ORDER BY ...
LIMIT ...
(which sadly doesn't work)
What I have tried:
Using a subquery works, but is painfully slow given the size of the data set (and other omitted properties and where clauses etc.):
SELECT SUM(pl.achieved)
FROM payout p
LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id AND pl.id = (SELECT id FROM payout_ledgers WHERE payout_id = p.id ORDER BY ts DESC LIMIT 1)
Incidentally, I'm unsure why this subquery is so slow (~12 seconds, as opposed to 150ms with no subquery). I would have expected it to be quicker given that we're only selecting based on the foreign key (payout_id).
Another thing I tried was to do a select from the join - my logic being that if we select from small joined dataset instead of the whole table it would be quicker. However I was met with relation "pl" does not exist error:
SELECT SUM(pl.achieved)
FROM payouts p
LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id
WHERE pl.id = (SELECT id FROM pl ORDER BY ts DESC LIMIT 1)
Thank you in advance for any suggestions. I am also open to suggestions for schema changes that could make this type of logic easier, although my preference would be to try and get the query working since the schema is not easy to change on our production environment.
If you're on Postgres 9.4+, you can use a LEFT JOIN LATERAL (docs)
SELECT SUM(sub.achieved)
FROM payout p
LEFT JOIN LATERAL (SELECT achieved
FROM payout_ledgers pl
WHERE pl.payout_id = p.id
ORDER BY pl.ts DESC LIMIT 1) sub ON true
This will return the sum of the "achieved" field in the most recent entry in payout_ledgers for all payouts.
window functions:
-- using row_number()
SELECT SUM(sss.achieved)
FROM (SELECT pl.achieved
, row_number() OVER (PARTITION BY pl.payout_id, ORDER BY pl.ts DESC)
FROM payouts p
JOIN payout_ledgers pl ON pl.payout_id = p.id
) sss
WHERE sss.rn =1
;
-- using last_value()
SELECT SUM(sss.achieved)
FROM (SELECT
, last_value(achieved) OVER (PARTITION BY pl.payout_id, ORDER BY pl.ts ASC) AS achieved
FROM payouts p
JOIN payout_ledgers pl ON pl.payout_id = p.id
) sss
;
BTW: you do not need the LEFT JOIN (adding no value to the SUM does not change the sum)

Strange Behaviour on Postgresql query

We created a view in Postgres and I am getting strange result.
View Name: event_puchase_product_overview
When I try to get records with *, I get the correct result. but when I try to get specific fields, I get wrong values.
I hope the screens attached here can explain the problem well.
select *
from event_purchase_product_overview
where id = 15065;
select id, departure_id
from event_puchase_product_overview
where id = 15065;
VIEW definition:
CREATE OR REPLACE VIEW public.event_puchase_product_overview AS
SELECT row_number() OVER () AS id,
e.id AS departure_id,
e.type AS event_type,
e.name,
p.id AS product_id,
pc.name AS product_type,
product_date.attribute AS option,
p.upcomming_date AS supply_date,
pr.date_end AS bid_deadline,
CASE
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_hotel'::text) tt)) THEN e.maximum_rooms
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_flight'::text) tt)) THEN e.maximum_seats
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_bike'::text) tt)) THEN e.maximum_bikes
ELSE e.maximum_seats
END AS departure_qty,
CASE
WHEN now()::date > pr.date_end AND po.state::text = 'draft'::text THEN true
ELSE false
END AS is_deadline,
pl.product_qty::integer AS purchased_qty,
pl.comments,
pl.price_unit AS unit_price,
rp.id AS supplier,
po.id AS po_ref,
po.state AS po_state,
po.date_order AS po_date,
po.user_id AS operator,
pl.po_state_line AS line_status
FROM event_event e
LEFT JOIN product_product p ON p.related_departure = e.id
LEFT JOIN product_template pt ON pt.id = p.product_tmpl_id
LEFT JOIN product_category pc ON pc.id = pt.categ_id
LEFT JOIN purchase_order_line pl ON pl.product_id = p.id
LEFT JOIN purchase_order po ON po.id = pl.order_id
LEFT JOIN purchase_order_purchase_requisition_rel prr ON prr.purchase_order_id = po.id
LEFT JOIN purchase_requisition pr ON pr.id = prr.purchase_requisition_id
LEFT JOIN res_partner rp ON rp.id = po.partner_id
LEFT JOIN ( SELECT p_1.id AS product_id,
pav.name AS attribute
FROM product_product p_1
LEFT JOIN product_attribute_value_product_product_rel pa ON pa.prod_id = p_1.id
LEFT JOIN product_attribute_value pav ON pav.id = pa.att_id
LEFT JOIN product_attribute pat ON pat.id = pav.attribute_id
WHERE pat.name::text <> ALL (ARRAY['Date'::character varying, 'Departure'::character varying]::text[])) product_date ON product_date.product_id = p.id
WHERE (p.id IN ( SELECT DISTINCT mrp_bom_line.product_id
FROM mrp_bom_line)) AND p.active
ORDER BY e.id, pt.categ_id, p.id;
If I add new event_event or new product_product I'll get a new definition of row_number in my view, then the column ID of my view is not stable.
at least you can't use row_number as Id of the view,
If you insist to use row_number, you can use the Order By "creation DATE" by this way all new records will be as last lines in the view and this will not change the correspondency between ID (row_number) and other columns.
Hope that helps !
Very likely the execution plan of your query depends on the columns you select. Compare the execution plans!
Your id is generated using the row_number window function. Now window functions are executed before the ORDER BY clause, so the order will depend on the execution plan and hence on the columns you select.
Using row_number without an explicit ordering doesn't make any sense.
To fix that, don't use
row_number() OVER ()
but
row_number() OVER (ORDER BY e.id, pt.categ_id, p.id)
so that you have a reliable ordering.
In addition, you should omit the ORDER BY clause at the end.

How do I efficiently select the most recent record for a set of values TSQL?

I have a set of tables each containing related data and I need to select the most recent set of records for each row in the source table. There are millions of rows and I need to do this efficiently and so far im unable to return only the most recent date for a given number.
For example the current result for a given number is:
CampaignName MobileNumber Date
Campaign A 12345678910 12/02/2018 14:50:30
Campaign B 12345678910 05/02/2018 11:35:22
Only the row for Campaign A should be returned.
I'm essentially trying to get the most recent message sent for each mobile number and the campaign data for that message (each message is part of a campaign.
SELECT CC.campaignname,
Co.mobilenumber,
Max(M.msgcreatetime)
FROM [Database].[dbo].[messages] M WITH(nolock)
INNER JOIN dbo.messagecontact MC WITH(nolock)
ON M.msgid = MC.messageid
INNER JOIN dbo.campaigncontact Co WITH(nolock)
ON Co.contactid = MC.contactid
INNER JOIN dbo.campaign CC WITH(nolock)
ON M.campaignid = CC.campaignid
GROUP BY CC.campaignname,
Co.mobilenumber
Use top 1 with ties and order by row_number:
Using top 1 with ties means you will get all the records where the value of the order by expression is the lowest.
Using row_number() over(partition by Co.mobilenumber order by M.msgcreatetime desc) will return 1 for the last date for each Co.mobilenumber, 2 for the second from last etc'.
SELECT TOP 1 WITH TIES
CC.campaignname,
Co.mobilenumber,
M.msgcreatetime
FROM [Database].[dbo].[messages] M WITH(nolock)
INNER JOIN dbo.messagecontact MC WITH(nolock)
ON M.msgid = MC.messageid
INNER JOIN dbo.campaigncontact Co WITH(nolock)
ON Co.contactid = MC.contactid
INNER JOIN dbo.campaign CC WITH(nolock)
ON M.campaignid = CC.campaignid
ORDER BY ROW_NUMBER() OVER(PARTITION BY Co.mobilenumber ORDER BY M.msgcreatetime desc)

T-SQL Subquery on Latest Date with InnerJoin

I want to do a similiar thing like this guy:
T-SQL Subquery Max(Date) and Joins
I have to do this with an n:m relation.
So the layout is:
tbl_Opportunity
tbl_Opportunity_tbl_OpportunityData
tbl_OpportunityData
So as you see there is an intersection table which connects opportunity with opportunitydata.
For every opportunity there are multiple opportunity datas. In my view i only want a list with all opportunites and the data from the latest opportunity datas.
I tried something like this:
SELECT
dbo.tbl_Opportunity.Id, dbo.tbl_Opportunity.Subject,
dbo.tbl_User.UserName AS Responsible, dbo.tbl_Contact.Name AS Customer,
dbo.tbl_Opportunity.CreationDate, dbo.tbl_Opportunity.ActionDate AS [Planned Closure],
dbo.tbl_OpportunityData.Volume,
dbo.tbl_OpportunityData.ChangeDate, dbo.tbl_OpportunityData.Chance
FROM
dbo.tbl_Opportunity
INNER JOIN
dbo.tbl_User ON dbo.tbl_Opportunity.Creator = dbo.tbl_User.Id
INNER JOIN
dbo.tbl_Contact ON dbo.tbl_Opportunity.Customer = dbo.tbl_Contact.Id
INNER JOIN
dbo.tbl_Opprtnty_tbl_OpprtnityData ON dbo.tbl_Opportunity.Id = dbo.tbl_Opprtnty_tbl_OpprtnityData.Id
INNER JOIN
dbo.tbl_OpportunityData ON dbo.tbl_Opprtnty_tbl_OpprtnityData.Id2 = dbo.tbl_OpportunityData.Id
The problem is my view now includes a row for every opportunity data, since I don't know how to filter that I only want the latest data.
Can you help me? is my problem description clear enough?
thank you in advance :-)
best wishes,
laurin
; WITH Base AS (
SELECT dbo.tbl_Opportunity.Id, dbo.tbl_Opportunity.Subject, dbo.tbl_User.UserName AS Responsible, dbo.tbl_Contact.Name AS Customer,
dbo.tbl_Opportunity.CreationDate, dbo.tbl_Opportunity.ActionDate AS [Planned Closure], dbo.tbl_OpportunityData.Volume,
dbo.tbl_OpportunityData.ChangeDate, dbo.tbl_OpportunityData.Chance
FROM dbo.tbl_Opportunity INNER JOIN
dbo.tbl_User ON dbo.tbl_Opportunity.Creator = dbo.tbl_User.Id INNER JOIN
dbo.tbl_Contact ON dbo.tbl_Opportunity.Customer = dbo.tbl_Contact.Id INNER JOIN
dbo.tbl_Opprtnty_tbl_OpprtnityData ON dbo.tbl_Opportunity.Id = dbo.tbl_Opprtnty_tbl_OpprtnityData.Id INNER JOIN
dbo.tbl_OpportunityData ON dbo.tbl_Opprtnty_tbl_OpprtnityData.Id2 = dbo.tbl_OpportunityData.Id
)
, OrderedByDate AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY ChangeDate DESC) RN FROM Base
)
SELECT * FROM OrderedByDate WHERE RN = 1
To make it more readable I'm using CTE (the WITH part). In the end the real "trick" is doing a ROW_NUMBER() partitioning the data by tbl_Opportunity.Id and ordering the partitions by ChangeDate DESC (and I call it RN). Clearly the maximum date in each partition will be RN = 1 and then we filter it by RN.
Without using CTE it will be something like this:
SELECT * FROM (
SELECT dbo.tbl_Opportunity.Id, dbo.tbl_Opportunity.Subject, dbo.tbl_User.UserName AS Responsible, dbo.tbl_Contact.Name AS Customer,
dbo.tbl_Opportunity.CreationDate, dbo.tbl_Opportunity.ActionDate AS [Planned Closure], dbo.tbl_OpportunityData.Volume,
dbo.tbl_OpportunityData.ChangeDate, dbo.tbl_OpportunityData.Chance,
ROW_NUMBER() OVER (PARTITION BY dbo.tbl_Opportunity.Id ORDER BY dbo.tbl_OpportunityData.ChangeDate DESC) RN
FROM dbo.tbl_Opportunity INNER JOIN
dbo.tbl_User ON dbo.tbl_Opportunity.Creator = dbo.tbl_User.Id INNER JOIN
dbo.tbl_Contact ON dbo.tbl_Opportunity.Customer = dbo.tbl_Contact.Id INNER JOIN
dbo.tbl_Opprtnty_tbl_OpprtnityData ON dbo.tbl_Opportunity.Id = dbo.tbl_Opprtnty_tbl_OpprtnityData.Id INNER JOIN
dbo.tbl_OpportunityData ON dbo.tbl_Opprtnty_tbl_OpprtnityData.Id2 = dbo.tbl_OpportunityData.Id
) AS Base WHERE RN = 1
The statement can be simplified for one more step further:
SELECT TOP 1 WITH TIES
dbo.tbl_Opportunity.Id, dbo.tbl_Opportunity.Subject, dbo.tbl_User.UserName AS Responsible,
dbo.tbl_Contact.Name AS Customer, dbo.tbl_Opportunity.CreationDate,
dbo.tbl_Opportunity.ActionDate AS [Planned Closure], dbo.tbl_OpportunityData.Volume,
dbo.tbl_OpportunityData.ChangeDate, dbo.tbl_OpportunityData.Chance
FROM
dbo.tbl_Opportunity INNER JOIN
dbo.tbl_User ON dbo.tbl_Opportunity.Creator = dbo.tbl_User.Id INNER JOIN
dbo.tbl_Contact ON dbo.tbl_Opportunity.Customer = dbo.tbl_Contact.Id INNER JOIN
dbo.tbl_Opprtnty_tbl_OpprtnityData ON dbo.tbl_Opportunity.Id = dbo.tbl_Opprtnty_tbl_OpprtnityData.Id INNER JOIN
dbo.tbl_OpportunityData ON dbo.tbl_Opprtnty_tbl_OpprtnityData.Id2 = dbo.tbl_OpportunityData.Id
ORDER BY
ROW_NUMBER() OVER (PARTITION BY dbo.tbl_Opportunity.Id ORDER BY dbo.tbl_OpportunityData.ChangeDate DESC);