DISTINCT isn't removing dupes - tsql

I am not sure how to use DISTINCT in an AB BA fashion. For instance, I have two columns BoughtLoyaltyProgramId, SoldLoyaltyProgramId. But even when I use DISTINCT, it produces a duplicate when the same code in a boughtloyaltyprogramid appears in soldloyaltyprogramid. I want no dupes but I have no idea how this works with multiple columns and pairings.
Here is the stored procedure:
ALTER PROC AA
    @LPPProgramID UNIQUEIDENTIFIER ,
    @DateFrom DATETIME ,
    @DateTo DATETIME
AS
    SELECT DISTINCT TOP ( 5 )
            BoughtLoyaltyProgramId ,
            SoldLoyaltyProgramId ,
            DateTransactionCleared ,
            ExchangeRate
    FROM    dbo.PEX_ClearedTransactions
    WHERE   DateTransactionCleared >= @DateFrom
            AND DateTransactionCleared < @DateTo
            AND ( BoughtLoyaltyProgramId = @LPPProgramID
                  OR SoldLoyaltyProgramId = @LPPProgramID
                )
    ORDER BY ExchangeRate;
GO

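DISTINCT works per row: it removes rows whose full combination of selected column values is identical. It does not compare the value in one column of a row with the other columns of that row, so a code that appears in BoughtLoyaltyProgramId on one row and in SoldLoyaltyProgramId on another is not treated as a duplicate. DateTransactionCleared and ExchangeRate are also in your select list, so two rows with the same pair of program IDs but a different date or rate are still distinct.
You will likely also need a comparison between the two columns in your WHERE clause.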
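One possible reading of the "AB BA" requirement is that the pair (A, B) should count as the same as (B, A). Under that assumption, a minimal sketch is to put the two IDs into a fixed order before applying DISTINCT. The table variable `@Pairs`, its columns, and the integer IDs are made up for illustration; the real columns are UNIQUEIDENTIFIERs, which T-SQL can also compare with `<=`.

```sql
-- Hypothetical sample data; integer IDs stand in for the UNIQUEIDENTIFIER columns.
DECLARE @Pairs TABLE (BoughtId INT, SoldId INT);
INSERT INTO @Pairs (BoughtId, SoldId)
VALUES (1, 2), (2, 1), (1, 3), (1, 2);

-- Put the smaller ID first so (1, 2) and (2, 1) become the same row,
-- then let DISTINCT remove the duplicates.
SELECT DISTINCT
       CASE WHEN BoughtId <= SoldId THEN BoughtId ELSE SoldId END AS IdA,
       CASE WHEN BoughtId <= SoldId THEN SoldId ELSE BoughtId END AS IdB
FROM @Pairs;
-- Returns two rows: (1, 2) and (1, 3).
```

Note that the result no longer says which side was bought and which was sold, so this only fits if the direction of the pair does not matter.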

Perhaps you want to use ROW_NUMBER:
WITH cte
     AS (SELECT boughtloyaltyprogramid,
                soldloyaltyprogramid,
                datetransactioncleared,
                exchangerate,
                RN = Row_number() OVER(
                         PARTITION BY boughtloyaltyprogramid, soldloyaltyprogramid
                         ORDER BY exchangerate)
         FROM   dbo.pex_clearedtransactions
         WHERE  datetransactioncleared >= @DateFrom
                AND datetransactioncleared < @DateTo
                AND ( boughtloyaltyprogramid = @LPPProgramID
                       OR soldloyaltyprogramid = @LPPProgramID ))
SELECT TOP(5) * FROM cte
WHERE  RN = 1
ORDER  BY exchangerate

Here is how you can get all distinct values from two columns:
SELECT DISTINCT *
FROM (SELECT BoughtLoyaltyProgramId
      FROM dbo.PEX_ClearedTransactions
      UNION ALL
      SELECT SoldLoyaltyProgramId
      FROM dbo.PEX_ClearedTransactions) AS A

Related

How to collapse overlapping date periods with acceptable gaps using T-SQL?

We want to group our members' enrollments into "continuous enrollments," allowing for a gap of up to 45 days. I know how to use LEAD to determine if an enrollment should be grouped with the next, but I don't know how to group them. Would it be more appropriate to add 45 to the term date and subtract 45 from the effective date, then check for overlapping date periods? My goal is to have a SQL view that returns the results similar to the final query below. Thank you for your help.
SELECT '101' AS MemID, '2021-01-01' AS EffDate, '2021-01-31' AS TermDate INTO #T1 UNION
SELECT '101', '2021-02-01', '2021-02-28' UNION
SELECT '101', '2021-03-01', '2021-03-31' UNION
SELECT '101', '2021-06-01', '2021-06-30' UNION
SELECT '999', '2021-01-01', '2021-01-15' UNION
SELECT '999', '2021-09-01', '2021-09-28' UNION
SELECT '999', '2021-10-01', '2021-10-31'
SELECT *
, LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS LeadEffDate
, DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate))) AS DaysToNextEnrollment
, CASE WHEN (DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate)))) <= 45 THEN 1 ELSE 0 END AS CombineWithNextRecord
FROM #T1
-- result objective
SELECT 101 AS MemID, '2021-01-01' AS EffDate, '2021-03-31' AS TermDate UNION
SELECT 101, '2021-06-01', '2021-06-30' UNION
SELECT 999, '2021-01-01', '2021-01-15' UNION
SELECT 999, '2021-09-01', '2021-10-31'

I think you are really close. Your question is very similar to "TSQL - creating from-to date table while ignoring in-between steps with conditions", with a difference in the logic for what you want to treat as the same group.
My basic approach is to use the LAG() function to get the previous row's MemID and TermDate, and combine that with your 45-day rule to decide where a new group starts. A running SUM of those group-start flags then numbers the groups, and finally the first EffDate and last TermDate of each group are returned.
Here is my response to that question, modified for your situation.
SELECT
a4.MemID
, CONVERT (DATE, a4.First_EffDate) AS [EffDate]
, CONVERT (DATE, a4.TermDate) AS [TermDate]
FROM (
SELECT
a3.MemID
, a3.EffDate
, a3.TermDate
, a3.MemID_group
, FIRST_VALUE (a3.EffDate) OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate) AS [First_EffDate]
, ROW_NUMBER () OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate DESC) AS [Row_number]
FROM (
SELECT
a2.MemID
, a2.EffDate
, a2.TermDate
, a2.Previous_MemID
, a2.Previous_TermDate
, a2.New_group
, SUM (a2.New_group) OVER (ORDER BY a2.MemID, a2.EffDate) AS [MemID_group]
FROM (
SELECT
a1.MemID
, a1.EffDate
, a1.TermDate
, a1.Previous_MemID
, a1.Previous_TermDate
---------------------------------------------------------------------------------
-- new group if the MemID is different from the previous row OR
-- if the MemID is the same as the previous row AND it has been more than 45 days
-- between the TermDate of the previous row and the EffDate of the current row
,
IIF((a1.MemID <> a1.Previous_MemID)
OR (
a1.MemID = a1.Previous_MemID
AND DATEDIFF (DAY, a1.Previous_TermDate, a1.EffDate) > 45
)
, 1
, 0) AS [New_group]
---------------------------------------------------------------------------------
FROM (
SELECT
MemID
, EffDate
, TermDate
, LAG (MemID) OVER (ORDER BY MemID, EffDate) AS [Previous_MemID]
, LAG (TermDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS [Previous_TermDate]
FROM #T1
) a1
) a2
) a3
) a4
WHERE a4.[Row_number] = 1;
Here is the dbfiddle.

SQL Server - Select with Group By together Raw_Number

I'm using SQL Server 2000 (80), so it's not possible to use the LAG function.
I have a data set with four columns:
Purchase_Date
Facility_no
Seller_id
Sale_id
I need to identify missing Sale_ids. Sale_ids are strictly sequential, so there should not be any gaps.
This code works for one specific date, facility and seller. But I need it to work on the entire data set, looping through every Facility_no and every Seller_id for every Purchase_Date.
declare @MAXCOUNT int
set @MAXCOUNT =
(
    select MAX(Sale_Id)
    from #table
    where
        Facility_no in (124) and
        Purchase_date = '2/7/2020'
        and Seller_id = 1
)
;WITH TRX_COUNT AS
(
    SELECT 1 AS Number
    union all
    select Number + 1 from TRX_COUNT
    where Number < @MAXCOUNT
)
select * from TRX_COUNT
where
    Number NOT IN
    (
        select Sale_Id
        from #table
        where
            Facility_no in (124)
            and Purchase_Date = '2/7/2020'
            and seller_id = 1
    )
order by Number
OPTION (maxrecursion 0)

This column:
case when
Sale_Id=0 or 1=Sale_Id-LAG(Sale_Id) over (partition by Facility_no, Purchase_Date, Seller_id order by Sale_Id)
then 'OK' else 'Previous Missing' end
will tell you which rows have a gap before them, and therefore which Seller_ids have a sale missing. If you want to go a step further and get exactly your desired output, filter for the 'Previous Missing' rows, take the distinct groups, and join with a tally table using NOT EXISTS.
Edit: OP mentions in comments they can't use LAG(). My suggestion, then, would be:
Make a temp table that has the max(sale_id) grouped by facility/seller_id.
Then you can get your missing results with this pseudocode query:
Select ...
from temptable t
inner join tally N on N.num <= t.maxsale
where not exists( select ... from sourcetable s where s.facility=t.facility and s.seller=t.seller and s.sale=N.num)
This works because the only way to "construct" nonexisting combinations is to construct them all and then remove the existing ones.
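The tally-table idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the table variables `@Sales` and `@Tally`, the sample rows, and the upper bound of 100 are all made up, and Sale_ids are assumed to start at 1 (start the tally at 0 if your data does). It groups by Purchase_Date as well, since the question checks each date separately.

```sql
-- Hypothetical sample data: seller 1 at facility 124 is missing Sale_id 2 and 4.
DECLARE @Sales TABLE (Facility_no INT, Purchase_Date DATETIME, Seller_id INT, Sale_id INT);
INSERT INTO @Sales
SELECT 124, '20200207', 1, 1 UNION ALL
SELECT 124, '20200207', 1, 3 UNION ALL
SELECT 124, '20200207', 1, 5 UNION ALL
SELECT 125, '20200207', 2, 1 UNION ALL
SELECT 125, '20200207', 2, 2;

-- Small tally table (1..100 here; size it to the largest Sale_id you expect).
DECLARE @Tally TABLE (num INT PRIMARY KEY);
DECLARE @i INT;
SET @i = 1;
WHILE @i <= 100
BEGIN
    INSERT INTO @Tally (num) VALUES (@i);
    SET @i = @i + 1;
END;

-- Build every number up to each group's max, then keep the ones with no sale.
SELECT m.Facility_no, m.Purchase_Date, m.Seller_id, n.num AS Missing_Sale_id
FROM (
        SELECT Facility_no, Purchase_Date, Seller_id, MAX(Sale_id) AS MaxSale
        FROM @Sales
        GROUP BY Facility_no, Purchase_Date, Seller_id
     ) AS m
INNER JOIN @Tally AS n ON n.num <= m.MaxSale
WHERE NOT EXISTS (
        SELECT 1
        FROM @Sales AS s
        WHERE s.Facility_no = m.Facility_no
          AND s.Purchase_Date = m.Purchase_Date
          AND s.Seller_id = m.Seller_id
          AND s.Sale_id = n.num
      )
ORDER BY m.Facility_no, m.Purchase_Date, m.Seller_id, n.num;
-- Returns two rows, both for facility 124 / seller 1: missing Sale_id 2 and 4.
```

The sketch avoids LAG, ROW_NUMBER and CTEs on purpose. It cannot detect sales missing after the highest recorded Sale_id of a group, because the maximum is taken from the data itself.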

This one worked out
; WITH cte_Rn AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY Facility_no, Purchase_Date, Seller_id ORDER BY Purchase_Date) AS [Rn_Num]
FROM (
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id
FROM MyTable WITH (NOLOCK)
) a
)
, cte_Rn_0 as (
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id
-- [Rn_Num] AS 'Skipped Sale'
-- , case when Sale_id = 0 Then [Rn_Num] - 1 Else [Rn_Num] End AS 'Skipped Sale for 0'
, [Rn_Num] - 1 AS 'Skipped Sale for 0'
FROM cte_Rn a
)
SELECT
Facility_no,
Purchase_Date,
Seller_id,
Sale_id,
-- [Skipped Sale],
[Skipped Sale for 0]
FROM cte_Rn_0 a
WHERE NOT EXISTS
(
select * from cte_Rn_0 b
where b.Sale_id = a.[Skipped Sale for 0]
and a.Facility_no = b.Facility_no
and a.Purchase_Date = b.Purchase_Date
and a.Seller_id = b.Seller_id
)
--ORDER BY Purchase_Date ASC

Parse Numeric Ranges in PostgreSQL

I would like to produce a string containing some parsed numeric ranges.
I have a table with some data
b_id,s_id
1,50
1,51
1,53
1,61
1,62
1,63
2,91
2,95
2,96
2,97
Using only SQL in PostgreSQL, how could I produce this output:
b_id,s_seqs
1,"50-51,53,61-63"
2,"91,95-97"
How on earth do I do that?

select b_id, string_agg(seq, ',' order by seq_no) as s_seqs
from (
select
b_id, seq_no,
replace(regexp_replace(string_agg(s_id::text, ','), ',.+,', '-'), ',', '-') seq
from (
select
b_id, s_id,
sum(mark) over w as seq_no
from (
select
b_id, s_id,
(s_id- 1 <> lag(s_id, 1, s_id) over w)::int as mark
from my_table
window w as (partition by b_id order by s_id)
) s
window w as (partition by b_id order by s_id)
) s
group by 1, 2
) s
group by 1;
Here you can find a step-by-step analysis, from the innermost query outwards.

How to order UNPIVOT

I have the following UNPIVOT code and I would like to order it by the FactSheetSummary columns, so that when they are converted to rows the rows are ordered 1 - 12:
INSERT INTO #Results
SELECT DISTINCT ReportingDate, PortfolioID,ISIN, PortfolioNme, Section,REPLACE(REPLACE(Risks,'‘',''''),'’','''')
FROM
(SELECT DISTINCT
ReportingDate
, PortfolioID
, ISIN
, PortfolioNme
, Section
, FactSheetSummary_1, FactSheetSummary_2, FactSheetSummary_3
, FactSheetSummary_4, FactSheetSummary_5, FactSheetSummary_6
, FactSheetSummary_7, FactSheetSummary_8, FactSheetSummary_9
, FactSheetSummary_10, FactSheetSummary_11, FactSheetSummary_12
FROM #WorkingTableFactsheet) p
UNPIVOT
(Risks FOR FactsheetSummary IN
( FactSheetSummary_1, FactSheetSummary_2, FactSheetSummary_3
, FactSheetSummary_4, FactSheetSummary_5, FactSheetSummary_6
, FactSheetSummary_7, FactSheetSummary_8, FactSheetSummary_9
, FactSheetSummary_10, FactSheetSummary_11, FactSheetSummary_12)
)AS unpvt;
--DELETE records where there are no Risk Narratives
DELETE FROM #Results
WHERE Risks = ''
SELECT
ReportingDate
, PortfolioID
, ISIN
, PortfolioNme
, Section
, Risks
, ROW_NUMBER() OVER(PARTITION BY ISIN,Section ORDER BY ISIN,Section,Risks) as SortOrder
FROM #Results
order by ISIN, Risks
Is it possible to do this? I thought the IN of the UNPIVOT would dictate the order? Do I need to add a column to dictate which I would like to be 1 through to 12?

Just use a CASE expression to order:
order by case FactsheetSummary
when 'FactSheetSummary_1' then 1
when 'FactSheetSummary_2' then 2
-- ... one WHEN per column, 3 through 11 ...
when 'FactSheetSummary_12' then 12 end
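As an alternative to listing all twelve cases, the number can be taken from the column name itself. The following is a sketch, not the answerer's method: the table variable `@Src`, the sample texts, and the alias `SortOrder` are invented for illustration, and it assumes every unpivoted column name has the form `FactSheetSummary_<n>`. Either way, FactsheetSummary (or a value derived from it) has to be selected from the UNPIVOT and stored in #Results, because the original INSERT drops it.

```sql
-- Hypothetical one-row sample with three of the twelve columns.
DECLARE @Src TABLE (ISIN VARCHAR(12),
                    FactSheetSummary_1 VARCHAR(50),
                    FactSheetSummary_2 VARCHAR(50),
                    FactSheetSummary_10 VARCHAR(50));
INSERT INTO @Src VALUES ('X1', 'Zeta risk', 'Alpha risk', 'Mid risk');

SELECT ISIN,
       Risks,
       -- Strip the fixed prefix and sort on the remaining number.
       CAST(REPLACE(FactsheetSummary, 'FactSheetSummary_', '') AS INT) AS SortOrder
FROM @Src
UNPIVOT (Risks FOR FactsheetSummary IN
            (FactSheetSummary_1, FactSheetSummary_2, FactSheetSummary_10)) AS unpvt
ORDER BY ISIN, SortOrder;
-- Returns Zeta risk (1), Alpha risk (2), Mid risk (10), in that order.
```

Casting to INT matters here: sorting the names as text would place FactSheetSummary_10 before FactSheetSummary_2.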

SQL Column Populating

I want to know if it is possible to add another column to a table and populate it from data that is already in the table. The new column is Flag2. Here is the table:
What I want is this: where the item ID is 30, ITEMID should display 30 only once, with the QC Unsupportted value populated in Flag2. How do I do this?
I can only think of doing an inner join, but this is not working.
This is what I have tried:
SELECT
A.ITEMID, A.FLAG1, A.FLAG2
FROM
#FLAGS as A
INNER JOIN
#FLAGS as B ON A.ITEMID = B.ITEMID
GROUP BY
a.ITEMID, a.FLAG1, A.FLAG2
ORDER BY
ITEMID

Assuming I understand what you are after, if the current FLAG1 values are distinct for any ITEMID and you only have at most two instances of the same ID, I think this should do what you want:
SELECT
lft.ITEMID
, lft.FLAG1
, rght.FLAG1 FLAG2
FROM (
SELECT
t.ITEMID
, t.FLAG1
FROM (
SELECT
l.ITEMID
, l.FLAG1
, COUNT(l.ITEMID) i
FROM #FLAGS l
INNER JOIN #FLAGS r ON l.ITEMID = r.ITEMID
WHERE r.FLAG1 <= l.FLAG1
GROUP BY
l.ITEMID
, l.FLAG1) t
WHERE t.i=1) lft
LEFT OUTER JOIN (
SELECT
t.ITEMID
, t.FLAG1
FROM (
SELECT
l.ITEMID
, l.FLAG1
, COUNT(l.ITEMID) i
FROM #FLAGS l
INNER JOIN #FLAGS r ON l.ITEMID = r.ITEMID
WHERE r.FLAG1 <= l.FLAG1
GROUP BY
l.ITEMID
, l.FLAG1) t
WHERE t.i=2) rght ON lft.ITEMID = rght.ITEMID
-- Or better
SELECT
lft.ITEMID
, lft.FLAG1
, rght.FLAG1 FLAG2
FROM (
SELECT
t.ITEMID
, t.FLAG1
FROM (
SELECT
l.ITEMID
, l.FLAG1
, ROW_NUMBER() OVER(PARTITION BY ITEMID ORDER BY FLAG1) as i
FROM #FLAGS l) t
WHERE t.i=1) lft
LEFT OUTER JOIN (
SELECT
t.ITEMID
, t.FLAG1
FROM (
SELECT
l.ITEMID
, l.FLAG1
, ROW_NUMBER() OVER(PARTITION BY ITEMID ORDER BY FLAG1) as i
FROM #FLAGS l) t
WHERE t.i=2) rght ON lft.ITEMID = rght.ITEMID
If you have additional flag values for the same ID, a new outer join can be added to a new inline table (rght2, rght3, etc.) where i=3, 4, etc. and you are selecting rght2 AS FLAG3, rght3 AS FLAG4, etc.
Also note that the current values for FLAG1 will be distributed through FLAG1 and FLAG2 in alphabetical order. If you wanted to distribute them in reverse order you could replace <= with >=. If you had more than two flags that you wanted distributed in a specific order, you would have to create a separate table with a ranking value and join to that which would be doable but even uglier!
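If more flags per ID are likely, one way to avoid adding an outer join per flag is to number the rows once and pivot them with conditional aggregation. This is a sketch of a different technique than the joins above, using an invented table variable `@Flags` and made-up flag values; it needs ROW_NUMBER, so SQL Server 2005 or later.

```sql
-- Hypothetical sample data; the flag texts are made up.
DECLARE @Flags TABLE (ITEMID INT, FLAG1 VARCHAR(30));
INSERT INTO @Flags VALUES (10, 'A'), (30, 'B'), (30, 'C');

SELECT ITEMID,
       -- Each CASE picks the flag with that rank; MAX collapses the group to one row.
       MAX(CASE WHEN rn = 1 THEN FLAG1 END) AS FLAG1,
       MAX(CASE WHEN rn = 2 THEN FLAG1 END) AS FLAG2
FROM (
        SELECT ITEMID, FLAG1,
               ROW_NUMBER() OVER (PARTITION BY ITEMID ORDER BY FLAG1) AS rn
        FROM @Flags
     ) AS numbered
GROUP BY ITEMID
ORDER BY ITEMID;
-- Returns (10, 'A', NULL) and (30, 'B', 'C').
```

A third flag would only need one more `MAX(CASE WHEN rn = 3 ...)` line, and the distribution order is controlled by the ORDER BY inside ROW_NUMBER.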
