Eliminate duplicate values on an inner join (is it even possible given this scenario?) - tsql

I have a query that pulls information from two different tables at different levels of granularity. Is it possible to keep the value on the right from repeating (return 110811.67 on only one row and zero on the rest) while preserving all values on the left?
select x.pt_year,
x.pt_month,
x.pt_amount,
y.fp_earnedprem
from
(
select sum(a.amount) as pt_amount
, va.ACCT_YEAR as pt_year, va.ACCT_MONTHINYEAR as pt_month, ptp.PTRANS_CODE as pt_code
from fact_policytransaction a
join VDIM_ACCOUNTINGDATE va
on a.ACCOUNTINGDATE_ID=va.ACCOUNTINGDATE_ID
join DIM_POLICYTRANSACTIONTYPE ptp
on a.POLICYTRANSACTIONTYPE_ID=ptp.POLICYTRANSACTIONTYPE_ID
group by va.ACCT_YEAR, va.ACCT_MONTHINYEAR, ptp.PTRANS_CODE
) as x
join
(
select sum(fp.EARNED_PREM_AMT) as fp_earnedprem
, dm.MON_YEAR as fp_year, dm.MON_MONTHINYEAR as fp_month
from fact_policycoverage fp
join dim_month dm on fp.MONTH_ID=dm.MONTH_ID
group by dm.MON_YEAR, dm.MON_MONTHINYEAR
) as y
on x.pt_year = y.fp_year
and x.pt_month = y.fp_month
where x.pt_year=2016 and x.pt_month=6
order by x.pt_year, x.pt_month
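Current result:
|pt_year|pt_month|pt_amount|fp_earnedprem|
|-------|--------|---------|-------------|
|2016   |6       |4340.00  |110811.67    |
|2016   |6       |15569.00 |110811.67    |
|2016   |6       |30024.00 |110811.67    |
Desired result:
|pt_year|pt_month|pt_amount|fp_earnedprem|
|-------|--------|---------|-------------|
|2016   |6       |4340.00  |110811.67    |
|2016   |6       |15569.00 |0            |
|2016   |6       |30024.00 |0            |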

You can use ROW_NUMBER:
;with cte as (
select x.pt_year,
x.pt_month,
x.pt_amount,
y.fp_earnedprem
from
(
select sum(a.amount) as pt_amount
, va.ACCT_YEAR as pt_year, va.ACCT_MONTHINYEAR as pt_month, ptp.PTRANS_CODE as pt_code
from fact_policytransaction a
join VDIM_ACCOUNTINGDATE va
on a.ACCOUNTINGDATE_ID=va.ACCOUNTINGDATE_ID
join DIM_POLICYTRANSACTIONTYPE ptp
on a.POLICYTRANSACTIONTYPE_ID=ptp.POLICYTRANSACTIONTYPE_ID
group by va.ACCT_YEAR, va.ACCT_MONTHINYEAR, ptp.PTRANS_CODE
) as x
join
(
select sum(fp.EARNED_PREM_AMT) as fp_earnedprem
, dm.MON_YEAR as fp_year, dm.MON_MONTHINYEAR as fp_month
from fact_policycoverage fp
join dim_month dm on fp.MONTH_ID=dm.MONTH_ID
group by dm.MON_YEAR, dm.MON_MONTHINYEAR
) as y
on x.pt_year = y.fp_year
and x.pt_month = y.fp_month
where x.pt_year=2016 and x.pt_month=6
)
, Rn as (
select *
--order by pt_amount so the row that keeps the value is deterministic
, ROW_NUMBER() over (partition by pt_year, pt_month order by pt_amount) as RoNum
from cte
)
select pt_year, pt_month, pt_amount
, IIF(RoNum = 1, fp_earnedprem, 0) as fp_earnedprem
from Rn
order by pt_year, pt_month, RoNum
The main part is your own query. I only added the line ;with cte as ( at the top and the section below, and moved the ORDER BY to the end (a CTE cannot contain an ORDER BY).
)
, Rn as (
select *
, ROW_NUMBER() over (partition by pt_year, pt_month order by pt_amount) as RoNum
from cte
)
select pt_year, pt_month, pt_amount
, IIF(RoNum = 1, fp_earnedprem, 0) as fp_earnedprem
from Rn
order by pt_year, pt_month, RoNum
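A minimal sketch of the same ROW_NUMBER technique, runnable with Python's sqlite3 module as a stand-in for SQL Server (assumes SQLite 3.25+ for window functions). The table name joined is hypothetical and simply holds the three joined rows from the sample output; CASE replaces T-SQL's IIF.

```python
import sqlite3

# Hypothetical stand-in for the result of "x join y" in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE joined (pt_year INT, pt_month INT, pt_amount REAL, fp_earnedprem REAL)"
)
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?, ?)",
    [
        (2016, 6, 4340.00, 110811.67),
        (2016, 6, 15569.00, 110811.67),
        (2016, 6, 30024.00, 110811.67),
    ],
)

query = """
WITH rn AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY pt_year, pt_month
                              ORDER BY pt_amount) AS ronum
    FROM joined
)
SELECT pt_year, pt_month, pt_amount,
       -- keep the month-level amount on the first row of each month only
       CASE WHEN ronum = 1 THEN fp_earnedprem ELSE 0 END AS fp_earnedprem
FROM rn
ORDER BY pt_year, pt_month, ronum
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the value is kept on exactly one row per month, summing fp_earnedprem over the result no longer multiplies the month total by the number of transaction codes.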

Related

How to repeat some data points in query results?

I am trying to get the max date by account from 3 different tables and view those dates side by side. I created a separate query for each table, merged the results with UNION ALL, and then wrapped all that in a PIVOT.
The first two sections of the linked image (Query results by step) show what I have been able to accomplish, and the third section shows what I would like to get.
How can I get the results from 2 of the tables to repeat? Is that possible?
--define var_ent_type = 'ACOM'
--define var_ent_id = '52766'
--define var_dict_id = 113
SELECT
*
FROM
(
SELECT
E.ENTITY_TYPE,
E.ENTITY_ID,
'PERF_SUMMARY' as "TableName",
PS.DICTIONARY_ID,
to_char(MAX(PS.END_EFFECTIVE_DATE), 'YYYY-MM-DD') as "MaxDate"
FROM
RULESDBO.ENTITY E
INNER JOIN PERFORMDBO.PERF_SUMMARY PS ON (PS.ENTITY_ID = E.ENTITY_ID)
WHERE
1=1
-- AND E.ENTITY_TYPE = '&var_ent_type'
-- AND E.ENTITY_ID = '&var_ent_id'
AND PS.DICTIONARY_ID >= 100
AND (E.ACTIVE_STATUS <> 'N' )--and E.TERMINATION_DATE is null )
GROUP BY
E.ENTITY_TYPE,
E.ENTITY_ID,
'PERF_SUMMARY',
PS.DICTIONARY_ID
union all
SELECT
E.ENTITY_TYPE,
E.ENTITY_ID,
'POSITION' as "TableName",
0 as DICTIONARY_ID,
to_char(MAX(H.EFFECTIVE_DATE), 'YYYY-MM-DD') as "MaxDate"
FROM
RULESDBO.ENTITY E
INNER JOIN HOLDINGDBO.POSITION H ON (H.ENTITY_ID = E.ENTITY_ID)
WHERE
1=1
-- AND E.ENTITY_TYPE = '&var_ent_type'
-- AND E.ENTITY_ID = '&var_ent_id'
AND (E.ACTIVE_STATUS <> 'N' )--and E.TERMINATION_DATE is null )
GROUP BY
E.ENTITY_TYPE,
E.ENTITY_ID,
'POSITION',
1
union all
SELECT
E.ENTITY_TYPE,
E.ENTITY_ID,
'CASH_ACTIVITY' as "TableName",
0 as DICTIONARY_ID,
to_char(MAX(C.EFFECTIVE_DATE), 'YYYY-MM-DD') as "MaxDate"
FROM
RULESDBO.ENTITY E
INNER JOIN CASHDBO.CASH_ACTIVITY C ON (C.ENTITY_ID = E.ENTITY_ID)
WHERE
1=1
-- AND E.ENTITY_TYPE = '&var_ent_type'
-- AND E.ENTITY_ID = '&var_ent_id'
AND (E.ACTIVE_STATUS <> 'N' )--and E.TERMINATION_DATE is null )
GROUP BY
E.ENTITY_TYPE,
E.ENTITY_ID,
'CASH_ACTIVITY',
1
--ORDER BY
-- 2,3, 4
)
PIVOT
(
MAX("MaxDate")
FOR "TableName"
IN ('CASH_ACTIVITY', 'PERF_SUMMARY','POSITION')
)

Everything is possible. You only need a window function to make the value repeat across the rows without data.
--Assuming your current (pivoted) query is QC
With QC as (
...
)
select code, account, grouping,
--cash,
first_value(cash) over (partition by code, account order by grouping asc rows unbounded preceding) as cash_repeat,
perf,
--pos,
first_value(pos) over (partition by code, account order by grouping asc rows unbounded preceding) as pos_repeat
from QC
;
See first_value() help here: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/FIRST_VALUE.html#GUID-D454EC3F-370C-4C64-9B11-33FCB10D95EC
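A minimal sketch of the FIRST_VALUE idea, run through Python's sqlite3 module (SQLite 3.25+) rather than Oracle. The table qc and its columns entity_id, dictionary_id, cash, perf and pos are hypothetical stand-ins for the pivoted output; the dates are made up. It assumes the dictionary_id = 0 row, which carries the cash and position dates, sorts first in each entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pivoted output: one row per (entity_id, dictionary_id).
conn.execute(
    "CREATE TABLE qc (entity_id INT, dictionary_id INT, cash TEXT, perf TEXT, pos TEXT)"
)
conn.executemany(
    "INSERT INTO qc VALUES (?, ?, ?, ?, ?)",
    [
        (1, 0, "2021-05-31", None, "2021-05-28"),
        (1, 101, None, "2021-04-30", None),
        (1, 102, None, "2021-03-31", None),
        (2, 0, "2021-06-30", None, None),
        (2, 100, None, "2021-06-15", None),
    ],
)

query = """
SELECT entity_id, dictionary_id,
       -- copy the value from the first row of each entity (dictionary_id = 0)
       FIRST_VALUE(cash) OVER (PARTITION BY entity_id ORDER BY dictionary_id
                               ROWS UNBOUNDED PRECEDING) AS cash_repeat,
       perf,
       FIRST_VALUE(pos) OVER (PARTITION BY entity_id ORDER BY dictionary_id
                              ROWS UNBOUNDED PRECEDING) AS pos_repeat
FROM qc
ORDER BY entity_id, dictionary_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If the first row of a partition held NULL, FIRST_VALUE would repeat the NULL; Oracle's FIRST_VALUE(... IGNORE NULLS) is one way around that.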

How to collapse overlapping date periods with acceptable gaps using T-SQL?

We want to group our members' enrollments into "continuous enrollments," allowing for a gap of up to 45 days. I know how to use LEAD to determine whether an enrollment should be grouped with the next one, but I don't know how to group them. Would it be more appropriate to add 45 to the term date and subtract 45 from the effective date, then check for overlapping date periods? My goal is to have a SQL view that returns results similar to the final query below. Thank you for your help.
SELECT '101' AS MemID, '2021-01-01' AS EffDate, '2021-01-31' AS TermDate INTO #T1 UNION
SELECT '101', '2021-02-01', '2021-02-28' UNION
SELECT '101', '2021-03-01', '2021-03-31' UNION
SELECT '101', '2021-06-01', '2021-06-30' UNION
SELECT '999', '2021-01-01', '2021-01-15' UNION
SELECT '999', '2021-09-01', '2021-09-28' UNION
SELECT '999', '2021-10-01', '2021-10-31'
SELECT *
, LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS LeadEffDate
, DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate))) AS DaysToNextEnrollment
, CASE WHEN (DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate)))) <= 45 THEN 1 ELSE 0 END AS CombineWithNextRecord
FROM #T1
-- desired result
SELECT 101 AS MemID, '2021-01-01' AS EffDate, '2021-03-31' AS TermDate UNION
SELECT 101, '2021-06-01', '2021-06-30' UNION
SELECT 999, '2021-01-01', '2021-01-15' UNION
SELECT 999, '2021-09-01', '2021-10-31'

I think you are really close. Your question is very similar to "TSQL - creating from-to date table while ignoring in-between steps with conditions"; the difference is in the logic for what you consider to be the same group.
My basic approach is to use the LAG() function to get the previous values of MemID and TermDate, and to combine them with your 45-day rule to define a group. Finally, I take the first EffDate and the last TermDate of each group.
Here is my response to that question modified to your situation.
SELECT
a4.MemID
, CONVERT (DATE, a4.First_EffDate) AS [EffDate]
, CONVERT (DATE, a4.TermDate) AS [TermDate]
FROM (
SELECT
a3.MemID
, a3.EffDate
, a3.TermDate
, a3.MemID_group
, FIRST_VALUE (a3.EffDate) OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate) AS [First_EffDate]
, ROW_NUMBER () OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate DESC) AS [Row_number]
FROM (
SELECT
a2.MemID
, a2.EffDate
, a2.TermDate
, a2.Previous_MemID
, a2.Previous_TermDate
, a2.New_group
, SUM (a2.New_group) OVER (ORDER BY a2.MemID, a2.EffDate) AS [MemID_group]
FROM (
SELECT
a1.MemID
, a1.EffDate
, a1.TermDate
, a1.Previous_MemID
, a1.Previous_TermDate
---------------------------------------------------------------------------------
-- new group if the MemID is different from the previous row OR
-- if the MemID is the same as the previous row AND it has been more than 45 days
-- between the TermDate of the previous row and the EffDate of the current row
,
IIF((a1.MemID <> a1.Previous_MemID)
OR (
a1.MemID = a1.Previous_MemID
AND DATEDIFF (DAY, a1.Previous_TermDate, a1.EffDate) > 45
)
, 1
, 0) AS [New_group]
---------------------------------------------------------------------------------
FROM (
SELECT
MemID
, EffDate
, TermDate
, LAG (MemID) OVER (ORDER BY MemID, EffDate) AS [Previous_MemID]
, LAG (TermDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS [Previous_TermDate]
FROM #T1
) a1
) a2
) a3
) a4
WHERE a4.[Row_number] = 1;
Here is the dbfiddle.
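A compact sketch of the same gaps-and-islands idea, run with Python's sqlite3 module (SQLite 3.25+) on the question's sample data. This is a swapped-in simplification, not the answer's exact query: LAG is partitioned by MemID, so no Previous_MemID column is needed; julianday() stands in for DATEDIFF; and MIN/MAX replace FIRST_VALUE/ROW_NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (MemID TEXT, EffDate TEXT, TermDate TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("101", "2021-01-01", "2021-01-31"),
        ("101", "2021-02-01", "2021-02-28"),
        ("101", "2021-03-01", "2021-03-31"),
        ("101", "2021-06-01", "2021-06-30"),
        ("999", "2021-01-01", "2021-01-15"),
        ("999", "2021-09-01", "2021-09-28"),
        ("999", "2021-10-01", "2021-10-31"),
    ],
)

query = """
WITH prev AS (
    SELECT MemID, EffDate, TermDate,
           LAG(TermDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS prev_term
    FROM t1
),
flagged AS (
    -- 1 = starts a new group: first enrollment, or more than 45 days since the last one
    SELECT *,
           CASE WHEN prev_term IS NULL
                  OR julianday(EffDate) - julianday(prev_term) > 45
                THEN 1 ELSE 0 END AS new_group
    FROM prev
),
grouped AS (
    -- running total of the flags numbers the groups within each member
    SELECT *, SUM(new_group) OVER (PARTITION BY MemID ORDER BY EffDate) AS grp
    FROM flagged
)
SELECT MemID, MIN(EffDate) AS EffDate, MAX(TermDate) AS TermDate
FROM grouped
GROUP BY MemID, grp
ORDER BY 1, 2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```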

Checking Slowly Changing Dimension 2

I have a table that looks like this:
A slowly changing dimension type 2, according to Kimball.
Key is just a surrogate key, a key to make rows unique.
As you can see there are three rows for product A.
The timeline for this product is OK. Over time, the description of the product changes.
From 1-1-2020 up until 4-1-2020 the description of this product was ProdA1.
From 5-1-2020 up until 12-2-2020 the description of this product was ProdA2 etc.
If you look at product B, you see there are gaps in the timeline.
We use DB2 V12 for z/OS. How can I check whether there are gaps in the timeline of each product?
I tried this, but it doesn't work:
with selectie (key, tel) as
(select product, count(*)
from PROD_TAB
group by product
having count(*) > 1)
Select * from
PROD_TAB A
inner join selectie B
on A.product = B.product
Where not exists
(SELECT 1 from PROD_TAB C
WHERE A.product = C.product
AND A.END_DATE + 1 DAY = C.START_DATE
)
Does anyone know the answer?

The following query returns all gaps for all products.
The idea is to number (RN column) the periods of each product by START_DATE and to join each record with the record for the next period.
WITH
/*
MYTAB (PRODUCT, DESCRIPTION, START_DATE, END_DATE) AS
(
SELECT 'A', 'ProdA1', DATE('2020-01-01'), DATE('2020-01-04') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'A', 'ProdA2', DATE('2020-01-05'), DATE('2020-02-12') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'A', 'ProdA3', DATE('2020-02-13'), DATE('2020-12-31') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB1', DATE('2020-01-05'), DATE('2020-01-09') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB2', DATE('2020-01-12'), DATE('2020-03-14') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB3', DATE('2020-03-15'), DATE('2020-04-18') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB4', DATE('2020-04-16'), DATE('2020-05-03') FROM SYSIBM.SYSDUMMY1
)
,
*/
MYTAB_ENUM AS
(
SELECT
T.*
, ROWNUMBER() OVER (PARTITION BY PRODUCT ORDER BY START_DATE) RN
FROM MYTAB T
)
SELECT A.PRODUCT, A.END_DATE + 1 START_DT, B.START_DATE - 1 END_DT
FROM MYTAB_ENUM A
JOIN MYTAB_ENUM B ON B.PRODUCT = A.PRODUCT AND B.RN = A.RN + 1
WHERE A.END_DATE + 1 <> B.START_DATE
AND A.END_DATE < B.START_DATE;
The result is:
|PRODUCT|START_DT |END_DT |
|-------|----------|----------|
|B |2020-01-10|2020-01-11|
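To try the numbered self-join outside DB2, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25+) with the sample rows from the commented-out MYTAB above. SQLite's date(x, '+1 day') stands in for DB2's date arithmetic, and dates are ISO strings, so plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytab (product TEXT, description TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO mytab VALUES (?, ?, ?, ?)",
    [
        ("A", "ProdA1", "2020-01-01", "2020-01-04"),
        ("A", "ProdA2", "2020-01-05", "2020-02-12"),
        ("A", "ProdA3", "2020-02-13", "2020-12-31"),
        ("B", "ProdB1", "2020-01-05", "2020-01-09"),
        ("B", "ProdB2", "2020-01-12", "2020-03-14"),
        ("B", "ProdB3", "2020-03-15", "2020-04-18"),
        ("B", "ProdB4", "2020-04-16", "2020-05-03"),
    ],
)

query = """
WITH mytab_enum AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY product ORDER BY start_date) AS rn
    FROM mytab t
)
SELECT a.product,
       date(a.end_date, '+1 day')   AS start_dt,
       date(b.start_date, '-1 day') AS end_dt
FROM mytab_enum a
JOIN mytab_enum b ON b.product = a.product AND b.rn = a.rn + 1
-- a gap: the next period does not start the day after, and does not overlap
WHERE date(a.end_date, '+1 day') <> b.start_date
  AND a.end_date < b.start_date
ORDER BY a.product, start_dt
"""
gaps = conn.execute(query).fetchall()
print(gaps)
```

The overlapping pair ProdB3/ProdB4 is excluded by the second condition, which matches the result shown above.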
A possibly more efficient way, using LAG:
WITH MYTAB2 AS
(
SELECT
T.*
, LAG(END_DATE) OVER (PARTITION BY PRODUCT ORDER BY START_DATE) END_DATE_PREV
FROM MYTAB T
)
SELECT PRODUCT, END_DATE_PREV + 1 START_DATE, START_DATE - 1 END_DATE
FROM MYTAB2
WHERE END_DATE_PREV + 1 <> START_DATE
AND END_DATE_PREV < START_DATE;
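The LAG variant reads each row once instead of joining the table to itself. A minimal sketch with Python's sqlite3 module (SQLite 3.25+) on the same sample rows; date(x, '+1 day') stands in for DB2's date arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytab (product TEXT, description TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO mytab VALUES (?, ?, ?, ?)",
    [
        ("A", "ProdA1", "2020-01-01", "2020-01-04"),
        ("A", "ProdA2", "2020-01-05", "2020-02-12"),
        ("A", "ProdA3", "2020-02-13", "2020-12-31"),
        ("B", "ProdB1", "2020-01-05", "2020-01-09"),
        ("B", "ProdB2", "2020-01-12", "2020-03-14"),
        ("B", "ProdB3", "2020-03-15", "2020-04-18"),
        ("B", "ProdB4", "2020-04-16", "2020-05-03"),
    ],
)

query = """
WITH mytab2 AS (
    SELECT t.*,
           -- end date of the previous period of the same product (NULL for the first one)
           LAG(end_date) OVER (PARTITION BY product ORDER BY start_date) AS end_date_prev
    FROM mytab t
)
SELECT product,
       date(end_date_prev, '+1 day') AS gap_start,
       date(start_date, '-1 day')    AS gap_end
FROM mytab2
WHERE date(end_date_prev, '+1 day') <> start_date  -- NULL on the first row, so it is skipped
  AND end_date_prev < start_date
ORDER BY product, gap_start
"""
gaps = conn.execute(query).fetchall()
print(gaps)
```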

Thanks, Mark, I will try this one of these days.
I had never heard of LAG in DB2 V12 for z/OS.
I will read about it.
Thanks

Distinct one column return selected columns and order by date desc using sybase

Hi guys, I have a problem: how do I apply DISTINCT to a single column and still return the other selected columns?
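Sample data:
|Ac_no|ord_status|order_no|
|-----|----------|--------|
|12334|PL        |1       |
|12334|ML        |2       |
|12334|CL        |3       |
|64543|PL        |1       |
|65778|JL        |6       |
|83887|CL        |4       |
|83887|KL        |3       |
This is the result I want to see:
|Ac_no|ord_status|order_no|
|-----|----------|--------|
|12334|CL        |3       |
|64543|PL        |1       |
|65778|JL        |6       |
|83887|CL        |4       |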
Here is my code, but unfortunately it didn't work in Sybase 1.2.0.637:
SELECT Ac_no, ord_status, order_no
select *, ROW_NUMBER() OVER (PARTITION BY Ac_no order by ord_status)rm
from wo_order)x
where x = 1

It appears that you want to display, for each Ac_no group of records, the single record having the lowest ord_status. You were on the right track, but your query is missing the FROM ( that opens the subquery, and the outer WHERE clause must filter on the alias you defined for the row number, not on the derived-table alias x:
SELECT Ac_no, ord_status, order_no
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Ac_no ORDER BY ord_status) rn
FROM wo_order
) t
WHERE rn = 1;
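A minimal sketch that runs the corrected ROW_NUMBER query against the question's sample data, using Python's sqlite3 module (SQLite 3.25+) as a stand-in for Sybase. The column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wo_order (Ac_no INT, ord_status TEXT, order_no INT)")
conn.executemany(
    "INSERT INTO wo_order VALUES (?, ?, ?)",
    [
        (12334, "PL", 1), (12334, "ML", 2), (12334, "CL", 3),
        (64543, "PL", 1), (65778, "JL", 6),
        (83887, "CL", 4), (83887, "KL", 3),
    ],
)

query = """
SELECT Ac_no, ord_status, order_no
FROM (
    -- number the rows of each account, lowest ord_status first
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Ac_no ORDER BY ord_status) AS rn
    FROM wo_order
) t
WHERE rn = 1          -- filter on the row-number alias, not the table alias
ORDER BY Ac_no
"""
rows = conn.execute(query).fetchall()
print(rows)
```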
Here is a version that should run on your version of Sybase, since it does not use ROW_NUMBER:
SELECT w1.Ac_no, w1.ord_status, w1.order_no
FROM wo_order w1
INNER JOIN
(
SELECT Ac_no, MIN(ord_status) AS min_ord_status
FROM wo_order
GROUP BY Ac_no
) w2
ON w1.Ac_no = w2.Ac_no AND
w1.ord_status = w2.min_ord_status;

This should work without a window function:
select t1.* from wo_order t1,
(select max(order_no) order_no, ac_no from wo_order group by ac_no) t2
where
t1.ac_no=t2.ac_no
and t1.order_no=t2.order_no
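The two aggregate-join answers pick the row differently: one uses the lowest ord_status per account, the other the highest order_no. A minimal sketch with Python's sqlite3 module checks that both give the requested result on this sample data. No window functions are used, so any SQLite version works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wo_order (Ac_no INT, ord_status TEXT, order_no INT)")
conn.executemany(
    "INSERT INTO wo_order VALUES (?, ?, ?)",
    [
        (12334, "PL", 1), (12334, "ML", 2), (12334, "CL", 3),
        (64543, "PL", 1), (65778, "JL", 6),
        (83887, "CL", 4), (83887, "KL", 3),
    ],
)

# Keep the row with the lowest ord_status per account.
min_status_query = """
SELECT w1.Ac_no, w1.ord_status, w1.order_no
FROM wo_order w1
INNER JOIN (
    SELECT Ac_no, MIN(ord_status) AS min_ord_status
    FROM wo_order
    GROUP BY Ac_no
) w2 ON w1.Ac_no = w2.Ac_no AND w1.ord_status = w2.min_ord_status
ORDER BY w1.Ac_no
"""

# Keep the row with the highest order_no per account.
max_order_query = """
SELECT t1.*
FROM wo_order t1,
     (SELECT MAX(order_no) AS order_no, Ac_no FROM wo_order GROUP BY Ac_no) t2
WHERE t1.Ac_no = t2.Ac_no
  AND t1.order_no = t2.order_no
ORDER BY t1.Ac_no
"""

by_status = conn.execute(min_status_query).fetchall()
by_order = conn.execute(max_order_query).fetchall()
print(by_status)
print(by_order)
```

The two rules only agree by coincidence on this data. Also, if an account had two rows with the same minimum ord_status (or the same maximum order_no), the join would return both rows.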

Compare varchar string to produce missing items list

I have a table with a column that stores locations as varchar. The locations use the format x,y, e.g. -2,7 or -25,30. I am trying to produce a list of missing locations, i.e. locations where we don't have any customers.
The locations go from -30,-30 to 30,30. I can't find a way to set up a loop to run through all the combinations. Is there a way to do this?

For Microsoft SQL Server 2017:
;WITH cte as (
select -30 as n --anchor member
UNION ALL
select n + 1 --recursive member
from cte
where n < 30 --stop at 30 so the range is -30..30
)
select z.*
from (
select CONCAT(y.n,',',x.n) as locations
from cte as x CROSS JOIN cte y
) as z
LEFT OUTER JOIN dbo.Client as cli ON cli.client_location = z.locations
where cli.client_location IS NULL
order by z.locations asc
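A minimal sketch of the recursive-CTE approach, run with Python's sqlite3 module instead of SQL Server. The table name client and the three sample locations are hypothetical. SQLite's || operator replaces CONCAT, because concat() only exists in recent SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (client_location TEXT)")
conn.executemany(
    "INSERT INTO client VALUES (?)", [("-2,7",), ("-25,30",), ("0,0",)]
)

query = """
WITH RECURSIVE cte(n) AS (
    SELECT -30            -- anchor member
    UNION ALL
    SELECT n + 1 FROM cte -- recursive member
    WHERE n < 30          -- stop at 30, so the range is -30..30 inclusive
)
SELECT x.n || ',' || y.n AS location
FROM cte AS x
CROSS JOIN cte AS y
LEFT JOIN client AS c ON c.client_location = x.n || ',' || y.n
WHERE c.client_location IS NULL   -- keep only generated locations with no client
ORDER BY x.n, y.n
"""
missing = [row[0] for row in conn.execute(query)]
print(len(missing), missing[:3])
```

The grid has 61 × 61 = 3721 cells, so with three occupied locations 3718 should be reported missing. Ordering by the numeric x.n, y.n instead of the string avoids "-10,…" sorting before "-9,…" lexically.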

Generate all combinations.
Then match the generated combinations against the existing ones:
WITH DIGITS AS
(
SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS val(n)
),
NUMS AS
(
SELECT (tens.n * 10 + ones.n)-50 AS n
FROM DIGITS ones
CROSS JOIN DIGITS tens
),
LOCATIONS AS
(
SELECT CONCAT(n1.n,',',n2.n) AS location, n1.n as n1, n2.n as n2
FROM NUMS n1
JOIN NUMS n2 ON n2.n BETWEEN -30 AND 30
WHERE n1.n BETWEEN -30 AND 30
)
SELECT loc.location
FROM LOCATIONS loc
LEFT JOIN
(
SELECT Client_Location, COUNT(*) Cnt
FROM dbo.Client
GROUP BY Client_Location
) cl ON cl.Client_Location = loc.location
WHERE cl.Client_Location IS NULL
ORDER BY loc.n1, loc.n2
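The digits cross join builds the number range without recursion: two copies of 0..9 give 0..99, and subtracting 50 shifts that to -50..49 before the filter trims it to -30..30. A minimal sketch with Python's sqlite3 module, using hypothetical client rows; || replaces CONCAT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (client_location TEXT)")
conn.executemany(
    "INSERT INTO client VALUES (?)", [("-2,7",), ("-25,30",), ("0,0",)]
)

query = """
WITH digits(n) AS (
    VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
),
nums AS (
    SELECT tens.n * 10 + ones.n - 50 AS n      -- -50..49
    FROM digits AS ones CROSS JOIN digits AS tens
),
locations AS (
    SELECT n1.n || ',' || n2.n AS location, n1.n AS n1, n2.n AS n2
    FROM nums AS n1
    JOIN nums AS n2 ON n2.n BETWEEN -30 AND 30
    WHERE n1.n BETWEEN -30 AND 30
)
SELECT loc.location
FROM locations AS loc
LEFT JOIN (
    SELECT client_location, COUNT(*) AS cnt
    FROM client
    GROUP BY client_location
) AS cl ON cl.client_location = loc.location
WHERE cl.client_location IS NULL
ORDER BY loc.n1, loc.n2
"""
missing = [row[0] for row in conn.execute(query)]
print(len(missing))
```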

I would go with a recursive CTE. This is a slight variation of SNR's approach:
with cte as (
select -30 as n --anchor member
union all
select n + 1 --recursive member
from cte
where n < 30
)
select cte_x.n as x, cte_y.n as y,
concat(cte_x.n, ',', cte_y.n) as missing_location
from cte cte_x cross join
cte cte_y left join
dbo.client c
on c.client_location = concat(cte_x.n, ',', cte_y.n)
where c.client_location is null;
Or, to avoid calling concat() twice (with the same cte definition):
select cte_x.n as x, cte_y.n as y, v.location as missing_location
from cte cte_x cross join
cte cte_y cross apply
(values (concat(cte_x.n, ',', cte_y.n))
) v(location) left join
dbo.client c
on c.client_location = v.location
where c.client_location is null;
