Is there a way to implement a many-to-one relationship in DB2? - db2

I need to create a data structure like this:
Table 1
Code, Value, Offer_ID
I am creating a service that, for a given combination of "Code" and "Value", must return an Offer_ID that I preconfigured.
For example:
Code Value Offer_ID
------ ------- ----------
Age 30 OFF1
Age 30 OFF2
Province RM OFF2
Age 40 OFF3
Province TO OFF3
Age 40 OFF4
Province TO OFF4
Operator TIM OFF4
The calling service always passes the Age, Province and Operator values.
I have to look in this table for an Offer_ID that matches all three values together (like OFF4), or two of them (like OFF3), or just Age, which is the only mandatory one (OFF1).
So if the client passes Age 30 with Province BO and Operator WIND, I have to return OFF1.
How can I do this? How should I structure the tables and the query?
I hope I explained the problem clearly.
Thanks a lot to anyone who can help; we are going crazy over this!

Try this:
with tab (age, province, operator, offer_id) as (values
(30, null, null, 'OFF1')
, (30, 'RM', null, 'OFF2')
, (40, 'TO', null, 'OFF3')
, (40, 'TO', 'TIM', 'OFF4')
)
, op_inp (age, province, operator) as (values
--(40, 'TO', 'TIM') --'OFF4'
(40, 'TO', 'VODAFONE') --'OFF3'
--(30, 'RM', 'VODAFONE') --'OFF2'
--(30, 'TO', 'VODAFONE') --'OFF1'
)
select offer_id /*Just for info*/, order_flag
from
(
select t.*, 3 as order_flag
from tab t
join op_inp o on o.age=t.age and o.province=t.province and o.operator=t.operator
union all
select t.*, 2 as order_flag
from tab t
join op_inp o on o.age=t.age and o.province=t.province and t.operator is null
union all
select t.*, 1 as order_flag
from tab t
join op_inp o on o.age=t.age and t.province is null and t.operator is null
)
order by order_flag desc
fetch first 1 row only
;
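If the offers really are stored in the key/value layout from the question (Code, Value, Offer_ID), the hard-coded tab CTE above could instead be derived from that table with a pivot. A minimal sketch, assuming the table is called Table1 and that the Age values are stored as numeric text:
-- Sketch: collapse the (Code, Value, Offer_ID) rows into one row per offer,
-- giving the same shape as the "tab" CTE used in the query above.
select offer_id
     , max(case when code = 'Age'      then cast(value as integer) end) as age
     , max(case when code = 'Province' then value end) as province
     , max(case when code = 'Operator' then value end) as operator
from Table1
group by offer_id
That result can then take the place of the hard-coded tab CTE in the query above.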

Related

Unpivot Columns with Most Recent Record

Student records are updated with a subject status and an update date. A student can be enrolled in one or multiple subjects. I would like to get each student record with the most recent update date and status for each subject.
CREATE TABLE Student
(
StudentID int,
FirstName varchar(100),
LastName varchar(100),
FullAddress varchar(100),
CityState varchar(100),
MathStatus varchar(100),
MUpdateDate datetime2,
ScienceStatus varchar(100),
SUpdateDate datetime2,
EnglishStatus varchar(100),
EUpdateDate datetime2
);
Desired query output is shown below. I am using a CTE method but am trying to find an alternative and better way.
SELECT StudentID, FirstName, LastName, FullAddress, CityState, [SubjectStatus], UpdateDate
FROM Student
;WITH original AS
(SELECT * FROM Student)
,Math as
(
SELECT DISTINCT StudentID, FirstName, LastName, FullAddress, CityState,
ROW_NUMBER() OVER (PARTITION BY StudentID, MathStatus ORDER BY MUpdateDate DESC) as rn
, _o.MathStatus as SubjectStatus, _o.MupdateDate as UpdateDate
FROM original as o
left join original as _o on o.StudentID = _o.StudentID
where _o.MathStatus is not null and _o.MUpdateDate is not null
)
,Science AS
(
...--Same as Math
)
,English AS
(
...--Same As Math
)
SELECT * FROM Math WHERE rn = 1
UNION
SELECT * FROM Science WHERE rn = 1
UNION
SELECT * FROM English WHERE rn = 1
First: storing data in a denormalized form is not recommended; some data model redesign might be in order. There are multiple resources about data normalization available on the web.
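For reference, a hypothetical normalized layout could keep one row per student and one row per (student, subject) status update; the table and column names below are my assumptions, not from the original post:
-- Hypothetical normalized layout: one row per student, one row per subject status update.
CREATE TABLE StudentCore
(
StudentID int PRIMARY KEY,
FirstName varchar(100),
LastName varchar(100),
FullAddress varchar(100),
CityState varchar(100)
);
CREATE TABLE StudentSubjectStatus
(
StudentID int REFERENCES StudentCore (StudentID),
Subject varchar(20), -- 'Math', 'Science', 'English'
SubjectStat varchar(100),
UpdateDate datetime2
);
With that shape, the "most recent status per student and subject" question becomes a plain ROW_NUMBER() over StudentSubjectStatus, with no unpivoting needed.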
Now then, I made some guesses about how your source table is populated based on the query you wrote. I generated some sample data that could show how the source data is created. Besides that I also reduced the number of columns to reduce my typing efforts. The general approach should still be valid.
Sample data
create table Student
(
StudentId int,
StudentName varchar(15),
MathStat varchar(5),
MathDate date,
ScienceStat varchar(5),
ScienceDate date
);
insert into Student (StudentID, StudentName, MathStat, MathDate, ScienceStat, ScienceDate) values
(1, 'John Smith', 'A', '2020-01-01', 'B', '2020-05-01'),
(1, 'John Smith', 'A', '2020-01-01', 'B+', '2020-06-01'), -- B for Science was updated to B+ month later
(2, 'Peter Parker', 'F', '2020-01-01', 'A', '2020-05-01'),
(2, 'Peter Parker', 'A+', '2020-03-01', 'A', '2020-05-01'), -- Spider-Man would never fail Math, fixed...
(3, 'Tom Holland', null, null, 'A', '2020-05-01'),
(3, 'Tom Holland', 'A-', '2020-07-01', 'A', '2020-05-01'); -- Tom was sick for Math, but got a second chance
Solution
Your question title already contains the word unpivot. That word actually exists in T-SQL as a keyword; you can learn about the unpivot keyword in the documentation. Your own solution already contains common table expressions, so those constructs should look familiar.
Steps:
cte_unpivot = unpivot all rows, create a Subject column and place the corresponding values (SubjectStat, Date) next to it with a case expression.
cte_recent = number the rows to find the most recent row per student and subject.
Select only those most recent rows.
This gives:
with cte_unpivot as
(
select up.StudentId,
up.StudentName,
case up.[Subject]
when 'MathStat' then 'Math'
when 'ScienceStat' then 'Science'
end as [Subject],
up.SubjectStat,
case up.[Subject]
when 'MathStat' then up.MathDate
when 'ScienceStat' then up.ScienceDate
end as [Date]
from Student s
unpivot ([SubjectStat] for [Subject] in ([MathStat], [ScienceStat])) up
),
cte_recent as
(
select cu.StudentId, cu.StudentName, cu.[Subject], cu.SubjectStat, cu.[Date],
row_number() over (partition by cu.StudentId, cu.[Subject] order by cu.[Date] desc) as [RowNum]
from cte_unpivot cu
)
select cr.StudentId, cr.StudentName, cr.[Subject], cr.SubjectStat, cr.[Date]
from cte_recent cr
where cr.RowNum = 1;
Result
StudentId StudentName Subject SubjectStat Date
----------- --------------- ------- ----------- ----------
1 John Smith Math A 2020-01-01
1 John Smith Science B+ 2020-06-01
2 Peter Parker Math A+ 2020-03-01
2 Peter Parker Science A 2020-05-01
3 Tom Holland Math A- 2020-07-01
3 Tom Holland Science A 2020-05-01

Using "UNION ALL" and "GROUP BY" to implement "Intersect"

I've provided the following query to find common records in two data sets, but it's difficult for me to verify its correctness because I have a lot of records in my DB.
Is it OK to implement an Intersect between the "Customers" and "Employees" tables using UNION ALL and applying GROUP BY to the result, like below?
SELECT D.Country, D.Region, D.City
FROM (SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
So can we say that any record which exists in the result of this query also exists in the Intersect set between the "Customers" and "Employees" tables, AND that any record which exists in that Intersect set will also be in the result of this query?
So is it right to say that any record in the result of this query is in
the "Intersect" set between "Customers" & "Employees", AND that any record
that exists in that "Intersect" set is also in the result of this query?
YES.
... Yes, but it won't be as efficient because you are filtering out duplicates three times instead of once. In your query you're
Using DISTINCT to pull unique records from employees
Using DISTINCT to pull unique records from customers
Combining both queries using UNION ALL
Using GROUP BY in your outer query to filter the records you retrieved in steps 1, 2 and 3.
Using INTERSECT will return identical results but more efficiently. To see for yourself you can create the sample data below and run both queries:
use tempdb
go
if object_id('dbo.customers') is not null drop table dbo.customers;
if object_id('dbo.employees') is not null drop table dbo.employees;
create table dbo.customers
(
customerId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
create table dbo.employees
(
employeeId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
insert dbo.customers(country, region, city)
values ('us', 'N/E', 'New York'), ('us', 'N/W', 'Seattle'),('us', 'Midwest', 'Chicago');
insert dbo.employees
values ('us', 'S/E', 'Miami'), ('us', 'N/W', 'Portland'),('us', 'Midwest', 'Chicago');
Run these queries:
SELECT D.Country, D.Region, D.City
FROM
(
SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
SELECT Country, Region, City
FROM dbo.customers
INTERSECT
SELECT Country, Region, City
FROM dbo.employees;
Results:
Country Region City
----------- ---------- ----------
us Midwest Chicago
Country Region City
----------- ---------- ----------
us Midwest Chicago
If using INTERSECT is not an option OR you want a faster query, you could improve the query you posted in a couple of different ways, such as:
Option 1: let GROUP BY handle ALL the de-duplication like this:
This is the same as what you posted but without the DISTINCTs (note that this assumes neither table contains duplicate rows of its own; if one table repeats a row, COUNT(*) = 2 can be reached without a match in the other table).
SELECT D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City
FROM Customers
UNION ALL
SELECT Country, Region, City
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
Option 2: Use ROW_NUMBER
This would be my preference and will likely be the most efficient (with the same caveat about internal duplicates as Option 1):
SELECT Country, Region, City
FROM
(
SELECT
rn = row_number() over (partition by D.Country, D.Region, D.City order by (SELECT null)),
D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City
FROM Customers
UNION ALL
SELECT Country, Region, City
FROM Employees
) AS D
) uniquify
WHERE rn = 2;

Conversion of tabular data to JSON in Redshift

I am unable to figure out how to convert tabular data to JSON format and store it in another table in Redshift. For example, I have a "DEMO" table with four columns: pid,stid,item_id,trans_id.
For each combination of pid,stid,item_id there exist many trans_ids.
pid stid item_id trans_id :
1 , AB , P1 , T1
1 , AB , P1 , T2
1 , AB , P1 , T3
1 , AB , P1 , T4
2 , ABC , P2 , T5
2 , ABC , P2 , T6
2 , ABC , P2 , T7
2 , ABC , P2 , T8
I want to store this data in another table called "SAMPLE" as:
pid stid item_id trans_id
1 , AB , P1 , {"key1":T1, "key2":"T2" "key2":"T3" "key2":"T4"}
2 , ABC , P2 , {"key1":T5, "key2":"T6" "key2":"T7" "key2":"T8"}
I am unable to figure out how to load the data from "DEMO" to "SAMPLE" in JSON format only for column "trans_id" using a SQL query in Redshift. I don't want to use any intermediate files.
There is the LISTAGG aggregate function, which lets you concatenate text values within groups. It allows the effective construction of JSON objects:
SELECT
pid
,stid
,item_id
,'{'||listagg(
'"key'||row_number::varchar||'":"'||trans_id::varchar||'"'
,',') within group (order by row_number)
||'}' as trans_id
FROM (
SELECT *, row_number() over (partition by pid, stid, item_id order by trans_id)
FROM "DEMO"
) numbered
GROUP BY 1,2,3;
As a side note, in this particular case an array of transaction IDs might work better: you'll be able to request an element at a specific position easily, without using the keyN keys:
WITH tran_arrays as (
SELECT
pid
,stid
,item_id
,listagg(trans_id::varchar,',') within group (order by trans_id) as tran_array
FROM "DEMO"
GROUP BY 1,2,3
)
SELECT *
,split_part(tran_array,',',1) as first_element
FROM tran_arrays;
Very similar to the existing answer, but slightly different: this example runs on an Oracle database instead. I put the work into it and felt like sharing it in case it helps someone else out.
/* Oracle Example */
WITH demo_data AS
(
SELECT 1 AS pid, 'AB' AS stid, 'P1' AS item_id, 'T1' AS trans_id FROM dual UNION ALL
SELECT 1 AS pid, 'AB' AS stid, 'P1' AS item_id, 'T2' AS trans_id FROM dual UNION ALL
SELECT 1 AS pid, 'AB' AS stid, 'P1' AS item_id, 'T3' AS trans_id FROM dual UNION ALL
SELECT 1 AS pid, 'AB' AS stid, 'P1' AS item_id, 'T4' AS trans_id FROM dual UNION ALL
SELECT 2 AS pid, 'ABC' AS stid, 'P2' AS item_id, 'T5' AS trans_id FROM dual UNION ALL
SELECT 2 AS pid, 'ABC' AS stid, 'P2' AS item_id, 'T6' AS trans_id FROM dual UNION ALL
SELECT 2 AS pid, 'ABC' AS stid, 'P2' AS item_id, 'T7' AS trans_id FROM dual UNION ALL
SELECT 2 AS pid, 'ABC' AS stid, 'P2' AS item_id, 'T8' AS trans_id FROM dual
)
, transformData AS
(
SELECT pid, stid, item_id, trans_id,
       ROW_NUMBER() OVER (PARTITION BY pid, stid, item_id ORDER BY trans_id) AS keyNum
FROM demo_data
)
SELECT pid, stid, item_id
, '{'||
LISTAGG(CHR(34)||'key'||keynum||CHR(34)||':'||CHR(34)||trans_id||CHR(34), ',')
WITHIN GROUP (ORDER BY keynum)
||'}' AS trans_id
FROM transformData
GROUP BY pid, stid, item_id
;
The output is one row per pid, stid, item_id combination, with the trans_ids collected into the JSON string, matching the SAMPLE table shown in the question.

Find all records NOT in any blocked range where blocked ranges are in a table

I have a table TaggedData with the following fields and data
ID GroupID Tag MyData
** ******* *** ******
1 Texas AA01 Peanut Butter
2 Texas AA15 Cereal
3 Ohio AA05 Potato Chips
4 Texas AA08 Bread
I have a second table of BlockedTags as follows:
ID StartTag EndTag
** ******** ******
1 AA00 AA04
2 AA15 AA15
How do I select from this to return all data matching a given GroupId but NOT in any blocked range (inclusive)? For the data given, if the GroupId is Texas, I don't want to return Cereal because it matches the second range (and Peanut Butter falls in the first range); it should only return Bread.
I did try left-join-based queries but I'm not even close.
Thanks
create table TaggedData (
ID int,
GroupID varchar(16),
Tag char(4),
MyData varchar(50))
create table BlockedTags (
ID int,
StartTag char(4),
EndTag char(4)
)
insert into TaggedData(ID, GroupID, Tag, MyData)
values (1, 'Texas', 'AA01', 'Peanut Butter')
insert into TaggedData(ID, GroupID, Tag, MyData)
values (2, 'Texas' , 'AA15', 'Cereal')
insert into TaggedData(ID, GroupID, Tag, MyData)
values (3, 'Ohio ', 'AA05', 'Potato Chips')
insert into TaggedData(ID, GroupID, Tag, MyData)
values (4, 'Texas', 'AA08', 'Bread')
insert into BlockedTags(ID, StartTag, EndTag)
values (1, 'AA00', 'AA04')
insert into BlockedTags(ID, StartTag, EndTag)
values (2, 'AA15', 'AA15')
select t.* from TaggedData t
left join BlockedTags b on t.Tag between b.StartTag and b.EndTag
where b.ID is null
Returns:
ID GroupID Tag MyData
----------- ---------------- ---- --------------------------------------------------
3 Ohio AA05 Potato Chips
4 Texas AA08 Bread
(2 row(s) affected)
So, to match on a given GroupID you change the query like this:
select t.* from TaggedData t
left join BlockedTags b on t.Tag between b.StartTag and b.EndTag
where b.ID is null and t.GroupID=@GivenGroupID
I prefer the NOT EXISTS form, simply because it is more readable and usually performs better on large data sets (in several cases it gets a better execution plan). It would look like this:
SELECT * FROM TaggedData t
WHERE t.GroupID = @GivenGroupID
AND NOT EXISTS (SELECT 1 FROM BlockedTags b WHERE t.Tag BETWEEN b.StartTag AND b.EndTag)

TSQL Group By with an "OR"?

This query for creating a list of Candidate duplicates is easy enough:
SELECT Count(*), Can_FName, Can_HPhone, Can_EMail
FROM Can
GROUP BY Can_FName, Can_HPhone, Can_EMail
HAVING Count(*) > 1
But if the actual rule I want to check against is FName and (HPhone OR Email) - how can I adjust the GROUP BY to work with this?
I'm fairly certain I'm going to end up with a UNION SELECT here (i.e. do FName, HPhone on one and FName, EMail on the other and combine the results) - but I'd love to know if anyone knows an easier way to do it.
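Roughly, I mean something like this sketch (just illustrative, not tested):
-- The two-grouping UNION I have in mind: duplicates on (FName, HPhone)
-- combined with duplicates on (FName, EMail).
SELECT Can_FName, Can_HPhone, NULL AS Can_EMail, COUNT(*) AS DupCount
FROM Can
GROUP BY Can_FName, Can_HPhone
HAVING COUNT(*) > 1
UNION
SELECT Can_FName, NULL, Can_EMail, COUNT(*)
FROM Can
GROUP BY Can_FName, Can_EMail
HAVING COUNT(*) > 1;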
Thank you in advance for any help.
Scott in Maine
Before I can advise anything, I need to know the answer to this question:
name phone email
John 555-00-00 john@example.com
John 555-00-01 john@example.com
John 555-00-01 john-other@example.com
What COUNT(*) do you want for this data?
Update:
If you just want to know that a record has any duplicates, use this:
WITH q AS (
SELECT 1 AS id, 'John' AS name, '555-00-00' AS phone, 'john@example.com' AS email
UNION ALL
SELECT 2 AS id, 'John', '555-00-01', 'john@example.com'
UNION ALL
SELECT 3 AS id, 'John', '555-00-01', 'john-other@example.com'
UNION ALL
SELECT 4 AS id, 'James', '555-00-00', 'james@example.com'
UNION ALL
SELECT 5 AS id, 'James', '555-00-01', 'james-other@example.com'
)
SELECT *
FROM q qo
WHERE EXISTS
(
SELECT NULL
FROM q qi
WHERE qi.id <> qo.id
AND qi.name = qo.name
AND (qi.phone = qo.phone OR qi.email = qo.email)
)
It's more efficient, but doesn't tell you where the duplicate chain started.
This query selects all entries along with a special field, chainid, that indicates where the duplicate chain started.
WITH q AS (
SELECT 1 AS id, 'John' AS name, '555-00-00' AS phone, 'john@example.com' AS email
UNION ALL
SELECT 2 AS id, 'John', '555-00-01', 'john@example.com'
UNION ALL
SELECT 3 AS id, 'John', '555-00-01', 'john-other@example.com'
UNION ALL
SELECT 4 AS id, 'James', '555-00-00', 'james@example.com'
UNION ALL
SELECT 5 AS id, 'James', '555-00-01', 'james-other@example.com'
),
dup AS (
SELECT id AS chainid, id, name, phone, email, 1 as d
FROM q
UNION ALL
SELECT chainid, qo.id, qo.name, qo.phone, qo.email, d + 1
FROM dup
JOIN q qo
ON qo.name = dup.name
AND (qo.phone = dup.phone OR qo.email = dup.email)
AND qo.id > dup.id
),
chains AS
(
SELECT *
FROM dup do
WHERE chainid NOT IN
(
SELECT id
FROM dup di
WHERE di.chainid < do.chainid
)
)
SELECT *
FROM chains
ORDER BY
chainid
None of these answers is correct. Quassnoi's is a decent approach, but you will notice one fatal flaw in the expressions "qo.id > dup.id" and "di.chainid < do.chainid": comparisons made by ID! This is ALWAYS bad practice because it depends on some inherent ordering in the IDs. IDs should NEVER be given any implicit meaning and should ONLY participate in equality or null testing. You can easily break Quassnoi's solution in this example by simply reordering the IDs in the data.
The essential problem is a disjunctive condition with a grouping, which leads to the possibility of two records being related through an intermediate, though they are not directly relatable.
e.g., you stated these records should all be grouped:
(1) John 555-00-00 john@example.com
(2) John 555-00-01 john@example.com
(3) John 555-00-01 john-other@example.com
You can see that #1 and #2 are relatable, as are #2 and #3, but clearly #1 and #3 are not directly relatable as a group.
This establishes that a recursive or iterative solution is the ONLY possible solution.
So, recursion is not viable since you can easily end up in a looping situation. This is what Quassnoi was trying to avoid with his ID comparisons, but in doing so he broke the algorithm. You could try limiting the levels of recursion, but you may not then complete all relations, and you will still potentially be following loops back upon yourself, leading to excessive data size and prohibitive inefficiency.
The best solution is ITERATIVE: Start a result set by tagging each ID as a unique group ID, and then spin through the result set and update it, combining IDs into the same unique group ID as they match on the disjunctive condition. Repeat the process on the updated set each time until no further updates can be made.
I will create example code for this soon.
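In the meantime, a rough sketch of the iterative idea could look like the following (illustrative only and untested; the #grp temp table is my own, the column names come from the original question, and the ID is used purely as an arbitrary group label, so the grouping itself does not depend on ID order):
-- Illustrative sketch of the iterative approach (not a tested, production query).
-- Seed every row with its own ID as an arbitrary group label.
SELECT ID, Can_FName, Can_HPhone, Can_EMail, ID AS GrpID
INTO #grp
FROM Can;

WHILE 1 = 1
BEGIN
    -- Pull each row's label down to the smallest label of any row it matches
    -- on the disjunctive rule: same FName and (same HPhone OR same EMail).
    UPDATE g
    SET GrpID = m.MinGrpID
    FROM #grp g
    JOIN (
        SELECT g1.ID, MIN(g2.GrpID) AS MinGrpID
        FROM #grp g1
        JOIN #grp g2
          ON g1.Can_FName = g2.Can_FName
         AND (g1.Can_HPhone = g2.Can_HPhone OR g1.Can_EMail = g2.Can_EMail)
        GROUP BY g1.ID
    ) m ON m.ID = g.ID
    WHERE g.GrpID > m.MinGrpID;

    -- Stop once no label changed: the grouping is stable.
    IF @@ROWCOUNT = 0 BREAK;
END;

-- Duplicate groups are the labels shared by more than one row.
SELECT GrpID, COUNT(*) AS Members
FROM #grp
GROUP BY GrpID
HAVING COUNT(*) > 1;
Each pass merges labels along direct matches, so after enough passes indirectly related rows (like #1 and #3 above) end up sharing one label.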
GROUP BY doesn't support OR - it's implicitly AND and must include every non-aggregator in the select list.
I assume you also have a unique ID integer as the primary key on this table. If you don't, it's a good idea to have one, for this purpose and many others.
Find those duplicates by a self-join:
select
c1.ID
, c1.Can_FName
, c1.Can_HPhone
, c1.Can_Email
, c2.ID
, c2.Can_FName
, c2.Can_HPhone
, c2.Can_Email
from
(
select
min(ID) as ID,
Can_FName,
Can_HPhone,
Can_Email
from Can
group by
Can_FName,
Can_HPhone,
Can_Email
) c1
inner join Can c2 on c1.ID < c2.ID
where
c1.Can_FName = c2.Can_FName
and (c1.Can_HPhone = c2.Can_HPhone OR c1.Can_Email = c2.Can_Email)
order by
c1.ID
The query gives you N-1 rows for each N duplicate combinations - if you want just a count along with each unique combination, count the rows grouped by the "left" side:
select count(1) + 1
, c1.Can_FName
, c1.Can_HPhone
, c1.Can_Email
from
(
select
min(ID) as ID,
Can_FName,
Can_HPhone,
Can_Email
from Can
group by
Can_FName,
Can_HPhone,
Can_Email
) c1
inner join Can c2 on c1.ID < c2.ID
where
c1.Can_FName = c2.Can_FName
and (c1.Can_HPhone = c2.Can_HPhone OR c1.Can_Email = c2.Can_Email)
group by
c1.Can_FName
, c1.Can_HPhone
, c1.Can_Email
Granted, this is more involved than a union - but I think it illustrates a good way of thinking about duplicates.
Project the desired transformation first from a derived table, then do the aggregation:
SELECT COUNT(*)
, CAN_FName
, Can_HPhoneOrEMail
FROM (
SELECT Can_FName
, ISNULL(Can_HPhone,'') + ISNULL(Can_EMail,'') AS Can_HPhoneOrEMail
FROM Can) AS Can_Transformed
GROUP BY Can_FName, Can_HPhoneOrEMail
HAVING Count(*) > 1
Adjust your 'OR' operation as needed in the derived table project list.
I know this answer will be criticised for the use of the temp table, but it will work anyway:
-- create temp table to give the table a unique key
create table #tmp(
ID int identity,
can_Fname varchar(200) null, -- real type and len here
can_HPhone varchar(200) null, -- real type and len here
can_Email varchar(200) null -- real type and len here
)
-- just copy the rows where a duplicate fname exists
-- (better performance, especially for a big table)
insert into #tmp
select can_fname,can_hphone,can_email
from Can
where can_fname in (select can_fname from Can
group by can_fname having count(*)>1)
-- select the rows that have the same fname and
-- at least the same phone or email
select can_Fname, can_Hphone, can_Email
from #tmp a where exists
(select * from #tmp b where
a.ID<>b.ID and a.can_Fname = b.can_Fname
and (isnull(a.can_HPhone,'') = isnull(b.can_HPhone,'')
or isnull(a.can_Email,'') = isnull(b.can_Email,'')))
Try this:
SELECT Can_FName, COUNT(*)
FROM (
SELECT
rank() over(partition by Can_FName order by Can_FName,Can_HPhone) rnk_p,
rank() over(partition by Can_FName order by Can_FName,Can_EMail) rnk_m,
Can_FName
FROM Can
) X
WHERE rnk_p=1 or rnk_m =1
GROUP BY Can_FName
HAVING COUNT(*)>1