T-SQL: How to obtain the last modified row from a grouping

I'm working with a poorly designed database that does not prevent duplicate rows as long as they have a different unique identifier.
In one of the tables, a given user can have an attribute and a value for that attribute. Normally, a user would have each attribute only once, but because of the poor design I'm getting a lot of duplicates in the table, and now I need to clean up that mess. This happens because the CRM software does not always check whether the row exists when we modify the employee profile; instead it creates a bunch of new rows with duplicate values.
The following query returns the duplicate values:
SELECT ua.ID AS LineID
,ua.Modified AS LineLastModifiedDate
,u.FullName AS EmployeeName
,a.Name AS AttributeName
,ua.value AS AttributeValue
FROM UserAttributes AS ua
INNER JOIN Users AS u ON ua.userid = u.id
INNER JOIN Attributes AS a ON ua.AttributeID = a.ID
WHERE EXISTS (
SELECT NULL
FROM UserAttributes as ua2
WHERE ua2.UserID = ua.UserID
AND ua2.AttributeID = ua.AttributeID
AND ua2.ID != ua.ID
)
It produces results like this:
LineID LineLastModifiedDate EmployeeName AttributeName AttributeValue
------ ----------------------- ------------- --------------- ---------------
15 2016-01-01 Employee1 EmployeeNumber 15
19 2016-07-20 Employee1 EmployeeNumber 15
35 2016-01-01 Employee2 EmployeeSex M
96 2016-07-20 Employee2 EmployeeSex M
21 2016-03-03 Employee1 SickDays 3
99 2016-07-10 Employee1 SickDays 5
What I need to accomplish, starting from this query, is: for each group of rows with the same EmployeeName and AttributeName, return the last modified line. The expected results look like this:
LineID LineLastModifiedDate EmployeeName AttributeName AttributeValue
------ ----------------------- ------------- --------------- ---------------
19 2016-07-20 Employee1 EmployeeNumber 15
96 2016-07-20 Employee2 EmployeeSex M
99 2016-07-10 Employee1 SickDays 5
How can I modify my query to accomplish this ?
Thank you
-M

;WITH CTE
AS
(
SELECT ua.ID AS LineID
,ua.Modified AS LineLastModifiedDate
,u.FullName AS EmployeeName
,a.Name AS AttributeName
,ua.value AS AttributeValue
,ROW_NUMBER() OVER (PARTITION BY ua.UserID, ua.AttributeID ORDER BY ua.Modified DESC) AS RN -- 1 = most recently modified row per user/attribute
FROM UserAttributes AS ua
INNER JOIN Users AS u ON ua.userid = u.id
INNER JOIN Attributes AS a ON ua.AttributeID = a.ID
WHERE EXISTS (
SELECT NULL
FROM UserAttributes as ua2
WHERE ua2.UserID = ua.UserID
AND ua2.AttributeID = ua.AttributeID
AND ua2.ID != ua.ID
)
)
SELECT * FROM cte where rn=1

You can use row numbering or a scheme as below where you pull out the max value and then use a join. Presumably you can't have ties by date.
select ...
from
UserAttributes as ua
inner join
(
select
ua2.UserID, ua2.AttributeID,
max(ua2.Modified) as MaxModified
from UserAttributes as ua2
group by ua2.UserID, ua2.AttributeID
) as max_ua
on max_ua.UserID = ua.UserID
and max_ua.AttributeID = ua.AttributeID
and max_ua.MaxModified = ua.Modified
...
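As a minimal sketch of the row-numbering approach, the snippet below loads the sample rows into an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here, the single-table schema is a simplified assumption (the joins to Users and Attributes are left out), and window functions need SQLite 3.25 or later. The extra `ID DESC` tie-breaker is also an assumption, added so that two rows with the same Modified value still produce exactly one winner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserAttributes (
    ID INTEGER PRIMARY KEY, UserID INT, AttributeID INT,
    Value TEXT, Modified TEXT
);
INSERT INTO UserAttributes VALUES
    (15, 1, 1, '15', '2016-01-01'),
    (19, 1, 1, '15', '2016-07-20'),
    (35, 2, 2, 'M',  '2016-01-01'),
    (96, 2, 2, 'M',  '2016-07-20'),
    (21, 1, 3, '3',  '2016-03-03'),
    (99, 1, 3, '5',  '2016-07-10');
""")

# Number the rows of each (UserID, AttributeID) group, newest first.
RANKED = """
WITH ranked AS (
    SELECT ID, Modified, Value,
           ROW_NUMBER() OVER (
               PARTITION BY UserID, AttributeID
               ORDER BY Modified DESC, ID DESC
           ) AS rn
    FROM UserAttributes
)
"""

# rn = 1 is the row to keep; rn > 1 are the older duplicates.
latest = conn.execute(
    RANKED + "SELECT ID, Modified, Value FROM ranked WHERE rn = 1 ORDER BY ID"
).fetchall()
stale_ids = [row[0] for row in conn.execute(
    RANKED + "SELECT ID FROM ranked WHERE rn > 1 ORDER BY ID"
)]

print(latest)
print(stale_ids)
```

The `stale_ids` list is what a cleanup step would delete once the kept rows have been checked by hand.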

Related

Postgresql recursive query

I have a table with a self-referencing foreign key and cannot work out how to get the first child or descendant that meets a condition. The structure of my_table is:
id | parent_id | type
----------------------
1 | null | union
2 | 1 | group
3 | 2 | group
4 | 3 | depart
5 | 1 | depart
6 | 5 | unit
7 | 1 | unit
For id 1 (union), I need every direct child or first descendant, skipping any groups that sit between that descendant and the union. For this example, the result should be:
id | type
-----------
4 | depart
5 | depart
7 | unit
id 4 is included because it is connected to the union through the groups with id 3 and id 2, and id 5 is included because it is connected directly to the union.
I've tried to write a recursive query with this condition in the recursive part: parent_id = 1 or parent_type = 'group'. It doesn't give the expected result:
with recursive cte AS (
select b.id, p.type_id
from my_table b
join my_table p on p.id = b.parent_id
where b.id = 1
union
select c.id, cte.type_id
from my_table c
join cte on cte.id = c.parent_id
where c.parent_id = 1 or cte.type_id = 'group'
)
Here's my interpretation:
if type = 'group', then id and parent_id are considered to be in the same group
id#1 and id#2 are in the same group, so they are equivalent
id#2 and id#3 are in the same group, so they are equivalent
therefore id#1, id#2 and id#3 are all in the same group
If the above is correct, you want all the first descendants of id#1's group. The way to do that:
Get all the ids in the same group as id#1
Get all the first descendants of that group (type not in ('union', 'group'))
with recursive cte_group as (
select 1 as id
union all
select m.id
from my_table m
join cte_group g
on m.parent_id = g.id
and m.type = 'group')
select mt.id,
mt.type
from my_table mt
join cte_group cg
on mt.parent_id = cg.id
and mt.type not in ('union','group');
Result:
id|type |
--+------+
4|depart|
5|depart|
7|unit |
Sounds like you want to start with the row of id 1, then get its children, and continue recursively on rows of type group. To do that, use
WITH RECURSIVE tree AS (
SELECT b.id, b.type, TRUE AS skip
FROM my_table b
WHERE id = 1
UNION ALL
SELECT c.id, c.type, (c.type = 'group') AS skip
FROM my_table c
JOIN tree p ON c.parent_id = p.id AND p.skip
)
SELECT id, type
FROM tree
WHERE NOT skip
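To check the skip-flag idea against the sample rows, here is a small sketch that runs an equivalent query in an in-memory SQLite database from Python. SQLite is an assumption used only to make the example self-contained; the flag is stored as 1/0 instead of a boolean, and the `ORDER BY` is added just to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER PRIMARY KEY, parent_id INT, type TEXT);
INSERT INTO my_table VALUES
    (1, NULL, 'union'),
    (2, 1, 'group'),
    (3, 2, 'group'),
    (4, 3, 'depart'),
    (5, 1, 'depart'),
    (6, 5, 'unit'),
    (7, 1, 'unit');
""")

rows = conn.execute("""
WITH RECURSIVE tree(id, type, skip) AS (
    -- start at the union row; skip = 1 means "keep descending"
    SELECT id, type, 1 FROM my_table WHERE id = 1
    UNION ALL
    -- only expand below rows that are flagged as skippable
    SELECT c.id, c.type, c.type = 'group'
    FROM my_table c
    JOIN tree p ON c.parent_id = p.id AND p.skip = 1
)
SELECT id, type FROM tree WHERE skip = 0 ORDER BY id
""").fetchall()

print(rows)
```

Row 6 never appears because its parent (id 5) is a depart, so the recursion stops there.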

Cascading sum hierarchy using recursive cte

I'm trying to write a recursive CTE in Postgres but I can't wrap my head around it. Performance shouldn't be an issue, since there are only 50 items in TABLE 1.
TABLE 1 (expense):
id | parent_id | name
------------------------------
1 | null | A
2 | null | B
3 | 1 | C
4 | 1 | D
TABLE 2 (expense_amount):
ref_id | amount
-------------------------------
3 | 500
4 | 200
Expected Result:
id, name, amount
-------------------------------
1 | A | 700
2 | B | 0
3 | C | 500
4 | D | 200
Query
WITH RECURSIVE cte AS (
SELECT
expense.id,
expense.name,
expense.parent_id,
expense_amount.amount
FROM expense
LEFT JOIN expense_amount ON expense_amount.ref_id = expense.id
WHERE expense.parent_id IS NULL
UNION ALL
SELECT
expense.id,
expense.name,
expense.parent_id,
expense_amount.amount
FROM cte
JOIN expense ON expense.parent_id = cte.id
LEFT JOIN expense_amount ON expense_amount.ref_id = expense.id
)
SELECT
id,
SUM(amount)
FROM cte
GROUP BY 1
ORDER BY 1
Results
id | sum
--------------------
1 | null
2 | null
3 | 500
4 | 200
You can do a conditional sum() for only the root row:
with recursive tree as (
select id, parent_id, name, id as root_id
from expense
where parent_id is null
union all
select c.id, c.parent_id, c.name, p.root_id
from expense c
join tree p on c.parent_id = p.id
)
select e.id,
e.name,
e.root_id,
case
when e.id = e.root_id then sum(ea.amount) over (partition by root_id)
else amount
end as amount
from tree e
left join expense_amount ea on e.id = ea.ref_id
order by id;
I prefer doing the recursive part first, then join the related tables to the result of the recursive query, but you could do the join to the expense_amount also inside the CTE.
Online example: http://rextester.com/TGQUX53703
However, the above only aggregates on the top-level parent, not for any intermediate non-leaf rows.
If you want to see intermediate aggregates as well, this gets a bit more complicated (and is probably not very scalable for large results, but you said your tables aren't that big)
with recursive tree as (
select id, parent_id, name, 1 as level, concat('/', id) as path, null::numeric as amount
from expense
where parent_id is null
union all
select c.id, c.parent_id, c.name, p.level + 1, concat(p.path, '/', c.id), ea.amount
from expense c
join tree p on c.parent_id = p.id
left join expense_amount ea on ea.ref_id = c.id
)
select e.id,
lpad(' ', (e.level - 1) * 2, ' ')||e.name as name,
e.amount as element_amount,
(select sum(amount)
from tree t
where t.path = e.path or t.path like e.path||'/%') as sub_tree_amount, -- the '/' keeps path /1 from also matching /10, /11, ...
e.path
from tree e
order by path;
Online example: http://rextester.com/MCE96740
The query builds up a path of all IDs belonging to a (sub)tree and then uses a scalar sub-select to get all child rows belonging to a node. That sub-select is what will make this quite slow as soon as the result of the recursive query can't be kept in memory.
I used the level column to create a "visual" display of the tree structure - this helps me debugging the statement and understanding the result better. If you need the real name of an element in your program you would obviously only use e.name instead of pre-pending it with blanks.
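For comparison, the sketch below gets the same per-node subtree totals with a different technique: instead of building a path string, the recursive CTE produces (ancestor, descendant) pairs, and a plain GROUP BY sums the amounts. It runs against an in-memory SQLite database from Python purely so it is self-contained; the CTE name `closure` and the `COALESCE(..., 0)` (used to show 0 instead of NULL for B) are assumptions, not part of the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expense (id INTEGER PRIMARY KEY, parent_id INT, name TEXT);
CREATE TABLE expense_amount (ref_id INT, amount INT);
INSERT INTO expense VALUES (1, NULL, 'A'), (2, NULL, 'B'), (3, 1, 'C'), (4, 1, 'D');
INSERT INTO expense_amount VALUES (3, 500), (4, 200);
""")

rows = conn.execute("""
WITH RECURSIVE closure(ancestor_id, descendant_id) AS (
    -- every node is its own descendant at distance 0
    SELECT id, id FROM expense
    UNION ALL
    -- extend each pair by one level of children
    SELECT cl.ancestor_id, e.id
    FROM closure cl
    JOIN expense e ON e.parent_id = cl.descendant_id
)
SELECT e.id, e.name, COALESCE(SUM(ea.amount), 0) AS amount
FROM expense e
JOIN closure cl ON cl.ancestor_id = e.id
LEFT JOIN expense_amount ea ON ea.ref_id = cl.descendant_id
GROUP BY e.id, e.name
ORDER BY e.id
""").fetchall()

for row in rows:
    print(row)
```

The pair table grows with the depth of the tree, so this trades the scalar sub-select for a larger intermediate result; for around 50 rows either form should be fine.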
I could not get your query to work for some reason. Here's my attempt, which works without recursion for the particular table you provided (parent-child only, no grandchildren).
--- step 1: get parent-child data together
with parent_child as(
select t.*, amount
from
(select e.id, f.name as name,
coalesce(f.name, e.name) as pname
from expense e
left join expense f
on e.parent_id = f.id) t
left join expense_amount ea
on ea.ref_id = t.id
)
--- final step is to group by id, name
select id, pname, sum(amount)
from
(-- step 2: group by parent name and find corresponding amount
-- returns A, B
select e.id, t.pname, t.amount
from expense e
join (select pname, sum(amount) as amount
from parent_child
group by 1) t
on t.pname = e.name
-- step 3: to get C, D we union and get corresponding columns
-- results in all rows and corresponding value
union
select id, name, amount
from expense e
left join expense_amount ea
on e.id = ea.ref_id
) t
group by 1, 2
order by 1;

SQL Server recursive query with left outer join

I have two tables Customers and Orders with some data.
SELECT * FROM Customers C;
Result:
CustomerId Name
--------------------
1 Shree;
2 Kalpana;
3 Basavaraj;
Query:
select * from Orders O;
Result:
OrderId CustomerId OrderDate
-------------------------------------------------
100 1 2017-01-05 23:16:15.497
200 4 2017-01-06 23:16:15.497
300 3 2017-01-07 23:16:15.497
I have a business requirement where I need to repeat the Customers left outer join Orders once for every order. I have written the query below, which produces the desired data.
SELECT *
FROM Customers C
LEFT OUTER JOIN
(SELECT *
FROM Orders
WHERE OrderId = 100) O ON O.CustomerId = C.CustomerId
UNION ALL
SELECT *
FROM Customers C
LEFT OUTER JOIN
(SELECT *
FROM Orders
WHERE OrderId = 200) O ON O.CustomerId = C.CustomerId
UNION ALL
SELECT *
FROM Customers C
LEFT OUTER JOIN
(SELECT *
FROM Orders
WHERE OrderId = 300) O ON O.CustomerId = C.CustomerId;
Desired Result:
CustomerId Name OrderId CustomerId OrderDate
--------------------------------------------------------------------
1 Shree 100 1 2017-01-05 23:16:15.497
2 Kalpana NULL NULL NULL
3 Basavaraj NULL NULL NULL
1 Shree NULL NULL NULL
2 Kalpana NULL NULL NULL
3 Basavaraj NULL NULL NULL
1 Shree NULL NULL NULL
2 Kalpana NULL NULL NULL
3 Basavaraj 300 3 2017-01-07 23:16:15.497
One option is to run the left outer join in a loop, passing in each OrderId and saving the accumulated result, but that takes a lot of time because of the high number of records. I want to know the best way to get this done. I have tried a function and a CTE, but no luck so far. Please help.
Many thanks in advance.
A cartesian product can do the job:
SELECT C.*,
OrderId = CASE WHEN C.CustomerId = O.CustomerID THEN O.OrderId ELSE NULL END,
CustomerId = CASE WHEN C.CustomerId = O.CustomerID THEN O.CustomerId ELSE NULL END,
OrderDate = CASE WHEN C.CustomerId = O.CustomerID THEN O.OrderDate ELSE NULL END
FROM Orders O, Customers C
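A quick way to convince yourself that the cross join reproduces the nine desired rows is to run it on the sample data. The sketch below does that in an in-memory SQLite database from Python; SQLite is an assumption used only for a runnable example, so the T-SQL `alias = expression` form is written with `AS`, the second CustomerId column is renamed to the hypothetical `OrderCustomerId`, and an `ORDER BY` is added to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerId INT, Name TEXT);
CREATE TABLE Orders (OrderId INT, CustomerId INT, OrderDate TEXT);
INSERT INTO Customers VALUES (1, 'Shree'), (2, 'Kalpana'), (3, 'Basavaraj');
INSERT INTO Orders VALUES
    (100, 1, '2017-01-05 23:16:15.497'),
    (200, 4, '2017-01-06 23:16:15.497'),
    (300, 3, '2017-01-07 23:16:15.497');
""")

# Every order is paired with every customer; the order columns are
# only filled in when the pair actually belongs together.
rows = conn.execute("""
SELECT C.CustomerId, C.Name,
       CASE WHEN C.CustomerId = O.CustomerId THEN O.OrderId END    AS OrderId,
       CASE WHEN C.CustomerId = O.CustomerId THEN O.CustomerId END AS OrderCustomerId,
       CASE WHEN C.CustomerId = O.CustomerId THEN O.OrderDate END  AS OrderDate
FROM Orders O
CROSS JOIN Customers C
ORDER BY O.OrderId, C.CustomerId
""").fetchall()

matched = [row for row in rows if row[2] is not None]
for row in rows:
    print(row)
```

Order 200 belongs to customer 4, who is not in Customers, so its three rows carry only NULLs in the order columns.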
I found a solution using something similar to a Cartesian product: store the CustomerId values in a table variable and then take the Cartesian product with Customers. This works as I wanted.
declare @CustomerTable TABLE (ID int IDENTITY(1,1) NOT NULL, CustomerId int);
insert into @CustomerTable select distinct CustomerId from orders;
select v.ID,isnull(v.CT_CustomerId,o.CustomerId) as CT_CustomerId,v.CustomerId,v.Name,o.* from
(select CT.ID,CT.CustomerId as CT_CustomerId,C.CustomerId,C.Name from @CustomerTable CT,Customers C ) V
left outer join Orders O ON O.CustomerId = V.CustomerId and V.ID=o.ID

Postgres how to maintain order of rows using CTEs

I have 2 tables
students:
id | name | age
1 abc 20
2 xyz 21
scores:
id | studentid | marks
1 1 20
2 2 22
3 2 20
4 1 22
5 1 20
where studentid is a foreign key to the students table.
When I run
select studentid
from scores
where marks=20;
I get the following result
1, 2, 1
But if I want the names of the students and do a join using
select t1.name
from students t1
inner join scores t2 on t1.id = t2.studentid
where t2.marks=20;
I get xyz,abc,abc. Although the output is correct, is there any way to keep the order in which the scores are listed in the scores table? I should get abc,xyz,abc as output. I tried using a subquery as well:
SELECT name
FROM students
WHERE ID IN ( select studentid from scores where marks=20) ;
but that also did not give me the correct order. How can this be achieved using CTEs (common table expressions)? I tried the following CTE but it did not work:
with cte as(
select t2.id, t1.name
from students t1
inner join scores t2 on t1.id = t2.studentid
where t2.marks=20)
select name from cte order by id
You can order by a column not present in select list:
select t1.name
from students t1
inner join scores t2 on t1.id = t2.studentid
where t2.marks=20
order by t2.id;
name
------
abc
xyz
abc
(3 rows)
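The same behaviour can be reproduced on the sample data with a short sketch. It uses an in-memory SQLite database from Python as a stand-in for Postgres (an assumption made only so the example runs on its own); the point it illustrates is that row order is only guaranteed by an explicit ORDER BY, and the sort column does not have to be selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, age INT);
CREATE TABLE scores (id INTEGER PRIMARY KEY, studentid INT, marks INT);
INSERT INTO students VALUES (1, 'abc', 20), (2, 'xyz', 21);
INSERT INTO scores VALUES (1, 1, 20), (2, 2, 22), (3, 2, 20), (4, 1, 22), (5, 1, 20);
""")

# Sort by scores.id even though only the student name is returned.
names = [row[0] for row in conn.execute("""
SELECT t1.name
FROM students t1
INNER JOIN scores t2 ON t1.id = t2.studentid
WHERE t2.marks = 20
ORDER BY t2.id
""")]

print(names)
```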

DB2 query group by id but with max of date and max of sequence

My table is like
ID FName LName Date(mm/dd/yy) Sequence Value
101 A B 1/10/2010 1 10
101 A B 1/10/2010 2 20
101 X Y 1/2/2010 1 15
101 Z X 1/3/2010 5 10
102 A B 1/10/2010 2 10
102 X Y 1/2/2010 1 15
102 Z X 1/3/2010 5 10
I need a query that should return 2 records
101 A B 1/10/2010 2 20
102 A B 1/10/2010 2 10
that is max of date and max of sequence group by id.
Could anyone assist on this.
-----------------------
-- get me my rows...
-----------------------
select * from myTable t
-----------------------
-- limiting them...
-----------------------
inner join
----------------------------------
-- ...by joining to a subselection
----------------------------------
(select m.id, m.date, max(m.sequence) as max_seq from myTable m inner join
----------------------------------------------------
-- first group on id and date to get max-date-per-id
----------------------------------------------------
(select id, max(date) as date from myTable group by id) y
on m.id = y.id and m.date = y.date
group by m.id, m.date) x
on t.id = x.id
and t.date = x.date
and t.sequence = x.max_seq
Would be a simple solution, which does not take account of ties, nor of rows where sequence is NULL.
EDIT: I've added an extra group to first select max-date-per-id, and then join on this to get max-sequence-per-max-date-per-id before joining to the main table to get all columns.
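I have assumed your table is named employee.
See whether the query below helps.
select * from employee emp1
join (select id, max(date) as dat from employee group by id) emp2
on emp1.id = emp2.id and emp1.date = emp2.dat
-- the max sequence must be taken within the max date, not across all dates
where emp1.sequence = (select max(emp3.sequence) from employee emp3
where emp3.id = emp1.id and emp3.date = emp1.date)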
I'm a fan of using the WITH clause in SELECT statements to organize the different steps. I find that it makes the code easier to read.
WITH max_date(id, max_date)
AS (
-- latest date per ID
SELECT ID, MAX(Date)
FROM my_table
GROUP BY ID
),
max_seq(id, max_date, max_seq)
AS (
-- highest sequence within that latest date, per ID
SELECT t.ID, t.Date, MAX(t.Sequence)
FROM my_table t
JOIN max_date md ON md.id = t.ID AND md.max_date = t.Date
GROUP BY t.ID, t.Date
)
SELECT t.*
FROM my_table t
JOIN max_seq ms ON ms.id = t.ID
AND ms.max_date = t.Date
AND ms.max_seq = t.Sequence;
You should be able to optimize this further as needed.
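One possible further simplification, sketched here as an alternative and not taken from the answers above, is to rank the rows per ID by date and then by sequence with ROW_NUMBER(), which DB2 also supports. The example runs in an in-memory SQLite database from Python (3.25 or later for window functions); the column names `dt` and `seq` are hypothetical stand-ins for Date and Sequence, and the dates are stored in ISO format so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INT, fname TEXT, lname TEXT, dt TEXT, seq INT, value INT);
INSERT INTO my_table VALUES
    (101, 'A', 'B', '2010-01-10', 1, 10),
    (101, 'A', 'B', '2010-01-10', 2, 20),
    (101, 'X', 'Y', '2010-01-02', 1, 15),
    (101, 'Z', 'X', '2010-01-03', 5, 10),
    (102, 'A', 'B', '2010-01-10', 2, 10),
    (102, 'X', 'Y', '2010-01-02', 1, 15),
    (102, 'Z', 'X', '2010-01-03', 5, 10);
""")

# Rank rows within each id: latest date first, then highest sequence.
rows = conn.execute("""
WITH ranked AS (
    SELECT id, fname, lname, dt, seq, value,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY dt DESC, seq DESC
           ) AS rn
    FROM my_table
)
SELECT id, fname, lname, dt, seq, value
FROM ranked
WHERE rn = 1
ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Unlike the join-based versions, this returns exactly one row per ID even when two rows tie on both date and sequence, although which of the tied rows wins is then arbitrary.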