SQL Server : group by with corresponding row values - tsql

I need to write a T-SQL group by query for a table with multiple dates and seq columns:
DROP TABLE #temp
CREATE TABLE #temp(
id char(1),
dt DateTime,
seq int)
Insert into #temp values('A','2015-03-31 10:00:00',1)
Insert into #temp values('A','2015-08-31 10:00:00',2)
Insert into #temp values('A','2015-03-31 10:00:00',5)
Insert into #temp values('B','2015-09-01 10:00:00',1)
Insert into #temp values('B','2015-09-01 10:00:00',2)
I want the results to contains only the items A,B with their latest date and the corresponding seq number, like:
id MaxDate CorrespondentSeq
A 2015-08-31 10:00:00.000 2
B 2015-09-01 10:00:00.000 2
I am trying with (the obviously wrong!):
select id, max(dt) as MaxDate, max(seq) as CorrespondentSeq
from #temp
group by id
which returns:
id MaxDate CorrespondentSeq
A 2015-08-31 10:00:00.000 5 <-- 5 is wrong
B 2015-09-01 10:00:00.000 2
How can I achieve that?
EDIT
The dt datetime column has duplicated values (exactly same date!)
I am using SQL Server 2005

You can use a ranking subselect to get only the highest ranked entries for an id:
select id, dt, seq
from (
select id, dt, seq, rank() over (partition by id order by dt desc, seq desc) as r
from #temp
) ranked
where r=1;

SELECT ID, DT, SEQ
FROM (
SELECT ID, DT, SEQ, Row_Number()
OVER (PARTITION BY id ORDER BY dt DESC, seq DESC) AS row_number
FROM temp
) cte
WHERE row_number = 1;
Demo : http://www.sqlfiddle.com/#!3/3e3d5/5

With trial and errors maybe I have found a solution, but I'm not completely sure this is correct:
select A.id, B.dt, max(B.seq)
from (select id, max(dt) as maxDt
from #temp
group by id) as A
inner join #temp as B on A.id = B.id AND A.maxDt = B.dt
group by A.id, B.dt

Select id, dt, seq
From #temp t
where dt = (Select Max(dt) from #temp
Where id = t.Id)
If there are duplicate rows, then you also need to specify what the query processor should use to determine which of the duplicates to return. Say you want the lowest value of seq,
Then you could write:
Select id, dt, seq
From #temp t
where dt = (Select Max(dt) from #temp
Where id = t.Id)
and seq = (Select Min(Seq) from #temp
where id = t.Id
and dt = t.dt)

Related

recursive query to replicate/imitate dense_rank

BEGIN;
CREATE temp TABLE teacher (
name text,
salary numeric
);
INSERT INTO teacher
VALUES ('b1', 90000);
INSERT INTO teacher
VALUES ('f1', 87000);
INSERT INTO teacher
VALUES ('a', 65000),
('b', 90000),
('c', 40000),
('d', 95000),
('e', 60000),
('f', 87000);
COMMIT;
query
with recursive cte as(
(select name, salary, 1 as rn
from teacher order by salary desc limit 1)
union all
select l.* from cte c cross join lateral(
select name, salary, rn + 1 from teacher t
where t.salary < c.salary
order by salary desc
limit 1
) l
)
table cte order by salary desc;
If all salary are distinct,then above mentioned query can imitate as rank/row_number.
I am wondering how to use recursive query to replicate/imitate dense_rank.
related post: https://dba.stackexchange.com/questions/286627/get-top-two-rows-per-group-efficiently

Delete duplicate rows with different values in columns

I didn't find my case on the Internet. Tell me how i can delete duplicates if the values are in different columns.
I have a table with a lot of values, for example:
|Id1|Id2|
|89417980|89417978|
|89417980|89417979|
|89417978|89417980|
|89417979|89417980|
I need to exclude duplicates and leave in the answer only:
|Id1|Id2|
|89417980|89417978|
|89417980|89417979|
min/max does not work here, as the values may be different.
I tried to union/join tables on a table/exclude results with temporary tables, but in the end I come to the beginning.
Assuming id1 and id2 are primary keys columns you could try this
DECLARE #tbl table (id1 int, id2 int )
INSERT INTO #tbl
SELECT 89417980, 89417978
UNION SELECT 89417980, 89417979
UNION SELECT 89417978, 89417980
UNION SELECT 89417979, 89417980
SELECT * FROM #tbl
;WITH CTE AS (--Get comparable value as "cs"
SELECT
IIF(id1 > id2, CHECKSUM(id1, id2), CHECKSUM(id2,id1)) as cs
, id1
, id2
, ROW_NUMBER() OVER (order by id1, id2) as rn
FROM #tbl
)
, CTE2 AS ( --Get rows to keep
SELECT MAX (rn) as rn
FROM CTE
GROUP BY cs
HAVING COUNT(*) > 1
)
DELETE tbl -- Delete all except the rows to keep
FROM #tbl tbl
WHERE NOT EXISTS(SELECT 1
FROM CTE2
JOIN CTE ON CTE.rn = CTE2.rn
WHERE CTE.id1 = tbl.id1
AND CTE.id2 = tbl.id2
)
SELECT * FROM #tbl

postgres hierarchy - count of child levels and sort by date of children or grandchildren

I would like to know how to write a postgres subquery so that the following table example will output what I need.
id parent_id postdate
1   -1 2015-03-10
2     1 2015-03-11 (child level 1)
3     1 2015-03-12 (child level 1)
4     3 2015-03-13 (child level 2)
5    -1 2015-03-14
6    -1 2015-03-15
7     6 2015-03-16 (child level 1)
If I want to sort all the root ids by child level 1 with a count of children(s) from the parent, the output would be something like this
id count  date
6   2    2015-03-15
1   4    2015-03-10
5   1    2015-03-14
The output is sorted by postdate based on the root's child. The 'date' being outputted is the date of the root's postdate. Even though id#5 has a more recent postdate, the rootid#6's child (id#7) has the most recent postdate because it is being sorted by child's postdate. id#5 doesnt have any children so it just gets placed at the end, sorted by date. The 'count' is the number children(child level 1), grandchildren(child level 2) and itself (root). For instance, id #2,#3,#4 all belong to id#1 so for id#1, the count would be 4.
My current subquery thus far:
SELECT p1.id,count(p1.id),p1.postdate
FROM mytable p1
LEFT JOIN mytable c1 ON c1.parent_id = p1.id AND p1.parent_id = -1
LEFT JOIN mytable c2 ON c2.parent_id = c1.id AND p1.parent_id = -1
GROUP BY p1.id,c1.postdate,p1.postdate
ORDER by c1.postdate DESC,p1.postdate DESC
create table mytable ( id serial primary key, parent_id int references mytable, postdate date );
create index mytable_parent_id_idx on mytable (parent_id);
insert into mytable (id, parent_id, postdate) values (1, null, '2015-03-10');
insert into mytable (id, parent_id, postdate) values (2, 1, '2015-03-11');
insert into mytable (id, parent_id, postdate) values (3, 1, '2015-03-12');
insert into mytable (id, parent_id, postdate) values (4, 3, '2015-03-13');
insert into mytable (id, parent_id, postdate) values (5, null, '2015-03-14');
insert into mytable (id, parent_id, postdate) values (6, null, '2015-03-15');
insert into mytable (id, parent_id, postdate) values (7, 6, '2015-03-16');
with recursive recu as (
select id as parent, id as root, null::date as child_postdate
from mytable
where parent_id is null
union all
select r.parent, mytable.id, mytable.postdate
from recu r
join mytable
on parent_id = r.root
)
select m.id, c.cnt, m.postdate, c.max_child_date
from mytable m
join ( select parent, count(*) as cnt, max(child_postdate) as max_child_date
from recu
group by parent
) c on c.parent = m.id
order by c.max_child_date desc nulls last, m.postdate desc;
You'll need a recursive query to count the elements in the subtrees:
WITH RECURSIVE opa AS (
SELECT id AS par
, id AS moi
FROM the_tree
WHERE parent_id IS NULL
UNION ALL
SELECT o.par AS par
, t.id AS moi
FROM opa o
JOIN the_tree t ON t.parent_id = o.moi
)
SELECT t.id
, c.cnt
, t.postdate
FROM the_tree t
JOIN ( SELECT par, COUNT(*) AS cnt
FROM opa o
GROUP BY par
) c ON c.par = t.id
ORDER BY t.id
;
UPDATE (it appears the OP also wants the maxdate per tree)
-- The same, but also select the postdate
-- --------------------------------------
WITH RECURSIVE opa AS (
SELECT id AS par
, id AS moi
, postdate AS postdate
FROM the_tree
WHERE parent_id IS NULL
UNION ALL
SELECT o.par AS par
, t.id AS moi
-- , GREATEST(o.postdate,t.postdate) AS postdate
, t.postdate AS postdate
FROM opa o
JOIN the_tree t ON t.parent_id = o.moi
)
SELECT t.id
, c.cnt
, t.postdate
, c.maxdate
FROM the_tree t
JOIN ( SELECT par, COUNT(*) AS cnt
, MAX(o.postdate) AS maxdate -- and obtain the max()
FROM opa o
GROUP BY par
) c ON c.par = t.id
ORDER BY c.maxdate, t.id
;
After looking at everyone's code, I created the subquery I needed. I can use PHP to vary the 'case when' code depending on the user's sort selection. For instance, the code below will sort the root nodes based on child level 1's postdate.
with recursive cte as (
select id as parent, id as root, null::timestamp as child_postdate,0 as depth
from mytable
where parent_id = -1
union all
select r.parent, mytable.id, mytable.postdate,depth+1
from cte r
join mytable
on parent_id = r.root
)
select m.id, c.cnt, m.postdate
from ssf.dtb_021 m
join ( select parent, count(*) as cnt, max(child_postdate) as max_child_date,depth
from cte
group by parent,depth
) c on c.parent = m.id
order by
case
when depth=2 then 1
when depth=1 then 2
else 0
end DESC,
c.max_child_date desc nulls last, m.postdate desc;
select
p.id,
(1+c.n) as parent_post_plus_number_of_subposts,
p.postdate
from
table as p
inner join
(
select
parent_id, count(*) as n, max(postdate) as _postdate
from table
group by parent_id
) as c
on p.id = c.parent_id
where p.parent_id = -1
order by c._postdate desc

getting distinct rows based on two column values

I am trying to get distinct rows from a temporary table and output them to an aspx page. I am trying to use the value of one column and get the last entry made into that column.
I have been trying to use inner join and max(). However i have been unsuccessful.
Here is the code i have been trying to do it with.
Declare #TempTable table (
viewIcon nvarchar(10),
tenderType nvarchar(20),
diaryIcon int,
customerName nvarchar(100),
projectName nvarchar(100),
diaryEntry nvarchar(max),
diaryDate nvarchar(20),
pid nvarchar(20)
)
insert into #TempTable(
viewIcon,
tenderType,
diaryIcon,
customerName,
projectName,
diaryEntry ,
diaryDate ,
pid
)
select p.viewicon,
p.[Tender Type],
1 diaryicon,
c.[Customer Name],
co.[Last Project],
d.Action,
co.[Diary Date],
p.PID
From Projects2 p Inner Join
(select distinct Pno, max(convert(date,[date of next call],103)) maxdate from ProjectDiary group by Pno
) td on p.PID = td.Pno
Inner Join contacts3 co on co.[Customer Number] = p.[Customer Number]
Inner Join Customers3 c on p.[Customer Number] = c.[Customer Number]
Inner Join ProjectDiary d on td.Pno = d.Pno
Where CONVERT(Date, co.[Diary Date], 103) BETWEEN GETDATE()-120 AND GETDATE()-60
DECLARE #contactsTable TABLE
(pid nvarchar(200),
diaryDate date)
insert into #contactsTable (t.pid, t.diarydate)
select distinct pid as pid, MAX(CONVERT(DATE, diaryDate, 103)) as diaryDate from # TempTable t group by pid
DECLARE #tempContacts TABLE
(pid nvarchar(200))
insert into #tempContacts(pid)
select pid from #contactsTable
DECLARE #tempDiaryDate TABLE (diaryDate date)
insert into #tempDiaryDate(diaryDate)
select distinct MAX(CONVERT(DATE, diaryDate, 103)) from #TempTable
select t.* from #TempTable t inner join (select distinct customerName, M AX(CONVERT(DATE, diaryDate, 103)) AS diaryDate from #TempTable group by customerName) tt on t t.customerName=t.customerName
where t.pid not in
(select Pno from ProjectDiary where convert(date,[Date Of Next Call],103) > GETDATE())
and t.viewIcon <> '098'
and t.viewIcon <> '163'
and t.viewIcon <> '119'
and t.pid in (select distinct pid from #tempContacts)
and CONVERT(DATE, t.diaryDate, 103) in (select distinct CONVERT(DATE, diaryDate, 103) f rom #tempDiaryDate)
order by CONVERT(DATE, tt.diaryDate, 103)
I am trying to get all the distinct customerName's using the max date to determine which record it uses.
Use a subquery. Without going through your entire sql statement, the general idea is:
Select [Stuff]
From table t
Where date = (Select Max(Date) from table
where customer = t.customer)

TSQL difficult join issue

I struggling with a problem I have in TSQL, I need to get the top 10 results for each user from a table that might contain more than 10 results.
My natural (and procedurally minded) approach is "for each user in table T select the top 10 results ordered by date".
Each time I try to formulate the question in my mind in a set based approach, I keep running into the term "foreach".
Is it possible to do something like this:
SELECT *
FROM table AS t1
INNER JOIN (
SELECT TOP 10 *
FROM table AS t2
WHERE t2.id = t1.id
ORDER BY date DESC
)
Or even
SELECT ( SELECT TOP 10 *
FROM table AS t2
WHERE t2.id = t1.id
ORDER BY date )
FROM table AS t1
Or is there another solution to this using temp tables that I should think about?
EDIT:
Just to be perfectly clear - I need to the top 10 results for each user in the table, e.g. 10 * N where N = number of users.
EDIT:
In response to a suggestion made by RBarryYoung, I'm having an issue, which is best demonstrated with code:
CREATE TABLE #temp (id INT, date DATETIME)
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
SELECT *
FROM #temp AS t1
CROSS APPLY (
SELECT TOP 1 *
FROM #temp AS t2
WHERE t2.id = t1.id
ORDER BY t2.date DESC
) AS t2
DROP TABLE #temp
Running this, you can see that this doesn't limit the results to the TOP 1... Am I doing something wrong here?
EDIT:
It seems my last example provided a bit of confusion. Here is an example showing what I want to do:
CREATE TABLE #temp (id INT, date DATETIME)
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (2, GETDATE())
SELECT *
FROM #temp AS t1
CROSS APPLY
(
SELECT TOP 2 *
FROM #temp AS t2
WHERE t2.id = t1.id
ORDER BY t2.date DESC
) AS t2
DROP TABLE #temp
This outputs:
1 2009-08-26 09:05:56.570 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.570 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
2 2009-08-26 09:05:56.583 2 2009-08-26 09:05:56.583
If I use distinct:
SELECT DISTINCT t1.id
FROM #temp AS t1
CROSS APPLY
(
SELECT TOP 2 *
FROM #temp AS t2
WHERE t2.id = t1.id
ORDER BY t2.date DESC
) AS t2
I get
1
2
I need
1
1
2
Does anyone know if this is possible?
EDIT:
The following code will do this
WITH RowTable AS
(
SELECT
id, date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS RowNum
FROM #temp
)
SELECT *
FROM RowTable
WHERE RowNum <= 2;
I posted in the comments, but there is no code formatting, so it doesn't look very nice.
Yes, there are several differet good ways to do this in 2005 and 2008. The one most similar to what you are already trying is with CROSS APPLY:
SELECT T2.*
FROM (
SELECT DISTINCT ID FROM table
) AS t1
CROSS APPLY (
SELECT TOP 10 *
FROM table AS t2
WHERE t2.id = t1.id
ORDER BY date DESC
) AS t2
ORDER BY T2.id, date DESC
This then returns the ten most recent entries in [table] (or as many as exist, up to 10), for each distinct [id]. Asumming that [id] corresponds to a user, then this should be exactly what you are asking for.
(edit: slight changes because I did not take into account that T1 and T2 were the same tables and thus there will be multiple duplicate t1.IDs matching multiple duplicate T2.ids.)
select userid, foo, row_number() over (partition by userid order by foo) as rownum from table where rownum <= 10
It is possible, however using nested queries will be slower.
The following will also find the results you are looking for:
SELECT TOP 10 *
FROM table as t1
INNER JOIN table as t2
ON t1.id = t2.id
ORDER BY date DESC
I believe this SO question will answer your question. It's not answering exactly the same question, but I think the solution will work for you too.
Here's a trick I use to do this "top-N-per-group" type of query:
SELECT t1.id
FROM table t1 LEFT OUTER JOIN table t2
ON (t1.user_id = t2.user_id AND (t1.date > t2.date
OR t1.date = t2.date AND t1.id > t2.id))
GROUP BY t1.id
HAVING COUNT(*) < 10
ORDER BY t1.user_id, COALESCE(COUNT(*), 0);