I would like to know how to apply a different WHERE condition based on a CASE or IF. I'd prefer a CASE, because the rest of the statement is complex and I don't want that complexity duplicated in two places with only a minor difference. However, I know a CASE expression can only return a value, not a condition. I've replicated a simple version of my issue below.
Essentially, I have three tables. The first contains the master information (MasterTable). The second holds records in a one-to-many relationship with the master table (Table1). The third (Table2) is a list of selectors indicating which of the records in Table1 are to be used in this instance. I want the most recent record of Table2 to drive what is selected from Table1, with precedence given to SubID over OrderNum.
MasterTable | MasterID, OtherInfo
Table1 | T1UniqueId, MasterID, SubID, Text, OrderNum
Table2 | T2UniqueId, MasterID, SubID, OrderNum, Date
SELECT MasterID, OtherInfo, SubID
FROM MasterTable
OUTER APPLY(
SELECT TOP 1 SubID FROM Table1
WHERE Table1.MasterID=MasterTable.MasterID
CASE
WHEN
(
SELECT TOP 1 SubID FROM Table2
WHERE Table2.MasterID=MasterTable.MasterID
ORDER BY Date DESC
) Is NULL
THEN Table1.OrderNum=
(
SELECT TOP 1 OrderNum
FROM Table2
WHERE Table2.MasterId=MasterTable.MasterId
ORDER BY Date DESC
)
ELSE Table1.SubId=
(
SELECT TOP 1 SubId
FROM Table2
WHERE Table2.MasterId=MasterTable.MasterId
ORDER BY Date DESC
)
END
) SubData
One quick rewrite of this would result in the following:
IF ((SELECT TOP 1 SubID FROM Table2 WHERE Table2.MasterID=MasterTable.MasterID ORDER BY Date DESC) IS NULL)
BEGIN
SELECT
MasterID, OtherInfo, SubID
FROM MasterTable
OUTER APPLY(
SELECT TOP 1 SubID FROM Table1
WHERE
Table1.MasterID=MasterTable.MasterID
AND Table1.OrderNum =
(
SELECT TOP 1 OrderNum
FROM Table2
WHERE Table2.MasterId=MasterTable.MasterId
ORDER BY Date DESC
)
) SubData
END
ELSE
BEGIN
SELECT
MasterID, OtherInfo, SubID
FROM MasterTable
OUTER APPLY(
SELECT TOP 1 SubID FROM Table1
WHERE
Table1.MasterID=MasterTable.MasterID
AND Table1.SubId=
(
SELECT TOP 1 SubId
FROM Table2
WHERE Table2.MasterId=MasterTable.MasterId
ORDER BY Date DESC
)
) SubData
END
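But as you noted, that is ugly because the complexity now lives in two places. It also would not run as written: IF is evaluated once per batch, not once per row, and MasterTable is not in scope inside the IF condition.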
I guess you could also formulate it this way (untested, but this should keep your complex logic in one place):
SELECT
MasterID, OtherInfo, SubID
FROM MasterTable
OUTER APPLY(
SELECT TOP 1 SubID FROM Table1
WHERE Table1.MasterID=MasterTable.MasterID
AND
(
(
(
SELECT
TOP 1 SubID
FROM Table2
WHERE Table2.MasterID=MasterTable.MasterID
ORDER BY Date DESC
) IS NULL
AND
Table1.OrderNum =
(
SELECT TOP 1 OrderNum
FROM Table2
WHERE Table2.MasterId=MasterTable.MasterId
ORDER BY Date DESC
)
)
OR
(
Table1.SubId =
(
SELECT
TOP 1 SubId
FROM Table2
WHERE Table2.MasterId=MasterTable.MasterId
ORDER BY Date DESC
)
)
)
) SubData
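A CASE expression can still drive the filter if each branch returns a value (1 or 0) that is then compared, which keeps the logic in one place. The sketch below is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than SQL Server, the sample rows are invented, and the OUTER APPLY / TOP 1 lookups are replaced by LEFT JOINs with a MAX("Date") subquery. The CASE ... = 1 predicate itself is ordinary SQL.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MasterTable (MasterID INTEGER, OtherInfo TEXT);
CREATE TABLE Table1 (T1UniqueId INTEGER, MasterID INTEGER, SubID INTEGER,
                     "Text" TEXT, OrderNum INTEGER);
CREATE TABLE Table2 (T2UniqueId INTEGER, MasterID INTEGER, SubID INTEGER,
                     OrderNum INTEGER, "Date" TEXT);
-- Hypothetical sample rows.
INSERT INTO MasterTable VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO Table1 VALUES (1, 1, 10, 'x', 1), (2, 1, 11, 'y', 2),
                          (3, 2, 20, 'p', 1), (4, 2, 21, 'q', 2);
INSERT INTO Table2 VALUES (1, 1, 10, 2, '2015-01-01'),
                          (2, 1, 11, 1, '2015-02-01'),
                          (3, 2, NULL, 2, '2015-03-01');
""")

rows = conn.execute("""
SELECT m.MasterID, m.OtherInfo, t1.SubID
FROM MasterTable m
LEFT JOIN Table2 l                      -- most recent selector row per master
  ON l.MasterID = m.MasterID
 AND l."Date" = (SELECT MAX(x."Date") FROM Table2 x
                 WHERE x.MasterID = m.MasterID)
LEFT JOIN Table1 t1
  ON t1.MasterID = m.MasterID
 AND 1 = CASE                           -- each branch yields a value, not a condition
           WHEN l.SubID IS NULL AND t1.OrderNum = l.OrderNum THEN 1
           WHEN t1.SubID = l.SubID THEN 1
           ELSE 0
         END
ORDER BY m.MasterID
""").fetchall()

for row in rows:
    print(row)
```

Master 1 is matched on SubID because its latest selector has one, master 2 falls back to OrderNum because its latest SubID is NULL, and master 3 has no selector rows, so its SubID comes back as NULL.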
If the SubID and OrderNum values in Table1 and Table2 are the same, you can use a simple query with a nested SELECT statement:
select m.MasterID, m.OtherInfo, (
select top 1 coalesce(t2.SubID, t2.OrderNum) from Table2 t2
where t2.MasterID = m.MasterID order by date desc
) as SubID
from MasterTable m;
I'm reviewing some of our Redshift queries and found cases with multiple levels of nested select like the one below:
LEFT JOIN
(
SELECT *
FROM (
SELECT
id,
created_at,
min(created_at) OVER (PARTITION BY id, slug) AS transition_date
FROM table
WHERE status = 'cancelled'
GROUP BY id, Y, Z, created_at
)
WHERE created_at = transition_date
) t1 ON b.id = t1.id
If this were MySQL, I would have done something like this to remove one level of nested SELECT:
LEFT JOIN
(
SELECT
id,
created_at,
@tdate := min(created_at) OVER (PARTITION BY id, slug) AS transition_date
FROM table
WHERE status = 'cancelled' and @tdate = bul.created_at
GROUP BY id, Y, Z, created_at
) t1 ON b.id = t1.id
Is it possible to do something similar in Redshift?
Update:
I forgot to include GROUP BY in the nested SELECT, which may affect the answer.
You can move the condition for the transition_date into the JOIN condition:
LEFT JOIN
(
SELECT
id,
created_at,
min(created_at) OVER (PARTITION BY id, slug) AS transition_date
FROM table
WHERE status = 'cancelled'
) t1 ON b.id = t1.id AND t1.created_at = t1.transition_date
I need to write a T-SQL group by query for a table with multiple dates and seq columns:
DROP TABLE #temp
CREATE TABLE #temp(
id char(1),
dt DateTime,
seq int)
Insert into #temp values('A','2015-03-31 10:00:00',1)
Insert into #temp values('A','2015-08-31 10:00:00',2)
Insert into #temp values('A','2015-03-31 10:00:00',5)
Insert into #temp values('B','2015-09-01 10:00:00',1)
Insert into #temp values('B','2015-09-01 10:00:00',2)
I want the results to contain only the items A and B, each with its latest date and the corresponding seq number, like:
id MaxDate CorrespondentSeq
A 2015-08-31 10:00:00.000 2
B 2015-09-01 10:00:00.000 2
I am trying this (obviously wrong!) query:
select id, max(dt) as MaxDate, max(seq) as CorrespondentSeq
from #temp
group by id
which returns:
id MaxDate CorrespondentSeq
A 2015-08-31 10:00:00.000 5 <-- 5 is wrong
B 2015-09-01 10:00:00.000 2
How can I achieve that?
EDIT
The dt datetime column has duplicated values (exactly the same date!)
I am using SQL Server 2005
You can use a ranking subselect to get only the highest ranked entries for an id:
select id, dt, seq
from (
select id, dt, seq, rank() over (partition by id order by dt desc, seq desc) as r
from #temp
) ranked
where r=1;
SELECT ID, DT, SEQ
FROM (
SELECT ID, DT, SEQ, Row_Number()
OVER (PARTITION BY id ORDER BY dt DESC, seq DESC) AS row_number
FROM temp
) cte
WHERE row_number = 1;
Demo : http://www.sqlfiddle.com/#!3/3e3d5/5
Through trial and error I may have found a solution, but I'm not completely sure it is correct:
select A.id, B.dt, max(B.seq)
from (select id, max(dt) as maxDt
from #temp
group by id) as A
inner join #temp as B on A.id = B.id AND A.maxDt = B.dt
group by A.id, B.dt
Select id, dt, seq
From #temp t
where dt = (Select Max(dt) from #temp
Where id = t.Id)
If there are duplicate rows, you also need to specify how the query processor should decide which of the duplicates to return. Say you want the lowest value of seq; then you could write:
Select id, dt, seq
From #temp t
where dt = (Select Max(dt) from #temp
Where id = t.Id)
and seq = (Select Min(Seq) from #temp
where id = t.Id
and dt = t.dt)
I would like to know how to write a Postgres subquery so that the following example table produces the output I need.
id parent_id postdate
1 -1 2015-03-10
2 1 2015-03-11 (child level 1)
3 1 2015-03-12 (child level 1)
4 3 2015-03-13 (child level 2)
5 -1 2015-03-14
6 -1 2015-03-15
7 6 2015-03-16 (child level 1)
If I want to sort all the root ids by the postdate of their child level 1 rows, together with a count of the rows that belong to each root, the output would be something like this:
id count date
6 2 2015-03-15
1 4 2015-03-10
5 1 2015-03-14
The output is sorted by the postdate of each root's children. The 'date' in the output is the root's own postdate. Even though id #5 has a more recent postdate, root id #6's child (id #7) has the most recent postdate of all, and the sort is driven by the children's postdate. id #5 doesn't have any children, so it is placed at the end, sorted by its own date. The 'count' is the number of children (child level 1) and grandchildren (child level 2) plus the root itself. For instance, ids #2, #3, and #4 all belong to id #1, so for id #1 the count would be 4.
My current subquery thus far:
SELECT p1.id,count(p1.id),p1.postdate
FROM mytable p1
LEFT JOIN mytable c1 ON c1.parent_id = p1.id AND p1.parent_id = -1
LEFT JOIN mytable c2 ON c2.parent_id = c1.id AND p1.parent_id = -1
GROUP BY p1.id,c1.postdate,p1.postdate
ORDER by c1.postdate DESC,p1.postdate DESC
create table mytable ( id serial primary key, parent_id int references mytable, postdate date );
create index mytable_parent_id_idx on mytable (parent_id);
insert into mytable (id, parent_id, postdate) values (1, null, '2015-03-10');
insert into mytable (id, parent_id, postdate) values (2, 1, '2015-03-11');
insert into mytable (id, parent_id, postdate) values (3, 1, '2015-03-12');
insert into mytable (id, parent_id, postdate) values (4, 3, '2015-03-13');
insert into mytable (id, parent_id, postdate) values (5, null, '2015-03-14');
insert into mytable (id, parent_id, postdate) values (6, null, '2015-03-15');
insert into mytable (id, parent_id, postdate) values (7, 6, '2015-03-16');
with recursive recu as (
select id as parent, id as root, null::date as child_postdate
from mytable
where parent_id is null
union all
select r.parent, mytable.id, mytable.postdate
from recu r
join mytable
on parent_id = r.root
)
select m.id, c.cnt, m.postdate, c.max_child_date
from mytable m
join ( select parent, count(*) as cnt, max(child_postdate) as max_child_date
from recu
group by parent
) c on c.parent = m.id
order by c.max_child_date desc nulls last, m.postdate desc;
You'll need a recursive query to count the elements in the subtrees:
WITH RECURSIVE opa AS (
SELECT id AS par
, id AS moi
FROM the_tree
WHERE parent_id IS NULL
UNION ALL
SELECT o.par AS par
, t.id AS moi
FROM opa o
JOIN the_tree t ON t.parent_id = o.moi
)
SELECT t.id
, c.cnt
, t.postdate
FROM the_tree t
JOIN ( SELECT par, COUNT(*) AS cnt
FROM opa o
GROUP BY par
) c ON c.par = t.id
ORDER BY t.id
;
UPDATE (it appears the OP also wants the maxdate per tree)
-- The same, but also select the postdate
-- --------------------------------------
WITH RECURSIVE opa AS (
SELECT id AS par
, id AS moi
, postdate AS postdate
FROM the_tree
WHERE parent_id IS NULL
UNION ALL
SELECT o.par AS par
, t.id AS moi
-- , GREATEST(o.postdate,t.postdate) AS postdate
, t.postdate AS postdate
FROM opa o
JOIN the_tree t ON t.parent_id = o.moi
)
SELECT t.id
, c.cnt
, t.postdate
, c.maxdate
FROM the_tree t
JOIN ( SELECT par, COUNT(*) AS cnt
, MAX(o.postdate) AS maxdate -- and obtain the max()
FROM opa o
GROUP BY par
) c ON c.par = t.id
ORDER BY c.maxdate, t.id
;
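The recursive approach can be tried end to end on the sample rows from the question. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than PostgreSQL, roots are stored with a NULL parent_id (as in the answers, not -1 as in the question), and "nulls last" is emulated with an IS NULL sort key so the query does not depend on NULLS LAST support.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_tree (id INTEGER PRIMARY KEY, parent_id INTEGER, postdate TEXT);
INSERT INTO the_tree VALUES
  (1, NULL, '2015-03-10'), (2, 1, '2015-03-11'), (3, 1, '2015-03-12'),
  (4, 3, '2015-03-13'), (5, NULL, '2015-03-14'), (6, NULL, '2015-03-15'),
  (7, 6, '2015-03-16');
""")

result = conn.execute("""
WITH RECURSIVE opa(par, moi, child_date) AS (
    -- anchor: every root is its own first member and has no child date yet
    SELECT id, id, NULL FROM the_tree WHERE parent_id IS NULL
    UNION ALL
    -- step: attach each row to the root of the subtree it belongs to
    SELECT o.par, t.id, t.postdate
    FROM opa o
    JOIN the_tree t ON t.parent_id = o.moi
)
SELECT t.id, c.cnt, t.postdate
FROM the_tree t
JOIN (SELECT par, COUNT(*) AS cnt, MAX(child_date) AS max_child
      FROM opa
      GROUP BY par) c ON c.par = t.id
ORDER BY c.max_child IS NULL,      -- roots without children go last
         c.max_child DESC,
         t.postdate DESC
""").fetchall()

for row in result:
    print(row)
```

The rows come back in the order 6, 1, 5 with counts 2, 4, and 1, which matches the output requested in the question.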
After looking at everyone's code, I created the subquery I needed. I can use PHP to vary the 'case when' code depending on the user's sort selection. For instance, the code below will sort the root nodes based on child level 1's postdate.
with recursive cte as (
select id as parent, id as root, null::timestamp as child_postdate,0 as depth
from mytable
where parent_id = -1
union all
select r.parent, mytable.id, mytable.postdate,depth+1
from cte r
join mytable
on parent_id = r.root
)
select m.id, c.cnt, m.postdate
from ssf.dtb_021 m
join ( select parent, count(*) as cnt, max(child_postdate) as max_child_date,depth
from cte
group by parent,depth
) c on c.parent = m.id
order by
case
when depth=2 then 1
when depth=1 then 2
else 0
end DESC,
c.max_child_date desc nulls last, m.postdate desc;
select
p.id,
(1+c.n) as parent_post_plus_number_of_subposts,
p.postdate
from
table as p
inner join
(
select
parent_id, count(*) as n, max(postdate) as _postdate
from table
group by parent_id
) as c
on p.id = c.parent_id
where p.parent_id = -1
order by c._postdate desc
I made two queries that I thought should have the same result:
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
WHERE table2.value = '1')
AS result1 ORDER BY id1)
AS result2;
SELECT COUNT(*) FROM (
SELECT DISTINCT ON (id1) id1, value
FROM (
SELECT table1.id1, table2.value
FROM table1
JOIN table2 ON table1.id1=table2.id
)
AS result1 ORDER BY id1)
AS result2
WHERE value = '1';
The only difference being that one had the WHERE clause inside SELECT DISTINCT ON, and the other outside that, but inside SELECT COUNT. But the results were not the same. I don't understand why the position of the WHERE clause should make a difference in this case. Can anyone explain? Or is there a better way to phrase this question?
Here's a good way to look at this:
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a;
SELECT DISTINCT ON (id) id, value
FROM (select 1 as id, 1 as value
union
select 1 as id, 2 as value) a
WHERE value = 2;
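The difference comes from when DISTINCT ON runs relative to the WHERE clause. DISTINCT ON keeps a single row per id, and it is applied after the WHERE clause of the same query level. With the filter inside, rows are filtered first, so every id that has at least one row with value = '1' is counted. With the filter outside, one row per id is kept first (and with ORDER BY id1 alone, which one is not defined), so an id is counted only if the row that happened to be kept has value = '1'. This behavior is by design.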
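The effect of the ordering can be seen without a database. The following is a plain-Python simulation, not PostgreSQL itself: the (id1, value) pairs are invented, and "keep the first row seen per id" stands in for whichever row DISTINCT ON happens to keep.

```python
# Simulation only: plain Python standing in for PostgreSQL's DISTINCT ON.
# The rows are hypothetical (id1, value) pairs as produced by the join.
joined = [(1, "2"), (1, "1"), (2, "1"), (3, "2")]


def distinct_on_id(rows):
    """Keep one row per id1; the first one seen plays the arbitrary pick."""
    kept = {}
    for id1, value in rows:
        kept.setdefault(id1, (id1, value))
    return [kept[k] for k in sorted(kept)]


# Query 1: WHERE value = '1' runs first, then DISTINCT ON.
filter_then_distinct = distinct_on_id([r for r in joined if r[1] == "1"])

# Query 2: DISTINCT ON runs first, then WHERE value = '1'.
distinct_then_filter = [r for r in distinct_on_id(joined) if r[1] == "1"]

print(len(filter_then_distinct))  # 2
print(len(distinct_then_filter))  # 1
```

Id 1 has a row with value '1', but once deduplication has kept its other row, the outer filter can no longer see it, so the two counts differ.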
I'm struggling with a problem in T-SQL: I need to get the top 10 results for each user from a table that might contain more than 10 results per user.
My natural (and procedurally minded) approach is "for each user in table T select the top 10 results ordered by date".
Each time I try to formulate the question in my mind in a set based approach, I keep running into the term "foreach".
Is it possible to do something like this:
SELECT *
FROM table AS t1
INNER JOIN (
SELECT TOP 10 *
FROM table AS t2
WHERE t2.id = t1.id
ORDER BY date DESC
)
Or even
SELECT ( SELECT TOP 10 *
FROM table AS t2
WHERE t2.id = t1.id
ORDER BY date )
FROM table AS t1
Or is there another solution to this using temp tables that I should think about?
EDIT:
Just to be perfectly clear: I need the top 10 results for each user in the table, i.e. up to 10 * N rows where N = number of users.
EDIT:
In response to a suggestion made by RBarryYoung, I'm having an issue, which is best demonstrated with code:
CREATE TABLE #temp (id INT, date DATETIME)
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
SELECT *
FROM #temp AS t1
CROSS APPLY (
SELECT TOP 1 *
FROM #temp AS t2
WHERE t2.id = t1.id
ORDER BY t2.date DESC
) AS t2
DROP TABLE #temp
Running this, you can see that this doesn't limit the results to the TOP 1... Am I doing something wrong here?
EDIT:
It seems my last example provided a bit of confusion. Here is an example showing what I want to do:
CREATE TABLE #temp (id INT, date DATETIME)
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (2, GETDATE())
SELECT *
FROM #temp AS t1
CROSS APPLY
(
SELECT TOP 2 *
FROM #temp AS t2
WHERE t2.id = t1.id
ORDER BY t2.date DESC
) AS t2
DROP TABLE #temp
This outputs:
1 2009-08-26 09:05:56.570 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.570 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
2 2009-08-26 09:05:56.583 2 2009-08-26 09:05:56.583
If I use distinct:
SELECT DISTINCT t1.id
FROM #temp AS t1
CROSS APPLY
(
SELECT TOP 2 *
FROM #temp AS t2
WHERE t2.id = t1.id
ORDER BY t2.date DESC
) AS t2
I get
1
2
I need
1
1
2
Does anyone know if this is possible?
EDIT:
The following code will do this
WITH RowTable AS
(
SELECT
id, date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS RowNum
FROM #temp
)
SELECT *
FROM RowTable
WHERE RowNum <= 2;
I posted in the comments, but there is no code formatting, so it doesn't look very nice.
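For anyone who wants to try the ROW_NUMBER approach outside SQL Server, here is a small sketch under assumptions: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), the table is named events and the column dt, both hypothetical stand-ins for #temp and date, and the timestamps are copied from the output shown above.

```python
import sqlite3

# SQLite stands in for SQL Server; events/dt are hypothetical names for #temp/date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER, dt TEXT);
INSERT INTO events VALUES
  (1, '2009-08-26 09:05:56.570'),
  (1, '2009-08-26 09:05:56.583'),
  (1, '2009-08-26 09:05:56.583'),
  (2, '2009-08-26 09:05:56.583');
""")

top_two = conn.execute("""
SELECT id, dt
FROM (
    SELECT id, dt,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS rn
    FROM events
) AS ranked
WHERE rn <= 2            -- at most two rows per id, even when dates tie
ORDER BY id, dt DESC
""").fetchall()

print([row[0] for row in top_two])  # [1, 1, 2]
```

Because ROW_NUMBER assigns distinct numbers even to rows with identical dates, id 1 yields exactly two rows and id 2 yields its single row.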
Yes, there are several different good ways to do this in SQL Server 2005 and 2008. The one most similar to what you are already trying is with CROSS APPLY:
SELECT T2.*
FROM (
SELECT DISTINCT ID FROM table
) AS t1
CROSS APPLY (
SELECT TOP 10 *
FROM table AS t2
WHERE t2.id = t1.id
ORDER BY date DESC
) AS t2
ORDER BY T2.id, date DESC
This then returns the ten most recent entries in [table] (or as many as exist, up to 10), for each distinct [id]. Assuming that [id] corresponds to a user, this should be exactly what you are asking for.
(Edit: slight changes because I did not take into account that T1 and T2 are the same table, so there would be multiple duplicate t1.ids matching multiple duplicate t2.ids.)
select userid, foo
from (select userid, foo, row_number() over (partition by userid order by foo) as rownum from table) ranked
where rownum <= 10
It is possible, however using nested queries will be slower.
The following will also find the results you are looking for:
SELECT TOP 10 *
FROM table as t1
INNER JOIN table as t2
ON t1.id = t2.id
ORDER BY date DESC
I believe this SO question will answer your question. It's not answering exactly the same question, but I think the solution will work for you too.
Here's a trick I use to do this "top-N-per-group" type of query:
SELECT t1.id
FROM table t1 LEFT OUTER JOIN table t2
ON (t1.user_id = t2.user_id AND (t1.date > t2.date
OR t1.date = t2.date AND t1.id > t2.id))
GROUP BY t1.id
HAVING COUNT(*) < 10
ORDER BY t1.user_id, COALESCE(COUNT(*), 0);