T-SQL - Several groupings with computation

T-SQL - Several groupings with computation - tsql

I need assistance on how I can come up with a query wherein the Table 1 and table 2 will be joined to perform calculation. I have 'cursor' in mind but I am having trouble on conceptualizing. Some kickstart will be a huge help.
basically what I need is something like this:
Table 1:
Rep_Date NumID NumValue Score Period
1/10/2015 1 161 4 Q1
1/11/2015 1 167 2 Q1
1/12/2015 1 95 1 Q1
1/01/2016 1 150 1 Q2
1/02/2016 1 100 2 Q2
1/03/2016 1 600 5 Q2
1/10/2015 38 1 1 Q1
1/11/2015 38 1 2 Q1
1/12/2015 38 1 2 Q1
1/01/2016 38 1 1 Q2
1/02/2016 38 1 2 Q2
1/03/2016 38 1 4 Q2
1/10/2015 113 5 3 Q1
1/11/2015 113 2 4 Q1
1/12/2015 113 8 1 Q1
1/01/2016 113 11 4 Q2
1/02/2016 113 1 5 Q2
1/03/2016 113 5 3 Q2
Table 2
NumID CalculationType
1 SUM
38 SUM
113 AVG
Expected Result:
Rep_Date NumID Result Period
1/10/2015 1 7 Q1
1/01/2016 1 8 Q2
1/10/2015 38 5 Q1
1/01/2016 38 7 Q2
1/10/2015 113 2.67 Q1
1/01/2016 113 4 Q2
Join by NumID to get the calculation type.
Use 'Score' field to derive the result.
Group by NumID, Period

This should work:
declare #t1 table (Rep_Date varchar(50), numId int, numValue int, score int, period varchar(50));
insert into #t1 values
('1/10/2015','1','161','4','Q1'),
('1/11/2015','1','167','2','Q1'),
('1/12/2015','1','95','1','Q1'),
('1/01/2016','1','150','1','Q2'),
('1/02/2016','1','100','2','Q2'),
('1/03/2016','1','600','5','Q2'),
('1/10/2015','38','1','1','Q1'),
('1/11/2015','38','1','2','Q1'),
('1/12/2015','38','1','2','Q1'),
('1/01/2016','38','1','1','Q2'),
('1/02/2016','38','1','2','Q2'),
('1/03/2016','38','1','4','Q2'),
('1/10/2015','113','5','3','Q1'),
('1/11/2015','113','2','4','Q1'),
('1/12/2015','113','8','1','Q1'),
('1/01/2016','113','11','4','Q2'),
('1/02/2016','113','1','5','Q2'),
('1/03/2016','113','5','3','Q2');
declare #t2 table (numId int, CalculationType varchar(50));
insert into #t2 values
('1' ,'SUM'),
('38' ,'SUM'),
('113' ,'AVG')
select
min(t.rep_date) Rep_Date
,t.numId
,case when max(t2.CalculationType) = 'SUM' then sum(t.score)
when max(t2.CalculationType) = 'AVG' then round(avg(1.0 * t.score), 2)
else 0
end Result
,t.period
from
#t1 t
join #t2 t2 on t.numId = t2.numId
group by
t.period, t.numId
order by
numId, period
Output

Related

Unpivot in postgres with a column created in the same query

I am trying to unpivot a table with PostgreSQL as described here.
My problem is that I am creating a new column in my query which I want to use in my cross join lateral statement (which results in an SQL error because the original table does not have this column).
ORIGINAL QUESTION:
select
"Name",
case
when "Year"='2020' then "Date"
end as "Baseline"
from "test_table"
EDIT: I am using the example from the referred StackOverflow question:
create table customer_turnover
(
customer_id integer,
q1 integer,
q2 integer,
q3 integer,
q4 integer
);
INSERT INTO customer_turnover VALUES
(1, 100, 210, 203, 304);
INSERT INTO customer_turnover VALUES
(2, 150, 118, 422, 257);
INSERT INTO customer_turnover VALUES
(3, 220, 311, 271, 269);
INSERT INTO customer_turnover VALUES
(3, 320, 211, 171, 269);
select * from customer_turnover;
creates the following output
customer_id q1 q2 q3 q4
1 100 210 203 304
2 150 118 422 257
3 220 311 271 269
3 320 211 171 269
(I used the customer_id 3 twice because this column is not unique)
Essentially, what I would like to do is the following: I would like to calculate a new column qsum:
select customer_id, q1, q2, q3, q4,
q1+q2+q3+q4 as qsum
from customer_turnover
and use this additional column in my unpivoting statement to produce the following output:
customer_id turnover quarter
1 100 Q1
1 210 Q2
1 203 Q3
1 304 Q4
1 817 qsum
2 150 Q1
2 118 Q2
2 422 Q3
2 257 Q4
2 947 qsum
3 220 Q1
3 311 Q2
3 271 Q3
3 269 Q4
3 1071 qsum
3 320 Q1
3 211 Q2
3 171 Q3
3 269 Q4
3 971 qsum
As I do not want to have qsum in my final output, I understand that I cannot use it in my select statement, but even if I would use it like this
select customer_id, t.*, q1, q2, q3, q4,
q1+q2+q3+q4 as qsum
from customer_turnover c
cross join lateral (
values
(c.q1, 'Q1'),
(c.q2, 'Q2'),
(c.q3, 'Q3'),
(c.q4, 'Q4'),
(c.qsum, 'Qsum')
) as t(turnover, quarter)
I receive the following SQL error: ERROR: column c.qsum does not exist
How can I produce my desired output?

Not sure to well understand your issue, maybe a subquery can help :
select s.baseline
from
( select
"Name",
case
when "Year"='2020' then "Date"
end as "Baseline"
from "test_table"
) AS s

Pivot Table in SQL (using Groupby)

I have a table structured as below
Customer_ID Sequence Comment_Code Comment
1 10 0 a
1 11 1 b
1 12 1 c
1 13 1 d
2 20 0 x
2 21 1 y
3 100 0 m
3 101 1 n
3 102 1 o
1 52 0 t
1 53 1 y
1 54 1 u
Sequence number is the unique number in the table
I want the output in SQL as below
Customer_ID Sequence
1 abcd
2 xy
3 mno
1 tyu
Can someone please help me with this. I can provide more details if required.
enter image description here

This looks like a simple gaps/islands problem.
-- Sample Data
DECLARE #table TABLE
(
Customer_ID INT,
[Sequence] INT,
Comment_Code INT,
Comment CHAR(1)
);
INSERT #table
(
Customer_ID,
[Sequence],
Comment_Code,
Comment
)
VALUES (1,10 ,0,'a'),(1,11 ,1,'b'),(1,12 ,1,'c'),(1,13 ,1,'d'),(2,20 ,0,'x'),(2,21 ,1,'y'),
(3,100,0,'m'),(3,101,1,'n'),(3,102,1,'o'),(1,52 ,0,'t'),(1,53 ,1,'y'),(1,54 ,1,'u');
-- Solution
WITH groups AS
(
SELECT
t.Customer_ID,
Grouper = [Sequence] - DENSE_RANK() OVER (ORDER BY [Sequence]),
t.Comment
FROM #table AS t
)
SELECT
g.Customer_ID,
[Sequence] =
(
SELECT g2.Comment+''
FROM groups AS g2
WHERE g.Customer_ID = g2.Customer_ID AND g.Grouper = g2.Grouper
FOR XML PATH('')
)
FROM groups AS g
GROUP BY g.Customer_ID, g.Grouper;
Returns:
Customer_ID Sequence
----------- ----------
1 abcd
1 tyu
2 xy
3 mno

Redshift - Get a value from one column A for each ID in the grouping ID column B based on max value in another column C

I have a sql problem (on Redshift) where I need to get the value from column index for each id in column id based on max value in column final_score and put this value in a new column fav_index. score2 equals to the value of score1 where index n = index n + 1, for example, for id = abc1, index = 0 and score1 = 10 the value of score2 will be the value of score1 where index = 1 and the value of final_score is the difference between score1 and score2.
It's easier if you look at below table score. This table score is a result of a sql query which is shown later below.
id index score1 score2 final_score
abc1 0 10 20 10
abc1 1 20 45 25
abc1 2 45 (null) (null)
abc2 0 5 10 5
abc2 1 10 (null) (null)
abc3 0 50 30 -20
abc3 1 30 (null) (null)
So, the resulting table containing column fav_index should look like this:
id index score1 score2 final_score fav_index
abc1 0 10 20 10 0
abc1 1 20 45 25 1
abc1 2 45 (null) (null) 0
abc2 0 5 10 5 0
abc2 1 10 (null) (null) 0
abc3 0 50 30 -20 0
abc3 1 30 (null) (null) 0
Below is the script to generate table score from table story:
select
m.id,
m.index,
max(m.max) as score1,
fmt.score2,
round(fmt.score2 - max(m.max), 1) as final_score
from
(select
sv.id,
case when sv.story_number % 2 = 0 then cast(sv.story_number / 2 - 1 as int) else cast(floor(sv.story_number/2) as int) end as index,
max(sv.score1)
from
story as sv
group by
sv.id,
index,
sv.score1
order by
sv.id,
index
) as m
left join
(select
sv.id,
case when sv.story_number % 2 = 0 then cast(sv.story_number / 2 - 1 as int) else cast(floor(sv.story_number/2) as int) end as index,
max(score1) as score2
from
story as sv
group by
id,
index
) as fmt
on
m.id = fmt.id
and
m.index = fmt.index - 1
group by
m.id,
m.index,
fmt.score2
Table story is as below:
id story_number score1
abc1 1 10
abc1 2 10
abc1 3 20
abc1 4 20
abc1 5 45
abc1 6 45
The only solution I can think of is to do something like,
select id, max(final_score) from score group by id
and then join it back to the long script above (which was used to generate table score). I really want to avoid writing such a long script to get just 1 extra column of information that I need.
Is there a better way to do this?
Thank you!
Update: answer in mysql is also accepted. thanks!

After spending more hours on this and asking people around, I finally figured out a solution by referring to this window function documentation - PostgreSQL https://www.postgresql.org/docs/9.1/static/tutorial-window.html
I basically added 2 x select statements at the top and 1 x where statement at the very bottom. The where statement is to take care of the rows where final_score = null because otherwise the rank() function will rank them as 1.
My code then becomes:
select
id, index, final_score, rank, case when rank = 1 then index else null end as fav_index
from
(select
id, index, final_score, rank() over (partition by id order by final_score desc)
from
(select
m.id,
m.index,
max(m.max) as score1,
fmt.score2,
round(fmt.score2 - max(m.max), 1) as final_score
from
(select
sv.id,
case when sv.story_number % 2 = 0 then cast(sv.story_number / 2 - 1 as int) else cast(floor(sv.story_number/2) as int) end as index,
max(sv.score1)
from
story as sv
group by
sv.id,
index,
sv.score1
order by
sv.id,
index
) as m
left join
(select
sv.id,
case when sv.story_number % 2 = 0 then cast(sv.story_number / 2 - 1 as int) else cast(floor(sv.story_number/2) as int) end as index,
max(score1) as score2
from
story as sv
group by
id,
index
) as fmt
on
m.id = fmt.id
and
m.index = fmt.index - 1
group by
m.id,
m.index,
fmt.score2)
where
final_score is not null)
And the result is as follows:
id index final_score rank fav_index
abc1 0 10 2 (null)
abc1 1 25 1 1
abc2 0 5 1 0
abc3 0 -20 1 0
Result is slightly different than what I stated in the question, however, the fav_index for each id is identified and this is what I needed really. Hope this might help someone. Cheers

PostgreSQL window function & difference between dates

Suppose I have data formatted in the following way (FYI, total row count is over 30K):
customer_id order_date order_rank
A 2017-02-19 1
A 2017-02-24 2
A 2017-03-31 3
A 2017-07-03 4
A 2017-08-10 5
B 2016-04-24 1
B 2016-04-30 2
C 2016-07-18 1
C 2016-09-01 2
C 2016-09-13 3
I need a 4th column, let's call it days_since_last_order which, in the case where order_rank = 1 then 0 else calculate the number of days since the previous order (with rank n-1).
So, the above would return:
customer_id order_date order_rank days_since_last_order
A 2017-02-19 1 0
A 2017-02-24 2 5
A 2017-03-31 3 35
A 2017-07-03 4 94
A 2017-08-10 5 38
B 2016-04-24 1 0
B 2016-04-30 2 6
C 2016-07-18 1 79
C 2016-09-01 2 45
C 2016-09-13 3 12
Is there an easier way to calculate the above with a window function (or similar) rather than join the entire dataset against itself (eg. on A.order_rank = B.order_rank - 1) and doing the calc?
Thanks!

use the lag window function
SELECT
customer_id
, order_date
, order_rank
, COALESCE(
DATE(order_date)
- DATE(LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
, 0)
FROM <table_name>

Extract Unique Time Slices in Oracle

I use Oracle 10g and I have a table that stores a snapshot of data on a person for a given day. Every night an outside process adds new rows to the table for any person whose had any changes to their core data (stored elsewhere). This allows a query to be written using a date to find out what a person 'looked' like on some past day. A new row is added to the table even if only a single aspect of the person has changed--the implication being that many columns have duplicate values from slice to slice since not every detail changed in each snapshot.
Below is a data sample:
SliceID PersonID StartDt Detail1 Detail2 Detail3 Detail4 ...
1 101 08/20/09 Red Vanilla N 23
2 101 08/31/09 Orange Chocolate N 23
3 101 09/15/09 Yellow Chocolate Y 24
4 101 09/16/09 Green Chocolate N 24
5 102 01/10/09 Blue Lemon N 36
6 102 01/11/09 Indigo Lemon N 36
7 102 02/02/09 Violet Lemon Y 36
8 103 07/07/09 Red Orange N 12
9 104 01/31/09 Orange Orange N 12
10 104 10/20/09 Yellow Orange N 13
I need to write a query that pulls out time slices records where some pertinent bits, not the whole record, have changed. So, referring to the above, if I only want to know the slices in which Detail3 has changed from its previous value, then I would expect to only get rows having SliceID 1, 3 and 4 for PersonID 101 and SliceID 5 and 7 for PersonID 102 and SliceID 8 for PersonID 103 and SliceID 9 for PersonID 104.
I'm thinking I should be able to use some sort of Oracle Hierarchical Query (using CONNECT BY [PRIOR]) to get what I want, but I have not figured out how to write it yet. Perhaps YOU can help.
Thanks you for your time and consideration.

Here is my take on the LAG() solution, which is basically the same as that of egorius, but I show my workings ;)
SQL> select * from
2 (
3 select sliceid
4 , personid
5 , startdt
6 , detail3 as new_detail3
7 , lag(detail3) over (partition by personid
8 order by startdt) prev_detail3
9 from some_table
10 )
11 where prev_detail3 is null
12 or ( prev_detail3 != new_detail3 )
13 /
SLICEID PERSONID STARTDT N P
---------- ---------- --------- - -
1 101 20-AUG-09 N
3 101 15-SEP-09 Y N
4 101 16-SEP-09 N Y
5 102 10-JAN-09 N
7 102 02-FEB-09 Y N
8 103 07-JUL-09 N
9 104 31-JAN-09 N
7 rows selected.
SQL>
The point about this solution is that it hauls in results for 103 and 104, who don't have slice records where detail3 has changed. If that is a problem we can apply an additional filtration, to return only rows with changes:
SQL> with subq as (
2 select t.*
3 , row_number () over (partition by personid
4 order by sliceid ) rn
5 from
6 (
7 select sliceid
8 , personid
9 , startdt
10 , detail3 as new_detail3
11 , lag(detail3) over (partition by personid
12 order by startdt) prev_detail3
13 from some_table
14 ) t
15 where t.prev_detail3 is null
16 or ( t.prev_detail3 != t.new_detail3 )
17 )
18 select sliceid
19 , personid
20 , startdt
21 , new_detail3
22 , prev_detail3
23 from subq sq
24 where exists ( select null from subq x
25 where x.personid = sq.personid
26 and x.rn > 1 )
27 order by sliceid
28 /
SLICEID PERSONID STARTDT N P
---------- ---------- --------- - -
1 101 20-AUG-09 N
3 101 15-SEP-09 Y N
4 101 16-SEP-09 N Y
5 102 10-JAN-09 N
7 102 02-FEB-09 Y N
SQL>
edit
As egorius points out in the comments, the OP does want hits for all users, even if they haven't changed, so the first version of the query is the correct solution.

In addition to OMG Ponies' answer: if you need to query slices for all persons, you'll need partition by:
SELECT s.sliceid
, s.personid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (
PARTITION BY t.personid ORDER BY t.startdt
) prev_val
FROM t) s
WHERE (s.prev_val IS NULL OR s.prev_val != s.detail3)

I think you'll have better luck with the LAG function:
SELECT s.sliceid
FROM (SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) 'prev_val'
FROM TABLE t) s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)
Subquery Factoring alternative:
WITH slices AS (
SELECT t.sliceid,
t.personid,
t.detail3,
LAG(t.detail3) OVER (PARTITION BY t.personid ORDER BY t.startdt) 'prev_val'
FROM TABLE t)
SELECT s.sliceid
FROM slices s
WHERE s.personid = 101
AND (s.prev_val IS NULL OR s.prev_val != s.detail3)

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse