I have a table structured as below
Customer_ID Sequence Comment_Code Comment
1 10 0 a
1 11 1 b
1 12 1 c
1 13 1 d
2 20 0 x
2 21 1 y
3 100 0 m
3 101 1 n
3 102 1 o
1 52 0 t
1 53 1 y
1 54 1 u
Sequence number is the unique number in the table
I want the output in SQL as below
Customer_ID Sequence
1 abcd
2 xy
3 mno
1 tyu
Can someone please help me with this. I can provide more details if required.
enter image description here
This looks like a simple gaps/islands problem.
-- Sample Data
DECLARE #table TABLE
(
Customer_ID INT,
[Sequence] INT,
Comment_Code INT,
Comment CHAR(1)
);
INSERT #table
(
Customer_ID,
[Sequence],
Comment_Code,
Comment
)
VALUES (1,10 ,0,'a'),(1,11 ,1,'b'),(1,12 ,1,'c'),(1,13 ,1,'d'),(2,20 ,0,'x'),(2,21 ,1,'y'),
(3,100,0,'m'),(3,101,1,'n'),(3,102,1,'o'),(1,52 ,0,'t'),(1,53 ,1,'y'),(1,54 ,1,'u');
-- Solution
WITH groups AS
(
SELECT
t.Customer_ID,
Grouper = [Sequence] - DENSE_RANK() OVER (ORDER BY [Sequence]),
t.Comment
FROM #table AS t
)
SELECT
g.Customer_ID,
[Sequence] =
(
SELECT g2.Comment+''
FROM groups AS g2
WHERE g.Customer_ID = g2.Customer_ID AND g.Grouper = g2.Grouper
FOR XML PATH('')
)
FROM groups AS g
GROUP BY g.Customer_ID, g.Grouper;
Returns:
Customer_ID Sequence
----------- ----------
1 abcd
1 tyu
2 xy
3 mno
Related
I have a sql problem (on Redshift) where I need to get the value from column index for each id in column id based on max value in column final_score and put this value in a new column fav_index. score2 equals to the value of score1 where index n = index n + 1, for example, for id = abc1, index = 0 and score1 = 10 the value of score2 will be the value of score1 where index = 1 and the value of final_score is the difference between score1 and score2.
It's easier if you look at below table score. This table score is a result of a sql query which is shown later below.
id index score1 score2 final_score
abc1 0 10 20 10
abc1 1 20 45 25
abc1 2 45 (null) (null)
abc2 0 5 10 5
abc2 1 10 (null) (null)
abc3 0 50 30 -20
abc3 1 30 (null) (null)
So, the resulting table containing column fav_index should look like this:
id index score1 score2 final_score fav_index
abc1 0 10 20 10 0
abc1 1 20 45 25 1
abc1 2 45 (null) (null) 0
abc2 0 5 10 5 0
abc2 1 10 (null) (null) 0
abc3 0 50 30 -20 0
abc3 1 30 (null) (null) 0
Below is the script to generate table score from table story:
select
m.id,
m.index,
max(m.max) as score1,
fmt.score2,
round(fmt.score2 - max(m.max), 1) as final_score
from
(select
sv.id,
case when sv.story_number % 2 = 0 then cast(sv.story_number / 2 - 1 as int) else cast(floor(sv.story_number/2) as int) end as index,
max(sv.score1)
from
story as sv
group by
sv.id,
index,
sv.score1
order by
sv.id,
index
) as m
left join
(select
sv.id,
case when sv.story_number % 2 = 0 then cast(sv.story_number / 2 - 1 as int) else cast(floor(sv.story_number/2) as int) end as index,
max(score1) as score2
from
story as sv
group by
id,
index
) as fmt
on
m.id = fmt.id
and
m.index = fmt.index - 1
group by
m.id,
m.index,
fmt.score2
Table story is as below:
id story_number score1
abc1 1 10
abc1 2 10
abc1 3 20
abc1 4 20
abc1 5 45
abc1 6 45
The only solution I can think of is to do something like,
select id, max(final_score) from score group by id
and then join it back to the long script above (which was used to generate table score). I really want to avoid writing such a long script to get just 1 extra column of information that I need.
Is there a better way to do this?
Thank you!
Update: answer in mysql is also accepted. thanks!
After spending more hours on this and asking people around, I finally figured out a solution by referring to this window function documentation - PostgreSQL https://www.postgresql.org/docs/9.1/static/tutorial-window.html
I basically added 2 x select statements at the top and 1 x where statement at the very bottom. The where statement is to take care of the rows where final_score = null because otherwise the rank() function will rank them as 1.
My code then becomes:
select
id, index, final_score, rank, case when rank = 1 then index else null end as fav_index
from
(select
id, index, final_score, rank() over (partition by id order by final_score desc)
from
(select
m.id,
m.index,
max(m.max) as score1,
fmt.score2,
round(fmt.score2 - max(m.max), 1) as final_score
from
(select
sv.id,
case when sv.story_number % 2 = 0 then cast(sv.story_number / 2 - 1 as int) else cast(floor(sv.story_number/2) as int) end as index,
max(sv.score1)
from
story as sv
group by
sv.id,
index,
sv.score1
order by
sv.id,
index
) as m
left join
(select
sv.id,
case when sv.story_number % 2 = 0 then cast(sv.story_number / 2 - 1 as int) else cast(floor(sv.story_number/2) as int) end as index,
max(score1) as score2
from
story as sv
group by
id,
index
) as fmt
on
m.id = fmt.id
and
m.index = fmt.index - 1
group by
m.id,
m.index,
fmt.score2)
where
final_score is not null)
And the result is as follows:
id index final_score rank fav_index
abc1 0 10 2 (null)
abc1 1 25 1 1
abc2 0 5 1 0
abc3 0 -20 1 0
Result is slightly different than what I stated in the question, however, the fav_index for each id is identified and this is what I needed really. Hope this might help someone. Cheers
Suppose I have data formatted in the following way (FYI, total row count is over 30K):
customer_id order_date order_rank
A 2017-02-19 1
A 2017-02-24 2
A 2017-03-31 3
A 2017-07-03 4
A 2017-08-10 5
B 2016-04-24 1
B 2016-04-30 2
C 2016-07-18 1
C 2016-09-01 2
C 2016-09-13 3
I need a 4th column, let's call it days_since_last_order which, in the case where order_rank = 1 then 0 else calculate the number of days since the previous order (with rank n-1).
So, the above would return:
customer_id order_date order_rank days_since_last_order
A 2017-02-19 1 0
A 2017-02-24 2 5
A 2017-03-31 3 35
A 2017-07-03 4 94
A 2017-08-10 5 38
B 2016-04-24 1 0
B 2016-04-30 2 6
C 2016-07-18 1 79
C 2016-09-01 2 45
C 2016-09-13 3 12
Is there an easier way to calculate the above with a window function (or similar) rather than join the entire dataset against itself (eg. on A.order_rank = B.order_rank - 1) and doing the calc?
Thanks!
use the lag window function
SELECT
customer_id
, order_date
, order_rank
, COALESCE(
DATE(order_date)
- DATE(LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
, 0)
FROM <table_name>
Is it possible to refer to the current row in a window partition? I want to do something like the following:
SELECT min(ABS(variable - CURRENT.variable)) over (order by criterion RANGE UNBOUNDED PRECEDING)
That is, i want to find in the given partition the variable which is closest to the current value. Is is possible to do something like that?
As an example, from:
criterion | variable
1 2
2 4
3 2
4 7
5 6
We would obtain:
null
2
0
3
1
Thanks
As far as I know, this cannot be done with window functions.
But it can be done with a self join:
SELECT a.id,
a.variable,
min(abs(a.variable - b.variable))
FROM mydata a
LEFT JOIN mydata b
ON (b.criterion < a.criterion)
GROUP BY a.id, a.variable
ORDER BY a.id;
If I understand correctly:
with t (v) as (values (-5),(-2),(0),(1),(3),(10))
select v,
least(
v - lag(v) over (order by v),
lead(v) over (order by v) - v
) as closest
from t
;
v | closest
----+---------
-5 | 3
-2 | 2
0 | 1
1 | 1
3 | 2
10 | 7
Hope this could help you (pay attention for performance problems).
I tried this in MSSQL (at bottom you'll find POSTGRESQL version):
CREATE TABLE TX (CRITERION INT, VARIABILE INT);
INSERT INTO TX VALUES (1,2), (2,4),(3,2),(4,7), (5,6);
SELECT CRITERION, MIN_DELTA FROM
(
SELECT TX.CRITERION
, MIN(ABS(B.TX2_VAR - TX.VARIABILE)) OVER (PARTITION BY TX.CRITERION) AS MIN_DELTA
, RANK() OVER (PARTITION BY TX.CRITERION ORDER BY ABS(B.TX2_VAR - TX.VARIABILE) ) AS MIN_RANK
FROM TX
CROSS APPLY (SELECT TX2.CRITERION AS TX2_CRIT, TX2.VARIABILE AS TX2_VAR FROM TX TX2 WHERE TX2.CRITERION < TX.CRITERION) B
) C
WHERE MIN_RANK=1
ORDER BY CRITERION
;
Output:
CRITERION MIN_DELTA
----------- -----------
2 2
3 0
4 3
5 1
POSTGRESQL Version (tested on Rextester http://rextester.com/VMGJ87600):
CREATE TABLE TX (CRITERION INT, VARIABILE INT);
INSERT INTO TX VALUES (1,2), (2,4),(3,2),(4,7), (5,6);
SELECT * FROM TX;
SELECT CRITERION, MIN_DELTA FROM
(
SELECT TX.CRITERION
, MIN(ABS(B.TX2_VAR - TX.VARIABILE)) OVER (PARTITION BY TX.CRITERION) AS MIN_DELTA
, RANK() OVER (PARTITION BY TX.CRITERION ORDER BY ABS(B.TX2_VAR - TX.VARIABILE) ) AS MIN_RANK
FROM TX
LEFT JOIN LATERAL (SELECT TX2.CRITERION AS TX2_CRIT, TX2.VARIABILE AS TX2_VAR FROM TX TX2 WHERE TX2.CRITERION < TX.CRITERION) B ON TRUE
) C
WHERE MIN_RANK=1
ORDER BY CRITERION
;
DROP TABLE TX;
Output:
criterion variabile
1 1 2
2 2 4
3 3 2
4 4 7
5 5 6
criterion min_delta
1 1 NULL
2 2 2
3 3 0
4 4 3
5 5 1
Think of a table like below:
unique_id
a_column
b_column
a_int
b_int
date_created
Let's say data is like:
-unique_id -a_column -b_column -a_int -b_int -date_created
1z23 abc 444 0 1 27.12.2016 18:03:00
2c31 abc 444 0 0 26.12.2016 13:40:00
2e22 qwe 333 0 1 28.12.2016 15:45:00
1b11 qwe 333 1 1 27.12.2016 19:00:00
3a33 rte 333 0 1 15.11.2016 11:00:00
4d44 rte 333 0 1 27.09.2016 18:00:00
6e66 irt 333 0 1 22.12.2016 13:00:00
7q77 aaa 555 1 0 27.12.2016 18:00:00
I want to get the unique_id s where b_int is 1, b_column is 333 and considering a_column, a_int column must always be 0, if there are any records with a_int = 1 even if there are records with a_int = 0 these records must not be shown in the result. Desired result is: " 3a33 , 6e66 " when grouped by a_column and ordered by date_created and got top1 for each unique a_column.
I tried lots of "with ties" and "over(partition by" samples, searched questions, but couldn't manage to do it. This is what I could do:
select unique_id
from the_table
where b_column = '333'
and b_int = 1
and a_column in (select a_column
from the_table
where b_column = '333'
and b_int = 1
group by a_column
having sum(a_int) = 0)
order by date_created desc;
This query returns the result like this " 3a33 ,4d44, 6e66 ". But I don't want "4d44".
You were on the right track with the partitions and window functions. This solution uses ROW_NUMBER to assign a value to the a_column so we can see where there is more than 1. The 1 is the most recent date_created. Then you select from the result set where the row_counter is 1.
;WITH CTE
AS (
SELECT unique_id
, a_column
, ROW_NUMBER() OVER (
PARTITION BY a_column ORDER BY date_created DESC
) AS row_counter --This assigns a 1 to the most recent date_created and partitions by a_column
FROM #test
WHERE a_column IN (
SELECT a_column
FROM #test
WHERE b_column = '333'
AND b_int = 1
GROUP BY a_column
HAVING MAX(a_int) < 1
)
)
SELECT unique_ID
FROM cte
WHERE row_counter = 1
My table has a parent/child relationship, along the lines of parent.id,id. There is also a column that contains a quantity, and another ID representing a grand-parent, like so:
id parent.id qty Org
1 1 1 100
2 1 0 100
3 1 4 100
4 4 1 101
5 4 2 101
6 6 1 102
7 6 0 102
8 6 1 102
What this is supposed to show is ID 1 is the parent, and ID 2 and 3 are children which belongs to ID 1, and ID 1, 2, and 3 all belong to the grandparent 100.
I would like to know if any child or parent has QTY = 0, what are all the other id's associated to that parent, and what are all the other parents associated with that grandparent?
For example, I would want to see a report that shows me this:
Org id parent.id qty
100 1 1 1
100 2 1 0
100 3 1 4
102 6 6 1
102 7 6 0
102 8 6 1
Much appreciate any help you can offer to build a MS SQL 2000 (yeah, I know) query to handle this.
Try this
select * from tablename a
where exists (select 1 from tablename x
where x.parent_id = a.parent_id and qty = 0)
Example:
;with cte as
( select 1 id,1 parent_id, 1 qty, 100 org
union all select 2,1,0,100
union all select 3,1,4,100
union all select 4,4,1,101
union all select 5,4,2,101
union all select 6,6,1,102
union all select 7,6,0,102
union all select 8,6,1,102
)
select * from cte a
where exists (select 1 from cte x
where x.parent_id = a.parent_id and qty = 0)
SQL DEMO HERE