How to group parent-child rows (T-SQL)

I have a SQL table of transactions, some of which are in a parent-child relationship. The relationship is determined only by the line and type columns; there is no other reference, so I have to build it myself.
CREATE TABLE #tmp (line INT, type CHAR(1), product VARCHAR(30))
INSERT #tmp VALUES
( 1,' ','22411')
,( 2,' ','22413')
,( 3,'P','27050')
,( 4,'C','22492')
,( 5,'C','22493')
,( 6,'C','22490')
,( 7,' ','22410')
,( 8,' ','22511')
,( 9,'P','27051')
,(10,'C','22470')
,(11,'C','22471')
,(12,'C','22473')
,(13,'C','22474')
,(14,' ','22015')
,(15,' ','22167')
,(16,' ','12411')
,(17,' ','22500')
Line 3 is a parent product. Lines 4 to 6 are the child rows.
Line 9 is another parent product. Lines 10 to 13 are the child rows.
Desired output is something like this, where the parent-child lines are grouped:
line|type|product|group
1 | |22411 |1
2 | |22413 |2
3 |P |27050 |3
4 |C |22492 |3
5 |C |22493 |3
6 |C |22490 |3
7 | |22410 |7
8 | |22511 |8
9 |P |27051 |9
10 |C |22470 |9
11 |C |22471 |9
12 |C |22473 |9
13 |C |22474 |9
14 | |22015 |14
15 | |22167 |15
16 | |12411 |16
17 | |22500 |17
How to achieve this without a cursor?

If the input is always valid, you can use the query below.
SELECT *, RANK() OVER(ORDER BY gid) AS [group]
FROM
(
SELECT *, SUM(CASE WHEN type = 'C' AND (prev = 'P' OR prev = 'C') THEN 0 ELSE 1 END) OVER(ORDER BY line) AS gid
FROM
(
SELECT *, LAG(type) OVER(ORDER BY line) AS prev
FROM #tmp
) AS withPreviousLine
) AS grouped
It does not correctly handle a run of 'C' rows that has no preceding 'P' row (such a run ends up as a group of its own). Note that LAG is only available from SQL Server 2012 onward.
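As an aside, if every 'C' row is guaranteed to follow a 'P' or another 'C', the group number in the desired output is simply the line of the most recent non-child row, so a single running MAX may be enough. A minimal sketch, assuming SQL Server 2012 or later and the #tmp table from the question (the alias group_id is only illustrative):

```sql
-- Assumes #tmp(line, type, product) exists as in the question.
-- For child rows the CASE yields NULL, so the running MAX carries
-- forward the line of the last non-child row (parent or standalone).
SELECT line, type, product,
       MAX(CASE WHEN type <> 'C' THEN line END)
           OVER (ORDER BY line
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS group_id
FROM #tmp
ORDER BY line;
```

Under the same caveat as above, a 'C' row that follows a standalone row would be grouped with that row, so this is only suitable when the input is known to be valid.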

This will work on almost all versions of SQL Server, since you didn't specify which version you're using.
If you're on at least SQL Server 2012, you can use @qxg's solution instead, since it's simpler.
Here's the code; it is not a single query, but it gives you the result you want:
CREATE TABLE #tmp (line INT, type CHAR(1), product VARCHAR(30))
INSERT #tmp VALUES ( 1,' ','22411')
,( 2,' ','22413')
,( 3,'P','27050')
,( 4,'C','22492')
,( 5,'C','22493')
,( 6,'C','22490')
,( 7,' ','22410')
,( 8,' ','22511')
,( 9,'P','27051')
,(10,'C','22470')
,(11,'C','22471')
,(12,'C','22473')
,(13,'C','22474')
,(14,' ','22015')
,(15,' ','22167')
,(16,' ','12411')
,(17,' ','22500')
select t1.*
, case
when t1.type = 'C'
and t3.type = ''
then 'G'
when t1.type = 'P' or t1.type = 'C'
then 'G'
else 'N'
end [same_group]
into #tmp2
from #tmp t1
left join #tmp t2 on t1.line = t2.line + 1
left join #tmp t3 on t1.line = t3.line - 1
order by t1.line
select *
, case
when t.type <> ''
then (select max(line)
from #tmp2
where same_group = 'G'
and type = 'P'
and line <= t.line)
else t.line
end [group_id]
from #tmp2 t
order by line
You probably could refactor it to be a single query, but I don't have the time to do so at the moment.
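For what it's worth, the two steps above can probably be folded into one statement with a correlated subquery, which should also run on versions without LAG. A sketch, assuming the same #tmp table and that every child row has a parent somewhere above it (the alias group_id mirrors the column used above):

```sql
-- Assumes #tmp(line, type, product) exists as created above.
-- Child rows look up the closest parent line above them;
-- every other row is its own group.
SELECT t.line, t.type, t.product,
       CASE WHEN t.type = 'C'
            THEN (SELECT MAX(p.line)
                  FROM #tmp AS p
                  WHERE p.type = 'P'
                    AND p.line < t.line)
            ELSE t.line
       END AS group_id
FROM #tmp AS t
ORDER BY t.line;
```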

Related

aggregate column of type row

I want to filter a column of row type and merge rows when they carry complementary information.
My data looks like this:
|col1|rowcol |
|----|--------------------------------|
|1 |{col1=2, col2=null, col3=4} |
|1 |{col1=null, col2=3, col3=null} |
|2 |{col1=7, col2=8, col3=null} |
|2 |{col1=null, col2=null, col3=56} |
|3 |{col1=1, col2=3, col3=7} |
Here is some code you can use for a working example:
select col1, cast(rowcol as row(col1 integer, col2 integer, col3 integer))
from (
values
(1, row(2,null,4)),
(1, row(null,3,null)),
(2, row(7,8,null)),
(2, row(null,null,56)),
(3, row(1,3,7))
)
AS x (col1, rowcol)
I am expecting the result as following:
|col1|rowcol |
|----|-------------------------------|
|1 |{col1=2, col2=3, col3=4} |
|2 |{col1=7, col2=8, col3=56} |
|3 |{col1=1, col2=3, col3=7} |
Maybe someone can help me...
Thanks in advance
You need to group the rows by col1 and merge the non-null values, for example using max:
-- sample data
WITH dataset (col1, rowcol) AS (
VALUES
(1, row(2,null,4)),
(1, row(null,3,null)),
(2, row(7,8,null)),
(2, row(null,null,56)),
(3, row(1,3,7))
)
--query
select col1,
cast(row(max(r.col1), max(r.col2), max(r.col3)) as row(col1 integer, col2 integer, col3 integer)) rowcol
from (
select col1,
cast(rowcol as row(col1 integer, col2 integer, col3 integer)) r
from dataset
)
group by col1
order by col1 -- for ordered output
Output:
|col1|rowcol |
|----|-------------------------------|
|1 |{col1=2, col2=3, col3=4} |
|2 |{col1=7, col2=8, col3=56} |
|3 |{col1=1, col2=3, col3=7} |

SQL Join multiple table without repetition

I've got 3 tables
Table A
----------------------
| ID| Data1 | Data2 |
---------------------
| 1 |John | 2021 |
| 2 |Steve | 2020 |
Table B
----------------------
|Row|ID|Value1|Value2|
----------------------
|1 |1 |iR3000|0.5 |
|2 |1 |iRC252|0.7 |
|3 |2 |Dr2000|0.4 |
Table C
----------------------
|Row|ID|Value3|Value4|
----------------------
|1 |1 |aaaaaa|12345 |
|2 |1 |bbbbbb|6789 |
My goal is to get a result like this:
-------------------------------------------------
| ID| Data1 | Data2 |Value1|Value2|Value3|Value4|
-------------------------------------------------
| 1 |John | 2021 |iR3000|0.5 |aaaaaa|12345 |
| 1 |John | 2021 |iRC252|0.7 |bbbbbb|6789 |
| 2 |Steve | 2020 |Dr2000|0.4 |null |null |
Currently, with my query, ID 1 is duplicated 4 times.
Here is my query:
SELECT
a.id, a.data1,a.data2
,b.value1, b.value2
,c.value3,c.value4
FROM TableA a
JOIN TableB b
ON b.ID=a.ID
JOIN TableC c
ON c.ID=a.ID
What you had was close; only the JOIN to TableC was wrong. It needs to be an OUTER JOIN and also match on the Row column:
SELECT a.ID, a.Data1, a.Data2, b.Value1, b.Value2, c.Value3, c.Value4
FROM TableA a
INNER JOIN TableB b on b.ID = a.ID
LEFT JOIN TableC c on c.ID = b.ID AND c.Row = b.Row
Update based on the comment:
I cannot use row column cause they are not always match with the same number.
Okay. If the Row column at least exists, we can still work with that to create projections that might be more consistent between tables:
With TableB2 AS (
SELECT *, row_number() over (partition by ID order by row) As Row2
FROM TableB
),
TableC2 As (
SELECT *, row_number() over (partition by ID order by row) As Row2
FROM TableC
)
SELECT a.ID, a.Data1, a.Data2, b.Value1, b.Value2, c.Value3, c.Value4
FROM TableA a
INNER JOIN TableB2 b on b.ID = a.ID
LEFT JOIN TableC2 c on c.ID = b.ID AND c.Row2 = b.Row2
What we cannot do is rely on the order of the records on disk or the insertion order. There MUST be some field that indicates, for example, that the iR3000 row in TableB relates to the aaaaaa row in TableC rather than the bbbbbb row.
The order in which records appear in the table is not good enough. Databases are based on relational set theory, so what we think of as "Tables" are more formally defined as "Unordered Relations". Note the word "unordered" in that definition. While table order may seem to be stable over stretches, databases are free to re-order the rows on disk after insertion. They can and will do this to make queries more efficient, conform better with indexes, fill up pages, etc.

I need a type of group-sort that I couldn't figure out with ROW_NUMBER on T-SQL

I have a table with an id column and 2 other columns. I want a kind of numbering, using the row_number function, that looks like this:
id |col1 |col2 |what I want
------------------------------
1 |x |a |1
2 |x |b |2
3 |x |a |3
4 |x |a |3
5 |x |c |4
6 |x |c |4
7 |x |c |4
Please note:
There is only one value in col1 (x), so "partition by col1" is OK.
There are two separate runs of a's, and they must be numbered separately (not 1,2,1,1,3,3,3).
Sorting must be by id, not by col2 (so "order by col2" is NOT OK).
I want the number to increase by one whenever col2 changes compared to the previous row.
row_number() over (partition by col1 order by col2) doesn't work, because I need the rows ordered by id.
Using LAG and a windowed COUNT appears to get you what you are after:
WITH Previous AS(
SELECT V.id,
V.col1,
V.col2,
V.[What I want],
LAG(V.Col2,1,V.Col2) OVER (ORDER BY ID ASC) AS PrevCol2
FROM (VALUES(1,'x','a',1),
(2,'x','b',2),
(3,'x','a',3),
(4,'x','a',3),
(5,'x','c',4),
(6,'x','c',4),
(7,'x','c',4))V(id, col1, col2, [What I want]))
SELECT P.id,
P.col1,
P.col2,
P.[What I want],
COUNT(CASE P.Col2 WHEN P.PrevCol2 THEN NULL ELSE 1 END) OVER (ORDER BY P.ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) +1 AS [What you get]
FROM Previous P;
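The same idea can also be written as a running SUM over a change flag, partitioned by col1 in case more than one col1 value ever shows up. A sketch under those assumptions, using the sample rows from the question (the names is_change and grp are made up for illustration):

```sql
WITH flagged AS (
    SELECT v.id, v.col1, v.col2,
           -- 1 when col2 differs from the previous row (or there is no
           -- previous row in this col1 partition), otherwise 0
           CASE WHEN v.col2 = LAG(v.col2) OVER (PARTITION BY v.col1 ORDER BY v.id)
                THEN 0 ELSE 1
           END AS is_change
    FROM (VALUES (1,'x','a'), (2,'x','b'), (3,'x','a'), (4,'x','a'),
                 (5,'x','c'), (6,'x','c'), (7,'x','c')) AS v(id, col1, col2)
)
SELECT id, col1, col2,
       -- running total of the change flags gives the group number
       SUM(is_change) OVER (PARTITION BY col1 ORDER BY id
                            ROWS UNBOUNDED PRECEDING) AS grp
FROM flagged
ORDER BY id;
```

Because the first row of each partition has no previous row, its flag is 1, so the numbering starts at 1 without the extra "+1".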

Oracle SQL Percent Difference Same Column

Given the following auction data, how would you find the percent difference between a person's most recent and previous bid for a product using Oracle SQL?
The duplicate sequence numbers (SEQ) for persons A and B are representative of the data I am working with.
An example of your SQL would be much appreciated.
TXN_TIME | SEQ | PERSON | PRODUCT | TRANSACTION | BID |
2017-11-22 15:41:10:0 | 20 | A | 1 | BID | 12 |
2017-11-22 15:35:10:0 | 10C | A | 1 | CXLBID | NULL |
2017-11-22 15:34:25:0 | 10 | A | 1 | BID | 10 |
2017-11-22 15:35:40:0 | 6 | A | 2 | BID | 4 |
2017-11-22 15:34:50:0 | 1C | A | 2 | CXLBID | NULL |
2017-11-22 15:34:20:0 | 1 | A | 2 | BID | 5 |
2017-11-22 15:35:45:0 | 6 | B | 2 | BID | 2 |
2017-11-22 15:34:55:0 | 1C | B | 2 | CXLBID | NULL |
2017-11-22 15:34:25:0 | 1 | B | 2 | BID | 1 |
We could use the LEAD/LAG analytic functions if they are available. But one approach here is to use a CTE to identify just the most recent and the immediately prior bid for each person, and then compare these two values.
WITH cte AS (
SELECT PERSON, BID,
ROW_NUMBER() OVER (PARTITION BY PERSON ORDER BY TXN_TIME DESC) rn
FROM yourTable
WHERE TRANSACTION = 'BID'
)
SELECT
t1.PERSON,
100*(t1.BID - t2.BID) / t2.BID AS BID_PCT_DIFF
FROM cte t1
INNER JOIN cte t2
ON t1.PERSON = t2.PERSON AND
t1.rn = 1 AND t2.rn = 2;
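Note that the query partitions by PERSON only, so it compares each person's two most recent bids regardless of product. On that basis the output looks correct: person A went from a bid of 4 to 12, which is an increase of 8, or 200%, and person B went from a bid of 1 to 2, which is a 100% increase.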
I created a demo below in SQL Server, because I always have difficulties getting Oracle demos to work. But my query is just ANSI SQL and should run the same on either SQL Server or Oracle.
Demo
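If the comparison should instead be made per person and product, as the wording of the question suggests, a LAG-based variant is one option. A sketch, reusing the placeholder table name yourTable from above (the aliases prev_bid and rn are illustrative); with the sample data it should pair 12 with 10 for A/product 1, 4 with 5 for A/product 2, and 2 with 1 for B/product 2:

```sql
SELECT person, product, bid, prev_bid,
       100 * (bid - prev_bid) / prev_bid AS bid_pct_diff
FROM (
    SELECT person, product, bid,
           -- previous actual bid for the same person and product
           LAG(bid) OVER (PARTITION BY person, product
                          ORDER BY txn_time) AS prev_bid,
           -- rn = 1 marks the most recent bid in each partition
           ROW_NUMBER() OVER (PARTITION BY person, product
                              ORDER BY txn_time DESC) AS rn
    FROM yourTable
    WHERE transaction = 'BID'   -- ignore cancellations (CXLBID)
) t
WHERE rn = 1
  AND prev_bid IS NOT NULL;     -- skip partitions with a single bid
```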
Good thing you are using Oracle 12. This way you can use the MATCH_RECOGNIZE clause, which is perfect for your problem.
I calculate the CHANGE column in the MATCH_RECOGNIZE clause, using the LAST() function with the optional second argument, which is a logical offset within the set of rows mapped to a specific pattern variable. I format the CHANGE column in the SELECT clause - I use a favorite hack, using the "currency" symbol to attach the percent sign... you can modify the formatting any way you want, without affecting the calculation (which is hidden in the MATCH_RECOGNIZE clause).
with auction_data ( txn_time, seq, person, product, transaction, bid ) as (
select timestamp '2017-11-22 15:41:10', '20' , 'A', 1, 'BID' , 12 from dual union all
select timestamp '2017-11-22 15:35:10', '10C', 'A', 1, 'CXLBID', NULL from dual union all
select timestamp '2017-11-22 15:34:25', '10' , 'A', 1, 'BID' , 10 from dual union all
select timestamp '2017-11-22 15:35:40', '6' , 'A', 2, 'BID' , 4 from dual union all
select timestamp '2017-11-22 15:34:50', '1C' , 'A', 2, 'CXLBID', NULL from dual union all
select timestamp '2017-11-22 15:34:20', '1' , 'A', 2, 'BID' , 5 from dual union all
select timestamp '2017-11-22 15:35:45', '6' , 'B', 2, 'BID' , 2 from dual union all
select timestamp '2017-11-22 15:34:55', '1C' , 'B', 2, 'CXLBID', NULL from dual union all
select timestamp '2017-11-22 15:34:25', '1' , 'B', 2, 'BID' , 1 from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
select txn_time, seq, person, product, transaction, bid,
to_char( 100 * (change - 1), '999D0L', 'nls_currency=''%''') as change
from auction_data
match_recognize(
partition by person, product
order by txn_time
measures case when classifier() = 'B' then bid / last(B.bid, 1) end as change
all rows per match
pattern ( (B|A)* )
define B as B.transaction = 'BID'
);
TXN_TIME SEQ PERSON PRODUCT TRANSACTION BID CHANGE
------------------- --- ------ ---------- ----------- ---------- ----------------
2017-11-22 15:34:25 10 A 1 BID 10
2017-11-22 15:35:10 10C A 1 CXLBID
2017-11-22 15:41:10 20 A 1 BID 12 20.0%
2017-11-22 15:34:20 1 A 2 BID 5
2017-11-22 15:34:50 1C A 2 CXLBID
2017-11-22 15:35:40 6 A 2 BID 4 -20.0%
2017-11-22 15:34:25 1 B 2 BID 1
2017-11-22 15:34:55 1C B 2 CXLBID
2017-11-22 15:35:45 6 B 2 BID 2 100.0%

T-SQL generate sequence from string and count

I need to generate a sequence from a CSV string and a maximum count.
When the items run out, I need to start the sequence again and continue until the count is reached.
I have the following CSV:
A,B,C,D
In order to get 4 rows out of this CSV I am using XML and the following statement:
DECLARE @xml_csv XML
SET @xml_csv = N'<root><r>' + replace('A, B, C, D',',','</r><r>') + '</r></root>'
SELECT
REPLACE(t.value('.','varchar(max)'), ' ', '') AS [delimited items]
FROM
@xml_csv.nodes('//root/r') AS a(t)
Now my SELECT returns the following output:
|-------------|
| A |
| B |
| C |
| D |
Assuming I have a @count variable set to 9, I need the following output:
|--|-----------|
|1 |A |
|2 |B |
|3 |C |
|4 |D |
|5 |A |
|6 |B |
|7 |C |
|8 |D |
|9 |A |
I tried joining to master..[spt_values], but for a count of 10 I get 10 rows for A, 10 for B and so on, whereas I need the sequence in order and repeated until the count is reached.
Basically you are on the right path. Joining the split result with a numbers table will get you the correct output.
I've chosen a different function for splitting the CSV data, since it also uses a numbers table for the split (taken from this great article).
First, if you don't already have a numbers table, create one. Here is the script used in the article I've linked to:
SET NOCOUNT ON;
DECLARE @UpperLimit INT = 1000;
WITH n AS
(
SELECT
x = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
CROSS JOIN sys.all_objects AS s3
)
SELECT Number = x
INTO dbo.Numbers
FROM n
WHERE x BETWEEN 1 AND @UpperLimit;
GO
CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers(Number)
WITH (DATA_COMPRESSION = PAGE);
GO
Then, create the split function:
CREATE FUNCTION dbo.SplitStrings_Numbers
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Item = SUBSTRING(@List, Number,
CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)
FROM dbo.Numbers
WHERE Number <= CONVERT(INT, LEN(@List))
AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter
);
GO
Next step: Join the split results with the numbers table:
DECLARE @Csv varchar(20) = 'A,B,C,D'
SELECT TOP 10 Item
FROM dbo.SplitStrings_Numbers(@Csv, ',')
CROSS JOIN Numbers
ORDER BY Number
Output:
Item
----
A
B
C
D
A
B
C
D
A
B
Great thanks to Aaron Bertrand for sharing his knowledge.
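One caveat with the query above: ORDER BY Number alone does not fix the order of the items within each repetition, so A, B, C, D is not strictly guaranteed. If a guaranteed order and an explicit row number are needed, a modulo join on each item's position is one way to get them. A self-contained sketch, in which the item positions are hard-coded purely for illustration (in practice they would have to come from a splitter that returns an ordinal) and @count is the variable from the question:

```sql
DECLARE @count INT = 9;

WITH items (pos, item) AS (
    -- zero-based position of each CSV item (hard-coded for this sketch)
    SELECT v.pos, v.item
    FROM (VALUES (0, 'A'), (1, 'B'), (2, 'C'), (3, 'D')) AS v(pos, item)
),
nums AS (
    -- 1..@count; sys.all_objects is only used as a convenient row source
    SELECT TOP (@count)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects
)
SELECT nums.n, i.item
FROM nums
JOIN items AS i
  ON i.pos = (nums.n - 1) % (SELECT COUNT(*) FROM items)  -- wrap around
ORDER BY nums.n;
```

With @count = 9 this should give the rows numbered 1 to 9 cycling through A, B, C, D and ending on A.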