Recursive CTE using function in recursive member - tsql

I need to create a list of all the subarticles (from all levels) of a given article.
To retrieve only the direct child articles of a parent article I use a function, GetArticleGroup(ArticleId, Index).
Executing the function would return something like this:
TRANS_ID  ARTICLE_GROUP_ID  ARTICLE_TRANS_ID  PIECES  SEQUENCE_NO  DESCRIPTION  IS_CHANGED  ART_INDEX
1         55                56                1       1                         0           A
2         55                57                1       2                         0           A
This is what I tried with no success:
WITH Subarticles AS (
    SELECT ARTICLE_TRANS_ID, ART_INDEX,
           CAST(ARTICLE_TRANS_ID AS varchar(max)) AS levels
    FROM dbo.GetArticleGroup(55, 'A') AG
    UNION ALL
    SELECT S.ARTICLE_TRANS_ID, S.ART_INDEX,
           S.levels + ',' + CAST(S.ARTICLE_TRANS_ID AS varchar(max)) AS levels
    FROM Subarticles S
    CROSS APPLY GetArticleGroup(S.ARTICLE_TRANS_ID, S.ART_INDEX) g
    WHERE CHARINDEX(CAST(g.ARTICLE_TRANS_ID AS varchar(max)), S.levels) = 0
)
SELECT TOP 30 ARTICLE_TRANS_ID, ART_INDEX, levels
FROM Subarticles
OPTION (MAXRECURSION 1000)
Expectation:
56, A
57, A
58, A
What I get instead is an infinite loop:
56, A
57, A
57, A
...
57, A
Thank you,
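A note for later readers: the repeating 57 suggests the recursive member is emitting the parent's columns (S.ARTICLE_TRANS_ID) rather than the function's output (g.ARTICLE_TRANS_ID), so levels never records the children and the cycle check never fires. A sketch of a corrected recursive member, assuming GetArticleGroup returns the direct children of the given article:

SELECT g.ARTICLE_TRANS_ID, g.ART_INDEX,
       S.levels + ',' + CAST(g.ARTICLE_TRANS_ID AS varchar(max)) AS levels
FROM Subarticles S
CROSS APPLY GetArticleGroup(S.ARTICLE_TRANS_ID, S.ART_INDEX) g
WHERE CHARINDEX(CAST(g.ARTICLE_TRANS_ID AS varchar(max)), S.levels) = 0

With g's columns selected, each iteration descends one level instead of re-emitting the same parent row.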


Taking N-samples from each group in PostgreSQL

I have a table containing data that has a column named id that looks like below:
id   value 1  value 2  value 3
1    244      550      1000
1    251      551      700
1    540      60       1200
...  ...      ...      ...
2    19       744      2000
2    10       903      100
2    44       231      600
2    120      910      1100
...  ...      ...      ...
I want to take 50 sample rows per id, but if a group has fewer than 50 rows, simply take that group's entire set of data points.
For example, I would like a maximum of 50 data points randomly selected from id = 1, id = 2, etc.
I cannot find any previous questions similar to this, but I have tried taking a stab at logically working through a solution where I could iterate, union all the per-id queries, and limit each to 50:
SELECT * FROM (SELECT * FROM schema.table AS tbl WHERE tbl.id = X LIMIT 50) UNION ALL;
But it's obvious that you cannot use this type of solution, because UNION ALL requires combining the outputs from one id to the next, and I do not have a list of id values to use in place of X in tbl.id = X.
Is there a way to accomplish this by gathering that list of unique id values and unioning all the results, or is there a more optimal way to do this?
If you want to select a random sample for each id, then you need to randomize the rows somehow. Here is a way to do it:
select * from (
    select *, row_number() over (partition by id order by random()) as u
    from schema.table
) as a
where u <= 50;
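Groups with fewer than 50 rows come back whole, since their row numbers never exceed 50, which covers the take-the-entire-set-if-under-50 requirement.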
Example (limiting to 3, and including a row number within each id so you can see the randomness of the selection):
Setup:
DROP TABLE IF EXISTS foo;
CREATE TABLE foo
(
id int,
value1 int,
idrow int
);
INSERT INTO foo
select 1 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 2 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 3 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow;
Selection:
select * from (
    select *, row_number() over (partition by id order by random()) as u
    from foo
) as a
where u <= 3;
Output:
id  value1  idrow  u
1   542     6      1
1   24      86     2
1   155     74     3
2   505     95     1
2   100     46     2
2   422     33     3
3   966     88     1
3   747     89     2
3   664     19     3
In case you are looking to get 50 (or fewer) rows from each group of IDs, you can use windowing -
From question - "I want to take 50 sample rows per id that exists but if less than 50 exist for the group to simply take the entire set of data points."
Query -
with data as (
    select row_number() over (partition by id order by random()) as rn, *
    from table_name
)
select * from data where rn <= 50 order by id;
Fiddle.
Your description of trying to get the UNION ALL without specifying all the branches ahead of time is aiming for a LATERAL join. And that is one way to solve the problem. But unless you have a table of all distinct ids, you would have to compute one on the fly. For example (using the same fiddle as Pankaj used):
with uniq as (select distinct id from test)
select foo.*
from uniq
cross join lateral
    (select * from test where test.id = uniq.id order by random() limit 3) foo
This could be either slower or faster than the Window Function method, depending on your system and your data and your indexes. In my hands, it was quite a bit faster even with the need to dynamically compute the list of distinct ids.

Unpivot in postgres with a column created in the same query

I am trying to unpivot a table with PostgreSQL as described here.
My problem is that I am creating a new column in my query which I want to use in my cross join lateral statement (which results in an SQL error because the original table does not have this column).
ORIGINAL QUESTION:
select
"Name",
case
when "Year"='2020' then "Date"
end as "Baseline"
from "test_table"
EDIT: I am using the example from the referred StackOverflow question:
create table customer_turnover
(
customer_id integer,
q1 integer,
q2 integer,
q3 integer,
q4 integer
);
INSERT INTO customer_turnover VALUES
(1, 100, 210, 203, 304);
INSERT INTO customer_turnover VALUES
(2, 150, 118, 422, 257);
INSERT INTO customer_turnover VALUES
(3, 220, 311, 271, 269);
INSERT INTO customer_turnover VALUES
(3, 320, 211, 171, 269);
select * from customer_turnover;
creates the following output
customer_id  q1   q2   q3   q4
1            100  210  203  304
2            150  118  422  257
3            220  311  271  269
3            320  211  171  269
(I used the customer_id 3 twice because this column is not unique)
Essentially, what I would like to do is the following: I would like to calculate a new column qsum:
select customer_id, q1, q2, q3, q4,
q1+q2+q3+q4 as qsum
from customer_turnover
and use this additional column in my unpivoting statement to produce the following output:
customer_id  turnover  quarter
1            100       Q1
1            210       Q2
1            203       Q3
1            304       Q4
1            817       qsum
2            150       Q1
2            118       Q2
2            422       Q3
2            257       Q4
2            947       qsum
3            220       Q1
3            311       Q2
3            271       Q3
3            269       Q4
3            1071      qsum
3            320       Q1
3            211       Q2
3            171       Q3
3            269       Q4
3            971       qsum
As I do not want to have qsum as a column in my final output, I understand that I cannot use it in my select statement, but even if I use it like this
select customer_id, t.*, q1, q2, q3, q4,
q1+q2+q3+q4 as qsum
from customer_turnover c
cross join lateral (
values
(c.q1, 'Q1'),
(c.q2, 'Q2'),
(c.q3, 'Q3'),
(c.q4, 'Q4'),
(c.qsum, 'Qsum')
) as t(turnover, quarter)
I receive the following SQL error: ERROR: column c.qsum does not exist
How can I produce my desired output?
I'm not sure I fully understand your issue; maybe a subquery can help:
select s."Baseline"
from
(   select
        "Name",
        case
            when "Year" = '2020' then "Date"
        end as "Baseline"
    from "test_table"
) as s
Note that the outer reference must be written s."Baseline" (quoted), because the quoted alias is case-sensitive in PostgreSQL; an unquoted s.baseline would raise "column s.baseline does not exist".
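For the unpivot example itself, here is a sketch that sidesteps the "column c.qsum does not exist" error by computing the sum inline in the VALUES list (equivalently, you could first wrap the query that computes qsum in a derived table or CTE and apply the lateral join to that):

select c.customer_id, t.turnover, t.quarter
from customer_turnover c
cross join lateral (
    values (c.q1, 'Q1'),
           (c.q2, 'Q2'),
           (c.q3, 'Q3'),
           (c.q4, 'Q4'),
           (c.q1 + c.q2 + c.q3 + c.q4, 'qsum')  -- qsum computed inline
) as t(turnover, quarter);

Since the lateral VALUES list may contain any expression over c's columns, no extra column on customer_turnover is needed, and qsum never appears as an output column.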

Problem Counting Items For an Individual Row

I need to find the count of ActivityID and AdditionalActivityID for each DailyFieldRecordID where GroupID = 260 and ItemID is one of 1302, 1303, 1305, 1306. The problem I'm having is that, no matter how many rows there are for an individual DailyFieldRecordID, there can only be one ActivityID and one AdditionalActivityID, regardless of how many rows comply with the constraints.
Say someone is filling out a form and they list what their activity for the day was and also what other activity they might have had. They can only list one primary activity (ActivityID) and one secondary activity (AdditionalActivityID), but during those activities they could participate with multiple groups (GroupID) or people (ItemID). So when I run this query I'm able to separate the rows based on the constraints, but I only want to count how many activities they participated in, which will be either 1 or 2 for each DailyFieldRecordID, regardless of how many groups or people were involved. Right now my query is counting each ActivityID and AdditionalActivityID for every row that meets the criteria, which can give me many more than just 1 or 2 per DailyFieldRecordID. I'm just not sure how I would go about doing this. Any feedback is greatly appreciated.
DailyFieldRecordID  GroupID  ItemID  ActivityID  AdittionalActivityID
3369320             260      1302    37          0
3369320             260      1305    37          0
3369320             210      2222    37          0
3369320             250      2222    37          0
3372806             260      1302    56          56
3372806             260      1305    56          56
3372806             250      2222    56          56
3388888             260      2222    45          32
Expected Result:
DailyFieldRecordID  Count
3369320             1
3372806             2
Current Result:
DailyFieldRecordID  Count
3369320             2
3372806             4
select a.DailyFieldRecordID,
count(case when a.ActivityID <>0 then 1 else null end) +
count(case when a.AdditionalActivityID <>0 then 1 else null end) as count
from AB953 a
where a.GroupID= 260 and exists(
select b.DailyFieldRecordID
from AB953 b
where a.DailyFieldRecordID = b.DailyFieldRecordID and b.ItemID in (1302,1303,1305,1306))
group by DailyFieldRecordID
I get this result when trying your data:
DailyFieldRecordID  Count
3369320             3
3372806             2
3388888             1
SELECT DailyFieldRecordID,
       COUNT(CASE WHEN ActivityID <> 0 THEN 1 ELSE 0 END +
             CASE WHEN AdditionalActivityID <> 0 THEN 1 ELSE 0 END) AS Count
FROM Foo
WHERE GroupID = 260 AND EXISTS (
    SELECT b.DailyFieldRecordID
    FROM Foo b
    WHERE DailyFieldRecordID = b.DailyFieldRecordID AND b.ItemID IN (1302,1303,1305,1306))
GROUP BY DailyFieldRecordID
New query (you might need to fiddle with this; I'm not sure whether your data is off, as I can't get it to select 3 and then 2):
SELECT DailyFieldRecordID,
       COUNT(CASE WHEN ActivityID <> 0 THEN 1 ELSE 0 END +
             CASE WHEN AdditionalActivityID <> 0 THEN 1 ELSE 0 END) AS Count
FROM Foo
WHERE GroupID = 260 AND DailyFieldRecordID IN (
    SELECT b.DailyFieldRecordID
    FROM Foo b
    WHERE b.ItemID IN (1302,1303,1305,1306))
GROUP BY DailyFieldRecordID
This should do it:
;WITH CTE AS
(
SELECT A.DailyFieldRecordID
,ActivityID = IIF(A.ActivityID = 0, NULL, A.ActivityID)
,AdittionalActivityID = IIF(A.AdittionalActivityID = 0, NULL, A.AdittionalActivityID)
FROM AB953 A
WHERE A.GroupID = 260
AND A.ItemID IN (1302,1303,1305,1306)
)
SELECT DailyFieldRecordID
,CNT = COUNT(DISTINCT ActivityID) + COUNT(DISTINCT AdittionalActivityID)
FROM CTE
GROUP BY DailyFieldRecordID;
I created this DDL and test data for testing:
DROP TABLE IF EXISTS AB953
GO
CREATE TABLE AB953 (
DailyFieldRecordID INT, GroupID INT, ItemID INT, ActivityID INT, AdittionalActivityID INT)
INSERT INTO AB953
VALUES
( 3369320, 260, 1302, 37, 0 )
,( 3369320, 260, 1305, 37, 0 )
,( 3369320, 210, 2222, 37, 0 )
,( 3369320, 250, 2222, 37, 0 )
,( 3372806, 260, 1302, 56, 56 )
,( 3372806, 260, 1305, 56, 56 )
,( 3372806, 250, 2222, 56, 56 )
,( 3388888, 260, 2222, 45, 32 )
GO
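Against this test data, the CTE query returns the counts the question expects (3388888 drops out because its only GroupID = 260 row has ItemID 2222, which is not in the list):

DailyFieldRecordID  CNT
3369320             1
3372806             2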

t-sql function like "filter" for sum(x) filter(condition) over (partition by

I'm trying to sum a window with a filter. I saw something similar to
sum(x) filter(condition) over (partition by...)
but it does not seem to work in t-sql, SQL Server 2017.
Essentially, I want to sum the last 5 rows that have a condition on another column.
I've tried
sum(case when condition...) over (partition...)
and sum(cast(nullif(x))) over (partition...).
I've tried left joining the table with a where condition to filter out the condition.
All of the above add the last 5 values starting from the current row only when the current row itself meets the condition.
What I want is, starting from the current row, to add the last 5 values at or above it that meet the condition.
Date  Value  Condition  Result
1-1   10     1
1-2   11     1
1-3   12     1
1-4   13     1
1-5   14     0
1-6   15     1
1-7   16     0
1-8   17     0          sum(15+13+12+11+10)
1-9   18     1          sum(18+15+13+12+11)
1-10  19     1          sum(19+18+15+13+12)
In the above example the condition I would want would be 1, ignoring the 0 but still having the "window" size be 5 non-0 values.
This can easily be achieved using a correlated subquery:
First, create and populate a sample table (Please save us this step in your future questions):
CREATE TABLE #T
(
    [Date] Date,
    [Value] int,
    Condition bit
)
INSERT INTO #T ([Date], [Value], Condition) VALUES
('2019-01-01', 10, 1),
('2019-01-02', 11, 1),
('2019-01-03', 12, 1),
('2019-01-04', 13, 1),
('2019-01-05', 14, 0),
('2019-01-06', 15, 1),
('2019-01-07', 16, 0),
('2019-01-08', 17, 0),
('2019-01-09', 18, 1),
('2019-01-10', 19, 1)
The query:
SELECT [Date], [Value], Condition,
(
SELECT Sum([Value])
FROM
(
SELECT TOP 5 [Value]
FROM #T AS t1
WHERE Condition = 1
AND t1.[Date] <= t0.[Date]
-- If you want the sum to appear starting from a specific date, uncomment the next row
--AND t0.[Date] > '2019-01-07'
ORDER BY [Date] DESC
) As t2
HAVING COUNT(*) = 5 -- there are at least 5 rows meeting the condition
) As Result
FROM #T As T0
Results:
Date        Value  Condition  Result
2019-01-01  10     1
2019-01-02  11     1
2019-01-03  12     1
2019-01-04  13     1
2019-01-05  14     0
2019-01-06  15     1          61
2019-01-07  16     0          61
2019-01-08  17     0          61
2019-01-09  18     1          69
2019-01-10  19     1          77
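For comparison, here is a window-function sketch (it assumes the same #T table; the cond name is just illustrative): number only the Condition = 1 rows, keep a running 5-row frame over them, then for every row pull the most recent full-frame sum at or before its date.

WITH cond AS
(
    SELECT [Date],
           SUM([Value]) OVER (ORDER BY [Date]
                              ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS Sum5,
           COUNT(*)     OVER (ORDER BY [Date]
                              ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS Cnt5
    FROM #T
    WHERE Condition = 1           -- the frame slides over qualifying rows only
)
SELECT t.[Date], t.[Value], t.Condition,
       (SELECT TOP 1 c.Sum5
        FROM cond c
        WHERE c.[Date] <= t.[Date]
          AND c.Cnt5 = 5          -- require a full window of 5 qualifying rows
        ORDER BY c.[Date] DESC) AS Result
FROM #T AS t;

This should reproduce the results above; whether it beats the correlated TOP 5 subquery depends on your data and indexes.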

how to efficiently locate a value from one table among values from another table, with SQL

I have a problem in PostgreSQL which I find difficult even to describe in the title: I have two tables, each containing a range of values, very similar but not identical. Suppose I have values like 0, 10, 20, 30, ... in one, and 1, 5, 6, 9, 10, 12, 19, 25, 26, ... in the second one (these are milliseconds). For each value of the second one I want to find the values immediately lower and higher in the first one. So, for the value 12 it would give me 10 and 20. I'm doing it like this:
SELECT s.*, MAX(v1."millisec") AS low_v, MIN(v2."millisec") AS high_v
FROM "signals" AS s, "tracks" AS v1, "tracks" AS v2
WHERE v1."millisec" <= s."d_time"
AND v2."millisec" > s."d_time"
GROUP BY s."d_time", s."field2"; -- this is just an example
And it works... but it is very slow once I process several thousand rows, even with indexes on s."d_time" and v.millisec. So I think there must be a much better way to do it, but I fail to find one. Could anyone help me?
Try:
select s.*,
(select millisec
from tracks t
where t.millisec <= s.d_time
order by t.millisec desc
limit 1
) as low_v,
(select millisec
from tracks t
where t.millisec > s.d_time
order by t.millisec asc
limit 1
) as high_v
from signals s;
Be sure you have an index on tracks.millisec. If you have just created the index, you'll need to analyze the table to take advantage of it.
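For instance (the index name is illustrative):

create index tracks_millisec_idx on tracks (millisec);
analyze tracks;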
Naive (trivial) way to find the preceding and next value.
-- the data (this could have been part of the original question)
CREATE TABLE table_one (id SERIAL NOT NULL PRIMARY KEY
, msec INTEGER NOT NULL -- index might help
);
CREATE TABLE table_two (id SERIAL NOT NULL PRIMARY KEY
, msec INTEGER NOT NULL -- index might help
);
INSERT INTO table_one(msec) VALUES (0), ( 10), ( 20), ( 30);
INSERT INTO table_two(msec) VALUES (1), ( 5), ( 6), ( 9), ( 10), ( 12), ( 19), ( 25), ( 26);
-- The query: find lower/higher values in table one
-- , but with no values between "us" and "them".
--
SELECT this.msec AS this
, prev.msec AS prev
, next.msec AS next
FROM table_two this
LEFT JOIN table_one prev ON prev.msec < this.msec AND NOT EXISTS (SELECT 1 FROM table_one nx WHERE nx.msec < this.msec AND nx.msec > prev.msec)
LEFT JOIN table_one next ON next.msec > this.msec AND NOT EXISTS (SELECT 1 FROM table_one nx WHERE nx.msec > this.msec AND nx.msec < next.msec)
;
Result:
CREATE TABLE
CREATE TABLE
INSERT 0 4
INSERT 0 9
this | prev | next
------+------+------
1 | 0 | 10
5 | 0 | 10
6 | 0 | 10
9 | 0 | 10
10 | 0 | 20
12 | 10 | 20
19 | 10 | 20
25 | 20 | 30
26 | 20 | 30
(9 rows)
Try this:
select *
from signals s,
     (select millisec as low_value,
             lead(millisec) over (order by millisec) as high_value
      from tracks) intervals
where s.d_time between low_value and high_value - 1
For this type of problem, window functions are ideal; see: http://www.postgresql.org/docs/9.1/static/tutorial-window.html
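One behavior to note with this approach: signals rows whose d_time lies at or beyond the largest tracks value are dropped, because lead() yields NULL for the last interval and BETWEEN never matches a NULL bound. If those rows must be kept, a LEFT JOIN formulation like the one in the earlier answer is the safer choice.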