Retrieve information dynamically from multiple CTE - tsql

I have multiple CTEs and I want to retrieve some information from a couple of them into next CTE.
So, I have this information from one of the CTEs:
PeriodID StarDate
1 2006-01-01
2 2007-04-25
3 2008-08-16
4 2009-12-08
5 2011-04-017
and this from other:
RecordID Date
100 2007-04-15
101 2008-05-21
102 2008-06-06
103 2008-07-01
104 2009-11-12
And I need to show in next one:
RecordID Date PeriodID
100 2007-04-15 1
101 2008-05-21 2
102 2008-06-06 2
103 2008-07-01 2
104 2009-11-12 3
I can use some case/when statement to define if date of record is in period 1,2,3,4 or 5 but it some situation I can have different numbers of periods return from the first CTE.
Is there a way to do this in the above context?

You can have multiple CTEs defined as follows, and then select from and join them as you would any other table.
with cte1 as (select * ...),
cte2 as (select * ...)
select
cte2.*,
periodid
from cte2
cross apply
(select top 1 * from cte1 where cte2.recorddate> cte1.startdate order by startdate desc) v

Related

Taking N-samples from each group in PostgreSQL

I have a table containing data that has a column named id that looks like below:
id
value 1
value 2
value 3
1
244
550
1000
1
251
551
700
1
540
60
1200
...
...
...
...
2
19
744
2000
2
10
903
100
2
44
231
600
2
120
910
1100
...
...
...
...
I want to take 50 sample rows per id that exists but if less than 50 exist for the group to simply take the entire set of data points.
For example I would like a maximum 50 data points randomly selected from id = 1, id = 2 etc...
I cannot find any previous questions similar to this but have tried taking a stab at at least logically working through the solution where I could iterate and union all queries by id and limit to 50:
SELECT * FROM (SELECT * FROM schema.table AS tbl WHERE tbl.id = X LIMIT 50) UNION ALL;
But it's obvious that you cannot use this type of solution because UNION ALL requires aggregating outputs from one id to the next and I do not have a list of id values to use in place of X in tbl.id = X.
Is there a way to accomplish this by gathering that list of unique id values and union all results or is there a more optimal way this could be done?
If you want to select a random sample for each id, then you need to randomize the rows somehow. Here is a way to do it:
select * from (
select *, row_number() over (partition by id order by random()) as u
from schema.table
) as a
where u <= 50;
Example (limiting to 3, and some row number for each id so you can see the selection randomness):
setup
DROP TABLE IF EXISTS foo;
CREATE TABLE foo
(
id int,
value1 int,
idrow int
);
INSERT INTO foo
select 1 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 2 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow
union all
select 3 as id, (1000*random())::int as value1, generate_series(1, 100) as idrow;
Selection
select * from (
select *, row_number() over (partition by id order by random()) as u
from foo
) as a
where u <= 3;
Output:
id
value1
idrow
u
1
542
6
1
1
24
86
2
1
155
74
3
2
505
95
1
2
100
46
2
2
422
33
3
3
966
88
1
3
747
89
2
3
664
19
3
In case you are looking to get 50 (or less) from each group of IDs then you can use windowing -
From question - "I want to take 50 sample rows per id that exists but if less than 50 exist for the group to simply take the entire set of data points."
Query -
with data as (
select row_number() over (partition by id order by random()) rn,
* from table_name)
select * from data where rn<=50 order by id;
Fiddle.
Your description of trying to get the UNION ALL without specifying all the branches ahead of time is aiming for a LATERAL join. And that is one way to solve the problem. But unless you have a table of all distinct ids, you would have to compute one on the fly. For example (using the same fiddle as Pankaj used):
with uniq as (select distinct id from test)
select foo.* from uniq cross join lateral
(select * from test where test.id=uniq.id order by random() limit 3) foo
This could be either slower or faster than the Window Function method, depending on your system and your data and your indexes. In my hands, it was quite a bit faster even with the need to dynamically compute the list of distinct ids.

Pivoting results from CTE in Postgres

I have a large SQL statements(PostgreSQL version 11) with many CTE's, i want to use the results from an intermediary CTE to create a PIVOTed set of results and join it with other CTE.
Below is a small part of my query and the CTE "previous_months_actual_sales" is the one i need to PIVOT.
,last_24 as
(
SELECT l_24m::DATE + (interval '1' month * generate_series(0,24)) as last_24m
FROM last_24_month_start LIMIT 24
)
,previous_months_actual_sales as
(
SELECT TO_CHAR(created_at,'YYYY-MM') as dates
,b.code,SUM(quantity) as qty
FROM base b
INNER JOIN products_sold ps ON ps.code=b.code
WHERE TO_CHAR(created_at,'YYYY-MM')
IN(SELECT TO_CHAR(last_24m,'YYYY-MM') FROM last_24)
GROUP BY b.code,TO_CHAR(created_at,'YYYY-MM')
)
SELECT * FROM previous_months_actual_sales
The results of this CTE "previous_months_actual_sales" is shown below,
dates code qty
"2018-04" "0009" 23
"2018-05" "0009" 77
"2018-06" "0008" 44
"2018-07" "0008" 1
"2018-08" "0009" 89
The expected output based on the above result is,
code. 2018-04. 2018-05. 2018-06. 2018-07. 2018-08
"0009". 23 77 89
"0008". 44 1
Is there a way to achieve this?

Select rows with second highest value for each ID repeated multiple times

Id values
1 10
1 20
1 30
1 40
2 3
2 9
2 0
3 14
3 5
3 7
Answer should be
Id values
1 30
2 3
3 7
I tried as below
Select distinct
id,
(select max(values)
from table
where values not in(select ma(values) from table)
)
You need the row_number window function. This adds a column with a row count for each group (in your case the ids). In a subquery you are able to ask for the second row of each group.
demo:db<>fiddle
SELECT
id, values
FROM (
SELECT
*,
row_number() OVER (PARTITION BY id ORDER BY values DESC)
FROM
table
) s
WHERE row_number = 2

PGSQL duplicate record in same column

i have a table and i want to know where duplicate records are present for same columns. These are my columns and i want to get record where group_id or week are different for same code and fweek and newcode
Id newcode fweek code group_id week
1 343001 2016-01 343 100 8
2 343002 2016-01 343 100 8
3 343001 2016-01 343 101 08
Required record is
Id newcode fweek code group_id week
3 343001 2016-01 343 101 08
To find the duplicate values i have joined the table with itself.
and we need to group the results with code,fweek and newcode to get more than one duplicate rows if they exist. i have used max() to get last inserted row.
you don't need to use is distinct from (it is same for inequality + NULL). if you don't want to compare NULL ones, use <> operator.
You find more information about here info
select r.*
from your_table r
where r.id in (select max(r.id)
from your_table r
join your_table r2 on r2.code = r.code and r2.fweek = r.fweek and r2.newcode = r.newcode
where
r2.group_id is distinct from r.group_id or
r2.week is distinct from r.week
group by r.code,
r.fweek,
r.newcode
having count(*) > 1)

Drive EndDate of Current Row From StarDate of Next Row

Can some one please help me with how to create end date from start date.
Products referred to a company for testing while the product with the company they carry out multiple tests on different dates and record the test date to establish the product condition i.e. (outcomeID).
I need to establish the StartDate which is the testDate and EndDate which is the start date of the next row. But if multiple consecutive tests resulted in the same OutcomeID I need to return only one row with the StartDate of the first test and the end date of the last test. In another word if the outcomeID did not change over a few consecutive tests.
Here is my data set
DECLARE #ProductTests TABLE
(
RequestID int not null,
ProductID int not null,
TestID int not null,
TestDate datetime null,
OutcomeID int
)
insert into #ProductTests
(RequestID ,ProductID ,TestID ,TestDate ,OutcomeID )
select 1,2,22,'2005-01-21',10
union all
select 1,2,42,'2007-03-17',10
union all
select 1,2,45,'2010-12-25',10
union all
select 1,2,325,'2011-01-14',13
union all
select 1,2,895,'2011-08-10',15
union all
select 1,2,111,'2011-12-23',15
union all
select 1,2,636,'2012-05-02',10
union all
select 1,2,554,'2012-11-08',17
--select *from #producttests
RequestID ProductID TestID TestDate OutcomeID
1 2 22 2005-01-21 10
1 2 42 2007-03-17 10
1 2 45 2010-12-25 10
1 2 325 2011-01-14 13
1 2 895 2011-08-10 15
1 2 111 2011-12-23 15
1 2 636 2012-05-02 10
1 2 554 2012-11-08 17
And this is what I need to achieve.
RequestID ProductID StartDate EndDate OutcomeID
1 2 2005-01-21 2011-01-14 10
1 2 2011-01-14 2011-08-10 13
1 2 2011-08-10 2012-05-02 15
1 2 2012-05-02 2012-11-08 10
1 2 2012-11-08 NULL 17
As you see from the dataset the first three tests (22, 42, and 45) all resulted in OutcomeID 10 so in my result I only need start date of test 22 and end date of test 45 which is the start date of test 325.As you see in test 636 outcomeID has gone back to 10 from 15 so it needs to be returned too.
--This is what I have managed to achieve at the moment using the following script
select T1.RequestID,T1.ProductID,T1.TestDate AS StartDate
,MIN(T2.TestDate) AS EndDate ,T1.OutcomeID
from #producttests T1
left join #ProductTests T2 ON T1.RequestID=T2.RequestID
and T1.ProductID=T2.ProductID and T2.TestDate>T1.TestDate
group by T1.RequestID,T1.ProductID ,T1.OutcomeID,T1.TestDate
order by T1.TestDate
Result:
RequestID ProductID StartDate EndDate OutcomeID
1 2 2005-01-21 2007-03-17 10
1 2 2007-03-17 2010-12-25 10
1 2 2010-12-25 2011-01-14 10
1 2 2011-01-14 2011-08-10 13
1 2 2011-08-10 2011-12-23 15
1 2 2011-12-23 2012-05-02 15
1 2 2012-05-02 2012-11-08 10
1 2 2012-11-08 NULL 17
nov 7 but still not answered
so here is my solution
not soo pretty but works
my hint is read about windowing , ranking and aggregate functions like row_number, rank , avg, sum etc.
those are essential when you want to write raports , and becoming quite powerfull in sql server 2012
i have also used CTE (common table expression) but it can be written as subquery or temporary table
;with cte ( ida, requestid, productid, testid, testdate, outcomeid) as
(
-- select rows where the outcome id is changing
select b.* from
(select ROW_NUMBER() over( partition by requestid, productid order by testDate) as id, * from #ProductTests)a
right outer join
(select ROW_NUMBER() over(partition by requestid, productid order by testDate) as id, * from #ProductTests) b
on a.requestID = b.requestID and a.productID = b.productID and a.id +1 = b.id
where 1=1
--or a.id = 1
and a.outcomeid <> b.outcomeid or b.outcomeid is null or a.id is null
)
select --*
a.RequestID,a.ProductID,a.TestDate AS StartDate ,MIN(b.TestDate) AS EndDate ,a.OutcomeID
from cte a left join cte b on a.requestid = b.requestid and a.productid = b.productid and a.testdate < b.testdate
group by a.RequestID,a.ProductID ,a.OutcomeID,a.TestDate
order by StartDate
Actually, there seem to be two problems in your question. One is how to group sequential (based on specific criteria) rows containing the same value. The other is the one actually spelled out in your title, i.e. how to use the next row's StartDate as the current row's EndDate.
Personally, I would solve these two problems in the order I mentioned them, so I would first address the grouping problem. One way to group the data properly in this case would be to use double ranking like this:
WITH partitioned AS (
SELECT
*,
grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
- ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
FROM #ProductTests
)
, grouped AS (
SELECT
RequestID,
ProductID,
StartDate = MIN(TestDate),
OutcomeID
FROM partitioned
GROUP BY
RequestID,
ProductID,
OutcomeID,
grp
)
SELECT *
FROM grouped
;
This should give you the following output for your data sample:
RequestID ProductID StartDate OutcomeID
--------- --------- ---------- ---------
1 2 2005-01-21 10
1 2 2011-01-14 13
1 2 2011-08-10 15
1 2 2012-05-02 10
1 2 2012-11-08 17
Obviously, one thing is still missing, and it's EndDate, and now is the right time to care about it. Use ROW_NUMBER() once again, to rank the result set of the grouped CTE, then use the rankings in the join condition when joining the result set with itself (using an outer join):
WITH partitioned AS (
SELECT
*,
grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
- ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
FROM #ProductTests
)
, grouped AS (
SELECT
RequestID,
ProductID,
StartDate = MIN(TestDate),
OutcomeID,
rnk = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY MIN(TestDate))
FROM partitioned
GROUP BY
RequestID,
ProductID,
OutcomeID,
grp
)
SELECT
g1.RequestID,
g1.ProductID,
g1.StartDate,
g2.StartDate AS EndDate,
g1.OutcomeID
FROM grouped g1
LEFT JOIN grouped g2
ON g1.RequestID = g2.RequestID
AND g1.ProductID = g2.ProductID
AND g1.rnk = g2.rnk - 1
;
You can try this query at SQL Fiddle to verify that it returns the output you are after.