I would much appreciate any help. I have a table named lifelong, as shown below.
id first_meal last_meal
0 1 2022-07-25 12:28:00 2022-07-25 20:06:00
1 2 2022-07-26 13:12:00 2022-07-26 19:09:00
2 3 2022-07-27 14:13:00 2022-07-27 20:13:00
3 4 2022-07-28 15:10:00 2022-07-28 21:22:00
I skip one row from column first_meal with
select f.id, f.first_meal from lifelong f offset 1;
I select all from the column last_meal with
select id, last_meal from lifelong
Now I need to save the results of these two different selects from the same table as a new table.
I tried union and union all, and I tried putting the two selects in parentheses, but with no result so far.
I expect an output like below
id first_meal last_meal
0 1 2022-07-26 13:12:00 2022-07-25 20:06:00
1 2 2022-07-27 14:13:00 2022-07-26 19:09:00
2 3 2022-07-28 15:10:00 2022-07-27 20:13:00
3 4 NaN 2022-07-28 21:22:00
I would suggest using ROW_NUMBER here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY first_meal DESC) rn
FROM lifelong
)
INSERT INTO newTable (id, first_meal, last_meal)
SELECT id, CASE WHEN rn > 1 THEN first_meal END, last_meal
FROM cte;
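The expected output in the question moves each first_meal up by one row, which is what the LEAD window function does directly. The following is only a minimal sketch, run through Python's sqlite3 module so it is self-contained; it assumes that "next row" means the next id, and new_table is a hypothetical name. The same SELECT should carry over to PostgreSQL, but that has not been verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lifelong (id INTEGER, first_meal TEXT, last_meal TEXT);
    INSERT INTO lifelong VALUES
        (1, '2022-07-25 12:28:00', '2022-07-25 20:06:00'),
        (2, '2022-07-26 13:12:00', '2022-07-26 19:09:00'),
        (3, '2022-07-27 14:13:00', '2022-07-27 20:13:00'),
        (4, '2022-07-28 15:10:00', '2022-07-28 21:22:00');

    -- new_table is a hypothetical name. LEAD pulls first_meal from the
    -- following row (ordered by id); the last row has no successor and gets NULL.
    CREATE TABLE new_table AS
    SELECT id,
           LEAD(first_meal) OVER (ORDER BY id) AS first_meal,
           last_meal
    FROM lifelong;
""")

shifted = conn.execute(
    "SELECT id, first_meal, last_meal FROM new_table ORDER BY id"
).fetchall()
for row in shifted:
    print(row)
```

Ordering the window by id is an assumption; ordering by first_meal would give the same result for this sample.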
I'm running the PostgreSQL query below in AWS Redshift. Each time I run it, the EXCEPT operator reports a different number of records that exist only on the daily_table.product_repeat_sub_query side. Neither daily_table.product_repeat_sub_query nor daily_table.daily_sku_t is being updated during this time, and the daily_table.product_repeat_sub_query table and the product_repeat_sub_query query both have the same record count. The schema for daily_table.daily_sku_t is below; the matching fields in daily_table.product_repeat_sub_query have the same data types. I've also included some sample records from the tables below. Does anyone have an idea how the results of the EXCEPT query can come out differently on each run when the underlying tables aren't changing?
daily_table.daily_sku_t schema:
customer_uuid string
boardname_12 string
producttype string
productsubtype string
storeid int
product_id string
dateclosed date
Size string
query:
with product_repeat_sub_query as
(
select
dateclosed, t.product_id, t.storeid, t.producttype, t.productsubtype, t.size, t.boardname_12,
case
when ticketid = first_value(ticketid) over (partition by t.product_id, customer_uuid
ORDER BY
dateclosed ASC rows between unbounded preceding and unbounded following) then 0
else grossreceipts
end as product_repeat_gross, datediff(day,
lag(dateclosed, 1) over (partition by t.boardname_12, customer_uuid, t.product_id
ORDER BY
dateclosed ASC ),
dateclosed) as product_cycle_days
from
daily_table.daily_sku_t t )
select count(*) from
(
select dateclosed, storeid, boardname_12, producttype, productsubtype, size, product_id, product_cycle_days from daily_table.product_repeat_sub_query
except
select dateclosed, storeid, boardname_12, producttype, productsubtype, size, product_id, product_cycle_days from product_repeat_sub_query
);
-- 36843
-- 36887
-- 36188
data:
daily_table.product_repeat_sub_query
dateclosed storeid boardname_12 producttype productsubtype size product_id product_cycle_days
2021-04-23 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 2
2021-04-24 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 6
2021-04-26 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 8
2021-04-26 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 3
2021-05-01 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 13
2020-06-18 61 FLAV RX WINGER BEVERAGE 100MT 0000265d-6b81-4d79-90cf-xxxxxxxxxxxx 5
2020-06-29
product_repeat_sub_query
dateclosed storeid boardname_12 producttype productsubtype size product_id product_cycle_days
2021-04-23 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 2
2021-04-24 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 6
2021-04-26 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 8
2021-04-26 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 3
2021-05-01 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 13
2020-06-18 61 FLAV RX WINGER BEVERAGE 100MT 0000265d-6b81-4d79-90cf-xxxxxxxxxxxx 5
2020-06-29
update:
with product_repeat_sub_query as
(
select customer_uuid,
dateclosed, t.product_id, t.storeid, t.producttype, t.productsubtype, t.size, t.boardname_12,
case
when ticketid = first_value(ticketid) over (partition by t.product_id, customer_uuid
ORDER BY
dateclosed ASC rows between unbounded preceding and unbounded following) then 0
else grossreceipts
end as product_repeat_gross, datediff(day,
lag(dateclosed, 1) over (partition by t.boardname_12, customer_uuid, t.product_id
ORDER BY
dateclosed ASC,t.boardname_12, customer_uuid, t.product_id ),
dateclosed) as product_cycle_days
from
daily_table.daily_sku_t t
where (t.customer_uuid is not null)
and (trim(t.customer_uuid) != '')
and (t.product_id is not null)
and (trim(t.product_id) != '')
)
select count(*) from
(
select customer_uuid, dateclosed, storeid, boardname_12, producttype, productsubtype, size, product_id, product_cycle_days from daily_table.product_repeat_sub_query
except
select customer_uuid, dateclosed, storeid, boardname_12, producttype, productsubtype, size, product_id, product_cycle_days from product_repeat_sub_query
);
Even after adding all the fields from the partition to the order by and filtering out nulls and blanks in the id fields, I'm still getting a different count each time.
Your window functions don't have fully qualified order by clauses. You have repeated "dateclosed" values within partitions. This means that Redshift can have different row orders for the lag and first-value functions. I expect that these "random" ordering differences are causing your changing results.
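Note that adding the partition columns to the ORDER BY, as in the update, cannot break ties, because those columns are constant within each partition. A tie-breaker has to be a column that differs between the tied rows. The sketch below, run through Python's sqlite3 module, assumes ticketid is unique among rows that share a dateclosed (ticketid appears in the query but not in the posted schema, so this is an assumption). SQLite has no datediff, so the day difference is computed with julianday instead.

```python
import sqlite3

ROWS = [
    # (ticketid, customer_uuid, product_id, storeid, dateclosed)
    (1, "c1", "p1", 427, "2021-04-20"),
    (2, "c1", "p1", 427, "2021-04-26"),
    (3, "c1", "p1", 61, "2021-04-26"),  # same dateclosed as ticket 2: a tie
]

QUERY = """
    SELECT storeid, dateclosed,
           CAST(julianday(dateclosed) - julianday(
               LAG(dateclosed) OVER (
                   PARTITION BY customer_uuid, product_id
                   ORDER BY dateclosed, ticketid   -- ticketid breaks the tie
               )) AS INTEGER) AS product_cycle_days
    FROM daily_sku_t
    ORDER BY storeid, dateclosed
"""

def cycle_days(rows):
    """Load rows in the given physical order and run the tie-broken query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_sku_t (ticketid INTEGER, customer_uuid TEXT, "
        "product_id TEXT, storeid INTEGER, dateclosed TEXT)"
    )
    conn.executemany("INSERT INTO daily_sku_t VALUES (?, ?, ?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()

forward = cycle_days(ROWS)
backward = cycle_days(list(reversed(ROWS)))
print(forward)
print(forward == backward)
```

With the tie-breaker in place, the value attached to each (storeid, dateclosed) pair no longer depends on the order in which the engine happens to read the tied rows. The same idea applies to the first_value window in the original query.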
Id values
1 10
1 20
1 30
1 40
2 3
2 9
2 0
3 14
3 5
3 7
Answer should be
Id values
1 30
2 3
3 7
I tried as below
Select distinct
id,
(select max(values)
from table
where values not in (select max(values) from table)
)
You need the row_number window function. It adds a column that numbers the rows within each group (in your case, within each id), in the order you specify. In a subquery you can then ask for the second row of each group.
demo:db<>fiddle
SELECT
id, values
FROM (
SELECT
*,
row_number() OVER (PARTITION BY id ORDER BY values DESC)
FROM
table
) s
WHERE row_number = 2
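To check the approach against the sample data, here is a small self-contained sketch using Python's sqlite3 module. The table and column names t and val are hypothetical stand-ins for table and values (which are awkward as real identifiers), and the explicit alias rn is added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- t and val are hypothetical names standing in for "table" and "values"
    CREATE TABLE t (id INTEGER, val INTEGER);
    INSERT INTO t VALUES
        (1, 10), (1, 20), (1, 30), (1, 40),
        (2, 3), (2, 9), (2, 0),
        (3, 14), (3, 5), (3, 7);
""")

second_highest = conn.execute("""
    SELECT id, val
    FROM (
        SELECT id, val,
               -- number rows within each id, highest val first
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY val DESC) AS rn
        FROM t
    ) AS s
    WHERE rn = 2
    ORDER BY id
""").fetchall()
print(second_highest)
```

If ties on the highest value should count as one rank, DENSE_RANK would be the function to look at instead of ROW_NUMBER; the sample data has no ties, so the two agree here.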
I have a table and I want to find where duplicate records are present for the same columns. These are my columns; I want to get the records where group_id or week differs for the same code, fweek and newcode.
Id newcode fweek code group_id week
1 343001 2016-01 343 100 8
2 343002 2016-01 343 100 8
3 343001 2016-01 343 101 08
Required record is
Id newcode fweek code group_id week
3 343001 2016-01 343 101 08
To find the duplicate values, I have joined the table with itself.
We then group the results by code, fweek and newcode and keep only the groups with more than one matching row, if any exist. I have used max() to get the last inserted row.
You don't strictly need IS DISTINCT FROM (it behaves like inequality, but also treats NULL as a comparable value). If you don't need to compare NULLs, use the <> operator instead.
You can find more information about it here: info
select r.*
from your_table r
where r.id in (select max(r.id)
from your_table r
join your_table r2 on r2.code = r.code and r2.fweek = r.fweek and r2.newcode = r.newcode
where
r2.group_id is distinct from r.group_id or
r2.week is distinct from r.week
group by r.code,
r.fweek,
r.newcode
having count(*) > 1)
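A quick way to try this query against the sample rows is the sketch below, using Python's sqlite3 module. Two things are swapped in and are assumptions rather than part of the answer: SQLite's IS NOT operator stands in for IS DISTINCT FROM (both treat NULL as a comparable value), and week is stored as text so that '8' and '08' compare as different. The inner aliases are renamed to a and b only to avoid shadowing the outer r.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- week is TEXT here (an assumption) so that '8' and '08' differ
    CREATE TABLE your_table (
        id INTEGER, newcode TEXT, fweek TEXT, code TEXT,
        group_id INTEGER, week TEXT
    );
    INSERT INTO your_table VALUES
        (1, '343001', '2016-01', '343', 100, '8'),
        (2, '343002', '2016-01', '343', 100, '8'),
        (3, '343001', '2016-01', '343', 101, '08');
""")

conflicting = conn.execute("""
    SELECT r.*
    FROM your_table AS r
    WHERE r.id IN (
        SELECT MAX(a.id)
        FROM your_table AS a
        JOIN your_table AS b
          ON b.code = a.code AND b.fweek = a.fweek AND b.newcode = a.newcode
        -- IS NOT is SQLite's NULL-safe inequality, used in place of
        -- IS DISTINCT FROM
        WHERE b.group_id IS NOT a.group_id
           OR b.week IS NOT a.week
        GROUP BY a.code, a.fweek, a.newcode
        HAVING COUNT(*) > 1
    )
""").fetchall()
print(conflicting)
```

Rows 1 and 3 share code, fweek and newcode but differ in group_id and week, so they pair up in both directions and max() picks id 3.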
Can someone please help me with how to create an end date from a start date?
Products are referred to a company for testing. While a product is with the company, they carry out multiple tests on different dates and record the test date to establish the product's condition (i.e. the OutcomeID).
I need to establish the StartDate, which is the TestDate, and the EndDate, which is the StartDate of the next row. But if multiple consecutive tests resulted in the same OutcomeID, I need to return only one row, with the StartDate of the first test and the EndDate of the last test; in other words, one row for each run of consecutive tests over which the OutcomeID did not change.
Here is my data set
CREATE TABLE #ProductTests
(
RequestID int not null,
ProductID int not null,
TestID int not null,
TestDate datetime null,
OutcomeID int
)
insert into #ProductTests
(RequestID ,ProductID ,TestID ,TestDate ,OutcomeID )
select 1,2,22,'2005-01-21',10
union all
select 1,2,42,'2007-03-17',10
union all
select 1,2,45,'2010-12-25',10
union all
select 1,2,325,'2011-01-14',13
union all
select 1,2,895,'2011-08-10',15
union all
select 1,2,111,'2011-12-23',15
union all
select 1,2,636,'2012-05-02',10
union all
select 1,2,554,'2012-11-08',17
--select *from #producttests
RequestID ProductID TestID TestDate OutcomeID
1 2 22 2005-01-21 10
1 2 42 2007-03-17 10
1 2 45 2010-12-25 10
1 2 325 2011-01-14 13
1 2 895 2011-08-10 15
1 2 111 2011-12-23 15
1 2 636 2012-05-02 10
1 2 554 2012-11-08 17
And this is what I need to achieve.
RequestID ProductID StartDate EndDate OutcomeID
1 2 2005-01-21 2011-01-14 10
1 2 2011-01-14 2011-08-10 13
1 2 2011-08-10 2012-05-02 15
1 2 2012-05-02 2012-11-08 10
1 2 2012-11-08 NULL 17
As you can see from the dataset, the first three tests (22, 42, and 45) all resulted in OutcomeID 10, so in my result I only need the start date of test 22 and the end date of test 45, which is the start date of test 325. As you can also see, in test 636 the OutcomeID has gone back to 10 from 15, so that row needs to be returned too.
--This is what I have managed to achieve at the moment using the following script
select T1.RequestID,T1.ProductID,T1.TestDate AS StartDate
,MIN(T2.TestDate) AS EndDate ,T1.OutcomeID
from #producttests T1
left join #ProductTests T2 ON T1.RequestID=T2.RequestID
and T1.ProductID=T2.ProductID and T2.TestDate>T1.TestDate
group by T1.RequestID,T1.ProductID ,T1.OutcomeID,T1.TestDate
order by T1.TestDate
Result:
RequestID ProductID StartDate EndDate OutcomeID
1 2 2005-01-21 2007-03-17 10
1 2 2007-03-17 2010-12-25 10
1 2 2010-12-25 2011-01-14 10
1 2 2011-01-14 2011-08-10 13
1 2 2011-08-10 2011-12-23 15
1 2 2011-12-23 2012-05-02 15
1 2 2012-05-02 2012-11-08 10
1 2 2012-11-08 NULL 17
It's Nov 7 and this is still not answered,
so here is my solution.
It's not so pretty, but it works.
My hint is to read about windowing, ranking and aggregate functions like row_number, rank, avg, sum etc.
Those are essential when you want to write reports, and they are becoming quite powerful in SQL Server 2012.
I have also used a CTE (common table expression), but it can be written as a subquery or a temporary table.
;with cte ( ida, requestid, productid, testid, testdate, outcomeid) as
(
-- select rows where the outcome id is changing
select b.* from
(select ROW_NUMBER() over( partition by requestid, productid order by testDate) as id, * from #ProductTests)a
right outer join
(select ROW_NUMBER() over(partition by requestid, productid order by testDate) as id, * from #ProductTests) b
on a.requestID = b.requestID and a.productID = b.productID and a.id +1 = b.id
where 1=1
--or a.id = 1
and a.outcomeid <> b.outcomeid or b.outcomeid is null or a.id is null
)
select --*
a.RequestID,a.ProductID,a.TestDate AS StartDate ,MIN(b.TestDate) AS EndDate ,a.OutcomeID
from cte a left join cte b on a.requestid = b.requestid and a.productid = b.productid and a.testdate < b.testdate
group by a.RequestID,a.ProductID ,a.OutcomeID,a.TestDate
order by StartDate
Actually, there seem to be two problems in your question. One is how to group sequential (based on specific criteria) rows containing the same value. The other is the one actually spelled out in your title, i.e. how to use the next row's StartDate as the current row's EndDate.
Personally, I would solve these two problems in the order I mentioned them, so I would first address the grouping problem. One way to group the data properly in this case would be to use double ranking like this:
WITH partitioned AS (
SELECT
*,
grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
- ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
FROM #ProductTests
)
, grouped AS (
SELECT
RequestID,
ProductID,
StartDate = MIN(TestDate),
OutcomeID
FROM partitioned
GROUP BY
RequestID,
ProductID,
OutcomeID,
grp
)
SELECT *
FROM grouped
;
This should give you the following output for your data sample:
RequestID ProductID StartDate OutcomeID
--------- --------- ---------- ---------
1 2 2005-01-21 10
1 2 2011-01-14 13
1 2 2011-08-10 15
1 2 2012-05-02 10
1 2 2012-11-08 17
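To see why the difference of the two rankings identifies each run, it may help to print both row numbers side by side. The sketch below does that through Python's sqlite3 module; the table is named ProductTests without the # prefix because SQLite has no SQL Server style temp tables, and the column aliases rn_all and rn_outcome are hypothetical names for the two ROW_NUMBER() expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductTests (
        RequestID INTEGER, ProductID INTEGER, TestID INTEGER,
        TestDate TEXT, OutcomeID INTEGER
    );
    INSERT INTO ProductTests VALUES
        (1, 2, 22,  '2005-01-21', 10),
        (1, 2, 42,  '2007-03-17', 10),
        (1, 2, 45,  '2010-12-25', 10),
        (1, 2, 325, '2011-01-14', 13),
        (1, 2, 895, '2011-08-10', 15),
        (1, 2, 111, '2011-12-23', 15),
        (1, 2, 636, '2012-05-02', 10),
        (1, 2, 554, '2012-11-08', 17);
""")

rows = conn.execute("""
    SELECT TestID, OutcomeID,
           ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID
                              ORDER BY TestDate) AS rn_all,
           ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID
                              ORDER BY TestDate) AS rn_outcome
    FROM ProductTests
    ORDER BY TestDate
""").fetchall()

# grp is constant within a run of consecutive rows with the same OutcomeID
grp_values = [rn_all - rn_outcome for _, _, rn_all, rn_outcome in rows]
for (test_id, outcome_id, rn_all, rn_outcome), grp in zip(rows, grp_values):
    print(test_id, outcome_id, rn_all, rn_outcome, grp)
```

In this sample, tests 325 (OutcomeID 13) and 636 (OutcomeID 10) end up with the same grp value, which is why OutcomeID has to stay in the GROUP BY alongside grp.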
Obviously, one thing is still missing, and it's EndDate, and now is the right time to care about it. Use ROW_NUMBER() once again, to rank the result set of the grouped CTE, then use the rankings in the join condition when joining the result set with itself (using an outer join):
WITH partitioned AS (
SELECT
*,
grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
- ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
FROM #ProductTests
)
, grouped AS (
SELECT
RequestID,
ProductID,
StartDate = MIN(TestDate),
OutcomeID,
rnk = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY MIN(TestDate))
FROM partitioned
GROUP BY
RequestID,
ProductID,
OutcomeID,
grp
)
SELECT
g1.RequestID,
g1.ProductID,
g1.StartDate,
g2.StartDate AS EndDate,
g1.OutcomeID
FROM grouped g1
LEFT JOIN grouped g2
ON g1.RequestID = g2.RequestID
AND g1.ProductID = g2.ProductID
AND g1.rnk = g2.rnk - 1
;
You can try this query at SQL Fiddle to verify that it returns the output you are after.
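As a possible alternative to the self-join, SQL Server 2012 and later also provide LEAD, which can take the next group's StartDate directly. The sketch below is not the query from the answer; it is a variation on it, run through Python's sqlite3 module (with ProductTests in place of the #ProductTests temp table and AS aliases in place of the name = expression form), so treat it as an illustration under those assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductTests (
        RequestID INTEGER, ProductID INTEGER, TestID INTEGER,
        TestDate TEXT, OutcomeID INTEGER
    );
    INSERT INTO ProductTests VALUES
        (1, 2, 22,  '2005-01-21', 10),
        (1, 2, 42,  '2007-03-17', 10),
        (1, 2, 45,  '2010-12-25', 10),
        (1, 2, 325, '2011-01-14', 13),
        (1, 2, 895, '2011-08-10', 15),
        (1, 2, 111, '2011-12-23', 15),
        (1, 2, 636, '2012-05-02', 10),
        (1, 2, 554, '2012-11-08', 17);
""")

islands = conn.execute("""
    WITH partitioned AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID
                                  ORDER BY TestDate)
             - ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID
                                  ORDER BY TestDate) AS grp
        FROM ProductTests
    ),
    grouped AS (
        SELECT RequestID, ProductID, MIN(TestDate) AS StartDate, OutcomeID
        FROM partitioned
        GROUP BY RequestID, ProductID, OutcomeID, grp
    )
    SELECT RequestID, ProductID, StartDate,
           -- the next group's StartDate becomes this group's EndDate
           LEAD(StartDate) OVER (PARTITION BY RequestID, ProductID
                                 ORDER BY StartDate) AS EndDate,
           OutcomeID
    FROM grouped
    ORDER BY StartDate
""").fetchall()
for row in islands:
    print(row)
```

This avoids ranking the grouped rows a second time and joining the CTE to itself, at the cost of requiring a version that supports LEAD.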