OVER PARTITION BY without ordering [duplicate]

This question already has answers here:
SQL Query to add row number that resets everytime the column value changes (SQL Server 2014)
(2 answers)
Reset Row Number on value change, but with repeat values in partition
(2 answers)
Closed 1 year ago.
I have a table from which I would like to retrieve only the most recent record within each group of records, as marked by the value in a particular field.
The table content looks something like this:
state date
512 2021-03-09 11:31:38.300
512 2021-03-09 11:31:38.300
512 2021-03-09 11:31:31.693
512 2021-03-09 11:31:31.693
512 2021-03-08 12:49:10.753
512 2021-03-08 12:35:47.357
514 2021-03-08 12:35:01.030
512 2021-03-08 12:33:48.050
514 2021-03-08 12:14:29.537
514 2021-03-08 12:14:29.537
514 2021-03-08 12:14:18.760
512 2021-03-08 12:14:05.597
I would like to use OVER and PARTITION to SELECT output like this:
row state date
1 512 2021-03-09 11:31:38.300
2 512 2021-03-09 11:31:38.300
3 512 2021-03-09 11:31:31.693
4 512 2021-03-09 11:31:31.693
5 512 2021-03-08 12:49:10.753
6 512 2021-03-08 12:35:47.357
1 514 2021-03-08 12:35:01.030
1 512 2021-03-08 12:33:48.050
1 514 2021-03-08 12:14:29.537
2 514 2021-03-08 12:14:29.537
3 514 2021-03-08 12:14:18.760
1 512 2021-03-08 12:14:05.597
As you can see, the rows are ordered by date DESC and the state field is grouped by virtue of the row field starting at 1 for each change in the state field.
Currently, my code looks like this:
with query as
(
select state, date, row = row_number() over (partition by state order by date desc)
from table
)
select t.*
from table t
inner join query q on t.state = q.state and t.date = q.date
where row = 1
order by t.date desc
Unfortunately, this appears to group the records by state before ordering them by date DESC, so the result contains only two records, one for each distinct value in the state field. For the example data above, there should be 5 records in the result set.
How can I number the partition groups properly?

This works, though there may be a simpler way to do it.
WITH
cte1 AS
(
    SELECT
        ROW_NUMBER() OVER(ORDER BY date_ DESC) as dateRow,
        ROW_NUMBER() OVER(PARTITION BY state ORDER BY date_ DESC) as stateRow,
        state,
        date_
    FROM StateDate
)
SELECT
    ROW_NUMBER() OVER(PARTITION BY state, (dateRow - stateRow) ORDER BY date_ DESC) as row,
    state, date_
FROM cte1
ORDER BY date_ DESC, state
Here's the data setup I used:
CREATE TABLE StateDate( state INT, date_ DATETIME)
GO
--state date
INSERT INTO StateDate VALUES (512, '2021-03-09 11:31:38.300');
INSERT INTO StateDate VALUES (512, '2021-03-09 11:31:38.300');
INSERT INTO StateDate VALUES (512, '2021-03-09 11:31:31.693');
INSERT INTO StateDate VALUES (512, '2021-03-09 11:31:31.693');
INSERT INTO StateDate VALUES (512, '2021-03-08 12:49:10.753');
INSERT INTO StateDate VALUES (512, '2021-03-08 12:35:47.357');
INSERT INTO StateDate VALUES (514, '2021-03-08 12:35:01.030');
INSERT INTO StateDate VALUES (512, '2021-03-08 12:33:48.050');
INSERT INTO StateDate VALUES (514, '2021-03-08 12:14:29.537');
INSERT INTO StateDate VALUES (514, '2021-03-08 12:14:29.537');
INSERT INTO StateDate VALUES (514, '2021-03-08 12:14:18.760');
INSERT INTO StateDate VALUES (512, '2021-03-08 12:14:05.597');
GO
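For anyone who wants to check the numbering without a SQL Server instance, here is a minimal sketch of the same row-number-difference idea, run against an in-memory SQLite database from Python. SQLite (3.25 or later, for window functions), the id column, and the rn alias are assumptions of this sketch and not part of the answer above. The id column is used as a tie-breaker because the sample data contains duplicate timestamps, and the two ROW_NUMBER() calls are only guaranteed to agree on the order of tied rows when the ORDER BY is unique.

```python
import sqlite3

rows = [
    (512, "2021-03-09 11:31:38.300"), (512, "2021-03-09 11:31:38.300"),
    (512, "2021-03-09 11:31:31.693"), (512, "2021-03-09 11:31:31.693"),
    (512, "2021-03-08 12:49:10.753"), (512, "2021-03-08 12:35:47.357"),
    (514, "2021-03-08 12:35:01.030"), (512, "2021-03-08 12:33:48.050"),
    (514, "2021-03-08 12:14:29.537"), (514, "2021-03-08 12:14:29.537"),
    (514, "2021-03-08 12:14:18.760"), (512, "2021-03-08 12:14:05.597"),
]

conn = sqlite3.connect(":memory:")
# id is a hypothetical surrogate key, added only to break ties deterministically.
conn.execute(
    "CREATE TABLE StateDate (id INTEGER PRIMARY KEY, state INT, date_ TEXT)"
)
conn.executemany("INSERT INTO StateDate (state, date_) VALUES (?, ?)", rows)

ISLANDS_SQL = """
WITH cte1 AS (
    SELECT id, state, date_,
           ROW_NUMBER() OVER (ORDER BY date_ DESC, id) AS dateRow,
           ROW_NUMBER() OVER (PARTITION BY state
                              ORDER BY date_ DESC, id) AS stateRow
    FROM StateDate
)
SELECT id, state, date_,
       -- dateRow - stateRow is constant within each consecutive run of a state
       ROW_NUMBER() OVER (PARTITION BY state, dateRow - stateRow
                          ORDER BY date_ DESC, id) AS rn
FROM cte1
ORDER BY date_ DESC, id
"""

numbered = conn.execute(ISLANDS_SQL).fetchall()

# The question ultimately wants the most recent record of each run: rn = 1.
latest_per_run = [(state, date_) for _id, state, date_, rn in numbered if rn == 1]

for row in numbered:
    print(row)
```

Keeping only the rows where rn is 1 leaves five records for the sample data, which is the count the question asks for.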

Related

new table from two different select column results (from the same table) in postgresql

Any help is much appreciated. I have a table named lifelong, as shown below.
id first_meal last_meal
0 1 2022-07-25 12:28:00 2022-07-25 20:06:00
1 2 2022-07-26 13:12:00 2022-07-26 19:09:00
2 3 2022-07-27 14:13:00 2022-07-27 20:13:00
3 4 2022-07-28 15:10:00 2022-07-28 21:22:00
I skip one row from column first_meal with
select f.id, f.first_meal from lifelong f offset 1;
I select all from the column last_meal with
select id, last_meal from lifelong
Now I need to save these two different selects from the same table as a new table.
I tried UNION and UNION ALL, and I tried putting the two selects in parentheses, with no result so far.
I expect output like the following:
id first_meal last_meal
0 1 2022-07-26 13:12:00 2022-07-25 20:06:00
1 2 2022-07-27 14:13:00 2022-07-26 19:09:00
2 3 2022-07-28 15:10:00 2022-07-27 20:13:00
3 4 NaN 2022-07-28 21:22:00

I would suggest using ROW_NUMBER here:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY first_meal DESC) rn
FROM lifelong
)
INSERT INTO newTable (id, first_meal, last_meal)
SELECT id, CASE WHEN rn > 1 THEN first_meal END, last_meal
FROM cte;
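Note that the expected output in the question pairs each row's last_meal with the first_meal of the following row. One possible way to get that shift, offered as an alternative sketch rather than as the method above, is the LEAD window function, which PostgreSQL also provides. The sketch below runs it in an in-memory SQLite database from Python; SQLite as a stand-in, the target table name new_lifelong, and the assumption that id defines the row order are all mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lifelong (id INTEGER, first_meal TEXT, last_meal TEXT)")
conn.executemany(
    "INSERT INTO lifelong VALUES (?, ?, ?)",
    [
        (1, "2022-07-25 12:28:00", "2022-07-25 20:06:00"),
        (2, "2022-07-26 13:12:00", "2022-07-26 19:09:00"),
        (3, "2022-07-27 14:13:00", "2022-07-27 20:13:00"),
        (4, "2022-07-28 15:10:00", "2022-07-28 21:22:00"),
    ],
)

# new_lifelong is a hypothetical table name.
# LEAD(first_meal) reads first_meal from the next row in id order;
# the last row has no next row, so it gets NULL.
conn.execute(
    """
    CREATE TABLE new_lifelong AS
    SELECT id,
           LEAD(first_meal) OVER (ORDER BY id) AS first_meal,
           last_meal
    FROM lifelong
    """
)

shifted = conn.execute(
    "SELECT id, first_meal, last_meal FROM new_lifelong ORDER BY id"
).fetchall()

for row in shifted:
    print(row)
```

The NULL in the last row corresponds to the NaN shown in the expected output.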

results of except query change after repeated consecutive runs in redshift

I'm running the PostgreSQL query below in AWS Redshift. Each time I run it, the EXCEPT operator reports a different number of records that differ on the daily_table.product_repeat_sub_query side. Neither the daily_table.product_repeat_sub_query table nor the daily_table.daily_sku_t table is being updated during this time. The daily_table.product_repeat_sub_query table and the product_repeat_sub_query query both have the same record count. The schema for daily_table.daily_sku_t is below; the matching fields in daily_table.product_repeat_sub_query have the same data types. I've also included some sample records from the tables. Does anyone have an idea how the results of the EXCEPT query can come out differently on each run when the underlying tables aren't changing?
daily_table.daily_sku_t schema:
customer_uuid string
boardname_12 string
producttype string
productsubtype string
storeid int
product_id string
dateclosed date
Size string
query:
with product_repeat_sub_query as
(
    select
        dateclosed, t.product_id, t.storeid, t.producttype, t.productsubtype, t.size, t.boardname_12,
        case
            when ticketid = first_value(ticketid) over (
                     partition by t.product_id, customer_uuid
                     ORDER BY dateclosed ASC
                     rows between unbounded preceding and unbounded following) then 0
            else grossreceipts
        end as product_repeat_gross,
        datediff(day,
                 lag(dateclosed, 1) over (
                     partition by t.boardname_12, customer_uuid, t.product_id
                     ORDER BY dateclosed ASC),
                 dateclosed) as product_cycle_days
    from daily_table.daily_sku_t t
)
select count(*) from
(
    select dateclosed, storeid, boardname_12, producttype, productsubtype, size, product_id, product_cycle_days from daily_table.product_repeat_sub_query
    except
    select dateclosed, storeid, boardname_12, producttype, productsubtype, size, product_id, product_cycle_days from product_repeat_sub_query
);
-- 36843
-- 36887
-- 36188
data:
daily_table.product_repeat_sub_query
dateclosed storeid boardname_12 producttype productsubtype size product_id product_cycle_days
2021-04-23 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 2
2021-04-24 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 6
2021-04-26 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 8
2021-04-26 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 3
2021-05-01 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 13
2020-06-18 61 FLAV RX WINGER BEVERAGE 100MT 0000265d-6b81-4d79-90cf-xxxxxxxxxxxx 5
2020-06-29
product_repeat_sub_query
dateclosed storeid boardname_12 producttype productsubtype size product_id product_cycle_days
2021-04-23 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 2
2021-04-24 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 6
2021-04-26 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 8
2021-04-26 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 3
2021-05-01 427 22RED DRUMER 1T 000011aa-4f03-4f0b-a621-xxxxxxxxxxxx 13
2020-06-18 61 FLAV RX WINGER BEVERAGE 100MT 0000265d-6b81-4d79-90cf-xxxxxxxxxxxx 5
2020-06-29
update:
with product_repeat_sub_query as
(
    select
        customer_uuid,
        dateclosed, t.product_id, t.storeid, t.producttype, t.productsubtype, t.size, t.boardname_12,
        case
            when ticketid = first_value(ticketid) over (
                     partition by t.product_id, customer_uuid
                     ORDER BY dateclosed ASC
                     rows between unbounded preceding and unbounded following) then 0
            else grossreceipts
        end as product_repeat_gross,
        datediff(day,
                 lag(dateclosed, 1) over (
                     partition by t.boardname_12, customer_uuid, t.product_id
                     ORDER BY dateclosed ASC, t.boardname_12, customer_uuid, t.product_id),
                 dateclosed) as product_cycle_days
    from daily_table.daily_sku_t t
    where (t.customer_uuid is not null)
      and (trim(t.customer_uuid) != '')
      and (t.product_id is not null)
      and (trim(t.product_id) != '')
)
select count(*) from
(
    select customer_uuid, dateclosed, storeid, boardname_12, producttype, productsubtype, size, product_id, product_cycle_days from daily_table.product_repeat_sub_query
    except
    select customer_uuid, dateclosed, storeid, boardname_12, producttype, productsubtype, size, product_id, product_cycle_days from product_repeat_sub_query
);
Even after adding all the fields from the partition to the ORDER BY and filtering out NULLs or blanks in the ID fields, I'm still getting a different count each time.

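The ORDER BY clauses in your window functions don't define a unique row order: you have repeated "dateclosed" values within partitions. This means that Redshift can order the tied rows differently on each run, so the lag and first_value functions can see different rows. Adding the partition columns to the ORDER BY doesn't help, because they are constant within a partition and so cannot break ties. I expect that these "random" ordering differences are causing your changing results.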
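The effect can be reproduced on a tiny scale. The sketch below is an illustration under stated assumptions, not Redshift itself: it uses an in-memory SQLite database from Python, a hypothetical three-row table named sku with a trimmed-down column list, and julianday arithmetic in place of datediff. Two rows share the same dateclosed within one partition, and the two possible tie orders are forced explicitly with ticketid ASC and ticketid DESC. Both orders are valid results for ORDER BY dateclosed alone, yet they produce different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sku is a hypothetical, cut-down version of daily_table.daily_sku_t.
conn.execute(
    "CREATE TABLE sku (ticketid INT, customer_uuid TEXT, product_id TEXT,"
    " storeid INT, dateclosed TEXT)"
)
conn.executemany(
    "INSERT INTO sku VALUES (?, ?, ?, ?, ?)",
    [
        (1, "c1", "p1", 427, "2021-04-23"),
        (2, "c1", "p1", 427, "2021-04-26"),  # same dateclosed as ticket 3
        (3, "c1", "p1", 61, "2021-04-26"),
    ],
)


def cycle_days(order_by):
    """Return the set of (dateclosed, storeid, product_cycle_days) rows."""
    sql = f"""
        SELECT dateclosed, storeid,
               CAST(julianday(dateclosed)
                    - julianday(LAG(dateclosed) OVER (
                          PARTITION BY customer_uuid, product_id
                          ORDER BY {order_by})) AS INTEGER) AS product_cycle_days
        FROM sku
    """
    return set(conn.execute(sql).fetchall())


# The two tie orders an engine may pick for "ORDER BY dateclosed".
ties_one_way = cycle_days("dateclosed, ticketid ASC")
ties_other_way = cycle_days("dateclosed, ticketid DESC")

# Rows present in one result but not the other, as EXCEPT would report them.
only_in_first = ties_one_way - ties_other_way
print(sorted(only_in_first))
```

With a unique column such as ticketid in the ORDER BY, the order is fixed and repeated runs agree. Whether ticketid is actually unique within a partition in the real table is something to verify.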

Select rows with second highest value for each ID repeated multiple times

Id values
1 10
1 20
1 30
1 40
2 3
2 9
2 0
3 14
3 5
3 7
Answer should be
Id values
1 30
2 3
3 7
I tried as below
Select distinct
id,
(select max(values)
from table
where values not in (select max(values) from table)
)

You need the row_number window function. This adds a column that numbers the rows within each group (in your case, each id), ordered as you specify. In a subquery you can then ask for the second row of each group.
SELECT
id, values
FROM (
SELECT
*,
row_number() OVER (PARTITION BY id ORDER BY values DESC)
FROM
table
) s
WHERE row_number = 2
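As a runnable check of this approach, here is a minimal sketch using an in-memory SQLite database from Python. The table name scores, the column name val, and the alias rn are hypothetical; they are used because values and table are reserved words that would need quoting, and because an explicit alias does not rely on the engine's default column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# scores and val are hypothetical names standing in for "table" and "values".
conn.execute("CREATE TABLE scores (id INT, val INT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 30), (1, 40),
     (2, 3), (2, 9), (2, 0),
     (3, 14), (3, 5), (3, 7)],
)

second_highest = conn.execute(
    """
    SELECT id, val
    FROM (
        SELECT id, val,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY val DESC) AS rn
        FROM scores
    ) s
    WHERE rn = 2      -- second row of each id when sorted high to low
    ORDER BY id
    """
).fetchall()

print(second_highest)
```

If an id can contain repeated values and ties should count as one rank, DENSE_RANK() could be swapped in for ROW_NUMBER(), though it may then return more than one row per id.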

PGSQL duplicate record in same column

I have a table and I want to find where duplicate records are present for the same columns. These are my columns, and I want to get the record where group_id or week differs for the same code, fweek and newcode.
Id newcode fweek code group_id week
1 343001 2016-01 343 100 8
2 343002 2016-01 343 100 8
3 343001 2016-01 343 101 08
Required record is
Id newcode fweek code group_id week
3 343001 2016-01 343 101 08

To find the duplicate values, I joined the table with itself.
The results need to be grouped by code, fweek and newcode so that more than one set of duplicate rows is returned if they exist. I used max() to get the last inserted row.
You don't strictly need IS DISTINCT FROM: it behaves like an inequality test that also treats NULL as a comparable value. If you don't want to compare NULLs, use the <> operator instead.
select r.*
from your_table r
where r.id in (select max(r.id)
               from your_table r
               join your_table r2 on r2.code = r.code and r2.fweek = r.fweek and r2.newcode = r.newcode
               where
                   r2.group_id is distinct from r.group_id or
                   r2.week is distinct from r.week
               group by r.code,
                        r.fweek,
                        r.newcode
               having count(*) > 1)
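To make the NULL remark concrete, here is a small sketch run in an in-memory SQLite database from Python. Several things are assumptions of the sketch: SQLite's IS NOT operator is used as the null-safe counterpart of PostgreSQL's IS DISTINCT FROM, all columns are stored as text so that '8' and '08' differ, and rows 4 and 5 are hypothetical additions to the question's data that differ only by a NULL group_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INT, newcode TEXT, fweek TEXT, code TEXT,"
    " group_id TEXT, week TEXT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "343001", "2016-01", "343", "100", "8"),
        (2, "343002", "2016-01", "343", "100", "8"),
        (3, "343001", "2016-01", "343", "101", "08"),
        # Hypothetical extra rows: same key, group_id NULL vs '100'.
        (4, "343003", "2016-01", "343", None, "8"),
        (5, "343003", "2016-01", "343", "100", "8"),
    ],
)


def find_conflicting_ids(op):
    """Run the self-join with the given comparison operator ('IS NOT' or '<>')."""
    sql = f"""
        SELECT r.id
        FROM your_table r
        WHERE r.id IN (
            SELECT MAX(a.id)
            FROM your_table a
            JOIN your_table b
              ON b.code = a.code AND b.fweek = a.fweek AND b.newcode = a.newcode
            WHERE b.group_id {op} a.group_id
               OR b.week {op} a.week
            GROUP BY a.code, a.fweek, a.newcode
            HAVING COUNT(*) > 1)
        ORDER BY r.id
    """
    return [row[0] for row in conn.execute(sql)]


null_safe = find_conflicting_ids("IS NOT")  # NULL counts as a distinct value
plain = find_conflicting_ids("<>")          # comparisons with NULL yield NULL

print(null_safe, plain)
```

Both variants report id 3 from the question's data; only the null-safe comparison also reports the hypothetical pair that differs by a NULL.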

Derive EndDate of Current Row From StartDate of Next Row

Can someone please help me with how to create an end date from a start date?
Products are referred to a company for testing. While a product is with the company, they carry out multiple tests on different dates and record the test date to establish the product condition, i.e. the OutcomeID.
I need to establish the StartDate, which is the TestDate, and the EndDate, which is the start date of the next row. But if multiple consecutive tests resulted in the same OutcomeID, I need to return only one row with the StartDate of the first test and the EndDate of the last test. In other words, consecutive tests over which the OutcomeID did not change should be collapsed into one row.
Here is my data set
CREATE TABLE #ProductTests
(
RequestID int not null,
ProductID int not null,
TestID int not null,
TestDate datetime null,
OutcomeID int
)
insert into #ProductTests
(RequestID ,ProductID ,TestID ,TestDate ,OutcomeID )
select 1,2,22,'2005-01-21',10
union all
select 1,2,42,'2007-03-17',10
union all
select 1,2,45,'2010-12-25',10
union all
select 1,2,325,'2011-01-14',13
union all
select 1,2,895,'2011-08-10',15
union all
select 1,2,111,'2011-12-23',15
union all
select 1,2,636,'2012-05-02',10
union all
select 1,2,554,'2012-11-08',17
--select *from #producttests
RequestID ProductID TestID TestDate OutcomeID
1 2 22 2005-01-21 10
1 2 42 2007-03-17 10
1 2 45 2010-12-25 10
1 2 325 2011-01-14 13
1 2 895 2011-08-10 15
1 2 111 2011-12-23 15
1 2 636 2012-05-02 10
1 2 554 2012-11-08 17
And this is what I need to achieve.
RequestID ProductID StartDate EndDate OutcomeID
1 2 2005-01-21 2011-01-14 10
1 2 2011-01-14 2011-08-10 13
1 2 2011-08-10 2012-05-02 15
1 2 2012-05-02 2012-11-08 10
1 2 2012-11-08 NULL 17
As you can see from the data set, the first three tests (22, 42, and 45) all resulted in OutcomeID 10, so in my result I only need the start date of test 22 and the end date of test 45, which is the start date of test 325. As you can also see, in test 636 the OutcomeID has gone back to 10 from 15, so it needs to be returned too.
--This is what I have managed to achieve at the moment using the following script
select T1.RequestID,T1.ProductID,T1.TestDate AS StartDate
,MIN(T2.TestDate) AS EndDate ,T1.OutcomeID
from #producttests T1
left join #ProductTests T2 ON T1.RequestID=T2.RequestID
and T1.ProductID=T2.ProductID and T2.TestDate>T1.TestDate
group by T1.RequestID,T1.ProductID ,T1.OutcomeID,T1.TestDate
order by T1.TestDate
Result:
RequestID ProductID StartDate EndDate OutcomeID
1 2 2005-01-21 2007-03-17 10
1 2 2007-03-17 2010-12-25 10
1 2 2010-12-25 2011-01-14 10
1 2 2011-01-14 2011-08-10 13
1 2 2011-08-10 2011-12-23 15
1 2 2011-12-23 2012-05-02 15
1 2 2012-05-02 2012-11-08 10
1 2 2012-11-08 NULL 17

It's Nov 7 and this is still not answered, so here is my solution.
It's not so pretty, but it works.
My hint is to read about windowing, ranking and aggregate functions like row_number, rank, avg, sum, etc.
Those are essential when you want to write reports, and they have become quite powerful in SQL Server 2012.
I have also used a CTE (common table expression), but it can be written as a subquery or a temporary table.
;with cte ( ida, requestid, productid, testid, testdate, outcomeid) as
(
-- select rows where the outcome id is changing
select b.* from
(select ROW_NUMBER() over( partition by requestid, productid order by testDate) as id, * from #ProductTests)a
right outer join
(select ROW_NUMBER() over(partition by requestid, productid order by testDate) as id, * from #ProductTests) b
on a.requestID = b.requestID and a.productID = b.productID and a.id +1 = b.id
where 1=1
--or a.id = 1
and a.outcomeid <> b.outcomeid or b.outcomeid is null or a.id is null
)
select --*
a.RequestID,a.ProductID,a.TestDate AS StartDate ,MIN(b.TestDate) AS EndDate ,a.OutcomeID
from cte a left join cte b on a.requestid = b.requestid and a.productid = b.productid and a.testdate < b.testdate
group by a.RequestID,a.ProductID ,a.OutcomeID,a.TestDate
order by StartDate

Actually, there seem to be two problems in your question. One is how to group sequential (based on specific criteria) rows containing the same value. The other is the one actually spelled out in your title, i.e. how to use the next row's StartDate as the current row's EndDate.
Personally, I would solve these two problems in the order I mentioned them, so I would first address the grouping problem. One way to group the data properly in this case would be to use double ranking like this:
WITH partitioned AS (
    SELECT
        *,
        grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
            - ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
    FROM #ProductTests
)
, grouped AS (
    SELECT
        RequestID,
        ProductID,
        StartDate = MIN(TestDate),
        OutcomeID
    FROM partitioned
    GROUP BY
        RequestID,
        ProductID,
        OutcomeID,
        grp
)
SELECT *
FROM grouped
;
This should give you the following output for your data sample:
RequestID ProductID StartDate OutcomeID
--------- --------- ---------- ---------
1 2 2005-01-21 10
1 2 2011-01-14 13
1 2 2011-08-10 15
1 2 2012-05-02 10
1 2 2012-11-08 17
Obviously, one thing is still missing, and it's EndDate, and now is the right time to care about it. Use ROW_NUMBER() once again, to rank the result set of the grouped CTE, then use the rankings in the join condition when joining the result set with itself (using an outer join):
WITH partitioned AS (
    SELECT
        *,
        grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY TestDate)
            - ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
    FROM #ProductTests
)
, grouped AS (
    SELECT
        RequestID,
        ProductID,
        StartDate = MIN(TestDate),
        OutcomeID,
        rnk = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY MIN(TestDate))
    FROM partitioned
    GROUP BY
        RequestID,
        ProductID,
        OutcomeID,
        grp
)
SELECT
    g1.RequestID,
    g1.ProductID,
    g1.StartDate,
    g2.StartDate AS EndDate,
    g1.OutcomeID
FROM grouped g1
LEFT JOIN grouped g2
    ON g1.RequestID = g2.RequestID
    AND g1.ProductID = g2.ProductID
    AND g1.rnk = g2.rnk - 1
;
You can try this query at SQL Fiddle to verify that it returns the output you are after.
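As a possible variation, on SQL Server 2012 or later the self-join on rnk could be replaced with the LEAD window function. The sketch below is an assumption-laden check rather than the query above: it runs in an in-memory SQLite database from Python, uses a regular table named ProductTests instead of #ProductTests, and uses AS aliases instead of the T-SQL "name = expression" form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductTests (RequestID INT, ProductID INT, TestID INT,"
    " TestDate TEXT, OutcomeID INT)"
)
conn.executemany(
    "INSERT INTO ProductTests VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 22, "2005-01-21", 10),
        (1, 2, 42, "2007-03-17", 10),
        (1, 2, 45, "2010-12-25", 10),
        (1, 2, 325, "2011-01-14", 13),
        (1, 2, 895, "2011-08-10", 15),
        (1, 2, 111, "2011-12-23", 15),
        (1, 2, 636, "2012-05-02", 10),
        (1, 2, 554, "2012-11-08", 17),
    ],
)

RANGES_SQL = """
WITH partitioned AS (
    SELECT *,
           -- constant within each run of consecutive equal OutcomeID values
           ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID
                              ORDER BY TestDate)
         - ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID
                              ORDER BY TestDate) AS grp
    FROM ProductTests
),
grouped AS (
    SELECT RequestID, ProductID, MIN(TestDate) AS StartDate, OutcomeID
    FROM partitioned
    GROUP BY RequestID, ProductID, OutcomeID, grp
)
SELECT RequestID, ProductID, StartDate,
       -- the next group's StartDate becomes this group's EndDate
       LEAD(StartDate) OVER (PARTITION BY RequestID, ProductID
                             ORDER BY StartDate) AS EndDate,
       OutcomeID
FROM grouped
ORDER BY StartDate
"""

ranges = conn.execute(RANGES_SQL).fetchall()

for row in ranges:
    print(row)
```

The last group has no following row, so its EndDate comes back as NULL, matching the desired output in the question.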
