I am trying to generate a result set similar to the following table, but I could not achieve the goal. I want to number each row as shown in the 'I want' column of the table.
The following SQL generated the 'RowNbr' column. Any suggestions would be appreciated. Thank you.
SELECT Date, Nbr, status, ROW_NUMBER () over (partition by Date,status order by date asc) as RowNbr
This is a classic "gaps and islands" problem, in case you are searching for similar solutions in the future. Basically you want the counter to reset every time you hit a new status for a given Nbr, ordered by date.
This general technique was developed, I believe, by Itzik Ben-Gan, and he has tons of articles and book chapters about it.
;WITH cte AS
(
SELECT [Date], Nbr, [Status],
rn = ROW_NUMBER() OVER (PARTITION BY Nbr ORDER BY [Date])
- ROW_NUMBER() OVER (PARTITION BY Nbr,[Status] ORDER BY [Date])
FROM dbo.your_table_name
)
SELECT [Date], Nbr, [Status],
[I want] = ROW_NUMBER() OVER (PARTITION BY Nbr,[Status],rn ORDER BY [Date])
FROM cte
ORDER BY Nbr, [Date];
On SQL Server 2012 or later, you may be able to achieve something similar using LAG and LEAD; I made a few honest attempts but couldn't get to anything less complex than the above.
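To see the row-number difference at work, here is a minimal sketch using Python's built-in sqlite3 module (assuming the bundled SQLite is 3.25 or later, which added window functions). The table name readings, the lowercase column names, and the sample rows are hypothetical stand-ins for the asker's data. The final counter is partitioned by status as well as by the difference, because two islands with different statuses can end up with the same difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (dt TEXT, nbr INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        # Hypothetical sample: status flips A -> B -> A for nbr 1.
        ("2024-01-01", 1, "A"), ("2024-01-02", 1, "A"),
        ("2024-01-03", 1, "B"), ("2024-01-04", 1, "B"),
        ("2024-01-05", 1, "A"), ("2024-01-06", 1, "A"),
        ("2024-01-01", 2, "B"), ("2024-01-02", 2, "B"),
        ("2024-01-03", 2, "B"),
    ],
)

ISLAND_SQL = """
WITH cte AS (
    SELECT dt, nbr, status,
           -- The difference stays constant inside one run of equal statuses.
           ROW_NUMBER() OVER (PARTITION BY nbr ORDER BY dt)
         - ROW_NUMBER() OVER (PARTITION BY nbr, status ORDER BY dt) AS grp
    FROM readings
)
SELECT dt, nbr, status,
       ROW_NUMBER() OVER (PARTITION BY nbr, status, grp ORDER BY dt) AS counter
FROM cte
ORDER BY nbr, dt
"""

rows = conn.execute(ISLAND_SQL).fetchall()
counters = [r[3] for r in rows]
print(counters)  # [1, 2, 1, 2, 1, 2, 1, 2, 3]
```

In this sample the B island and the second A island of nbr 1 both get a difference of 2, so leaving status out of the final PARTITION BY would number those four rows 1 through 4 instead of restarting.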
I have a huge database with 15 tables.
I need to make a light version of it that keeps only the first 1000 rows of each table, ordered by date descending. I tried to find out how to do this on Google, but nothing really works.
It would be perfect if there were an automated way to go through each table and leave only 1000 rows.
But if I need to do that manually for each table, that is fine as well.
Thank you.
This looks positively awful, but maybe it's a starting point from which you can build.
with cte as (
select mod_date, row_number() over (order by mod_date desc) as rn
from table1
),
min_date as (
select mod_date
from cte
where rn = 1000
)
delete from table1 t1
where t1.mod_date < (select mod_date from min_date)
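For the "automated way to go through each table" part, a rough sketch of the same idea wrapped in a loop follows. It uses Python's sqlite3 module (assuming SQLite 3.25 or later for window functions) rather than the asker's database, and the helper keep_newest, the table names, and the sample dates are all hypothetical. Rows that tie with the cutoff date are kept, so a table can end up with slightly more rows than requested.

```python
import sqlite3

def keep_newest(conn, table, date_col, keep):
    """Delete every row older than the keep-th newest row of one table."""
    # Identifiers cannot be bound as parameters, so table and date_col
    # must come from a trusted list, never from user input.
    conn.execute(
        f"""
        WITH ranked AS (
            SELECT {date_col} AS d,
                   ROW_NUMBER() OVER (ORDER BY {date_col} DESC) AS rn
            FROM {table}
        )
        DELETE FROM {table}
        WHERE {date_col} < (SELECT d FROM ranked WHERE rn = ?)
        """,
        (keep,),
    )

conn = sqlite3.connect(":memory:")
for name in ("table1", "table2"):  # hypothetical table names
    conn.execute(f"CREATE TABLE {name} (mod_date TEXT)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?)",
        [(f"2024-01-0{day}",) for day in range(1, 6)],
    )
    keep_newest(conn, name, "mod_date", 3)  # keep 3 instead of 1000 for the demo

remaining = [r[0] for r in conn.execute("SELECT mod_date FROM table1 ORDER BY mod_date")]
table2_count = conn.execute("SELECT COUNT(*) FROM table2").fetchone()[0]
```

If a table holds fewer rows than the limit, the subquery returns NULL, the comparison is never true, and nothing is deleted.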
So the solution is:
DELETE FROM "table" WHERE "date" < now() - interval '1 year';
That way it will delete all rows from the table where the date is older than 1 year.
I have the query below, but sometimes the 'code' value is not available and then I have to use 'new_code' instead.
My question is: how can I change the query below to prioritize code, but use new_code when it is missing? Is it possible?
with cte as (select *,
row_number() over (partition by code order by price) rn_low,
row_number() over (partition by code order by price DESC) rn_high
from t
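One possible approach, sketched below with Python's sqlite3 module (assuming SQLite 3.25 or later), is to partition by COALESCE(code, new_code), which falls back to new_code whenever code is NULL. This assumes "not available" means NULL; the NULLIF(code, '') wrapper is a guess in case missing codes are stored as empty strings. The sample rows and the grp_code alias are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (code TEXT, new_code TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("A", "X", 10),
        ("A", "X", 20),
        (None, "B", 5),  # code missing as NULL
        ("", "B", 7),    # code missing as empty string (assumption)
    ],
)

FALLBACK_SQL = """
WITH cte AS (
    -- Prefer code; fall back to new_code when code is NULL or empty.
    SELECT COALESCE(NULLIF(code, ''), new_code) AS grp_code, price
    FROM t
)
SELECT grp_code, price,
       ROW_NUMBER() OVER (PARTITION BY grp_code ORDER BY price)      AS rn_low,
       ROW_NUMBER() OVER (PARTITION BY grp_code ORDER BY price DESC) AS rn_high
FROM cte
ORDER BY grp_code, price
"""

result = conn.execute(FALLBACK_SQL).fetchall()
```

Computing the fallback once inside the CTE keeps both window definitions partitioned on the same value.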
I had a spreadsheet that looked like a prior "group by" had left many rows blank where I needed them to be filled with the data above it (see example picture below). I needed each account number to fill all the cells beneath it until the start of the next account number (i.e., A1234 needs to be in all the cells up to B4325, B4325 needs to be in all the cells up to C3452 and so on).
From this Stack Exchange answer by benjamin berhault I found this code and tailored it to my problem:
SELECT rn, acct, FIRST_VALUE(acct) OVER(PARTITION BY grp)
FROM (SELECT rn, acct, SUM(CASE WHEN acct <> '' THEN 1 END) OVER (ORDER BY rn) AS grp
FROM
(SELECT ROW_NUMBER() OVER() rn
, acct
FROM dataset AS d) AS sub1 ) sub2;
What I don't understand about this query is the ORDER BY clause in this part
SUM(CASE WHEN acct <> '' THEN 1 END) OVER (ORDER BY rn) AS grp
This whole line successfully creates a new grp column that is all 1's for the first account, all 2's for the second account, and so on. From there, the main query can use FIRST_VALUE with PARTITION BY grp to get the result I am looking for. What I do not understand is why ORDER BY rn causes the column to sum in that manner. I would have thought a PARTITION BY would be needed there, but that does not work.
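A small sketch may help show why ORDER BY alone is enough. When a window has an ORDER BY but no explicit frame, the default frame runs from the first row of the partition up to the current row, so SUM becomes a running total; with no PARTITION BY, the whole table is one partition. A PARTITION BY acct would instead restart the sum for every distinct value, which is not what is needed here. The example uses Python's sqlite3 module (assuming SQLite 3.25 or later), and the explicit rn values and account rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (rn INTEGER, acct TEXT)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?)",
    [(1, "A1234"), (2, ""), (3, ""), (4, "B4325"), (5, "")],
)

FILL_SQL = """
SELECT rn, grp,
       FIRST_VALUE(acct) OVER (PARTITION BY grp ORDER BY rn) AS filled
FROM (
    -- Running total: goes up by 1 at every non-empty acct, stays flat otherwise.
    SELECT rn, acct,
           SUM(CASE WHEN acct <> '' THEN 1 END) OVER (ORDER BY rn) AS grp
    FROM dataset
) AS sub
ORDER BY rn
"""

rows = conn.execute(FILL_SQL).fetchall()
groups = [r[1] for r in rows]
filled = [r[2] for r in rows]
print(groups)  # [1, 1, 1, 2, 2]
print(filled)  # ['A1234', 'A1234', 'A1234', 'B4325', 'B4325']
```

Blank rows contribute NULL to the CASE expression, which SUM ignores, so the total only changes on rows that start a new account.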
Every month we receive a roster which we run queries on and then generate data that gets uploaded into a table for an outside source to retrieve. My question is: what would be the easiest way to remove the duplicate data from the prior month's upload, bearing in mind that not all data is duplicated and that if a person does not appear on the new roster, their prior month's data needs to remain? The data is timestamped when it gets uploaded.
Thank you
You can use a cte and Row_Number() to identify and remove dupes
;with cte as (
Select *
,RN = Row_Number() over (Partition By SomeKeyField(s) Order By SomeDate Desc)
From YourTable
)
Select * -- << Remove if Satisfied
-- Delete -- << Remove Comment if Satisfied
From cte
Where RN>1
Without seeing your data structure, take a hard look at the Partition By and Order By within the OVER clause of Row_Number()
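As a rough illustration of how the Partition By and Order By choices play out for the roster case, here is a sketch in Python's sqlite3 module (assuming SQLite 3.25 or later). SQLite cannot delete through a CTE the way the SQL Server query above does, so this adaptation deletes by rowid instead. The uploads table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads (person_id INTEGER, uploaded_at TEXT)")
conn.executemany(
    "INSERT INTO uploads VALUES (?, ?)",
    [
        (1, "2024-01-05"), (1, "2024-02-05"),  # on both rosters
        (2, "2024-01-05"),                     # dropped off the new roster
        (3, "2024-02-05"),                     # new on the latest roster
    ],
)

DEDUPE_SQL = """
DELETE FROM uploads
WHERE rowid IN (
    SELECT rid
    FROM (
        -- rn = 1 marks the newest upload per person; older ones get rn > 1.
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (
                   PARTITION BY person_id ORDER BY uploaded_at DESC
               ) AS rn
        FROM uploads
    ) AS ranked
    WHERE rn > 1
)
"""

conn.execute(DEDUPE_SQL)
remaining = conn.execute(
    "SELECT person_id, uploaded_at FROM uploads ORDER BY person_id"
).fetchall()
```

Person 2 appears only in the older upload, so that row has rn = 1 and survives, which matches the requirement that people missing from the new roster keep their prior month.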
A short and effective way is to delete via a derived table.
delete from f
from (
select *, row_number() over (partition by col order by (select 0)) rn
from tbl) f
where rn > 1
But the most effective way is to remove duplicates on input and prevent them (for example, with a unique constraint).
Is there any option to get the average of the same values using the RANK() function in PostgreSQL? Here is the example of what I want to do:
This query will do the trick for you
SELECT
test_score,
row_number() OVER (ORDER BY test_score) AS rank,
rank() OVER (ORDER BY test_score)
+ (count(*) OVER (PARTITION BY test_score) - 1) / 2.0 AS "rank (with tied)"
FROM scores
Remarks:
What you believe is the "rank" is really the row_number() (i.e. a consecutive series of positive integers with no gaps and no duplicates).
That rank "with tied" that you're looking for can be calculated from the real rank() (rank with gaps) + the number of other elements of the same rank divided by two. This is a faster shortcut to calculate the average row_number() given your specific requirements.
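To check that the shortcut really matches the average row number, here is a small sketch in Python's sqlite3 module (assuming SQLite 3.25 or later). The sample scores and the column aliases tied_rank and avg_rn are hypothetical. For a group of n tied rows starting at rank r, the row numbers are r through r + n - 1, whose mean is r + (n - 1) / 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (test_score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?)",
    [(50,), (60,), (60,), (70,), (70,), (70,)],
)

COMPARE_SQL = """
SELECT test_score,
       -- Shortcut: rank with gaps plus half the number of other tied rows.
       RANK() OVER (ORDER BY test_score)
         + (COUNT(*) OVER (PARTITION BY test_score) - 1) / 2.0 AS tied_rank,
       -- Direct route: average the row numbers inside each tie group.
       AVG(rn) OVER (PARTITION BY test_score) AS avg_rn
FROM (
    SELECT test_score,
           ROW_NUMBER() OVER (ORDER BY test_score) AS rn
    FROM scores
) AS sub
ORDER BY rn
"""

rows = conn.execute(COMPARE_SQL).fetchall()
tied_ranks = [r[1] for r in rows]
avg_rns = [r[2] for r in rows]
print(tied_ranks)  # [1.0, 2.5, 2.5, 5.0, 5.0, 5.0]
```

Both columns agree on every row of this sample, so either form can be used depending on which reads more clearly.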
I'm pretty sure you want row_number(), not rank(). Rank will not give repeated values in the way you presented. To get the answer you're looking for:
with rwn as (
select
test_score
,row_number() over (order by test_score) rwn
from
score
)
select
test_score
,avg(rwn) average_rank
from
rwn
group by
test_score;
@Lukas and @jeremy already explained the difference between rank() and row_number() you seemed to be missing.
You can also compute the row number (rn), and the average over rn (avg_rn) per rank (= per group of same values) in the next step:
SELECT test_score, rn, avg(rn) OVER (PARTITION BY test_score) AS avg_rn
FROM (SELECT test_score, row_number() OVER (ORDER BY test_score) AS rn FROM tbl) sub;
You need a subquery because window functions cannot be nested on the same query level.
You need another window function (not an aggregate function, as has been suggested) to preserve all original rows.
The result is ordered by rn by default (for this simple query), but this is just an implementation detail. To guarantee an ordered result, add an explicit ORDER BY (for practically no cost):
...
ORDER BY rn;