Find date gaps in a table - amazon-redshift

I have an AWS Redshift table looking like this:
id, id_aw_sk, id_ai_sk, snapshot_date, update_timestamp
3278059021, 3197624, 173642, today-1, today
3278059021, 3197624, 173642, today-2, today-1
3278059021, 3197624, 173642, today-3, today-2
3278059021, 3197624, 173642, today-4, today-3
etc.
3278059021, 3224904, 173642, date in past -1, date in past
This table holds one snapshot per day so that changes in some other columns can be tracked. When something changes, id_aw_sk differs from the value in the previous row.
The problem is that some rows were accidentally deleted, so there are date gaps for some ids.
As I can't retrieve those rows, I would like to recreate them by finding the gaps in the dates.
I am not sure how to do this. Can you help?
I understand that I should first find the gaps, and then for each missing row use the lead function to copy values from the existing (known) rows.
E.g. I have dates for 3278059021 where id_aw_sk was 3224904, but for id_aw_sk 3197624 there is a gap between 16th March 2021 and 11th April 2021.
I know that none of the rows between those dates changed. I only need to populate the gaps with the first known data (from 11th April), as the rows from 16th March onwards are the same even now.
I hope that I explained it okay :)
Thanks in advance for your help.

I've found a gap like this:
select *, case when lag(snapshot_date) over (partition by site_id, site_aw_sk order by snapshot_date ASC)<>snapshot_date-1
then 1 else 0 end as gap
from
dwh.fact_site_ps where site_id=3278059021;
That way I can see where the gaps are, but I don't know how to populate them with the missing data.

I've solved it.
Step 1:
create table dwh_dev.tmp_fact_site_ps_gaps as
select t.*,case when gap=1 then snapshot_date else null end as gap_start,
case when gap=1 then lead(snapshot_date) over (partition by site_id, site_aw_sk order by snapshot_date asc) else null end as gap_end
from
(select *, case when lead(snapshot_date) over (partition by site_id, site_aw_sk order by snapshot_date ASC)<>snapshot_date+1
then 1 else 0 end as gap
from
dwh.fact_site_ps)t
Step 2:
select tfspg.*, tdr."date"
from dwh.tmp_date_range tdr
inner join (select * from dwh_dev.tmp_fact_site_ps_gaps where gap=1)tfspg
on tdr."date" > tfspg.gap_start AND tdr."date" < tfspg.gap_end
where site_id=3278059021
order by site_id
dwh.tmp_date_range is only a table with all dates for the last 2 years.
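
For anyone who wants to check the logic outside the database, here is a minimal Python sketch of the same two steps for a single (site_id, site_aw_sk) partition. The function name fill_gaps and the dict-based row layout are made up for illustration. Like tfspg.* in step 2, it copies the values of the last row before each gap; swapping current for nxt would copy from the first row after the gap instead.

```python
from datetime import date, timedelta


def fill_gaps(rows):
    """Return synthetic rows for every date missing between consecutive snapshots.

    rows: list of dicts that all belong to one partition and each carry a
    'snapshot_date' (datetime.date). This layout is an assumption for the sketch.
    """
    ordered = sorted(rows, key=lambda r: r["snapshot_date"])
    filled = []
    # Pair each row with the next one, like lead(snapshot_date) in step 1.
    for current, nxt in zip(ordered, ordered[1:]):
        gap_start = current["snapshot_date"]
        gap_end = nxt["snapshot_date"]
        day = gap_start + timedelta(days=1)
        # Strictly between gap_start and gap_end, like the join condition in step 2.
        while day < gap_end:
            filled.append({**current, "snapshot_date": day})
            day += timedelta(days=1)
    return filled


if __name__ == "__main__":
    known = [
        {"site_id": 3278059021, "site_aw_sk": 3197624, "snapshot_date": date(2021, 3, 16)},
        {"site_id": 3278059021, "site_aw_sk": 3197624, "snapshot_date": date(2021, 4, 11)},
    ]
    for row in fill_gaps(known):
        print(row)
```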

Related

How to select data between two dates using only the start date?

I have a problem selecting data between two dates when only the start_date is available.
For example, I want to see which discount_nr was active between 2020-07-01 and 2020-07-15, or on a single day such as 2020-07-14. I tried different solutions (date range, generate series, and so on) but was still not able to get it to work.
The table only has start dates, no end dates.
Example:
discount_nr, start_date
1, 2020-06-30
2, 2020-07-03
3, 2020-07-10
4, 2020-07-15
You can get the end dates by looking at the start date of the next row. This is done with lead. lead(start_date) over(order by start_date asc) will get you the start_date of the next row. If we take 1 day from that we'll get the inclusive end date.
Rather than separate start/end columns, a single daterange column is easier to work with. You can use that as a CTE or create a view.
create view discount_durations as
select
discount_nr,
daterange(
start_date,
lead(start_date) over(order by start_date asc)
) as duration
from discounts
Now querying it is easy using range operators. Use @> to check if the range contains a date.
select *
from discount_durations
where duration @> '2020-07-14'::date
And use && to see if they have any overlap.
select *
from discount_durations
where duration && daterange('2020-07-01', '2020-07-15');
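
To make the range semantics concrete, here is a small Python sketch of the same idea: each discount runs from its own start date up to (but not including) the next start date, and the last one is open-ended. The function names are made up for this illustration, and the overlap check assumes a half-open [lo, hi) interval, which is the default bound style of daterange.

```python
from datetime import date


def discount_durations(discounts):
    """Turn (discount_nr, start_date) pairs into (discount_nr, start, end) tuples.

    end is exclusive and is None for the most recent discount (open-ended),
    which mirrors lead(start_date) returning NULL on the last row.
    """
    ordered = sorted(discounts, key=lambda d: d[1])
    result = []
    for i, (nr, start) in enumerate(ordered):
        end = ordered[i + 1][1] if i + 1 < len(ordered) else None
        result.append((nr, start, end))
    return result


def active_on(durations, day):
    """Discounts whose range contains the given day (like the @> operator)."""
    return [nr for nr, start, end in durations
            if start <= day and (end is None or day < end)]


def active_between(durations, lo, hi):
    """Discounts whose range overlaps the half-open interval [lo, hi) (like &&)."""
    return [nr for nr, start, end in durations
            if start < hi and (end is None or lo < end)]


DISCOUNTS = [
    (1, date(2020, 6, 30)),
    (2, date(2020, 7, 3)),
    (3, date(2020, 7, 10)),
    (4, date(2020, 7, 15)),
]

if __name__ == "__main__":
    durations = discount_durations(DISCOUNTS)
    print(active_on(durations, date(2020, 7, 14)))
    print(active_between(durations, date(2020, 7, 1), date(2020, 7, 15)))
```

Because the upper bound is exclusive, discount 4 (which starts on 2020-07-15) is not part of the second result; widen the interval by one day if 2020-07-15 should be included.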

delete all but two sorted items postgresql

In my structure I have the following. I would like to keep the rows with the most recent dates (highlighted in yellow) and delete the rest. I don't necessarily know the most recent dates (i.e. 17/4/2021 and 10/2/2021 in my example) for each stock_id, but I know I want to keep only the two most recent items.
Is that possible?
Thank you
Note: this assumes that dates do not repeat within each stock_id group in your table, so top two dates are always unique.
You can assign rank to each row within stock_id after ordering by date and delete rows where rank is greater than 2.
DELETE FROM mytable
WHERE (stock_id, date) NOT IN (
SELECT
stock_id,
date
FROM (
SELECT
stock_id,
date,
row_number() over (partition by stock_id order by date desc) as rank
FROM mytable
) ranks
WHERE rank <= 2
)
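
As a sanity check on what the DELETE leaves behind, the same "rank within stock_id, keep rank <= 2" rule can be sketched in Python. The function name and the (stock_id, date) tuple layout are assumptions for this example, and like the query it assumes dates are unique within each stock_id.

```python
from collections import defaultdict
from datetime import date


def keep_most_recent(rows, keep=2):
    """Return the (stock_id, date) pairs that survive the delete.

    rows: iterable of (stock_id, date) tuples. Within each stock_id the dates
    are sorted descending and only the first `keep` are retained, which is
    what row_number() ... order by date desc with rank <= 2 selects.
    """
    by_stock = defaultdict(list)
    for stock_id, day in rows:
        by_stock[stock_id].append(day)

    kept = []
    for stock_id, days in by_stock.items():
        for day in sorted(days, reverse=True)[:keep]:
            kept.append((stock_id, day))
    return kept


if __name__ == "__main__":
    sample = [
        (1, date(2021, 4, 17)),
        (1, date(2021, 2, 10)),
        (1, date(2021, 1, 5)),
        (2, date(2021, 3, 1)),
    ]
    print(sorted(keep_most_recent(sample)))
```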

HQL: Max date of previous month

Good morning,
I have a problem I've been trying to solve but am getting nowhere.
I need to find the max date of the previous month. Normally I would just use the following to find the last day of the previous month: last_day(add_months(current_date, -1))
However, this particular data set doesn't always have data for the last day. E.g. the last day in the data for May was May 30th. If I try using the syntax above, it returns no data because it looks for 5/31.
So is there a way to find the "max" day available in the data for the previous month? Or the month prior, etc.?
For example like this (two scans of table: one in subquery to find max date and one in main query):
select *
from mytable
where as_of_date in (select max(as_of_date) from mytable where as_of_date between first_day(add_months(current_date, -1)) and last_day(add_months(current_date, -1)))
Or (single scan + analytic function) like this
select col1 ... colN
from
(
select t.*, rank() over (partition by month (t.as_of_date) order by t.as_of_date desc) rnk
from mytable t
where --If you have partition on date, this WHERE may improve performance
t.as_of_date between first_day(add_months(current_date, -1)) and last_day(add_months(current_date, -1))
)s
where rnk=1
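
The logic behind both queries is "restrict to the previous month's boundaries, then take the max date that actually exists". A minimal Python sketch of that rule follows; the function name is made up, and the reference date is passed in explicitly instead of using current_date so the result is reproducible.

```python
from datetime import date, timedelta


def max_date_previous_month(dates, today):
    """Return the latest date in `dates` that falls in the month before `today`.

    Returns None when the previous month has no data at all.
    """
    first_of_this_month = today.replace(day=1)
    last_of_prev = first_of_this_month - timedelta(days=1)  # like last_day(add_months(..., -1))
    first_of_prev = last_of_prev.replace(day=1)             # first day of the previous month
    in_prev = [d for d in dates if first_of_prev <= d <= last_of_prev]
    return max(in_prev) if in_prev else None


if __name__ == "__main__":
    # Hypothetical data: May has no row for the 31st.
    available = [date(2022, 5, 27), date(2022, 5, 30), date(2022, 6, 1)]
    print(max_date_previous_month(available, date(2022, 6, 15)))
```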

Have Datetable with dates and if business day, need to find the 11th business day after a date

I need to find the date that is 11 business days after a given date.
I do not have a date table. I requested one, but there is a long lead time.
I used a CTE to produce results that have a datekey, 1 if weekday, and 1 if holiday, else 0. I put those results into a table variable, so Business_Day is now (weekday - holiday). Much Googling has already happened.
select dt.Datekey,
(dt.Weekdaycount - dt.HolidayCount) as Business_day
from @DateTable dt
UPDATE: I've figured it out in Excel: a running count of business days, a column of business day count + 11, then a VLOOKUP to find the +11 date. Now how do I do that in SQL?
Results like this:
Datekey, Business_day
2019-01-01, 0
2019-01-02, 1
I will assume you want to choose which days count as the weekend, and that you can enter the holidays in a table variable, so you can do the following.
Set the weekend day names here:
Declare @WeekDayName1 varchar(50)='Saturday'
Declare @WeekDayName2 varchar(50)='Sunday'
Set up the holiday table variable; you may already have this as a regular table in your database.
Declare @Holidays table (
[Date] date,
HolidayName varchar(250)
)
Let's insert a day or two to test it.
insert into @Holidays values (cast('2019-01-01' as date),'New Year')
insert into @Holidays values (cast('2019-01-08' as date),'some other holiday in your country')
Let's say the date you want to start from is the action date, and you need the date 11 business days after it.
Declare @ActionDate date='2018-12-28'
declare @BusinessDays int=11
A recursive CTE counts the days until it reaches the required one.
;with cte([date],BusinessDay) as (
select @ActionDate [date],cast(0 as int) BusinessDay
union all
select dateadd(day,1,cte.[date]),
case
when DATENAME(WEEKDAY,dateadd(day,1,cte.[date]))=@WeekDayName1
OR DATENAME(WEEKDAY,dateadd(day,1,cte.[date]))=@WeekDayName2
OR (select 1 from @Holidays h where h.Date=dateadd(day,1,cte.[date])) is not null
then cte.BusinessDay
else cte.BusinessDay+1
end BusinessDay
From cte where BusinessDay<@BusinessDays
)
)
--to see the all the dates till business day + 11
--select * from cte option (maxrecursion 0)
--to get the required date
select MAX([date]) from cte option (maxrecursion 0)
In my example the date I get is as below:-
ActionDate =2018-12-28
After 11 business days :2019-01-16
Hope this helps
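
If it helps to verify the result independently, the same day-by-day walk can be sketched in Python. This is only an illustration of the counting rule (skip Saturdays, Sundays and listed holidays); the function name is made up and the holiday set reuses the two test dates from above.

```python
from datetime import date, timedelta


def add_business_days(start, n, holidays=frozenset()):
    """Return the date that is n business days after start.

    Walks forward one calendar day at a time and only counts days that are
    Monday to Friday and not in `holidays`, like the recursive CTE does.
    """
    day = start
    counted = 0
    while counted < n:
        day += timedelta(days=1)
        if day.weekday() < 5 and day not in holidays:  # weekday(): 0=Monday .. 6=Sunday
            counted += 1
    return day


HOLIDAYS = frozenset({date(2019, 1, 1), date(2019, 1, 8)})

if __name__ == "__main__":
    print(add_business_days(date(2018, 12, 28), 11, HOLIDAYS))
```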
The first step was to create a date table. Figuring out weekdays versus weekends is easy: weekdays are 1, weekends are 0. I borrowed someone else's holiday calendar: a holiday is 1, otherwise 0. Business day is then Weekday - Holiday. Next was to create a running total of business days. That lets you move from whatever running-total day you're currently on to where you want to be in the future, say plus 10 business days. I hard-coded key milestones in the date table for 2 and 10 business days.
Then JOIN your date table with your transaction table on your zero day and date key.
Finally, this allows you to make solid calculations of business days.
WHERE CONVERT(date, D.DTRESOLVED) <= CONVERT(date, [10th_Bus_Day])
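
The running-total approach can be sketched in Python as well, roughly mirroring the Excel VLOOKUP described in the question. The function names and dict keys are made up for illustration, and the date range and holidays are sample values, not the real calendar used above.

```python
from datetime import date, timedelta


def build_date_table(start, end, holidays=frozenset()):
    """Build one row per calendar day with a business-day flag and a running total."""
    table = []
    running = 0
    day = start
    while day <= end:
        is_business = day.weekday() < 5 and day not in holidays
        running += int(is_business)
        table.append({
            "datekey": day,
            "business_day": int(is_business),
            "running_total": running,
        })
        day += timedelta(days=1)
    return table


def nth_business_day_after(table, zero_day, n):
    """Look up the business day whose running total is n higher than zero_day's."""
    by_date = {row["datekey"]: row for row in table}
    target = by_date[zero_day]["running_total"] + n
    for row in table:
        if row["business_day"] == 1 and row["running_total"] == target:
            return row["datekey"]
    return None  # the table does not reach far enough into the future


if __name__ == "__main__":
    holidays = frozenset({date(2019, 1, 1), date(2019, 1, 8)})
    table = build_date_table(date(2018, 12, 24), date(2019, 1, 31), holidays)
    print(nth_business_day_after(table, date(2018, 12, 28), 11))
```

Precomputing the running total once means each lookup is a simple match on a number, which is why this scales better than walking the calendar per transaction.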

Updating a single column in a table

I have a table for Inventory Dollars by Vendor by Month. I want to be able to update the dollar amounts for the current month on a daily basis, but I don't want to lose the previous month's data. Here is the basic query I have:
DELETE Inventory_Dollars
FROM Inventory_Summary
WHERE MonthNum = '4'
SELECT
SUM(Cost*OnHand) AS Inventory_Dollars
FROM Inventory
The Inventory table will always hold the current data. How can I just Insert Into Inventory_Summary the data from the Select statement?
Just preface your query with an INSERT:
INSERT INTO Inventory_Summary
(Inventory_Dollars)
SELECT SUM(Cost * OnHand) AS Inventory_Dollars
FROM Inventory
If you've already inserted the inventory_dollars amount for the current month, you can then update the value every day with something like this:
UPDATE Inventory_Summary
SET Inventory_Dollars = (
SELECT SUM(Cost * OnHand)
FROM Inventory
)
WHERE MonthNum = DATEPART(m, GETDATE()) AND Year = DATEPART(year, GETDATE())
The DATEPART can be used to fill in the number of the month for the current date, GETDATE(). Then you won't be updating the inventory_dollars values for past months.
Edit: Also added a year to the where clause, so you don't update months from past years.
Edit 2: If you use a subquery in the SET, make sure only one result can come back.
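
To try the insert-then-update pattern end to end, here is a small sketch using Python's built-in sqlite3 module as a stand-in database. That substitution is an assumption made only so the example runs anywhere: SQLite has no DATEPART or GETDATE, so the year and month are passed in as parameters, and the Year column is assumed to exist as in the WHERE clause above. The function name is hypothetical.

```python
import sqlite3


def refresh_current_month(conn, year, month):
    """Update the summary row for (year, month), inserting it if it is missing."""
    cur = conn.execute(
        """
        UPDATE Inventory_Summary
        SET Inventory_Dollars = (SELECT SUM(Cost * OnHand) FROM Inventory)
        WHERE MonthNum = ? AND Year = ?
        """,
        (month, year),
    )
    if cur.rowcount == 0:
        # No row for this month yet, so create it from the current inventory.
        conn.execute(
            """
            INSERT INTO Inventory_Summary (Year, MonthNum, Inventory_Dollars)
            SELECT ?, ?, SUM(Cost * OnHand) FROM Inventory
            """,
            (year, month),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Inventory (Cost REAL, OnHand INTEGER)")
    conn.execute(
        "CREATE TABLE Inventory_Summary (Year INTEGER, MonthNum INTEGER, Inventory_Dollars REAL)"
    )
    conn.executemany("INSERT INTO Inventory VALUES (?, ?)", [(2.0, 10), (5.0, 4)])
    refresh_current_month(conn, 2024, 4)
    print(conn.execute("SELECT * FROM Inventory_Summary").fetchall())
```

Rows for earlier months are never touched because the UPDATE is filtered on both month and year.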