recently I asked how to convert calendar weeks into a list of dates and received a great and most helpful answer:
convert calendar weeks into daily dates
I tried to apply the above method to create a list of dates based on a column with "year - month". Alas i cannot make out how to account for the different number of days in different months.
And I wonder whether the package lubridate 'automatically' takes leap years into account?
Sample data:
df <- data.frame(YearMonth = c("2016 - M02", "2016 - M06"), values = c(28,60))
M02 = February, M06 = June (M11 would mean November, etc.)
Desired result:
DateList Values
2016-02-01 1
2016-02-02 1
ect
2016-02-28 1
2016-06-01 2
etc
2016-06-30 2
Values would something like
df$values / days_in_month()
Thanks a million in advance - it is honestly very much appreciated!
I'll leave the parsing of the line to you.
To find the last day of a month, assuming you have GNU date, you can do this:
year=2016
month=02
last_day=$(date -d "$year-$month-01 + 1 month - 1 day" +%d)
echo $last_day # => 29 -- oho, a leap year!
Then you can use a for loop to print out each day.
thanks to answer 6 at Add a month to a Date and answer for (how to extract number with leading 0) i got an idea to solve my own question using lubridate. It might not be the most elegant way, but it works.
sample data
data <- data_frame(mon=c("M11","M02"), year=c("2013","2014"), costs=c(200,300))
step 1: create column with number of month
temp2 <- gregexpr("[0-9]+", data$mon)
data$monN <- as.numeric(unlist(regmatches(data$mon, temp2)))
step 2: from year and number of month create a column with the start date
data$StartDate <- as.Date(paste(as.numeric(data$year), formatC(data$monN, width=2, flag="0") ,"01", sep = "-"))
step 3: create a column EndDate as last day of the month based on startdate
data$EndDate <- data$StartDate
day(data$EndDate) <- days_in_month(data$EndDate)
step 4: apply answer from Apply seq.Date using two dataframe columns to create daily list for respective month
data$id <- c(1:nrow(data))
dataL <- setDT(data)[,list(datelist=seq(StartDate, EndDate, by='1 day'), costs= costs/days_in_month(EndDate)) , by = id]
Related
I am trying to create a list of the last days of each month for the past n months from the current date but not including current month
I tried different approaches:
def last_n_month_end(n_months):
"""
Returns a list of the last n month end dates
"""
return [datetime.date.today().replace(day=1) - datetime.timedelta(days=1) - datetime.timedelta(days=30*i) for i in range(n_months)]
somehow this partly works if each every month only has 30 days and also not work in databricks pyspark. It returns AttributeError: 'method_descriptor' object has no attribute 'today'
I also tried the approach mentioned in Generate a sequence of the last days of all previous N months with a given month
def previous_month_ends(date, months):
year, month, day = [int(x) for x in date.split('-')]
d = datetime.date(year, month, day)
t = datetime.timedelta(1)
s = datetime.date(year, month, 1)
return [(x - t).strftime('%Y-%m-%d')
for m in range(months - 1, -1, -1)
for x in (datetime.date(s.year, s.month - m, s.day) if s.month > m else \
datetime.date(s.year - 1, s.month - (m - 12), s.day),)]
but I am not getting it correctly.
I also tried:
df = spark.createDataFrame([(1,)],['id'])
days = df.withColumn('last_dates', explode(expr('sequence(last_day(add_months(current_date(),-3)), last_day(add_months(current_date(), -1)), interval 1 month)')))
I got the last three months (Sep, oct, nov), but all of them are the 30th but Oct has Oct 31st. However, it gives me the correct last days when I put more than 3.
What I am trying to get is this:
(last days of the last 4 months not including last_day of current_date)
daterange = ['2022-08-31','2022-09-30','2022-10-31','2022-11-30']
Not sure if this is the best or optimal way to do it, but this does it...
Requires the following package since datetime does not seem to have anyway to subtract months as far as I know without hardcoding the number of days or weeks. Not sure, so don't quote me on this....
Package Installation:
pip install python-dateutil
Edit: There was a misunderstanding from my end. I had assumed that all dates were required and not just the month ends. Anyways hope the updated code might help. Still not the most optimal, but easy to understand I guess..
# import datetime package
from datetime import date, timedelta
from dateutil.relativedelta import relativedelta
def previous_month_ends(months_to_subtract):
# get first day of current month
first_day_of_current_month = date.today().replace(day=1)
print(f"First Day of Current Month: {first_day_of_current_month}")
# Calculate and previous month's Last date
date_range_list = [first_day_of_current_month - relativedelta(days=1)]
cur_iter = 1
while cur_iter < months_to_subtract:
# Calculate First Day of previous months relative to first day of current month
cur_iter_fdom = first_day_of_current_month - relativedelta(months=cur_iter)
# Subtract one day to get the last day of previous month
cur_iter_ldom = cur_iter_fdom - relativedelta(days=1)
# Append to the list
date_range_list.append(cur_iter_ldom)
# Increment Counter
cur_iter+=1
return date_range_list
print(previous_month_ends(3))
Function to calculate date list between 2 dates:
Calculate the first of current month.
Calculate start and end dates and then loop through them to get the list of dates.
I have ignored the date argument, since I have assumed that it will be for current date. alternatively it can be added following your own code which should work perfectly.
# import datetime package
from datetime import date, timedelta
from dateutil.relativedelta import relativedelta
def gen_date_list(months_to_subtract):
# get first day of current month
first_day_of_current_month = date.today().replace(day=1)
print(f"First Day of Current Month: {first_day_of_current_month}")
start_date = first_day_of_current_month - relativedelta(months=months_to_subtract)
end_date = first_day_of_current_month - relativedelta(days=1)
print(f"Start Date: {start_date}")
print(f"End Date: {end_date}")
date_range_list = [start_date]
cur_iter_date = start_date
while cur_iter_date < end_date:
cur_iter_date += timedelta(days=1)
date_range_list.append(cur_iter_date)
# print(date_range_list)
return date_range_list
print(gen_date_list(3))
Hope it helps...Edits/Comments are welcome - I am learning myself...
I just thought a work around I can use since my last codes work:
df = spark.createDataFrame([(1,)],['id'])
days = df.withColumn('last_dates', explode(expr('sequence(last_day(add_months(current_date(),-3)), last_day(add_months(current_date(), -1)), interval 1 month)')))
is to enter -4 and just remove the last_date that I do not need days.pop(0) that should give me the list of needed last_dates.
from datetime import datetime, timedelta
def get_last_dates(n_months):
'''
generates a list of lastdates for each month for the past n months
Param:
n_months = number of months back
'''
last_dates = [] # initiate an empty list
for i in range(n_months):
last_dates.append((datetime.today() - timedelta(days=i*30)).replace(day=1) - timedelta(days=1))
return last_dates
This should give you a more accurate last_days
I have 4 dates in a month (4 weeks)
like:
var1
2001-01-10
2001-01-15
2001-01-20
2001-01-30
2001-02-10
2001-02-15
2001-02-20
2001-02-30
I want to creat a new var with just the max date in this month:
like:
var2=max(var1) for the same month
0
0
0
2001-01-30
0
0
0
2001-02-30
how can i do this?
I think I understand what you're after, do you only want to show the max DAY for a given month, for every month?
If so, create a table calculation with a formula like this:
{FIXED DATETRUNC('month',[date field]): MAX([date field])}
To explain:
FIXED = apply the aggregate function to only these 'windows' for the calculation. Here we're fixing at the month level. But you could do several layers like month, category, subcategory
: = after the colon you can apply the calculation that you want to do, in your case it's the maximum month
So bringing that together in a sentence, what we're doing is "For each date, truncate to the month level, then for each month, what's the maximum date?"
Hope that helped
i need to calculate difference between two date excluding sunday. I have table with dates and i need to calculate number of dates of repeated days from last date.
if i have dates like that
27-05-2017
29-05-2017
30-05-2017
I use this code in script
date(max(Date)) as dateMax,
date(min(Date)) as dateMin
And i get min date = 27-05-2017 and max date = 30-05-2017 then i use in expressions
=floor(((dateMax - dateMin)+1)/7)*6 + mod((dateMax - dateMin)+1,7)
+ if(Weekday(dateMin) + mod((dateMax - dateMin)+1,7) < 7, 0, -1)
And get result 3 days. Thats OK, but the problem is if I have next dates:
10-05-2017
11-05-2017
27-05-2017
29-05-2017
30-05-2017
When use previously code I get min date = 10-05-2017 and max date = 30-05-2017 and result 18, but this is not OK.
I need to count only dates from
27-05-2017
29-05-2017
30-05-2017
I need to get max date and go throw loop repeated dates and if have brake to see is that date sunday if yes then step that date and continue to count repeated dates and if i again have break and if not sunday than close loop and remember number of days.
In my case instead of 18 days i need to get 3 days.
Any idea?
I'd recommend you creating a master calendar in the script where you can apply weights or any other rule to your days. Then in your table or app you can just loop through the dates or perform operations and sum their weights (0: if sunday, 1: if not). Let's see an example:
// In this case I'll do a master calendar of the present year
LET vMinDate = Num(MakeDate(year(today()),1,1));
LET vMaxDate = Num(MakeDate(year(today()),12,31));
Calendar_tmp:
LOAD
$(vMinDate) + Iterno() - 1 as Num,
Date($(vMinDate) + Iterno() - 1) as Date_tmp
AUTOGENERATE 1 WHILE $(vMinDate) + Iterno() - 1 <= $(vMaxDate);
Master_Calendar:
LOAD
Date_tmp AS Date,
Week(Date_tmp) as Week,
Year(Date_tmp) as Year,
Capitalize(Month(Date_tmp)) as Month,
Day(Date_tmp) as Day,
WeekDay(Date_tmp) as WeekDay,
if(WeekDay = '7',0,1) as DayWeight //HERE IS WHERE YOU COULD DEFINE A VARIABLE TO DIRECTLY COUNT THE DAY IF IT IS NOT SUNDAY
'T' & ceil(num(Month(Date_tmp))/3) as Quarter,
'T' & ceil(num(Month(Date_tmp))/3) & '-' & right(year(Date_tmp),2) as QuarterYear,
date(monthStart(Date_tmp),'MMMM-YYYY') as MonthYear,
date(monthstart(Date_tmp),'MMM-YY') as MonthYear2
RESIDENT Calendar_tmp
ORDER BY Date_tmp ASC;
DROP Table Calendar_tmp;
I have a combo chart which shows the average days it took for a person to pay their bill.
The Dimension of the Chart is = [Pay Month Year last 12 months]
There are no Dimension limits
There is 1 expression which is called Average and its definition is:
avg({< InvoicefromSqlType = {'Invoices'},[Is Invoice Paid] = {'Y'},[Is Positive Amount] = {'Y'},[Is Paid last 12 months] = {'Y'},DueGroups=,[Pay Month Year last 12 months]=>}[Days to Pay])`
Its is sorted by expression, which is [Pay Month Year last 12 months]
Now the above fields are built up like this:
[Pay Month Year last 12 months]
If([Pay Date] >= '$(vPeriodS12)',[Pay Month year]) as [Pay Month Year last 12 months],
PayLoadOrder:
Load * Inline [Pay Month Year last 12 months
May-2014
Jun-2014
Jul-2014
Aug-2014
Sep-2014
Oct-2014
Nov-2014
Dec-2014
Jan-2015
Feb-2015
Mar-2015
Apr-2015
May-2015
];
Now what is happening is every month when it reaches the end, the next month is needing to be manually added and the first month removed (e.g. in the above I would delete the line May-2014 and at the end add the line Jun-2015)
Also if there are months defined which there is no data for yet i.e. you have Jun-2015 hard coded and the current month this May-2015 then Jun-2015 will show data from 2014 and the order of the months will get mixed up.
What I want to do is completely remove the need to hard code the months above and have this done its self.
If there are any more information you need let me know
Instead of using a "sort order" table that you may have to manually update, it may be worth doing the following:
Create a new field derived from [Pay Date] that returns a month and year that you can sort. For example:
dual(date(makedate(year([Pay Date]),num(month([Pay Date]))),'MMM-yyyy'),
year([Pay Date]) * 100 + num(month([Pay Date]))) as PayMonthYear
Here, the dual function allows you to associate a different representation of a field value to its underlying value. For example, here we set the underlying data to be the year of [Pay Date] added to the month, but state that it should be displayed as MMM-yyyy. For example, internally, QV still sees the value 201502, but displays it as Feb-2015. This means you can sort it correctly based on its underlying value.
Using dual is a big topic, please consult the built-in help for QV for more information.
Change your chart dimension from [Pay Month Year last 12 months] to use PayMonthYear and set the sort to ascending. This would then mean that your months would be sorted correctly, even if a new month was added.
Remove table PayLoadOrder from your script.
Alternative Method
An alternative would be to use a calendar table which joins on your Pay Date field. This would achieve the same thing, however, you could also integrate your "year to date" indicator into the calendar as well and remove it from your main table. An example I quickly threw together is shown below:
MinMax:
LOAD
Max([Pay Date]) AS MaxDate,
Min([Pay Date]) AS MinDate
RESIDENT MyData;
LET varMinDate = Num(Peek('MinDate',0,'MinMax')); // 0 is first record
LET varMaxDate = Num(Peek('MaxDate',-1,'MinMax')); // -1 is last record
LET varToday = Num(Today());
MasterCalendar:
LOAD
monthstart([Pay Date]) >= monthstart(AddMonths(Today(),-12)) as PaidInLast12MonthsFlag,
dual(date(makedate(year([Pay Date]),num(month([Pay Date]))),'MMM-yyyy'),year([Pay Date]) * 100 + num(month([Pay Date]))) as PayMonthYear
[Pay Date];
LOAD
date($(varMinDate) + RecNo() - 1,'DD/MM/YYYY') as [Pay Date]
AUTOGENERATE num($(varMaxDate)) - num($(varMinDate)) + 1;
DROP TABLE MinMax;
So, in the above, the field PaidInLast12MonthsFlag is equal to -1 if the value of the [Pay Date] field occurs in the last 12 months, 0 otherwise. You could use this in your set analysis expression as a filter. Furthermore, you can use PayMonthYear as your chart dimension.
In my quarterly report Im trying to validate the two parameters StartDate and EndDate.
I first check if the difference between the dates is 2 months:
Switch(DateDiff(
DateInterval.Month, Parameters!StartDate.Value, Parameters!EndDate.Value) <> 2,
"Error message")
Then I try to add whether the StartDate is the first day of month AND EndDate is last day of month:
And (Day(Parameters!StartDate.Value) <> 1
And Day(DATEADD(DateInterval.Day,1,Parameters!EndDate.Value)))
So the whole expression looks like this:
Switch(DateDiff(DateInterval.Month, Parameters!StartDate.Value, Parameters!EndDate.Value) <> 2
And
Parameters!IsQuarterly.Value = true
And
Day(Parameters!StartDate.Value) <> 1
And
Day(DATEADD(DateInterval.Day,1,Parameters!EndDate.Value))<>1),
"Error: Quarterly report must include 3 months")
But It works wrong when the difference between dates is still 2 months, but StartDate and EndDate are not first and last day of the whole period.
I'd appreciate any help :)
I would say just change the implementation Add another two Parameter With Quarter and Year
Quarter like Q1,Q2,Q3 & Q4 with Value 1,2,3 & 4 respectively and year 2012,2013,2014 & so on
Now based on the parameter selected Qtr & Year set Default value of start & End Date
=DateSerial(Parameters!Year.Value), (3*Parameters!Qtr.Value)-2, 1) --First day of Quarter
=DateAdd("d",-1,DateAdd("q",1,Parameters!Year.Value, (3*Parameters!Qtr.Value)-2, 1))) --Last day of quarter
Doing this no need to do any validation bcz its always get the correct Date Difference.
Other Reference
First day of current quarter
=DateSerial(Year(Now()), (3*DatePart("q",Now()))-2, 1)
Last day of current quarter
=DateAdd("d",-1,DateAdd("q",1,DateSerial(Year(Now()), (3*DatePart("q",Now()))-2, 1)))