How to calculate age from a mix of full dates and year-only data

Unfortunately, the birthdate column of my data frame 'screenfa' contains a mixture of full dates and bare years:
ID birthdate
a 01.03.1960
b 1943
c 1987
d 06.12.1985
e 02.06.1984
I would like to calculate the age of the IDs. How can I calculate this? It would be sufficient to calculate the age for all IDs only from the birthyear, but I cannot convert it.
as.POSIXlt(screenfa$birthdate, tz = "",
tryFormats = c('%d.%m.%Y',
'%Y'),
optional = FALSE)
But for the rows where only the year is known, R then returns a wrong/different date.
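Since the question says the birth year alone would be enough, one option is to try each format per value and keep only the year. The sketch below shows that idea in Python rather than R. The reference date and the helper names `birth_year` and `age_from_year` are assumptions made for illustration. As far as I know, R's `strptime` fills in the current month and day when a format only specifies `%Y`, which would explain the unexpected dates; extracting the year sidesteps that.

```python
from datetime import date, datetime


def birth_year(value: str) -> int:
    """Return the birth year from a 'dd.mm.YYYY' or a bare 'YYYY' string."""
    value = value.strip()
    for fmt in ("%d.%m.%Y", "%Y"):
        try:
            return datetime.strptime(value, fmt).year
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"unrecognised birthdate: {value!r}")


def age_from_year(value: str, today: date) -> int:
    """Approximate age using only the birth year (ignores month and day)."""
    return today.year - birth_year(value)


# Sample data from the question
birthdates = {
    "a": "01.03.1960",
    "b": "1943",
    "c": "1987",
    "d": "06.12.1985",
    "e": "02.06.1984",
}

reference = date(2024, 1, 1)  # assumed reference date, not from the question
ages = {key: age_from_year(value, reference) for key, value in birthdates.items()}
print(ages)
```

Because each value is parsed on its own, a column that mixes both formats is handled row by row instead of one format being chosen for the whole column.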

Related

DAX Calculate Billing Days Between Two Variable Dates

I have a dimdate table that is represented below. I have each day flagged as BusinessDay Y/N. I also have a DimSalesRep table that has a daily goal for each rep. I want to be able to allow users to input a StartDt and EndDt with filters on the report and have a calculated column look at the business days between those dates. I can calculate daysbetween with defined dates but I am unsure how I would use DAX with variable dates that are applied through Report filters.
I should also note that I am not sure how best to handle StartDt and EndDt filters based on the column TheDate.
Cheers!
Reference your dimdate table twice
StartDate = 'dimdate'
EndDate = 'dimdate'
and use this measure:
Num BusinessDays =
CALCULATE(
COUNTROWS('dimdate'),
'dimdate'[BusinessDay] = "Y",
'dimdate'[Date] >= SELECTEDVALUE(StartDate[Date]),
'dimdate'[Date] <= SELECTEDVALUE(EndDate[Date])
)
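To make the logic of the measure easier to check outside Power BI, here is a minimal Python sketch of the same filter-and-count step. The calendar built by `build_calendar` is a hypothetical stand-in for dimdate (weekdays flagged "Y", no holidays), and the two dates passed in play the role of the slicer selections.

```python
from datetime import date, timedelta


def build_calendar(start: date, end: date) -> list:
    """Hypothetical stand-in for dimdate: one row per day, weekdays flagged 'Y'."""
    rows = []
    current = start
    while current <= end:
        flag = "Y" if current.weekday() < 5 else "N"  # Monday=0 ... Friday=4
        rows.append({"Date": current, "BusinessDay": flag})
        current += timedelta(days=1)
    return rows


def num_business_days(calendar: list, start_dt: date, end_dt: date) -> int:
    """Count rows flagged as business days between start_dt and end_dt, inclusive."""
    return sum(
        1
        for row in calendar
        if row["BusinessDay"] == "Y" and start_dt <= row["Date"] <= end_dt
    )


dimdate = build_calendar(date(2024, 1, 1), date(2024, 1, 31))
result = num_business_days(dimdate, date(2024, 1, 8), date(2024, 1, 19))
print(result)
```

Both bounds are inclusive, matching the `>=` and `<=` comparisons in the measure.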

Grouping by two variables (month-year and country)

I have to sum the number of deaths by country and by a specific month (e.g. March 2021). See the df
I managed to do it by country only, with:
aggregate(corona$deaths, by = list(corona$country),
FUN = sum)
but I don't know how to specify the exact month and year.
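The usual approach is to derive a year-month key from the date and group by that key together with the country. The Python sketch below illustrates the idea on made-up rows; the layout of `corona` as (country, date, deaths) tuples is an assumption, since the actual data frame is not shown.

```python
from collections import defaultdict
from datetime import date

# Made-up rows for illustration only: (country, date, deaths)
corona = [
    ("Italy", date(2021, 3, 1), 10),
    ("Italy", date(2021, 3, 15), 5),
    ("Italy", date(2021, 4, 2), 7),
    ("Spain", date(2021, 3, 9), 3),
]


def deaths_by_country_and_month(rows):
    """Sum deaths for every (country, year, month) combination."""
    totals = defaultdict(int)
    for country, day, deaths in rows:
        totals[(country, day.year, day.month)] += deaths
    return dict(totals)


totals = deaths_by_country_and_month(corona)
print(totals[("Italy", 2021, 3)])  # deaths in Italy for March 2021
```

Grouping on both year and month keeps March 2021 separate from March of any other year.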

Totaling multiple lines into one cell with different date formats

I want to search through a column of dates in the format YYYY-MM-DD (column G - in a random order) and sum up all corresponding cost values for all dates in the same month.
So, for example, the total cost for December 2019 would be 200.
My current formula is:
=SUMPRODUCT((MONTH(G2:G6)=12)*(YEAR(G2:G6)=2019)*(H2:H6))
This gives me the total cost for that month correctly, but I cannot work out how to do this without hardcoding the year and month!
How would I do this with a formula (given that the two date columns are in different formats)?
You can do this easily combining SUMIFS with EDATE:
SUMIFS function
EDATE function
The formula I've used in cell B2 is:
=SUMIFS($F$2:$F$6;$E$2:$E$6;">="&A2;$E$2:$E$6;"<="&(EDATE(A2;1)-1))
For this formula to work, column A must contain the first day of each month. Cell A2 holds the value 01/11/2019, but I applied the format mmmm yyyy so that it displays as month and year (a chart will show it the same way).
In Google Sheets, paste this into cell D2:
=ARRAYFORMULA(QUERY({EOMONTH(G2:G, -1)+1, H2:H},
"select Col1,sum(Col2)
where Col1 is not null
and not Col1 = date '1900-01-01'
group by Col1
label sum(Col2)''
format Col1 'mmm yyyy'", 0))

convert year-month string into daily dates

Recently I asked how to convert calendar weeks into a list of dates and received a great and most helpful answer:
convert calendar weeks into daily dates
I tried to apply the above method to create a list of dates based on a column with "year - month". Alas, I cannot work out how to account for the different number of days in different months.
And I wonder whether the package lubridate 'automatically' takes leap years into account?
Sample data:
df <- data.frame(YearMonth = c("2016 - M02", "2016 - M06"), values = c(28,60))
M02 = February, M06 = June (M11 would mean November, etc.)
Desired result:
DateList Values
2016-02-01 1
2016-02-02 1
etc
2016-02-28 1
2016-06-01 2
etc
2016-06-30 2
Values would be something like
df$values / days_in_month()
Thanks a million in advance - it is honestly very much appreciated!
I'll leave the parsing of the line to you.
To find the last day of a month, assuming you have GNU date, you can do this:
year=2016
month=02
last_day=$(date -d "$year-$month-01 + 1 month - 1 day" +%d)
echo $last_day # => 29 -- oho, a leap year!
Then you can use a for loop to print out each day.
Thanks to answer 6 at "Add a month to a Date" and the answer to "how to extract number with leading 0", I got an idea for solving my own question using lubridate. It might not be the most elegant way, but it works.
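sample data (the steps below need the lubridate and data.table packages)
library(lubridate)
library(data.table)
data <- data.frame(mon=c("M11","M02"), year=c("2013","2014"), costs=c(200,300), stringsAsFactors = FALSE)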
step 1: create column with number of month
temp2 <- gregexpr("[0-9]+", data$mon)
data$monN <- as.numeric(unlist(regmatches(data$mon, temp2)))
step 2: from year and number of month create a column with the start date
data$StartDate <- as.Date(paste(as.numeric(data$year), formatC(data$monN, width=2, flag="0") ,"01", sep = "-"))
step 3: create a column EndDate as last day of the month based on startdate
data$EndDate <- data$StartDate
day(data$EndDate) <- days_in_month(data$EndDate)
step 4: apply the answer from "Apply seq.Date using two dataframe columns" to create the daily list for each month
data$id <- c(1:nrow(data))
dataL <- setDT(data)[,list(datelist=seq(StartDate, EndDate, by='1 day'), costs= costs/days_in_month(EndDate)) , by = id]
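For comparison, the same four steps (parse the month, find the month length, list the days, split the value evenly) can be sketched in Python with the standard library only. The function name `expand_year_month` is hypothetical, and the label format "YYYY - Mmm" is taken from the sample data in the question.

```python
import calendar
import re
from datetime import date


def expand_year_month(label: str, value: float) -> list:
    """Turn a 'YYYY - Mmm' label into one (date, daily_value) pair per day."""
    match = re.fullmatch(r"(\d{4}) - M(\d{2})", label)
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    year, month = int(match.group(1)), int(match.group(2))
    # monthrange returns (weekday of the 1st, number of days); leap years are handled
    n_days = calendar.monthrange(year, month)[1]
    return [(date(year, month, day), value / n_days) for day in range(1, n_days + 1)]


rows = expand_year_month("2016 - M02", 28) + expand_year_month("2016 - M06", 60)
for day, daily_value in rows[:3]:
    print(day, round(daily_value, 4))
```

Note that 2016 is a leap year, so February expands to 29 days and its daily value is 28/29 rather than exactly 1.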

Merging average of time series corresponding to time span in a different data set

I have two datasets, one with contracts and one with market prices. The gist of what I am trying to accomplish is to find the average value of a time series that corresponds to a period of time in a cross-sectional data set. Please see below.
Example Dataset 1:
Beginning Ending Price
1/1/2014 5/15/2014 $19.50
3/2/2012 10/9/2015 $20.31
...
1/1/2012 1/8/2012 $19.00
In the example above there are several contracts, the first spanning from January 2014 to May 2014, the second from March 2012 to October 2015. Each one has a single price. The second dataset has weekly market prices.
Example Dataset 2:
Date Price
1/1/2012 $18
1/8/2012 $17.50
....
1/15/2015 $21.00
I would like to find the average "market price" (i.e. the average of the price in dataset 2) between the beginning and ending period for each contract on dataset 1. So, for the third contract from 1/1/2012 to 1/8/2012, from the second dataset the output would be (18+17.50)/2 = 17.75. Then merge this value back to the original dataset.
I work with Stata, but can also work with R or Excel.
Also, if you have a better suggestion for a title I would really appreciate it!
You can cross the contracts cross section data with the time series, which forms every pairwise combination, drop the prices from outside the date range, and calculate the mean like this:
/* Fake Data */
tempfile ts ccs
clear
input str9 d p_daily
"1/1/2012" 18
"1/8/2012" 17.50
"1/15/2015" 21.00
end
gen date = date(d,"MDY")
format date %td
drop d
rename date d
save `ts'
clear
input id str8 bd str9 ed p_contract
1 "1/1/2014" "5/15/2014" 19.50
2 "3/2/2012" "10/9/2015" 20.31
3 "1/1/2012" "1/8/2012" 19.00
end
foreach var of varlist bd ed {
gen date = date(`var',"MDY")
format date %td
drop `var'
rename date `var'
}
save `ccs'
/* Calculate Mean Prices and Merge Contracts Back In */
cross using `ts'
sort id d
keep if d >= bd & d <=ed
collapse (mean) mean_p = p_daily, by(id bd ed p_contract)
merge 1:1 id using `ccs', nogen
sort id
This gets you something like this:
id p_contract bd ed mean_p
1 19.5 01jan2014 15may2014 .
2 20.31 02mar2012 09oct2015 21
3 19 01jan2012 08jan2012 17.75
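If Stata is not at hand, the same cross, filter, and average logic can be sketched in plain Python. The sketch uses the fake data from the Stata example above; the function name `add_mean_price` is hypothetical, and a contract with no market prices in its date range gets `None`, mirroring the missing value in the Stata output.

```python
from datetime import date
from statistics import mean

# Weekly market prices (same fake data as in the Stata example)
market = [
    (date(2012, 1, 1), 18.0),
    (date(2012, 1, 8), 17.5),
    (date(2015, 1, 15), 21.0),
]

# Contracts with begin date (bd), end date (ed), and contract price
contracts = [
    {"id": 1, "bd": date(2014, 1, 1), "ed": date(2014, 5, 15), "p_contract": 19.50},
    {"id": 2, "bd": date(2012, 3, 2), "ed": date(2015, 10, 9), "p_contract": 20.31},
    {"id": 3, "bd": date(2012, 1, 1), "ed": date(2012, 1, 8), "p_contract": 19.00},
]


def add_mean_price(contracts, market):
    """Attach the mean market price observed within each contract's date range."""
    result = []
    for contract in contracts:
        in_range = [
            price for day, price in market
            if contract["bd"] <= day <= contract["ed"]
        ]
        # None marks a contract with no market observations in its range
        mean_p = mean(in_range) if in_range else None
        result.append({**contract, "mean_p": mean_p})
    return result


merged = add_mean_price(contracts, market)
for row in merged:
    print(row["id"], row["p_contract"], row["mean_p"])
```

Like `cross` in Stata, this compares every contract with every price, which is fine for small data but grows with the product of the two table sizes.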