I tried creating lag variables for a momentum strategy but am not sure how to proceed

This is how I sorted the data and created the lag variables:
tsset permno date, monthly
sort permno date
by permno: gen lagret1=ret[_n-1]
by permno: gen lagret2=ret[_n-2]
by permno: gen lagret3=ret[_n-3]
by permno: gen lagret4=ret[_n-4]
by permno: gen lagret5=ret[_n-5]
I don't know how to do the rest.

*Step 1: Load the data and create key variables
*Load the dataset that contains CRSP information and create key variables.
use "/Users/dk/Desktop/USD Documents/MSF/MFIN 518/CRSPforMOM.dta", clear
*Keep only common stock
keep if shrcd == 10 | shrcd == 11
*Create monthindex variable
gen monthindex = year(date)*12+month(date)
*Create past 5 months of returns using Stata's lag operators
*In order to use the lag operators I need to tell Stata the structure of the
*data; monthindex increases by 1 every calendar month, so L.ret is missing
*when the previous month is absent, whereas ret[_n-1] would silently take
*an older month
tsset permno monthindex
gen lagret1 = L1.ret
gen lagret2 = L2.ret
gen lagret3 = L3.ret
gen lagret4 = L4.ret
gen lagret5 = L5.ret
*Create a variable that captures the cumulative (gross) return of stock i,
*from month -5 through the current month
*Compounding requires multiplying consecutive returns
gen cumret6 = (1+ret)*(1+lagret1)*(1+lagret2)*(1+lagret3)*(1+lagret4)* (1+lagret5)
*Save
save "/Users/dk/Desktop/USD Documents/MSF/MFIN 518/MOM1.dta", replace
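For readers who want to check the Step 1 logic outside Stata, here is a minimal Python sketch for a single stock. It assumes (this is not part of the original code) that returns are held in a dict keyed by the integer monthindex. It illustrates that a lag defined by calendar month gives a missing value across a gap, whereas ret[_n-1] simply takes the previous observation even when months are missing in between.

```python
import math
from typing import Dict, Optional


def cumret6(returns: Dict[int, float], month: int) -> Optional[float]:
    """Gross 6-month return (1+r_t)(1+r_{t-1})...(1+r_{t-5}) ending at `month`.

    `returns` maps monthindex (year*12 + month) -> simple monthly return for
    one stock. Returns None if any of the six months is missing, the way a
    lag operator on a tsset panel yields a missing value across a gap.
    """
    window = [month - k for k in range(6)]  # t, t-1, ..., t-5
    if any(m not in returns for m in window):
        return None
    return math.prod(1 + returns[m] for m in window)


# Hypothetical returns for one stock; monthindex 24005 is deliberately absent.
rets = {24001: 0.01, 24002: 0.02, 24003: -0.01, 24004: 0.00,
        24006: 0.03, 24007: 0.05}
print(cumret6(rets, 24007))  # None: the window 24002..24007 contains the gap
```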
*Step 2: Create and apply filters
*Before allocating stocks to portfolios, we should create and apply filters
*Select only NYSE stocks and find the 10th percentile of NYSE size in each month
use "/Users/dk/Desktop/USD Documents/MSF/MFIN 518/MOM1.dta", clear
*Keep only NYSE stocks
keep if exchcd == 1
*Keep if market cap is larger than 0
keep if mktcap >0
*Stata treats missing values as larger than any number, so mktcap>0 keeps
*missing market caps; drop them explicitly
drop if missing(mktcap)
*Since we create portfolios monthly, we need breakpoints monthly
sort date
by date: egen p10=pctile(mktcap), p(10)
*We only need date variable (for merging) and p10 variable (as a filter),
*so we drop everything else
keep date p10
*Drop duplicates so that there is exactly one p10 value per month in the sample
duplicates drop
*save
save "/Users/dk/Desktop/USD Documents/MSF/MFIN 518/MOMNYSEBreakpoints.dta", replace
*Merge the breakpoints into the dataset created in step 1,
*so that we can remove small firms
*Break points are date specific so merge on date
use "/Users/dk/Desktop/USD Documents/MSF/MFIN 518/MOM1.dta", clear
sort date
merge m:1 date using "/Users/dk/Desktop/USD Documents/MSF/MFIN 518/MOMNYSEBreakpoints.dta"
*_merge==3 indicates that an observation is present in both the master and
*the using dataset; those are the only properly merged observations and the
*only ones that should be kept
keep if _merge==3
*We need to drop _merge variable to be able to merge data again
drop _merge
*Apply filters, i.e. remove small firms and firms priced below $5
drop if missing(mktcap)
drop if mktcap<=p10
*Use the absolute value because CRSP reports the bid-ask midpoint as a negative price when no closing price is available
drop if abs(prc)<5
*Save
save "/Users/dk/Desktop/USD Documents/MSF/MFIN 518/MOM2.dta", replace
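A Python sketch of the Step 2 filters, useful for checking the logic on a handful of rows. The percentile rule coded in the hypothetical helper stata_pctile is an assumption about Stata's default (non-altdef) percentile definition. The row layout (dicts with date, exchcd, mktcap and prc) is also an illustrative assumption.

```python
from typing import Dict, List


def stata_pctile(values: List[float], p: float) -> float:
    """p-th percentile under the rule we assume Stata's default percentiles
    follow: with n sorted values and P = n*p/100, average the P-th and
    (P+1)-th values when P is an integer, else take the (floor(P)+1)-th
    value (1-based positions)."""
    assert 0 < p < 100
    x = sorted(values)
    P = len(x) * p / 100
    i = int(P)
    if P == i:
        return (x[i - 1] + x[i]) / 2
    return x[i]


def apply_filters(rows: List[dict]) -> List[dict]:
    """Monthly NYSE 10th-percentile size breakpoint, then drop stocks at or
    below it and stocks with |prc| < 5 (a hypothetical mirror of Step 2)."""
    nyse: Dict[object, List[float]] = {}
    for r in rows:
        if r["exchcd"] == 1 and r["mktcap"] is not None and r["mktcap"] > 0:
            nyse.setdefault(r["date"], []).append(r["mktcap"])
    p10 = {d: stata_pctile(caps, 10) for d, caps in nyse.items()}
    return [
        r for r in rows
        if r["date"] in p10                  # like keep if _merge==3
        and r["mktcap"] is not None          # drop if missing(mktcap)
        and r["mktcap"] > p10[r["date"]]     # drop if mktcap<=p10
        and abs(r["prc"]) >= 5               # drop if abs(prc)<5
    ]
```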
*Step 3: Allocate stocks into 10 portfolios and hold them for 6 months
*Use new file
use "/Users/dk/Desktop/USD Documents/MSF/MFIN 518/MOM2.dta", clear
sort date
*We create variable prret6, which tells us which portfolio a stock
*belongs to based on cumret6
*The egen function xtile() (from the user-written egenmore package,
*ssc install egenmore) puts a prespecified share of firms into each portfolio
*nq() tells Stata how many portfolios we want
by date: egen prret6 = xtile(cumret6), nq(10) // takes ~20 min to run
*Save
save "/Users/dk/Desktop/USD Documents/MSF/MFIN 518/MOM3.dta", replace
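The decile assignment can be sketched in Python as below. Apply it separately to each formation month, as the by date: prefix does. The cut-point rule is an assumption: we take the 100*k/nq percentiles under the default percentile definition we assume Stata uses, and put a value in group i when it lies in (cut_{i-1}, cut_i]. The function names are hypothetical.

```python
from typing import List


def stata_pctile(values: List[float], p: float) -> float:
    """p-th percentile: with n sorted values and P = n*p/100, average the
    P-th and (P+1)-th values when P is an integer, else take the
    (floor(P)+1)-th value (1-based); assumed to match Stata's default."""
    assert 0 < p < 100
    x = sorted(values)
    P = len(x) * p / 100
    i = int(P)
    if P == i:
        return (x[i - 1] + x[i]) / 2
    return x[i]


def xtile(values: List[float], nq: int) -> List[int]:
    """Group number 1..nq for each value, in the original order."""
    cuts = [stata_pctile(values, 100 * k / nq) for k in range(1, nq)]
    return [1 + sum(c < v for c in cuts) for v in values]


print(xtile([20, 1, 10], 2))  # [2, 1, 1]
```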
*Use the portfolios
use "/Users/dk/Desktop/USD Documents/MSF/MFIN 518/MOM3.dta", clear
drop if missing(prret6)
*Expand data, i.e. create 6 copies of the data
expand 6
sort permno date
*Create variable n, which tracks which copy of the data an observation is;
*n goes from 1 to 6
*_n is the observation number within each by-group
by permno date: gen n=_n
*Use n to shift monthindex forward by 1 to 6 months (the holding period)
replace monthindex = monthindex+n
sort permno monthindex
*Drop return from the master dataset because we want the one from the
*using dataset
drop ret
merge m:1 permno monthindex using "/Users/dk/Desktop/USD Documents/MSF/MFIN 518/MOM1.dta"
keep if _merge==3
drop _merge
save "/Users/dk/Desktop/USD Documents/MSF/MFIN 518/MOM4.dta", replace
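The expand/shift/merge sequence above builds one row for each of the six holding months after every formation month and attaches that month's realized return. Here is a Python sketch of the same bookkeeping. The input layout is a hypothetical choice made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def holding_rows(
    formation: Iterable[Tuple[int, int, int]],
    returns: Dict[Tuple[int, int], float],
    horizon: int = 6,
) -> List[Tuple[int, int, int, int, float]]:
    """formation: (permno, formation_month, decile) rows.
    returns: (permno, monthindex) -> monthly return.
    Emit (permno, formation_month, holding_month, decile, ret) for holding
    months formation_month+1 .. formation_month+horizon, keeping only months
    that have a return (like keep if _merge==3)."""
    out = []
    for permno, fm, dec in formation:
        for n in range(1, horizon + 1):  # n = 1..6, one per copy from expand 6
            hm = fm + n
            if (permno, hm) in returns:
                out.append((permno, fm, hm, dec, returns[(permno, hm)]))
    return out
```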
*Step 4: Analysis
use "/Users/dk/Desktop/USD Documents/MSF/MFIN 518/MOM4.dta", clear
sort monthindex prret6 date
*Average returns of each portfolio in each calendar month, separately for
*each formation month
collapse (mean) ret, by(monthindex prret6 date)
*Collapse again to average across formation months, giving one return per
*portfolio per calendar month (monthindex)
collapse (mean) ret, by(monthindex prret6)
*Reshape to wide format: one row per month, one return column per portfolio (ret1-ret10)
reshape wide ret, i(monthindex) j(prret6) // i() defines the rows, j() the column suffix
*Generate year and month variables for clarity
*(monthindex = year*12 + month, with month running from 1 to 12)
gen year = floor((monthindex-1)/12)
gen month = monthindex - year*12
*Create the momentum (winners minus losers) return and test its significance
gen momret=ret10-ret1
ttest ret10=ret1
*Test momentum returns from January 2000 onward (monthindex = year*12 + month)
keep if monthindex>=2000*12+1
ttest ret10=ret1
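Step 4 can be sketched in Python to make the two collapses explicit. Returns are first averaged within each (calendar month, portfolio, formation month) cohort, then across the up to six overlapping cohorts held in that calendar month. The sketch also inverts monthindex = year*12 + month and computes the paired t statistic for the mean of ret10 - ret1. Function names and the input layout are illustrative assumptions.

```python
import math
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple


def portfolio_returns(
    rows: Iterable[Tuple[int, int, int, float]]
) -> Dict[Tuple[int, int], float]:
    """rows: (formation_month, holding_month, decile, ret).
    Stage 1 mirrors collapse (mean) ret, by(monthindex prret6 date);
    stage 2 mirrors collapse (mean) ret, by(monthindex prret6)."""
    cohort: Dict[Tuple[int, int, int], List[float]] = defaultdict(list)
    for fm, hm, dec, r in rows:
        cohort[(hm, dec, fm)].append(r)
    by_month: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for (hm, dec, _fm), rs in cohort.items():
        by_month[(hm, dec)].append(mean(rs))
    return {key: mean(vals) for key, vals in by_month.items()}


def year_month(monthindex: int) -> Tuple[int, int]:
    """Invert monthindex = year*12 + month, where month runs from 1 to 12."""
    year = (monthindex - 1) // 12
    return year, monthindex - 12 * year


def paired_t(a: List[float], b: List[float]) -> float:
    """t statistic for the mean of the paired differences a - b."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


print(year_month(24000), year_month(24001))  # (1999, 12) (2000, 1)
```

As the last line shows, monthindex 24000 corresponds to December 1999, and January 2000 is 24001.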

Related

CDO - Resample netcdf files from monthly to daily timesteps

I have a netcdf file that has monthly global data from 1991 to 2000 (10 years).
Using CDO, how can I modify the netcdf from monthly to daily timesteps by repeating the monthly values each day of each month?
For example,
convert from
Month 1, value = 0.25
to
Day 1, value = 0.25
Day 2, value = 0.25
Day 3, value = 0.25
....
Day 31, value = 0.25
convert from
Month 2, value = 0.87
to
Day 1, value = 0.87
Day 2, value = 0.87
Day 3, value = 0.87
....
Day 28, value = 0.87
Thanks
##############
Update
My monthly NetCDF file does not have its values on the first day of each month; they fall on irregular days (e.g. the 15th, 7th or 9th), but there is still exactly one value per month.
The question is perhaps ambiguously worded. Adrian Tompkins' answer is correct for interpolation. However, you are actually asking to set the value for each day of the month to that for the first day of the month. You could do this by adding a second CDO call as follows:
cdo -inttime,1991-01-01,00:00:00,1day in.nc temp.nc
cdo -monadd -gtc,100000000000000000 temp.nc in.nc out.nc
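Set the constant after gtc to something much larger than any value in your data. gtc returns 1 where a value exceeds the constant and 0 elsewhere, so this step produces a field of zeros on the daily time axis; monadd then adds the monthly value from in.nc to every day of the matching month.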
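The gtc/monadd combination amounts to repeating each month's value on every day of that month. The same operation in plain Python is sketched below. It assumes the monthly series has already been read into a dict keyed by (year, month), so it does not matter which day of the month each value is stamped on, as in the update to the question. The function name is hypothetical.

```python
import calendar
from datetime import date
from typing import Dict, List, Tuple


def monthly_to_daily(
    monthly: Dict[Tuple[int, int], float]
) -> List[Tuple[date, float]]:
    """Repeat each (year, month) value on every calendar day of that month,
    using the real month length (28 to 31 days)."""
    daily = []
    for (year, month), value in sorted(monthly.items()):
        ndays = calendar.monthrange(year, month)[1]  # number of days in month
        daily.extend((date(year, month, d), value) for d in range(1, ndays + 1))
    return daily


series = monthly_to_daily({(1991, 1): 0.25, (1991, 2): 0.87})
print(len(series), series[0], series[-1])
```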
You can use inttime which interpolates in time at the interval required, but this is not exactly what you asked for as it doesn't repeat the monthly values and your series will be smoothed by the interpolation.
If we assume your dataset starts on the 1st January at time 00:00 (you don't state in the question) then the command would be
cdo inttime,1991-01-01,00:00:00,1day in.nc out.nc
This performs a simple linear interpolation between steps.
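For contrast with repeating the values, here is a generic Python sketch of linear interpolation between time steps. It is not CDO's implementation, and it only fills the days between the first and last time step.

```python
from datetime import date, timedelta
from typing import List, Tuple


def interpolate_daily(points: List[Tuple[date, float]]) -> List[Tuple[date, float]]:
    """points: (date, value) pairs sorted by date, e.g. mid-month stamps.
    Returns one (date, value) per day from the first to the last point,
    linearly interpolated between neighbouring points."""
    out = []
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        span = (d1 - d0).days
        for k in range(span):
            out.append((d0 + timedelta(days=k), v0 + (v1 - v0) * k / span))
    out.append(points[-1])
    return out
```

Unlike the repeated monthly values, the interpolated series changes every day, which is the smoothing the answer warns about.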
Note: this is fine for fields like temperature and seems to be what you asked for, but readers should note that one has to be more careful with flux fields such as rainfall, where one might want to scale and/or change the units appropriately.
I could not find a solution with CDO but I solved the issue with R, as follows:
library(ncdf4)
library(reshape2)
## Read ncfile
ncpath = "~/my/path/"
ncname = "my_monthly_ncfile"
ncfname = paste0(ncpath, ncname, ".nc")
ncin = nc_open(ncfname)
var = ncvar_get(ncin, "nc_var")
nc_close(ncin)
## melt ncfile; name the dimensions so the lat/lon columns exist below
## (assumes the variable is stored as lon x lat x time)
var = melt(var, varnames = c("lon", "lat", "time"))
var = var[complete.cases(var), ] ## remove any NA
## split ncfile by gridpoint (lat and lon) into a list
var = split(var, list(var$lat, var$lon))
var = var[sapply(var, nrow) > 0] ## remove any empty list element
## create new list and replicate, for each gridpoint, each monthly value n=30 times
## (this assumes 30-day months; calendar months have 28 to 31 days)
var_rep = list()
for (i in seq_along(var)) {
  var_rep[[i]] = data.frame(value = rep(var[[i]]$value, each = 30))
}

How to get corresponding price of the smallest date chosen in date slicer in Power BI

I am quite new to working with DAX and Power BI so please don't judge. My problem seems (and might be) simple. Anyways, here we go:
I have a dataset that contains 3 columns: Date (date), Price (float), Performance (%)
Attribute descriptions:
Date and Price are constants pulled from an external data source. Performance is the price change over time, in percent: the percentage change from the price on the first date of the selected range (the "from date" selected in the date slicer visual) to the price on the current date.
I want to create a dynamic line chart that shows performance over time. The difficulty is that when I change the "from date", the performance should change with it: the price on the chosen "from date" becomes the new base price, and performance should be recalculated accordingly.
Formula:
Date = t, price at date t = pt, performance at date t = pert
Date range:
1.1.2000 to 31.12.2010
Initial situation when "from date" in the date slicer visual = 1.1.2000:
t0 = 1.1.2000
pt0 = 5.00
pert0 = 0%
t5 = 6.1.2000
pt5 = 5.054
pert5 = (pt5-pt0)/pt0 = 1.08%
After changing the date slicer so that "from date" is now 10.10.2009:
t0new = 10.10.2009
pt0new = 9.938
pert0new = 0%
t5new = 15.10.2009
pt5new = 9.832
pert5new = (pt5new-pt0new)/pt0new = -1.07%
As described, I want whatever is selected as starting point from the date slicer as the new base value for the performance calculation and the line chart should adjust accordingly.
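The performance definition can be checked outside Power BI with a small Python sketch. The function name and the list-of-tuples input are illustrative assumptions. The base is taken as the price on the first date on or after the "from date", mirroring the earliest date in the slicer selection.

```python
from datetime import date
from typing import List, Tuple


def performance(
    prices: List[Tuple[date, float]], from_date: date
) -> List[Tuple[date, float]]:
    """prices: (date, price) sorted by date. Rebase on the first date on or
    after from_date and return (date, (p_t - p_t0) / p_t0) from there on."""
    selected = [(d, p) for d, p in prices if d >= from_date]
    base = selected[0][1]
    return [(d, (p - base) / base) for d, p in selected]


prices = [(date(2000, 1, 1), 5.00), (date(2000, 1, 6), 5.054),
          (date(2009, 10, 10), 9.938), (date(2009, 10, 15), 9.832)]
print(performance(prices, date(2009, 10, 10)))
```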
I know how to do the dynamic line chart but I cannot figure out the measures and calculated columns I need to do so.
Any help is very much appreciated!
Cheers,
MLU
Calculate the benchmark as the price associated with the first date in the period. SELECTEDVALUE assumes you have one price per Date; otherwise use an aggregator (e.g. MIN, MAX, AVERAGE). I use ALLSELECTED so the benchmark is affected only by the filter context (slicers), and you can easily use it in visualizations that change the context.
Save the benchmark in a variable for later use.
Divide each price by the benchmark. Here we need to apply an aggregator to Price; I used AVERAGE assuming you have only one price per day, so the result is the price itself.
Here is the measure:
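Price vs Dynamic Benchmark :=
VAR vFirstDate = CALCULATE ( MIN ( Dataset[Date] ), ALLSELECTED ( Dataset ) )
VAR vbenchmark = CALCULATE ( SELECTEDVALUE ( Dataset[Price] ), FILTER ( ALL ( Dataset[Date] ), Dataset[Date] = vFirstDate ) )
RETURN
// subtract 1 from this ratio to get (pt - pt0) / pt0 as in the question
DIVIDE ( AVERAGE ( Dataset[Price] ), vbenchmark )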

Manipulating last two rows if there's data based on a Cut date

This question is a slightly varied version of this one...
Now I'm using Measures instead of Calculated columns and the date is static instead of having it based on a dropdown list.
Here's the Power BI test .pbix file:
https://drive.google.com/open?id=1OG7keqhdvDUDYkFQFMHyxcpi9Zi6Pn3d
This screenshot describes what I'm trying to accomplish:
Basically, the date in the P6 Update table is used as a cut date and will be fixed/static. It's imported from an Excel sheet where the user can customize it however they want.
Here's what should happen when a row in the Test data table matches the P6 Update date:
column Earned Daily - must have its value summed with the next row if there's one;
column Earned Cum - must grab the next row's value;
all the previous rows should remain intact, that is, their values won't change;
all subsequent rows must have their values assigned 0.
So for example:
If P6 Update is 1-May-2018, this is the expected result:
Date     Earned Daily   Earned Cum
1-May    7,498          52,106
2-May    0              0
If P6 Update is 30-Apr-2018, this is the expected result:
Date     Earned Daily   Earned Cum
30-Apr   13,173         50,699
1-May    0              0
2-May    0              0
If P6 Update is 29-Apr-2018, this is the expected result:
Date     Earned Daily   Earned Cum
29-Apr   11,906         44,608
30-Apr   0              0
1-May    0              0
2-May    0              0
and so on...
Hope this makes sense.
This is easier in Excel, but trying to do this in Power BI is making me go nuts.
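The rules above can be checked with a short Python sketch. Like the question, it assumes one row per calendar day. The daily values in the usage example are back-solved from the three expected results (for example 7,498 = 6,091 + 1,407). The 28-Apr row is a hypothetical stand-in for everything earned before 29-Apr.

```python
from datetime import date, timedelta
from typing import List, Tuple


def apply_cut_date(
    daily: List[Tuple[date, int]], cut_date: date
) -> List[Tuple[date, int, int]]:
    """daily: (date, earned) sorted by date, one row per calendar day.
    Returns (date, earned_daily, earned_cum): rows before the cut date are
    unchanged, the cut-date row absorbs the next day's value, and every
    later row is 0."""
    by_date = dict(daily)
    out, cum = [], 0
    for d, earned in daily:
        if d < cut_date:
            value = earned
        elif d == cut_date:
            value = earned + by_date.get(d + timedelta(days=1), 0)
        else:
            out.append((d, 0, 0))
            continue
        cum += value
        out.append((d, value, cum))
    return out


# Daily values back-solved from the expected results in the question;
# the 28-Apr row is a hypothetical lump for everything earned earlier.
daily = [(date(2018, 4, 28), 32702), (date(2018, 4, 29), 4824),
         (date(2018, 4, 30), 7082), (date(2018, 5, 1), 6091),
         (date(2018, 5, 2), 1407)]
for row in apply_cut_date(daily, date(2018, 5, 1))[-2:]:
    print(row)  # (datetime.date(2018, 5, 1), 7498, 52106) then (datetime.date(2018, 5, 2), 0, 0)
```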
I will ignore previously asked related questions and start from scratch.
First, create a measure:
Current Earn =
CALCULATE (
SUM( 'Test data'[Value]),
'Test data'[Act Rem] = "Actual Units",
'Test data'[Type] = "Current"
)
This measure will be used in other measures, to save you from typing all these conditions ("Actual Units" and "Current") again and again. It's a great practice to re-use measures in other measures - saves work, makes code cleaner and easier to refactor.
Create another measure:
Cut Date = SELECTEDVALUE('P6 Update'[Date])
We will use this measure whenever we need a cut off date. Please note that it does not have to be hard-coded - if P6 table contains a list of dates, you can create a pull-down slicer from the dates, and can choose the cut-off date dynamically. The formula will work properly.
Create third measure:
Next Earn =
VAR Cut_Date = [Cut Date]
VAR Current_Date = MAX ( 'Test data'[Date] )
VAR Next_Date = Current_Date + 1
VAR Current_Earn = [Current Earn]
VAR Next_Earn = CALCULATE ( [Current Earn], 'Test data'[Date] = Next_Date )
RETURN
SWITCH (
TRUE,
Current_Date < Cut_Date, Current_Earn,
Current_Date = Cut_Date, Current_Earn + Next_Earn,
BLANK ()
)
I am not sure "Next Earn" is a good name; hopefully you will find a more intuitive one. How it works: we save all the necessary inputs in variables and then use the SWITCH function to define the result. Hopefully it's self-explanatory. (Note: if you need 0 instead of a blank for dates after the Cut Date, replace BLANK() with 0.)
Finally, we define a measure for cumulative earn. It does not require any special logic, because previous measure takes care of it properly:
Cum Earn =
VAR Current_Date = MAX('Test data'[Date])
RETURN
CALCULATE(
[Next Earn],
FILTER(ALL('Test data'[Date]), 'Test data'[Date] <= Current_Date))

How do you merge lines in a single dataset with some duplicate values?

I am analyzing a medical record dataset in which patients were screened for STIs at 4 different time points. The data manager created one line per patient per STI for each time period. I want to restructure the dataset so there is one line per patient at each time point, with all of the diagnosed STIs listed.
I created new variables to capture each STI that could be listed under the Dx variable, but I can't figure out how to merge data within the same dataset so there is only one line per patient at each time point.
data dx;
set dx;
if dx='ANOGENITAL WARTS (CONDYLOMATA ACUMINATA)' then MRWarts=1;
if dx='CHLAMYDIA' then MRCHLAMYDIA=1;
if dx='DYSPLASIA (ANAL, CERVICAL, OR VAGINAL)' then MRDYSPLASIA=1;
if dx='GONORRHEA' then MRGONORRHEA=1;
if dx='HEPATITIS B (HBV)' then MRHEPB=1;
if dx='HUMAN PAPILLOMAVIRUSES (HPV)-ANY MANIFESTATION' then MRHPV=1;
if dx='PEDICULOSIS PUBIS' then MRPUBIS=1;
if dx='SYPHILIS' then MRSYPHILIS=1;
if dx='TRICHOMONAS VAGINALIS' then MRTRICHOMONAS=1;
run;
Image of data structure I am looking for
Taking the sample dataset that you provided in the image, you can use a simple transpose to get the desired outcome.
data have;
input Pt_ID interval_round DX :$10.;
datalines;
4 1 HIV
4 1 Warts
3 1 HIV
5 2 Chlamydia
;
run;
proc sort data=have; by Pt_ID; run;
proc transpose data=have out=want(drop=_NAME_);
by Pt_ID;
id DX;
var interval_round;
run;
proc print data=want; run;
This code creates one variable per diagnosis, but interval_round is no longer kept as its own column. Say, for example, a patient was screened for HIV in round 1 and for warts in round 2. Technically that patient should then have only one row, so how would you represent interval_round?
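If interval_round is kept as part of the key, each patient gets one row per screening round, which is the structure the question describes. In SAS, the analogous idea would be to sort and transpose by both Pt_ID and interval_round. The reshaping logic is sketched below in Python as an illustration only. The function name and the output layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def one_row_per_visit(
    records: Iterable[Tuple[int, int, str]]
) -> Dict[Tuple[int, int], Dict[str, int]]:
    """records: (pt_id, interval_round, dx) in long format, one line per STI.
    Returns {(pt_id, interval_round): {dx: 1, ...}}: one row per patient per
    screening round, with a 1 flag for each diagnosis recorded."""
    wide: Dict[Tuple[int, int], Dict[str, int]] = defaultdict(dict)
    for pt_id, rnd, dx in records:
        wide[(pt_id, rnd)][dx] = 1
    return dict(wide)


print(one_row_per_visit([(4, 1, "HIV"), (4, 1, "Warts"), (3, 1, "HIV"),
                         (5, 2, "Chlamydia")]))
```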

displaying dates of values - time series data

I am trying to maintain a table using some panel data. I have all the data outputting fine, but I am having difficulty getting the correct dates to display. The method I am using is the following:
gen ymdny = date(date,"MDY"); /*<- date var from panel dataset that i import*/
sort name ymdny;
summ ymdny;
local lastdate : disp %tdM-D r(max);
local lastdate2 : disp %tdM-D (r(max)-1);
local lastw : disp %tdM-D (r(max)-7);
This would work fine if the data were daily, but my dataset is actually business-daily (i.e. missing weekends and national bank holidays). It seems silly, but I have not been able to figure out a workaround that does the job. Ideally, there would be a function I can use to print the date corresponding to a particular value.
For example:
gen resbal_1d = round(l1.resbal,0.1);
gen dateOf = dateOf(resbal_1d); /* <- pseudocode example of what I would like */
I'm not sure what you're asking for, but my guess is that you want to see a human-readable date as the output, given a numerical input. (This is your last sentence.) So simply try something like:
display %td 10
The format is important as the following shows (see help format):
display %tq 10
Same numerical input, different format, different output.
Two other examples from the manual:
* string to integer
display date("5-12-1998", "MDY")
* string to date format
display %td date("5-12-1998", "MDY")
As for your example code, I don't understand what you're aiming for. You can summarize the date variable because in Stata dates are just integers; it's legal, but I couldn't say whether it's good form. Below is a simple example.
clear all
set more off
set obs 10
gen date = _n // create the data
format date %td // give date format
list
summarize date
local onedate = r(max)
display %td `onedate'
Some references:
[U] 24 Working with dates and time
help datetime
help datetime business calendars
http://www.stata.com/support/faqs/data-management/creating-date-variables/
http://www.ats.ucla.edu/stat/stata/modules/dates.htm
(Maybe you can explain with more detail and context what it is you want.)
Edit
Your comment
I do not see how this helps with the date output. For example,
displaying r(max) - 1 on a monday will still display the sunday date.
does not explain, at all, the problems you're having with Stata's business calendars.
I'm adding what is basically an example taken from the help file I already referenced. I do this with the hope of convincing you that (re)-reading the help files is worthwhile.
clear all
set more off
* import string dates
infile str10 sdate float x using http://www.stata-press.com/data/r13/bcal_simple
list
*----- Regular dates -----
* create elapsed dates - Stata's way of managing dates
generate rdate = date(sdate, "MD20Y")
format rdate %td
drop sdate x
list
* compute previous and next dates
generate tomorrow1 = rdate + 1
format tomorrow1 %td
generate yesterday1 = rdate - 1
format yesterday1 %td
list
*----- Business dates -----
* convert regular date to business dates
generate bdate = bofd("simple", rdate)
format bdate %tbsimple
* compute previous and next dates
generate tomorrow2 = bdate + 1
format tomorrow2 %tbsimple
generate yesterday2 = bdate - 1
format yesterday2 %tbsimple
order yesterday1 rdate tomorrow1 yesterday2 bdate tomorrow2
list
/*
The stbcal-file for simple defines the calendar shown below (X marks a non-business day):

         November 2011
  Su  Mo  Tu  We  Th  Fr  Sa
  --------------------------
           1   2   3   4   X
   X   7   8   9  10  11   X
   X  14  15  16  17  18   X
   X  21  22  23   X   X   X
   X  28  29  30
  --------------------------
*/
Notice that if you add or subtract 1 from a regular date, business days are not taken into account. If you do the same with a business-calendar date, you get what you want. Business calendars are defined by .stbcal files; the example uses a built-in calendar called simple. You may need to create your own .stbcal file, but it is not difficult. Again, the details are in the help files.
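The same idea outside Stata is sketched below in Python: stepping through a business calendar that skips weekends and a holiday set. In the example, the holidays are 24 and 25 November 2011, matching the X marks in the simple calendar above. This illustrates the concept only. It does not read .stbcal files, and the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import AbstractSet


def shift_business_days(d: date, n: int,
                        holidays: AbstractSet[date] = frozenset()) -> date:
    """Move n business days from d (negative n moves backwards), skipping
    Saturdays, Sundays and any date in `holidays`."""
    step = 1 if n >= 0 else -1
    remaining = abs(n)
    while remaining:
        d += timedelta(days=step)
        if d.weekday() < 5 and d not in holidays:  # Mon=0 .. Fri=4
            remaining -= 1
    return d


nov_holidays = {date(2011, 11, 24), date(2011, 11, 25)}
print(shift_business_days(date(2011, 11, 23), 1, nov_holidays))  # 2011-11-28
```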