Bins in Stata that will work in cem - match

In Stata is it possible (using the cem command) to create overlapping bins? For example, if a record in my treatment has DATE January 1, 2012, I want a match to be 'true' if a control record's DATE is within 2 days in either direction. I tried coding the bins manually with the treatment dates in the middle but since I have thousands of dates this is taking too long.
Using the above example control cases that would match could have dates December 30, 2011; December 31, 2011; January 1, 2012; January 2, 2012; or January 3, 2012.

You say:
I want a match to be 'true' if a control record's DATE is within 2 days
in either direction.
I have not checked the inner workings of the user-written command cem, but the variable cem_matched
(created after running cem) denotes whether an observation is matched or not and it
seems to depend on the observation belonging to a stratum in which there are
control and treatment observations. If a stratum has both control and treated
observations, they are all considered matched and cem_matched = 1. If not,
then all observations in the stratum have cem_matched = 0. So I do not quite see
how you would modify this variable using another variable as a reference.
Maybe you want to create the strata using the DATE variable. I'm no expert,
but to my knowledge, an observation must belong exclusively to one stratum or
another (this seems true for cem, at least). Overlapping bins violate this.
Your rule implies observations that could be to the right and left of a
certain cutpoint. From help cem:
. cem age (10 20 30 40 50) education (scott) re74, treatment(treated)
will coarsen the first variable, age into bins of (0-10), (10-20), (20-30), (30-40), (40-50) and (50+).
As you see, non-overlapping bins. What would it do if some overlapped? Where
would it assign the observation, to the bin on the left or to the right?
Some other criteria would be needed.
Maybe you want to discard (or flag) some observations per stratum based on the
DATE variable, after you run cem with other confounding covariates?
I'm not sure. Recall however that date variables in Stata can be computed on. See for example: http://www.ats.ucla.edu/stat/stata/modules/dates.htm
Note: cem is made available running ssc install cem.
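To make the "within 2 days in either direction" rule concrete, here is a minimal sketch in Python rather than Stata. It does not use cem or any binning at all: it is a plain window (caliper-style) match on the date, which is one way to get the overlapping behaviour that exclusive strata cannot provide. The function name match_within_window and the sample dates are made up for illustration.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta

def match_within_window(treated_dates, control_dates, days=2):
    """For each treated date, list the control dates within +/- `days` (inclusive)."""
    controls = sorted(control_dates)
    window = timedelta(days=days)
    matches = {}
    for t in treated_dates:
        lo = bisect_left(controls, t - window)   # first control >= t - window
        hi = bisect_right(controls, t + window)  # one past the last control <= t + window
        matches[t] = controls[lo:hi]
    return matches

# Hypothetical data: one treated record and four candidate controls.
treated = [date(2012, 1, 1)]
controls = [date(2011, 12, 29), date(2011, 12, 30), date(2012, 1, 3), date(2012, 1, 4)]
matches = match_within_window(treated, controls)
print(matches[date(2012, 1, 1)])  # only Dec 30, 2011 and Jan 3, 2012 fall inside the window
```

Because each treated record gets its own window, a control can match several treated records, which is exactly what exclusive strata do not allow.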

Looking for YOY YTD formula that works with fiscal years in tableau

I am looking for YOY YTD formula that works with fiscal years in tableau.
Method 1 appears to work here: https://resources.useready.com/blog/ytd-cy-vs-py-in-tableau-2-methods/ but does not work for Fiscal years.
The current-year filter starts in January.
Is there any way to adjust method 1 to work with fiscal years?
Note: I tried setting Default Properties -> Fiscal Year Start to July, and that did not work.
See "goal in tableau screenshot below"
You're almost there, assuming you computed what you're showing in Tableau
and didn't just mock it up in Excel.
If you don't have one, create a calculated field "FY-Number" where
January -> 1, Feb -> 2, etc. You're already sorting in that order
so maybe you already have such a field.
Assuming your "average" is a "total" ( using AVERAGE ),
all you need now is a filter to select FY-Number < 7.
Which you COULD do manually.
But if you want this to be automatic, and always total down to the most recent month, you could do the following. There may be better ways but this works.
And I have a hierarchy of FY-number and Fiscal-Month-number for generating the grid.
Compute a running month number that we can maximize: a sequential number that keeps increasing across all years. I defined FMRun as
[FY num]*12 + [FM num]
which has a max of 270 for December 2021. Tableau can find the max.
Next, invert that, basically, to find the max month number.
FMInvert formula:
{FIXED :(MAX([FMRun])/12 - FLOOR(max([FMRun])/12))*12}
which has a value of 6 for the data given.
Finally, filter on that with this filter I called whatmax, which we want true:
if [FM num] <= [FMInvert] then TRUE
else FALSE
END
And voila, you're done.
Here's the workbook.
https://public.tableau.com/app/profile/wade.schuette/viz/YOY-YTD-answer/Dashboard1?publish=yes
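The arithmetic behind the FMRun and FMInvert formulas can be checked outside Tableau. The sketch below is plain Python, not Tableau syntax. The function names are made up, and the fiscal-year number 22 is an assumption chosen so that FMRun comes out as the 270 quoted above. The rounding is also an addition of this sketch, to guard against floating-point noise.

```python
import math

def fm_run(fy_num, fm_num):
    """Running month number across fiscal years: [FY num]*12 + [FM num]."""
    return fy_num * 12 + fm_num

def fm_invert(max_fm_run):
    """Recover the fiscal month from the largest running number.

    Mirrors (MAX/12 - FLOOR(MAX/12))*12, rounded to an integer.
    """
    q = max_fm_run / 12
    return round((q - math.floor(q)) * 12)

latest = fm_run(22, 6)       # assumed: fiscal year 22, fiscal month 6
print(latest)                # 270
print(fm_invert(latest))     # 6
print(fm_invert(fm_run(22, 12)))  # 0: fiscal month 12 wraps around to zero
```

The last line shows an edge case worth checking in the workbook: when the most recent month is fiscal month 12, the fractional part is zero, so the filter [FM num] <= [FMInvert] would keep nothing unless 0 is treated as 12.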

How to fill the last observations with retime in matlab?

I am interpolating variables from quarterly to monthly frequency in MATLAB. However, when I use retime it doesn't go as far as the end of the sample but it stops 2 months before.
Let me give you an example:
T = datetime(2002,01,01):calquarters:datetime(2019,12,01);
TT = timetable(T', randn(72,1))
x = retime(TT, 'monthly', 'spline') % interpolate
As you can see it gives me back 214 observations rather than 216, November and December 2019 are missing.
How can I fix it?
Thanks!
I don't have enough reputation to add a comment, but TT having 72 quarters instead of 73 means that you are actually storing dates from 1st January 2002 to 1st October 2019 - as the next quarter would start from 1st January 2020, which is then not included in your original array (you can check this by printing TT and checking if this date is included or not).
If this is the case, there is no way for retime to interpolate the missing months, as they aren't in the original matrix (that is, retime cannot interpolate from October to January, since there is no such thing in TT).
Replacing datetime(2019,12,01) with datetime(2020,01,01), as well as replacing randn(72,1) with randn(73,1), might solve your issue.
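The counts in this explanation can be verified with a few lines of date arithmetic. This is a Python sketch of the calendar logic only, not of retime itself. The helper names are made up, and it assumes every quarter starts on the first of the month, as in the question.

```python
from datetime import date

def quarter_starts(start, end):
    """First-of-month dates every 3 months from `start` up to and including `end`."""
    out = []
    y, m = start.year, start.month
    while date(y, m, 1) <= end:
        out.append(date(y, m, 1))
        m += 3
        if m > 12:
            m -= 12
            y += 1
    return out

def months_spanned(first, last):
    """Number of monthly rows between two first-of-month dates, inclusive."""
    return (last.year - first.year) * 12 + (last.month - first.month) + 1

q = quarter_starts(date(2002, 1, 1), date(2019, 12, 1))
print(len(q), q[-1], months_spanned(q[0], q[-1]))        # 72 2019-10-01 214

q_fixed = quarter_starts(date(2002, 1, 1), date(2020, 1, 1))
print(len(q_fixed), months_spanned(q_fixed[0], q_fixed[-1]))  # 73 217
```

With the end date moved to 1 January 2020 the monthly grid has 217 rows, so it covers November and December 2019 plus January 2020. Drop the last row if exactly 216 observations are wanted.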

dissimilarity of community matrix with many samples equal zero, vegan package

I have a community dataset for macrofauna associated with corals that I am struggling to analyze with the vegan package.
Coral colonies were imaged in 2015 (for two coral species at five sites) and we counted macrofauna species found on each colony. In 2016 and 2017, we revisited the same coral colonies to count associated fauna. So far, this is a repeated measure experiment (Year/colonyID), but I have two problems:
1- Some of the revisited colonies in 2016 and 2017 had no fauna (143 out of 686 total colonies), meaning we have zero samples (n=143). This caused a problem when testing dissimilarity with the adonis function.
adonis(F_Mat ~ Species+Site+Year, data = F_Meta, permutations = 9999)
you have empty rows: their dissimilarities may be meaningless in method "bray"
missing values in results
I understand this message, but I must account for zero samples as they represent the dynamic of fauna community over time. I tried "bray" and "Jaccard" methods but both give me the same error message as above.
I used log1p(1 + F_Mat) to transform my values and replace the zeros. That worked for the adonis function but not for calculating alpha diversity (the Chao1 diversity index). To cope with that, I used the dist.zeroes function in the BiodiversityR package for adonis, and the untransformed abundance matrix for alpha diversity. I am not sure this is the right approach, though.
2- Some colonies in 2015 could not be found in later years (2016 and 2017), and instead, we took images for new colonies in 2016 and 2017 that have not been visited previously. So, it is not really repeated measure and I think we should account for colony ID as a random effect instead, but this is not doable in vegan to my knowledge.
Any advice on how to analyze this dataset and troubleshoot my experimental problems? Your help is really appreciated.
From: https://rdrr.io/cran/vegan/man/vegdist.html
In principle, you cannot study species composition without species and you should remove empty sites from community data. Since vegdist is passed to adonis I believe this statement carries through.
A potential solution might be adding a dummy species that has a constant value for each site.
If you end up needing to remove these rows or columns that sum to 0 you can reference this question How to remove columns and rows that sum to 0 while preserving non-numeric columns
df <- df[rowSums(df[-(1:7)]) !=0, ]
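To see why empty rows are a problem and what the dummy-species idea does, here is a small sketch in Python rather than R. It uses the standard Bray-Curtis formula, sum(|x - y|) / sum(x + y), and does not call vegan. The function names and the counts are made up for illustration.

```python
import math

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity; undefined (NaN) when both samples are empty."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    if den == 0:
        return float("nan")  # 0/0: no species in either sample
    return num / den

def add_dummy(row, value=1):
    """Append a dummy species with the same constant abundance in every sample."""
    return list(row) + [value]

empty = [0, 0, 0]    # hypothetical colony with no fauna
colony = [3, 0, 1]   # hypothetical colony with fauna

print(bray_curtis(empty, empty))                        # nan
print(bray_curtis(empty, colony))                       # 1.0
print(bray_curtis(add_dummy(empty), add_dummy(empty)))  # 0.0
print(bray_curtis(add_dummy(empty), add_dummy(colony)))
```

With the dummy species, two empty colonies become identical (dissimilarity 0) instead of undefined, and an empty colony is no longer maximally different from every occupied one. Whether that is ecologically sensible depends on the question being asked.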

How to get monthly totals from linearly interpolated data

I am working with a data set of tens of thousands of variables which have been repeatedly measured since the 1980s. The first measurements for each variable are not on the same date and the variables are irregularly measured - sometimes measurements are only a month apart, in a small number of cases they are decades apart.
I want to get the change in each variable per month.
So far I have a cell array of measurement dates and a cell array of interpolated rates of change between measurements (each cell represents a single variable in either array, and I've only posted the first 5 cells of each).
DateNumss= {[736614;736641;736669] [736636;736666] 736672 [736631;736659;736685] 736686}
LinearInterpss={[17.7777777777778;20.7142857142857;0] [0.200000000000000;0] 0 [2.57142857142857;2.80769230769231;0]}
How do I get monthly sums of the interpolated change in variable?
i.e.
If the first measurement for a variable is made on January 1st, and the linearly interpolated change between that and the next measurement is 1 per day; and the next measurement is on February 5th and the corresponding linearly interpolated change is 2; then January has a total change of 1*31 (31 days at 1) and February has a total change of 1*5+2*23 (5 days at 1, 23 days at 2).
You would need the points in the serial dates that correspond with the change of a month.
mat(:,1)=sort(repmat(1980:1989,[1,12]));
mat(:,2)=repmat(1:12,[1,size(mat,1)/12]);
mat(:,3)=1;
monthseps=datenum(mat);
This gives you a list of all 120 changes of months in the eighties.
Now you want, for each month, the change per day, and sum it. If you take the original data it is easier, since you can just interpolate each day's value using matlab. If you only have the "LinearInterpss" you need to map it on the days using interp1 with the method 'previous'.
k = 1; % index of the variable (cell) to process
monthly = zeros(length(monthseps)-1, 1); % one total per month
for ct = 2:length(monthseps)
    days = monthseps(ct-1):(monthseps(ct)-1); % days in the month
    % Assign each day its rate of change. interp1 with method 'previous' returns the most
    % recent value in LinearInterpss{k}; days outside the measured range come back as NaN.
    % Note: interp1 needs at least two measurements for the variable.
    vals = interp1(DateNumss{k}, LinearInterpss{k}, days, 'previous');
    monthly(ct-1) = sum(vals(~isnan(vals))); % total change in the month
end
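The same "carry the previous rate forward, then sum per month" idea can be worked through on the January/February example from the question. This is a Python sketch, not MATLAB. The function name, the year 2015 and the closing measurement on March 1st are assumptions added so the example is complete. It also assumes each rate applies from its measurement date up to, but not including, the next measurement date, which is what the 'previous' method implies.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date, timedelta

def monthly_totals(dates, rates):
    """Sum a piecewise-constant daily rate by calendar month.

    dates: sorted measurement dates.
    rates[i] applies from dates[i] up to (not including) dates[i + 1].
    """
    totals = defaultdict(float)
    day = dates[0]
    while day < dates[-1]:
        i = bisect_right(dates, day) - 1      # most recent measurement on or before `day`
        totals[(day.year, day.month)] += rates[i]
        day += timedelta(days=1)
    return dict(totals)

# Assumed example: rate 1 from Jan 1, rate 2 from Feb 5, series ends Mar 1.
dates = [date(2015, 1, 1), date(2015, 2, 5), date(2015, 3, 1)]
rates = [1.0, 2.0, 0.0]
totals = monthly_totals(dates, rates)
print(totals[(2015, 1)])  # 31.0 = 31 days at 1
print(totals[(2015, 2)])  # 52.0 = 4 days at 1 + 24 days at 2
```

Under this convention February 5th already counts at the new rate, so the February split is 4 and 24 days rather than the 5 and 23 in the question. Which day the rate switches on is a choice that should be made explicitly.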

MATLAB Change numbers to date

I have time set up as serial dates. Each number corresponds to a day, in order, from 20100101 to 20130611. How do I convert the serial date to a date in the format month-year? I need this because I want to plot data and need the x axis to show the date.
Thanks!
The first step is to convert your date format into one of the standard MATLAB date formats. The best format to use for plots is the "serial date format". The numbers themselves are a bit awkward, since they represent the "amount of time after 0/0/0000, in days", which is a huge number. Also, this date never actually existed, which makes it really weird when you want to work with dates that are BC.
However, the conversion is easy, since your format also counts the days, but you count after 31st of December, 2009. You can convert this using
numeric_date_vec = datenum(2009, 12, 31) + x;
You then plot your data using
plot(numeric_date_vec, y)
and you let Matlab add the date-ticks automatically by calling
datetick('x', 'mmm yyyy')
The problem is, the ticks do not update after zooming in. You can either call
datetick('x', 'mmm yyyy', 'keeplimits')
again, after each zooming or panning, or you download datetickzoom from the Matlab file exchange. It takes the same arguments as datetick, but it hooks into the zoom function and updates the ticks automatically.
Edit:
Sometimes, the dateticks are not spaced in any sensible way, then you can either try to zoom in and out a little until it snaps to something good, or you have to set the ticks manually:
% Set ticks to first day of the months in 2012
tick_locations = datenum(2012,[1:12],1);
% Set ticks on x-axis
set(gca, 'XTick', tick_locations)
% Call datetick again to get the right date labels, use option "keepticks"
datetick('x', 'mmm yyyy', 'keeplimits', 'keepticks')
You might have to modify the tick_locations = datenum(2012,[1:12],1) a bit to get the ticks that you want. For instance, you can use
tick_locations = datenum(2012,[1:2:25],1)
to get every second month between Jan 2012 and Jan 2014 (month numbers above 12 roll over into the following years).
For day number n use
datestr(datenum(2009, 12, 31) + n, 'yyyy-mm')
for example
>> datestr(datenum(2009, 12, 31)+365, 'yyyy-mm')
ans =
2010-12
>> datestr(datenum(2009, 12, 31)+366, 'yyyy-mm')
ans =
2011-01
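The offset logic is easy to sanity-check outside MATLAB. The sketch below is Python, not MATLAB, and the function name is made up. It assumes, as above, that day number 1 corresponds to 1 January 2010, so the base date is 31 December 2009.

```python
from datetime import date, timedelta

def day_number_to_year_month(n, base=date(2009, 12, 31)):
    """Convert a running day number (1 = 2010-01-01) to a 'yyyy-mm' label."""
    return (base + timedelta(days=n)).strftime("%Y-%m")

print(day_number_to_year_month(1))     # 2010-01
print(day_number_to_year_month(365))   # 2010-12
print(day_number_to_year_month(366))   # 2011-01
print(day_number_to_year_month(1258))  # 2013-06 (11 June 2013, the last day in the question)
```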