Stata: Look up if observation exists - find

I have a dataset of the following form:
vars:
year, firm, executive
data:
2002, initech, steve
2002, microtech, john
2003, initech, mike
2003, microtech, john
I want to add a new variable "sticksaround" that indicates whether a given executive is still with his firm the next year. For my data, I would want the created values to be:
0
1
0 or missing (both fine)
0 or missing (both fine)
Any thoughts on how I would best go about this?
I was thinking about looping over all observations -- but how do I check if there is an entry with the same executive for the next year?

Try:
bysort firm (year): gen sticksaround = executive == executive[_n+1]
E.g.:
clear
input year str15 firm str5 executive
2002 "initech" "steve"
2002 "microtech" "john"
2003 "initech" "mike"
2003 "microtech" "john"
end
bysort firm (year): gen sticksaround = executive == executive[_n+1]
li
Or if you need to compare (e.g.) 2002 to 2003, and not 2002 to 2004 if 2003 is missing, use tsset and try (note that you will need to encode firm and executive):
gen sticksaround = executive == F.executive
See help by and help tsset for more information.
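The "same executive, same firm, exactly one year later" check can also be sketched outside Stata. The following is a minimal Python illustration, not the Stata implementation; the function name `sticks_around` and the tuple layout are assumptions made for this example. It behaves like the tsset variant, in that a gap year counts as "not present".

```python
# Rows mirror the example data: (year, firm, executive).
rows = [
    (2002, "initech", "steve"),
    (2002, "microtech", "john"),
    (2003, "initech", "mike"),
    (2003, "microtech", "john"),
]


def sticks_around(rows):
    """Return 1 for each row whose firm/executive pair also appears in year + 1, else 0."""
    seen = set(rows)  # set membership makes each lookup O(1)
    return [
        int((year + 1, firm, executive) in seen)
        for year, firm, executive in rows
    ]


print(sticks_around(rows))
```

Rows from the last year get 0 here because no later observation exists; the question allows either 0 or missing for those.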

CDO - Resample netcdf files from monthly to daily timesteps

I have a netcdf file that has monthly global data from 1991 to 2000 (10 years).
Using CDO, how can I modify the netcdf from monthly to daily timesteps by repeating the monthly values each day of each month?
For example,
convert from
Month 1, value = 0.25
to
Day 1, value = 0.25
Day 2, value = 0.25
Day 3, value = 0.25
....
Day 31, value = 0.25
convert from
Month 2, value = 0.87
to
Day 1, value = 0.87
Day 2, value = 0.87
Day 3, value = 0.87
....
Day 28, value = 0.87
Thanks
Update
My monthly netcdf does not have its values on the first day of each month. They fall on varying days (e.g. the 15th, 7th, 9th), but there is exactly one value for each month.
The question is perhaps ambiguously worded. Adrian Tompkins' answer is correct for interpolation. However, you are actually asking to set the value for each day of the month to that for the first day of the month. You could do this by adding a second CDO call as follows:
cdo -inttime,1991-01-01,00:00:00,1day in.nc temp.nc
cdo -monadd -gtc,100000000000000000 temp.nc in.nc out.nc
Just set the value after gtc to something much higher than anything in your data.
You can use inttime which interpolates in time at the interval required, but this is not exactly what you asked for as it doesn't repeat the monthly values and your series will be smoothed by the interpolation.
If we assume your dataset starts on the 1st January at time 00:00 (you don't state in the question) then the command would be
cdo inttime,1991-01-01,00:00:00,1day in.nc out.nc
This performs a simple linear interpolation between steps.
Note: This is fine for fields like temperature and seems to be what you ask for, but readers should note that one has to be more careful with flux fields such as rainfall, where one might want to scale and/or change the units appropriately.
I could not find a solution with CDO but I solved the issue with R, as follows:
library(dplyr)
library(ncdf4)
library(reshape2)
## Read ncfile
ncpath="~/my/path/"
ncname="my_monthly_ncfile"
ncfname=paste(ncpath, ncname, ".nc", sep="")
ncin=nc_open(ncfname)
var=ncvar_get(ncin, "nc_var")
## melt ncfile
var=melt(var)
var=var[complete.cases(var), ] ## remove any NA
## split ncfile by gridpoint (lat and lon) into a list
var=split(var, list(var$lat, var$lon))
var=var[lapply(var,nrow)>0] ## remove any empty list element
## create new list and replicate, for each gridpoint, each monthly value n=30 times
var_rep=list()
for (i in 1:length(var)) {
  var_rep[[i]]=data.frame(value=rep(var[[i]]$value, each=30))
}
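The R loop above replicates every monthly value 30 times, which only approximates the real month lengths. As a minimal sketch of the same idea with calendar-accurate lengths, the Python snippet below repeats each monthly value once per day of its month. It works on a plain list for a single grid point, not on a netCDF file, and the function name `repeat_monthly_as_daily` is an assumption made for this illustration.

```python
import calendar
from datetime import date


def repeat_monthly_as_daily(start_year, start_month, monthly_values):
    """Repeat each monthly value for every calendar day of its month.

    monthly_values holds one value per consecutive month, starting at
    start_year/start_month. Returns a list of (date, value) pairs.
    """
    daily = []
    year, month = start_year, start_month
    for value in monthly_values:
        # monthrange returns (weekday of day 1, number of days in the month)
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            daily.append((date(year, month, day), value))
        # Move on to the next month, rolling over at the end of the year.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return daily


series = repeat_monthly_as_daily(1991, 1, [0.25, 0.87])
print(len(series), series[0], series[-1])
```

Because the position of the value inside each month is ignored, this also covers the case where the monthly timestamps fall on varying days.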

lubridate assigns unrealistic date

I have a vector of dates in either dmY or Ymd format.
These are all dates in the last century.
From each, I need to extract just the year (Y).
I use the following code
library(lubridate)
sampleDates <- c(20100517,17052010)
result <- year(parse_date_time(sampleDates, guess_formats(as.character(sampleDates), c("Ymd","dmY"))))
result
517 2010
However, I expect something like
result
2010 2010
Here is a base R solution to your problem that takes a particular difficulty into account with your date format. Let's say you have the date 20112020, i.e. November 20th in the year 2020. For your function, it is not easy to distinguish which part of the string is the year - is it 2011 or 2020? The following code takes this difficulty into account, though let me mention that there surely must be simpler solutions.
Code
NonID <- grepl("^2", sampleDates) & (substr(sampleDates, 5, 5) == "2")
ID <- !NonID
dates_normal <- sampleDates[!NonID]
dates_special <- sampleDates[NonID]
normal_years <- as.numeric(c(substr(dates_normal, nchar(dates_normal) - 3, nchar(dates_normal)), substr(dates_normal, 1, 4)))
normal_years <- normal_years[normal_years > 1999]
special_years <- as.numeric(substr(dates_special, nchar(dates_special) - 3, nchar(dates_special)))
all_years <- c(normal_years, special_years)
all_years
> all_years
[1] 2010 2010
Explanation
First, we divide the date vector into those dates which do not exhibit the indistinguishability (dates_normal) and those which do (dates_special). Then, for the normal dates, we use the substr() function to extract the last four and the first four digits of the string and keep only those values which exceed 1999. For the special dates, we only keep the last four digits because the year cannot be contained in the first four digits for this date format.
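An alternative way to think about the problem is to try both layouts and keep only the readings that form a valid calendar date with a plausible year. The Python sketch below illustrates that idea; it is not a translation of the R code above, and the function name `extract_year` and the default year bounds are assumptions chosen for this example.

```python
from datetime import date


def extract_year(raw, min_year=1900, max_year=2099):
    """Extract the year from an 8-digit date written as Ymd or dmY.

    Both layouts are tried; a reading counts only if it is a valid
    calendar date whose year lies within [min_year, max_year].
    """
    text = str(raw)
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"expected 8 digits, got {text!r}")
    layouts = [
        (text[0:4], text[4:6], text[6:8]),  # Ymd: year, month, day
        (text[4:8], text[2:4], text[0:2]),  # dmY: year, month, day
    ]
    years = set()
    for y, m, d in layouts:
        try:
            parsed = date(int(y), int(m), int(d))  # rejects month 20, day 32, ...
        except ValueError:
            continue
        if min_year <= parsed.year <= max_year:
            years.add(parsed.year)
    if len(years) != 1:
        raise ValueError(f"cannot determine a unique year for {text!r}")
    return years.pop()


print([extract_year(d) for d in (20100517, 17052010, 20112020)])
```

The year bounds do the work that the `> 1999` filter does in the R code: without them, 20100517 read as dmY would be a valid date in the year 517.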

What is number 1262304000000 in flot charts?

I am a newbie with flot charts, and I don't know what the numbers 1262304000000, 1264982400000, 1267401600000, ... mean.
var d1 = [[1262304000000, 6], [1264982400000, 2057], [1267401600000, 2043], [1270080000000, 2198], [1272672000000, 2660], [1275350400000, 1826], [1277942400000, 1302], [1280620800000, 2237], [1283299200000, 2004], [1285891200000, 2144], [1288569600000, 1577], [1291161600000, 1295]];
Thank you very much
The first three numbers are the Unix timestamps of 1 Jan 2010, 1 Feb 2010 and 1 Mar 2010.
The unit is milliseconds since 1 Jan 1970 (UTC).
You can try them at http://www.onlineconversion.com/unix_time.htm after deleting the last three zeroes, as this site counts in seconds, not milliseconds.
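The conversion is also easy to check locally. A minimal Python sketch, assuming the timestamps are meant as UTC; the helper name `flot_ms_to_datetime` is made up for this example:

```python
from datetime import datetime, timezone


def flot_ms_to_datetime(ms):
    """Convert milliseconds since 1970-01-01 UTC to a timezone-aware datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


for ms in (1262304000000, 1264982400000, 1267401600000):
    print(ms, flot_ms_to_datetime(ms).date().isoformat())
```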

Loop over fieldnames in a MatLab structure

I have a MATLAB "struct" with different "levels" and "sub-structures". When printed, the data contained in the "struct" look like this:
report.COUNTRY.SOURCE.SCENARIO.CATEGORY.ENTITY = YEAR YEAR ...;
As a minimal example:
report.HUN.CRF2014.BASEYEAR.CAT0.CO2 = 1991 1992 1993 1994
report.HUN.CRF2014.BASEYEAR.CAT0.CH4 = 1995 1996 1997
report.HUN.CRF2014.BASEYEAR.CAT0.H2S = 1990 1991 1992
report.HUN.CRF2014.BASEYEAR.CAT1.N2 = 1991 1992 1993
report.HUN.CRF2014.BASEYEAR.CAT1.FGASES = 1990 1991 1992
In order to produce tables listing the different variables combinations, I would like to loop over the fieldnames contained within the "struct".
I am currently trying to write a function able to do that:
fields=fieldnames(struct);
for categoryidx=1:length(fields)
    categoryname=fields{categoryidx};
    if isstruct(struct.(categoryname))
        category=fieldnames(struct.(categoryname));
        for entityidx = 1:length(category);
            entityname = category{entityidx};
            if isstruct(struct.(categoryname).(entityname))
                gases=fieldnames(struct.(categoryname).(entityname));
            end
        end
    end
end
Unfortunately, this is not producing anything! Does anyone have any idea how to loop over fieldnames in such a MATLAB structure? Thank you!
You might want to check out:
struct2tabler. This is a MATLAB function that recursively goes through a structure and converts it into a table.
For example:
a.a = 5
a.b.c = 10
a.b.d = 15
Would return a table:
a_a a_b_c a_b_d
---------------------------
5 10 15
Disclaimer: I have written struct2tabler, so might be a little biased, however it was created out of a requirement, I think, very similar to yours.
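The recursive idea itself is language-independent. As a rough sketch in Python, with nested dictionaries standing in for MATLAB structs, a traversal that produces the underscore-joined column names from the example could look like the following. This is not the struct2tabler code; the function name `flatten` and the dictionary representation are assumptions for illustration.

```python
def flatten(node, prefix):
    """Recursively flatten a nested dict into {joined_name: leaf_value}.

    Nested keys are joined with underscores, starting from prefix.
    """
    flat = {}
    for key, value in node.items():
        name = f"{prefix}_{key}"
        if isinstance(value, dict):
            # Descend into the sub-structure, carrying the path so far.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


a = {"a": 5, "b": {"c": 10, "d": 15}}
print(flatten(a, "a"))
```

Applied to a structure like `report`, every leaf would end up under a name that encodes its full path (country, source, scenario, category, entity), so no fixed number of nested loops is needed.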

Take out date values between two dates from matrix variable, Matlab

I'm trying to take out two separate years from a date table.
% Date table
Datez = [2001 2;2001 5;2001 9;2001 11;2002 3;2002 5;2002 7;2002 9;2002 11;...
2003 2;2003 4;2003 6;2003 8;2003 10;2003 12;2004 3;2004 5;2004 7;...
2004 9;2004 11; 2005 10;2005 12]
I want to flag every row with 1 or 0. The rows to flag with 1 are the dates from 2001-11 to 2002-11 plus all dates from 2004-11 to 2005-11.
In total I should get a new vector, called test:
test = [0;0;0;1;1;1;1;1;1;0;0;0;0;0;0;0;0;0;0;1;1;0] % final result
I tried these combinations, but I don't know how to combine these four statements into a vector that looks like "test" or if there are any better solutions?
xjcr = 1:length(Datez)
(Datez(xjcr,1) >= 2001 & Datez(xjcr,2) >= 11) % greater than 2001-11
(Datez(xjcr,1) <= 2002 & Datez(xjcr,2) <= 11) % smaller than 2002-11
(Datez(xjcr,1) >= 2004 & Datez(xjcr,2) >= 11) % greater than 2004-11
(Datez(xjcr,1) <= 2005 & Datez(xjcr,2) <= 11) % smaller than 2005-11
Any ideas are much appreciated, thanks in advance!
Your issue is that you do not want to filter on the two items independently, i.e. years of at least 2001 and months of at least November. This would give you December 2001 but not January 2002. The solution, I believe, is to combine year and month into a single number so that the comparison operators work on them as a pair. Here is an easy method:
Datez2 = Datez(:,1)*100 + Datez(:,2);
test = (Datez2>=200111 & Datez2<=200211) | (Datez2>=200411 & Datez2<=200511)
Multiplying the year by 12 and adding (month - 1) might be better, since that yields a consecutive month count; whether it is worth it depends on whether you are building something that needs to be very robust or are just hacking something together.
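For illustration, here is the consecutive month count applied to the data from the question, as a small Python sketch rather than MATLAB. The helper names `month_index` and `flag_ranges` are assumptions made for this example.

```python
def month_index(year, month):
    """Map (year, month) to a consecutive month count so pairs compare as one number."""
    return year * 12 + (month - 1)


def flag_ranges(dates, ranges):
    """Return 1 for each (year, month) inside any inclusive (start, end) range, else 0."""
    bounds = [(month_index(*start), month_index(*end)) for start, end in ranges]
    return [
        int(any(lo <= month_index(y, m) <= hi for lo, hi in bounds))
        for y, m in dates
    ]


datez = [(2001, 2), (2001, 5), (2001, 9), (2001, 11), (2002, 3), (2002, 5),
         (2002, 7), (2002, 9), (2002, 11), (2003, 2), (2003, 4), (2003, 6),
         (2003, 8), (2003, 10), (2003, 12), (2004, 3), (2004, 5), (2004, 7),
         (2004, 9), (2004, 11), (2005, 10), (2005, 12)]
ranges = [((2001, 11), (2002, 11)), ((2004, 11), (2005, 11))]
print(flag_ranges(datez, ranges))
```

Unlike year*100 + month, differences between two month counts are actual numbers of months, which helps if the ranges are later defined by length instead of by end date.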