I am trying to calculate how long a child has been in foster care. However, I am having some issues. My data should look like something below:
For each individual (ID) I need to calculate the duration (end_date-start_date). However, I also need to apply a rule that states that if there are less than 5 days between the end date and the start date within the same type of foster care, it should be considered as one consecutively placement. If there are more than five days between the end date end the start date within the same type of foster care for the same individual, it is a new placement. If it is a new type of foster care, it is a new placement. The variable “duration” is how, it is supposed to be calculated.
I have tried the following code, but it doesn't work the proper way + I don't know how to apply my "five day"-rule.
Proc sort data=have out=want;
by id type descending start_date;
run;
Data want;
set want;
by id type;
retain last_date;
if first.id or first.type then do;
last_date=end_date;
end;
if last.id or last.type then duration=(end_date-start_date);
run;
Any help is much appreciated!
Using a bunch of retain statements here to achieve this:
data want;
set have;
by id ;
retain true_sd prev_ed prev_type;
if first.id then call missing(prev_type);
if type ~= prev_type then do;
true_sd = sd;
call missing(prev_ed);
call missing(prev_type);
end;
if sd - prev_ed > 5 then true_sd = sd;
duration = ed - true_sd;
output;
prev_type = type;
prev_ed = ed;
format sd ed true_sd prev_ed date.;
run;
(assuming type and id are numeric here. ed is end_date, sd is start_date)
I want to update a history file in SAS. I have new observations, which may overlap with existing data lines.
What is needed, is a file, which would have lines from dataset (new_data) where they exist and in case the lines do not exist, then from old set (old_data). What I've come up is a clunky merge operation, which is conditional on the order of the datasets. (==Works only if New_data is after Old_data. :?)
data new_data;
input key value;
datalines;
1 10
1 11
2 20
2 21
;
run;
data old_data;
input key value;
datalines;
2 50
2 51
3 30
3 31
;
run;
So I'd like to have the following:
key value
1 10
1 11
2 20
2 21
3 30
3 31
However the following does not work. It produces the output below it.
data updated_history;
merge New_data(in=a) old_data(in=b) ;
by key;
if a or (b and not a );
run;
....
2 50
2 51
...
But for some reason this does:
data updated_history;
merge old_data(in=b) New_data(in=a);
by key;
if a or (b and not a );
run;
Question: Is there an intelligent way to manage from which dataset the values are select from. Something like: if a then value_from_dataset a;
The order in which you list the data sets in the MERGE is the order the data is taken. So when the order is old, new values from old are read and then values from new overwrite the values from old. This is why your second version works and the first does not.
Since you have multiple observations per key value you probably do NOT want to use MERGE to combine these files. You could do it using SET by reading the data twice using two DOW loops. In that case it won't matter the order of the dataset in the SET statement since the records are interleaved instead of being joined. This first loop will calculate which of the two input datasets have any observations for this KEY value.
data want ;
anyold=0;
anynew=0;
do until (last.key);
set old_data (in=inold) new_data(in=innew);
by key ;
if inold then anyold=1;
if innew then anynew=1;
end;
do until (last.key);
set old_data (in=inold) new_data(in=innew);
by key ;
if not (anyold and anynew and inold) then output;
end;
drop anyold anynew;
run;
This type of combination is probably easier to code using SQL.
proc sql ;
create table want as
select key,value from new_data
union
select key,value from old_data
where key in (select key from old_data except select key from new_data)
order by 1
;
quit;
I have 2 data sets that I want to merge by territory #...the first dataset has territory information including territory #, the second dataset has territory #'s but they are across 4 different columns titled drug_terr1, drug_terr2, drug_terr3, and drug_Terr4...I need to merge on all 4 columns because they each have different territory #'s and I want those numbers to be included in my merge with the dataset that has all the territory information...I tried a rename but that didn't work because it only changed the first column...is there a way to combine all this data, and rename it by territory # so I can do the merge?
ultimately would like it to look like this, but need to get the 4 columns from 'terrfile' to become 1 column called territory_nbr so I can merge.
%let output = E:\Horizon\Adhoc\AH\;
%let terrs =\\uslsasas1\E$\Horizon\IMS Processing\Weekly Data\20161230\Demo\;
libname terrs "&terrs.";
%let curr_process_wk = '12-30-2016';
%let curr_quarter =_q1;
**0 Grab pskw;
data pskw_data;
set PSKW.PSKWMaster ;
where week in ('12-16-2016','12-23-2016','12-30-2016','01-06-2017') and CopayType ="FBD" and FNRX=1 and pme_id in (46,42,55,38) and product in ('DUEXIS','VIMOVO','PENNSAID')
and
(COBPrimaryRejectCode1 in ('75','76') or COBPrimaryRejectCode2 in ('75', '76') or COBPrimaryRejectCode3 in ('75' , '76'));
run;
proc sort data=pskw_data;
by imsid;
run;
** 01 Grab tbl HCP;
proc sort data=ims.tblhcp (where = (week = &curr_process_wk.) keep = week imsid first_name last_name address1 address2 city state zip spec)
out = IMS_demo (drop = week);
by IMSID;
run;
** 02 Grab tbl terrs_by_imsid;
data terrfile;
set terrs.wd2_terrs_by_imsid&curr_quarter.;
run;
proc sort data = terrfile;
by imsid;
run;
** 03 Grab tbl roster;
data roster (keep = territorycode repname territoryname teamname);
set ims.tblRoster;
repname = trim(left(FirstName))||" "||trim(left(LastName));
run;
**04 link ;
data combine_dbs;
merge pskw_data (in=in1)
ims.tblhcp (in=in2);
by imsid;
if in1;
run;
data territories; ***can't merge because territory code is not in terrfile, just 4 columns as I mentioned above***;
merge terrfile (in=in1)
roster (in=in2);
by territorycode;
if in2;
run;
You need to merge the fact table with the lookup table four times. Let's say your territory identifier is called ID in your lookup table you want to take the field IMS_ID from it. Let's also assume your four fields in your fact table are named ID1-ID4.
proc sql ;
create table want as
select a.*
, b.ims_id as ims_id1
, c.ims_id as ims_id2
, d.ims_id as ims_id3
, e.ims_id as ims_id4
from FACT a
left join LU b on a.id1=b.id
left join LU c on a.id2=c.id
left join LU d on a.id3=d.id
left join LU e on a.id4=e.id
;
quit;
In your example it looks ROSTER is your FACT table and TERRFILES is your LU table. Your ID variable looks like it is name TERRITORYCODE, at least in your lookup file. Hard to tell what the four variables in ROSTER are named.
I have a dataset of CASE_ID (x y and z), a set of multiple dates (including duplicate dates) for each CASE_ID, and a variable VAR. I would like to create a dummy variable DUMMYVAR by group within a group whereby if VAR="C" for CASE_ID x on some specific date, then DUMMYVAR=1 for all observations corresponding to CASE_ID x on with that date.
I believe that a Classic 2XDOW would be the key here but this is my third week using SAS and having difficulty getting this by two BY groups here.
I have referenced and attempted to write a variation of Haikuo's code here:
PROC SORT have;
by CASE_ID DATE;
RUN;
data want;
do until (last.DATE);
set HAVE;
by date notsorted;
if var='c' then DUMMYVAR=1;
do until (last.DATE);
set HAVE;
by DATE notsorted;
if DATE=1 then ????????
end;
run;
Change your BY statements to match the grouping you are doing. And in the second loop add a simple OUTPUT; statement. Then your new dataset will have all the rows in your original dataset and the new variable DUMMYVAR.
data want;
do until (last.DATE);
set HAVE;
by case_id date;
if var='c' then DUMMYVAR=1;
end;
do until (last.DATE);
set HAVE;
by case_id date;
output;
end;
run;
This will create the variable DUMMYVAR with values of either 1 or missing. If you want the values to be 1 or 0 then you could either set it to 0 before the first DO loop. Or add if first.date then dummyvar=0; statement before the existing IF statement.
hoping for some help on this one.
Currently, the query uses the below to create references for m1-m6 and d1-d6.
%let m1=1114;
%let d1 ='30NOV2014'd;
%let m2=1214;
%let d2='31DEC2014'd;
%let m3=0115;
%let d3='31JAN2015'd;
%let m4=0215;
%let d4='28FEB2015'd;
%let m5=0315;
%let d5='31MAR2015'd;
%let m6=0415;
%let d6='30APR2015'd;
Based on the rest of the code, the m1-m6 dates must be formatted as mmyy. I have tried to swap the above out with this:
data _datemacro_;
m1 = put(intnx('day','01NOV2014'd,0),mmyyn4.);
call symput('m1',"'"||put(m1,9.)) ;
d1 = put(intnx('day','30NOV2014'd,0),date9.);
call symput('d1',"'"||put(d1,9.)) ;
m2 = put(intnx('day',&d1,+1),mmyyn4.);
call symput('m2',"'"||put(m2,9.)) ;
d2 = put(intnx('month',&d1,+1,'e'),date9.);
call symput('d2',"'"||put(d2,9.)||"'"||"d") ;
...etc through m6 and d6
run;
Below is the rest of the code that yields a garden variety of errors, including
ERROR 22-322: Syntax error, expecting one of the following: a name, a
quoted string, (, /, ;, DATA, LAST, NULL.
ERROR 200-322: The symbol is not recognized and will be ignored.
proc sql;
create table perf as
select a. field, a. field, a. field, a. reportingdate,
a. field, a. field,
e. field,
f. field
from table a, table2 e, table3 f
where a. reportingdate between &d1. and &d6.
and (a. field=1 or a. field=1)
and a. field = e. field and a. field = f. field;
quit;
/*Creates performance file by month*/
%macro month (mon,date);
data m&mon. (rename=(field=active&mon. field=co&mon. field=es&mon. field=sr&mon.));
set perf;
where datepart(reportingdate)=&date.;
run;
proc sort data=m&mon.; by field descending co&mon.; run;
proc sort data=m&mon. nodupkey out=m2&mon.; by field; run;
%mend month;
%month (&m1.,&d1.);
%month (&m2.,&d2.);
%month (&m3.,&d3.);
%month (&m4.,&d4.);
%month (&m5.,&d5.);
%month (&m6.,&d6.);
I am able to get it to run accurately until the last 6 lines, where it comes up with 78 errors just from running those 6 lines.
Any suggestions on how to write the macro to keep the correct data type while accurately defining the month start and end dates? When I try and change the start date and end date of each month to the same format, something within the rest of the code causes an error stating that it cannot work with two variables of different formats, even when they are clearly the same format as defined in the code.
Please let me know if there is anything I can clarify, as this was a little harder to explain than I intended.
Thank you for your help.
So you're basically trying to build a macro that will run a report month-by-month. I think using a macro is a good idea, but your structure could benefit from a re-org.
The first thing to fix is the hardcoded dates. Hardcoding is bad 99% of the time. Why not use a loop instead?
Initialise the start and end dates at the top of your program. In future they're easy to find and change if they're at the top, and you won't need to search through your code trying to figure out what else needs to change:
* PICK ANY DATES IN THE MONTHS YOU WANT TO START AND END. IE. DOESNT MATTER IF YOU CHOOSE THE FIRST OR THE 20TH. IT WILL RUN FOR THAT MONTH;
%let rpt_start = %sysfunc(mdy(11,1,2014));
%let rpt_end = %sysfunc(mdy( 4,1,2015));
Go and get the data between the start and end dates:
proc sql;
create table perf as
select a. field, a. field, a. field, a. reportingdate,
a. field, a. field,
e. field,
f. field
from table a, table2 e, table3 f
where a. reportingdate between &rpt_start and &rpt_end
and (a. field=1 or a. field=1)
and a. field = e. field and a. field = f. field;
quit;
Now loop over each month inbetween the start and end dates. Create the desired datasets as we go.
%macro create_monthly_datasets;
%local tmp_dt tmp_end rpt_dt;
%let tmp_end = %sysfunc(intnx(month,&rpt_end,0,end)); *CALC END-DATE DESIRED;
%let tmp_dt = %sysfunc(intnx(month,&rpt_start,0,beginning)); *INITIALISE LOOP;
%do %while (&tmp_dt le &tmp_end);
* CALC ACUTAL DATE WANTED AND STORE IT IN RPT_DT;
* CALC THE MMYY VAL YOU NEED;
* PRINT OUT BOTH VALUES TO MAKE SURE THEYRE CORRECT;
%let rpt_dt = %sysfunc(intnx(month,&tmp_dt,0,end));
%let mmyy = %sysfunc(month(&rpt_dt),z2.)%substr(%sysfunc(year(&rpt_dt)),3,2);
%put %sysfunc(sum(&rpt_dt),date9.) &mmyy;
* DO THE WORK;
data m&mmyy (rename=(field=active&mmyy field=co&mmyy field=es&mmyy field=sr&mmyy));
set perf;
where datepart(reportingdate)=&rpt_dt;
run;
proc sort data=m&mmyy; by field descending co&mmyy; run;
proc sort data=m&mmyy nodupkey out=m2&mmyy; by field; run;
%let tmp_dt = %sysfunc(intnx(month,&tmp_dt,1,beginning)); * ITERATE LOOP;
%end;
%mend;
%create_monthly_datasets;
Try the following for creating the macro variables:
data _null_;
START_MTH = '01nov2014'd;
do i = 1 to 6;
T_DATE = intnx('month',START_MTH,i)-1; /*Shift forwards i months then back 1 day*/
call symput(cats('m',i),put(T_DATE,mmyyn4.));
call symput(cats('d',i),cats("'",put(T_DATE,date9.),"'d"));
end;
run;