proc ttest class, default group issue - class

I would like to compare mean values of two groups using proc ttest, and I successfully did it as below.
proc ttest;
class group;
var score;
run;
But, this code just assumes observations with group = 0 as the default group. So, the t-statistics is calculated based on Mean (score of obs with group= 0) minus Mean (score of obs with group= 1). But, I would like to have it the other way around.
It would just change the sign of t-statistics, but it's just what I wanted to do.
Is there an option to do so by simply adding an option?
I know I could have done it if I have made another dummy variable which is exactly the opposite of my group variable. But, I don't want to create more dummy variables.

ORDER=DATA will tell SAS to order the class variable based on when it encounters the values. So if the 1 values are earlier than the 0 values, it will be first in the comparison.
For example:
data for_ttest;
call streaminit(7);
do group = 0 to 1;
do _n_ = 1 to 50;
score = rand('NORMAL',1,0.5)+group;
output;
end;
end;
run;
proc sort data=for_ttest;
by descending group;
run;
proc ttest data=for_ttest order=data;
class group;
var score;
run;
Without ORDER=DATA, it behaves as you saw, but with it, 1 is the first group.
You could also combine ORDER=FORMATTED with a format.
proc format;
value groupf
1="Group 1 (Value=1)"
0="Group 2 (Value=0)"
;
quit;
proc ttest data=for_ttest order=formatted;
class group;
format group groupf.;
var score;
run;
The labels in the PROC FORMAT are irrelevant, other than that they must be alphabetically sorted. Unfortunately the PRELOADFMT option is not available in PROC TTEST, so you can't use the NOTSORTED trick in PROC FORMAT to allow this to work even with the original values (though you can use non-printing characters to mess with sort order if you really want to).

Related

Calculate duration between dates by group using SAS

I am trying to calculate how long a child has been in foster care. However, I am having some issues. My data should look like something below:
For each individual (ID) I need to calculate the duration (end_date-start_date). However, I also need to apply a rule that states that if there are less than 5 days between the end date and the start date within the same type of foster care, it should be considered as one consecutively placement. If there are more than five days between the end date end the start date within the same type of foster care for the same individual, it is a new placement. If it is a new type of foster care, it is a new placement. The variable “duration” is how, it is supposed to be calculated.
I have tried the following code, but it doesn't work the proper way + I don't know how to apply my "five day"-rule.
Proc sort data=have out=want;
by id type descending start_date;
run;
Data want;
set want;
by id type;
retain last_date;
if first.id or first.type then do;
last_date=end_date;
end;
if last.id or last.type then duration=(end_date-start_date);
run;
Any help is much appreciated!
Using a bunch of retain statements here to achieve this:
data want;
set have;
by id ;
retain true_sd prev_ed prev_type;
if first.id then call missing(prev_type);
if type ~= prev_type then do;
true_sd = sd;
call missing(prev_ed);
call missing(prev_type);
end;
if sd - prev_ed > 5 then true_sd = sd;
duration = ed - true_sd;
output;
prev_type = type;
prev_ed = ed;
format sd ed true_sd prev_ed date.;
run;
(assuming type and id are numeric here. ed is end_date, sd is start_date)

Is there a way to pass a list under a macro code?

I have a customer survey data like this:
data feedback;
length customer score comment $50.;
input customer $ score comment & $;
datalines;
A 3 The is no parking
A 5 The food is expensive
B . I like the food
C 5 It tastes good
C . blank
C 3 I like the drink
D 4 The dessert is tasty
D 2 I don't like the service
;
run;
There is a macro code like this:
%macro subset( cust=);
proc print data= feedback;
where customer = "&cust";
run;
%mend;
I am trying to write a program that call the %subset for each customer value in feedback data. Note that we do not know how many unique values of customer there are in the data set. Also, we cant change the %subset code.
I tried to achieve that by using proc sql to create a unique list of customers to pass into macro code but I think you cannot pass a list in a macro code.
Is there a way to do that? p.s I am beginner in macro
I like to keep things simple. Take a look at the following:
data feedback;
length customer score comment $50.;
input customer $ score comment & $;
datalines;
A 3 The is no parking
A 5 The food is expensive
B . I like the food
C 5 It tastes good
C . blank
C 3 I like the drink
D 4 The dessert is tasty
D 2 I don't like the service
;
run;
%macro subset( cust=);
proc print data= feedback;
where customer = "&cust";
run;
%mend subset;
%macro test;
/* first get the count of distinct customers */
proc sql noprint;
select count(distinct customer) into : cnt
from feedback;quit;
/* do this to remove leading spaces */
%let cnt = &cnt;
/* now get each of the customer names into macro variables
proc sql noprint;
select distinct customer into: cust1 - :cust&cnt
from feedback;quit;
/* use a loop to call other macro program, notice the use of &&cust&i */
%do i = 1 %to &cnt;
%subset(cust=&&cust&i);
%end;
%mend test;
%test;
of course if you want short and sweet you can use (just make sure your data is sorted by customer):
data _null_;
set feedback;
by customer;
if(first.customer)then call execute('%subset(cust='||customer||')');
run;
First fix the SAS code. To test if a value is in a list using the IN operator, not the = operator.
where customer in ('A' 'B')
Then you can pass that list into your macro and use it in your code.
%macro subset(custlist);
proc print data= feedback;
where customer in (&custlist);
run;
%mend;
%subset(custlist='A' 'B')
Notice a few things:
Use quotes around the values since the variable is character.
Use spaces between the values. The IN operator in SAS accepts either spaces or comma (or both) as the delimiter in the list. It is a pain to pass in comma delimited lists in a macro call since the comma is used to delimit the parameters.
You can defined a macro parameter as positional and still call it by name in the macro call.
If the list is in a dataset you can easily generate the list of values into a macro variable using PROC SQL. Just make sure the resulting list is not too long for a macro variable (maximum of 64K bytes).
proc sql noprint;
select distinct quote(trim(customer))
into :custlist separated by ' '
from my_subset
;
quit;
%subset(&custlist)

SAS: Coding a dummy variable for a value of a variable by group within group

I have a dataset of CASE_ID (x y and z), a set of multiple dates (including duplicate dates) for each CASE_ID, and a variable VAR. I would like to create a dummy variable DUMMYVAR by group within a group whereby if VAR="C" for CASE_ID x on some specific date, then DUMMYVAR=1 for all observations corresponding to CASE_ID x on with that date.
I believe that a Classic 2XDOW would be the key here but this is my third week using SAS and having difficulty getting this by two BY groups here.
I have referenced and attempted to write a variation of Haikuo's code here:
PROC SORT have;
by CASE_ID DATE;
RUN;
data want;
do until (last.DATE);
set HAVE;
by date notsorted;
if var='c' then DUMMYVAR=1;
do until (last.DATE);
set HAVE;
by DATE notsorted;
if DATE=1 then ????????
end;
run;
Change your BY statements to match the grouping you are doing. And in the second loop add a simple OUTPUT; statement. Then your new dataset will have all the rows in your original dataset and the new variable DUMMYVAR.
data want;
do until (last.DATE);
set HAVE;
by case_id date;
if var='c' then DUMMYVAR=1;
end;
do until (last.DATE);
set HAVE;
by case_id date;
output;
end;
run;
This will create the variable DUMMYVAR with values of either 1 or missing. If you want the values to be 1 or 0 then you could either set it to 0 before the first DO loop. Or add if first.date then dummyvar=0; statement before the existing IF statement.

create a macro in sas

I have a report that is generated once a year. each report has the form of the year inside the name - report-2011.xls, report-2012.xls etc. each report contains the following vars: ID, SAL=average monthly salary of that year, Gender (0=male, 1=female), Married (0=not married, 1=married), I need to create a macro that calculates the mean.std,min and max of the salary, per year in accordance to gender type and married type. in the macro I need to include a parameter for the relevant year.
how do I refer to each type separately in calculating these parameters?
and how do I create a separate parameter for the year var?
proc summary allows you to control exactly which ways you want to cross the data.
%macro report(year);
proc import datafile="/path/to/report-&year..xls"
out= salary_data
dbms=csv replace ;
proc summary data = salary_data;
class married gender;
types married gender married*gender;
var sal;
output out = salary_results mean(sal) = mean_salary std(sal) = std_salary;
* Print the summary;
proc print;
* Delete the data and summary after using them;
proc delete data= salary_data salary_results; run;
%mend report;
Note that proc summary produces other useful information which you can read about here. You can drop them if you don't need them.
I macro is just a way to generate code. So first design the code you want it to generate. Then you can figure out what parts of it vary and replace those with macro variable references. The macro variables then become the parameters for your macro.
proc import datafile="report-2011.xls" out=report_2011 ; run;
proc means data=report_2011 ;
class gender married;
run;
%macro reporting(year, gender, marital_status);
proc means data=data&year min max std; * <== you should have separate datasets for different years
class gender married ;
%mend reporting
%reporting( 2015, 1, 1)
Is that something you are looking for?

References to SAS date macros not working due to differing data types

hoping for some help on this one.
Currently, the query uses the below to create references for m1-m6 and d1-d6.
%let m1=1114;
%let d1 ='30NOV2014'd;
%let m2=1214;
%let d2='31DEC2014'd;
%let m3=0115;
%let d3='31JAN2015'd;
%let m4=0215;
%let d4='28FEB2015'd;
%let m5=0315;
%let d5='31MAR2015'd;
%let m6=0415;
%let d6='30APR2015'd;
Based on the rest of the code, the m1-m6 dates must be formatted as mmyy. I have tried to swap the above out with this:
data _datemacro_;
m1 = put(intnx('day','01NOV2014'd,0),mmyyn4.);
call symput('m1',"'"||put(m1,9.)) ;
d1 = put(intnx('day','30NOV2014'd,0),date9.);
call symput('d1',"'"||put(d1,9.)) ;
m2 = put(intnx('day',&d1,+1),mmyyn4.);
call symput('m2',"'"||put(m2,9.)) ;
d2 = put(intnx('month',&d1,+1,'e'),date9.);
call symput('d2',"'"||put(d2,9.)||"'"||"d") ;
...etc through m6 and d6
run;
Below is the rest of the code that yields a garden variety of errors, including
ERROR 22-322: Syntax error, expecting one of the following: a name, a
quoted string, (, /, ;, DATA, LAST, NULL.
ERROR 200-322: The symbol is not recognized and will be ignored.
proc sql;
create table perf as
select a. field, a. field, a. field, a. reportingdate,
a. field, a. field,
e. field,
f. field
from table a, table2 e, table3 f
where a. reportingdate between &d1. and &d6.
and (a. field=1 or a. field=1)
and a. field = e. field and a. field = f. field;
quit;
/*Creates performance file by month*/
%macro month (mon,date);
data m&mon. (rename=(field=active&mon. field=co&mon. field=es&mon. field=sr&mon.));
set perf;
where datepart(reportingdate)=&date.;
run;
proc sort data=m&mon.; by field descending co&mon.; run;
proc sort data=m&mon. nodupkey out=m2&mon.; by field; run;
%mend month;
%month (&m1.,&d1.);
%month (&m2.,&d2.);
%month (&m3.,&d3.);
%month (&m4.,&d4.);
%month (&m5.,&d5.);
%month (&m6.,&d6.);
I am able to get it to run accurately until the last 6 lines, where it comes up with 78 errors just from running those 6 lines.
Any suggestions on how to write the macro to keep the correct data type while accurately defining the month start and end dates? When I try and change the start date and end date of each month to the same format, something within the rest of the code causes an error stating that it cannot work with two variables of different formats, even when they are clearly the same format as defined in the code.
Please let me know if there is anything I can clarify, as this was a little harder to explain than I intended.
Thank you for your help.
So you're basically trying to build a macro that will run a report month-by-month. I think using a macro is a good idea, but your structure could benefit from a re-org.
The first thing to fix is the hardcoded dates. Hardcoding is bad 99% of the time. Why not use a loop instead?
Initialise the start and end dates at the top of your program. In future they're easy to find and change if they're at the top, and you won't need to search through your code trying to figure out what else needs to change:
* PICK ANY DATES IN THE MONTHS YOU WANT TO START AND END. IE. DOESNT MATTER IF YOU CHOOSE THE FIRST OR THE 20TH. IT WILL RUN FOR THAT MONTH;
%let rpt_start = %sysfunc(mdy(11,1,2014));
%let rpt_end = %sysfunc(mdy( 4,1,2015));
Go and get the data between the start and end dates:
proc sql;
create table perf as
select a. field, a. field, a. field, a. reportingdate,
a. field, a. field,
e. field,
f. field
from table a, table2 e, table3 f
where a. reportingdate between &rpt_start and &rpt_end
and (a. field=1 or a. field=1)
and a. field = e. field and a. field = f. field;
quit;
Now loop over each month inbetween the start and end dates. Create the desired datasets as we go.
%macro create_monthly_datasets;
%local tmp_dt tmp_end rpt_dt;
%let tmp_end = %sysfunc(intnx(month,&rpt_end,0,end)); *CALC END-DATE DESIRED;
%let tmp_dt = %sysfunc(intnx(month,&rpt_start,0,beginning)); *INITIALISE LOOP;
%do %while (&tmp_dt le &tmp_end);
* CALC ACUTAL DATE WANTED AND STORE IT IN RPT_DT;
* CALC THE MMYY VAL YOU NEED;
* PRINT OUT BOTH VALUES TO MAKE SURE THEYRE CORRECT;
%let rpt_dt = %sysfunc(intnx(month,&tmp_dt,0,end));
%let mmyy = %sysfunc(month(&rpt_dt),z2.)%substr(%sysfunc(year(&rpt_dt)),3,2);
%put %sysfunc(sum(&rpt_dt),date9.) &mmyy;
* DO THE WORK;
data m&mmyy (rename=(field=active&mmyy field=co&mmyy field=es&mmyy field=sr&mmyy));
set perf;
where datepart(reportingdate)=&rpt_dt;
run;
proc sort data=m&mmyy; by field descending co&mmyy; run;
proc sort data=m&mmyy nodupkey out=m2&mmyy; by field; run;
%let tmp_dt = %sysfunc(intnx(month,&tmp_dt,1,beginning)); * ITERATE LOOP;
%end;
%mend;
%create_monthly_datasets;
Try the following for creating the macro variables:
data _null_;
START_MTH = '01nov2014'd;
do i = 1 to 6;
T_DATE = intnx('month',START_MTH,i)-1; /*Shift forwards i months then back 1 day*/
call symput(cats('m',i),put(T_DATE,mmyyn4.));
call symput(cats('d',i),cats("'",put(T_DATE,date9.),"'d"));
end;
run;