create a macro in sas - macros

I have a report that is generated once a year. Each report has the year in its name: report-2011.xls, report-2012.xls, etc. Each report contains the following variables: ID, SAL (average monthly salary for that year), Gender (0=male, 1=female), and Married (0=not married, 1=married). I need to create a macro that calculates the mean, std, min and max of the salary per year, by gender and marital status. The macro needs to include a parameter for the relevant year.
How do I refer to each type separately when calculating these statistics?
And how do I create a separate parameter for the year variable?

proc summary allows you to control exactly which ways you want to cross the data.
%macro report(year);
/* Import that year's workbook; the double dot ends &year and keeps the literal dot */
proc import datafile="/path/to/report-&year..xls"
out= salary_data
dbms=xls replace ;
run;
proc summary data = salary_data;
class married gender;
types married gender married*gender;
var sal;
output out = salary_results mean(sal) = mean_salary std(sal) = std_salary
min(sal) = min_salary max(sal) = max_salary;
run;
/* Print the summary */
proc print data = salary_results;
run;
/* Delete the data and summary after using them */
proc delete data= salary_data salary_results; run;
%mend report;
Note that proc summary also adds other useful variables to the output dataset, such as _TYPE_ and _FREQ_, which are described in the documentation. You can drop them if you don't need them.

A macro is just a way to generate code. So first design the code you want it to generate. Then you can figure out which parts of it vary and replace those with macro variable references. The macro variables then become the parameters for your macro.
proc import datafile="report-2011.xls" out=report_2011 ; run;
proc means data=report_2011 ;
class gender married;
run;
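As a sketch of the intermediate step described above, the year can first be pulled out into a macro variable before wrapping anything in a macro. The dbms=xls engine, the statistics list, and the var statement are assumptions added for illustration; they are not part of the code above.

```sas
%let year = 2011;  /* the only value that changes between runs */

/* Two dots: the first ends the macro variable name, the second is the literal dot */
proc import datafile="report-&year..xls" out=report_&year dbms=xls replace;
run;

proc means data=report_&year mean std min max;
class gender married;
var sal;
run;
```

Once this runs correctly for one year, turning year into a macro parameter is a mechanical change.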

%macro reporting(year);
/* Assumes a separate dataset for each year, e.g. data2015 */
proc means data=data&year mean std min max;
class gender married ; /* one row of statistics per gender and married combination */
var sal;
run;
%mend reporting;
%reporting(2015)
Is that something you are looking for?

Is there a way to pass a list under a macro code?

I have customer survey data like this:
data feedback;
length customer score comment $50.;
input customer $ score comment & $;
datalines;
A 3 The is no parking
A 5 The food is expensive
B . I like the food
C 5 It tastes good
C . blank
C 3 I like the drink
D 4 The dessert is tasty
D 2 I don't like the service
;
run;
There is a macro code like this:
%macro subset( cust=);
proc print data= feedback;
where customer = "&cust";
run;
%mend;
I am trying to write a program that calls %subset for each customer value in the feedback data. Note that we do not know how many unique values of customer there are in the data set. Also, we can't change the %subset code.
I tried to achieve that by using proc sql to create a unique list of customers to pass into the macro, but I think you cannot pass a list to a macro.
Is there a way to do that? P.S. I am a beginner with macros.
I like to keep things simple. Take a look at the following:
data feedback;
length customer score comment $50.;
input customer $ score comment & $;
datalines;
A 3 The is no parking
A 5 The food is expensive
B . I like the food
C 5 It tastes good
C . blank
C 3 I like the drink
D 4 The dessert is tasty
D 2 I don't like the service
;
run;
%macro subset( cust=);
proc print data= feedback;
where customer = "&cust";
run;
%mend subset;
%macro test;
/* first get the count of distinct customers */
proc sql noprint;
select count(distinct customer) into : cnt
from feedback;quit;
/* do this to remove leading spaces */
%let cnt = &cnt;
/* now get each of the customer names into macro variables */
proc sql noprint;
select distinct customer into :cust1 - :cust&cnt
from feedback;quit;
/* use a loop to call other macro program, notice the use of &&cust&i */
%do i = 1 %to &cnt;
%subset(cust=&&cust&i);
%end;
%mend test;
%test;
Of course, if you want short and sweet you can use the following (just make sure your data is sorted by customer):
data _null_;
set feedback;
by customer;
if(first.customer)then call execute('%subset(cust='||customer||')');
run;
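One caveat with CALL EXECUTE: macro code is executed as soon as it is pushed, while the SAS statements it generates wait until the data step ends. Since %subset contains only a PROC PRINT, that is harmless here, but a commonly used precaution is to wrap the macro name in %nrstr so the whole call is deferred. A sketch of that variant, assuming feedback is already sorted by customer:

```sas
data _null_;
set feedback;
by customer;
/* CATS trims the trailing blanks of customer;          */
/* %nrstr delays the macro call until this step is done */
if first.customer then
call execute(cats('%nrstr(%subset)(cust=', customer, ')'));
run;
```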
First fix the SAS code. To test whether a value is in a list, use the IN operator, not the = operator.
where customer in ('A' 'B')
Then you can pass that list into your macro and use it in your code.
%macro subset(custlist);
proc print data= feedback;
where customer in (&custlist);
run;
%mend;
%subset(custlist='A' 'B')
Notice a few things:
Use quotes around the values since the variable is character.
Use spaces between the values. The IN operator in SAS accepts either spaces or commas (or both) as the delimiter in the list. It is a pain to pass comma-delimited lists in a macro call since the comma is used to delimit the parameters.
You can define a macro parameter as positional and still call it by name in the macro call.
If the list is in a dataset you can easily generate the list of values into a macro variable using PROC SQL. Just make sure the resulting list is not too long for a macro variable (maximum of 64K bytes).
proc sql noprint;
select distinct quote(trim(customer))
into :custlist separated by ' '
from my_subset
;
quit;
%subset(&custlist)

SAS Macro to extract selected rows from table

I have a dataset with 20 observations and 6 variables: ID, Gender, Age, Height, Weight, Year. All are numeric except the Gender variable. I would like to extract 10 observations starting from the fifth observation using SAS macros.
I have the code below to import and extract the selected rows from the table.
I want to extract the selected rows using macros as part of an exercise. Please let me know your advice how to use macros to extract specific observations.
Thank you for your time.
%macro one (a, b, c);
proc import out=&a
datafile= "C:\Users\komal\Desktop\&b"
dbms=&c replace;
getnames=yes;
run;
%mend one;
%one (outcsv, Sample.csv, csv);
data test;
set outcsv;
if _N_ in (5,6,7,8,9,10,11,12,13,14) then output;
run;
You could do something like this:
%macro one (a, b, c,strtpt,endpt);
proc import out=&a
datafile= "C:\Users\komal\Desktop\&b"
dbms=&c replace;
getnames=yes;
run;
data test;
set &a;
if &strtpt <= _n_ <= &endpt;
run;
%mend one;
%one (outcsv, Sample.csv, csv,5,14);
There is no need to use PROC IMPORT to read a CSV file, especially if you already know the names and types of the variables. Something like this should work:
data want ;
infile "C:\Users\komal\Desktop\Sample.csv" dsd firstobs=5 obs=14 truncover ;
input ID Gender $ Age Height Weight Year ;
run;
You might need to use 6 to 15 instead if the file has a header row.
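If the data has already been imported into a dataset, the same row window can be taken with the FIRSTOBS= and OBS= dataset options inside a small macro. This is only a sketch: the macro name take_rows and its parameter names are made up for illustration, and outcsv is assumed to be the dataset created by the import step in the question.

```sas
%macro take_rows(in=, out=, first=, last=);
data &out;
/* FIRSTOBS= is the first row to read, OBS= is the last row to read */
set &in(firstobs=&first obs=&last);
run;
%mend take_rows;

/* Rows 5 through 14 of outcsv, i.e. 10 observations */
%take_rows(in=outcsv, out=test, first=5, last=14)
```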

proc ttest class, default group issue

I would like to compare mean values of two groups using proc ttest, and I successfully did it as below.
proc ttest;
class group;
var score;
run;
But this code treats observations with group = 0 as the first group, so the t-statistic is calculated as Mean(score for group = 0) minus Mean(score for group = 1). I would like to have it the other way around.
That would only change the sign of the t-statistic, but it is what I want.
Is there a way to do this by simply adding an option?
I know I could do it by creating another dummy variable that is the exact opposite of my group variable, but I don't want to create more dummy variables.
ORDER=DATA will tell SAS to order the class variable based on when it encounters the values. So if the 1 values are earlier than the 0 values, it will be first in the comparison.
For example:
data for_ttest;
call streaminit(7);
do group = 0 to 1;
do _n_ = 1 to 50;
score = rand('NORMAL',1,0.5)+group;
output;
end;
end;
run;
proc sort data=for_ttest;
by descending group;
run;
proc ttest data=for_ttest order=data;
class group;
var score;
run;
Without ORDER=DATA, it behaves as you saw, but with it, 1 is the first group.
You could also combine ORDER=FORMATTED with a format.
proc format;
value groupf
1="Group 1 (Value=1)"
0="Group 2 (Value=0)"
;
run;
proc ttest data=for_ttest order=formatted;
class group;
format group groupf.;
var score;
run;
The labels in the PROC FORMAT are irrelevant, except that their alphabetical order determines the group order. Unfortunately the PRELOADFMT option is not available in PROC TTEST, so you can't use the NOTSORTED trick in PROC FORMAT to allow this to work even with the original values (though you can use non-printing characters to mess with sort order if you really want to).

Macro (loop) functions in SAS

I am doing some very simple analysis in SAS, finding mean, standard deviation and median, and the code is like
proc means data=data001 mean median;
var price volume;
output out=new001
mean=avprice avvolume
median=medprice medvolume;
run;
But the thing is that I have more than 100 datasets (data001 to data299).
I want to use a macro to process all the datasets at once (from 001 to 299) and output the results into one table. Is there any way to do this?
Thanks and have a good weekend!
Append them all to one table and use a CLASS or BY variable to differentiate.
data combined;
set data001-data299 indsname=source; /* INDSNAME= holds the name of the contributing dataset */
data_source=source;
run;
proc sort data=combined; by data_source; run;
proc means data=combined noprint;
by data_source;
var price volume;
output out=new001
mean=avprice avvolume
median=medprice medvolume;
run;
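If a macro loop is still preferred, as the question asks, a rough sketch could look like the following. The macro name summarize_all and the dataset names one_result and all_results are placeholders chosen for illustration, and the sketch assumes every dataset from data001 to data299 exists in WORK.

```sas
%macro summarize_all(first=1, last=299);
%local i id;
/* Start from an empty result table so repeated runs do not pile up rows */
proc datasets lib=work nolist;
delete all_results;
quit;
%do i = &first %to &last;
%let id = %sysfunc(putn(&i, z3.)); /* 1 -> 001, 42 -> 042 */
proc means data=data&id noprint;
var price volume;
output out=one_result(drop=_type_ _freq_)
mean=avprice avvolume
median=medprice medvolume;
run;
/* Tag the summary row with the dataset it came from */
data one_result;
length data_source $32;
set one_result;
data_source = "data&id";
run;
proc append base=all_results data=one_result;
run;
%end;
%mend summarize_all;

%summarize_all(first=1, last=299)
```

The single combined step is usually faster because it reads the data once, while the loop is easier to adapt if the datasets do not all share the same layout.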

References to SAS date macros not working due to differing data types

hoping for some help on this one.
Currently, the query uses the below to create references for m1-m6 and d1-d6.
%let m1=1114;
%let d1 ='30NOV2014'd;
%let m2=1214;
%let d2='31DEC2014'd;
%let m3=0115;
%let d3='31JAN2015'd;
%let m4=0215;
%let d4='28FEB2015'd;
%let m5=0315;
%let d5='31MAR2015'd;
%let m6=0415;
%let d6='30APR2015'd;
Based on the rest of the code, the m1-m6 dates must be formatted as mmyy. I have tried to swap the above out with this:
data _datemacro_;
m1 = put(intnx('day','01NOV2014'd,0),mmyyn4.);
call symput('m1',"'"||put(m1,9.)) ;
d1 = put(intnx('day','30NOV2014'd,0),date9.);
call symput('d1',"'"||put(d1,9.)) ;
m2 = put(intnx('day',&d1,+1),mmyyn4.);
call symput('m2',"'"||put(m2,9.)) ;
d2 = put(intnx('month',&d1,+1,'e'),date9.);
call symput('d2',"'"||put(d2,9.)||"'"||"d") ;
...etc through m6 and d6
run;
Below is the rest of the code that yields a garden variety of errors, including
ERROR 22-322: Syntax error, expecting one of the following: a name, a
quoted string, (, /, ;, DATA, LAST, NULL.
ERROR 200-322: The symbol is not recognized and will be ignored.
proc sql;
create table perf as
select a. field, a. field, a. field, a. reportingdate,
a. field, a. field,
e. field,
f. field
from table a, table2 e, table3 f
where a. reportingdate between &d1. and &d6.
and (a. field=1 or a. field=1)
and a. field = e. field and a. field = f. field;
quit;
/*Creates performance file by month*/
%macro month (mon,date);
data m&mon. (rename=(field=active&mon. field=co&mon. field=es&mon. field=sr&mon.));
set perf;
where datepart(reportingdate)=&date.;
run;
proc sort data=m&mon.; by field descending co&mon.; run;
proc sort data=m&mon. nodupkey out=m2&mon.; by field; run;
%mend month;
%month (&m1.,&d1.);
%month (&m2.,&d2.);
%month (&m3.,&d3.);
%month (&m4.,&d4.);
%month (&m5.,&d5.);
%month (&m6.,&d6.);
I am able to get it to run accurately until the last 6 lines, where it comes up with 78 errors just from running those 6 lines.
Any suggestions on how to write the macro to keep the correct data type while accurately defining the month start and end dates? When I try and change the start date and end date of each month to the same format, something within the rest of the code causes an error stating that it cannot work with two variables of different formats, even when they are clearly the same format as defined in the code.
Please let me know if there is anything I can clarify, as this was a little harder to explain than I intended.
Thank you for your help.
So you're basically trying to build a macro that will run a report month-by-month. I think using a macro is a good idea, but your structure could benefit from a re-org.
The first thing to fix is the hardcoded dates. Hardcoding is bad 99% of the time. Why not use a loop instead?
Initialise the start and end dates at the top of your program. In future they're easy to find and change if they're at the top, and you won't need to search through your code trying to figure out what else needs to change:
* PICK ANY DATES IN THE MONTHS YOU WANT TO START AND END. IE. DOESNT MATTER IF YOU CHOOSE THE FIRST OR THE 20TH. IT WILL RUN FOR THAT MONTH;
%let rpt_start = %sysfunc(mdy(11,1,2014));
%let rpt_end = %sysfunc(mdy( 4,1,2015));
Go and get the data between the start and end dates:
proc sql;
create table perf as
select a. field, a. field, a. field, a. reportingdate,
a. field, a. field,
e. field,
f. field
from table a, table2 e, table3 f
where a. reportingdate between &rpt_start and &rpt_end
and (a. field=1 or a. field=1)
and a. field = e. field and a. field = f. field;
quit;
Now loop over each month between the start and end dates. Create the desired datasets as we go.
%macro create_monthly_datasets;
%local tmp_dt tmp_end rpt_dt mmyy;
%let tmp_end = %sysfunc(intnx(month,&rpt_end,0,end)); *CALC END-DATE DESIRED;
%let tmp_dt = %sysfunc(intnx(month,&rpt_start,0,beginning)); *INITIALISE LOOP;
%do %while (&tmp_dt le &tmp_end);
* CALC ACTUAL DATE WANTED AND STORE IT IN RPT_DT;
* CALC THE MMYY VAL YOU NEED;
* PRINT OUT BOTH VALUES TO MAKE SURE THEYRE CORRECT;
%let rpt_dt = %sysfunc(intnx(month,&tmp_dt,0,end));
%let mmyy = %sysfunc(month(&rpt_dt),z2.)%substr(%sysfunc(year(&rpt_dt)),3,2);
%put %sysfunc(sum(&rpt_dt),date9.) &mmyy;
* DO THE WORK;
data m&mmyy (rename=(field=active&mmyy field=co&mmyy field=es&mmyy field=sr&mmyy));
set perf;
where datepart(reportingdate)=&rpt_dt;
run;
proc sort data=m&mmyy; by field descending co&mmyy; run;
proc sort data=m&mmyy nodupkey out=m2&mmyy; by field; run;
%let tmp_dt = %sysfunc(intnx(month,&tmp_dt,1,beginning)); * ITERATE LOOP;
%end;
%mend;
%create_monthly_datasets;
Try the following for creating the macro variables:
data _null_;
START_MTH = '01nov2014'd;
do i = 1 to 6;
T_DATE = intnx('month',START_MTH,i)-1; /*Shift forwards i months then back 1 day*/
call symput(cats('m',i),put(T_DATE,mmyyn4.));
call symput(cats('d',i),cats("'",put(T_DATE,date9.),"'d"));
end;
run;
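With m1-m6 and d1-d6 created this way, the six hardcoded %month calls from the question could be replaced by a loop. A minimal sketch, assuming the %month macro from the question is already defined; the wrapper name run_months is made up for illustration:

```sas
%macro run_months(n=6);
%local i;
%do i = 1 %to &n;
/* &&m&i resolves in two passes: &&m&i -> &m1 -> 1114 */
%month(&&m&i, &&d&i)
%end;
%mend run_months;

%run_months(n=6)
```

Because the d values are stored as date literals such as '30NOV2014'd, they can be compared directly against datepart(reportingdate) inside %month.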