SAS macros to print multiple reports

I have a data set that looks like this:
data have;
input name $ class $ time score;
cards;
chewbacca wookie 1 97
chewbacca wookie 2 100
chewbacca wookie 3 95
saruman wizard 1 79
saruman wizard 2 85
saruman wizard 3 40
gandalf wizard 1 22
gandalf wizard 2 50
gandalf wizard 3 87
bieber canadian 1 50
bieber canadian 2 45
bieber canadian 3 10
;
run;
I'm creating a program that does two things:
1. prints the data for each distinct class
2. creates a scatterplot x=time y=score for each name.
Executing the code below will illustrate my desired output:
data chewbacca saruman gandalf bieber;
set have;
if name='chewbacca' then output chewbacca;
else if name='saruman' then output saruman;
else if name='gandalf' then output gandalf;
else if name='bieber' then output bieber;
run;
title 'Report for wookie';
proc print data=have;
where class='wookie';
run;
title 'Plot Chewbacca';
proc sgplot data=chewbacca;
scatter x=time y=score;
run;
title 'Report for wizard';
proc print data=have;
where class='wizard';
run;
title 'Plot Saruman';
proc sgplot data=saruman;
scatter x=time y=score;
run;
title 'Plot Gandalf';
proc sgplot data=gandalf;
scatter x=time y=score;
run;
title 'Report for canadian';
proc print data=have;
where class='canadian';
run;
title 'Plot Bieber';
proc sgplot data=bieber;
scatter x=time y=score;
run;
Ideally, I'd like to automate this. I've been trying to set this up, but am missing something. Here is my attempt:
proc sql;
select count(distinct name) into :numname
from have;
%let numname=&numname;
select distinct name into :name1 - :name&numname
from have;
select count(distinct class) into :numclass
from have;
%let numclass=&numclass;
select distinct class into :class1 - :class&numclass
from have;
quit;
%macro printit;
%do i = 1 %to &numclass;
title 'Report for &&class&i';
proc print data=have;
where class=&&class&i;
run;
/*insert sgplot here*/
%end;
%mend;
%printit;
Please help here. Can't get the syntax sorted....
Thanks.

I see 4 issues.
1. Macro variables resolve only inside double quotes; single quotes prevent resolution. So change the title statement to:
title "Report for &&class&i";
2. The class variable is a string. You need to quote the string in the where clause:
where class="&&class&i";
3. You don't need to generate the separate data sets. You can add a where clause when you specify the data for SGPLOT:
proc sgplot data=have(where=(name="&&name&i"));
4. The number of names and the number of classes are different, so you need two loops.
EDIT: Also look at SGPANEL and/or SGRENDER. You can generate all the charts in 1 call.
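Putting the four fixes together, a minimal sketch of the corrected macro could look like the following. It assumes the macro variables numclass, class1-classN, numname and name1-nameN have already been created by the PROC SQL step in the question; the %local statement and the closing title statement are my additions, not part of the original code.

```sas
/* Sketch only: relies on numclass, class1..classN, numname and          */
/* name1..nameN created by the PROC SQL step shown in the question.      */
%macro printit;
  %local i;

  /* First loop: one report per distinct class */
  %do i = 1 %to &numclass;
    title "Report for &&class&i";          /* double quotes so the macro variable resolves */
    proc print data=have;
      where class = "&&class&i";           /* class is character, so quote the value */
    run;
  %end;

  /* Second loop: one scatter plot per distinct name */
  %do i = 1 %to &numname;
    title "Plot &&name&i";
    proc sgplot data=have(where=(name = "&&name&i"));
      scatter x=time y=score;
    run;
  %end;

  title;                                   /* clear the title afterwards */
%mend printit;

%printit;
```

This prints all reports first and then all plots. Interleaving the plots with the report for each class would need the names to be looked up per class, which this sketch does not attempt.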

The print procedure and most ODS procedures support by-group processing, which might be a lot simpler and save a lot of time depending on what you require.
proc sort data=have; by class;
proc print data=have;
by class;
run;
and
proc sort data=have; by name;
proc sgplot data=have;
by name;
scatter x=time y=score;
run;
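If you also want the "Report for ..." style titles with by-group processing, one option is to put the by-value into the title with #BYVAL and turn off the default by-line. A rough sketch, assuming you are happy to sort into a separate data set (by_class is just a name I made up):

```sas
/* Sketch: BY-group report with the class value in the title.            */
/* by_class is a hypothetical output data set name.                      */
proc sort data=have out=by_class;
  by class;
run;

options nobyline;                          /* suppress the default "class=..." by-line */
title "Report for #byval(class)";          /* #BYVAL inserts the current BY value */

proc print data=by_class;
  by class;
run;

options byline;                            /* restore the default */
title;
```

The same pattern should work for the SGPLOT step with by name and #byval(name).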

Related

SAS: How to reference a global macro variable to create new table or dataset?

I'm having some trouble referencing a global macro variable outside of the macro to create a new data set. The global variable was created to run a loop for creating several yearly data sets using a vector of specified years, as you can see in the code below:
%macro loopyear;
%global year;
%do year = 2004 %to 2017;
proc import datafile = "C:\Filepath\blah.txt"
dbms = dlm out = blah&year.; /* Creates a dataset for each year, e.g. blah2004, blah2005, etc. */
delimiter = " ";
getnames = no;
run;
data blah&year.;
set blah&year.;
year = &year.;
run;
proc sql;
create table blah&year._rail as
select year, var1, var2, var3, var4
from blah&year.
where var2= "rail";
quit;
%end;
%mend loopyear;
%loopyear;
/*Merge all year datasets into one master set*/
data blah_total;
set blah&year._rail;
run;
When I try to create the master data set outside of the macro, however, I get the following error:
data blah;
set blah&year._rail;
run;
ERROR: File work.blah2018_rail.data does not exist
This is frustrating because I'm only trying to create the master set based on 2004-2017 data, as referenced in the macro variable. Can someone help me pinpoint my error -- is it in the way I defined the global variable, or am I missing a step somewhere? Any help is appreciated.
Thanks!
This is an interesting quirk of both macro and data step do-loops in SAS - the loop counter is incremented before the exit condition is checked, so after your loop has run it will be one increment past your stop value, e.g.:
%macro example;
%do i = 1 %to 3;
%put i = &i;
%end;
%put i = &i;
%mend;
%example;
Output:
i = 1
i = 2
i = 3
i = 4
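You can see the same behaviour in a data step. This small check is not from the question, just an illustration:

```sas
/* The loop body is empty on purpose; we only look at i after the loop. */
data _null_;
  do i = 1 to 3;
  end;
  put i=;        /* writes i=4 to the log */
run;
```

That is why &year is 2018 once %loopyear has finished, and why the data step outside the macro looks for blah2018_rail.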
For your final step you probably want the set statement to look like this:
set blah2004_rail ... blah2017_rail;
You could write a macro loop to generate the list and move the data step inside your macro, e.g.
set %do year = 2004 %to 2017; blah&year._rail %end;;
The second semi-colon is important! You need one to close the %end and one to terminate the set statement.
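Because %do is not allowed in open code, that set statement has to live inside a macro. A minimal sketch, assuming the blah2004_rail to blah2017_rail data sets already exist (the macro name stack_years and its first/last parameters are made up for illustration):

```sas
/* Sketch: build the list of yearly data sets with a macro loop.        */
/* stack_years, first= and last= are hypothetical names.                */
%macro stack_years(first=2004, last=2017);
  %local year;                       /* keep the loop counter local */
  data blah_total;
    set
      %do year = &first %to &last;
        blah&year._rail
      %end;
    ;                                /* this semicolon ends the SET statement */
  run;
%mend stack_years;

%stack_years;
```

Making the counter %local here also avoids leaving a global year variable behind with a value one past the last year.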
Change your naming structure. Have a common prefix and put the year at the end; then you can use the colon (:) name-prefix shortcut to reference all the datasets at once.
%macro loopyear;
%global year;
%do year = 2004 %to 2017;
proc import datafile = "C:\Filepath\blah.txt"
dbms = dlm out = blah&year.; /* Creates a dataset for each year, e.g. blah2004, blah2005, etc. */
delimiter = " ";
getnames = no;
run;
data blah&year.;
set blah&year.;
year = &year.;
run;
proc sql;
create table blah_rail_&year. as
select year, var1, var2, var3, var4
from blah&year.
where var2= "rail";
quit;
%end;
%mend loopyear;
%loopyear;
/*Merge all year datasets into one master set*/
data blah_total;
set blah_rail: ;
run;

Defining Fixed SAS Macro Variables

I am trying to have a macro run but I'm not sure if it will resolve since I don't have connection to my database for a little while. I want to know if the macro is written correctly and will resolve the states on each pass through the code (ie do it repetitively and create a table for each state).
The second thing I would like to know is if I can run a macro through a from statement. For example let entpr be the database that I'm pulling from. Would the following resolve correctly:
proc sql;
select * from entpr.&state.; /*Do I need the . after &state?*/
The rest of my code:
libname mdt ".........";
%let state = ny il ar ak mi;
proc sql;
create table mdt.&state._members
as select
corp_ent_cd
,mkt_sgmt_admnstn_cd
,fincl_arngmt_cd
,aca_ind
,prod_type
,cvyr
,cvmo
,sum(1) as mbr_cnt
from mbrship1_&state.
group by 1,2,3,4,5,6,7;
quit;
If &state contains ny il ar ak mi then as it is written, the from statement in your code will resolve to: from mbrship1_ny il ar ak mi - which is invalid SQL syntax.
My guess is that you're wanting to run the SQL statement for each of the following tables:
mbrship1_ny
mbrship1_il
mbrship1_ar
mbrship1_ak
mbrship1_mi
In which case the simplest macro would look something like this:
%macro do_sql(state=);
proc sql;
create table mdt.&state._members
as select
...
from mbrship1_&state
group by 1,2,3,4,5,6,7;
quit;
%mend;
%do_sql(state=ny);
%do_sql(state=il);
%do_sql(state=ar);
%do_sql(state=ak);
%do_sql(state=mi);
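If you would rather keep the list of states in one place than write one call per state, a small wrapper can loop over the list. This is only a sketch: do_all is a name I made up, and it assumes %do_sql is defined as above.

```sas
/* Sketch: call %do_sql once for each word in a space-separated list.   */
/* do_all is a hypothetical wrapper name.                               */
%macro do_all(states=);
  %local i state;
  %do i = 1 %to %sysfunc(countw(&states, %str( )));
    %let state = %scan(&states, &i, %str( ));   /* i-th state code */
    %do_sql(state=&state)
  %end;
%mend do_all;

%do_all(states=ny il ar ak mi);
```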
As to your question about whether to include the ., the rule is this: the period is optional whenever the character that follows the macro variable reference is not a letter, a digit (0-9), or an underscore. Those are the characters allowed in a macro variable name, so if the next character is anything else, SAS can tell where the name ends. In select * from entpr.&state.; the next character is the semicolon, so the period is not needed there. Some people always include it; personally I leave it out unless it's required.
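A few %put statements make the rule easy to check for yourself. The names after the underscore or period are just examples:

```sas
%let state = ny;

%put entpr.&state;          /* entpr.ny    - nothing follows the name, no period needed   */
%put mbrship1_&state;       /* mbrship1_ny - same here                                     */
%put &state._members;       /* ny_members  - period required, otherwise the name runs on   */
%put &state..csv;           /* ny.csv      - first period ends the name, second is literal */
%put &state_members;        /* warning: STATE_MEMBERS is not resolved                      */
```

The last line shows what goes wrong without the period: SAS looks for a macro variable called state_members.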
When selecting data from multiple tables whose names themselves contain some data (in your case, the state), you can stack the data with UNION ALL in SQL or with SET in a DATA step.
As long as you are stacking data, you should also add a new column to the query selection that tracks the state.
Consider this pattern for stacking in SQL
data one;
do index = 1 to 10; do _n_ = 1 to 2; output; end; end;
run;
data two;
do index = 101 to 110; do _n_ = 1 to 2; output; end; end;
run;
proc sql;
create table want as
select
source, index
from
(select 'one' as source, * from one)
union all
(select 'two' as source, * from two)
;
quit;
The pattern can be abstracted into a template for SQL source code that will be generated by macro.
%macro my_ultimate_selector (out=, inlib=, prefix=, states=);
%local index n state;
%let n = %sysfunc(countw(&states));
proc sql;
create table &out as
select
state
, corp_ent_cd
, mkt_sgmt_admnstn_cd
, fincl_arngmt_cd
, aca_ind
, prod_type
, cvyr
, cvmo
, count(*) as state_7dim_level_cnt
from
(
%* ----- use the UNION ALL pattern for stacking data -----;
%do index = 1 %to &n;
%let state = %scan(&states, &index);
%if &index > 1 %then %str(UNION ALL);
select "&state" as state, * from &inlib..&prefix.&state.
%end;
) /* one in-line view, so that GROUP BY applies to the stacked rows */
group by 1,2,3,4,5,6,7,8 %* state plus the seven dimension columns;
;
quit;
%mend;
%my_ultimate_selector (out=work.want, inlib=mdt, prefix=mbrship1_, states=ny il ar ak mi)
If the columns of the inlib tables are not identical with regard to column order and type, use a UNION ALL CORRESPONDING to have the SQL procedure line up the columns for you.

enrollment date 12 months before and after a specific date

I am trying to find persons by id who have continuous, 12 months enrollment before the hospitalization date and another 12 months after the hospitalization date. Each member will have one row.
This is using claim database in US. Any help is appreciated.
Example of the dataset:
ID Enr_date End_Date hosp_date
1 1/5/2004 1/6/2008 2/2/2006
2 .... and so on
3
4
id start_e end_e date_h
1 1/1/2005 1/1/2006 2/8/2008
1 2/3/2006 4/5/2013
2 5/7/2005 8/8/2006 4/5/2007
2 1/1/2007 2/2/2012
3 5/9/2005 5/9/2007 1/1/2007
3 6/4/2008 7/7/2012
Assuming the questions in my comments are answered, there are many ways you can do this. Starting out, it may be difficult to get outer joins, cross joins, etc. working in a way that's easy to understand. With a SAS macro we can break the problem down so it's easy to understand and do any debugging that may be necessary. Here's one approach that may work for you:
%macro hdates;
/* get number of hosp_dates */
proc sql noprint;
select count(*) into: cnt
from date where hosp_date ne .;
quit;
%let cnt = &cnt;
/* place hdates and ids into macro vars */
proc sql noprint;
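/* same filter as the count above; date9. so the value can be used as a "..."d literal */
select enrolid, hosp_date format=date9. into :id_1 - :id_&cnt, :hdate_1 - :hdate_&cnt
from date where hosp_date ne .;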
quit;
proc delete data= hcov; run;
/* for each hdate id pair go through the dataset and test for 12 mo coverage */
%do i = 1 %to &cnt;
data new;
set date;
if (enrolid = &&id_&i) then do;
preDays = "&&hdate_&i"d - start_date ;
postDays = end_date - "&&hdate_&i"d;
if (preDays >= 365 and postDays >= 365) then output;
end;
run;
proc append base = hcov data=new;run;
%end;
%mend hdates;
%hdates;
I work in claims data and I think I understand what you are trying to ask. I recommend making one table with the "condensed" enrollment ranges and another with the hospitalization dates. Then you may merge them together and keep only those patients who meet your criteria. The following code will condense the enrollment ranges (assuming good records):
PROC SORT DATA=dset_in; BY id enr_date end_date; RUN;
DATA enrollment (KEEP=id enroll_start enroll_stop);
SET dset_in;
FORMAT enroll_start enroll_stop DATE9.;
BY id enr_date end_date;
RETAIN enroll_start enroll_stop;
IF first.id THEN DO;
enroll_start=enr_date;
enroll_stop=end_date;
END;
ELSE IF enr_date-enroll_stop <= 1 THEN enroll_stop=end_date;
ELSE DO;
OUTPUT;
enroll_start=enr_date;
enroll_stop=end_date;
END;
IF last.id THEN OUTPUT;
RUN;
Then this code will keep only those patients with a hospitalization and 365 days enrollment before and after. If the hosp_claims dataset has more than 1 hospitalization per patient, sort then take the first obs per id after this step:
PROC SQL;
CREATE TABLE hosp_enrolled AS
SELECT DISTINCT a.id, a.hosp_dt, b.enroll_start, b.enroll_stop
FROM hosp_claims AS a, enrollment AS b
WHERE a.id=b.id AND b.enroll_start+365 <= a.hosp_dt <= enroll_stop-365;
QUIT;

Extracting certain rows from data using hash object in SAS

I have two SAS data tables. The first has many millions of records, and each record is identified with a sequential record ID, like this:
Table A
Rec Var1 Var2 ... VarX
1 ...
2
3
The second table specifies which rows from Table A should be assigned a coding variable:
Table B
Code BegRec EndRec
AA 1200 4370
AX 7241 9488
BY 12119 14763
So the first row of Table B means any data in Table A that has rec between 1200 and 4370 should be assigned code AA.
I know how to accomplish this with proc sql, but I want to see how this is done with a hash object.
In SQL, it's just:
proc sql;
select b.code, a.*
from tableA a, tableB b
where b.begrec<=a.rec<=b.endrec;
quit;
My actual data contains hundreds of gigabytes of data, so I want to do the processing as efficiently as possible. My understanding is that using a hash object may help here, but I haven't been able to figure out how to map what I'm doing to use that way.
A hash object solution (data input code borrowed from @Rob_Penridge).
data big;
do rec = 1 to 20000;
output;
end;
run;
data lookup;
input Code $ BegRec EndRec;
datalines;
AA 1200 4370
AX 7241 9488
BY 12119 14763
;
run;
data created;
format code $4.;
format begrec endrec best8.;
if _n_=1 then do;
declare hash h(dataset:'lookup');
h.definekey('Code');
h.definedata('code','begrec','endrec');
h.definedone();
call missing(code,begrec,endrec);
declare hiter iter('h');
end;
set big;
iter.first();
do until (rc^=0);
if begrec <= rec <= endrec then do;
code_dup=code;
end;
rc=iter.next();
end;
keep rec code_dup;
run;
I'm not sure a hash table would even be the most efficient approach here. I would probably solve this problem using a SELECT statement, as the conditional logic will be fast and it still requires only one pass through the data:
select;
when ( 1200 <= _n_ <=4370) code = 'AA';
...
otherwise;
end;
Assuming that you will need to run this code multiple times and the data may change each time you may not want to hardcode the select statement. So the best solution would dynamically build it using a macro. I have a utility macro I use for these kinds of situations (included at the bottom):
1) Create the data
data big;
do i = 1 to 20000;
output;
end;
run;
data lookup;
input Code $ BegRec EndRec;
datalines;
AA 1200 4370
AX 7241 9488
BY 12119 14763
;
run;
2) Save the contents of the smaller table into macro variables. You could also do this using call symput or other preferred method. This method assumes you don't have too many rows in your lookup table.
%table_parse(iDs=lookup, iField=code , iPrefix=code);
%table_parse(iDs=lookup, iField=begrec, iPrefix=begrec);
%table_parse(iDs=lookup, iField=endrec, iPrefix=endrec);
3) Dynamically build the SELECT statement.
%macro ds;
%local cnt;
data final;
set big;
select;
%do cnt=1 %to &code;
when (&&begrec&cnt <= _n_ <= &&endrec&cnt) code = "&&code&cnt";
%end;
otherwise;
end;
run;
%mend;
%ds;
Here is the utility macro:
/*****************************************************************************
** MACRO.TABLE_PARSE.SAS
**
** AS PER %LIST_PARSE BUT IT TAKES INPUT FROM A FIELD IN A TABLE.
** STORE EACH OBSERVATION'S FIELD'S VALUE INTO ITS OWN MACRO VARIABLE.
** THE TOTAL NUMBER OF WORDS IN THE STRING IS ALSO SAVED IN A MACRO VARIABLE.
**
** THIS WAS CREATED BECAUSE %LIST_PARSE WOULD FALL OVER WITH VERY LONG INPUT
** STRINGS. THIS WILL NOT.
**
** EACH VALUE IS STORED TO ITS OWN MACRO VARIABLE. THE NAMES
** ARE IN THE FORMAT <PREFIX>1 .. <PREFIX>N.
**
** PARAMETERS:
** iDS : (LIB.DATASET) THE NAME OF THE DATASET TO USE.
** iFIELD : THE NAME OF THE FIELD WITHIN THE DATASET.
** iPREFIX : THE PREFIX TO USE FOR STORING EACH WORD OF THE ISTRING TO
** ITS OWN MACRO VARIABLE (AND THE TOTAL NUMBER OF WORDS).
** iDSOPTIONS : OPTIONAL. ANY DATASET OPTIONS YOU MAY WANT TO PASS IN
** SUCH AS A WHERE FILTER OR KEEP STATEMENT.
**
******************************************************************************
** HISTORY:
** 1.0 MODIFIED: 01-FEB-2007 BY: ROBERT PENRIDGE
** - CREATED.
** 1.1 MODIFIED: 27-AUG-2010 BY: ROBERT PENRIDGE
** - MODIFIED TO ALLOW UNMATCHED QUOTES ETC IN VALUES BEING RETURNED BY
** CHARACTER FIELDS.
** 1.2 MODIFIED: 30-AUG-2010 BY: ROBERT PENRIDGE
** - MODIFIED TO ALLOW BLANK CHARACTER VALUES AND ALSO REMOVED TRAILING
** SPACES INTRODUCED BY CHANGE 1.1.
** 1.3 MODIFIED: 31-AUG-2010 BY: ROBERT PENRIDGE
** - MODIFIED TO ALLOW PARENTHESES IN CHARACTER VALUES.
** 1.4 MODIFIED: 31-AUG-2010 BY: ROBERT PENRIDGE
** - ADDED SOME DEBUG VALUES TO DETERMINE WHY IT SOMETIMES LOCKS TABLES.
*****************************************************************************/
%macro table_parse(iDs=, iField=, iDsOptions=, iPrefix=);
%local dsid pos rc cnt cell_value type;
%let cnt=0;
/*
** OPEN THE TABLE (AND MAKE SURE IT EXISTS)
*/
%let dsid=%sysfunc(open(&iDs(&iDsOptions),i));
%if &dsid eq 0 %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
/*
** GET THE POSITION OF THE FIELD (AND MAKE SURE IT EXISTS)
*/
%let pos=%sysfunc(varnum(&dsid,&iField));
%if &pos eq 0 %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
%else %do;
/*
** DETERMINE THE TYPE OF THE FIELD
*/
%let type = %upcase(%sysfunc(vartype(&dsid,&pos)));
%end;
/*
** READ THROUGH EACH OBSERVATION IN THE TABLE
*/
%let rc=%sysfunc(fetch(&dsid));
%do %while (&rc eq 0);
%let cnt = %eval(&cnt + 1);
%if "&type" = "C" %then %do;
%let cell_value = %qsysfunc(getvarc(&dsid,&pos));
%if "%trim(&cell_value)" ne "" %then %do;
%let cell_value = %qsysfunc(cats(%nrstr(&cell_value)));
%end;
%end;
%else %do;
%let cell_value = %sysfunc(getvarn(&dsid,&pos));
%end;
%global &iPrefix.&cnt ;
%let &iPrefix.&cnt = &cell_value ;
%let rc=%sysfunc(fetch(&dsid));
%end;
/*
** CHECK FOR ABNORMAL TERMINATION OF LOOP
*/
%if &rc ne -1 %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
/*
** ENSURE THE TABLE IS CLOSED SUCCESSFULLY
*/
%let rc=%sysfunc(close(&dsid));
%if &rc %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
%global &iPrefix;
%let &iPrefix = &cnt ;
%mend;
Other examples of calling this macro:
%table_parse(iDs=sashelp.class, iField=sex, iPrefix=myTable, iDsOptions=%str(where=(sex='F')));
%put &mytable &myTable1 &myTable2 &myTable3; *etc...;
I'd be tempted to use the direct access method POINT= here; it reads only the required row numbers rather than the whole dataset.
Here is the code, which uses the same create data code as in Rob's answer.
data want;
set lookup;
do i=begrec to endrec;
set big point=i;
output;
end;
drop begrec endrec;
run;
If you have the code column already in the big dataset and you just wanted to update the values from the lookup dataset, then you could do this using MODIFY.
data big;
set lookup (rename=(code=code1));
do i=begrec to endrec;
modify big point=i;
code=code1;
replace;
end;
run;
Here's my solution, using proc format. This is also done in-memory, much like a hash table, but requires less structural code to work.
(Data input code also borrowed from @Rob_Penridge.)
data big;
do rec = 1 to 20000;
output;
end;
run;
data lookup;
input Code $ BegRec EndRec;
datalines;
ZZ 0 20
JJ 40 60
AA 1200 4370
AX 7241 9488
BY 12119 14763
;
run;
data lookup_f;
set lookup;
rename
BegRec = start
EndRec = end
Code = label;
retain fmtname 'CodeRecFormat';
run;
proc format library = work cntlin=lookup_f; run;
data big_formatted;
format rec CodeRecFormat.;
format rec2 8.;
length code $5.;
set big;
code = putn(rec, "CodeRecFormat.");
rec2 = rec;
run;

sas macro index or other?

I have 169 towns for which I want to iterate a macro. I need the output files to be saved using the town-name (rather than a town-code). I have a dataset (TOWN) with town-code and town-name. Is it possible to have a %let statement that is set to the town-name for each iteration where i=town-code?
I know that I can list out the town-names using the index function, but I'd like a way to set the index function so that it sets a %let statement to the TOWN.town-name when i=TOWN.town-code.
All the answers below seem possible. I have used the %let = %scan( ,&i) option for now. A limitation is that the town names can be more than one word, so I've substituted underscores for spaces that I correct later.
This is my macro. I output proc report to excel for each of the 169 towns. I need the excel file to be saved as the name of the town and for the header to include the name of the town. Then, in excel, I merge all 169 worksheets into a single workbook.
%MACRO BY_YEAR;
%let townname=Andover Ansonia Ashford Avon ... Woodbury Woodstock;
%do i = 1999 %to 2006;
%do j = 1 %to 169;
%let name = %scan(&townname,&j);
ods tagsets.msoffice2k file="&ASR.\Town_Annual\&i.\&name..xls" style=minimal;
proc report data=ASR nofs nowd split='/';
where YR=&i and TWNRES=&j;
column CODNUM AGENUM SEX,(dths_sum asr_sum seasr_sum);
define CODNUM / group ;
define agenum / group ;
define sex / across ;
define dths_sum / analysis ;
define asr_sum / analysis ;
define seasr_sum / analysis ;
break after CODNUM / ul;
TITLE1 "&name Resident Age-Specific Mortality Rates by Sex, &i";
TITLE2 "per 100,000 population for selected causes of death";
run;
ods tagsets.msoffice2k close;
%end;
%end;
%MEND;
My guess is that the reason why you want to look up the town name by town index is to repeatedly call a macro with each town name. If this is the case, then you don't even need to get involved with the town index business at all. Just call the macro with each town name. There are many ways to do this. Here is one way using call execute().
data towns;
infile cards dlm=",";
input town :$char10. @@;
cards;
My Town,Your Town,His Town,Her Town
;
run;
%macro doTown(town=);
%put Town is &town..;
%mend doTown;
/* call the macro for each town */
data _null_;
set towns;
m = catx(town, '%doTown(town=', ')');
call execute(m);
run;
/* on log
Town is My Town.
Town is Your Town.
Town is His Town.
Town is Her Town.
*/
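One thing to watch with call execute: macro statements in the called macro run immediately, while the SAS code the macro generates is queued until the data step finishes. That does not matter for a macro that only does %put, but it can bite if the macro creates macro variables in one step (call symput, SQL into) and uses them in a later one. A common workaround is to wrap the call in %nrstr so the macro itself is deferred. A sketch using the same towns data and %doTown macro:

```sas
/* Sketch: %nrstr delays the macro call until after this data step ends. */
data _null_;
  set towns;
  call execute(cats('%nrstr(%doTown(town=', town, '))'));
run;
```

Note the single quotes: they keep the macro processor from acting on %nrstr and %doTown while the data step itself is being compiled.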
If you do need to do a table lookup, then one way is to convert your town names into a numeric format and write a simple macro to retrieve the name, given an index value. Something like:
data towns;
infile cards dlm=",";
input town :$char10. @@;
cards;
My Town,Your Town,His Town,Her Town
;
run;
/* make a numeric format */
data townfmt;
set towns end=end;
start = _n_;
rename town = label;
retain fmtname 'townfmt' type 'n';
run;
proc format cntlin=townfmt;
run;
%macro town(index);
%trim(%sysfunc(putn(&index,townfmt)))
%mend town;
%*-- check --*;
%put %town(1),%town(2),%town(3),%town(4);
/* on log
My Town,Your Town,His Town,Her Town
*/
Or how about you just pass both the code and the name to the macro as parameters? Like this?
%MACRO DOSTUFF(CODE=, NAME=);
DO STUFF...;
PROC EXPORT DATA=XYZ OUTFILE="&NAME."; RUN;
%MEND;
DATA _NULL_;
SET TOWNS;
/* single quotes, so %DOSTUFF is not resolved while this data step compiles */
CALL EXECUTE('%DOSTUFF(CODE=' || STRIP(CODE) || ', NAME=' || STRIP(NAME) || ');');
RUN;