subset data in sas using macro properly and append them to a file - macros

I have some data sets, and I wrote code to clean them according to methods from the biological literature. I then want to split each one into day and night, because they must be analyzed separately. It worked, but now I need to do this for the full set, which is way too many files to deal with one by one. So I am now trying to write a macro to split the data into days and nights for me.
My data looks like so
Hour var1 var2 var3
1 123 90 100
2 122 99 108
...........
4 156 80 120
4 156 80 145
4 143 82 132
Basically, night has 1 observation per hour and day has 3. I also have this for many days.
Each dataset is named STUDYIDID#_first or STUDYID_ID#_last. I want to generate four datasets per dataset.
So MYID111_first would create: MYID111_first_day_var1, MYID111_first_day_var2, MYID111_first_night_var1 , and MYID111_first_night_var2.
I would then LIKE to append them into 4 datasets:
MYID_A_first_day_var1, MYID_A_first_day_var2, MYID_A_first_night_var1 , and MYID_A_first_night_var2.
MY CODE SO FAR:
%macro datacut(libname,worklib=work, grp = _A ,time1 = _night , time2 = _day type1 = _var1 , type2 = _var2);
%local num i;
proc datasets library=&libname memtype=data nodetails;
contents out=&worklib..temp1(keep=memname) data=_all_ noprint;
run;
data _null_;
set &worklib..temp1 end=final;
by memname notsorted;
if last.memname;
n+1;
call symput('ds'||left(put(n,8.)),trim(memname));
if final then call symput('num',put(n,8.));
run;
%do i=1 %to &num;
/* do the artifact removing method */
DATA &libname..&&ds&i;
SET &libname..&&ds&i;
PT_ID = '&ds&i' ;
IF var1< 60 OR var1> 230 then delete;
IF var2< 30 OR var2> 230 THEN delete;
IF var3< 60OR var3 > 135 THEN DELETE;
IF var2 > var1 then delete;
run;
/* get just the night values */
PROC SQL;
CREATE TABLE &libname..&&ds&i&time1 as
SELECT *
FROM &libname..&&ds&i
WHERE Hour BETWEEN 0 and 6 OR Hour BETWEEN 22 and 24
order by systolic
;
QUIT;
/* trim off the proper number of observations for variable 1 */
DATA &libname..&&ds&i&time1&type1;
SET &libname..&&ds&i&time1 end=eof;
IF _N_ =1 then delete;
if eof then delete;
run;
PROC append base= &libname..&&ds&time1&type1
data= &libname..&&ds&i&time1;
run;
QUIT;
%end;
%mend datacut;
%datacut(work)
Now the initial datastep works correctly but the later ones don't rename the data as planned. I get a bunch of datasets called Ds10_night_var1 with the wrong field names (memtype, nodetails, data)
I get the warning:
WARNING: Apparent symbolic reference DS1_NIGHT not resolved.
NOTE: Line generated by the macro variable "TIME1".
1 work.&ds1_night
-
22
200
ERROR 22-322: Expecting a name.
ERROR 200-322: The symbol is not recognized and will be ignored.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
WARNING: Apparent symbolic reference DS1_NIGHT_SYS not resolved.
22: LINE and COLUMN cannot be determined.
NOTE 242-205: NOSPOOL is on. Rerunning with OPTION SPOOL might allow recovery of the LINE and
COLUMN where the error has occurred.
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, /, ;,
_DATA_, _LAST_, _NULL_.
201: LINE and COLUMN cannot be determined.
NOTE: NOSPOOL is on. Rerunning with OPTION SPOOL might allow recovery of the LINE and COLUMN
where the error has occurred.
ERROR 201-322: The option is not recognized and will be ignored.
So I want the right names for my files AND I want my datasets to actually contain data, and I don't understand why they don't.

As you know, you write a macro variable as & followed by its name, optionally followed by a period (.). This period explicitly ends the macro variable reference, so you can use a macro variable as a prefix, like in
%let prefix = fore;
Aspect = &prefix.Ground;
which evaluates to Aspect = foreGround;. And that is why
%let myLib = abc;
%let mymember = xyz;
data &myLib.&myMember;
is an error, as it evaluates to data abcxyz;, and you must write
data &myLib..&myMember;
** or as I prefer **;
data &myLib..&myMember.;
to get data abc.xyz;.
For the case where you need to build macro variable names from other macro variables, SAS lets you write a double ampersand &&, which resolves to a single &; SAS then rescans the result until all ampersands are consumed. So suppose
%let i = 1;
%let ds1 = myData;
%let time1 = _night;
This is how SAS evaluates &&ds&i&time1 :
&&ds&i&time1
&ds1_night
ERROR because a macro variable ds1_night is not defined
This is how SAS evaluates &&ds&i..&time1. :
&&ds&i..&time1.
&ds1._night
myData_night
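The resolution steps above can be checked with a short sketch. The dataset WORK.myData and its single variable x are placeholders, not the asker's real data, and the macro variable values are the ones assumed in the example above.

```sas
/* Placeholder input dataset so the sketch runs on its own */
data work.myData;
  x = 1;
run;

%let i     = 1;
%let ds1   = myData;
%let time1 = _night;

/* First pass:  && -> &, &i. -> 1, one period is left over, &time1. -> _night */
/* Second pass: &ds1. -> myData, giving myData_night                          */
%put Resolved name: &&ds&i..&time1.;

/* The same pattern used as an output dataset name */
data work.&&ds&i..&time1.;
  set work.&&ds&i;
run;
```

With these values the %put line should write myData_night to the log, and the DATA step should create WORK.myData_night. The same doubled period would be needed wherever the macro in the question appends &time1 or &type1 to &&ds&i.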

Related

How to pad a number with leading zero in a SAS Macro loop counter?

So I have a range of datasets in a specific library. These datasets are named in the format DATASET_YYYYMM, with one dataset for each month. I am trying to append a range of these datasets based on user input for the date range. i.e. If start_date is 01NOV2019 and the end_date is 31JAN2020, I want to append the three datasets: LIBRARY.DATASET_201911, LIBRARY.DATASET_201912 and LIBRARY.DATASET_202001.
The range is obviously variable, so I can't simply name the datasets manually in a set function. Since I need to loop through the years and months in the date range, I believe a macro is the best way to do this. I'm using a loop within the SET statement to append all the datasets. I have copied my example code below. It does work in theory. But in practice, only if we are looping over the months of November and December. As the format of the dataset name has a two digit month, for Jan-Sept it will be 01-09. The month function returns 1-9 however, and of course a 'File DATASET_NAME does not exist' error is thrown.
Problem is I cannot figure out a way to get it to interpret the month with leading 0, without ruining functionality of another part of the loop/macro.
I have tried numerous approaches to format the number as z2, cannot get any to work.
i.e. Including PUTN functions in the DO line for quote_month as follows, it ignores the leading zero when generating the dataset name in the line below.
%DO quote_month = %SYSFUNC(IFN(&quote_year. = &start_year.,%SYSFUNC(PUTN(&start_month.,z2.)),1,.)) %TO %SYSFUNC(IFN(&quote_year. = &end_year.,%SYSFUNC(PUTN(&end_month.,z2.)),12,.));
Below is example code (without any attempt to reformat it to z2) - it will throw an error because it cannot find 'dataset_20201' because it is actually called 'dataset_202001'. The dataset called dataset_combined_example produces the desired output of the code by manually referencing the dataset names which it will be unable to do in practice. Does anyone know how to go about this?
DATA _NULL_;
FORMAT start_date end_date DATE9.;
start_date = '01NOV2019'd;
end_date = '31JAN2020'd;
CALL symput('start_date',start_date);
CALL symput('end_date',end_date);
RUN;
DATA dataset_201911;
input name $;
datalines;
Nov1
Nov2
;
RUN;
DATA dataset_201912;
input name $;
datalines;
Dec1
Dec2
;
RUN;
DATA dataset_202001;
input name $;
datalines;
Jan1
Jan2
;
RUN;
DATA dataset_combined_example;
SET dataset_201911 dataset_201912 dataset_202001;
RUN;
%MACRO get_table(start_date, end_date);
%LET start_year = %SYSFUNC(year(&start_date.));
%LET end_year = %SYSFUNC(year(&end_date.));
%LET start_month = %SYSFUNC(month(&start_date.));
%LET end_month = %SYSFUNC(month(&end_date.));
DATA dataset_combined;
SET
%DO quote_year = &start_year. %TO &end_year.;
%DO quote_month = %SYSFUNC(IFN(&quote_year. = &start_year.,&start_month.,1,.)) %TO %SYSFUNC(IFN(&quote_year. = &end_year.,&end_month.,12,.));
dataset_&quote_year.&quote_month.
%END;
%END;
;
RUN;
%MEND;
%get_table(&start_date.,&end_date.);
You could do this using putn and z2. format.
%DO quote_year = &start_year. %TO &end_year.;
%DO quote_month = %SYSFUNC(IFN(&quote_year. = &start_year.,&start_month.,1,.)) %TO %SYSFUNC(IFN(&quote_year. = &end_year.,&end_month.,12,.));
dataset_&quote_year.%sysfunc(putn(&quote_month.,z2.))
%END;
%END;
You can also do this using the metadata tables without having to resort to macro loops in the first place:
/* A few datasets to combine */
data
DATASET_201910
DATASET_201911
DATASET_201912
DATASET_202001
;
run;
%let START_DATE = '01dec2019'd;
%let END_DATE = '31jan2020'd;
proc sql noprint;
select catx('.', libname, memname) into :DS_LIST separated by ' '
from dictionary.tables
where
&START_DATE <=
case
when prxmatch('/DATASET_\d{6}/', memname)
then input(scan(memname, -1, '_'), yymmn6.)
else -99999
end
<= &END_DATE
and libname = 'WORK'
;
quit;
data combined_datasets /view=combined_datasets;
set &DS_LIST;
run;
The case-when in the where clause ensures that any other datasets present in the same library that don't match the expected naming scheme are ignored.
One key difference with this approach is that you will never end up attempting to read a dataset that doesn't exist if one of the expected datasets in your range is missing.
You can use the Z format to generate strings with leading zeros.
But your problem is much easier if you use SAS date functions and formats to generate the YYYYMM strings. Just use a normal iterative %DO loop to cycle the month offset from zero to the number of months between the two dates.
%macro get_table(start_date, end_date);
%local offset dsname ;
data dataset_combined;
set
%do offset=0 %to %sysfunc(intck(month,&start_date,&end_date));
%let dsname=dataset_%sysfunc(intnx(month,&start_date,&offset),yymmn6);
&dsname.
%end;
;
run;
%mend get_table;
Result:
445 options mprint;
446 %get_table(start_date='01NOV2019'd,end_date='31JAN2020'd);
MPRINT(GET_TABLE): data dataset_combined;
MPRINT(GET_TABLE): set dataset_201911 dataset_201912 dataset_202001 ;
MPRINT(GET_TABLE): run;
In a macro
Use INTNX to compute the bounds for a loop over date values. Within the loop:
Compute the candidate data set name according to specified lib, prefix and desired date value format. <yyyy><mm> is output by format yymmn6.
Use EXIST to check candidate data sets for existence.
Alternatively, do not check, but make sure to set OPTIONS NODSNFERR prior to combining. The setting will prevent errors when specifying a non-existent data set.
Update the loop index to the end of the month so the next increment takes the index to the start of the next month.
%macro names_by_month(lib=work, prefix=data_, start_date=today(), end_date=today(), format=yymmn6.);
%local index name;
%* loop over first-of-the-month date values;
%do index = %sysfunc(intnx(month, &start_date, 0)) %to %sysfunc(intnx(month, &end_date, 0));
%* compute month dependent name;
%let name = &lib..&prefix.%sysfunc(putn(&index,&format));
%* emit name if it exists;
%if %sysfunc(exist(&name)) or %sysfunc(exist(&name,VIEW)) %then %str(&name);
%* prepare index for loop +1 increment so it goes to start of next month;
%let index = %sysfunc(intnx(month, &index, 0, E));
%end;
%mend;
* example usage;
data combined_imports(label="nov2019 to jan2020");
set
%names_by_month(
prefix=import_,
start_date='01NOV2019'd,
end_date = '31JAN2020'd
)
;
run;

SAS syntax error 22 and 200 when using variable inside dataset name inside macro

I am trying to sort several datasets in SAS using a loop within a macro, using a list of numbers, some of which have leading zeros, within dataset names (eg., in the example code, a list of 01,02), and this code will also guide some other loops I would like to build.
I used the SAS guidance on looping through a nonsequential list of values with a macro DO loop code as a starting point from here: http://support.sas.com/kb/26/155.html.
data dir1201;
input compid directorid X;
format ;
datalines;
01 12 11
02 15 5
;
run;
data dir1202;
input compid directorid X;
format ;
datalines;
01 12 1
03 18 8
;
run;
%macro loops1(values);
/* Count the number of values in the string */
%let count=%sysfunc(countw(&values));
/* Loop through the total number of values */
%do i = 1 %to &count;
%let value=%qscan(&values,&i,%str(,));
proc sort data=dir12&value out=dir12sorted&value nodupkey;
by directorid compid;
run;
%end;
%mend;
options mprint mlogic;
%loops1(%str(01,02))
I assume str is needed for nonsequential lists, but that is also useful when I want to retain leading zeros;
I see the macro variable seems to incorporate the 01 or 02 in the log, but then I receive the error 22 and 200 right afterward. Here is a snippet of the log error using this example code:
339 %macro loops1(values);
340 /* Count the number of values in the string */
341 %let count=%sysfunc(countw(&values));
342 /* Loop through the total number of values */
343 %do i = 1 %to &count;
344 %let value=%qscan(&values,&i,%str(,));
345 proc sort data=dir12&value out=dir12sorted&value nodupkey;
346 by directorid compid;
347 run;
348 %end;
349 %mend;
350 options mprint mlogic;
351 %loops1(%str(01,02))
MLOGIC(LOOPS1): Beginning execution.
MLOGIC(LOOPS1): Parameter VALUES has value 0102
MLOGIC(LOOPS1): %LET (variable name is COUNT)
MLOGIC(LOOPS1): %DO loop beginning; index variable I; start value is 1; stop value is 2; by
value is 1.
MLOGIC(LOOPS1): %LET (variable name is VALUE)
NOTE: Line generated by the macro variable "VALUE".
1 dir1201
--
22
--
200
ERROR: File WORK.DIR12.DATA does not exist.
I don't understand why dir1201 is showing, but then the error is referencing the dataset work.dir12 (ignoring the 01)
The macro quoting is confusing the parser into treating the quoted value as the start of a new token. You can add %unquote() to remove the macro quoting:
proc sort data=%unquote(dir12&value) out=%unquote(dir12sorted&value) nodupkey;
Or just don't add the macro quoting to begin with.
%let value=%scan(&values,&i,%str(,));
It will be much easier to use your macro if you design it to take space delimited values rather than comma delimited. Then there is no need to add macro quoting to the call either.
%macro loops1(values);
%local i value ;
%do i = 1 %to %sysfunc(countw(&values,%str( )));
%let value=%scan(&values,&i,%str( ));
proc sort data=dir12&value out=dir12sorted&value nodupkey;
by directorid compid;
run;
%end;
%mend loops1;
%loops1(values=01 02)
The macro declaration option /PARMBUFF is used to make the automatic macro variable SYSPBUFF available for scanning an arbitrary number of comma separated values passed as arguments.
%macro loops1/parmbuff;
%local index token;
%do index = 1 %to %length(&syspbuff);
%let token=%scan(&syspbuff,&index);
%if %Length(&token) = 0 %then %goto leave1;
proc sort data=dir12&token out=dir12sorted&token nodupkey;
by directorid compid;
run;
%end;
%leave1:
%mend;
options mprint nomlogic;
%loops1(01,02)
See also the SAS documentation on the SYSPBUFF automatic macro variable and the /PARMBUFF option of the %MACRO statement.
Since you're looping over dates, I think something like this may be more helpful in the long run:
%macro date_loop(start, end);
%let start=%sysfunc(inputn(&start, anydtdte9.));
%let end=%sysfunc(inputn(&end, anydtdte9.));
%let dif=%sysfunc(intck(month, &start, &end));
%do i=0 %to &dif;
%let date=%sysfunc(intnx(month, &start, &i, b), yymmdd4.);
%put &date;
%end;
%mend date_loop;
%date_loop(01Jan2012, 01Jan2015);
This is a slightly modified version from the one in the SAS documentation (macro Appendix, example 11).
http://documentation.sas.com/?docsetId=mcrolref&docsetTarget=n01vuhy8h909xgn16p0x6rddpoj9.htm&docsetVersion=9.4&locale=en

SAS Error trying to loop through multiple datasets

I'm trying to run some code which will hopefully concatenate multiple months or years worth of data. I am trying to figure out when a field was populated with a value. I.e. there is field XYZ in my data set and it is populated with value A in November 2016. If I run my code from Jan - Dec I would like a new field populated with the date that SAS encounters a non-blank value in that field.
Here's my code:
options mprint symbolgen source mlogic merror syntaxcheck ;
%macro append_monthly(iStart_date=, iEnd_date=);
%local tmp_date i;
%let tmp_date = %sysfunc(intnx(month,&iStart_date,0,beginning)) ;
%do %while (&tmp_date le &iEnd_date);
%let i = %sysfunc(sum(&tmp_date),yymmn4.);
%put &i.;
%let tmp_date = %sysfunc(intnx(month,&tmp_date,1,beginning)) ;
libname note "my.qualifiers.fords.note&i." disp=shr;
data new ;
set note.file ;
%if ln_note_crbur_date_delinq ne '' %then spc_cmt_date = &i.;
run;
%end;
%mend;
%append_monthly(iStart_date=%sysfunc(mdy(5,1,2016)), iEnd_date=%sysfunc(mdy(10,1,2016)) );
LIBNAME _ALL_ CLEAR;
Here's a sample from log with errors :
SYMBOLGEN: Macro variable TMP_DATE resolves to 20606
SYMBOLGEN: Macro variable IEND_DATE resolves to 20728
MLOGIC(APPEND_MONTHLY): %DO %WHILE(&tmp_date le &iEnd_date) condition is TRUE; loop will iterate again.
MLOGIC(APPEND_MONTHLY): %LET (variable name is I)
SYMBOLGEN: Macro variable TMP_DATE resolves to 20606
MLOGIC(APPEND_MONTHLY): %PUT &i.
SYMBOLGEN: Macro variable I resolves to 1606
1606
MLOGIC(APPEND_MONTHLY): %LET (variable name is TMP_DATE)
SYMBOLGEN: Macro variable TMP_DATE resolves to 20606
MPRINT(APPEND_MONTHLY): spc_cmt_date = 1605 run;
SYMBOLGEN: Macro variable I resolves to 1606
MPRINT(APPEND_MONTHLY): libname note "my.qualifiers.fords.note1606" disp=shr;
ERROR: Unable to clear or re-assign the library NOTE because it is still in use.
ERROR: Error in the LIBNAME statement.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.NEW may be incomplete. When this step was stopped there were 0 observations and 622 variables.
WARNING: Data set WORK.NEW was not replaced because this step was stopped.
NOTE: The DATA statement used 0.01 CPU seconds and 49483K.
NOTE: The address space has used a maximum of 4292K below the line and 240388K above the line.
I can't figure out why this isn't working. Maybe this could work using Proc append.
Basically, I just want my output with a field that returns a date in the form of YYMM for when field ln_note_crbur_date_delinq was non-blank.
Any help would be greatly appreciated
I'd guess the reason for your error is that the handle is not being cleared on your source file before the next libname statement tries to re-assign.
An easy fix would be to use a different alias (libref) each time, as follows:
libname note&i "my.qualifiers.fords.note&i." disp=shr;
Then adjust your data step like so:
data new ;
set note&i..file ;
The next part appears to be confusion between macro logic and data step. Simply remove the % symbols as follows:
if ln_note_crbur_date_delinq ne '' then spc_cmt_date = &i.;
Finally, add a proc append before the %end as follows:
proc append base=work.final data=new; run;
If work.final does not exist, it will be created in the same format as new.
EDIT:
following discussion in comments, here is a revised approach:
%macro append_monthly(iStart_date=, iEnd_date=);
%local tmp_date i set_statement;
%let tmp_date = %sysfunc(intnx(month,&iStart_date,0,beginning)) ;
%do %while (&tmp_date le &iEnd_date);
%let i = %sysfunc(sum(&tmp_date),yymmn4.);
%let tmp_date = %sysfunc(intnx(month,&tmp_date,1,beginning)) ;
%let set_statement=&set_statement note&i..file;
libname note&i "my.qualifiers.fords.note&i." disp=shr;
%end;
data new ;
set &set_statement;
if ln_note_crbur_date_delinq ne '' then spc_cmt_date = &i.;
run;
%mend;
%append_monthly(iStart_date=%sysfunc(mdy(5,1,2016)), iEnd_date=%sysfunc(mdy(10,1,2016)) );
LIBNAME _ALL_ CLEAR;

SAS: Split input file over many output datasets using macro variables

I've struggled with this for some time and am not sure if this is entirely possible (perhaps the macro can't resolve properly within the data step..?)
I'm using a data step to input several text files into one SAS data set. At the same time, I'd like to split them back out again based on a different parameter in the data.
Ideally, I'd like to use a macro variable in the output library name but the macro variable won't resolve:
WARNING: Apparent symbolic reference LVEL not resolved.
And the same data are output to all the files (as if the output &Test statement isn't there).
The data look like:
TIME LEVEL LAT LON HGT
1586616 1000 90 5 229
And the code:
%let lower_bound = 1979;
%let upper_bound = 1981;
%MACRO FILELOOP ;
%DO J = &lower_bound %TO &upper_bound ;
data library.file_1000
library.file_2000
library.file_3000
;
infile ".../hgt&J..txt" delimiter='09'x firstobs=2 obs=5;
input time level lat lon hgt;
// I want to use the level variable to determine the output SAS file;
call symputx ("lvel", level);
%let Test = library_&lvel.;
output &Test;
run;
%END ;
%MEND ;
%FILELOOP ;
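One likely explanation, offered as a sketch rather than a tested fix: %let and &lvel are resolved by the macro processor while the DATA step is still being compiled, whereas CALL SYMPUTX only runs when the step executes. So &lvel does not exist yet at the point where it is referenced, and an OUTPUT statement with no resolved dataset name writes to every dataset listed on the DATA statement. Routing rows with ordinary DATA step logic avoids the macro variable entirely. In the sketch below, the WORK datasets and the inline DATALINES stand in for the real library and text files, and the second data row is made up for illustration.

```sas
/* Sketch only: WORK and DATALINES replace the real library and INFILE */
data work.file_1000 work.file_2000 work.file_3000;
  input time level lat lon hgt;
  /* Choose the output dataset from the value of LEVEL at run time */
  select (level);
    when (1000) output work.file_1000;
    when (2000) output work.file_2000;
    when (3000) output work.file_3000;
    otherwise;  /* rows with any other level are not written anywhere */
  end;
  datalines;
1586616 1000 90 5 229
1586616 2000 90 5 231
;
run;
```

The list of target datasets still has to be known when the step is compiled. If the levels are not known in advance, they would have to be collected in an earlier step first.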

SAS: put format in macro

I am trying to create a new variable by assigning a format to an existing variable. I'm doing this from within a macro. I'm getting the following error: ": Expecting a format name." Any thoughts on how to resolve? Thanks!
/* macro to loop thru a list of vars and execute a code block on each. This is working fine. */
%macro iterlist
(
code =
,list =
)
;
%*** ASSIGN EACH ITEM IN THE LIST TO AN INDEXED MACRO VARIABLE &&ITEM&I ;
%let i = 1;
%do %while (%cmpres(%scan(&list., &i.)) ne );
%let item&i. = %cmpres(%scan(&list., &i.));
%let i = %eval(&i. + 1);
%end;
%*** STORE THE COUNT OF THE NUMBER OF ITEMS IN A MACRO VARIABLE: &CNTITEM;
%let cntitem = %eval(&i. - 1);
%*** EXPRESS CODE, REPLACING TOKENS WITH ELEMENTS OF THE LIST, IN SEQUENCE;
%do i = 1 %to &cntitem.;
%let codeprp = %qsysfunc(tranwrd(&code.,?,%nrstr(&&item&i..)));
%unquote(&codeprp.)
%end;
%mend iterlist;
/* set the list of variables to iterate thru */
%let mylist = v1 v2 v3 v4;
/* create a contents table to look up format info to assign in macro below*/
proc contents data=a.recode1 noprint out=contents;
run;
/* macro to create freq and chisq tables for each var */
%macro runfreqs (variabl = );
proc freq data=a.recode1 noprint ;
tables &variabl.*improved /out=&variabl._1 chisq;
output out=&variabl.chisq n pchi ;
run;
/* do some more stuff with the freq tables, then grab format for variable from contents */
data _null_;
set contents;
if name="&variabl." then CALL SYMPUT("classformat", format);
run;
data &variabl._3;
length classvalue $ 30 ;
set &variabl._2; ;
/* output a new var using the macro variable for format that we pulled from contents above. Here's where the error occurs. */
classvalue=put(class, %quote(&classformat.));
run;
%mend runfreqs;
* run the macro, iterating thru var list and creating freq tables;
%ITERLIST(list = &mylist., code = %nrstr(%runfreqs(variabl = ?);));
Just guessing, the line
classvalue=put(class, %quote(&classformat.));
should be
classvalue=put(class, &classformat..);
Two periods, because one is "eaten" by the macro processor to mark the end of the macro variable name, and the second one is needed to complete the format name.
I believe you don't need %quote() in your case, since a format name cannot contain anything that %quote() would need to mask.
EDIT: Again not tried, just based on the code I see you also need to change CALL SYMPUT("classformat", format);
to CALL SYMPUTX("classformat", format);
CALL SYMPUTX() is a more advanced version of CALL SYMPUT(): it removes leading and trailing blanks from the macro variable value, while the original version keeps them. Effectively this will be the same as your solution, just simpler.
So the problem is indeed with extra blanks between format name and the period.
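A small sketch of the difference, assuming a character variable padded to 32 characters (the variable name fmtname and the value DOLLAR are placeholders, not taken from the asker's contents table):

```sas
data _null_;
  length fmtname $32;
  fmtname = 'DOLLAR';                     /* stored padded with trailing blanks */
  call symput ('fmt_padded',  fmtname);   /* keeps the trailing blanks */
  call symputx('fmt_trimmed', fmtname);   /* strips leading and trailing blanks */
run;

/* The brackets make any blanks between the name and the period visible */
%put [&fmt_padded..];
%put [&fmt_trimmed..];
```

The first %put should show blanks between DOLLAR and the period, which is what the PUT function rejects as a format name. The second should show the period directly after the name.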
No idea why this works and vasja's idea wouldn't, but the problem was clearly with the period on the end of the format name (or perhaps some extra white space?). I changed the data step to add the period before the SYMPUT call:
data _null_;
set contents;
myformat=catt(format,'.');
if name="&variabl." then CALL SYMPUT("classformat", myformat);
run;