SAS Error trying to loop through multiple datasets - date

I'm trying to run some code which will hopefully concatenate multiple months or years worth of data. I am trying to figure out when a field was populated with a value. I.e. there is field XYZ in my data set and it is populated with value A in November 2016. If I run my code from Jan - Dec I would like a new field populated with the date that SAS encounters a non-blank value in that field.
Here's my code:
options mprint symbolgen source mlogic merror syntaxcheck ;
%macro append_monthly(iStart_date=, iEnd_date=);
%local tmp_date i;
%let tmp_date = %sysfunc(intnx(month,&iStart_date,0,beginning)) ;
%do %while (&tmp_date le &iEnd_date);
%let i = %sysfunc(sum(&tmp_date),yymmn4.);
%put &i.;
%let tmp_date = %sysfunc(intnx(month,&tmp_date,1,beginning)) ;
libname note "my.qualifiers.fords.note&i." disp=shr;
data new ;
set note.file ;
%if ln_note_crbur_date_delinq ne '' %then spc_cmt_date = &i.;
run;
%end;
%mend;
%append_monthly(iStart_date=%sysfunc(mdy(5,1,2016)), iEnd_date=%sysfunc(mdy(10,1,2016)) );
LIBNAME _ALL_ CLEAR;
Here's a sample from log with errors :
SYMBOLGEN: Macro variable TMP_DATE resolves to 20606
SYMBOLGEN: Macro variable IEND_DATE resolves to 20728
MLOGIC(APPEND_MONTHLY): %DO %WHILE(&tmp_date le &iEnd_date) condition is TRUE; loop will iterate again.
MLOGIC(APPEND_MONTHLY): %LET (variable name is I)
SYMBOLGEN: Macro variable TMP_DATE resolves to 20606
MLOGIC(APPEND_MONTHLY): %PUT &i.
SYMBOLGEN: Macro variable I resolves to 1606
1606
MLOGIC(APPEND_MONTHLY): %LET (variable name is TMP_DATE)
SYMBOLGEN: Macro variable TMP_DATE resolves to 20606
MPRINT(APPEND_MONTHLY): spc_cmt_date = 1605 run;
SYMBOLGEN: Macro variable I resolves to 1606
MPRINT(APPEND_MONTHLY): libname note "my.qualifiers.fords.note1606" disp=shr;
ERROR: Unable to clear or re-assign the library NOTE because it is still in use.
ERROR: Error in the LIBNAME statement.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.NEW may be incomplete. When this step was stopped there were 0 observations and 622 variables.
WARNING: Data set WORK.NEW was not replaced because this step was stopped.
NOTE: The DATA statement used 0.01 CPU seconds and 49483K.
NOTE: The address space has used a maximum of 4292K below the line and 240388K above the line.
I can't figure out why this isn't working. Maybe this could work using Proc append.
Basically, I just want my output with a field that returns a date in the form of YYMM for when field ln_note_crbur_date_delinq was non-blank.
Any help would be greatly appreciated

I'd guess the reason for your error is that the handle is not being cleared on your source file before the next libname statement tries to re-assign.
An easy fix would be to use a different alias (libref) each time, as follows:
libname note&i "my.qualifiers.fords.note&i." disp=shr;
Then adjust your data step like so:
data new ;
set note&i..file ;
The next part appears to be confusion between macro logic and data step. Simply remove the % symbols as follows:
if ln_note_crbur_date_delinq ne '' then spc_cmt_date = &i.;
Finally, add a proc append before the %end as follows:
proc append base=work.final data=new; run;
If work.final does not exist, it will be created in the same format as new.
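Putting those pieces together, a minimal sketch of the whole macro might look like the following. The `libname note&i clear;` line and the use of putn are my own additions, and the sketch assumes ln_note_crbur_date_delinq is a character variable, as the comparison with '' in the question implies. Note also that the log line `spc_cmt_date = 1605 run;` suggests the macro %then swallowed the semicolon, so the data step never reached its run; before the next libname was issued. Using a plain data step if, as below, avoids that as well.

```sas
%macro append_monthly(iStart_date=, iEnd_date=);
  %local tmp_date i;
  %let tmp_date = %sysfunc(intnx(month, &iStart_date, 0, beginning));
  %do %while (&tmp_date le &iEnd_date);
    /* YYMM label for the current month, e.g. 1605 */
    %let i = %sysfunc(putn(&tmp_date, yymmn4.));
    %let tmp_date = %sysfunc(intnx(month, &tmp_date, 1, beginning));
    /* one libref per month, so nothing has to be re-assigned */
    libname note&i "my.qualifiers.fords.note&i." disp=shr;
    data new;
      set note&i..file;
      /* data step IF, not macro %IF: evaluated once per observation */
      if ln_note_crbur_date_delinq ne '' then spc_cmt_date = &i.;
    run;
    /* stack this month onto the running result */
    proc append base=work.final data=new; run;
    /* addition for illustration: release the libref once the month is done */
    libname note&i clear;
  %end;
%mend append_monthly;
```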
EDIT:
following discussion in comments, here is a revised approach:
%macro append_monthly(iStart_date=, iEnd_date=);
%local tmp_date i set_statement;
%let tmp_date = %sysfunc(intnx(month,&iStart_date,0,beginning)) ;
%do %while (&tmp_date le &iEnd_date);
%let i = %sysfunc(sum(&tmp_date),yymmn4.);
%let tmp_date = %sysfunc(intnx(month,&tmp_date,1,beginning)) ;
%let set_statement=&set_statement note&i..file;
libname note&i "my.qualifiers.fords.note&i." disp=shr;
%end;
data new ;
set &set_statement;
if ln_note_crbur_date_delinq ne '' then spc_cmt_date = &i.;
run;
%mend;
%append_monthly(iStart_date=%sysfunc(mdy(5,1,2016)), iEnd_date=%sysfunc(mdy(10,1,2016)) );
LIBNAME _ALL_ CLEAR;
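One caveat with the single data step: by the time it runs, &i holds only the last month of the loop, so every row would get the same spc_cmt_date. A possible way around this, sketched here as an assumption rather than something tested against the real files, is the INDSNAME= option on the SET statement, which exposes the name of the data set each row came from. The variable name src is hypothetical.

```sas
%macro append_monthly(iStart_date=, iEnd_date=);
  %local tmp_date i set_statement;
  %let tmp_date = %sysfunc(intnx(month, &iStart_date, 0, beginning));
  %do %while (&tmp_date le &iEnd_date);
    %let i = %sysfunc(putn(&tmp_date, yymmn4.));
    %let tmp_date = %sysfunc(intnx(month, &tmp_date, 1, beginning));
    /* build up: note1605.file note1606.file ... */
    %let set_statement = &set_statement note&i..file;
    libname note&i "my.qualifiers.fords.note&i." disp=shr;
  %end;
  data new;
    /* src holds e.g. NOTE1605.FILE for the row being read; it is not kept in the output */
    set &set_statement indsname=src;
    /* characters 5-8 of the libref are the YYMM part */
    if ln_note_crbur_date_delinq ne '' then
      spc_cmt_date = input(substr(src, 5, 4), 4.);
  run;
%mend append_monthly;
```

This relies on the libref naming scheme note&i from the answer above; if the librefs are named differently, the substr positions need to change accordingly.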

Related

How to pad a number with leading zero in a SAS Macro loop counter?

So I have a range of datasets in a specific library. These datasets are named in the format DATASET_YYYYMM, with one dataset for each month. I am trying to append a range of these datasets based on user input for the date range. i.e. If start_date is 01NOV2019 and the end_date is 31JAN2020, I want to append the three datasets: LIBRARY.DATASET_201911, LIBRARY.DATASET_201912 and LIBRARY.DATASET_202001.
The range is obviously variable, so I can't simply name the datasets manually in a set function. Since I need to loop through the years and months in the date range, I believe a macro is the best way to do this. I'm using a loop within the SET statement to append all the datasets. I have copied my example code below. It does work in theory. But in practice, only if we are looping over the months of November and December. As the format of the dataset name has a two digit month, for Jan-Sept it will be 01-09. The month function returns 1-9 however, and of course a 'File DATASET_NAME does not exist' error is thrown.
Problem is I cannot figure out a way to get it to interpret the month with leading 0, without ruining functionality of another part of the loop/macro.
I have tried numerous approaches to format the number as z2, cannot get any to work.
i.e. Including PUTN functions in the DO line for quote_month as follows, it ignores the leading zero when generating the dataset name in the line below.
%DO quote_month = %SYSFUNC(IFN(&quote_year. = &start_year.,%SYSFUNC(PUTN(&start_month.,z2.)),1,.)) %TO %SYSFUNC(IFN(&quote_year. = &end_year.,%SYSFUNC(PUTN(&end_month.,z2.)),12,.));
Below is example code (without any attempt to reformat it to z2) - it will throw an error because it cannot find 'dataset_20201' because it is actually called 'dataset_202001'. The dataset called dataset_combined_example produces the desired output of the code by manually referencing the dataset names which it will be unable to do in practice. Does anyone know how to go about this?
DATA _NULL_;
FORMAT start_date end_date DATE9.;
start_date = '01NOV2019'd;
end_date = '31JAN2020'd;
CALL symput('start_date',start_date);
CALL symput('end_date',end_date);
RUN;
DATA dataset_201911;
input name $;
datalines;
Nov1
Nov2
;
RUN;
DATA dataset_201912;
input name $;
datalines;
Dec1
Dec2
;
RUN;
DATA dataset_202001;
input name $;
datalines;
Jan1
Jan2
;
RUN;
DATA dataset_combined_example;
SET dataset_201911 dataset_201912 dataset_202001;
RUN;
%MACRO get_table(start_date, end_date);
%LET start_year = %SYSFUNC(year(&start_date.));
%LET end_year = %SYSFUNC(year(&end_date.));
%LET start_month = %SYSFUNC(month(&start_date.));
%LET end_month = %SYSFUNC(month(&end_date.));
DATA dataset_combined;
SET
%DO quote_year = &start_year. %TO &end_year.;
%DO quote_month = %SYSFUNC(IFN(&quote_year. = &start_year.,&start_month.,1,.)) %TO %SYSFUNC(IFN(&quote_year. = &end_year.,&end_month.,12,.));
dataset_&quote_year.&quote_month.
%END;
%END;
;
RUN;
%MEND;
%get_table(&start_date.,&end_date.);
You could do this using putn and z2. format.
%DO quote_year = &start_year. %TO &end_year.;
%DO quote_month = %SYSFUNC(IFN(&quote_year. = &start_year.,&start_month.,1,.)) %TO %SYSFUNC(IFN(&quote_year. = &end_year.,&end_month.,12,.));
dataset_&quote_year.%sysfunc(putn(&quote_month.,z2.))
%END;
%END;
You can also do this using the metadata tables without having to resort to macro loops in the first place:
/* A few datasets to combine */
data
DATASET_201910
DATASET_201911
DATASET_201912
DATASET_202001
;
run;
%let START_DATE = '01dec2019'd;
%let END_DATE = '31jan2020'd;
proc sql noprint;
select catx('.', libname, memname) into :DS_LIST separated by ' '
from dictionary.tables
where
&START_DATE <=
case
when prxmatch('/DATASET_\d{6}/', memname)
then input(scan(memname, -1, '_'), yymmn6.)
else -99999
end
<= &END_DATE
and libname = 'WORK'
;
quit;
data combined_datasets /view=combined_datasets;
set &DS_LIST;
run;
The case-when in the where clause ensures that any other datasets present in the same library that don't match the expected naming scheme are ignored.
One key difference with this approach is that you will never end up attempting to read a dataset that doesn't exist if one of the expected datasets in your range is missing.
You can use the Z format to generate strings with leading zeros.
But your problem is much easier if you use SAS date functions and formats to generate the YYYYMM strings. Just use a normal iterative %DO loop to cycle the month offset from zero to the number of months between the two dates.
%macro get_table(start_date, end_date);
%local offset dsname ;
data dataset_combined;
set
%do offset=0 %to %sysfunc(intck(month,&start_date,&end_date));
%let dsname=dataset_%sysfunc(intnx(month,&start_date,&offset),yymmn6);
&dsname.
%end;
;
run;
%mend get_table;
Result:
445 options mprint;
446 %get_table(start_date='01NOV2019'd,end_date='31JAN2020'd);
MPRINT(GET_TABLE): data dataset_combined;
MPRINT(GET_TABLE): set dataset_201911 dataset_201912 dataset_202001 ;
MPRINT(GET_TABLE): run;
In a macro
Use INTNX to compute the bounds for a loop over date values. Within the loop:
Compute the candidate data set name according to specified lib, prefix and desired date value format. <yyyy><mm> is output by format yymmn6.
Use EXIST to check candidate data sets for existence.
Alternatively, do not check, but make sure to set OPTIONS NODSNFERR prior to combining. The setting will prevent errors when specifying a non-existent data set.
Update the loop index to the end of the month so the next increment takes the index to the start of the next month.
%macro names_by_month(lib=work, prefix=data_, start_date=today(), end_date=today(), format=yymmn6.);
%local index name;
%* loop over first-of-the-month date values;
%do index = %sysfunc(intnx(month, &start_date, 0)) %to %sysfunc(intnx(month, &end_date, 0));
%* compute month dependent name;
%let name = &lib..&prefix.%sysfunc(putn(&index,&format));
%* emit name if it exists;
%if %sysfunc(exist(&name)) or %sysfunc(exist(&name,VIEW)) %then %str(&name);
%* prepare index for loop +1 increment so it goes to start of next month;
%let index = %sysfunc(intnx(month, &index, 0, E));
%end;
%mend;
* example usage;
data combined_imports(label="nov2019 to jan2020");
set
%names_by_month(
prefix=import_,
start_date='01NOV2019'd,
end_date = '31JAN2020'd
)
;
run;
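For the alternative without the EXIST check, a rough sketch of the NODSNFERR route could look like this. The data set names are illustrative only (assume import_201912 is missing), and restoring DSNFERR afterwards is my own suggestion so that later steps still fail loudly on a mistyped name.

```sas
/* with NODSNFERR, a missing data set in SET is treated like _NULL_ instead of raising an error */
options nodsnferr;

data combined_imports;
  set work.import_201911
      work.import_201912   /* assumed not to exist in this illustration */
      work.import_202001;
run;

/* put the default back so later typos are still caught */
options dsnferr;
```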

Multiple values for a sas macro parameter

I'm new to SAS Macro programming and need to enable the following macro to be able to handle and process multiple values for its macro parameters.
data have;
input name $ ACCOUNT_ID $ cust_id;
cards;
ARTHUR CC1234 1234
TOM eil1235 1235
MIKEZ tb1236 1236
MATT mb1237 1237
LIZ TB1238 1238
PIZ VB1239 1239
TAN MB1240 1240
PANDA . 1241
;
run;
%MACRO algo (IN_DS=,VAR_LIST=,DATA_TYPE_LIST=,OUT_DS=);
DATA &OUT_DS;
SET &IN_DS;
%If &data_type_LIST = num %then
&var_LIST=sum(&VAR_LIST,2);
%else &var_LIST=cats(&var_LIST,'re');;
run;
%mend;
%algo(IN_DS=HAVE,VAR_LIST=CUST_ID,DATA_TYPE_LIST=num,OUT_DS=out1);
I now need to enable this macro to be able to pass multiple values for the macro parameters. Something like this :
%algo(IN_DS=HAVE,VAR_LIST='CUST_ID,ACCT_ID',DATA_TYPE_LIST='num,char',OUT_DS=out1);
Can someone help me enable this functionality in the macro code.
The parameter argument should be macro quoted with %STR() in the macro invocation.
Try
%algo
( IN_DS=HAVE
, VAR_LIST= %STR(CUST_ID, ACCT_ID)
, DATA_TYPE_LIST=num
, OUT_DS=out1
);
Macro quoting is different than DATA step quoting used for character literals.
Make sure the macro can handle multiple values. In general it is not a good idea to use comma as the delimiter in your list of values when calling a macro.
Usually space is the best delimiter since then you can use the macro value directly in the generated code. For example if your variables are all of the same type you can just use data step ARRAY.
%MACRO algo (IN_DS=,VAR_LIST=,DATA_TYPE_LIST=,OUT_DS=);
DATA &OUT_DS;
SET &IN_DS;
array list &var_list ;
do _n_=1 to dim(list);
%if &data_type_LIST = num %then %do ;
list(_n_)=sum(list(_n_),2);
%end;
%else %do;
list(_n_)=cats(list(_n_),'re');
%end;
end;
run;
%mend algo;
If your variables are NOT all of the same type then you need to generate a separate statement for each variable. In that case you can use a different delimiter if you want, like a pipe character, that is easier to use as delimiter in calls to macro functions like %scan().
%MACRO algo (IN_DS=,VAR_LIST=,DATA_TYPE_LIST=,OUT_DS=);
%local i var;
DATA &OUT_DS;
SET &IN_DS;
%do i=1 %to %sysfunc(countw(&var_list,|));
%let var=%scan(&var_list,&i,|);
%if %scan(&data_type_LIST,&i,|) = num %then %do ;
&var=sum(&var,2);
%end;
%else %do;
&var=cats(&var,'re');
%end;
%end;
run;
%mend algo;
%algo(IN_DS=HAVE,VAR_LIST=CUST_ID|ACCT_ID,DATA_TYPE_LIST=num|char,OUT_DS=out1);
If you want to pass a list of variables and then use that list in the code you posted above, my suggestion would be to treat the &var_list as a list, and use scan to determine how many variables there are, and then loop through the list and execute the code accordingly.

SAS - Do looping from condition1 to condition2

I am looking to have a programme which cleans up some messy data I have. I want to do this for both the assets and liabilities side of the project I'm working on.
My question: is there a way to use a do loop to run the clean-up code first on the assets and then on the liabilities? Something like this:
%do %I = Asset %to Liability;
%assetorliability= I ;
proc sort data = &assetorliability;
by price;
run;
data want&assetorliability;
set &assetorliability;
if _N_ < 50000;
run;
The actual script is quite long, so a single macro may not be the ideal solution, but this loop would be great.
TIA.
EDIT : the programme includes some macros and the errors received are as follows:
%let list =Asset Liability;
%do i=1 %to %sysfunc(countw(&list,%str( )));
%let next=%scan(&list,&i,%str( ));
%Balance;
%end;
In the macro, the data steps are named with a balance&list to allow for each scenario. The errors are:
13221 %let list =Asset Liability;
13222 %do i=1 %to %sysfunc(countw(&list,%str( )));
ERROR: The %DO statement is not valid in open code.
13223
13224 %let next=%scan(&list,&i,%str( ));
WARNING: Apparent symbolic reference I not resolved.
WARNING: Apparent symbolic reference I not resolved.
ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric
operand is required. The condition was: &i
ERROR: Argument 2 to macro function %SCAN is not a number.
ERROR: The %END statement is not valid in open code.
The macro %do statement is not as flexible as the data step do statement. To loop over a list of values you would want to put the list into a macro variable and use an index variable in your %do loop.
Note that macro logic needs to be inside of a macro. You cannot use it in "open" code.
%macro do_over(list);
%local i next;
%do i=1 %to %sysfunc(countw(&list,%str( )));
%let next=%scan(&list,&i,%str( ));
proc sort data = &next ;
by price;
run;
data want&next ;
...
%end;
%mend do_over ;
%do_over(Asset Liability)

SAS Macro code error with length statement

I am trying to change the length of the variables based on a list that I have. The code seems to run, but the desired output is not achieved. Here is the code:
%macro LEN();
Proc sql ;
select count(name) into: varnum from variab;
select name into: varname1-:varname%trim(%left(&varnum)) from Variab;
select length3 into: len from Length;
Quit;
%do i=1 %to &varnum;
data Zero;
length &&varname&i $ &&len&i.;
set desti.test;
length _numeric_ 4.;
format _numeric_ 12.2;
run;
%end;
%mend;
It gives a warning
WARNING: Multiple lengths were specified for the variable fscadl1 by
input data set(s). This can cause truncation
of data.
and it doesn't change the length of the variable. What is wrong in this code?
Are you trying to change a list of variables in one dataset? You're repeating the entire data step for each iteration, but only writing to a constant destination, which is inconsistent.
Probably what you want is:
Proc sql ;
select count(name) into: varnum from variab;
select name into: varname1-:varname%trim(%left(&varnum)) from Variab;
select length3 into: len from Length;
Quit;
%macro set_len(varnum=);
%do i=1 %to &varnum;
length &&varname&i $ &&len&i.;
%end;
%mend;
data Zero;
%set_len(varnum=&varnum);
set desti.test;
length _numeric_ 4.;
format _numeric_ 12.2;
run;
Note that you'd need to define &&len&i as you're not doing that currently.
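One way to define those len variables is to read them with the same INTO range that is already used for the names. This is only a sketch: it assumes the Length table has one row per variable, in the same row order as variab. The question does not show either table, so that ordering is an assumption you would need to verify (or, better, keep name and length in one table and select both in one query).

```sas
proc sql noprint;
  select count(name) into :varnum from variab;
  select name into :varname1-:varname%trim(%left(&varnum)) from variab;
  /* assumption: row i of Length belongs to row i of variab */
  select length3 into :len1-:len%trim(%left(&varnum)) from Length;
quit;

/* quick check of the first pair before using them in the LENGTH statements */
%put varname1=&varname1 len1=&len1;
```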
The warning message suggests that it is working. SAS issues that warning when you truncate a variable. You can suppress the warning message with the VARLENCHK option.
Below works:
options varlenchk=nowarn;
data want;
length name $ 3;
set sashelp.class;
length _numeric_ 4;
run;
If your code isn't working, I would turn on MPRINT to make sure your macro is generating the SAS code you expect.

sas macros for incrementing date

My code is:
libname " Cp/mydata"
options ;
%let yyyymmdd=20050210;
%let offset=0;
%let startrange=0;
%let endrange=0;
/* MACRO FOR INCREMENTING THE DATE */
%macro base(yyyymmdd=, offset=);
%local date x ds; /* declare macro variables with local scope */
%let date=%sysfunc(mdy(%substr(&yyyymmdd,5,2)
,%substr(&yyyymmdd,7,2)
,%substr(&yyyymmdd,1,4))); /* convert yyyymmdd to SAS date */
%let loopout=100;/* hardcoded - number of times to check whether ds exists */
%do x=&offset %to &loopout; /* begin loop */
/* convert &date to yyyymmdd format */
%let ds=AQ.CO_%sysfunc(intnx(day,&date,&offset),yymmddn8.);
%if %sysfunc(exist( &ds )) %then %do;
%put &ds exists!;
&ds /* write out the dataset, if it exists */
%let x=&loopout; /* exit loop */
%end;
%else %do;
%put &ds does not exist - checking subsequent day;
%let date=&date+1;
%end;
%end;
%mend;
%macro loop(yyyymmdd=, startrange=, endrange=);
%local date x ds;
%let date=%sysfunc(mdy(%substr(&yyyymmdd,5,2)
,%substr(&yyyymmdd,7,2)
,%substr(&yyyymmdd,1,4)));
data x;
set set %base(yyyymmdd=&yyyymmdd, offset=0)
/* loop through each specific dataset, checking first whether it exists.. */
%do x=&startrange %to &endrange;
%let ds=AQ.CO_%sysfunc(intnx(day,&date,&x),yymmddn8.);
%if %sysfunc(exist( &ds )) %then %do;
&ds
%end;
%end;
;
run;
%mend;
This was the error generated when I tried to run this macro.
data temp;
58 set %loop(yyyymmdd=&yyyymmdd, startrange=&startrange,
58 ! endrange=&endrange);
ERROR: File WORK.DATA.DATA does not exist.
ERROR: File WORK.X.DATA does not exist.
AQ.CO_20050210 does not exist - checking subsequent day
AQ.CO_20050211 does not exist - checking subsequent day
AQ.CO_20050212 exists!
NOTE: The system stopped processing this step because of errors.
I want help on two things:
1) Here, I'm trying to increment my date by 1 or 2 or so on if that date is not there in my original dataset. Please help to make this macro work fine.
2) I would like to have another column, i.e. work.date, in my data that will have 0 or 1 (1 if the specified date yyyymmdd exists in our original data and 0 if I'm incrementing). Please make the specified changes in my macro.
Thanks in advance!!
I wasn't quite sure exactly what your %base() macro was trying to achieve but there were a couple of things I noticed.
First try turning on option mprint; to help with debugging. If you still need more debugging info you can also try turning on the following options (I'd suggest 1 at a time until you know which ones you need):
option symbolgen macrogen mlogic;
Secondly, you have set set instead of just set in your example code. I don't think that is helping any =).
When I tried the code quickly on my machine I noticed I was getting a strange error (different from yours) when I called the %base() macro. It seemed like an error that shouldn't be occurring so I wrapped the call in an %unquote() function just to make sure and I started to receive the error your post mentioned. You may want to try this as well:
set %unquote(%base(yyyymmdd=&yyyymmdd, offset=0))
Normally the %unquote() function isn't required unless you are explicitly using macro quoting functions and getting strange errors, but SAS macros sometimes seem to have a mind of their own. I only ever add this when I know it is required.
Also, your libname call is missing a semicolon at the end of the line.
Finally, some advice on working with dates in the SAS macro language. Don't keep converting between the date value and the formatted value. It will make your code bigger, more error-prone, and more difficult to read. I know because I used to do it that way too. Try instead to always work with variables that contain the actual date value (by using the result from %sysfunc(mdy())) and then, if you need a formatted value, create a new variable (e.g. %let yyyymmdd = %sysfunc(putn(&mydate, yymmddn8.));). When you pass values from one macro to another, don't pass the formatted values even if it seems easier; pass the actual values.
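To make that advice concrete, here is a small sketch. The macro variable names mydate and nextday are just illustrative.

```sas
/* keep the real SAS date value (a day count) as the working variable */
%let mydate = %sysfunc(mdy(2, 10, 2005));

/* do date arithmetic on the value itself */
%let nextday = %sysfunc(intnx(day, &mydate, 1));

/* format only at the point where a string is needed, e.g. for a data set name */
%let yyyymmdd = %sysfunc(putn(&mydate, yymmddn8.));
%let ds = AQ.CO_%sysfunc(putn(&nextday, yymmddn8.));

%put mydate=&mydate yyyymmdd=&yyyymmdd ds=&ds;
```

Here yyyymmdd resolves to 20050210 and ds to AQ.CO_20050211, while mydate and nextday stay as plain numbers that can be passed to other macros.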
Making the above changes removed all errors on my machine. Final code:
libname " Cp/mydata";
%let yyyymmdd=20050210;
%let offset=0;
%let startrange=0;
%let endrange=0;
/* MACRO FOR INCREMENTING THE DATE */
%macro base(yyyymmdd=, offset=);
%local date x ds; /* declare macro variables with local scope */
%let date=%sysfunc(mdy(%substr(&yyyymmdd,5,2)
,%substr(&yyyymmdd,7,2)
,%substr(&yyyymmdd,1,4))); /* convert yyyymmdd to SAS date */
%let loopout=100;/* hardcoded - number of times to check whether ds exists */
%do x=&offset %to &loopout; /* begin loop */
/* convert &date to yyyymmdd format */
%let ds=AQ.CO_%sysfunc(intnx(day,&date,&offset),yymmddn8.);
%if %sysfunc(exist( &ds )) %then %do;
%put &ds exists!;
&ds /* write out the dataset, if it exists */
%let x=&loopout; /* exit loop */
%end;
%else %do;
%put &ds does not exist - checking subsequent day;
%let date=&date+1;
%end;
%end;
%mend;
%macro loop(yyyymmdd=, startrange=, endrange=);
%local date x ds;
%let date=%sysfunc(mdy(%substr(&yyyymmdd,5,2)
,%substr(&yyyymmdd,7,2)
,%substr(&yyyymmdd,1,4)));
data x;
set %unquote( %base(yyyymmdd=&yyyymmdd, offset=0))
/* loop through each specific dataset, checking first whether it exists.. */
%do x=&startrange %to &endrange;
%let ds=AQ.CO_%sysfunc(intnx(day,&date,&x),yymmddn8.);
%if %sysfunc(exist( &ds )) %then %do;
&ds
%end;
%end;
;
run;
%mend;
%loop(yyyymmdd=&yyyymmdd, startrange=&startrange, endrange=&endrange);
It seems to me that your solution is quite complex.
But I believe that at least one issue is the variable x in your second macro (%loop): I do not see where you define it.
You can probably do all of this much easier, IF you do not need to limit the loopout. If you just want all datasets beyond the offset, you can simplify all this by making use of the SASHELP library to find the datasets you need. And then just loop over that result.
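As a rough sketch of that idea, assuming the libref is AQ and the data sets are named CO_yyyymmdd as in the question, you could pull the names from SASHELP.VTABLE and then loop over, or directly SET, the resulting list. The macro variable name ds_list is hypothetical.

```sas
%let yyyymmdd = 20050210;
%let ds_list = ;   /* stays empty if no data set matches */

proc sql noprint;
  select catx('.', libname, memname) into :ds_list separated by ' '
    from sashelp.vtable
    where libname = 'AQ'
      /* the underscore is a LIKE wildcard, so escape it to match it literally */
      and memname like 'CO^_%' escape '^'
      /* names sort chronologically because the suffix is yyyymmdd */
      and memname >= "CO_&yyyymmdd"
    order by memname;
quit;

%put ds_list=&ds_list;
```

The first name in ds_list is then the earliest data set on or after the requested date, which is what %base was searching for day by day.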
DEPRECATED REPLY BELOW, misread the need
You are partially reinventing the wheel, have a deeper look at the intnx and intck functions.
http://support.sas.com/documentation/cdl/en/etsug/60372/HTML/default/viewer.htm#etsug_tsdata_sect038.htm
https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212700.htm