How does %if %else work in SAS Macro

I have the following code:
%macro TEST();
%let prev=3;
%do i=1 %to 4;
%if &i>2 %then %do;
%put prev = 5;
%end;
%else;
%put prev = 0;
%end;
%end;
%mend;
Which, when executed, returns:
prev = 0
prev = 0
prev = 5
prev = 0
prev = 5
prev = 0
My question is: how do %if/%else statements work in SAS macro code, and why is the else statement always executed?

Your code doesn't run for me, it generates an error.
ERROR: There is no matching %DO statement for the %END. This statement will be ignored.
I believe you intended the following, which is close to the other solution but not quite. Rather than add a %do, move the %put statement.
%macro TEST();
%let prev=3;
%do i=1 %to 4;
%if &i>2 %then
%do;
%put prev = 5;
%end;
%else
%put prev = 0;
%end;
%mend;
%test;

Your code has an error in it. The %ELSE statement is terminated by its own semicolon, so it controls nothing: the %DO that should follow it is missing. Therefore, the second %PUT statement is always executed.
It should read:
%macro TEST();
%let prev=3;
%do i=1 %to 4;
%if &i>2 %then %do;
%put prev = 5;
%end;
%else %do; /* <=== */
%put prev = 0;
%end;
%end;
%mend;
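As a minimal sketch of the rule both answers rely on (the macro name else_demo and the message text are made up for illustration): the statement that %else controls must follow it directly, with no semicolon in between.

```sas
%macro else_demo;
  %local i;
  %do i = 1 %to 4;
    /* No semicolon directly after %else: the %put belongs to it */
    %if &i > 2 %then %put i=&i takes the first branch;
    %else %put i=&i takes the second branch;
  %end;
%mend else_demo;
%else_demo;
```

With this form, exactly one %put should run per iteration. Writing `%else;` instead turns the %else into an empty statement, and whatever comes next runs unconditionally.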

Related

create a macro only with macro variables creation (%let) in sas

Hi I was trying to create a macro that only contains macro variable creation but it failed. Here is an example:
%macro createvariable;
%let a = 5;
%let b = 6;
%mend createvariable;
%createvariable;
data test;
c = &a + &b;
run;
But it will work as:
%macro createvariable;
%let a = 5;
%let b = 6;
data test;
c = &a + &b;
run;
%mend createvariable;
%createvariable;
So I was wondering whether SAS cannot create a macro that contains only macro variable assignments, or whether there is a way to solve this problem. Thanks.
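Macro variables created with %let inside a macro are local to that macro by default, so they no longer exist when the data step runs. Declare them global first. Try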
%macro createvariable;
%global a b;
%let a = 5;
%let b = 6;
%mend createvariable;
%createvariable;
data test;
c = &a + &b;
run;
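To see the scoping for yourself, here is a small sketch using the macro functions %symlocal, %symglobl and %symexist. The names scope_demo, local_only and shared are hypothetical, and the sketch assumes neither variable already exists in the global symbol table.

```sas
%macro scope_demo;
  /* Not declared anywhere else, so %let creates it in the local table */
  %let local_only = 1;
  /* Declared global before the assignment, so it outlives the macro */
  %global shared;
  %let shared = 2;
  %put Inside: local_only is local = %symlocal(local_only);
  %put Inside: shared is global = %symglobl(shared);
%mend scope_demo;
%scope_demo;
/* After the macro ends, only the global variable should still exist */
%put Outside: shared exists = %symexist(shared);
%put Outside: local_only exists = %symexist(local_only);
```

Each function returns 1 or 0, so the log shows directly which symbol table each variable landed in.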

How to recode variables in table 1 using info from table 2 (in SAS)

The overall goal is to stratify quantitative variables based on their percentile. I would like to break each variable up into 10 levels (e.g. 10th, 20th, ...100th percentile) and recode it as 1 if it falls into the 10th percentile, 2 if it falls into the 20th percentile, etc. This method needs to be applicable across any data set I plug in, and I want this process to be as automated as possible. Below I have generated some test data:
data test (drop=i);
do i=1 to 1000;
a=round(uniform(1)*4,.01);
b=round(uniform(1)*10,.01);
c=round(uniform(1)*7.5,.01);
output;
end;
stop;
run;
The following macro is used to create a table of values that tells you the cut off for the 10 percentiles of each variable. I have added a picture of the example output below the code.
/*Recode variables based on quartiles from boxplot*/
%macro percentiles(var);
/* Count the number of values in the string */
%let count=%sysfunc(countw(&var));
/* Loop through the total number of values */
%do i = 1 %to &count;
%let variables=%qscan(&var,&i,%str(,));
proc univariate data=test noprint;
var &variables;
output out=pcts pctlpts = 10 20 30 40 50 60 70 80 90 100
pctlpre = &variables;
run;
proc transpose data=pcts out=&variables (rename=(col1=&variables) drop=_NAME_ _LABEL_);
run;
%end;
data percentiles (drop=i);
do i=1 to 10;
recode=i;
percentile=i*10;
output;
end;
stop;
run;
data pcts;
merge percentiles %sysfunc(tranwrd(&var.,%str(,),%str( )));
run;
%mend;
%percentiles(%str(a,b,c));
[Image: output from the above macro]
The following code is how I am currently recoding my variables. I use the table generated in the above macro to fill in the cut off points for each percentile for each variable. As you can see, this is very tedious and will become prohibitive if I have a large number of variables to recode. Is there a better process for this or preferably a way I could automate this part?
data test;
set test;
if a <= .415 then recode_a = 1; else if a <= .785 then recode_a = 2; else if a <= 1.255 then recode_a = 3;
else if a <= 1.61 then recode_a = 4; else if a <= 2.03 then recode_a = 5; else if a <= 2.42 then recode_a = 6;
else if a <= 2.76 then recode_a = 7; else if a <= 3.18 then recode_a = 8; else if a <= 3.64 then recode_a = 9;
else if a <= 3.99 then recode_a = 10;
if b <= .845 then recode_b = 1; else if b <= 1.88 then recode_b = 2; else if b <= 2.86 then recode_b = 3;
else if b <= 4.005 then recode_b = 4; else if b <= 5.03 then recode_b = 5; else if b <= 6.07 then recode_b = 6;
else if b <= 6.995 then recode_b = 7; else if b <= 8.035 then recode_b = 8; else if b <= 9.16 then recode_b = 9;
else if b <= 10 then recode_b = 10;
if c <= .86 then recode_c = 1; else if c <= 1.58 then recode_c = 2; else if c <= 2.34 then recode_c = 3;
else if c <= 3.15 then recode_c = 4; else if c <= 3.85 then recode_c = 5; else if c <= 4.615 then recode_c = 6;
else if c <= 5.315 then recode_c = 7; else if c <= 5.96 then recode_c = 8; else if c <= 6.75 then recode_c = 9;
else if c <= 7.5 then recode_c = 10;
run;
proc print data=test (obs=5);
run;
[Image: sample of desired output]
A different option - PROC RANK. You could probably make it more 'automated' but it's pretty straightforward. Using PROC RANK you could also specify different ways of dealing with ties. Note that it would go from 0 to 9 rather than 1 to 10 but that's trivial to change.
data test (drop=i);
do i=1 to 1000;
a=round(uniform(1)*4,.01);
b=round(uniform(1)*10,.01);
c=round(uniform(1)*7.5,.01);
output;
end;
stop;
run;
proc rank data=test out=want groups=10;
var a b c;
ranks rankA rankB rankC;
run;
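The shift from 0-9 to 1-10 mentioned above could look like the following sketch. It assumes the dataset want and the rank variables rankA, rankB and rankC produced by the PROC RANK step; the output name want_1to10 and the array name rk are made up for illustration.

```sas
data want_1to10;
  set want;
  array rk {*} rankA rankB rankC;
  do k = 1 to dim(rk);
    /* Leave missing ranks missing; otherwise move 0-9 up to 1-10 */
    if not missing(rk{k}) then rk{k} = rk{k} + 1;
  end;
  drop k;
run;
```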
The following should work for you dynamically with no hard-coding -- I edited to compact it into a single macro. Essentially it puts your desired variables into a list, creates a dataset using your output, and then uses the variable contents to put your data steps into long strings. These strings are then put into a macro variable and you can call it in your final data step. Again, no hard-coding involved.
%MACRO stratify(library=,input=,output=);
%local varlist varlist_space data_step_list;
** get vars into comma-separated list and space-separated list **;
proc sql noprint;
select NAME
into: varlist separated by ","
from dictionary.columns
where libname=upcase("&library.") and memname=upcase("&input.");
select NAME
into: varlist_space separated by " "
from dictionary.columns
where libname=upcase("&library.") and memname=upcase("&input.");
quit;
%percentiles(%bquote(&varlist.));
** put data into long format **;
proc transpose data = pcts out=pcts_long;
by recode percentile;
var &varlist_space.;
run;
** sort to get if-else order **;
proc sort data = pcts_long;
by _NAME_ percentile;
run;
** create your if-then strings using data itself **;
data str;
length STR $100;
set pcts_long;
bin = percentile/10;
by _NAME_;
if first._NAME_ then do;
STR = "if "||strip(_NAME_)||" <= "||strip(put(COL1,best.))||" then "||catx("_","recode",_NAME_)||" = "||strip(put(bin,best.))||";";
end;
else do;
STR = "else if "||strip(_NAME_)||" <= "||strip(put(COL1,best.))||" then "||catx("_","recode",_NAME_)||" = "||strip(put(bin,best.))||";";
end;
run;
** put strings into a list **;
proc sql noprint;
select STR
into: data_step_list separated by " "
from STR;
quit;
** call data step list in final data **;
data &output.; set &input.;
&data_step_list.;
run;
proc print data = &output.(obs=5);
run;
%MEND;
%stratify(library=work,input=test,output=final);
No need for all of that code generation. Just use an array. Basically, load the percentiles from the dataset generated by PROC UNIVARIATE into a two-dimensional array and then find the decile rank for your actual values.
%macro stratify(varlist,in=,out=,pcts=pcts);
%local nvars pctls droplist recodes ;
%let varlist=%sysfunc(compbl(&varlist));
%let nvars=%sysfunc(countw(&varlist));
%let pctls=pctl_%sysfunc(tranwrd(&varlist,%str( ),%str( pctl_)));
%let droplist=pctl_%sysfunc(tranwrd(&varlist,%str( ),%str(: pctl_))):;
%let recodes=recode_%sysfunc(tranwrd(&varlist,%str( ),%str( recode_)));
proc univariate data=&in noprint ;
var &varlist;
output out=&pcts pctlpre=&pctls
pctlpts = 10 20 30 40 50 60 70 80 90 100
;
run;
data &out ;
if _n_=1 then set &pcts ;
array _pcts (10,&nvars) _numeric_;
set &in;
array _in &varlist ;
array out &recodes ;
do i=1 to dim(_in);
do j=1 to 10 while(_in(i) > _pcts(j,i));
end;
out(i)=j;
end;
drop i j &droplist;
run;
%mend stratify;
So if I use your generated sample here is what the log looks like with the MPRINT option turned on.
1093 %stratify(a b c,in=test,out=want);
MPRINT(STRATIFY): proc univariate data=test noprint ;
MPRINT(STRATIFY): var a b c;
MPRINT(STRATIFY): output out=pcts pctlpre=pctl_a pctl_b pctl_c pctlpts = 10 20 30 40 50
60 70 80 90 100 ;
MPRINT(STRATIFY): run;
NOTE: The data set WORK.PCTS has 1 observations and 30 variables.
NOTE: PROCEDURE UNIVARIATE used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
MPRINT(STRATIFY): data want ;
MPRINT(STRATIFY): if _n_=1 then set pcts ;
MPRINT(STRATIFY): array _pcts (10,3) _numeric_;
MPRINT(STRATIFY): set test;
MPRINT(STRATIFY): array _in a b c ;
MPRINT(STRATIFY): array out recode_a recode_b recode_c ;
MPRINT(STRATIFY): do i=1 to dim(_in);
MPRINT(STRATIFY): do j=1 to 10 while(_in(i) > _pcts(j,i));
MPRINT(STRATIFY): end;
MPRINT(STRATIFY): out(i)=j;
MPRINT(STRATIFY): end;
MPRINT(STRATIFY): drop i j pctl_a: pctl_b: pctl_c:;
MPRINT(STRATIFY): run;
NOTE: There were 1 observations read from the data set WORK.PCTS.
NOTE: There were 1000 observations read from the data set WORK.TEST.
NOTE: The data set WORK.WANT has 1000 observations and 6 variables
And the first five observations are:

Huffman dictionary does not have the codes for all the input signals

I seem to be having a problem passing the string and dictionary through the huffmanenco function. I've tried almost everything, but I keep getting the error that the Huffman dictionary does not have all the input codes. Yet I'm positive it does.
%% HUFFMAN TEST
clear all; close all; clc;
sig = ['a'; 'b'; 'c'; 'd'; 'e'; 'f'; 'g'; 'h'; 'i'; 'j';...
'k'; 'l'; 'm'; 'n'; 'o'; 'p'; 'q'; 'r'; 's'; 't';...
'u'; 'v'; 'w'; 'x'; 'y'; 'z'; ':'; ' '; ','; '.'];
% Get probability
char_count = zeros(30,1);
for i = 1:30
for c = sig(i)
char_count(i,1) = length(find(sig == c));
end
end
sym_prob = char_count / sum(char_count);
% Huffman Dictionary
% symbols = cellstr(symbols); % Still doesn't work in huffmandict, so try manually typing out again with curly braces
sig = {'a'; 'b'; 'c'; 'd'; 'e'; 'f'; 'g'; 'h'; 'i'; 'j';...
'k'; 'l'; 'm'; 'n'; 'o'; 'p'; 'q'; 'r'; 's'; 't';...
'u'; 'v'; 'w'; 'x'; 'y'; 'z'; ':'; ' '; ','; '.'};
[dict, aveLength] = huffmandict(sig, sym_prob);
% Process signal
str = 'A technique is developed to construct a representation of planar objects undergoing a general affine transformation. The representation can be used to describe planar or nearly planar objects in a three-dimensional space, observed by a camera under arbitrary orientations.';
str_int = bin2dec(dec2bin(str));
sig = cell(size(str));
for i = 1:length(str)
sig{i} = char(str_int(i));
end
% Encode & Decode
sig_enco = huffmanenco(sig, dict);
dsig = huffmandeco(sig_enco, dict);
You don't have all the characters present in your dictionary. You can easily check this using ismember on your dictionary symbols and your input signal. I get the following list of characters which are not present in your dictionary.
dictionary_symbols = { ...
'a'; 'b'; 'c'; 'd'; 'e'; 'f'; 'g'; 'h'; 'i'; 'j';...
'k'; 'l'; 'm'; 'n'; 'o'; 'p'; 'q'; 'r'; 's'; 't';...
'u'; 'v'; 'w'; 'x'; 'y'; 'z'; ':'; ' '; ','; '.'};
[isListed, ind] = ismember(sig, dictionary_symbols);
sig(~isListed)
'A' 'T' '-'
It may be easier (and possibly more robust) to use an ASCII code range to generate your dictionary so you can ensure that you capture all the basic characters you intend to catch.
dictionary_symbols = num2cell(char(' ':'~')).';
probabilities = ones(size(dictionary_symbols)) ./ numel(dictionary_symbols);
Addendum
I'm not completely sure what you're doing with this code chunk
% Process signal
str_int = bin2dec(dec2bin(str));
sig = cell(size(str));
for i = 1:length(str)
sig{i} = char(str_int(i));
end
If you want the numeric representation of your string, you can always cast it using the desired datatype.
uint8(str);
double(str);
Then if you want to split a string up so that it's a cell array where each element is a separate character, you can use num2cell.
cellArray = num2cell(str);

using retain to keep track of maximum value

I have a macro CVI that returns a y for a given x. To simplify it, assume
%macro CVI(Nt);
%local result;
%let result = %sysevalf(2*&Nt**2-&Nt);
&Result;
%mend;
This works as expected
%macro run;
data _null_;
%do i = 1 %to 5;
%let s = %CVI(&i);
%put &i &s;
%end;
run;
%mend;
But I tried to find the maximum in a given interval, say between 9 and 25.
I modified %run a bit but no luck.
%macro run2;
data _null_;
retain max;
%do i = 9 %to 25;
%if max < %CVI(&i) %then max = %CVI(&i);
%else max = max;
%end;
run;
%mend;
Did I miss anything inside macro?
try this:
%macro run2;
data a;
drop x;
max = %CVI(9);
%do i = 9 %to 25;
x = %CVI(&i);
if max < x then max = x;
%end;
run;
%mend;
Also, you should change the name of your macro %run to something else. RUN is a reserved word.
Is this what you are trying to generate? If so, then start creating a macro from here.
data _null_;
do i = 9 to 25;
if max < 2*i**2-i then max = 2*i**2-i;
end;
Put MAX=;
run;
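The underlying problem in %run2 is that %if is evaluated by the macro processor while the code is being generated, so `max` there is just the text "max", not the data step variable. If the maximum is wanted as a macro value, a pure macro-language sketch might look like this. The macro name max_cvi is hypothetical, and the formula is written inline rather than calling %CVI, because %CVI as posted also emits a trailing semicolon.

```sas
%macro max_cvi(lo, hi);
  %local i val best;
  /* Start from the value at the lower bound */
  %let best = %sysevalf(2*&lo**2 - &lo);
  %do i = &lo %to &hi;
    %let val = %sysevalf(2*&i**2 - &i);
    /* %sysevalf returns 1 or 0 for a comparison */
    %if %sysevalf(&val > &best) %then %let best = &val;
  %end;
  %put Maximum on [&lo, &hi] is &best;
%mend max_cvi;
%max_cvi(9, 25);
```

Here every comparison happens in the macro processor, so no data step is needed at all.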

Matching on multiple variables; SAS macro do loop when using _n_ as a variable

I would like to split my observations in a "parent" dataset into their own unique "child" datasets. I need to do this for several parent datasets, so I am trying to create a macro with a do loop inside to generate these datasets. But my code is not working (perhaps for multiple reasons).
Here is manual code as an example of what I want to automate (this code works fine, the "parent" dataset ta220092 has four observations in this case, but in other "parent" datasets it may be larger or smaller):
data ta2200921 ta2200922 ta2200923 ta2200924;
set ta220092;
if _n_ = 1 then output ta2200921;
if _n_ = 2 then output ta2200922;
if _n_ = 3 then output ta2200923;
if _n_ = 4 then output ta2200924;
run;
In trying to automate this, I thought I should use the automatic _n_ variable to build the dataset name and the %to bound, since the number of observations in each "parent" dataset varies, but I am not sure how to do it. I have created the following code, which has an issue that I am hoping someone can help with:
%macro treatmentsplit(j);
%do i = 1 %to &j.;
&j. = _n_;
data tatest220092&i.;
set ta220092 (where = (_n_ = &i.));
run;
%end;
%mend treatmentsplit;
%treatmentsplit;
Thank you.
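One reason the attempt above cannot work is that _n_ exists only while a data step is running; it is not a variable stored in the dataset, so a where= dataset option cannot refer to it. A minimal sketch of the same loop using the firstobs= and obs= dataset options instead (the two-parameter signature is an assumption, and the number of observations is passed in by hand here):

```sas
%macro treatmentsplit(dsn, n);
  %local i;
  %do i = 1 %to &n;
    /* firstobs=&i obs=&i reads exactly observation number &i */
    data &dsn.&i;
      set &dsn (firstobs=&i obs=&i);
    run;
  %end;
%mend treatmentsplit;
%treatmentsplit(ta220092, 4);
```

This produces ta2200921 through ta2200924, matching the manual code, at the cost of one pass over the parent dataset per child.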
Besides editing the above for some clarity, I need to edit my question to address why I don't believe this is a duplicate question as Joe tagged. His proposed duplicate question is What's the fastest way to partition a sas dataset for batch processing?
There are two reasons why I don't think this question is a duplicate. First, the underlying reason for wanting to split is different. For my problem, this is not an issue of trying to break up a large dataset for reasonable batch processing. I will address my underlying reason for wanting to split in the next paragraph. The second reason I don't consider this a duplicate is that the code from the answers to "What's the fastest way to partition a SAS dataset for batch processing" does not work for my situation. The two code answers provided specify the number of datasets the parent dataset is to be split into. I do not know in advance the number of splits for each dataset I want to split, since the number of observations varies in each dataset. I tried to modify the second answer (by RWill) for my situation, and have been unsuccessful with that so far. Here is my best attempt to modify the second answer to my situation so far (I have tried variants):
%macro nobs(dsn);
%local nobs dsid rc;
%let nobs=0;
%let dsid = %sysfunc(open(&dsn));
%if &dsid %then %do;
%let nobs = %sysfunc(attrn(&dsid,NOBS));
%end;
%else %put Open for dataset &dsn failed - %sysfunc(sysmsg());
%let rc = %sysfunc(close(&dsid));
%mend nobs;
%macro batch_process(dsn_in,dsn_out_prefix);
%let dsn_obs = %nobs(&dsn_in);
%let obs_per_dsn = 1;
data
%do i = 1 %to &dsn_obs;
&dsn_out_prefix.&i
%end; ;
set &dsn_in;
drop _count;
retain _count 0;
_count = _count + 1;
%do i = 1 %to &dsn_obs;
if (1 + ((&i - 1) * 1) <= _count <= (&i * 1) then do;
output &dsn_out_prefix.&i;
end;
%end;
run;
%mend batch_process;
%batch_process( dsn_in=tmp1.ta220092 , dsn_out_prefix = ta220092);
The error from the log seems to indicate that there is an issue with the DSN_OBS variable in the do loop (5th line down in the second macro):
SYMBOLGEN: Macro variable DSN_OBS resolves to
ERROR: %EVAL function has no expression to evaluate, or %IF statement has no condition.
ERROR: The %TO value of the %DO I loop is invalid.
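The empty DSN_OBS in the log is consistent with %nobs, as posted, never handing its result back: a function-style macro has to emit the value as text on a line of its own, without a semicolon. A sketch of %nobs with that line added (only the `&nobs` line and the placement of close() inside the %do block differ from the posted version):

```sas
%macro nobs(dsn);
  %local nobs dsid rc;
  %let nobs = 0;
  %let dsid = %sysfunc(open(&dsn));
  %if &dsid %then %do;
    %let nobs = %sysfunc(attrn(&dsid, NOBS));
    /* Only close the dataset if it was actually opened */
    %let rc = %sysfunc(close(&dsid));
  %end;
  %else %put Open for dataset &dsn failed - %sysfunc(sysmsg());
  /* No semicolon: this text is what the caller receives */
  &nobs
%mend nobs;
%put Observations in tmp1.ta220092: %nobs(tmp1.ta220092);
```

With this change `%let dsn_obs = %nobs(&dsn_in);` should resolve to a number, so the `%do i = 1 %to &dsn_obs;` loop has a valid upper bound.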
To explain my underlying reason for wanting to split my dataset into one observation per dataset: I have modified a macro which almost works the way I need it to, with one issue. The original macro is for propensity score matching: http://www.biostat.umn.edu/~will/6470stuff/Class25-12/PSmatching.sas. I modified it to fit my dataset better (changing variable names), and I also added a method I call "CC" for calculated caliper, because I want to capture all controls which are within 10 or 20% of the matching variable of my treatment group (there will be a second matching variable which is selected by nearest neighbor, but I don't have an issue with the code for that step down the line).
The issue is that in a treatment dataset (such as ta220092 above), there are two observations whose matching variables have overlapping calculated caliper "zones": one has assets of 62, and one has 64. The macro has a replacement option. If I select "yes", then I get the treatment matched to the same control 100 times (not what I want; I want all controls within the calculated caliper). If I select "no", then the macro almost works how I want, but the control observations that are a potential match for two treatment observations with overlapping calculated calipers are split between the two treatment observations, instead of being allowed to be within the caliper of each treatment.
So the macro is not allowing replacement at the dataset level, when what I want is for it to not allow replacement at the observation level. Stated another way, I do want there to be replacement between observations, but am not sure how to modify the macro. I thought it would be easier (though granted a much less elegant solution) to split each treatment observation into its own dataset (I have fewer than 600 treatments). Here is the macro I have that is functioning, but not doing quite what I want it to.
(Since I am new to Stack Overflow, you can kindly point out to me if this edit is TMI, if I should have opened another question, or just given all this information in the original question--I so much appreciate your help and would like to be as little of a burden as possible).
%macro Matching(datatreatment=, datacontrol=, method=, numberofcontrols=, caliper=, ccpercent=,
replacement=, out=);
/* Create copies of the treated units if N > 1 */;
data _Treatment0(drop= i);
set &datatreatment;
do i= 1 to &numberofcontrols;
RandomNumber= ranuni(12345);
output;
end;
run;
/* Randomly sort both datasets */
proc sort data= _Treatment0 out= _Treatment(drop= RandomNumber);
by RandomNumber;
run;
data _Control0;
set &datacontrol;
RandomNumber= ranuni(45678);
run;
proc sort data= _Control0 out= _Control(drop= RandomNumber);
by RandomNumber;
run;
data Matched (keep = cikSelectedControl atControl roacontrol roatreat fyear industry MatchedToTreatcik atTreat);
length atC 8;
length cikC 8;
/* Load Control dataset into the hash object */
if _N_= 1 then do;
declare hash h(dataset: "_Control", ordered: 'no');
declare hiter iter('h');
h.defineKey('cikC');
h.defineData('roac','atC','cikC');
h.defineDone();
call missing(cikC, atC, roac);
end;
/* Open the treatment */
set _Treatment;
%if %upcase(&method) ~= RADIUS %then %do;
retain BestDistance 99;
%end;
/* Iterate over the hash */
rc= iter.first();
if (rc=0) then BestDistance= 99;
do while (rc = 0);
/* Caliper */
%if %upcase(&method) = CALIPER %then %do;
if (atT - &caliper) <= atC <= (atT + &caliper) then do;
ScoreDistance = abs(atT - atC);
if ScoreDistance < BestDistance then do;
BestDistance = ScoreDistance;
cikSelectedControl = cikC;
atControl = atC;
MatchedToTreatcik = cikT;
atTreat = atT;
end;
end;
%end;
/* Calculated caliper */
%if %upcase(&method) = CC %then %do;
ccdist = &ccpercent*atT;
if (atT - ccdist) <= atC <= (atT + ccdist) then do;
ScoreDistance = abs(atT - atC);
if ScoreDistance < BestDistance then do;
BestDistance = ScoreDistance;
cikSelectedControl = cikC;
atControl = atC;
MatchedToTreatcik = cikT;
atTreat = atT;
ROAControl = roaC;
ROATreat=roat;
end;
end;
%end;
/* NN */
%if %upcase(&method) = NN %then %do;
ScoreDistance = abs(atT - atC);
if ScoreDistance < BestDistance then do;
BestDistance = ScoreDistance;
cikSelectedControl = cikC;
atControl = atC;
MatchedToTreatcik = cikT;
atTreat = atT;
end;
%end;
%if %upcase(&method) = NN or %upcase(&method) = CALIPER or %upcase(&method) = CC %then %do;
rc = iter.next();
/* Output the best control and remove it */
if (rc ~= 0) and BestDistance ~=99 then do;
output;
%if %upcase(&replacement) = NO %then %do;
rc1 = h.remove(key: cikSelectedControl);
%end;
end;
%end;
/* Radius */
%if %upcase(&method) = RADIUS %then %do;
if (atT - &caliper) <= atC <= (atT + &caliper) then do;
cikSelectedControl = cikC;
atControl = atC;
MatchedToTreatcik = cikT;
atTreat = atT;
output;
end;
rc = iter.next();
%end;
end;
run;
/*to download datasets from wrds to investigate*/
proc download data=matched; run;
proc download data=_Control; run;
/* Delete temporary tables. Quote for debugging */
proc datasets NOLIST; /*Nolist option should prevent printing of dataset list*/
delete _:(gennum=all);
run;
data &out;
set Matched;
run;
proc datasets NOLIST; /*Nolist option should prevent printing of dataset list*/
delete Matched;
%mend Matching;
%Matching(datatreatment= Ta220092, datacontrol= ca220092, method= cc,
numberofcontrols= 100, caliper=1, ccpercent=.2, replacement= no, out= matchtest4);
One other note is I will be running this match via PC SAS on the WRDS system, which is faster and won't freeze up my computer during processing.
I have improved my understanding of the macro and modified the macro to make it work. It turns out the calculated caliper was basically a nearest neighbor match with a radius restriction. So when I modified the macro to include a calculated radius, then the macro was able to match how I need it to (see above question). Below is the modified macro:
/************************************************
matching.sas adapted from
Paper 185-2007 SAS Global Forum 2007
Local and Global Optimal Propensity Score Matching
Marcelo Coca-Perraillon
Health Care Policy Department, Harvard Medical School, Boston, MA
-------------------------------
Treatment and Control observations must be in separate datasets such that
Control data includes: cikC = subject_cik, atC = total assets
Treatment data includes: cikT, atT = total assets
cik must be numeric
method = NN (nearest neighbor), caliper, radius, CC, or CR -- CC/CR added by MRL
calcpercent = percentage to be applied to ccvariable or rcvariable to create
calculated caliper or calculated radius
caliper value = max for matching
replacement = yes/no whether controls can be matched to more than one case
out = output data set name
example call:
%Matching(datatreatment= T, datacontrol= C, method= CR,
numberofcontrols= 1, caliper=, calcpercent=.20, replacement= no, out= matches);
************************************************/
rsubmit;
%macro Matching(datatreatment=, datacontrol=, method=, numberofcontrols=, caliper=,
calcpercent=, replacement=, out=);
/* Create copies of the treated units if N > 1 */;
data _Treatment0(drop= i);
set &datatreatment;
do i= 1 to &numberofcontrols;
RandomNumber= ranuni(12345);
output;
end;
run;
/* Randomly sort both datasets */
proc sort data= _Treatment0 out= _Treatment(drop= RandomNumber);
by RandomNumber;
run;
data _Control0;
set &datacontrol;
RandomNumber= ranuni(45678);
run;
proc sort data= _Control0 out= _Control(drop= RandomNumber);
by RandomNumber;
run;
data Matched (keep = cikSelectedControl atControl roacontrol roatreat fyear industry MatchedToTreatcik atTreat);
length atC 8;
length cikC 8;
/* Load Control dataset into the hash object */
if _N_= 1 then do;
declare hash h(dataset: "_Control", ordered: 'no');
declare hiter iter('h');
h.defineKey('cikC');
h.defineData('roac','atC','cikC');
h.defineDone();
call missing(cikC, atC, roac);
end;
/* Open the treatment */
set _Treatment;
%if %upcase(&method) ~= RADIUS and %upcase(&method) ~= CR %then %do;
retain BestDistance 99;
%end;
/* Iterate over the hash */
rc= iter.first();
if (rc=0) then BestDistance= 99;
do while (rc = 0);
/* Caliper */
%if %upcase(&method) = CALIPER %then %do;
if (atT - &caliper) <= atC <= (atT + &caliper) then do;
ScoreDistance = abs(atT - atC);
if ScoreDistance < BestDistance then do;
BestDistance = ScoreDistance;
cikSelectedControl = cikC;
atControl = atC;
MatchedToTreatcik = cikT;
atTreat = atT;
end;
end;
%end;
/* Calculated caliper */
%if %upcase(&method) = CC %then %do;
ccdist = &calcpercent*atT;
if (atT - ccdist) <= atC <= (atT + ccdist) then do;
ScoreDistance = abs(atT - atC);
if ScoreDistance < BestDistance then do;
BestDistance = ScoreDistance;
cikSelectedControl = cikC;
atControl = atC;
MatchedToTreatcik = cikT;
atTreat = atT;
ROAControl = roaC;
ROATreat=roat;
end;
end;
%end;
/* NN */
%if %upcase(&method) = NN %then %do;
ScoreDistance = abs(atT - atC);
if ScoreDistance < BestDistance then do;
BestDistance = ScoreDistance;
cikSelectedControl = cikC;
atControl = atC;
MatchedToTreatcik = cikT;
atTreat = atT;
end;
%end;
%if %upcase(&method) = NN or %upcase(&method) = CALIPER or %upcase(&method) = CC %then %do;
rc = iter.next();
/* Output the best control and remove it */
if (rc ~= 0) and BestDistance ~=99 then do;
output;
%if %upcase(&replacement) = NO %then %do;
rc1 = h.remove(key: cikSelectedControl);
%end;
end;
%end;
/* Radius */
%if %upcase(&method) = RADIUS %then %do;
if (atT - &caliper) <= atC <= (atT + &caliper) then do;
cikSelectedControl = cikC;
atControl = atC;
MatchedToTreatcik = cikT;
atTreat = atT;
ROAControl = roaC;
ROATreat=roat;
output;
end;
rc = iter.next();
%end;
/* Calculated Radius */
%if %upcase(&method) = CR %then %do;
rcdist = &calcpercent*atT;
if (atT - rcdist) <= atC <= (atT + rcdist) then do;
cikSelectedControl = cikC;
atControl = atC;
MatchedToTreatcik = cikT;
atTreat = atT;
ROAControl = roaC;
ROATreat=roat;
output;
end;
rc = iter.next();
%end;
end;
run;
/*for when testing and using wrds
proc download data=matched; run;
proc download data=_Control; run;*/
/* Delete temporary tables. Quote for debugging */
proc datasets NOLIST; /*Nolist option should prevent printing of dataset list*/
delete _:(gennum=all);
run;
data &out;
set Matched;
run;
proc datasets NOLIST; /*Nolist option should prevent printing of dataset list*/
delete Matched;
%mend Matching;