SAS Macro Rename Variables - macros

I'm trying to write a macro. My variables are named
X1-X50
But I want to rename the variables to
A1 A2 A3 A4 A5 B1 B2 B3 and so on all the way to E5.
X1 - X5 will be A1- A5
X6 - X10 will be B1 - B5 and so on.
How do I rename the variables with a macro in SAS?

Make a data set with the old name and new name in two columns
data names;
letter=64;
counter=0;
do i=1 to 50;
counter=ifn(counter=5, 1, counter+1);
if mod(i, 5)=1 then
do;
letter+1;
letter_char=byte(letter);
end;
old=catt('X', put(i, 2. -l));
new=catt(letter_char, counter);
output;
end;
run;
Create a macro variable that holds the old and new names in the form old = new for all variables, i.e. X1 = A1 X2 = A2 ... X6 = B1 and so on.
proc sql;
select catx(' = ', old, new) into :rename_list separated by " "
from names;
quit;
Apply the rename list with a RENAME statement in PROC DATASETS. This changes only the variable names stored in the data set header, so it does not need a full pass through the data.
proc datasets lib=work;
modify dataSetName;
rename &rename_list.;
run;quit;
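Since the question asks for a macro, the same old=new list can also be generated with a %DO loop instead of a helper data set. This is only a sketch: the macro name rename_pairs, its parameters, and the toy data set have are made up for illustration. Note that 50 variables in blocks of five run from A1 to J5.

```sas
/* Hypothetical helper: emits the text X1=A1 X2=A2 ... X50=J5 */
%macro rename_pairs(prefix=X, n=50, size=5);
  %local i block pos;
  %do i = 1 %to &n;
    %let block = %eval((&i - 1) / &size);     /* integer division: 0 for X1-X5, 1 for X6-X10, ... */
    %let pos   = %eval(&i - &block * &size);  /* position 1..&size within the block */
    &prefix.&i=%sysfunc(byte(%eval(65 + &block)))&pos
  %end;
%mend rename_pairs;

/* Toy data set with X1-X50 (assumed, for demonstration only) */
data have;
  array x[50] X1-X50;
  do j = 1 to 50;
    x[j] = j;
  end;
  drop j;
run;

proc datasets lib=work nolist;
  modify have;
  rename %rename_pairs();
run;
quit;
```

The macro only generates text, so it can be dropped into any RENAME statement or RENAME= data set option.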

Related

Calling the column values when the column names are date macro variables

In SAS, I have a dataset which has 5 columns and 4 rows. The column names are date macro variables.
I want to subtract the values in one column from another. (Date of column 4 - date of column 3) doesn't work. This subtracts the date itself and not the values in those columns.
How do I call the values of the columns?
Please help.
Example: there are five columns (12/1/2019, 12/1/2020, 12/1/2021, 12/1/2022, 12/1/2023) and four rows (A, B, C, D), with some values stored in them.
To the above table, I want to add columns that hold the difference between the values on consecutive dates for all the rows (A, B, C, D).
Also sim_date=12/1/2020, f_starting=12/1/2019, f_1=12/1/2021, f_2=12/1/2022, f_3=12/1/2023. These dates are all macro variables.
But when I write the code as
data test;
set test;
format g0 g1 g2 g3 percent5.2 ;
g0 = (&sim_date - &f_starting)/&f_starting;
g1 = (&f_1 - &sim_date)/&sim_date ;
g2 = (&f_2 - &f_1)/&f_1 ;
g3 = (&f_3- &f_2)/&f_2 ;
run;
This code subtracts the two dates instead of the values stored in the dates. How do I call the values?
Use the "varname"n syntax so that SAS knows you are referring to the variable instead of the value.
data test;
set test;
format g0 g1 g2 g3 percent5.2 ;
g0 = ("&sim_date"n - "&f_starting"n)/"&f_starting"n;
g1 = ("&f_1"n - "&sim_date"n)/"&sim_date"n ;
g2 = ("&f_2"n - "&f_1"n)/"&f_1"n ;
g3 = ("&f_3"n - "&f_2"n)/"&f_2"n ;
run;
You reference a variable by its name. So if your variables are named sim_date and f_starting then your code might be:
g0 = (sim_date - f_starting)/f_starting;
If your variables are actually using those non-standard names that start with digits or have slashes or other non standard characters in them then you need to use a name literal. That is a quoted string suffixed with the letter n. So if the variables are named 2022/01/01 and 2022/02/01 for the first days of the first two months of 2022 then your code needs to look like:
g0 = ("2022/02/01"n - "2022/01/01"n)/"2022/01/01"n;
So either set the macro variables to name literals:
%let sim_date="2022/02/01"n;
%let f_starting="2022/01/01"n;
and then your current code will work.
g0 = (&sim_date - &f_starting)/&f_starting;
Or leave your macro variables with strings that match the actual variable names and convert them to name literals when you use them in your code:
%let sim_date = 2022/02/01;
%let f_starting = 2022/01/01;
g0 = ("&sim_date"n - "&f_starting"n)/"&f_starting"n;
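To make the difference concrete: without the quotes and the n suffix, &sim_date simply resolves to the text 12/1/2020, which SAS evaluates as arithmetic (12 divided by 1 divided by 2020). A minimal sketch, assuming the session allows non-standard names via VALIDVARNAME=ANY and using made-up values:

```sas
options validvarname=any;   /* allow variable names such as 12/1/2019 */

/* Toy data (assumed values, for illustration only) */
data test;
  input id $ "12/1/2019"n "12/1/2020"n;
  datalines;
A 100 110
B 200 190
;
run;

%let f_starting = 12/1/2019;
%let sim_date   = 12/1/2020;

data growth;
  set test;
  /* name literals refer to the columns, not to the date arithmetic */
  g0 = ("&sim_date"n - "&f_starting"n) / "&f_starting"n;
  format g0 percent8.2;
run;
```

With these values g0 is 0.10 for row A and -0.05 for row B.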

recode and add prefix to sas variables

Let's say I have a bunch of variables named the same way and I'd like to recode them and add a prefix to each (the variables are all numeric).
In Stata I would do something like (let's say the variables start with eq)
foreach var of varlist eq* {
recode `var' (1/4=1) (else=0), pre(r_)
}
How can I do this in SAS? I'd like to use the %DO macros, but I'm not familiar with them (I want to avoid SQL). I'd appreciate if you could include comments explaining each step!
SAS syntax for this would be easier if your variables are named using numeric suffix. That is, if you had ten variables with names of eq1, eq2, .... , eq10, then you could just use variable lists to define both sets of variables.
There are a number of ways to translate your recode logic. If we assume you have clean variables then we can just use a boolean expression to generate a 0/1 result. So if 4 and 5 map to 1 and the rest map to 0 you could use x in (4,5) or x > 3 as the boolean expression.
data want;
set have;
array old eq1-eq10 ;
array new r_eq1-r_eq10 ;
do i=1 to dim(old);
new(i) = old(i) in (4,5);
end;
run;
If you have missing values or other complications you might want to use IF/THEN logic or a SELECT statement or you could define a format you could use to convert the values.
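As a rough sketch of the SELECT variant, here is one way to follow the rule from the question (values 1 to 4 become 1, everything else 0) while keeping missing values missing. Keeping missing as missing is an assumption on my part, and the three-variable input is made up:

```sas
/* Toy input (assumed) */
data have;
  input eq1-eq3;
  datalines;
1 4 .
5 2 3
;
run;

data want;
  set have;
  array old[*] eq1-eq3;
  array new[*] r_eq1-r_eq3;
  do i = 1 to dim(old);
    select;
      when (missing(old[i]))  new[i] = .;   /* keep missing as missing */
      when (1 <= old[i] <= 4) new[i] = 1;   /* 1/4 = 1 */
      otherwise               new[i] = 0;   /* else = 0 */
    end;
  end;
  drop i;
run;
```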
If your list of names is more random then you might need to use some code generation, such as macro code, to generate the new variable names.
Here is one method that uses the eq: variable list syntax in SAS, which is similar to the syntax of your variable selection before. Use PROC TRANSPOSE on an empty (obs=0) version of your source dataset to get a dataset with the variable names that match your name pattern.
proc transpose data=have(obs=0) out=names;
var eq: ;
run;
Then generate two macro variables with the list of old and new names.
proc sql noprint ;
select _name_
, cats('r_',_name_)
into :old_list separated by ' '
, :new_list separated by ' '
from names
;
quit;
You can then use the two macro variables in your ARRAY statements.
array old &old_list ;
array new &new_list ;
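Putting the pieces together, the whole flow might look like the sketch below. The input data set and its variable names (eq_a, eq_b, eq_c) are invented for the example, and the recode rule is the one from the question (1 to 4 become 1, everything else 0).

```sas
/* Toy input: three numeric variables whose names start with eq */
data have;
  input eq_a eq_b eq_c;
  datalines;
1 4 5
3 5 2
;
run;

/* One row per matching variable name, no data read */
proc transpose data=have(obs=0) out=names;
  var eq: ;
run;

/* Build the old and new name lists */
proc sql noprint;
  select _name_, cats('r_', _name_)
    into :old_list separated by ' ',
         :new_list separated by ' '
    from names;
quit;

data want;
  set have;
  array old[*] &old_list;
  array new[*] &new_list;
  do i = 1 to dim(old);
    new[i] = (1 <= old[i] <= 4);   /* 1 for values 1-4, 0 otherwise (including missing) */
  end;
  drop i;
run;
```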
You can also do this with a macro that selects the variables by name and recodes each one into a new, prefixed variable. Note that the following only recodes the col variables, and not the other one:
data have;
col1=1;
col2=2;
col3=3;
col5=5;
other=99;
col12=12;
run;
%macro recoder(dsn = , varname = , prefix = );
%local i this_varname varnames;
/*select all variables in WORK.&dsn whose names include the string "varname"*/
/*(you can change this if you want to be more specific on the conditions that need to be met to be recoded)*/
proc sql noprint;
select distinct name into: varnames
separated by " "
from dictionary.columns where libname = "WORK" and memname = upcase("&dsn.") and index(name, "&varname.") > 0;
quit;
data want;
set &dsn.;
/*loop through that list of variables to recode*/
%do i = 1 %to %sysfunc(countw(&varnames.));
%let this_varname = %scan(&varnames., &i.);
/*create a new variable with desired prefix based on value of old variable*/
if &this_varname. in (1 2 3) then &prefix.&this_varname. = 0;
else if &this_varname. in (4 5) then &prefix.&this_varname. = 1;
%end;
run;
%mend recoder;
%recoder(dsn = have, varname = col, prefix = r_);
PROC TRANSPOSE will give you good flexibility with regards to the way your variables are named.
proc transpose data=have(obs=0) out=vars;
var col1-numeric-col12;
copy col1;
run;
proc transpose data=vars out=revars(drop=_:) prefix=RE_;
id _name_;
run;
data recode;
set have;
if 0 then set revars;
array c[*] col1-numeric-col12;
array r[*] re_:;
call missing(of r[*]);
do _n_ = 1 to dim(c);
if c[_n_] in(1 2 3) then r[_n_] = 0;
else if c[_n_] in(4 5) then r[_n_] = 1;
else r[_n_] = c[_n_];
end;
run;
proc print;
run;
It would be nearly trivial to write a macro to parse almost that exact syntax.
I wouldn't necessarily use this - I like both the transpose and the array methods better, both are more 'SASsy' (think 'pythonic' but for SAS) - but this is more or less exactly what you're doing above.
First set up a dataset:
data class;
set sashelp.class;
age_ly = age-1;
age_ny = age+1;
run;
Then the macro:
%macro do_count(data=, out=, prefix=, condition=, recode=, else=, var_start=);
%local dsid varcount varname rc i; *declare local for safety;
%let dsid = %sysfunc(open(&data.,i)); *open the dataset;
%let varcount = %sysfunc(attrn(&dsid,nvars)); *get the count of variables to access;
data &out.; *now start the main data step;
set &data.; *set the original data set;
%do i = 1 %to &varcount; *iterate over the variables;
%let varname= %sysfunc(varname(&dsid.,&i.)); *determine the variable name;
%if %upcase(%substr(&varname.,1,%length(&var_start.))) = %upcase(&var_start.) %then %do; *if it matches your pattern then recode it;
&prefix.&varname. = ifn(&varname. &condition., &recode., &else.); *this uses IFN - only recodes numerics. More complicated code would work if this could be character.;
%end;
%end;
%let rc = %sysfunc(close(&dsid)); *clean up after yourself;
run;
%mend do_count;
%do_count(data=class, out=class_r, var_start=age, condition= > 14, recode=1, else=0, prefix=p_);
The expression (1/4=1) means values {1,2,3,4} should be recoded into 1.
Perhaps you do not need to make new variables at all? If you have variables with values 1,2,3,4,5 and you want to treat them as if they have only two groups you could do it with a format.
First define your grouping using a format.
proc format ;
value newgrp 1-4='Group 1' 5='Group 2' ;
run;
Then you can just use a FORMAT statement in your analysis step to have SAS treat your five level variable as if it had only two levels.
proc freq ;
tables eq: ;
format eq: NEWGRP. ;
run;

How to deal with subsetting in SAS

I am very new to SAS and I am very eager to learn it. My question is about subsetting. I have two data sets, a and b, consisting of the columns a and b respectively:
a   b
3   4
    5
    6
data a;
set a;
run;
data b;
set b;
run;
data merged;
merge a b;
run;
proc print data=merged(firstobs= a[1] obs=a[1] keep= b);
run;
In this code I get invalid conversion type error and I could not figure out why I am getting this error because when I write like:
proc print data=merged(firstobs= 3 obs= 3 keep= b);
run;
I get the result as 6.
I know it seems very simple but I am stuck with this error. I would really appreciate your help. Thanks
You want to print the row from the dataset b whose number is the same as the value of a in row 1 of the dataset a.
You can't pass a value into a proc directly like that, but you can generate a macro variable from your dataset and pass it into the proc, e.g.
data _null_;
set a(obs = 1);
call symputx('ROW_NUMBER', a); /* symputx converts the number to text without leading blanks or a conversion note */
run;
proc print data = b(keep = b obs = &ROW_NUMBER firstobs = &ROW_NUMBER);
run;
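If you would rather avoid the macro variable, direct access with the POINT= option of the SET statement is another possibility. This is a sketch under the assumption that a holds a valid observation number for b; the data set name picked and the variable row are made up.

```sas
/* Toy versions of the two data sets from the question */
data a;
  a = 3;
run;

data b;
  input b;
  datalines;
4
5
6
;
run;

data picked;
  set a(obs=1);      /* read the row number from the first row of a */
  row = a;           /* POINT= variables are temporary and are not written out */
  set b point=row;   /* jump straight to that observation of b */
  keep b;
run;

proc print data=picked;
run;
```

The first SET statement ends the step once a(obs=1) is exhausted, so no explicit STOP is needed here. With the values above the printed row is b=6.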

SAS Macro Proc Logistic put P-value in a dataset

I've googled lots of papers on the subject but don't seem to find what I want. I'm a beginner at SAS Macro, hoping to get some help here.
Here is what I want:
I have a dataset with 1200 variables. I want a macro to run 1199 of those variables as the OUTCOME, and store the p-values of the logistic regressions in a dataset. Also, the independent variable "gender" is character, and so are the outcome variables. But I don't know how to put the class statement in the macro. Here is an example of how I run it as a single procedure.
proc logistic data=Baseline_gender ;
class gender(ref="Male") / param=ref;
model N284(event='1')=gender ;
ods output ParameterEstimates=ok;
run;
My idea was to create ODS output and delete the unnecessary variables other than the P-value and merge them into one dataset according to the OUTCOME variable names in the model: e.g.
Variable P-value
A1 0.005
A2 0.018
.. ....
I tried to play with some macro code but I just can't get it to work!
I really need help on this, Thank you very much.
SRSwift might be onto something (don't know enough about his method to tell), but here's a way to do it using a macro.
First, count the number of variables in your dataset. Do this by selecting your table from the dictionary.columns table. This puts the number of variables into &sqlobs. Now read the variable names from the dictionary table into macro variables var1-var&sqlobs.
%macro logitall;
proc sql;
create table count as
select name from dictionary.columns
where upcase(libname) = 'WORK'
and upcase(memname) = 'BASELINE_GENDER'
and upcase(name) ne 'GENDER'
;
select name into :var1 - :var&sqlobs
from dictionary.columns
where upcase(libname) = 'WORK'
and upcase(memname) = 'BASELINE_GENDER'
and upcase(name) ne 'GENDER'
;
quit;
Then run proc logistic for each dependent variable, each time outputting a dataset named after the dependent variable.
%do I = 1 %to &sqlobs;
proc logistic data=Baseline_gender ;
class gender(ref="Male") / param=ref;
model &&var&I.(event='1')=gender ;
ods output ParameterEstimates=&&var&I.;
run;
%end;
Now put all the output datasets together, creating a new variable with the dataset name using indsname= in the set statement.
data allvars;
length varname $32;
set
%do I = 1 %to &sqlobs;
&&var&I.
%end;
indsname=dsname;
varname=scan(dsname, 2, '.'); /* dsname holds LIBREF.MEMBER, e.g. WORK.N284 */
keep varname ProbChiSq;
where variable ne 'Intercept';
run;
%mend logitall;
%logitall;
Here is a macro-free approach. It restructures the data in advance and uses SAS's BY-group processing. The data is stored in a deep format where all the outcome variable values are stored in one new variable.
Create some sample data:
data have;
input
outcome1
outcome2
outcome3
gender $;
datalines;
1 1 1 Male
0 1 1 Male
1 0 1 Female
0 1 0 Male
1 1 0 Female
0 0 0 Female
;
run;
Next transpose the data into a deep format using an array:
data trans;
set have;
/* Create an array of all the outcome variables */
array o{*} outcome:;
/* Loop over the outcome variables */
do i = 1 to dim(o);
/* Store the variable name for grouping */
_NAME_ = vname(o[i]);
/* Store the outcome value in the new variable */
outcome = o[i];
output;
end;
keep _NAME_ outcome gender;
run;
proc sort data = trans;
by _NAME_;
run;
Reusing your logistic procedure but with an additional by statement:
proc logistic data = trans;
/* Use the grouping variable to select multiple analyses */
by _NAME_;
class gender(ref = "Male");
/* Use the new variable for the dependent variable */
model outcome = gender / noint;
ods output ParameterEstimates = ok;
run;
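To get from there to the two-column table asked for in the question, the ODS output can be trimmed down. The sketch below assumes the data set ok from the step above, which carries the BY variable _NAME_ alongside the usual ParameterEstimates columns Variable and ProbChiSq; the names pvalues, outcome_var and p_value are made up.

```sas
/* Keep one row per outcome with its p-value */
data pvalues;
  set ok(keep=_NAME_ Variable ProbChiSq);
  where Variable ne 'Intercept';   /* only matters if the model has an intercept */
  rename _NAME_ = outcome_var
         ProbChiSq = p_value;
run;

proc print data=pvalues;
  var outcome_var p_value;
run;
```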
Here is another way to do it using a macro. First define all the variables to be used as outcomes in a global macro variable and then write the macro.
%let var = var1 var2 var3 ..... var1199;
%macro log_regression;
%do i=1 %to %sysfunc(countw(&var., %str( )));
%let outcome_var = %scan(&var, &i);
%put &outcome_var.;
proc logistic data = baseline_gender desc;
class gender (ref = "Male") / param = ref;
model &outcome_var. = gender;
ods output ParameterEstimates = ParEst_&outcome_var.;
run;
%if %sysfunc(exist(univar_result)) %then %do;
data univar_result;
set univar_result ParEst_&outcome_var.;
run;
%end;
%else %do;
data univar_result;
set ParEst_&outcome_var.;
run;
%end;
%end;
%mend;

Get out the value of a variable in each observation to a macro variable

I have a table called term_table containing the below columns
comp, term_type, term, score, rank
I go through every observation and at each obs, I want to store the value of variable rank to a macro variable called curr_r. The code I created below does not work
Data Work.term_table;
input Comp $
Term_type $
Term $
Score
Rank
;
datalines;
comp1 term_type1 A 1 1
comp2 term_type2 A 2 10
comp3 term_type3 A 3 20
comp4 term_type4 B 4 20
comp5 term_type5 B 5 40
comp6 term_type6 B 6 100
;
Run;
%local j;
DATA tmp;
SET term_table;
LENGTH freq 8;
BY &by_var term_type term;
RETAIN freq;
CALL SYMPUT('curr_r', rank);
IF first.term THEN DO;
%do j = 1 %to &curr_r;
do some thing
%end;
END;
RUN;
Could you help me to solve the problem
Thanks a lot
Hung
The call symput statement does create the macro var &curr_r with the value of rank, but it is not available until after the data step.
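A small sketch of that timing, using a cut-down version of the table (only the rank column, with made-up rows): macro statements such as %do are resolved while the step is being compiled, before any row is read, and the value written by CALL SYMPUTX can only be read once the step has finished.

```sas
/* Cut-down toy table */
data term_table;
  input rank;
  datalines;
1
10
100
;
run;

data _null_;
  set term_table;
  call symputx('curr_r', rank);   /* overwritten on every row */
run;

/* Only now is the macro variable usable; it holds the last row's value */
%put curr_r=&curr_r;
```

After the step the log shows curr_r=100, the rank of the last observation.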
However, I don't think you need to create the macro var &curr_r. I don't think a macro is needed at all.
I think the below should work: (Untested)
DATA tmp;
SET term_table;
LENGTH freq 8;
BY &by_var term_type term;
RETAIN freq;
IF first.term THEN DO;
do j = 1 to rank;
<do some thing>
end;
END;
RUN;
If you needed to use the rank from a prior obs, use the LAG function.
Start=Lag(rank);
To store each value of RANK in a macro variable, the below will do that:
Proc Sql noprint;
select count(rank)
into :cnt
from term_table;
%Let cnt=&cnt;
select rank
into :curr_r1 - :curr_r&cnt
from term_table;
quit;