How to deal with subsetting in SAS - merge

I am very new to SAS and I am very eager to learn it. My question is about subsetting. I have two data sets, a and b, each consisting of a single column (named a and b respectively):
a b
3 4
5
6
data a;
set a;
run;
data b;
set b;
run;
data merged;
merge a b;
run;
proc print data=merged(firstobs= a[1] obs=a[1] keep= b);
run;
With this code I get an invalid conversion type error, and I cannot figure out why, because when I write it like this:
proc print data=merged(firstobs= 3 obs= 3 keep= b);
run;
I get the result as 6.
I know it seems very simple, but I am stuck with this error. I would really appreciate your help. Thanks

You want to print the row from the dataset b whose number is the same as the value of a in row 1 of the dataset a.
You can't pass a value into a proc directly like that, but you can generate a macro variable from your dataset and pass it into the proc, e.g.
data _null_;
set a(obs = 1);
/* symputx converts the numeric value without a log note and trims blanks */
call symputx('ROW_NUMBER', a);
run;
proc print data = b(keep = b obs = &ROW_NUMBER firstobs = &ROW_NUMBER);
run;
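If a macro variable is not wanted, the same lookup can probably be done in a single DATA step with the POINT= option of SET. This is only a sketch: the data sets are rebuilt from the values in the question, and the names row and picked are made up for the illustration.

```sas
/* Rebuild the two data sets from the question */
data a;
  a = 3;
run;

data b;
  input b;
  datalines;
4
5
6
;
run;

data picked;
  set a(obs=1);      /* read the row number stored in a */
  row = a;           /* POINT= needs its own temporary numeric variable */
  set b point=row;   /* jump straight to that observation of b */
  output;
  stop;              /* POINT= never reaches end of file, so stop explicitly */
  keep b;
run;

proc print data=picked;
run;
```

With the sample values this should print the single value 6, the same result the question gets with firstobs=3 obs=3.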

Related

Merging datasets only if they exist

So I'm trying to create a macro in sas and I'm attempting to merge multiple data sets in one data step. This macro also creates a variety of different data sets dynamically so I have no idea what data sets are going to be created and which ones aren't going to. I'm trying to merge four data sets in one data step and I'm trying to only merge the ones that exist and don't merge the ones that don't.
I haven't really tried anything yet, but what I'm trying to do can roughly be seen below.
DATA Something;
MERGE Something SomethingElse AnotherThing EXIST(YetAnotherThing)*YetAnotherThing;
RUN;
Obviously that doesn't work, because SAS doesn't work like that. But I'm trying to do something along those lines, where YetAnotherThing is one of the data sets whose existence I am testing, and I want to merge it into Something only if it exists.
If you have a systematic naming convention this can be simplified. For example if you have a common prefix it becomes:
data want;
merge prefix: ;
run;
If they're all in the same library it's also easy. But otherwise you're stuck checking every single name as above.
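For the same-library case, one possible sketch is to ask dictionary.tables which of the candidate names actually exist and merge only those. The candidate names come from the question; the macro variables candidates and existing, the output name Merged, and the assumption that everything lives in WORK are made up for the illustration.

```sas
/* Candidate names, upper case because dictionary.tables stores them that way */
%let candidates = 'SOMETHING', 'SOMETHINGELSE', 'ANOTHERTHING', 'YETANOTHERTHING';

proc sql noprint;
  /* Build a space-separated list of the candidates that exist in WORK */
  select catx('.', libname, memname)
    into :existing separated by ' '
    from dictionary.tables
    where libname = 'WORK'
      and memtype = 'DATA'
      and memname in (&candidates);
quit;

data Merged;
  merge &existing;
run;
```

If none of the candidates exists, the macro variable existing is never created and the MERGE statement fails, so a real macro would want to check for that case first.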
Something along these lines:
data test1;
do i = 1 to 10;
val1 = i;
output;
end;
run;
data test2;
do i = 1 to 10;
val2 = i*2;
output;
end;
run;
data test3;
do i = 1 to 10;
val3 = i*3;
output;
end;
run;
data test5;
do i = 1 to 10;
val5 = i*4;
output;
end;
run;
%macro multi_merge(varlist);
%local j;
data test_merge;
set %scan(&varlist,1);
run;
%put %sysfunc(countw(&varlist));
%if %sysfunc(countw(&varlist)) > 1 %then %do;
%do j = 2 %to %sysfunc(countw(&varlist));
%if %sysfunc(exist(%scan(&varlist,&j))) %then %do;
data test_merge;
merge test_merge %scan(&varlist,&j);
by i;
run;
%end;
%end;
%end;
%mend;
%multi_merge(test1 test2 test3 test4 test5);
Test4 does not exist.
If you don't want to loop, you can do the same thing like this:
%macro if_exists(varlist);
%if %sysfunc(exist(%scan(&varlist,1))) %then %scan(&varlist,1);
%mend;
data test_merge2;
merge test1
%if_exists(test2)
%if_exists(test3)
%if_exists(test4)
%if_exists(test5)
%if_exists(test6);
by i;
run;
I can think of two options:
Loop through the list of input datasets, check if each exists, then merge only those that do.
At the start of your macro, before you conditionally create each of the potential input datasets, create a dummy dataset with the same name containing no rows or columns. Then when you attempt to merge them, they will always exist, without messing up the output with lots of missing values.
Sample code for creating an empty dataset:
data want;
stop;
run;
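One caveat, if the merge uses a BY statement: a data set with no columns at all does not contain the BY variable, and SAS rejects the merge with an error saying the BY variable is not on the input data set. A minimal sketch of a placeholder that avoids this, assuming the BY variable is a numeric variable called i as in the other answers (the data set name is taken from the question):

```sas
/* Empty placeholder: zero rows, but the BY variable is defined */
data YetAnotherThing;
  length i 8;
  stop;
run;
```

The placeholder has to be created before the step that may or may not build the real data set, otherwise it would overwrite it.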

Can I do hash merge by multiple keys in SAS

I would like to hash merge in SAS using two keys;
The variable names for the lookup dataset called link_id 8. and ref_date 8.;
The variable names for the merged dataset called link_id 8. and drug_date 8.;
The code I used is as following:
data elig_bene_pres;
length link_id ref_date 8.;
call missing(link_id,ref_date);
if _N_=1 then do;
declare hash elig_bene(dataset:"bene.elig_bene_uid");
elig_bene.defineKey("link_id","ref_date");
elig_bene.defineDone();
end;
set data;
if elig_bene.find(key:Link_ID,key:drug_dt)=0 then output;
run;
But it seems that it is not found by these two keys. I just want to know whether my method is doable.
Thanks!
There are no obvious problems with the code.
To troubleshoot, try merge-sort: PROC SORT both data sets, then merge them by the two key variables. This will show which values look similar but are not exactly the same.
This sample shows you have the correct approach.
data elig;
input lukey1 lukey2;
datalines;
1 1
1 2
2 4
3 6
3 7
run;
data all;
do key1 = 1 to 10; do key2 = 1 to 10;
array x(5) (1:5);
output;
end; end;
run;
data all_elig;
length lukey1 lukey2 8;
call missing (lukey1,lukey2);
if _n_ = 1 then do;
declare hash elig (dataset:"elig");
elig.defineKey ('lukey1','lukey2');
elig.defineDone ();
end;
set all;
if 0 = elig.find(key:key1, key:key2);
run;
The process as shown is not really a merge because the lookup hash has no explicit data elements. The keys are implicit data when no data is specified.
If you are selecting all data rows, the first item to troubleshoot is bene.elig_bene_uid. Are its keys accidentally a superset of data's?
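The merge-sort check suggested at the top of this answer could look roughly like this, using the elig and all sample data sets from above. The names elig_s, all_s, key_check, in_elig, in_all and source are made up for the sketch.

```sas
/* Sort both sides by the two key variables */
proc sort data=elig out=elig_s nodupkey;
  by lukey1 lukey2;
run;

proc sort data=all out=all_s;
  by key1 key2;
run;

data key_check;
  /* KEEP= is applied before RENAME=, so it uses the old names */
  merge elig_s(in=in_elig)
        all_s(in=in_all keep=key1 key2 rename=(key1=lukey1 key2=lukey2));
  by lukey1 lukey2;
  length source $9;
  if in_elig and in_all then source = 'both';
  else if in_elig then source = 'elig only';
  else source = 'all only';
run;

/* Count how many key pairs fall into each group */
proc freq data=key_check;
  tables source;
run;
```

Key pairs that land in the 'elig only' or 'all only' groups are the ones worth looking at for values that appear equal but are not.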

recode and add prefix to sas variables

Let's say I have a bunch of variables named the same way and I'd like to recode them and add a prefix to each (the variables are all numeric).
In Stata I would do something like (let's say the variables start with eq)
foreach var of varlist eq* {
recode `var' (1/4=1) (else=0), pre(r_)
}
How can I do this in SAS? I'd like to use the %DO macros, but I'm not familiar with them (I want to avoid SQL). I'd appreciate if you could include comments explaining each step!
SAS syntax for this would be easier if your variables are named using numeric suffix. That is, if you had ten variables with names of eq1, eq2, .... , eq10, then you could just use variable lists to define both sets of variables.
There are a number of ways to translate your recode logic. If we assume you have clean variables then we can just use a boolean expression to generate a 0/1 result. So if 4 and 5 map to 1 and the rest map to 0 you could use x in (4,5) or x > 3 as the boolean expression.
data want;
set have;
array old eq1-eq10 ;
array new r_eq1-r_eq10 ;
do i=1 to dim(old);
new(i) = old(i) in (4,5);
end;
run;
If you have missing values or other complications you might want to use IF/THEN logic or a SELECT statement or you could define a format you could use to convert the values.
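As a sketch of the IF/THEN variant, here is the same array loop with the rule from the question (1 through 4 become 1, everything else 0). Keeping missing values missing is an assumption made for the illustration, not something the question asks for.

```sas
data want;
  set have;
  array old eq1-eq10;
  array new r_eq1-r_eq10;
  do i = 1 to dim(old);
    /* assumption: a missing input stays missing instead of becoming 0 */
    if missing(old(i)) then new(i) = .;
    /* the comparison evaluates to 1 when true and 0 when false */
    else new(i) = (1 <= old(i) <= 4);
  end;
  drop i;
run;
```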
If your list of names is more random then you might need to use some code generation, such as macro code, to generate the new variable names.
Here is one method that uses the eq: variable list syntax in SAS, which is similar to the syntax of your variable selection above. Use PROC TRANSPOSE on an empty (obs=0) version of your source dataset to get a dataset with the variable names that match your name pattern.
proc transpose data=have(obs=0) out=names;
var eq: ;
run;
Then generate two macro variables with the list of old and new names.
proc sql noprint ;
select _name_
, cats('r_',_name_)
into :old_list separated by ' '
, :new_list separated by ' '
from names
;
quit;
You can then use the two macro variables in your ARRAY statements.
array old &old_list ;
array new &new_list ;
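Put together, the DATA step would then look something like this sketch, which assumes the old_list and new_list macro variables were created by the PROC SQL step above and reuses the boolean expression from earlier in this answer:

```sas
data want;
  set have;
  array old &old_list;
  array new &new_list;
  do i = 1 to dim(old);
    new(i) = old(i) in (4,5);   /* 1 when the value is 4 or 5, else 0 */
  end;
  drop i;
run;
```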
You can do this with rename and a dash indicating which variables you want to rename. Note the following only renames the col variables, and not the other one:
data have;
col1=1;
col2=2;
col3=3;
col5=5;
other=99;
col12=12;
run;
%macro recoder(dsn = , varname = , prefix = );
/*select all variables that include the string "varname"*/
/*(you can change this if you want to be more specific on the conditions that need to be met to be renamed)*/
proc sql noprint;
select distinct name into: varnames
separated by " "
from dictionary.columns where memname = upcase("&dsn.") and index(name, "&varname.") > 0;
quit;
data want;
set have;
/*loop through that list of variables to recode*/
%do i = 1 %to %sysfunc(countw(&varnames.));
%let this_varname = %scan(&varnames., &i.);
/*create a new variable with desired prefix based on value of old variable*/
if &this_varname. in (1 2 3) then &prefix.&this_varname. = 0;
else if &this_varname. in (4 5) then &prefix.&this_varname. = 1;
%end;
run;
%mend recoder;
%recoder(dsn = have, varname = col, prefix = r_);
PROC TRANSPOSE will give you good flexibility with regards to the way your variables are named.
proc transpose data=have(obs=0) out=vars;
var col1-numeric-col12;
copy col1;
run;
proc transpose data=vars out=revars(drop=_:) prefix=RE_;
id _name_;
run;
data recode;
set have;
if 0 then set revars;
array c[*] col1-numeric-col12;
array r[*] re_:;
call missing(of r[*]);
do _n_ = 1 to dim(c);
if c[_n_] in(1 2 3) then r[_n_] = 0;
else if c[_n_] in(4 5) then r[_n_] = 1;
else r[_n_] = c[_n_];
end;
run;
proc print;
run;
It would be nearly trivial to write a macro to parse almost that exact syntax.
I wouldn't necessarily use this - I like both the transpose and the array methods better, both are more 'SASsy' (think 'pythonic' but for SAS) - but this is more or less exactly what you're doing above.
First set up a dataset:
data class;
set sashelp.class;
age_ly = age-1;
age_ny = age+1;
run;
Then the macro:
%macro do_count(data=, out=, prefix=, condition=, recode=, else=, var_start=);
%local dsid varcount varname rc; *declare local for safety;
%let dsid = %sysfunc(open(&data.,i)); *open the dataset;
%let varcount = %sysfunc(attrn(&dsid,nvars)); *get the count of variables to access;
data &out.; *now start the main data step;
set &data.; *set the original data set;
%do i = 1 %to &varcount; *iterate over the variables;
%let varname= %sysfunc(varname(&dsid.,&i.)); *determine the variable name;
%if %upcase(%substr(&varname.,1,%length(&var_start.))) = %upcase(&var_start.) %then %do; *if it matches your pattern then recode it;
&prefix.&varname. = ifn(&varname. &condition., &recode., &else.); *this uses IFN - only recodes numerics. More complicated code would work if this could be character.;
%end;
%end;
%let rc = %sysfunc(close(&dsid)); *clean up after yourself;
run;
%mend do_count;
%do_count(data=class, out=class_r, var_start=age, condition= > 14, recode=1, else=0, prefix=p_);
The expression (1/4=1) means values {1,2,3,4} should be recoded into 1.
Perhaps you do not need to make new variables at all? If you have variables with values 1,2,3,4,5 and you want to treat them as if they have only two groups you could do it with a format.
First define your grouping using a format.
proc format ;
value newgrp 1-4='Group 1' 5='Group 2' ;
run;
Then you can just use a FORMAT statement in your analysis step to have SAS treat your five level variable as if it had only two levels.
proc freq ;
tables eq: ;
format eq: NEWGRP. ;
run;

SAS Macro Proc Logistic put P-value in a dataset

I've googled lots papers on the subject but don't seem to find what I want. I'm a beginner at SAS Macro, hoping to get some help here.
Here is what I want:
I have a dataset with 1200 variables. I want a macro that runs 1199 of those variables as the OUTCOME and stores the p-values of the logistic regressions in a dataset. Also, the independent variable "gender" is character, and so are the outcome variables. But I don't know how to put the class statement in the macro. Here is an example of how I run it as a single procedure.
proc logistic data=Baseline_gender ;
class gender(ref="Male") / param=ref;
model N284(event='1')=gender ;
ods output ParameterEstimates=ok;
run;
My idea was to create ODS output and delete the unnecessary variables other than the P-value and merge them into one dataset according to the OUTCOME variable names in the model: e.g.
Variable P-value
A1 0.005
A2 0.018
.. ....
I tried to play with some macro code but I just can't get it to work.
I really need help on this, Thank you very much.
SRSwift might be onto something (don't know enough about his method to tell), but here's a way to do it using a macro.
First, count the number of variables in your dataset. Do this by selecting your table from the dictionary.columns table. This puts the number of variables into &sqlobs. Now read the variable names from the dictionary table into macro variables var1-var&sqlobs.
%macro logitall;
proc sql;
create table count as
select name from dictionary.columns
where upcase(libname) = 'WORK'
and upcase(memname) = 'BASELINE_GENDER'
and upcase(name) ne 'GENDER'
;
select name into :var1 - :var&sqlobs
from dictionary.columns
where upcase(libname) = 'WORK'
and upcase(memname) = 'BASELINE_GENDER'
and upcase(name) ne 'GENDER'
;
quit;
/* Then run proc logistic for each dependent variable, each time outputting a dataset named after the dependent variable. */
%do I = 1 %to &sqlobs;
proc logistic data=Baseline_gender ;
class gender(ref="Male") / param=ref;
model &&var&I.(event='1')=gender ;
ods output ParameterEstimates=&&var&I.;
run;
%end;
/* Now put all the output datasets together, creating a new variable with the dataset name using indsname= in the set statement. */
data allvars;
format indsname dsname varname $25.;
set
%do I = 1 %to &sqlobs;
&&var&I.
%end;
indsname=dsname;
varname=dsname;
keep varname ProbChiSq;
where variable ne 'Intercept';
run;
%mend logitall;
%logitall;
Here is a macro-free approach. It restructures the data in advance and uses SAS's BY-group processing. The data is stored in a deep format where all the outcome variable values are stored in one new variable.
Create some sample data:
data have;
input
outcome1
outcome2
outcome3
gender $;
datalines;
1 1 1 Male
0 1 1 Male
1 0 1 Female
0 1 0 Male
1 1 0 Female
0 0 0 Female
;
run;
Next transpose the data into a deep format using an array:
data trans;
set have;
/* Create an array of all the outcome variables */
array o{*} outcome:;
/* Loop over the outcome variables */
do i = 1 to dim(o);
/* Store the variable name for grouping */
_NAME_ = vname(o[i]);
/* Store the outcome value in the new variable */
outcome = o[i];
output;
end;
keep _NAME_ outcome gender;
run;
proc sort data = trans;
by _NAME_;
run;
Reusing your logistic procedure but with an additional by statement:
proc logistic data = trans;
/* Use the grouping variable to select multiple analyses */
by _NAME_;
class gender(ref = "Male");
/* Use the new variable for the dependent variable */
model outcome = gender / noint;
ods output ParameterEstimates = ok;
run;
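To get from the ODS output to the two-column layout asked for in the question, a small follow-up step along these lines should do. It assumes the ParameterEstimates table saved as ok contains the Variable and ProbChiSq columns (the same ones used in the macro answer above); the data set name pvalues is made up.

```sas
data pvalues;
  set ok;
  /* harmless here because of NOINT, but needed if the intercept is kept */
  where Variable ne 'Intercept';
  keep _NAME_ ProbChiSq;
run;

proc print data=pvalues noobs;
run;
```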
Here is another way to do it using a macro. First define all the variables to be used as outcomes in a global macro variable, and then write the macro.
%let var = var1 var2 var3 ..... var1199;
%macro log_regression;
%do i=1 %to %sysfunc(countw(&var.));
%let outcome_var = %scan(&var, &i);
%put &outcome_var.;
proc logistic data = baseline_gender desc;
class gender (ref = "Male") / param = ref;
model &outcome_var. = gender;
ods output ParameterEstimates = ParEst_&outcome_var.;
run;
%if %sysfunc(exist(univar_result)) %then %do;
data univar_result;
set univar_result ParEst_&outcome_var.;
run;
%end;
%else %do;
data univar_result;
set ParEst_&outcome_var.;
run;
%end;
%end;
%mend;

Get out the value of a variable in each observation to a macro variable

I have a table called term_table containing the below columns
comp, term_type, term, score, rank
I go through every observation and at each obs, I want to store the value of variable rank to a macro variable called curr_r. The code I created below does not work
Data Work.term_table;
input Comp $
Term_type $
Term $
Score
Rank
;
datalines;
comp1 term_type1 A 1 1
comp2 term_type2 A 2 10
comp3 term_type3 A 3 20
comp4 term_type4 B 4 20
comp5 term_type5 B 5 40
comp6 term_type6 B 6 100
;
Run;
%local j;
DATA tmp;
SET term_table;
LENGTH freq 8;
BY &by_var term_type term;
RETAIN freq;
CALL SYMPUT('curr_r', rank);
IF first.term THEN DO;
%do j = 1 %to &curr_r;
do some thing
%end;
END;
RUN;
Could you help me to solve the problem
Thanks a lot
Hung
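The call symput statement does create the macro variable &curr_r with the value of rank, but it is not available until after the data step has run. The %do loop, by contrast, is resolved by the macro processor while the step is still being compiled, so &curr_r does not exist yet at that point. In addition, %local and the iterative %do are only valid inside a macro definition.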
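A small sketch of that timing, using the term_table from the question: the value is written in one step and read only after that step has finished. Using call symputx instead of call symput is a choice made here so that the number is stored without leading blanks.

```sas
data _null_;
  set term_table(obs=1);
  call symputx('curr_r', rank);   /* value is set while the step runs */
run;

/* The step has ended, so the macro variable can be resolved now */
%put NOTE: curr_r = &curr_r;
```

For the sample data the log line should show the rank of the first observation, which is 1.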
However, I don't think you need to create the macro var &curr_r. I don't think a macro is needed at all.
I think the below should work: (Untested)
DATA tmp;
SET term_table;
LENGTH freq 8;
BY &by_var term_type term;
RETAIN freq;
IF first.term THEN DO;
do j = 1 to rank;
<do some thing>
end;
END;
RUN;
If you needed to use the rank from a prior obs, use the LAG function.
Start=Lag(rank);
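As a minimal sketch (the names with_prev and prev_rank are made up), LAG is usually safest when it is called unconditionally on every observation, because it returns the value from the previous time the function ran, not from the previous row as such:

```sas
data with_prev;
  set term_table;
  prev_rank = lag(rank);   /* missing on the first observation */
run;
```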
To store each value of RANK in a macro variable, the below will do that:
Proc Sql noprint;
select count(rank)
into :cnt
from term_table;
%Let cnt=&cnt;
select rank
into :curr_r1 - :curr_r&cnt
from term_table;
quit;
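In more recent SAS releases (9.3 and later, as far as I know) the upper bound of the range can be left open, which removes the need for the separate count query. A sketch under that assumption, with n_ranks as a made-up name:

```sas
proc sql noprint;
  /* open-ended range: one macro variable per row returned */
  select rank
    into :curr_r1 -
    from term_table;
quit;

/* SQLOBS holds the number of rows the last query returned */
%let n_ranks = &sqlobs;
%put NOTE: stored &n_ranks values, the first one is &curr_r1;
```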