Copy a dataset B for each variable in dataset A - merge

I have the two following datasets:
Dataset A:
ID
A
B
C
Dataset B:
Age
35
49
53
And I want to copy paste B to each ID of A:
ID Age
A 35
A 49
A 53
B 35
B 49
...
For the moment I do this with a %do cicle but is there a more elegant way to do this? With a single PROC SQL or Datastep for example?
Thanks in advance

You can use SQL to perform a Cartesian product to get all combinations.
For example:
/* setup id data */
data have1;
input id $char1.;
datalines;
A
B
C
;
/* setup age data */
data have2;
input age 8.;
datalines;
35
49
53
;
/* perform Cartesian product */
proc sql noprint;
create table
want
as
select
*
from
have1
,have2
;
quit;

There is no need for macro code. You can use the POINT= option on the SET statement to do this in a data step.
data want;
set a;
do p=1 to nobs;
set b point=p nobs=nobs;
output;
end;
run;

Related

Is there a way to pass a list under a macro code?

I have a customer survey data like this:
data feedback;
length customer score comment $50.;
input customer $ score comment & $;
datalines;
A 3 The is no parking
A 5 The food is expensive
B . I like the food
C 5 It tastes good
C . blank
C 3 I like the drink
D 4 The dessert is tasty
D 2 I don't like the service
;
run;
There is a macro code like this:
%macro subset( cust=);
proc print data= feedback;
where customer = "&cust";
run;
%mend;
I am trying to write a program that call the %subset for each customer value in feedback data. Note that we do not know how many unique values of customer there are in the data set. Also, we cant change the %subset code.
I tried to achieve that by using proc sql to create a unique list of customers to pass into macro code but I think you cannot pass a list in a macro code.
Is there a way to do that? p.s I am beginner in macro
I like to keep things simple. Take a look at the following:
data feedback;
length customer score comment $50.;
input customer $ score comment & $;
datalines;
A 3 The is no parking
A 5 The food is expensive
B . I like the food
C 5 It tastes good
C . blank
C 3 I like the drink
D 4 The dessert is tasty
D 2 I don't like the service
;
run;
%macro subset( cust=);
proc print data= feedback;
where customer = "&cust";
run;
%mend subset;
%macro test;
/* first get the count of distinct customers */
proc sql noprint;
select count(distinct customer) into : cnt
from feedback;quit;
/* do this to remove leading spaces */
%let cnt = &cnt;
/* now get each of the customer names into macro variables
proc sql noprint;
select distinct customer into: cust1 - :cust&cnt
from feedback;quit;
/* use a loop to call other macro program, notice the use of &&cust&i */
%do i = 1 %to &cnt;
%subset(cust=&&cust&i);
%end;
%mend test;
%test;
of course if you want short and sweet you can use (just make sure your data is sorted by customer):
data _null_;
set feedback;
by customer;
if(first.customer)then call execute('%subset(cust='||customer||')');
run;
First fix the SAS code. To test if a value is in a list using the IN operator, not the = operator.
where customer in ('A' 'B')
Then you can pass that list into your macro and use it in your code.
%macro subset(custlist);
proc print data= feedback;
where customer in (&custlist);
run;
%mend;
%subset(custlist='A' 'B')
Notice a few things:
Use quotes around the values since the variable is character.
Use spaces between the values. The IN operator in SAS accepts either spaces or comma (or both) as the delimiter in the list. It is a pain to pass in comma delimited lists in a macro call since the comma is used to delimit the parameters.
You can defined a macro parameter as positional and still call it by name in the macro call.
If the list is in a dataset you can easily generate the list of values into a macro variable using PROC SQL. Just make sure the resulting list is not too long for a macro variable (maximum of 64K bytes).
proc sql noprint;
select distinct quote(trim(customer))
into :custlist separated by ' '
from my_subset
;
quit;
%subset(&custlist)

Outputting conditionally from merge

I want to update a history file in SAS. I have new observations, which may overlap with existing data lines.
What is needed, is a file, which would have lines from dataset (new_data) where they exist and in case the lines do not exist, then from old set (old_data). What I've come up is a clunky merge operation, which is conditional on the order of the datasets. (==Works only if New_data is after Old_data. :?)
data new_data;
input key value;
datalines;
1 10
1 11
2 20
2 21
;
run;
data old_data;
input key value;
datalines;
2 50
2 51
3 30
3 31
;
run;
So I'd like to have the following:
key value
1 10
1 11
2 20
2 21
3 30
3 31
However the following does not work. It produces the output below it.
data updated_history;
merge New_data(in=a) old_data(in=b) ;
by key;
if a or (b and not a );
run;
....
2 50
2 51
...
But for some reason this does:
data updated_history;
merge old_data(in=b) New_data(in=a);
by key;
if a or (b and not a );
run;
Question: Is there an intelligent way to manage from which dataset the values are select from. Something like: if a then value_from_dataset a;
The order in which you list the data sets in the MERGE is the order the data is taken. So when the order is old, new values from old are read and then values from new overwrite the values from old. This is why your second version works and the first does not.
Since you have multiple observations per key value you probably do NOT want to use MERGE to combine these files. You could do it using SET by reading the data twice using two DOW loops. In that case it won't matter the order of the dataset in the SET statement since the records are interleaved instead of being joined. This first loop will calculate which of the two input datasets have any observations for this KEY value.
data want ;
anyold=0;
anynew=0;
do until (last.key);
set old_data (in=inold) new_data(in=innew);
by key ;
if inold then anyold=1;
if innew then anynew=1;
end;
do until (last.key);
set old_data (in=inold) new_data(in=innew);
by key ;
if not (anyold and anynew and inold) then output;
end;
drop anyold anynew;
run;
This type of combination is probably easier to code using SQL.
proc sql ;
create table want as
select key,value from new_data
union
select key,value from old_data
where key in (select key from old_data except select key from new_data)
order by 1
;
quit;

How to Merge the given two SAS datasets

I have two SAS datasets of following type
Dataset1, D1 is as follows
ID Date Amount
x1 10/12/2015 100
x2 200
x2 150
x3 10/10/2014 90
x4 60
Dataset, D2 is of the form
ID Date
x2 10/12/2016
x4 1/1/2016
Dataset D1 can have duplicate values of ID. Dataset D2 has only unique values of ID. Further, D2 consists of only IDs in Dataset D1 which has missing values of variable date (x2 and x4 have date missing in D1). I want to merge D1 with D2 such that the output is as follows
ID Date Amount
x1 10/12/2015 100
x2 10/12/2016 200
x2 10/12/2016 150
x3 10/10/2014 90
x4 1/1/2016 60
Is this doable without using proc sql in SAS. Can I use merge?
I tried using the following but to no use (and it should not work either because D1 has duplicate IDs)
data x;
merge D1 (in=in1) D2(in=in2);
by ID;
if in1;
run;
DATA step merge of two datasets works fine when only one of the datasets has a unique ID. The problem with your merge is that the DATE variables from each dataset will collide:
231 options msglevel=i;
232 data x;
233 merge D1 D2;
234 by ID;
235 put (ID Date Amount)(=);
236 run;
INFO: The variable Date on data set WORK.D1 will be overwritten by data set WORK.D2.
ID=x1 Date=10/12/2015 Amount=100
ID=x2 Date=10/12/2016 Amount=200
ID=x2 Date=. Amount=150
ID=x3 Date=10/10/2014 Amount=90
ID=x4 Date=01/01/2016 Amount=60
The MSGLEVEL=i option generates the INFO: line in the log which alerts you to the collision. In this case you almost get the results you want, despite the collision. The problem is the third record, where DATE is missing. This is a side-effect of having a collision in a one-to-many merge.
I would suggest you avoid the collision by renaming the DATE variable in each dataset. You can then compute a new DATE variable by using the COALESCE() function, which returns the first value that is not missing:
237 data want;
238 merge d1 (keep=ID Date Amount rename=(Date=Date1))
239 d2 (keep=ID Date rename=(Date=Date2))
240 ;
241 by ID;
242 Date=coalesce(Date1,Date2);
243 put (ID Date1 Date2 Date Amount)(=);
244 format Date mmddyy10.;
245 run;
ID=x1 Date1=10/12/2015 Date2=. Date=10/12/2015 Amount=100
ID=x2 Date1=. Date2=10/12/2016 Date=10/12/2016 Amount=200
ID=x2 Date1=. Date2=10/12/2016 Date=10/12/2016 Amount=150
ID=x3 Date1=10/10/2014 Date2=. Date=10/10/2014 Amount=90
ID=x4 Date1=. Date2=01/01/2016 Date=01/01/2016 Amount=60
Is this doable without using proc sql in SAS. Can I use merge?
Yes, any SQL step can be done in a Data step, but may take up more or less code space.
Here is a potential solution:
data DateN DateY;
set D1;
if date=. then output step1;
else output step2;
run;
data merge;
DateN(keep=ID Amount) D2;
by id;
run;
data x;
set merge DateY;
run;
proc sort data=x;
by ID;
run;
This assumes that the missing values from D1 have unique ID.

References to SAS date macros not working due to differing data types

hoping for some help on this one.
Currently, the query uses the below to create references for m1-m6 and d1-d6.
%let m1=1114;
%let d1 ='30NOV2014'd;
%let m2=1214;
%let d2='31DEC2014'd;
%let m3=0115;
%let d3='31JAN2015'd;
%let m4=0215;
%let d4='28FEB2015'd;
%let m5=0315;
%let d5='31MAR2015'd;
%let m6=0415;
%let d6='30APR2015'd;
Based on the rest of the code, the m1-m6 dates must be formatted as mmyy. I have tried to swap the above out with this:
data _datemacro_;
m1 = put(intnx('day','01NOV2014'd,0),mmyyn4.);
call symput('m1',"'"||put(m1,9.)) ;
d1 = put(intnx('day','30NOV2014'd,0),date9.);
call symput('d1',"'"||put(d1,9.)) ;
m2 = put(intnx('day',&d1,+1),mmyyn4.);
call symput('m2',"'"||put(m2,9.)) ;
d2 = put(intnx('month',&d1,+1,'e'),date9.);
call symput('d2',"'"||put(d2,9.)||"'"||"d") ;
...etc through m6 and d6
run;
Below is the rest of the code that yields a garden variety of errors, including
ERROR 22-322: Syntax error, expecting one of the following: a name, a
quoted string, (, /, ;, DATA, LAST, NULL.
ERROR 200-322: The symbol is not recognized and will be ignored.
proc sql;
create table perf as
select a. field, a. field, a. field, a. reportingdate,
a. field, a. field,
e. field,
f. field
from table a, table2 e, table3 f
where a. reportingdate between &d1. and &d6.
and (a. field=1 or a. field=1)
and a. field = e. field and a. field = f. field;
quit;
/*Creates performance file by month*/
%macro month (mon,date);
data m&mon. (rename=(field=active&mon. field=co&mon. field=es&mon. field=sr&mon.));
set perf;
where datepart(reportingdate)=&date.;
run;
proc sort data=m&mon.; by field descending co&mon.; run;
proc sort data=m&mon. nodupkey out=m2&mon.; by field; run;
%mend month;
%month (&m1.,&d1.);
%month (&m2.,&d2.);
%month (&m3.,&d3.);
%month (&m4.,&d4.);
%month (&m5.,&d5.);
%month (&m6.,&d6.);
I am able to get it to run accurately until the last 6 lines, where it comes up with 78 errors just from running those 6 lines.
Any suggestions on how to write the macro to keep the correct data type while accurately defining the month start and end dates? When I try and change the start date and end date of each month to the same format, something within the rest of the code causes an error stating that it cannot work with two variables of different formats, even when they are clearly the same format as defined in the code.
Please let me know if there is anything I can clarify, as this was a little harder to explain than I intended.
Thank you for your help.
So you're basically trying to build a macro that will run a report month-by-month. I think using a macro is a good idea, but your structure could benefit from a re-org.
The first thing to fix is the hardcoded dates. Hardcoding is bad 99% of the time. Why not use a loop instead?
Initialise the start and end dates at the top of your program. In future they're easy to find and change if they're at the top, and you won't need to search through your code trying to figure out what else needs to change:
* PICK ANY DATES IN THE MONTHS YOU WANT TO START AND END. IE. DOESNT MATTER IF YOU CHOOSE THE FIRST OR THE 20TH. IT WILL RUN FOR THAT MONTH;
%let rpt_start = %sysfunc(mdy(11,1,2014));
%let rpt_end = %sysfunc(mdy( 4,1,2015));
Go and get the data between the start and end dates:
proc sql;
create table perf as
select a. field, a. field, a. field, a. reportingdate,
a. field, a. field,
e. field,
f. field
from table a, table2 e, table3 f
where a. reportingdate between &rpt_start and &rpt_end
and (a. field=1 or a. field=1)
and a. field = e. field and a. field = f. field;
quit;
Now loop over each month inbetween the start and end dates. Create the desired datasets as we go.
%macro create_monthly_datasets;
%local tmp_dt tmp_end rpt_dt;
%let tmp_end = %sysfunc(intnx(month,&rpt_end,0,end)); *CALC END-DATE DESIRED;
%let tmp_dt = %sysfunc(intnx(month,&rpt_start,0,beginning)); *INITIALISE LOOP;
%do %while (&tmp_dt le &tmp_end);
* CALC ACUTAL DATE WANTED AND STORE IT IN RPT_DT;
* CALC THE MMYY VAL YOU NEED;
* PRINT OUT BOTH VALUES TO MAKE SURE THEYRE CORRECT;
%let rpt_dt = %sysfunc(intnx(month,&tmp_dt,0,end));
%let mmyy = %sysfunc(month(&rpt_dt),z2.)%substr(%sysfunc(year(&rpt_dt)),3,2);
%put %sysfunc(sum(&rpt_dt),date9.) &mmyy;
* DO THE WORK;
data m&mmyy (rename=(field=active&mmyy field=co&mmyy field=es&mmyy field=sr&mmyy));
set perf;
where datepart(reportingdate)=&rpt_dt;
run;
proc sort data=m&mmyy; by field descending co&mmyy; run;
proc sort data=m&mmyy nodupkey out=m2&mmyy; by field; run;
%let tmp_dt = %sysfunc(intnx(month,&tmp_dt,1,beginning)); * ITERATE LOOP;
%end;
%mend;
%create_monthly_datasets;
Try the following for creating the macro variables:
data _null_;
START_MTH = '01nov2014'd;
do i = 1 to 6;
T_DATE = intnx('month',START_MTH,i)-1; /*Shift forwards i months then back 1 day*/
call symput(cats('m',i),put(T_DATE,mmyyn4.));
call symput(cats('d',i),cats("'",put(T_DATE,date9.),"'d"));
end;
run;

SAS Macro - Combining multiple tables into one, controlled by another table

I've come in late to a project and want to write a macro that normalises some data for export to a SQL Server.
There are two control tables...
- Table 1 (customers) has a list of customer unique identifiers
- Table 2 (hierarchy) has a list of table names
There are then n additional tables. One for each record in (hierarchy) (named in the SourceTableName field). With the form of...
- CustomerURN, Value1, Value2
I want to combine all of these tables into a single table (sample_results), with the form of...
- SourceTableName, CustomerURN, Value1, Value2
The only records that should be copied, however, should be for CustomerURNs that exist in the (customers) table.
I could do this in a hard coded format using proc sql, something like...
proc sql;
insert into
SAMPLE_RESULTS
select
'TABLE1',
data.*
from
Table1 data
INNER JOIN
customers
ON data.CustomerURN = customers.CustomerURN
<repeat for every table>
But every week new records are added to the hierarchy table.
Is there any way to write a loop that picks up the table name from the hierarchy table, then calls the proc sql to copy the data into sample_results?
You could concatenate all the hierarchy tables together, and do a single SQL join
proc sql ;
drop table all_hier_tables ;
quit ;
%MACRO FLAG_APPEND(DSN) ;
/* Create new var with tablename */
data &DSN._b ;
length SourceTableName $32. ;
SourceTableName = "&DSN" ;
set &DSN ;
run ;
/* Append to master */
proc append data=&DSN._b base=all_hier_tables force ;
run ;
%MEND ;
/* Append all hierarchy tables together */
data _null_ ;
set hierarchy ;
code = cats('%FLAG_APPEND(' , SourceTableName , ');') ;
call execute(code); /* run the macro */
run ;
/* Now merge in... */
proc sql;
insert into
SAMPLE_RESULTS
select
data.*
from
all_hier_tables data
INNER JOIN
customers
ON data.CustomerURN = customers.CustomerURN
quit;
Another way is to create a view so that it will always reflect the latest data in the metadata tables. The call execute function is used to read in the table names from the hierarchy dataset. Here is an example which you should be able to modify to suit your data, the last bit of code is the relevant one to you.
data class1 class2 class3;
set sashelp.class;
run;
data hierarchy;
input table_name $;
cards;
class1
class2
class3
;
run;
data ages;
input age;
cards;
11
13
15
;
run;
data _null_;
set hierarchy end=last;
if _n_=1 then call execute('proc sql; create view sample_results_view as ' );
if not last then call execute('select * from '||trim(table_name)||' where age in (select age from ages) union all ');
if last then call execute('select * from '||trim(table_name)||' where age in (select age from ages); quit;');
run;