SAS Macro - Combining multiple tables into one, controlled by another table

I've come in late to a project and want to write a macro that normalises some data for export to a SQL Server.
There are two control tables...
- Table 1 (customers) has a list of customer unique identifiers
- Table 2 (hierarchy) has a list of table names
There are then n additional tables, one for each record in (hierarchy), each named in that record's SourceTableName field. They all have the form...
- CustomerURN, Value1, Value2
I want to combine all of these tables into a single table (sample_results), with the form of...
- SourceTableName, CustomerURN, Value1, Value2
The only records that should be copied, however, should be for CustomerURNs that exist in the (customers) table.
I could do this in a hard coded format using proc sql, something like...
proc sql;
insert into
SAMPLE_RESULTS
select
'TABLE1',
data.*
from
Table1 data
INNER JOIN
customers
ON data.CustomerURN = customers.CustomerURN
<repeat for every table>
But every week new records are added to the hierarchy table.
Is there any way to write a loop that picks up the table name from the hierarchy table, then calls the proc sql to copy the data into sample_results?

You could concatenate all the hierarchy tables together, and do a single SQL join
proc sql ;
drop table all_hier_tables ;
quit ;
%MACRO FLAG_APPEND(DSN) ;
/* Create new var with tablename */
data &DSN._b ;
length SourceTableName $32. ;
SourceTableName = "&DSN" ;
set &DSN ;
run ;
/* Append to master */
proc append data=&DSN._b base=all_hier_tables force ;
run ;
%MEND ;
/* Append all hierarchy tables together */
data _null_ ;
set hierarchy ;
code = cats('%FLAG_APPEND(' , SourceTableName , ');') ;
call execute(code); /* run the macro */
run ;
/* Now merge in... */
proc sql;
insert into
SAMPLE_RESULTS
select
data.*
from
all_hier_tables data
INNER JOIN
customers
ON data.CustomerURN = customers.CustomerURN
;
quit;
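A variation on the same idea, sketched here under assumptions rather than as a tested replacement: if all the tables share the same columns, a single DATA step can read them in one pass, and the INDSNAME= option on SET records which table each row came from. The macro variable tbl_list and the temporary variable _src are names made up for this sketch, and the small setup steps exist only so that it runs on its own. Unlike the code above, it creates sample_results instead of inserting into an existing table.

```sas
/* Hypothetical sample data so the sketch is self-contained */
data table1;
  CustomerURN = 1; Value1 = 10; Value2 = 20; output;
  CustomerURN = 2; Value1 = 11; Value2 = 21; output;
run;
data table2;
  CustomerURN = 2; Value1 = 30; Value2 = 40; output;
  CustomerURN = 3; Value1 = 31; Value2 = 41; output;
run;
data hierarchy;
  length SourceTableName $32;
  input SourceTableName $;
  datalines;
table1
table2
;
run;
data customers;
  CustomerURN = 2; output;
run;

/* Build a space-separated list of table names from the control table */
proc sql noprint;
  select SourceTableName into :tbl_list separated by ' '
  from hierarchy;
quit;

/* Read every table in one pass; _src holds LIBREF.MEMBER of the current source */
data all_hier_tables;
  length SourceTableName $32;
  set &tbl_list indsname=_src;
  SourceTableName = scan(_src, 2, '.');  /* keep the member name only */
run;

/* Keep only rows whose CustomerURN exists in the customers table */
proc sql;
  create table sample_results as
  select d.*
  from all_hier_tables d
       inner join customers c
       on d.CustomerURN = c.CustomerURN
  ;
quit;
```

Note that INDSNAME= returns the name in upper case (for example WORK.TABLE1), so SourceTableName will be upper case even if the hierarchy table stores it differently.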

Another way is to create a view, so that it always reflects the latest data in the metadata tables. The CALL EXECUTE routine is used to read the table names from the hierarchy dataset. Here is an example that you should be able to modify to suit your data; the last DATA step is the part relevant to you.
data class1 class2 class3;
set sashelp.class;
run;
data hierarchy;
input table_name $;
cards;
class1
class2
class3
;
run;
data ages;
input age;
cards;
11
13
15
;
run;
data _null_;
set hierarchy end=last;
if _n_=1 then call execute('proc sql; create view sample_results_view as ' );
if not last then call execute('select * from '||trim(table_name)||' where age in (select age from ages) union all ');
if last then call execute('select * from '||trim(table_name)||' where age in (select age from ages); quit;');
run;

Related

Is there a way to pass a list under a macro code?

I have a customer survey data like this:
data feedback;
length customer score comment $50.;
input customer $ score comment & $;
datalines;
A 3 The is no parking
A 5 The food is expensive
B . I like the food
C 5 It tastes good
C . blank
C 3 I like the drink
D 4 The dessert is tasty
D 2 I don't like the service
;
run;
There is a macro code like this:
%macro subset( cust=);
proc print data= feedback;
where customer = "&cust";
run;
%mend;
I am trying to write a program that calls %subset for each customer value in the feedback data. Note that we do not know how many unique values of customer there are in the data set. Also, we can't change the %subset code.
I tried to achieve that by using proc sql to create a unique list of customers to pass into the macro, but I think you cannot pass a list to a macro.
Is there a way to do that? P.S. I am a beginner with macros.
I like to keep things simple. Take a look at the following:
data feedback;
length customer score comment $50.;
input customer $ score comment & $;
datalines;
A 3 The is no parking
A 5 The food is expensive
B . I like the food
C 5 It tastes good
C . blank
C 3 I like the drink
D 4 The dessert is tasty
D 2 I don't like the service
;
run;
%macro subset( cust=);
proc print data= feedback;
where customer = "&cust";
run;
%mend subset;
%macro test;
/* first get the count of distinct customers */
proc sql noprint;
select count(distinct customer) into : cnt
from feedback;quit;
/* do this to remove leading spaces */
%let cnt = &cnt;
/* now get each of the customer names into macro variables */
proc sql noprint;
select distinct customer into: cust1 - :cust&cnt
from feedback;quit;
/* use a loop to call other macro program, notice the use of &&cust&i */
%do i = 1 %to &cnt;
%subset(cust=&&cust&i);
%end;
%mend test;
%test;
Of course, if you want short and sweet you can use the following (just make sure your data is sorted by customer):
data _null_;
set feedback;
by customer;
/* cats() strips the trailing blanks from customer before building the call */
if first.customer then call execute(cats('%subset(cust=', customer, ')'));
run;
First fix the SAS code. To test whether a value is in a list, use the IN operator, not the = operator.
where customer in ('A' 'B')
Then you can pass that list into your macro and use it in your code.
%macro subset(custlist);
proc print data= feedback;
where customer in (&custlist);
run;
%mend;
%subset(custlist='A' 'B')
Notice a few things:
- Use quotes around the values, since the variable is character.
- Use spaces between the values. The IN operator in SAS accepts either spaces or commas (or both) as the delimiter in the list. It is a pain to pass comma-delimited lists in a macro call, since the comma is used to delimit the parameters.
- You can define a macro parameter as positional and still call it by name in the macro call.
If the list is in a dataset you can easily generate the list of values into a macro variable using PROC SQL. Just make sure the resulting list is not too long for a macro variable (maximum of 64K bytes).
proc sql noprint;
select distinct quote(trim(customer))
into :custlist separated by ' '
from my_subset
;
quit;
%subset(&custlist)

Merging SAS datasets by different column names across several columns

I have two data sets that I want to merge by territory number. The first data set has territory information, including the territory number. The second has territory numbers spread across four columns named drug_terr1, drug_terr2, drug_terr3 and drug_Terr4. I need to merge on all four columns, because each holds different territory numbers and I want all of them included in the merge with the data set that has the territory information. I tried a rename, but that didn't work because it only changed the first column. Is there a way to combine all this data and rename it by territory number so I can do the merge?
ultimately would like it to look like this, but need to get the 4 columns from 'terrfile' to become 1 column called territory_nbr so I can merge.
%let output = E:\Horizon\Adhoc\AH\;
%let terrs =\\uslsasas1\E$\Horizon\IMS Processing\Weekly Data\20161230\Demo\;
libname terrs "&terrs.";
%let curr_process_wk = '12-30-2016';
%let curr_quarter =_q1;
**0 Grab pskw;
data pskw_data;
set PSKW.PSKWMaster ;
where week in ('12-16-2016','12-23-2016','12-30-2016','01-06-2017') and CopayType ="FBD" and FNRX=1 and pme_id in (46,42,55,38) and product in ('DUEXIS','VIMOVO','PENNSAID')
and
(COBPrimaryRejectCode1 in ('75','76') or COBPrimaryRejectCode2 in ('75', '76') or COBPrimaryRejectCode3 in ('75' , '76'));
run;
proc sort data=pskw_data;
by imsid;
run;
** 01 Grab tbl HCP;
proc sort data=ims.tblhcp (where = (week = &curr_process_wk.) keep = week imsid first_name last_name address1 address2 city state zip spec)
out = IMS_demo (drop = week);
by IMSID;
run;
** 02 Grab tbl terrs_by_imsid;
data terrfile;
set terrs.wd2_terrs_by_imsid&curr_quarter.;
run;
proc sort data = terrfile;
by imsid;
run;
** 03 Grab tbl roster;
data roster (keep = territorycode repname territoryname teamname);
set ims.tblRoster;
repname = trim(left(FirstName))||" "||trim(left(LastName));
run;
**04 link ;
data combine_dbs;
merge pskw_data (in=in1)
ims.tblhcp (in=in2);
by imsid;
if in1;
run;
data territories; /* can't merge because territory code is not in terrfile, just 4 columns as I mentioned above */
merge terrfile (in=in1)
roster (in=in2);
by territorycode;
if in2;
run;
You need to join the fact table with the lookup table four times. Let's say your territory identifier is called ID in your lookup table and you want to take the field IMS_ID from it. Let's also assume the four fields in your fact table are named ID1-ID4.
proc sql ;
create table want as
select a.*
, b.ims_id as ims_id1
, c.ims_id as ims_id2
, d.ims_id as ims_id3
, e.ims_id as ims_id4
from FACT a
left join LU b on a.id1=b.id
left join LU c on a.id2=c.id
left join LU d on a.id3=d.id
left join LU e on a.id4=e.id
;
quit;
In your example it looks like TERRFILE is your FACT table and ROSTER is your LU table. Your ID variable appears to be named TERRITORYCODE in the lookup table, and from your description the four variables in TERRFILE are DRUG_TERR1-DRUG_TERR4.
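If a single territory column is what you are after, as the question puts it, another route is to stack the four columns into rows first and then do an ordinary merge. This is only a rough sketch: the column names drug_terr1-drug_terr4, imsid and territorycode are taken from the question, while the dataset names terr_long and territories_long and the sample rows are invented for illustration. It assumes the four columns have the same type as territorycode in the roster.

```sas
/* Invented sample rows standing in for the real terrfile and roster */
data terrfile;
  input imsid drug_terr1 drug_terr2 drug_terr3 drug_terr4;
  datalines;
1001 11 12 . .
1002 12 13 14 15
;
run;
data roster;
  input territorycode repname $;
  datalines;
11 Ann
12 Bob
13 Cy
;
run;

/* Write one output row per non-missing territory column */
data terr_long (keep = imsid territorycode);
  set terrfile;
  array terr{4} drug_terr1-drug_terr4;
  do i = 1 to 4;
    if not missing(terr{i}) then do;
      territorycode = terr{i};
      output;
    end;
  end;
run;

proc sort data = terr_long;
  by territorycode;
run;
proc sort data = roster;
  by territorycode;
run;

/* Keep every territory row from terrfile; rows without a roster match keep a blank repname */
data territories_long;
  merge terr_long (in = in1)
        roster (in = in2);
  by territorycode;
  if in1;
run;
```

Swap `if in1;` for `if in2;` if you would rather keep every roster row, as in your original step.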

How to code a flexible numbered table list for DATA MERGE in SAS?

I have a program that should merge any number of consecutively numbered tables. I tried to use macro variables, but to no avail. The error "Missing numeric suffix on a numbered data set list" keeps popping up.
Here is the defective code:
DATA INPUTF;
INPUT DSN $;
CARDS;
forum1
forum2
forum3
;
RUN;
DATA forum1;
INPUT contact $ forum1 $;
CARDS;
Mash HERE
Greg HERE
Bob HERE
;
PROC SORT DATA=forum1;
BY contact;
RUN;
DATA forum2;
INPUT contact $ forum2 $;
CARDS;
Mash HERE
Sid HERE
Bob HERE
;
RUN;
PROC SORT DATA=forum2;
BY contact;
RUN;
DATA forum3;
INPUT contact $ forum3 $;
CARDS;
Mash HERE
Sid HERE
Jim HERE
;
RUN;
PROC SORT DATA=forum3;
BY contact;
RUN;
PROC SQL NOPRINT;
SELECT COUNT(*) INTO :n FROM INPUTF;
QUIT;
%MACRO COMBINE(N);
DATA ALLIN;
MERGE forum1-forum&n.;
BY contact;
RUN;
%MEND COMBINE;
%COMBINE;
PROC PRINT DATA=ALLIN;
The code, however, works fine when I use a %LET statement as follows:
%let n=3;
DATA ALLIN;
MERGE forum1-forum&n.;
BY contact;
RUN;
PROC PRINT DATA=ALLIN;
The problem is I won't know how many forums are there, and I prefer that the number be based on the input file.
Any help is appreciated! Thanks!
Macro variable scope.
You've created a macro variable N in the global symbol table. The macro, however, declares a parameter that is also called N. That parameter is local to the macro and is empty because you didn't pass a value, so &n resolves to nothing inside the macro.
Call your macro with the global N as its argument, or move the PROC SQL into the macro.
%COMBINE(&N);
OR
%MACRO COMBINE;
PROC SQL NOPRINT;
SELECT COUNT(*) INTO :n FROM INPUTF;
QUIT;
%LET n = &n; /* remove the leading blanks that INTO leaves on a numeric value */
DATA ALLIN;
MERGE forum1-forum&n.;
BY contact;
RUN;
%MEND COMBINE;
%COMBINE;
OR
If you only have tables that start with FORUM that you're trying to merge:
DATA ALLIN;
MERGE FORUM: ;
BY contact;
RUN;
So if your dataset InputF has the list of datasets that you want to merge, then put that list into a macro variable. If you always have at least two datasets then no macro logic is required.
proc sql noprint ;
select dsn into :dsnlist separated by ' '
from inputf;
quit;
data allin;
merge &dsnlist ;
by contact;
run;
To handle the case when you have 0 or 1 dataset name in the list you would need to add macro logic. When there is just one you need to use SET instead of MERGE. You could handle that with the IFC() function.
data allin;
%sysfunc(ifc(1=&sqlobs,set,merge)) &dsnlist ;
by contact;
run;

MONYY7. and DATE9. operations

I'm working on a very big data set (more than 100 variables and 11 million observations). In this data set, I have a variable named DTDSI (simulation date) in DATE9. format (for example: 01APR2015, 02MAR2015...). I have a macro program to analyse this data set by comparing the observations in two different months:
%macro analysis (data_input , m , m_1);
.....
%mend;
The two macro variables m and m_1 are the months that I want to compare. Their format is MONYY7. (APR2015, MAR2015...). Keep in mind that I cannot modify data_input (it is my company's data). At the beginning of my macro program, I want to create a new data set with only the observations of the &m and &m_1 months. I can easily do that by creating a new variable from DTDSI (mois_real below) in MONYY7. format, then selecting the observations where mois_real equals &m or &m_1:
Data new;
Set &data_input;
/* PUT with the MONYY7. format gives a string such as APR2015 */
mois_real = put(DTDSI, MONYY7.);
RUN;
PROC SQL;
CREATE TABLE NEW AS
SELECT *
FROM NEW
WHERE mois_real in ("&m" , "&m_1");
....
The problem is that my first DATA step duplicates data_input, which is bad because it takes 30 minutes. So can you tell me how I can make my selection (DTDSI in month m or m_1) right in that first step?
You can use expressions in your WHERE/IF condition, so move the expression from step 1 into the filter of step 2, or vice versa.
Data new;
set &data_input;
WHERE put(DTDSI, MONYY7.) in ("&m" , "&m_1");
run;
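An alternative worth considering, offered as a sketch rather than a benchmarked fix: compare dates instead of formatted strings. INTNX with an increment of 0 moves each date to the first day of its month, and "01&m"d turns a value such as APR2015 into a date literal. The dataset sim_data and its rows are made up for the example; DTDSI, m and m_1 follow the question.

```sas
/* Made-up stand-in for the real input data */
data sim_data;
  input DTDSI :date9. amount;
  format DTDSI date9.;
  datalines;
01APR2015 10
15APR2015 20
02MAR2015 30
20FEB2015 40
;
run;

%macro analysis(data_input, m, m_1);
  /* Filter while reading, so only rows from the two months are written to NEW */
  data new;
    set &data_input;
    where intnx('month', DTDSI, 0) in ("01&m"d, "01&m_1"d);
  run;
%mend analysis;

%analysis(sim_data, APR2015, MAR2015)
```

With these sample rows, NEW should keep the two April observations and the March one and drop the February one.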

Splitting comma delimited cell data

I have a spreadsheet with multiple columns, one of which is an owner_id column. The problem is that this column contains a comma delimited list of owner id's and not just a single one.
I've imported this spreadsheet into my sql database (2008) and have completed other importing tasks and now have a parcel_id column as a result of this process.
I need to create an entry in my parcelOwners table for each parcelID/ownerID pair, but I'm not sure how to go about this with the owner id's being in the comma delimited list.
My tables look like this:
ImportData
=================
owner_id varchar,
parcelID int
sample row (owner_id = '13782, 21431', parcelID = 319)
ParcelOwners
=================
ownerID int,
parcelID int
row from ImportData table should look like:
ownerID = 13782, parcelID = 319
ownerID = 21431, parcelID = 319
Is this a common situation for anybody and if so, how do you go about getting around this?
The function below will split your comma-separated column into a table. You will then need to iterate through that table and insert one row into your ParcelOwners table for each value from your single column. To get this to work you will need an outer loop to iterate through the ImportData table and an inner loop to iterate through @temptable for each row. Also, don't forget that a row with no commas in the owner_id column holds a single ID, so there is nothing to split.
CREATE FUNCTION dbo.Split(@String varchar(8000), @Delimiter char(1))
returns @temptable TABLE (items varchar(8000))
as
begin
declare @idx int
declare @slice varchar(8000)
select @idx = 1
if len(@String)<1 or @String is null return
while @idx!= 0
begin
set @idx = charindex(@Delimiter,@String)
if @idx!=0
set @slice = left(@String,@idx - 1)
else
set @slice = @String
if(len(@slice)>0)
insert into @temptable(Items) values(@slice)
set @String = right(@String,len(@String) - @idx)
if len(@String) = 0 break
end
return
end
You can do this easily leveraging SQL Server's XML functions:
WITH xmlData (xml_owner_id,parcelID) AS (
/* make into xml */
SELECT cast('<x>'+replace(owner_id,',','</x><x>')+'</x>' as XML) AS xml_owner_id, parcelID
FROM ImportData
)
SELECT x.value('.','int') AS owner_id, parcelID /* split up */
FROM xmlData
CROSS APPLY xmlData.xml_owner_id.nodes('//x') AS func(x)
(In response to @senloe's question about how to use the function supplied by @RandomBen)
This answer to a previous question shows how to use OUTER APPLY to apply a function to every row in a table. In your case, and assuming you have already run @RandomBen's code to create the dbo.Split function, the syntax would look something like this:
INSERT INTO ParcelOwners (ownerId, parcelID)
SELECT CONVERT(int, Results.items), ImportData.parcelID
FROM ImportData
OUTER APPLY dbo.Split(ImportData.owner_id, ',') AS Results
(I don't have access to SQL Server right now, so I haven't tried it yet. You can run it without the first line, i.e. just from SELECT onwards, to see what output it is going to generate before you actually do the INSERT).