SAS: merging on first match only

I'm looking to do a one-to-many merge in SAS, where I would like to only keep the first match.
Example data below:
data one;
input id $ fruit $;
datalines;
a apple
b apple
c banana
d coconut
;
data two;
input id $ color $;
datalines;
a amber
b brown
c cream
c cocoa
c carmel
;
data both;
merge one two;
by id;
run;
proc print data=both;
run;
As you can see, this is a one-to-many merge.
Is there a way to make it keep only the first match? i.e. the output would be as below:
a apple amber
b apple brown
c banana cream
d coconut .
The background here is that the first dataset contains properties, and the second contains leases, and I am looking to find only the first lease on a property. I've only just started learning SAS, so it might be that there is a function better suited to this?
Many thanks!
Mike

Check this out:-
/*Creating Datasets*/
data one;
input id $ fruit $;
datalines;
a apple
b apple
c banana
d coconut
;
data two;
input id $ color $;
datalines;
a amber
b brown
c cream
c cocoa
c carmel
;
/* Add a subsetting IF on first.id to your code: only the first row of each BY group is kept */
data both;
merge one two;
by id;
if first.id =1;
run;
proc print data=both;
run;
Hope this helps:-)
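A variation on the same idea, as a minimal sketch: reduce the second dataset to its first row per id before merging, so the merge itself is no longer one-to-many. The names two_first and in_one are made up for illustration, and the sketch assumes TWO is already sorted by id, as the example data is. "First" here means first in dataset order; if the leases carry a date, they would presumably need to be sorted by id and date beforehand.

```sas
/* Keep only the first row per id from TWO.
   Assumes TWO is already sorted by id (true for the example data). */
data two_first;
set two;
by id;
if first.id;
run;

/* The match-merge now finds at most one row in TWO_FIRST per id */
data both;
merge one (in=in_one) two_first;
by id;
if in_one;   /* keep only ids that exist in ONE */
run;
```

One difference from the subsetting IF inside the merge: if ONE ever holds several rows for the same id, this version keeps all of them, whereas if first.id =1 in the merge step would keep only the first.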

Related

Copy a dataset B for each variable in dataset A

I have the two following datasets:
Dataset A:
ID
A
B
C
Dataset B:
Age
35
49
53
And I want to copy paste B to each ID of A:
ID Age
A 35
A 49
A 53
B 35
B 49
...
For the moment I do this with a %do loop, but is there a more elegant way to do this? With a single PROC SQL or data step, for example?
Thanks in advance
You can use SQL to perform a Cartesian product to get all combinations.
For example:
/* setup id data */
data have1;
input id $char1.;
datalines;
A
B
C
;
/* setup age data */
data have2;
input age 8.;
datalines;
35
49
53
;
/* perform Cartesian product */
proc sql noprint;
create table
want
as
select
*
from
have1
,have2
;
quit;
There is no need for macro code. You can use the POINT= option on the SET statement to do this in a data step.
data want;
set a;
do p=1 to nobs;
set b point=p nobs=nobs;
output;
end;
run;

Is there a way to pass a list under a macro code?

I have a customer survey data like this:
data feedback;
length customer score comment $50.;
input customer $ score comment & $;
datalines;
A 3 The is no parking
A 5 The food is expensive
B . I like the food
C 5 It tastes good
C . blank
C 3 I like the drink
D 4 The dessert is tasty
D 2 I don't like the service
;
run;
There is a macro code like this:
%macro subset( cust=);
proc print data= feedback;
where customer = "&cust";
run;
%mend;
I am trying to write a program that calls %subset for each customer value in the feedback data. Note that we do not know how many unique values of customer there are in the data set. Also, we can't change the %subset code.
I tried to achieve that by using proc sql to create a unique list of customers to pass into the macro, but I think you cannot pass a list to a macro.
Is there a way to do that? P.S. I am a beginner with macros.
I like to keep things simple. Take a look at the following:
data feedback;
length customer score comment $50.;
input customer $ score comment & $;
datalines;
A 3 The is no parking
A 5 The food is expensive
B . I like the food
C 5 It tastes good
C . blank
C 3 I like the drink
D 4 The dessert is tasty
D 2 I don't like the service
;
run;
%macro subset( cust=);
proc print data= feedback;
where customer = "&cust";
run;
%mend subset;
%macro test;
/* first get the count of distinct customers */
proc sql noprint;
select count(distinct customer) into : cnt
from feedback;quit;
/* do this to remove leading spaces */
%let cnt = &cnt;
/* now get each of the customer names into macro variables */
proc sql noprint;
select distinct customer into: cust1 - :cust&cnt
from feedback;quit;
/* use a loop to call other macro program, notice the use of &&cust&i */
%do i = 1 %to &cnt;
%subset(cust=&&cust&i);
%end;
%mend test;
%test;
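Of course, if you want short and sweet, you can use the following (just make sure your data is sorted by customer):
data _null_;
set feedback;
by customer;
/* cats() strips the trailing blanks from customer before building the macro call */
if first.customer then call execute(cats('%subset(cust=', customer, ')'));
run;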
First fix the SAS code. To test whether a value is in a list, use the IN operator, not the = operator.
where customer in ('A' 'B')
Then you can pass that list into your macro and use it in your code.
%macro subset(custlist);
proc print data= feedback;
where customer in (&custlist);
run;
%mend;
%subset(custlist='A' 'B')
Notice a few things:
Use quotes around the values since the variable is character.
Use spaces between the values. The IN operator in SAS accepts either spaces or comma (or both) as the delimiter in the list. It is a pain to pass in comma delimited lists in a macro call since the comma is used to delimit the parameters.
You can define a macro parameter as positional and still call it by name in the macro call.
If the list is in a dataset you can easily generate the list of values into a macro variable using PROC SQL. Just make sure the resulting list is not too long for a macro variable (maximum of 64K bytes).
proc sql noprint;
select distinct quote(trim(customer))
into :custlist separated by ' '
from my_subset
;
quit;
%subset(&custlist)

Merging SAS datasets by different column names across several columns

I have 2 datasets that I want to merge by territory #. The first dataset has territory information, including territory #. The second dataset has territory #'s too, but they are spread across 4 different columns titled drug_terr1, drug_terr2, drug_terr3, and drug_Terr4. I need to merge on all 4 columns, because each holds different territory #'s and I want all of those numbers included in my merge with the dataset that has the territory information. I tried a rename, but that didn't work because it only changed the first column. Is there a way to combine all this data and rename it to territory # so I can do the merge?
Ultimately I would like it to look like this, but I need to get the 4 columns from 'terrfile' to become 1 column called territory_nbr so I can merge.
%let output = E:\Horizon\Adhoc\AH\;
%let terrs =\\uslsasas1\E$\Horizon\IMS Processing\Weekly Data\20161230\Demo\;
libname terrs "&terrs.";
%let curr_process_wk = '12-30-2016';
%let curr_quarter =_q1;
**0 Grab pskw;
data pskw_data;
set PSKW.PSKWMaster ;
where week in ('12-16-2016','12-23-2016','12-30-2016','01-06-2017') and CopayType ="FBD" and FNRX=1 and pme_id in (46,42,55,38) and product in ('DUEXIS','VIMOVO','PENNSAID')
and
(COBPrimaryRejectCode1 in ('75','76') or COBPrimaryRejectCode2 in ('75', '76') or COBPrimaryRejectCode3 in ('75' , '76'));
run;
proc sort data=pskw_data;
by imsid;
run;
** 01 Grab tbl HCP;
proc sort data=ims.tblhcp (where = (week = &curr_process_wk.) keep = week imsid first_name last_name address1 address2 city state zip spec)
out = IMS_demo (drop = week);
by IMSID;
run;
** 02 Grab tbl terrs_by_imsid;
data terrfile;
set terrs.wd2_terrs_by_imsid&curr_quarter.;
run;
proc sort data = terrfile;
by imsid;
run;
** 03 Grab tbl roster;
data roster (keep = territorycode repname territoryname teamname);
set ims.tblRoster;
repname = trim(left(FirstName))||" "||trim(left(LastName));
run;
**04 link ;
data combine_dbs;
merge pskw_data (in=in1)
ims.tblhcp (in=in2);
by imsid;
if in1;
run;
data territories; ***can't merge because territory code is not in terrfile, just 4 columns as I mentioned above***;
merge terrfile (in=in1)
roster (in=in2);
by territorycode;
if in2;
run;
You need to merge the fact table with the lookup table four times. Let's say your territory identifier is called ID in your lookup table and you want to take the field IMS_ID from it. Let's also assume the four fields in your fact table are named ID1-ID4.
proc sql ;
create table want as
select a.*
, b.ims_id as ims_id1
, c.ims_id as ims_id2
, d.ims_id as ims_id3
, e.ims_id as ims_id4
from FACT a
left join LU b on a.id1=b.id
left join LU c on a.id2=c.id
left join LU d on a.id3=d.id
left join LU e on a.id4=e.id
;
quit;
In your example it looks like TERRFILE is your FACT table and ROSTER is your LU table. Your ID variable looks like it is named TERRITORYCODE in the lookup file, and the four variables in TERRFILE would be drug_terr1 through drug_terr4, as described in the question.
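If a single territory_nbr column is what you are after, as the question asks, an alternative to the four lookup joins is to reshape the four columns into rows first. This is only a sketch: terr_long is a made-up name, and it assumes drug_terr1 through drug_terr4 all share one type (an array cannot mix numeric and character variables) and that this type matches territorycode in ROSTER.

```sas
/* One output row per non-missing territory column (terr_long is a hypothetical name) */
data terr_long (drop=i drug_terr1-drug_terr4);
set terrfile;
array terr{4} drug_terr1-drug_terr4;
do i = 1 to dim(terr);
if not missing(terr{i}) then do;
territory_nbr = terr{i};
output;
end;
end;
run;

/* Both inputs must be sorted by the merge key */
proc sort data=terr_long;
by territory_nbr;
run;
proc sort data=roster;
by territorycode;
run;

/* Rename the roster key on the way in so the BY variable matches */
data territories;
merge terr_long (in=in1)
roster (in=in2 rename=(territorycode=territory_nbr));
by territory_nbr;
if in1;
run;
```

Each row of TERRFILE becomes up to four rows here, so this suits the case where one row per territory is wanted; the four-join version keeps one row per original record.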

How to code a flexible numbered table list for DATA MERGE in SAS?

I have a program that should merge any number of consecutively numbered tables. I tried to use macro variables, but to no avail. The error "Missing numeric suffix on a numbered data set list" keeps popping up.
Here is the defective code:
DATA INPUTF;
INPUT DSN $;
CARDS;
forum1
forum2
forum3
;
RUN;
DATA forum1;
INPUT contact $ forum1 $;
CARDS;
Mash HERE
Greg HERE
Bob HERE
;
PROC SORT DATA=forum1;
BY contact;
RUN;
DATA forum2;
INPUT contact $ forum2 $;
CARDS;
Mash HERE
Sid HERE
Bob HERE
;
RUN;
PROC SORT DATA=forum2;
BY contact;
RUN;
DATA forum3;
INPUT contact $ forum3 $;
CARDS;
Mash HERE
Sid HERE
Jim HERE
;
RUN;
PROC SORT DATA=forum3;
BY contact;
RUN;
PROC SQL NOPRINT;
SELECT COUNT(*) INTO :n FROM INPUTF;
QUIT;
%MACRO COMBINE(N);
DATA ALLIN;
MERGE forum1-forum&n.;
BY contact;
RUN;
%MEND COMBINE;
%COMBINE;
PROC PRINT DATA=ALLIN;
The code, however, works fine when I use a %LET statement as follows:
%let n=3;
DATA ALLIN;
MERGE forum1-forum&n.;
BY contact;
RUN;
PROC PRINT DATA=ALLIN;
The problem is I won't know how many forums there are, and I would prefer that the number be based on the input file.
Any help is appreciated! Thanks!
Macro variable scope.
You've created a macro variable N that exists in the global symbol table. Your macro takes a parameter that is also called N, which is local to the macro and empty because you called the macro without passing a value.
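A small sketch of that scoping behaviour, using a throwaway macro named show (hypothetical, not part of the question's code). The square brackets only make an empty value visible in the log.

```sas
%let n = 3;                       /* global macro variable */

%macro show(n);                   /* the parameter N is local and hides the global N */
%put Inside macro: n=[&n];
%mend show;

%show;                            /* no argument passed: the log shows n=[] */
%show(&n);                        /* global value passed in: the log shows n=[3] */
```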
Call your macro with the created parameter N or move the proc SQL into the macro.
%COMBINE(&N);
OR
%MACRO COMBINE;
PROC SQL NOPRINT;
SELECT COUNT(*) INTO :n FROM INPUTF;
QUIT;
/* remove the leading blanks that INTO puts in front of a numeric value */
%LET n = &n;
DATA ALLIN;
MERGE forum1-forum&n.;
BY contact;
RUN;
%MEND COMBINE;
%COMBINE;
OR
If you only have tables that start with FORUM that you're trying to merge:
DATA ALLIN;
MERGE FORUM: ;
BY contact;
RUN;
So if your dataset InputF has the list of datasets that you want to merge, then put that list into a macro variable. If you always have at least two datasets then no macro logic is required.
proc sql noprint ;
select dsn into :dsnlist separated by ' '
from inputf;
quit;
data allin;
merge &dsnlist ;
by contact;
run;
To handle the case when you have 0 or 1 dataset name in the list you would need to add macro logic. When there is just one you need to use SET instead of MERGE. You could handle that with the IFC() function.
data allin;
%sysfunc(ifc(1=&sqlobs,set,merge)) &dsnlist ;
by contact;
run;

Is it possible to merge two datasets where a variable's value in the first is used to select a variable in the second?

I would like to know how to merge two datasets in SAS using a variable's value in the first dataset to select and test a variable in the second dataset.
As an example consider two datasets. The first dataset contains four baby names and the days they were born. The second data set contains three doctors and an array of indicator variables noting if each doctor worked on a particular day. For example Dr. Smith worked on days 2 and 3 only. I would like to create a dataset that lists all the possible baby-doctor combinations where the doctor was working on the day the baby was born.
data babies;
input baby_name $ birth_day;
datalines;
Jake 1
Sonny 4
North 5
Apple 6
;
run;
data doctors;
input DrLastname $ day1 day2 day3 day4 day5 day6;
datalines;
Jones 1 0 0 1 1 1
Smith 0 1 1 0 0 0
Lewis 1 1 1 0 0 0
;
run;
The solution seems like it should be something like this
proc sql;
create table merged as
select babies.*, doctors.*
from babies, doctors
where doctors.day(babies.birth_day) = 1; *<--- incorrect;
quit;
The output should be:
baby_name birth_day DrLastName
Jake 1 Jones
Jake 1 Lewis
Sonny 4 Jones
North 5 Jones
Apple 6 Jones
I have run into this problem a few times and would love to know if this kind of merge is possible in SAS. Thanks for any help you can provide.
While I probably would also transpose the dataset, it is possible to do so without transposing.
data babies_doctors;
set babies;
do _i = 1 to nobs_doctors;
set doctors point=_i nobs=nobs_doctors;
array days day1-day6;
if days[birth_Day] then output;
end;
run;
This will not be fast, as it checks all rows in the dataset, but it's possible.
Fastest is probably to load it into a vertical hash table (which you could do easily) or a temporary array.
data babies_doctors_array;
array drnames[32767] $80 _temporary_;
array drdays[32767,6] _temporary_;
if _n_=1 then do;
do _i = 1 to nobs_doctors;
set doctors point=_i nobs=nobs_doctors;
array days day1-day6;
drnames[_i]=DrLastname;
do _j = 1 to dim(days);
drdays[_i,_j]=days[_j];
end;
end;
end;
set babies;
do _k = 1 to nobs_doctors;
if drdays[_k,birth_day]=1 then do;
baby_drlastname = drnames[_k];
output;
end;
end;
run;
I might shift the second dataset and then merge on day.
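Something like this (untested):
/* Reshape DOCTORS to one row per doctor per day worked, */
/* writing straight to one stacked dataset instead of six intermediate ones */
data stacked (keep=DrLastname day);
set doctors;
array days{6} day1-day6;
do i = 1 to dim(days);
if days{i} = 1 then do;
day = i;
output;
end;
end;
run;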
Then simply merge based on the field day.
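One way to do that last step is an SQL join, sketched here on the assumption that the stacked dataset has one row per doctor per working day, with the columns DrLastname and day. A join is used instead of a data step MERGE because several babies can share a birth day with several doctors, and a data step MERGE does not pair up many-to-many matches.

```sas
proc sql;
create table babies_doctors as
select b.baby_name, b.birth_day, s.DrLastname
from babies as b
inner join stacked as s
on b.birth_day = s.day
order by b.birth_day, s.DrLastname;
quit;
```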