SAS: Hash Object Lookup within the same dataset

This question is a follow-up to my previous question. Besides doing merges with hash objects, I am struggling with hash objects when it comes to doing a lookup within the same dataset. I have this dataset where the order_number of a client is continuously updated:
Client  Order_number  New_number
XYZ     1000          1001
        1001          1002
ABC     1006          1009
        1009          1017
SST     1010          1011
        1017          1020
        1020          1030
        1011          1050
As in my previous question, I need the following:
Client  Order_number  New_number
XYZ     1000          1001
XYZ     1001          1002
ABC     1006          1009
ABC     1009          1017
SST     1010          1011
ABC     1017          1020
ABC     1020          1030
SST     1011          1050
In other words, when the client name is missing, I match the order_number against a previous new_number to find the client.
The orders are sorted first by order_number and then by new_number.
I have tried adapting the code posted in my previous question, but without success.

This should work if your data is a SAS dataset. This example overwrites the existing dataset. It keeps only the observations where the client is known and then, using the hash, follows the chain of order numbers for as long as the client is empty.
data orders (keep=c o n rename=(c=client o=order_number n=new_number));
length client $8 order_number 8 new_number 8;
* declare hash object;
if _n_ = 1 then do;
declare hash h(dataset:'orders');
h.definekey('order_number');
h.definedata('client','new_number');
h.definedone();
call missing(client, order_number, new_number);
end;
* set statement with rename of original column names;
set orders (rename=(order_number=o new_number=n client=c) where=(c ne ''));
* find in hash ;
rc = h.find(key:n);
* write first observation;
output;
* do loop through chain of order numbers while client is empty;
do while (rc = 0 and client = '');
* update values of output dataset;
o = n;
n = new_number;
rc = h.find(key:n);
* write current observation;
output;
end;
run;
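To see the chain-following logic outside SAS, here is a minimal Python sketch (an illustration, not the answerer's code). A dict keyed on order_number plays the role of the hash object, and each row with a known client walks the chain new_number → order_number until it reaches an order that is not in the table, already has a client, or has already been visited. The function name fill_clients and the tuple layout (client, order_number, new_number), with '' for a missing client, are assumptions made for the example. Unlike the DATA step above, it returns the rows in their original order.

```python
def fill_clients(rows):
    """Fill missing clients by following order_number -> new_number chains.

    rows: list of (client, order_number, new_number); client is '' when missing.
    Returns the rows in their original order with clients filled in.
    """
    # Lookup table keyed on order_number, like the hash object h.
    by_order = {order: (client, new) for client, order, new in rows}
    owner = {}  # order_number -> client found by walking a chain
    for client, order, new in rows:
        if not client:
            continue  # only rows with a known client start a chain
        owner[order] = client
        nxt = by_order.get(new)
        # Keep walking while the next order exists, has no client of its own,
        # and has not been visited yet (guards against cycles).
        while nxt is not None and nxt[0] == "" and new not in owner:
            owner[new] = client
            new = nxt[1]
            nxt = by_order.get(new)
    # Rows never reached by a chain keep whatever client they had.
    return [(owner.get(order, client), order, new) for client, order, new in rows]


data = [
    ("XYZ", 1000, 1001), ("", 1001, 1002),
    ("ABC", 1006, 1009), ("", 1009, 1017),
    ("SST", 1010, 1011), ("", 1017, 1020),
    ("", 1020, 1030), ("", 1011, 1050),
]
for row in fill_clients(data):
    print(*row)
```

The dict lookup is the same constant-time keyed access the hash object gives you; the `owner` dict is only there so the output can keep the input order.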


SAS: How to identify an event that occurs in three consecutive years?

I am trying to identify traders who place transactions in the same month in each of three consecutive years in one company. Once a trader meets the criteria, these three transactions and all his subsequent transactions in that same month in that company should be identified.
Assume I have the sample data below.
data have;
input ID STOCK trandate :mmddyy10.;
format trandate mmddyy10.;
datalines;
1 1 10/15/2009
1 1 01/01/2010
1 1 01/10/2011
1 1 01/15/2012
1 1 01/01/2013
1 2 01/30/2011
1 2 01/30/2012
1 2 01/30/2012
1 2 01/30/2013
1 2 01/30/2014
1 2 01/30/2015
2 1 01/20/2010
2 1 01/15/2011
2 1 01/16/2012
2 1 02/01/2013
2 2 02/01/2010
2 2 02/10/2011
2 2 02/10/2012
2 2 02/10/2013
2 2 02/10/2014
2 2 01/10/2015
;
run;
What I need:
ID Stock trandate type
1 1 10/15/2009 0
1 1 01/01/2010 1
1 1 01/10/2011 1
1 1 01/15/2012 1
1 1 01/01/2013 1
1 2 01/30/2011 1
1 2 01/30/2012 1
1 2 01/30/2012 1
1 2 01/30/2013 1
1 2 01/30/2014 1
1 2 01/30/2015 1
2 1 01/20/2010 0
2 1 01/15/2011 0
2 1 01/16/2012 0
2 1 02/01/2013 0
2 2 02/01/2010 1
2 2 02/10/2011 1
2 2 02/10/2012 1
2 2 02/10/2013 1
2 2 02/10/2014 1
2 2 01/10/2015 0
I used the following code to achieve this:
proc sort data=have;
by id stock trandate;
run;
data have;
set have;
month=month(trandate);
year=year(trandate);
run;
proc sort data=have;
by id stock month year;
run;
data have;
set have;
by id stock month year;
rungroup + (first.month or not first.month and year - lag(year) > 1);
run;
data temp;
do index = 1 by 1 until (last.rungroup);
set have;
by rungroup;
* distinct number of years in rungroup;
years_runlength = sum (years_runlength, first.rungroup or year ne lag(year));
end;
do index = 1 to index;
set have;
if years_runlength >=4 then output;
end;
run;
The code above identifies traders with transactions in the same month in three consecutive years. Since I also need the subsequent transactions of these traders, I then apply the following code.
proc sort data=temp;
by id stock rungroup;
run;
data temp;
set temp;
by rungroup;
if first.rungroup then fyear=year;
run;
data temp(drop=fyear rename=(Locf=fyear));
do until (last.id);
set temp;
by id stock;
locf=coalesce(fyear,locf);
output;
end;
run;
data temp;
set temp;
by rungroup;
if first.rungroup then fmonth=month;
run;
data temp;
set temp;
gap=year-fyear;
run;
proc means data=temp;
var gap;
run;
data temp;
set temp;
if gap=3 then type2=1;
type1=1;
run;
The code above marks the first transaction after the three consecutive years. When the identified transactions are combined with the original dataset, all transactions in that same month below the marked transaction can be identified. This way, I can achieve the objective that "these three transactions and all his subsequent transactions in that same month in that company should be identified". The following code does this.
proc sort data=have;
by id stock rungroup;
run;
proc sort data=temp;
by id stock rungroup;
run;
data combine;
merge have temp;
by id stock rungroup;
run;
data combine;
set combine;
month=month(trandate);
run;
data combine1 (drop=fmonth rename=(Locf=fmonth));
do until (last.id);
set combine;
by id stock;
locf=coalesce(fmonth,locf);
output;
end;
run;
data combine2 (drop=type2 rename=(Locf=type2));
do until (last.id);
set combine1;
by id stock;
locf=coalesce(type2,locf);
output;
end;
run;
data combine2;
set combine2;
if month^=fmonth then type2=.;
run;
data combine2;
set combine2;
if type1=1 or type2=1 then type=1;
else type=0;
run;
I tried this code and the results look right, but I cannot be 100% sure. Additionally, as you can see, my code is relatively long and complex. Could anyone give me some suggestions to improve it?
Here is a somewhat brute-force way. For this example I limited it to the years 2009 to 2015 from your sample, but you could expand the pattern to allow more years. You could use macro logic to generate the repetitive, wallpaper-like parts of the code.
First generate an array you can index by YEAR and MONTH and populate the variables with 1 when the month it represents has a trade. Then check if the series of values for the same month across the years ever has three 1's in a row. You can use two DOW loops to process the data. The first one populates the array and the second tests the array and sets the new flag variable.
data want ;
do until(last.stock) ;
set have ;
by id stock;
array months [1:12,2009:2015]
m1y2009-m1y2015 m2y2009-m2y2015 m3y2009-m3y2015 m4y2009-m4y2015
m5y2009-m5y2015 m6y2009-m6y2015 m7y2009-m7y2015 m8y2009-m8y2015
m9y2009-m9y2015 m10y2009-m10y2015 m11y2009-m11y2015 m12y2009-m12y2015
;
months[month(trandate),year(trandate)]=1;
end;
do until(last.stock);
set have;
by id stock;
select (month(trandate));
when (1) flag=0 ne index(cats(of m1y:),'111');
when (2) flag=0 ne index(cats(of m2y:),'111');
when (3) flag=0 ne index(cats(of m3y:),'111');
when (4) flag=0 ne index(cats(of m4y:),'111');
when (5) flag=0 ne index(cats(of m5y:),'111');
when (6) flag=0 ne index(cats(of m6y:),'111');
when (7) flag=0 ne index(cats(of m7y:),'111');
when (8) flag=0 ne index(cats(of m8y:),'111');
when (9) flag=0 ne index(cats(of m9y:),'111');
when (10) flag=0 ne index(cats(of m10y:),'111');
when (11) flag=0 ne index(cats(of m11y:),'111');
when (12) flag=0 ne index(cats(of m12y:),'111');
otherwise ;
end;
output;
end;
drop m: ;
run;
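The same idea (record which month/year cells have a trade, then look for consecutive years within each month) can be sketched in Python as an illustration; it is not a translation supplied by the answerer. The function name flag_runs and its run parameter are hypothetical. Like the array approach, it flags every trade in a qualifying month, including trades that come before the run. Note that with run=3 it flags the January 2010 to 2012 trades for ID 2 / stock 1, which the expected output in the question marks 0; run=4 appears to reproduce the question's expected table, which is consistent with the years_runlength >= 4 test in the question's own code.

```python
from datetime import date


def flag_runs(rows, run=3):
    """Flag every trade whose (id, stock, month) has trades in `run` consecutive years.

    rows: list of (id, stock, trade_date) with trade_date a datetime.date.
    Returns a list of 0/1 flags in the same order as rows.
    """
    # Collect the set of years traded for each (id, stock, month),
    # the same information the months[month, year] array holds.
    years = {}
    for ident, stock, d in rows:
        years.setdefault((ident, stock, d.month), set()).add(d.year)

    def has_run(ys):
        # True if some year y starts a run y, y+1, ..., y+run-1.
        return any(all(y + k in ys for k in range(run)) for y in ys)

    return [int(has_run(years[(ident, stock, d.month)])) for ident, stock, d in rows]


have = [
    (1, 1, date(2009, 10, 15)), (1, 1, date(2010, 1, 1)), (1, 1, date(2011, 1, 10)),
    (1, 1, date(2012, 1, 15)), (1, 1, date(2013, 1, 1)),
    (1, 2, date(2011, 1, 30)), (1, 2, date(2012, 1, 30)), (1, 2, date(2012, 1, 30)),
    (1, 2, date(2013, 1, 30)), (1, 2, date(2014, 1, 30)), (1, 2, date(2015, 1, 30)),
    (2, 1, date(2010, 1, 20)), (2, 1, date(2011, 1, 15)), (2, 1, date(2012, 1, 16)),
    (2, 1, date(2013, 2, 1)),
    (2, 2, date(2010, 2, 1)), (2, 2, date(2011, 2, 10)), (2, 2, date(2012, 2, 10)),
    (2, 2, date(2013, 2, 10)), (2, 2, date(2014, 2, 10)), (2, 2, date(2015, 1, 10)),
]
print(flag_runs(have))         # same rule as searching for '111'
print(flag_runs(have, run=4))  # four consecutive years
```

Using a set of years per month avoids fixing the year range in advance, which is what the 2009:2015 array bounds do in the SAS version.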

T-SQL: Combining rows based on another table

I want to alter the contents of a table based on information in another table, using a stored procedure. To make my point (and dodge my rusty English skills) I created the following simplification.
I have a table with fragment amounts of the form
SELECT * FROM [dbo].[obtained_fragments] ->
fragment amount
22 42
76 7
101 31
128 4
177 22
212 6
and a table that lists all possible combinations to combine these fragments to other fragments.
SELECT * FROM [dbo].[possible_combinations] ->
fragment consists_of_f1 f1_amount_needed consists_of_f2 f2_amount_needed
1001 128 1 22 3
1004 151 1 101 12
1012 128 1 177 6
1047 212 1 76 4
My aim is to alter the first table so that all possible fragment combinations are performed, leading to
SELECT * FROM [dbo].[obtained_fragments] ->
fragment amount
22 30
76 3
101 31
177 22
212 5
1001 4
1047 1
In words, combined fragments are added to the table based on [dbo].[possible_combinations], and the amount of needed fragments is reduced. Depleted fragments are removed from the table.
How do I achieve this fragment transformation in an easy way? I started writing a while loop that checks whether sufficient fragments are available, nested inside a for loop that iterates through the fragment numbers. However, I am unable to come up with a working amount check, and I am beginning to wonder whether this is even possible in T-SQL this way.
The code doesn't have to be super efficient since both tables will always be smaller than 200 rows.
It is important to note that it doesn't matter which combinations are created.
It might come in handy that [f1_amount_needed] always has a value of 1.
UPDATE
Using the solution by iamdave, which works perfectly fine as long as I don't touch it, I receive the following error message:
Column name or number of supplied values does not match table definition.
I barely changed anything really. Is there a chance that using existing tables with more than the necessary columns instead of declaring the tables (as iamdave did) makes this difference?
DECLARE @t TABLE(Binding_ID int, Exists_of_Binding_ID_2 int, Exists_of_Pieces_2 int, Binding1 int, Binding2 int);
WHILE 1=1
BEGIN
DELETE @t
INSERT INTO @t
SELECT TOP 1
k.Binding_ID
,k.Exists_of_Binding_ID_2
,k.Exists_of_Pieces_2
,g1.mat_Binding_ID AS Binding1
,g2.mat_Binding_ID AS Binding2
FROM [dbo].[vwCombiBinding] AS k
JOIN [leer].[sandbox5] AS g1
ON k.Exists_of_Binding_ID_1 = g1.mat_Binding_ID AND g1.Amount >= 1
JOIN [leer].[sandbox5] AS g2
ON k.Exists_of_Binding_ID_2 = g2.mat_Binding_ID AND g2.Amount >= k.Exists_of_Pieces_2
ORDER BY k.Binding_ID
IF (SELECT COUNT(1) FROM @t) = 1
BEGIN
UPDATE g
SET Amount = g.Amount +1
FROM [leer].[sandbox5] AS g
JOIN @t AS t
ON g.mat_Binding_ID = t.Binding_ID
INSERT INTO [leer].[sandbox5]
SELECT
t.Binding_ID
,1
FROM @t AS t
WHERE NOT EXISTS (SELECT NULL FROM [leer].[sandbox5] AS g WHERE g.mat_Binding_ID = t.Binding_ID);
UPDATE g
SET Amount = g.Amount - 1
FROM [leer].[sandbox5] AS g
JOIN @t AS t
ON g.mat_Binding_ID = t.Binding1
UPDATE g
SET Amount = g.Amount - t.Exists_of_Pieces_2
FROM [leer].[sandbox5] AS g
JOIN @t AS t
ON g.mat_Binding_ID = t.Binding2
END
ELSE
BREAK
END
SELECT * FROM [leer].[sandbox5]
You can do this with a while loop that contains several statements to handle your iterative data updates. As you need to make changes based on a re-assessment of your data on each iteration, this has to be done in a loop of some kind:
declare @f table(fragment int,amount int);
insert into @f values (22 ,42),(76 ,7 ),(101,31),(128,4 ),(177,22),(212,6 );
declare @c table(fragment int,consists_of_f1 int,f1_amount_needed int,consists_of_f2 int,f2_amount_needed int);
insert into @c values (1001,128,1,22,3),(1004,151,1,101,12),(1012,128,1,177,6),(1047,212,1,76,4);
declare @t table(fragment int,consists_of_f2 int,f2_amount_needed int,fragment1 int,fragment2 int);
while 1 = 1
begin
-- Clear out staging area
delete @t;
-- Populate with the first possible combination (lowest fragment number)
insert into @t
select top 1 c.fragment
,c.consists_of_f2
,c.f2_amount_needed
,f1.fragment as fragment1
,f2.fragment as fragment2
from @c as c
join @f as f1
on c.consists_of_f1 = f1.fragment
and f1.amount >= 1
join @f as f2
on c.consists_of_f2 = f2.fragment
and f2.amount >= c.f2_amount_needed
order by c.fragment;
-- Update fragments table if a new combination can be made
if (select count(1) from @t) = 1
begin
-- Update if additional fragment
update f
set amount = f.amount + 1
from @f as f
join @t as t
on f.fragment = t.fragment;
-- Insert if a new fragment
insert into @f
select t.fragment
,1
from @t as t
where not exists(select null
from @f as f
where f.fragment = t.fragment
);
-- Update fragment1 amounts
update f
set amount = f.amount - 1
from @f as f
join @t as t
on f.fragment = t.fragment1;
-- Update fragment2 amounts
update f
set amount = f.amount - t.f2_amount_needed
from @f as f
join @t as t
on f.fragment = t.fragment2;
end
else -- If no new combinations possible, break the loop
break
end;
select *
from @f;
Output:
+----------+--------+
| fragment | amount |
+----------+--------+
| 22 | 30 |
| 76 | 3 |
| 101 | 31 |
| 128 | 0 |
| 177 | 22 |
| 212 | 5 |
| 1001 | 4 |
| 1047 | 1 |
+----------+--------+
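For checking the expected result, here is a small Python sketch of the same greedy loop: on each pass it takes the first feasible combination ordered by fragment number, consumes the parts, and adds one combined fragment. The function name combine_fragments and the tuple layout are assumptions for illustration, and it also assumes f1 and f2 are different fragments, as in the sample. Unlike the T-SQL output above, it drops depleted fragments (128 with amount 0), as the question asked; in T-SQL a final DELETE of rows with amount = 0 would do the same.

```python
def combine_fragments(amounts, combinations):
    """Greedily build combined fragments until no combination is possible.

    amounts: dict fragment -> amount available.
    combinations: list of (fragment, f1, f1_needed, f2, f2_needed); f1 != f2 assumed.
    Returns a new dict with depleted fragments (amount 0) removed.
    """
    stock = dict(amounts)  # work on a copy; the caller's dict is not changed
    while True:
        # Like the T-SQL loop: take the first feasible combination by fragment id.
        for frag, f1, n1, f2, n2 in sorted(combinations):
            if stock.get(f1, 0) >= n1 and stock.get(f2, 0) >= n2:
                stock[f1] -= n1
                stock[f2] -= n2
                stock[frag] = stock.get(frag, 0) + 1
                break
        else:
            break  # no combination can be made any more
    return {frag: amt for frag, amt in stock.items() if amt > 0}


fragments = {22: 42, 76: 7, 101: 31, 128: 4, 177: 22, 212: 6}
combos = [(1001, 128, 1, 22, 3), (1004, 151, 1, 101, 12),
          (1012, 128, 1, 177, 6), (1047, 212, 1, 76, 4)]
print(combine_fragments(fragments, combos))
```

Because only one combination is applied per pass and the scan restarts from the lowest fragment number, the order of choices matches the `top 1 ... order by c.fragment` query.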

proc transpose before merge? dealing with medications in claim database

I am dealing with medications in a claims database. To make this easier to understand, let's take the following as an example:
patients id dx1
1 224
2 323
3 432
4 423
dataset 2
patients id date med_id
1 10/12/2005 54678
1 01/2/2005 09849
1 05/04/2004
1
2
2
2
3
4
4
4
4
My question is about merging the two datasets. The first one has one observation per id; the second can have from 1 to 200 or more per id. What is the best way to combine the two datasets? Would you transpose before joining them?
This will be a full outer join: no rows are dropped from either side.
Proc sort data=d1 ;
By patient_id ;
Run ;
Proc sort data=d2 ;
By patient_id ;
Run ;
Data d3 ;
Merge d1 d2 ;
By patient_id ;
Run ;
If you want a left outer join - all rows from d1 and only their match, if any, from d2 - then use this data step instead.
Data d3 ;
Merge d1 (in=in1) d2 ;
By patient_id ;
If in1 ;
Run ;
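The one-to-many behaviour of the match-merge can be illustrated with a short Python sketch (an illustration of the semantics, not SAS code): each row of the many-side dataset picks up the single row of d1 with the same key, so no transpose is needed first. The function name match_merge, the left flag standing in for `if in1;`, and the sample values are made up for the example.

```python
def match_merge(d1, d2, key, left=False):
    """One-to-many match-merge by `key`, in the spirit of MERGE ... BY.

    d1 has at most one row per key; d2 may have many.  Each d2 row picks up
    the d1 values for its key.  With left=True, keys that appear only in d2
    are dropped (like IF in1;).
    """
    lookup = {row[key]: row for row in d1}
    keys = sorted({row[key] for row in d1} | {row[key] for row in d2})
    out = []
    for k in keys:
        if left and k not in lookup:
            continue
        base = lookup.get(k, {key: k})
        matches = [row for row in d2 if row[key] == k]
        if matches:
            # Values from d2 win when both sides share a non-key column.
            out.extend({**base, **row} for row in matches)
        else:
            out.append(dict(base))  # key only in d1: keep it once
    return out


d1 = [{"patient_id": 1, "dx1": 224}, {"patient_id": 2, "dx1": 323}]
d2 = [{"patient_id": 1, "med_id": "54678"}, {"patient_id": 1, "med_id": "09849"},
      {"patient_id": 3, "med_id": "00000"}]  # hypothetical rows
for row in match_merge(d1, d2, "patient_id"):
    print(row)
```

Keeping the data long (one row per medication) is usually easier to work with afterwards than a wide layout with up to 200 medication columns.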

SQL - Pivot tables with 3 cross tabs

I have raw data that looks like this:
Class  Cert  Name  Benefit  Coverage
-----------------------------------
1      1001  ABC   EHC      Family
1      1001  ABC   DEN      Family
2      1002  XYZ   EHC      Single
2      1002  XYZ   DEN      Single
3      1003  LMN   EHC      Couple
3      1003  LMN   DEN      Couple
I want the final output to look like this:
       Benefit
Class  EHC-Single  EHC-Couple  EHC-Family  DEN-Single  DEN-Couple  DEN-Family
1                              1                                   1
2      1                                   1
3                  1                                   1
The values under the columns are counts of certificates.
Yes, you can do it as shown below:
;WITH CTE
AS (SELECT COUNT(*) Counts,
Class,
Benefit + '-' + Coverage AS [Benefits]
FROM ##MyTemp
GROUP BY Class,
Benefit,
Coverage)
SELECT Class,
[EHC-Single],
[EHC-Couple],
[EHC-Family],
[DEN-Single],
[DEN-Couple],
[DEN-Family]
FROM CTE
PIVOT(MAX(Counts)
FOR [Benefits] IN ([EHC-Single],
[EHC-Couple],
[EHC-Family],
[DEN-Single],
[DEN-Couple],
[DEN-Family])) AS TempList;
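What the CTE plus PIVOT computes can be mirrored in a few lines of Python, which may help when checking the result: count rows per (Class, Benefit-Coverage) pair, then spread the counts into a fixed list of columns. The function name pivot_counts and the tuple layout are assumptions for illustration; None stands in for the NULL that PIVOT leaves in cells with no rows.

```python
from collections import Counter

BENEFITS = ["EHC-Single", "EHC-Couple", "EHC-Family",
            "DEN-Single", "DEN-Couple", "DEN-Family"]


def pivot_counts(rows, columns=BENEFITS):
    """Count rows per (Class, Benefit-Coverage) and spread them into columns.

    rows: list of (class_, cert, name, benefit, coverage).
    Returns {class_: [count or None for each column]}.
    """
    # Equivalent of the CTE: COUNT(*) grouped by Class, Benefit + '-' + Coverage.
    counts = Counter((cls, f"{benefit}-{coverage}")
                     for cls, _cert, _name, benefit, coverage in rows)
    classes = sorted({row[0] for row in rows})
    # Equivalent of PIVOT ... FOR [Benefits] IN (...): one column per label.
    return {cls: [counts.get((cls, col)) for col in columns] for cls in classes}


raw = [(1, 1001, "ABC", "EHC", "Family"), (1, 1001, "ABC", "DEN", "Family"),
       (2, 1002, "XYZ", "EHC", "Single"), (2, 1002, "XYZ", "DEN", "Single"),
       (3, 1003, "LMN", "EHC", "Couple"), (3, 1003, "LMN", "DEN", "Couple")]
for cls, cells in pivot_counts(raw).items():
    print(cls, cells)
```

Like COUNT(*) in the query, this counts rows rather than distinct certificates; the two agree for the sample data.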

SAS: merge in a do command

Say that I have the following two one-row datasets:
data have_1;
input message $ order_num time price qty;
datalines;
A 3 34199 10 500
run;
data have_2;
input message $ order_num time delete_qty ;
datalines;
B 2 34200 100
run;
I have another dataset that aggregates previous order_numbers.
data total;
input order_num time price qty;
datalines;
1 34197 11 550
2 34198 10.5 450
run;
My objective is to update the dataset total with the datasets have_1 and have_2 in a loop. Starting with have_1, message=A implies that I have to update the dataset total by simply adding a new order to it. I must keep track of the changes in the total dataset. Hence, the dataset total should look like this:
order_num time price qty id
1 34197 11 550 1
2 34198 10.5 450 1
3 34199 10 500 1
Then, the dataset total needs to be updated with the dataset have_2, where message=B implies an update to the qty of an order_num that is already in the total dataset. I have to update order_num=2 by removing some of its qty. Hence, the total dataset should look like this:
order_num time price qty id
1 34197 11 550 2
2 34198 10.5 350 2
3 34199 10 500 2
I have more than 1000 have_ datasets, each corresponding to a row in another dataset.
What's important is that I need to keep track of the changes in total for every message, with an id. Assuming that I have only have_1 and have_2, here's my tentative code:
%macro loop();
%do i=1 %to 2;
data total_temp;
set total; run;
data total_temp;
set have_&i;
if message='A' then do;
set total have_&i;
drop message;
id=&i;
end;
if message='B' then do;
merge total have_&i;
by order_num;
drop message;
qty=qty-delete_qty;
drop delete_qty;
id=&i;
end;
run;
data total; set total_temp; run;
%end;
%mend;
%loop();
After the first loop, this code keeps only one line, which corresponds to what's in have_1. So, can we use a MERGE and a SET statement inside an IF-THEN DO block? What's the proper code to use?
The final dataset should look like this:
order_num time price qty id
1 34197 11 550 1
2 34198 10.5 450 1
3 34199 10 500 1
1 34197 11 550 2
2 34198 10.5 350 2
3 34199 10 500 2
You don't need to do this in a macro. You CAN use a macro, but it will be slower. Try this:
data have_1;
input message $ order_num time price qty;
datalines;
A 3 34199 10 500
run;
data have_2(index=(order_num));
input message $ order_num time delete_qty ;
datalines;
B 2 34200 100
run;
data total(index=(order_num));
input order_num time price qty;
datalines;
1 34197 11 550
2 34198 10.5 450
run;
/*First, add new orders*/
proc append base=total data=have_1(where=(message="A")) force;
run;
/*Now update for the deletions*/
data total;
modify total have_2(where=(message="B"));
by order_num;
qty = sum(qty,-delete_qty);
drop message delete_qty;
run;
Append the new order to the total data set with PROC APPEND. This maintains the index and allows you to do the update through the MODIFY statement.
This could be done through two modify statements, though I find adding the new records through append to be clearer.
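The MODIFY approach updates total in place, so it does not by itself keep the stacked per-message history with an id that the question asks for. The bookkeeping for that history can be sketched in Python: apply each message, then append a copy of the current state tagged with the message number. The function name apply_messages and the dict layout are assumptions for illustration; in SAS, one comparable option would be to append a copy of total, with id set, to a separate history dataset after each message is applied.

```python
def apply_messages(total, messages):
    """Apply add (A) and reduce-quantity (B) messages, keeping a snapshot per message.

    total: list of dicts with order_num, time, price, qty.
    messages: list of dicts; 'A' carries a new order, 'B' carries order_num and delete_qty.
    Returns every snapshot stacked, each row tagged with the message's id (1, 2, ...).
    """
    current = [dict(row) for row in total]  # work on copies; leave input untouched
    history = []
    for msg_id, msg in enumerate(messages, start=1):
        if msg["message"] == "A":
            # New order: append it to the current state.
            current.append({k: msg[k] for k in ("order_num", "time", "price", "qty")})
        elif msg["message"] == "B":
            # Partial deletion: reduce qty of the matching existing order.
            for row in current:
                if row["order_num"] == msg["order_num"]:
                    row["qty"] -= msg["delete_qty"]
        # Copy each row so later messages do not change earlier snapshots.
        history.extend({**row, "id": msg_id} for row in current)
    return history


total = [{"order_num": 1, "time": 34197, "price": 11, "qty": 550},
         {"order_num": 2, "time": 34198, "price": 10.5, "qty": 450}]
messages = [{"message": "A", "order_num": 3, "time": 34199, "price": 10, "qty": 500},
            {"message": "B", "order_num": 2, "time": 34200, "delete_qty": 100}]
for row in apply_messages(total, messages):
    print(row["order_num"], row["time"], row["price"], row["qty"], row["id"])
```

Note that the history grows with the number of orders times the number of messages, which can become large with more than 1000 messages.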