Say that I have the following two one-row datasets:
data have_1;
input message $ order_num time price qty;
datalines;
A 3 34199 10 500
run;
data have_2;
input message $ order_num time delete_qty ;
datalines;
B 2 34200 100
run;
I have another dataset that aggregates previous order_numbers.
data total;
input order_num time price qty;
datalines;
1 34197 11 550
2 34198 10.5 450
run;
My objective is to update the dataset total with the datasets have_1 and have_2 in a loop. Starting with have_1, message=A means that I have to update total by simply adding a new order to it. I must also keep track of the changes in total. Hence, after have_1 the dataset total should look like this:
order_num time price qty id
1 34197 11 550 1
2 34198 10.5 450 1
3 34199 10 500 1
Then the dataset total needs to be updated with have_2, where message=B means an update to the qty of an order_num that is already in total. I have to update order_num=2 by removing some of its qty. Hence, the total dataset should then look like this:
order_num time price qty id
1 34197 11 550 2
2 34198 10.5 350 2
3 34199 10 500 2
I have more than 1000 have_ datasets, each corresponding to one row of another dataset.
What's important is that I need to keep track of the changes in total for every message, with an id. Assuming that I only have have_1 and have_2, here's my tentative code:
%macro loop();
%do i=1 %to 2;
data total_temp;
set total; run;
data total_temp;
set have_&i;
if msg_type='A' then do;
set total have_&i;
drop message;
id=&i;
end;
if msg_type='B' then do;
merge total have_&i;
by order_num;
drop message;
qty=qty-delete_qty;
drop delete_qty;
id=&i;
end;
run;
data total; set total_temp; run;
%end;
%mend;
%loop();
After the first iteration, this code keeps only one row, the one that is in have_1. So, can we use a MERGE and a SET statement inside an IF ... THEN DO block? What is the proper code to use?
The final dataset should look like this:
order_num time price qty id
1 34197 11 550 1
2 34198 10.5 450 1
3 34199 10 500 1
1 34197 11 550 2
2 34198 10.5 350 2
3 34199 10 500 2
You don't need to do this in a macro. You CAN use a macro, but it will be slower. Try this:
data have_1;
input message $ order_num time price qty;
datalines;
A 3 34199 10 500
run;
data have_2(index=(order_num));
input message $ order_num time delete_qty ;
datalines;
B 2 34200 100
run;
data total(index=(order_num));
input order_num time price qty;
datalines;
1 34197 11 550
2 34198 10.5 450
run;
/*First, add new orders*/
proc append base=total data=have_1(where=(message="A")) force;
run;
/*Now update for the deletions*/
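/* drop=time keeps each order's original time: with MODIFY ... BY, values of
   variables common to both data sets are taken from the transaction row */
data total;
modify total have_2(where=(message="B") drop=time);
by order_num;
qty = sum(qty,-delete_qty);
drop message delete_qty;
run;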
Append the new order to the total data set with PROC APPEND. This maintains the index and allows you to do the update through the MODIFY statement.
This could be done through two modify statements, though I find adding the new records through append to be clearer.
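The question also asks for a history of total tagged with an id per message, which the in-place APPEND/MODIFY approach does not keep by itself. As a sketch of that bookkeeping (in Python rather than SAS, only to make the logic checkable), each message is applied in order and a snapshot of the whole book is appended with the message number as id. The `Order` class, the `replay` function and the tuple layout of the messages are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass
class Order:
    order_num: int
    time: int
    price: float
    qty: int


def replay(total, messages):
    """Apply messages in order; after each one, snapshot the whole book.

    Hypothetical message layouts, mirroring have_1 / have_2:
      ("A", order_num, time, price, qty)   add a new order
      ("B", order_num, time, delete_qty)   remove qty from an existing order
    Returns rows (order_num, time, price, qty, id), id = 1-based message number.
    """
    book = {o.order_num: o for o in total}
    history = []
    for msg_id, msg in enumerate(messages, start=1):
        if msg[0] == "A":
            _, num, time, price, qty = msg
            book[num] = Order(num, time, price, qty)
        elif msg[0] == "B":
            _, num, _time, delete_qty = msg
            # only qty changes; the order keeps its original time
            book[num] = replace(book[num], qty=book[num].qty - delete_qty)
        # append a full snapshot of the book, tagged with this message's id
        for num in sorted(book):
            o = book[num]
            history.append((o.order_num, o.time, o.price, o.qty, msg_id))
    return history


if __name__ == "__main__":
    total = [Order(1, 34197, 11, 550), Order(2, 34198, 10.5, 450)]
    messages = [("A", 3, 34199, 10, 500), ("B", 2, 34200, 100)]
    for row in replay(total, messages):
        print(*row)
```

Run on the sample data, this prints the six rows of the desired final dataset. In SAS, the analogous step would be to append a copy of total (with id set) to a separate history data set after each update.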
Related
I am trying to identify traders who place transactions in the same month in each of three consecutive years in one company. Once a trader meets the criteria, these three transactions and all his subsequent transactions in that same month in that company should be identified.
Assume I have the sample data below.
data have;
input ID STOCK trandate :mmddyy10.; /* read as a SAS date so MONTH() and YEAR() work */
format trandate mmddyy10.;
datalines;
1 1 10/15/2009
1 1 01/01/2010
1 1 01/10/2011
1 1 01/15/2012
1 1 01/01/2013
1 2 01/30/2011
1 2 01/30/2012
1 2 01/30/2012
1 2 01/30/2013
1 2 01/30/2014
1 2 01/30/2015
2 1 01/20/2010
2 1 01/15/2011
2 1 01/16/2012
2 1 02/01/2013
2 2 02/01/2010
2 2 02/10/2011
2 2 02/10/2012
2 2 02/10/2013
2 2 02/10/2014
2 2 01/10/2015
;
run;
What I need:
ID Stock trandate type
1 1 10/15/2009 0
1 1 01/01/2010 1
1 1 01/10/2011 1
1 1 01/15/2012 1
1 1 01/01/2013 1
1 2 01/30/2011 1
1 2 01/30/2012 1
1 2 01/30/2012 1
1 2 01/30/2013 1
1 2 01/30/2014 1
1 2 01/30/2015 1
2 1 01/20/2010 0
2 1 01/15/2011 0
2 1 01/16/2012 0
2 1 02/01/2013 0
2 2 02/01/2010 1
2 2 02/10/2011 1
2 2 02/10/2012 1
2 2 02/10/2013 1
2 2 02/10/2014 1
2 2 01/10/2015 0
I used the following code to achieve this:
proc sort data=have;
by id stock trandate;
run;
data have;
set have;
month=month(trandate);
year=year(trandate);
run;
proc sort data=have;
by id stock month year;
run;
data have;
set have;
by id stock month year;
rungroup + (first.month or not first.month and year - lag(year) > 1);
run;
data temp;
do index = 1 by 1 until (last.rungroup);
set have;
by rungroup;
* distinct number of years in rungroup;
years_runlength = sum (years_runlength, first.rungroup or year ne lag(year));
end;
do index = 1 to index;
set have;
if years_runlength >=4 then output;
end;
run;
The code above identifies traders with transactions in three consecutive years. Since I also need the subsequent transactions of these traders, I then apply the following code:
proc sort data=temp;
by id stock rungroup;
run;
data temp;
set temp;
by rungroup;
if first.rungroup then fyear=year;
run;
data temp(drop=fyear rename=(Locf=fyear));
do until (last.stock);
set temp;
by id stock;
locf=coalesce(fyear,locf);
output;
end;
run;
data temp;
set temp;
by rungroup;
if first.rungroup then fmonth=month;
run;
data temp;
set temp;
gap=year-fyear;
run;
proc means data=temp;
var gap;
run;
data temp;
set temp;
if gap=3 then type2=1;
type1=1;
run;
The code above marks the first transaction after the three consecutive years. When these identified transactions are combined with the original dataset, all transactions in the same month that follow the marked transaction can then be identified. That way I can achieve the objective that "these three transactions and all his subsequent transactions in that same month in that company should be identified". The following code does this:
proc sort data=have;
by id stock rungroup;
run;
proc sort data=temp;
by id stock rungroup;
run;
data combine;
merge have temp;
by id stock rungroup;
run;
data combine;
set combine;
month=month(trandate);
run;
data combine1 (drop=fmonth rename=(Locf=fmonth));
do until (last.stock);
set combine;
by id stock;
locf=coalesce(fmonth,locf);
output;
end;
run;
data combine2 (drop=type2 rename=(Locf=type2));
do until (last.stock);
set combine1;
by id stock;
locf=coalesce(type2,locf);
output;
end;
run;
data combine2;
set combine2;
if month^=fmonth then type2=.;
run;
data combine2;
set combine2;
if type1=1 or type2=1 then type=1;
else type=0;
run;
I tried this code and the results look right, but I cannot be 100% sure. Additionally, as you can see, my code is relatively long and complex. Could anyone give me some suggestions about the code?
Here is a somewhat brute-force way. For this example I limited it to the years 2009 to 2015 that appear in your data, but you could expand the pattern to allow more years. You could use macro logic to generate the repetitive ("wallpaper") parts of the code.
First generate an array you can index by YEAR and MONTH and populate the variables with 1 when the month it represents has a trade. Then check if the series of values for the same month across the years ever has three 1's in a row. You can use two DOW loops to process the data. The first one populates the array and the second tests the array and sets the new flag variable.
data want ;
do until(last.stock) ;
set have ;
by id stock;
array months [1:12,2009:2015]
m1y2009-m1y2015 m2y2009-m2y2015 m3y2009-m3y2015 m4y2009-m4y2015
m5y2009-m5y2015 m6y2009-m6y2015 m7y2009-m7y2015 m8y2009-m8y2015
m9y2009-m9y2015 m10y2009-m10y2015 m11y2009-m11y2015 m12y2009-m12y2015
;
months[month(trandate),year(trandate)]=1;
end;
do until(last.stock);
set have;
by id stock;
select (month(trandate));
when (1) flag=0 ne index(cats(of m1y:),'111');
when (2) flag=0 ne index(cats(of m2y:),'111');
when (3) flag=0 ne index(cats(of m3y:),'111');
when (4) flag=0 ne index(cats(of m4y:),'111');
when (5) flag=0 ne index(cats(of m5y:),'111');
when (6) flag=0 ne index(cats(of m6y:),'111');
when (7) flag=0 ne index(cats(of m7y:),'111');
when (8) flag=0 ne index(cats(of m8y:),'111');
when (9) flag=0 ne index(cats(of m9y:),'111');
when (10) flag=0 ne index(cats(of m10y:),'111');
when (11) flag=0 ne index(cats(of m11y:),'111');
when (12) flag=0 ne index(cats(of m12y:),'111');
otherwise ;
end;
output;
end;
drop m: ;
run;
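The same two-pass idea (collect, per ID/STOCK/month, the years that have a trade, then test each row's month for a run of consecutive years) can be sketched in Python to check it against the sample data. `flag_same_month_runs`, `run_length` and `HAVE` are hypothetical names. One thing worth checking: the desired output marks ID 2 / STOCK 1 January 2010 to 2012 as 0, even though those are three consecutive Januaries, and the question's own code tests `years_runlength >=4`. With `run_length=4` (the equivalent of searching for '1111' in the SAS answer) this sketch reproduces the desired `type` column exactly, while `run_length=3` (the '111' search) flags those three rows as 1.

```python
from collections import defaultdict
from datetime import datetime

# Sample data from the question: (ID, STOCK, trandate)
HAVE = [
    (1, 1, "10/15/2009"), (1, 1, "01/01/2010"), (1, 1, "01/10/2011"),
    (1, 1, "01/15/2012"), (1, 1, "01/01/2013"),
    (1, 2, "01/30/2011"), (1, 2, "01/30/2012"), (1, 2, "01/30/2012"),
    (1, 2, "01/30/2013"), (1, 2, "01/30/2014"), (1, 2, "01/30/2015"),
    (2, 1, "01/20/2010"), (2, 1, "01/15/2011"), (2, 1, "01/16/2012"),
    (2, 1, "02/01/2013"),
    (2, 2, "02/01/2010"), (2, 2, "02/10/2011"), (2, 2, "02/10/2012"),
    (2, 2, "02/10/2013"), (2, 2, "02/10/2014"), (2, 2, "01/10/2015"),
]


def flag_same_month_runs(rows, run_length=3):
    """Return one 0/1 flag per row: 1 if that (id, stock) traded in the row's
    calendar month in at least run_length consecutive years."""
    parsed = [(i, s, datetime.strptime(d, "%m/%d/%Y")) for i, s, d in rows]

    # Pass 1: the set of years with a trade, per (id, stock, month),
    # i.e. one row of the month-by-year grid in the SAS answer
    years = defaultdict(set)
    for i, s, d in parsed:
        years[(i, s, d.month)].add(d.year)

    def has_run(ys):
        # a run starts at y if y, y+1, ..., y+run_length-1 all have trades
        return any(all(y + k in ys for k in range(run_length)) for y in ys)

    # Pass 2: flag each row from its own month's series
    return [int(has_run(years[(i, s, d.month)])) for i, s, d in parsed]


if __name__ == "__main__":
    print(flag_same_month_runs(HAVE, run_length=4))
```

Like the SAS answer, this flags every transaction in a qualifying month, including any that happen before the run starts.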
I am stuck on a problem where I have two tables, one at the monthly level and one at the weekly level. Here's the format of the tables:
Table1
Customer Date1 Sales
1 Jan2018 1110
1 Feb2018 1245
1 Mar2018 1320
1 Apr2018 1100
...
Table2
Customer Date2
1 01Jan2018
1 08Jan2018
1 15Jan2018
1 22Jan2018
1 29Jan2018
1 05Feb2018
1 12Feb2018
1 19Feb2018
1 26Feb2018
1 05Mar2018
...
I want to create a new column for sales in Table2 that will hold the disaggregated sales values from Table1. I want to divide each month's sales by the number of days in that month and then assign the values to the weeks accordingly. Thus the sales in week 01Jan2018 are (1110/31)*7. Weeks that span two months get values from both months. For example, the week of 29Jan2018 has 3 days in Jan2018 and 4 days in Feb2018. The sales of one day in Jan2018 are 1110/31 and the sales of one day in Feb2018 are 1245/28.
So the sales in week 29Jan2018 will be 3*(1110/31) + 4*(1245/28)
I want to do this for each distinct customer.
The resulting table should be
Result Table
Customer Date Sales
1 01Jan2018 250.6 i.e. (1110/31)*7
1 08Jan2018 250.6
1 15Jan2018 250.6
1 22Jan2018 250.6
1 29Jan2018 285.28 i.e. 3*(1110/31) + 4*(1245/28)
1 05Feb2018 311.25
1 12Feb2018 311.25
1 19Feb2018 311.25
1 26Feb2018 133.39 + 170.32
Thanks!
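In DATA step programming you will need 'forward' (lead) data instead of 'LAG' data. A forward value can be emulated by creating a view of the same data in which each record is re-keyed to the previous month (and the first record is dropped with if _n_ > 1, the same effect as firstobs=2), so that merging by month lines up each month with the next month's sales. Once the renaming semantics are understood, the rest is just some easy bookkeeping.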
data customer_months;
attrib Customer length=8 Date1 informat=monyy. format=monyy7.;
input Customer Date1 Sales;
datalines;
1 Jan2018 1110
1 Feb2018 1245
1 Mar2018 1320
1 Apr2018 1100
run;
* week data, also computing the month each week starts in;
data customer_weeks;
attrib Customer length=8 Date2 informat=date9. format=date9.;
input Customer Date2;
Date1 = intnx('month', Date2, 0);
datalines;
1 01Jan2018
1 08Jan2018
1 15Jan2018
1 22Jan2018
1 29Jan2018
1 05Feb2018
1 12Feb2018
1 19Feb2018
1 26Feb2018
1 05Mar2018
run;
* next month's sales, keyed on the prior month;
data customer_next_months_view / view=customer_next_months_view;
set customer_months;
Date1 = intnx('month',Date1,-1); * the month this record will be a forward for;
rename Sales=Sales_next_month;
if _n_ > 1;
run;
* merge original and forward data, rename for making clear the variable roles;
data combined;
length disag_sales 8;
merge
customer_months (rename=(Sales=Sales_this_month))
customer_next_months_view
customer_weeks
;
by Customer Date1; * per customer; all three inputs must be sorted by Customer Date1;
days_in_this_month = intck('day',intnx('month',Date1,0),intnx('month',Date1,1));
days_in_next_month = intck('day',intnx('month',Date1,1),intnx('month',Date1,2));
day_rate_this_month = Sales_this_month / days_in_this_month;
day_rate_next_month = Sales_next_month / days_in_next_month;
if Date2 then
if month(Date2) = month(Date2+6) then
week_days_this_month = 7;
else
week_days_this_month = intck('day', Date2, intnx('month', Date2, 1));
week_days_next_month = 7 - week_days_this_month;
dollars_this_week_this_month = week_days_this_month * day_rate_this_month;
dollars_this_week_next_month = week_days_next_month * day_rate_next_month;
* desired estimated disaggregated sales;
disag_sales = sum (dollars_this_week_this_month,dollars_this_week_next_month);
run;
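To double-check the expected numbers, here is a small Python sketch of the same disaggregation. Instead of the forward (next-month) view used above, it simply walks the seven days of each week and adds each day's share of its month's sales; for a week spanning two months this gives the same result as the this-month/next-month split. `disaggregate`, `monthly_sales` and `week_starts` are hypothetical names, and the sketch handles one customer at a time (call it once per customer).

```python
import calendar
from datetime import date, timedelta


def disaggregate(monthly_sales, week_starts):
    """Spread each month's sales evenly over its days and sum the seven
    daily amounts that fall into each week.

    monthly_sales: dict mapping (year, month) -> sales for one customer
    week_starts:   list of dates on which each 7-day week begins
    """
    result = []
    for start in week_starts:
        total = 0.0
        for offset in range(7):
            day = start + timedelta(days=offset)
            days_in_month = calendar.monthrange(day.year, day.month)[1]
            # raises KeyError if a month the week touches is missing
            total += monthly_sales[(day.year, day.month)] / days_in_month
        result.append((start, total))
    return result


if __name__ == "__main__":
    sales = {(2018, 1): 1110, (2018, 2): 1245, (2018, 3): 1320, (2018, 4): 1100}
    weeks = [date(2018, 1, 1) + timedelta(weeks=k) for k in range(10)]
    for start, amount in disaggregate(sales, weeks):
        print(start.strftime("%d%b%Y"), round(amount, 2))
```

With the sample data this gives 250.65 for 01Jan2018 and 285.28 for 29Jan2018, the latter being 3*(1110/31) + 4*(1245/28).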
How can I write the query below as a stored procedure in PostgreSQL?
create table data1 as
select A.*,
case when score >=940 then 1
when score between 600 and 746 then 2
when bureau_score between 599 and 630 then 4 else 5 end as score_level,
case when band between -1 and 5 then 1
when band between 6 and 20 then 2
when band between 21 and 35 then 3 else 4 end as band_level
from data A;
Before version 11, PostgreSQL didn't have stored procedures as such, only functions (PostgreSQL 11 added CREATE PROCEDURE, invoked with CALL).
If it's plain SQL, you can simply wrap it in an SQL function definition.
create or replace function foo () returns void language sql as $$
create table data1 as
select A.*,
case when score >=940 then 1
when score between 600 and 746 then 2
when bureau_score between 599 and 630 then 4 else 5 end as score_level,
case when band between -1 and 5 then 1
when band between 6 and 20 then 2
when band between 21 and 35 then 3 else 4 end as band_level
from data A;
$$;
To call it, run SELECT foo();
I have the following dataset (items) with transactions on a given date and an amount paid on the next business day.
The amount paid for each id on the next business day is $10 for each transaction whose rate is > 5.
My task is to check that the number of instances with rate > 5 is in line with the amount paid on the next business day (the payment row has the standard code 121).
For instance, there are four instances with rate > 5 on 4/14/2017, and the amount $40 (4*10) is paid on 4/17/2017.
Date id rate code batch
4/14/2017 1 12 100 A1
4/14/2017 1 2 101 A1
4/14/2017 1 13 101 A1
4/14/2017 1 10 100 A1
4/14/2017 1 10 100 A1
4/17/2017 1 40 121
4/20/2017 2 12 100 A1
4/20/2017 2 2 101 A1
4/20/2017 2 3 101 A1
4/20/2017 2 10 100 A1
4/20/2017 2 10 100 A1
4/21/2017 2 30 121
My code
proc sql;
create table items2 as select
count(id) as id_count,
(case when code='121' then rate/10 else 0 end) as rate_count
from items
group by date,id;
quit;
This has not produced the desired result. The challenge I have is matching the transaction dates (4/14/2017 and 4/20/2017) to the next-business-day dates (4/17/2017 and 4/21/2017).
Appreciate your help.
The LAG function will do the trick here: lagged values let us build the condition we want without having to use the rate > 5 condition.
Here is the solution:
Data items;
set items;
Lag_dt=LAG(Date);
Lag_id=LAG(id);
/* a payment row (code 121) for the same id on a later date:
   $10 is paid per transaction with rate > 5, so amount/10 = implied count */
if ((id=lag_id) and (code=121) and (Date>lag_dt)) then rate_count=rate/10;
else rate_count=0;
Drop lag_dt lag_id;
run;
Hope this helps.
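The LAG step gives the implied count on each payment row, but the comparison with the actual number of rate > 5 transactions is still left to do. As a sketch of the full check (in Python, using a different technique from LAG: remembering each id's most recent transaction date and counting rate > 5 rows there), assuming rows are in date order within id and each payment follows its transactions. `reconcile`, `ITEMS` and `FEE_PER_TRANSACTION` are hypothetical names; the batch column is omitted.

```python
from datetime import datetime

FEE_PER_TRANSACTION = 10  # $10 per transaction with rate > 5, per the question

# Sample rows from the question: (Date, id, rate, code); on code 121 rows
# the "rate" column holds the amount paid.
ITEMS = [
    ("4/14/2017", 1, 12, 100), ("4/14/2017", 1, 2, 101), ("4/14/2017", 1, 13, 101),
    ("4/14/2017", 1, 10, 100), ("4/14/2017", 1, 10, 100), ("4/17/2017", 1, 40, 121),
    ("4/20/2017", 2, 12, 100), ("4/20/2017", 2, 2, 101), ("4/20/2017", 2, 3, 101),
    ("4/20/2017", 2, 10, 100), ("4/20/2017", 2, 10, 100), ("4/21/2017", 2, 30, 121),
]


def reconcile(rows):
    """For each payment row, compare count(rate > 5) * fee on the id's most
    recent earlier transaction date with the amount paid.
    Returns (id, txn_date, payment_date, count, amount, matches) tuples."""
    counts = {}    # (id, txn date) -> number of rate > 5 transactions
    last_txn = {}  # id -> most recent transaction date seen so far
    report = []
    for d, ident, rate, code in rows:
        day = datetime.strptime(d, "%m/%d/%Y").date()
        if code == 121:
            txn_day = last_txn.get(ident)
            n = counts.get((ident, txn_day), 0)
            report.append((ident, txn_day, day, n, rate,
                           n * FEE_PER_TRANSACTION == rate))
        else:
            last_txn[ident] = day
            counts.setdefault((ident, day), 0)
            if rate > 5:
                counts[(ident, day)] += 1
    return report


if __name__ == "__main__":
    for row in reconcile(ITEMS):
        print(row)
```

On the sample rows both payments match: four qualifying transactions for id 1 ($40) and three for id 2 ($30).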
I am working with medications in a claims database. To make this easier to understand, let's take the following as an example:
dataset 1
patient_id dx1
1 224
2 323
3 432
4 423
dataset 2
patient_id date med_id
1 10/12/2005 54678
1 01/2/2005 09849
1 05/04/2004
1
2
2
2
3
4
4
4
4
My question is about merging the two datasets. The first has one observation per id; the second can have anywhere from 1 to 200 or more per id. What is the best way to combine the two datasets? Would you transpose before joining them?
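This will be a full outer join: no row is dropped from either side. A one-to-many MERGE with BY handles this directly, repeating the d1 values for each matching d2 row, so there is no need to transpose first.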
Proc sort data=d1 ;
By patient_id ;
Run ;
Proc sort data=d2 ;
By patient_id ;
Run ;
Data d3 ;
Merge d1 d2 ;
By patient_id ;
Run ;
If you want a left outer join (all rows from d1, plus their matches, if any, from d2), use this DATA step instead.
Data d3 ;
Merge d1 (in=in1) d2 ;
By patient_id ;
If in1 ;
Run ;
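To illustrate why no transpose is needed, here is a rough Python analogue of the one-to-many MERGE ... BY above; it is a simplification, not an exact model of SAS merge semantics. `merge_by_key`, `D1` and `D2` are hypothetical names; the patient 5 row in `D2` is made up solely to show the difference between the full and the left join.

```python
def merge_by_key(d1, d2, key, left_only=False):
    """Rough analogue of a one-to-many DATA step MERGE d1 d2; BY key;

    d1 has at most one row per key, d2 may have many. Each d2 row is combined
    with its d1 row (d1 values repeated). left_only=True mimics IF in1;.
    A key with no d2 rows yields one row carrying only d1's fields.
    """
    d1_rows = {r[key]: r for r in d1}
    d2_rows = {}
    for r in d2:
        d2_rows.setdefault(r[key], []).append(r)
    out = []
    for k in sorted(set(d1_rows) | set(d2_rows)):
        base = d1_rows.get(k)
        if left_only and base is None:       # key only in d2: skip, like IF in1;
            continue
        for match in d2_rows.get(k, [{}]):   # no d2 rows -> one d1-only row
            out.append({key: k, **(base or {}), **match})  # d2 wins on overlap
    return out


D1 = [{"patient_id": 1, "dx1": 224}, {"patient_id": 2, "dx1": 323},
      {"patient_id": 3, "dx1": 432}, {"patient_id": 4, "dx1": 423}]
D2 = [
    {"patient_id": 1, "date": "10/12/2005", "med_id": "54678"},
    {"patient_id": 1, "date": "01/2/2005", "med_id": "09849"},
    {"patient_id": 5, "date": None, "med_id": None},  # hypothetical, no match in D1
]

if __name__ == "__main__":
    for row in merge_by_key(D1, D2, "patient_id"):
        print(row)
```

The full merge yields six rows (two for patient 1, with dx1 repeated); the left join drops the unmatched patient 5 row.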