I am dealing with medications in a claims database. To make this easier to understand, let's take the following as an example:
Dataset 1:
patient_id dx1
1 224
2 323
3 432
4 423
Dataset 2:
patient_id date med_id
1 10/12/2005 54678
1 01/2/2005 09849
1 05/04/2004
1
2
2
2
3
4
4
4
4
My question is about merging the two datasets. The first one has one observation per id; the second can have anywhere from 1 to 200 or more per id. What is the best way to combine the two? Would you transpose before joining the datasets?
There is no need to transpose; a match-merge handles the one-to-many join directly. This will be a full outer join - no row will be deleted from either side.
/* sort both datasets by the key before merging */
proc sort data=d1;
  by patient_id;
run;

proc sort data=d2;
  by patient_id;
run;

/* one-to-many match-merge: full outer join */
data d3;
  merge d1 d2;
  by patient_id;
run;
If you want a left outer join - all rows from d1 and only their match, if any, from d2 - then use this data step instead.
data d3;
  merge d1 (in=in1) d2;
  by patient_id;
  if in1;   /* keep only observations present in d1 */
run;
I am interested in generating completely randomized ("damaged") data, where observations are selected randomly (with replacement) for each field and then combined. I will need to generate a new dummy id to represent the old id, as I don't want the original records to be reconstructable. My goal is to create a simulated, column-wise random dataset.
Here is a sample data:
Id Col1 Col2 Col3
11 A 0.01 David
12 B 0.04 Max
13 C 0.05 Tom
14 E 0.06 West
15 C 0.02 Mike
What I am interested in is something like this:
Id2 Col1 Col2 Col3
1 E 0.04 Mike
2 C 0.06 David
3 B 0.02 West
4 A 0.04 Tom
5 C 0.05 Max
I am looking for an organized way of doing this. Here is what I have attempted so far, but I am not keen on repeating it over and over, since I have a lot of columns in the real data.
proc sql;
create table newtable1 as
select monotonic() as id2, col1 from
(select col1 from Table1 order by ranuni(0));
quit;
Using code like the above, you generate each randomly shuffled column separately and then combine them by the new monotonic key.
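If you have many columns, one way to avoid repeating that step by hand is to wrap it in a macro. Below is a minimal sketch along those lines (the macro name, the temporary dataset names, and the example call are mine, not from your code). Note that, like your PROC SQL attempt, it permutes each column independently rather than sampling with replacement; each shuffled column gets the new key id2, and the pieces are then merged by that key.

%macro scramble(ds=, out=, vars=);
  %local i var nvars;
  %let nvars = %sysfunc(countw(&vars));

  %do i = 1 %to &nvars;
    %let var = %scan(&vars, &i);

    /* attach a random sort key to this one column */
    data _tmp;
      set &ds(keep=&var);
      _rnd = ranuni(0);
    run;

    proc sort data=_tmp;
      by _rnd;
    run;

    /* assign the new dummy id in the shuffled order */
    data _shuf&i;
      set _tmp(drop=_rnd);
      id2 = _n_;
    run;
  %end;

  /* combine the independently shuffled columns by the new key */
  data &out;
    merge _shuf1-_shuf&nvars;
    by id2;
  run;
%mend scramble;

%scramble(ds=Table1, out=newtable, vars=Col1 Col2 Col3);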
I have to calculate the date difference between the first date at time = 0 and the dates after it. I also have one factor variable with 2 categories: one and two.
For example, here is the data:
A B TIME
10/11/2016 one T0
17/11/2016 two T0
05/01/2017 one T1
28/02/2017 two T1
06/07/2017 one T2
05/09/2017 two T2
I would like to calculate the difference between T0 and the dates for B="one" and B="two" in order to obtain :
DIFF
0
0
56
103
238
292
The differences are calculated as follows:
56 = T1-T0 for "one" = 05/01/2017 - 10/11/2016
103 = T1-T0 for "two" = 28/02/2017 - 17/11/2016
238 = T2-T0 for "one" = 06/07/2017 - 10/11/2016
292 = T2-T0 for "two" = 05/09/2017 - 17/11/2016
Could you help me do it in SAS?
Thanks a lot.
One way is to pull out the TIME='T0' records and merge them back with the other records.
First let's convert your table into a dataset.
data have ;
input b $ Time $ date :yymmdd.;
format date yymmdd10.;
cards;
one T0 2016-11-10
two T0 2016-11-17
one T1 2017-01-05
two T1 2017-02-28
one T2 2017-07-06
two T2 2017-09-05
;
Now let's re-order it so that we can merge by the grouping variable, B.
proc sort ;
by b time ;
run;
Here is a way to merge the data with itself.
data want ;
  /* merge the non-T0 rows with the T0 row for the same B */
  merge have(where=(time ne 'T0'))
        have(keep=time b date rename=(time=time0 date=date0)
             where=(time0='T0'));
  by b ;
  diff = date - date0;   /* days elapsed since the T0 date */
  drop time0;
run;
Results:
Obs b Time date date0 diff
1 one T1 2017-01-05 2016-11-10 56
2 one T2 2017-07-06 2016-11-10 238
3 two T1 2017-02-28 2016-11-17 103
4 two T2 2017-09-05 2016-11-17 292
There are of course several ways to do this. Below are two alternatives. The first selects the earliest A for each B and joins it back to the original data in an SQL step. The second uses a DATA step with BY groups: the first A within each B is saved as firsttime and retained so it can be used to calculate the difference.
data test;
input A ddmmyy10. @12 B $3.;
format A ddmmyy10.;
datalines;
10/11/2016 one
17/11/2016 two
05/01/2017 one
28/02/2017 two
06/07/2017 one
05/09/2017 two
;
/* Alt 1*/
proc sql;
  create table test2 as
  select t1.*, t1.A - t2.A as time
  from test as t1
    left join (select B, min(A) as A from test group by 1) as t2
      on t1.B = t2.B
  order by A;
quit;
/* Alt 2*/
proc sort data=test;
by B A;
run;
data test3;
set test;
by B;
retain firsttime;
if first.B then firsttime=A;
time=A-firsttime;
drop firsttime;
run;
I have the following dataset (items) with transactions on various dates and the amount paid on the next business day.
The amount paid for each id on the next business day is $10 per transaction whose rate is > 5.
My task is to check whether the number of instances where rate > 5 is in line with the amount paid on the next business day (the payment row has the standard code 121).
For instance, there are four instances with rate > 5 on 4/14/2017 - the amount $40 (4*10) is paid on 4/17/2017.
Date id rate code batch
4/14/2017 1 12 100 A1
4/14/2017 1 2 101 A1
4/14/2017 1 13 101 A1
4/14/2017 1 10 100 A1
4/14/2017 1 10 100 A1
4/17/2017 1 40 121
4/20/2017 2 12 100 A1
4/20/2017 2 2 101 A1
4/20/2017 2 3 101 A1
4/20/2017 2 10 100 A1
4/20/2017 2 10 100 A1
4/21/2017 2 30 121
My code
proc sql;
create table items2 as select
count(id) as id_count,
(case when code='121' then rate/10 else 0 end) as rate_count
from items
group by date,id;
quit;
This has not yielded the desired result, and the challenge I have here is matching the transaction dates (4/14/2017 and 4/20/2017) with the next-business-day dates (4/17/2017, 4/21/2017).
Appreciate your help.
The LAG function will do the trick here, since we can use lagged values to build the condition we want without having to use the rate > 5 condition directly.
Here is the solution:
data items;
  set items;
  /* lagged values from the previous observation */
  lag_dt   = lag(date);
  lag_id   = lag(id);
  lag_rate = lag(rate);
  /* on a payment row (code 121) for the same id on a later date,
     divide the paid amount (stored in rate) by the previous row's rate */
  if (id = lag_id) and (code = 121) and (date > lag_dt) then rate_count = rate / lag_rate;
  else rate_count = 0;
  drop lag_dt lag_id lag_rate;
run;
Hope this helps.
I have a UDF which returns a table variable, like:
--
--
RETURNS @ElementTable TABLE
(
ElementID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
ElementValue VARCHAR(MAX)
)
AS
--
--
Is the order of data in this table variable guaranteed to be the same as the order the data is inserted into it? E.g., if I issue
INSERT INTO @ElementTable(ElementValue) VALUES ('1')
INSERT INTO @ElementTable(ElementValue) VALUES ('2')
INSERT INTO @ElementTable(ElementValue) VALUES ('3')
I expect the data will always be returned in that order when I say
select ElementValue from @ElementTable -- here I don't use ORDER BY
EDIT:
If the order is not guaranteed, then the following query
SELECT T1.ElementValue, T2.ElementValue
FROM dbo.MyFunc() T1
CROSS APPLY dbo.MyFunc() T2
ORDER BY T1.ElementID
will not consistently produce the 3x3 cross product
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
Is there any possibility that it could be like
1 2
1 1
1 3
2 3
2 2
2 1
3 1
3 2
3 3
How can I do this using my function above?
No, the order is not guaranteed to be the same.
Unless, of course, you are using ORDER BY. Then it is guaranteed to be the same.
Given your update, you obtain it in the obvious way - you ask the system to give you the results in the order you want:
SELECT T1.ElementValue, T2.ElementValue
FROM dbo.MyFunc() T1
CROSS JOIN dbo.MyFunc() T2
ORDER BY T1.ElementID, T2.ElementID
You are guaranteed that, if you're using inefficient single-row inserts within your UDF, the IDENTITY values will match the order in which the individual INSERT statements were specified.
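For illustration only, a hypothetical complete version of such a function (the BEGIN/END body here is an assumption; only the names come from the question) might look like this:

CREATE FUNCTION dbo.MyFunc()
RETURNS @ElementTable TABLE
(
    ElementID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ElementValue VARCHAR(MAX)
)
AS
BEGIN
    /* Each single-row INSERT takes the next IDENTITY value, so ElementID
       reflects the order in which these statements are written. */
    INSERT INTO @ElementTable(ElementValue) VALUES ('1');
    INSERT INTO @ElementTable(ElementValue) VALUES ('2');
    INSERT INTO @ElementTable(ElementValue) VALUES ('3');
    RETURN;
END;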
Order is not guaranteed.
But if all you want is simply to get your records back in the same order you inserted them, then just order by your primary key. Since you already have that field set up as an auto-increment (IDENTITY), it should suffice.
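For example, something along these lines (a minimal sketch, using the ElementID and ElementValue names from your table definition):

SELECT ElementValue
FROM dbo.MyFunc() AS T
ORDER BY ElementID;   -- the IDENTITY primary key reflects insertion order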
...or use a deterministic function
SELECT TOP 9
    M1 = (ROW_NUMBER() OVER (ORDER BY id) + 2) / 3,
    M2 = (ROW_NUMBER() OVER (ORDER BY id) + 2) % 3 + 1
FROM sysobjects
M1 M2
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
I want to add a row number to my records that counts within a fixed range, cycling 1, 2, 3 and then starting over. Example output:
RowNumber ID Name
1 20 a
2 21 b
3 22 c
1 23 d
2 24 e
3 25 f
1 26 g
2 27 h
3 28 i
1 29 j
2 30 k
I would rather use ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...), but my real records do not contain a column I could partition by to get the 1-3 row numbers.
I have already tried looping over each record to insert a row count of 1-3, but the loop hurts the performance of the query. The query will be used for an RDL report, which is why the performance needs to be as good as possible.
Any suggestions are welcome. Thanks.
Have you tried modulo-ing ROW_NUMBER()?
SELECT
    ((ROW_NUMBER() OVER (ORDER BY ID) - 1) % 3) + 1 AS RowNumber,
    ID,
    Name
FROM YourTable   -- YourTable stands in for your actual table name