SAS merge multiple columns with same data

I need help with a merge. I have two tables, shown below:
Table 1
ID ID1 ID2 ID3 ID4 ID5
1005 2005 3005 4005 5005 7105
3005 4005 5005 7105
4005 5005 7105
5005 7105
2005 3005 4005 5005 7105
7105
Table 2
ID Names
1005 John
3005 Rick
4005 Sam
5005 Harry
2005 Mary
7105 Deena
I need an efficient way to merge the columns in Table 1 with Table 2. I can merge in separate data steps, but is there a more efficient way to do it?
proc sql;
create table merge1 as
select *
from table1 a
left join table2 b on a.id = b.id;
quit;
proc sql;
create table merge2 as
select *
from merge1 a
left join table2 b on a.id1 = b.id;
quit;
The result I want, with all columns (example below):
ID NamesID ID1 NamesID1 ID2 NamesID2 ID3
1005 John 2005 Mary 3005 Rick 4005
3005 Rick 4005 Sam 5005 Harry 7105
4005 Sam 5005 Harry 7105 Deena
5005 Harry 7105 Deena
2005 Mary 3005 Rick 4005 Sam 5005
7105 Deena
Thanks!

Here is the format-based solution:
data table1;
length id id1 id2 id3 id4 id5 8;
infile datalines missover;
input id id1 id2 id3 id4 id5;
cards;
1005 2005 3005 4005 5005 7105
3005 4005 5005 7105
4005 5005 7105
5005 7105
2005 3005 4005 5005 7105
7105
;
run;
data table2;
length id 8 names $ 10;
input id names;
cards;
1005 John
3005 Rick
4005 Sam
5005 Harry
2005 Mary
7105 Deena
;
run;
* Create a CNTLIN data set defining the required format;
data fmt_in;
set table2;
fmtname = 'names';
start = id;
label = names;
run;
* Run PROC FORMAT to generate the format from the CNTLIN data set;
proc format cntlin=fmt_in;
run;
* Apply the format to the input data set;
data out;
set table1;
namesID = put(id, names.);
namesID1 = put(id1, names.);
namesID2 = put(id2, names.);
namesID3 = put(id3, names.);
namesID4 = put(id4, names.);
namesID5 = put(id5, names.);
run;
This will be very efficient for large inputs because it doesn't require multiple sorts. In general, of course, your input data set table1 should be normalised to be tall and thin so that there is only one column holding IDs; that would have made the merge-based solution trivial, though probably still slower than using a format.
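For readers who want to see the lookup idea outside SAS, here is a minimal Python sketch. It is an illustration only, not the SAS format mechanism itself: the dict `names` plays the role of the format built from table2, and the helper names `add_names` and `to_tall` are hypothetical. The second helper shows the tall, thin layout mentioned above, using a shortened copy of table1.

```python
# Lookup table built once from table2 (the role the SAS format plays).
names = {1005: "John", 3005: "Rick", 4005: "Sam",
         5005: "Harry", 2005: "Mary", 7105: "Deena"}

# A few wide rows as in table1; None marks a missing ID.
table1 = [
    (1005, 2005, 3005, 4005, 5005, 7105),
    (3005, 4005, 5005, 7105, None, None),
    (7105, None, None, None, None, None),
]

def add_names(row):
    """Pair every ID in a wide row with its name, one dict lookup per ID."""
    return [(i, names.get(i)) for i in row]

def to_tall(rows):
    """Reshape wide rows into (row_number, column_position, id) records."""
    return [(r, c, i)
            for r, row in enumerate(rows)
            for c, i in enumerate(row)
            if i is not None]

wide_result = [add_names(row) for row in table1]
tall = to_tall(table1)
```

No sorting is involved in either step: each ID costs one hash lookup, which is the same reason the format approach avoids repeated sorts and joins.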

Related

Combine two rows into one rows in Sybase ASE

I have a resultset as below:
id fname lname
11 Tom Jerry
11 Kim Harry
The output I expect is below:
id fname lname
11 Tom,Kim Jerry,Harry
Appreciate your help.
thank you.

SAS: Separate date_from & date_to into separate lines

I've got an example like this:
data date_table;
stop;
length id $32.;
length name $32.;
length date_from date_to 8.;
format date_from date_to datetime19.;
run;
proc sql;
insert into date_table
values ('1', 'Mark', '13Jun2019 08:39:00'dt, '13Jun2019 11:39:00'dt)
values ('2', 'Bart', '13Jun2019 13:39:00'dt, '13Jun2019 17:39:00'dt);
quit;
I need some smart join (maybe with a separate hour mapping table) to achieve something like this:
What I've tried so far is a mapping table
and a join like:
proc sql;
create table testing as
select t1.id,
t1.name,
t1.date_from,
t1.date_to
from DATE_TABLE t1 inner join
WORK.CAL_TIME t2 on t1.date_from >= t2.Time and
t1.date_to <= t2.Time;
quit;
But of course the result is an empty table, because the dates don't want to join. I could truncate date_from and date_to to full hours, but such a join still doesn't work.
Help.
It looks like you are comparing apples (DATETIME) with oranges (TIME). The orders of magnitude of those numbers are totally different.
data _null_;
dt = '13Jun2019 08:39:00'dt ;
tm = '08:00't ;
put (dt tm) (=comma20.);
run;
dt=1,876,034,340 tm=28,800
You probably just want to compare the time of day part of your datetime values to your time values. Also round your start times down and your end times up to the hour.
data date_table;
length id name $32 date_from date_to 8;
format date_from date_to datetime19.;
input id name (date:) (:datetime.);
cards;
1 Mark 13Jun2019:08:39:00 13Jun2019:11:39:00
2 Bart 13Jun2019:13:39:00 13Jun2019:17:39:00
;
data cal_time;
do time='08:00't to '21:00't by '01:00't ;
output;
end;
format time time5.;
run;
proc sql;
create table testing as
select t1.id
, t1.name
, max(t1.date_from,dhms(datepart(t1.date_from),0,0,t2.time))
as datetime_from format=datetime19.
, min(t1.date_to,dhms(datepart(t1.date_to),0,0,t2.time+'01:00't))
as datetime_to format=datetime19.
, t2.time
from DATE_TABLE t1
inner join WORK.CAL_TIME t2
on t2.time between intnx('hour',timepart(t1.date_from),0,'b')
and intnx('hour',timepart(t1.date_to),0,'e')
;
quit;
Result
Obs id name datetime_from datetime_to time
1 1 Mark 13JUN2019:08:39:00 13JUN2019:09:00:00 8:00
2 1 Mark 13JUN2019:09:00:00 13JUN2019:10:00:00 9:00
3 1 Mark 13JUN2019:10:00:00 13JUN2019:11:00:00 10:00
4 1 Mark 13JUN2019:11:00:00 13JUN2019:11:39:00 11:00
5 2 Bart 13JUN2019:13:39:00 13JUN2019:14:00:00 13:00
6 2 Bart 13JUN2019:14:00:00 13JUN2019:15:00:00 14:00
7 2 Bart 13JUN2019:15:00:00 13JUN2019:16:00:00 15:00
8 2 Bart 13JUN2019:16:00:00 13JUN2019:17:00:00 16:00
9 2 Bart 13JUN2019:17:00:00 13JUN2019:17:39:00 17:00
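The same hourly split can be sketched outside SAS. The following Python snippet is only an illustration under stated assumptions: the function name `split_by_hour` is hypothetical, and unlike the query above it does not use a calendar table, so it is not limited to the 08:00 to 21:00 window.

```python
from datetime import datetime, timedelta

def split_by_hour(start, end):
    """Yield (segment_start, segment_end) pairs cut at full-hour boundaries."""
    cursor = start
    while cursor < end:
        # First full hour strictly after the cursor.
        next_hour = (cursor.replace(minute=0, second=0, microsecond=0)
                     + timedelta(hours=1))
        # The last segment stops at the interval end, not at the hour.
        segment_end = min(next_hour, end)
        yield cursor, segment_end
        cursor = segment_end

# Mark's interval from the example data.
mark = list(split_by_hour(datetime(2019, 6, 13, 8, 39),
                          datetime(2019, 6, 13, 11, 39)))
```

For Mark this produces four segments, matching the first four rows of the result above.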

Using self joins sql query

I have a table, TEmployee, with the attributes SequenceId, Date, EmplId, ExtnNumber and FName. SequenceId is unique, and there can be multiple entries for the same EmplId, like:
1 1/1/2014 55323 8793 Ryan
2 1/2/2014 83723 9898 Roy
3 1/1/2014 88838 8823 Mark
4 1/2/2014 83723 9832 Roy
5 1/3/2014 32323 2223 Tina
6 1/1/2014 55323 8744 Ryan
select * from TEmployee where EmplId in ('55323','83723') with ur lists the following:
1 1/1/2014 55323 8793 Ryan
2 1/2/2014 83723 9898 Roy
4 1/2/2014 83723 9832 Roy
6 1/1/2014 55323 8744 Ryan
But I want only the latest entry for each EmplId to be displayed. By latest I mean the highest SequenceId, so only entries 4 & 6.
Any pointers would be of good help. Thanks in Advance.
@Jimmy Smith beat me to it with a correct answer, but mine is a correlated subselect, so there is no need to repeat the EMPLID IN ('55323', '83723') part.
SELECT *
FROM TEMPLOYEE AS A
WHERE EMPLID IN ('55323', '83723')
AND SEQUENCEID = (
SELECT MAX(SEQUENCEID)
FROM TEMPLOYEE AS B
WHERE A.EMPLID = B.EMPLID
)
WITH UR
One method may be via a subquery,
select * from TEmployee where EmplId in ('55323', '83723') and SequenceId in (select max(SequenceId) from TEmployee where EmplId in ('55323', '83723') group by EmplId)
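To make the "latest row per EmplId" logic concrete, here is a small Python sketch over the sample rows. It is an illustration rather than a replacement for the SQL; the function name `latest_per_employee` is hypothetical.

```python
# Sample rows: (SequenceId, Date, EmplId, ExtnNumber, FName).
rows = [
    (1, "1/1/2014", "55323", "8793", "Ryan"),
    (2, "1/2/2014", "83723", "9898", "Roy"),
    (3, "1/1/2014", "88838", "8823", "Mark"),
    (4, "1/2/2014", "83723", "9832", "Roy"),
    (5, "1/3/2014", "32323", "2223", "Tina"),
    (6, "1/1/2014", "55323", "8744", "Ryan"),
]

def latest_per_employee(rows, wanted):
    """Keep, for each wanted EmplId, the row with the highest SequenceId."""
    latest = {}
    for row in rows:
        seq, empl_id = row[0], row[2]
        if empl_id not in wanted:
            continue
        if empl_id not in latest or seq > latest[empl_id][0]:
            latest[empl_id] = row
    # Sort by SequenceId so the output order is predictable.
    return sorted(latest.values())

result = latest_per_employee(rows, {"55323", "83723"})
```

The key point, as in the correlated subselect, is that the maximum is taken per EmplId and not over the whole filtered set.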

How to Count consecutive dates

I'm looking for a query in T-SQL to count the number of consecutive dates, going backwards, where the pop is the same, starting at the latest date and stopping when there is a gap in the dates.
This is an example of the data:
Name village Population Date
Rob village1 432 01/07/2013
Rob village2 432 30/06/2013
Rob village3 432 29/06/2013
Rob village3 432 28/06/2013
Rob village3 432 27/06/2013
Rob village3 430 26/06/2013
Rob village3 430 25/06/2013
Rob village3 430 24/06/2013
Rob village3 430 23/06/2013
Rob village3 425 22/06/2013
Rob village3 422 21/06/2013
Rob village3 422 20/06/2013
Rob village3 411 19/06/2013
Harry Village1 123 01/07/2013
Harry Village2 123 30/06/2013
Harry Village3 122 29/06/2013
Pat Village1 123 01/07/2013
Pat Village2 123 30/06/2013
Pat Village3 123 29/06/2013
Pat Village4 100 20/06/2013
Tom Village1 123 01/07/2013
Tom Village2 123 30/06/2013
Tom Village3 123 29/06/2013
Tom Village4 123 28/06/2013
I would expect to get the following results:
Rob 5
Harry 2
Pat 3
Tom 3
The real data will be more complex: there will be thousands of rows, hundreds per person, and several groups of pop with consecutive dates, but I only want the first set of consecutive dates with the same pop, from the latest date downwards.
with dd as
(
select distinct * from table
)
select name, max(count) + 1
from
(
select t1.name, t1.village, t1.pop, count(*) as count
from dd t1
join dd t2
on t2.village = t1.village
and t2.pop = t1.pop
and t2.date = dateadd(day,-1,t1.date)
group by t1.name, t1.village, t1.pop
) dates
group by name
;with a as
(
select name, village, population, date, cast(date as datetime) + dense_rank() over(partition by Population, name order by date desc) grp
from <your table>
), b as
(
select name, village, population, date, dense_rank() over (partition by name order by grp desc) rk
from a
)
select name, count(distinct date) from b
where rk = 1
group by name
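The rule in the question (start at the latest date, walk backwards, stop at the first gap or change in pop) can also be written procedurally. The Python sketch below is one reading of that rule, not a translation of either query; the function name `latest_streak` is hypothetical, and it handles one person's rows at a time.

```python
from datetime import date, timedelta

def latest_streak(records):
    """Count rows in the most recent run of consecutive days with equal pop.

    records: iterable of (population, day) pairs for a single person.
    """
    # Drop exact duplicates, then order from the latest day backwards.
    ordered = sorted(set(records), key=lambda r: r[1], reverse=True)
    if not ordered:
        return 0
    count = 1
    for (prev_pop, prev_day), (pop, day) in zip(ordered, ordered[1:]):
        # Stop at the first change in population or gap in the dates.
        if pop != prev_pop or prev_day - day != timedelta(days=1):
            break
        count += 1
    return count

harry = [(123, date(2013, 7, 1)), (123, date(2013, 6, 30)),
         (122, date(2013, 6, 29))]
pat = [(123, date(2013, 7, 1)), (123, date(2013, 6, 30)),
       (123, date(2013, 6, 29)), (100, date(2013, 6, 20))]

harry_count = latest_streak(harry)
pat_count = latest_streak(pat)
```

With the sample rows for Harry and Pat this gives 2 and 3, the values listed in the question.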

To calculate min and max of record for huge volume of record

My problem is to get the maximum and minimum date for ABC. I have to do this for around 200 000 records, which takes a long time. How can I improve the performance?
ROW_ID DATE C value
----------------------------------------------
1 2012-08-01 00:00:00.0 ABC 87
2 2012-09-01 00:00:00.0 ABC 87
3 2012-10-01 00:00:00.0 ABC 87
4 2012-11-01 00:00:00.0 ABC 87
5 2012-12-01 00:00:00.0 ABC 87
6 2013-01-01 00:00:00.0 CBA 87
7 2013-02-01 00:00:00.0 ABC 87
8 2013-03-01 00:00:00.0 ABC 87
You should be able to do this easily using something like:
select c,
min(date) min_date,
max(date) max_date
from yt
where c='ABC'
group by c;
See SQL Fiddle with Demo.
Edit: since you are attempting to use this data to update another table in Sybase, you have a few options. Sybase does not allow derived tables in UPDATE statements, so I would suggest using a temp table to get the min/max date for each c and then using this table in your UPDATE with JOIN:
select c,
min(date) min_date,
max(date) max_date
into #temp2
from yt
where c='ABC'
group by c;
update t
set t.min_date = t1.min_date,
t.max_date = t1.max_date
from temp t
inner join #temp2 t1
on t.c = t1.c;
See SQL Fiddle with Demo
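The two steps above (aggregate once, then apply the result to the target table) can be sketched in Python as follows. This is an illustration only: `date_range_by_key` is a hypothetical helper, the `target` list stands in for the table being updated, and the dates are kept as ISO strings, which sort correctly as plain text.

```python
# Sample rows: (date, c), dates as ISO strings.
rows = [
    ("2012-08-01", "ABC"), ("2012-09-01", "ABC"), ("2012-10-01", "ABC"),
    ("2012-11-01", "ABC"), ("2012-12-01", "ABC"), ("2013-01-01", "CBA"),
    ("2013-02-01", "ABC"), ("2013-03-01", "ABC"),
]

def date_range_by_key(rows):
    """Single pass over the rows: track (min_date, max_date) per value of c."""
    ranges = {}
    for day, c in rows:
        lo, hi = ranges.get(c, (day, day))
        ranges[c] = (min(lo, day), max(hi, day))
    return ranges

# Step 1: the aggregation (the role of the temp table).
ranges = date_range_by_key(rows)

# Step 2: apply it to the target rows (the role of the UPDATE with JOIN).
target = [{"c": "ABC", "min_date": None, "max_date": None}]
for rec in target:
    if rec["c"] in ranges:
        rec["min_date"], rec["max_date"] = ranges[rec["c"]]
```

Each input row is read once, so the cost grows linearly with the number of records.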