I'm new to hash objects, but I'd like to learn more about them. I'm trying to find ways to replace PROC SQL with hash objects. I have two tables. When I have a PROC SQL inner join, I can write an equivalent with a hash object and it works, but when I have a PROC SQL left join, I don't know how to do it with a hash object. Thank you very much, and sorry for my English.
Table01.
data Principal;
input idd $ name $ Apellid1 $ valor $;
datalines;
1977 Arthur Pendrag 0001
1978 Minerva Athena 0001
2011 Noe Arca 0001
2014 Thor Hammer 0001
0001 Seiya Pegaso 0001
0002 Ikki Fenix 0001
0003 Shun Andromeda 0001
0004 Shiryu Dragon 0001
0005 Yoga Cisne 0001
0006 Naruto Konoha 0001
0007 Sasuke Kun 0001
;
Table02
data Secundarea;
input idd $ Apellid2 $ mival $;
datalines;
1977 Excalibu 0003
1978 Atenea 0004
2011 Biblia 0005
2014 Odin 0006
0001 Sagigario 0007
0002 Virgo 0008
0003 Piscis 0009
0004 Libra 0010
0005 Acuario 0011
0008 Aries 0012
;
Proc sql inner join
proc sql;
create table sqlinner as
select *
from principal as p inner join secundarea as s
on p.idd=s.idd;
quit;
Hash object (inner join) it works
data mihashInner;
declare hash h();
h.defineKey('IDD');
h.defineData ('IDD','APELLID2','MIVAL');
h.defineDone();
do until(fin1);
set SECUNDAREA end=fin1;
h.add();
end;
do until (fin2);
set PRINCIPAL end=fin2;
if h.find()=0 then
output;
end;
run;
Proc sql (left join)
proc sql;
create table sqlleft as
select *
from principal as p left join secundarea as s
on p.idd=s.idd;
quit;
How can I do this with a hash object? I have tried two ways.
data mihashLeft2;
declare hash h();
h.defineKey('IDD');
h.defineData ('IDD','APELLID2','MIVAL');
h.defineDone();
do until(fin1);
set SECUNDAREA end=fin1;
h.add();
end;
do until (fin2);
set PRINCIPAL end=fin2;
rc=h.find();
output;
end;
run;
Or this, but neither gives the left join result. Thanks.
data mihashLeft;
if 0 then set SECUNDAREA;
if _n_ =1 then do;
declare hash hhh(dataset: 'SECUNDAREA', multidata:'y');
hhh.DefineKey('IDD');
hhh.DefineData('IDD','APELLID2','MIVAL');
hhh.DefineDone();
set PRINCIPAL;
rc = hhh.find();
if rc ne 0 then do;
call missing(MIVAL);
output;
end;
else
do while(rc = 0);
output;
rc = hhh.find_next();
end;
end;
run;
You could try to do it like this:
data mihashLeft(drop=rc);
/*iterate left data set*/
set PRINCIPAL;
/*declare variables from hash set*/
length APELLID2 MIVAL $8 rc 8;
/*declare hash*/
if _n_=1 then do;
declare hash hhh(dataset: 'SECUNDAREA', multidata:'y');
hhh.DefineKey('IDD');
hhh.DefineData('APELLID2','MIVAL');
hhh.DefineDone();
end;
/*look for first row from hash set and output it even if it's not found*/
rc = hhh.find();
output;
/*loop to find other rows from the hash set*/
do while(rc=0);
rc = hhh.find_next();
/*output only if you found something*/
if rc=0 then output;
end;
run;
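data mihashLeft(drop=rc);
set PRINCIPAL; /* left */
if _n_ = 1 then do;
if 0 then set SECUNDAREA; /* adds the right-table variables to the PDV */
dcl hash b (dataset: "SECUNDAREA", multidata: "y",ordered:'y');
b.definekey ("IDD");
b.definedata (all:'Y');
b.definedone ();
end;
/* find() returns only the first match per key; loop with find_next() if SECUNDAREA can repeat an IDD */
rc = b.find();
/* variables that come from a SET statement are retained, so clear the right-table columns when the key is not found */
if rc ne 0 then call missing(APELLID2, MIVAL);
output; /* output every left row, matched or not: this is what makes it a left join */
run;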
I have a lookup table with the values below:
sno date
1 200101
2 200102
3 200103
4 200104
I wrote the macro code below:
%let date=200102;
proc sql;
select sno into :no from lookup where date=&date.;
quit;
I need help converting the whole lookup table into macro logic: store the first sno and date as two macro variables and increment from there, so that I don't need to update the dates in my lookup table every time. For example, if I look up date 201304, I need to get its corresponding sno.
Is there a pattern to the SNO values? Are you basically numbering the months since 01JAN2001? If so, then use the INTCK() function.
data test;
input date yymmdd8. ;
format date yymmdd10. ;
sno = 1+intck('month','01JAN2001'd,date);
cards;
20010112
20010213
20010314
20010415
;
So you could create two macro variables. One with the base date and the other with the base SNO value.
%let basedate='01JAN2001'd ;
%let basesno=1;
%let date='01JAN2001'd ;
%let sno=%eval(&basesno + %sysfunc(intck(month,&basedate,&date)));
%put &=date &=sno;
/* log: DATE='01JAN2001'd SNO=1 */

%let date="%sysfunc(today(),date9)"d;
%let sno=%eval(&basesno + %sysfunc(intck(month,&basedate,&date)));
%put &=date &=sno;
/* log: DATE="16NOV2017"d SNO=203 */
If you simply want to translate one (unique) value into another, you can use (in)formats. They can do much more than change how data are read or displayed. They are easy to use, fast (in-memory), and don't depend on the source table once created. Change the library to a permanent one if WORK (a temporary library) doesn't suit your needs.
options fmtsearch=(formats,work);
data fmt(keep = fmtname type start end label hlo default);
length fmtname $10 type $1 start end $6 label 8 hlo $1 default 8;
fmtname = 'date_to_no';
type = 'I';
label=0;
do y = 2001 to 2099;
do m = 1 to 12;
start = put(y,4.) || put(m,z2.);
end = start;
label + 1;
default=50; /*default length of the string compared when informat is used. Should be higher than both start and end*/
output;
end;
end;
/*if you want to assign a value (=label) to inputs not found. In this case it's -2*/
hlo="O";
start = "";
end = start;
label= -2;
output;
run;
proc format library=work cntlin=fmt;
run;
data test;
no = input('200101',date_to_no.); output;
no = input('201710',date_to_no.); output;
no = input('201713',date_to_no.); output;
run;
Build a lookup table dynamically and create a macro variable for each row in the table. The macro variables will be named date_200101, date_200102, and so on. Each one holds the corresponding sno value:
data lookup;
length var_name $20;
do sno = 1 to intck('month','01jan2001'd,date())+1;
date = input(put(intnx('month','01jan2001'd, sno-1, 'beginning'),yymmn6.),best.);
var_name = cats('date_',date);
call symput(var_name, cats(sno));
output;
end;
run;
You can then refer to the macro variables like so:
%let date =200103;
%put &&date_&date;
...or...
%put &date_200101;
The first usage example uses double macro resolution. The macro processor needs two passes over the token &&date_&date to resolve it fully. On the first pass, && becomes & and &date becomes 200103, giving &date_200103. On the second pass, &date_200103 resolves to 3.
I'm using the %HASHMERGE macro found at http://www.sascommunity.org/mwiki/images/2/22/Hashmerge.sas and the following example datasets:
data working;
length IID TYPE $12;
input IID $ TYPE $;
datalines;
B 0
B 0
A 1
A 1
A 1
C 2
D 3
;
run;
data master;
length IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME $12;
input IID $ FIRST_NAME $ MIDDLE_NAME $ LAST_NAME $ SUFFIX_NAME;
datalines;
X John James Smith Sr
Z Sarah Marie Jones .
Y Tim William Miller Jr
C Nancy Lynn Brown .
B Carol Elizabeth Collins .
A Wayne Mark Rooney .
;
run;
On the working dataset, I'm trying to attach the _NAME variables from the master dataset using this hash merge. The output looks fine and IS the desired output. However, in my real-life scenario the master dataset is too large to fit into a hash object, and the macro keeps loading it as the hash object. I'd ultimately like to flip the two datasets so that the working dataset is the hash object, but I cannot get the desired output when I flip the code. Below is the part of the macro that produces the desired output and needs to be adjusted, but I am unsure how to set this up:
data OUTPUT;
if 0 then set MASTER (keep=IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME)
WORKING (keep=IID);
declare hash h_merge(dataset:"MASTER"); /* I want WORKING to be the hash object since it's smaller! */
rc=h_merge.DefineKey("IID");
rc=h_merge.DefineData("FIRST_NAME","MIDDLE_NAME","LAST_NAME","SUFFIX_NAME");
rc=h_merge.DefineDone();
do while(not eof);
set WORKING (keep=IID) end=eof;
call missing(FIRST_NAME,MIDDLE_NAME,LAST_NAME,SUFFIX_NAME);
rc=h_merge.find();
output;
end;
drop rc;
stop;
run;
Desired output:
IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME
---------------------------------------------------
B Carol Elizabeth Collins
B Carol Elizabeth Collins
A Wayne Mark Rooney
A Wayne Mark Rooney
A Wayne Mark Rooney
C Nancy Lynn Brown
D
While it's feasible to do what you say, I doubt you'll get that from a non-purpose-built macro. That's because it's not the normal way to do that; typically you want to keep the main dataset in its form and put the relational dataset in the hash table. Usually the sizes are reversed of course - the relational table is usually smaller than the main table.
Personally I would not use hash for this particular case. I'd use a format (or three). Just as fast as a hash and has less of the size issues (since it doesn't have to fit in memory), though it eventually would slow down (but not break!) due to size.
Format solution:
data working;
length IID TYPE $12;
input IID $ TYPE $;
datalines;
B 0
B 0
A 1
A 1
A 1
C 2
D 3
;
run;
data master;
length IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME $12;
input IID $ FIRST_NAME $ MIDDLE_NAME $ LAST_NAME $ SUFFIX_NAME;
datalines;
X John James Smith Sr
Z Sarah Marie Jones .
Y Tim William Miller Jr
C Nancy Lynn Brown .
B Carol Elizabeth Collins .
A Wayne Mark Rooney .
;
run;
data for_fmt;
set master;
retain type 'char';
length fmtname $32
label $255
start $255
;
start=iid;
*first;
label=first_name;
fmtname='$FIRSTNAMEF';
output;
*last;
label=last_name;
fmtname='$LASTNAMEF';
output;
*middle;
label=middle_name;
fmtname='$MIDNAMEF';
output;
*suffix;
label=suffix_name;
fmtname='$SUFFNAMEF';
output;
if _n_=1 then do;
start=' ';
label=' ';
hlo='o';
fmtname='$FIRSTNAMEF';
output;
fmtname='$LASTNAMEF';
output;
fmtname='$MIDNAMEF';
output;
fmtname='$SUFFNAMEF';
output;
end;
run;
proc sort data=for_fmt;
by fmtname start;
run;
proc format cntlin=for_fmt;
quit;
data want;
set working;
first_name = put(iid,$FIRSTNAMEF.);
last_name = put(iid,$LASTNAMEF.);
middle_name = put(iid,$MIDNAMEF.);
suffix_name = put(iid,$SUFFNAMEF.);
run;
That said...
If you do want to do this in a hash table, what you'd need to do is, for each row in MASTER, do a FIND in the working table, then if successful a REPLACE, then FIND_NEXT and REPLACE until that fails.
The problem? You're doing at least one find per master row, which you yourself noted is very large. If WORKING is 100k and MASTER is 100M, then you're doing 1000 finds for each match. That's very expensive, and probably means you're better off with some other solution.
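If the main obstacle is only that MASTER does not fit in memory, one workaround is to filter first. This is a different technique from the FIND/REPLACE loop described above, and it is only a sketch under stated assumptions: IID is unique in MASTER (as in the example data), and master_small is a hypothetical intermediate dataset name. Step 1 loads only the keys of WORKING into a hash and keeps the MASTER rows that match; step 2 then runs the original lookup against that reduced table.

```sas
/* Step 1: keep only the MASTER rows whose IID occurs in WORKING.
   Only the keys of WORKING are held in memory. */
data master_small;   /* hypothetical name */
  if _n_ = 1 then do;
    declare hash w(dataset: "WORKING(keep=IID)");
    w.defineKey("IID");
    w.defineDone();
  end;
  set MASTER(keep=IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME);
  if w.check() = 0;  /* subsetting IF: output matching rows only */
run;

/* Step 2: the reduced table is now small enough to be the hash object. */
data OUTPUT;
  if 0 then set master_small;  /* defines the name variables in the PDV */
  if _n_ = 1 then do;
    declare hash h_merge(dataset: "master_small");
    h_merge.defineKey("IID");
    h_merge.defineData("FIRST_NAME", "MIDDLE_NAME", "LAST_NAME", "SUFFIX_NAME");
    h_merge.defineDone();
  end;
  set WORKING(keep=IID);
  /* clear the retained name variables when the key is not found */
  if h_merge.find() ne 0 then
    call missing(FIRST_NAME, MIDDLE_NAME, LAST_NAME, SUFFIX_NAME);
  output;
run;
```

This still reads MASTER once in full, but it keeps the row order and the duplicates of WORKING, which the desired output requires.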
I am trying to find persons (by ID) who have 12 months of continuous enrollment before the hospitalization date and another 12 months after it. Each member will have one row.
This uses a US claims database. Any help is appreciated.
Example of the dataset:
ID Enr_date End_Date hosp_date
1 1/5/2004 1/6/2008 2/2/2006
2 .... and so on
3
4
id start_e end_e date_h
1 1/1/2005 1/1/2006 2/8/2008
1 2/3/2006 4/5/2013
2 5/7/2005 8/8/2006 4/5/2007
2 1/1/2007 2/2/2012
3 5/9/2005 5/9/2007 1/1/2007
3 6/4/2008 7/7/2012
Assuming my last comments are answered, there are many ways you can do this. When you are starting out, it may be difficult to get outer joins, cross joins, etc. working in a way that's easy to understand. With a SAS macro we can break the problem down so it's easy to follow and to debug if necessary. Here's one approach that may work for you:
%macro hdates;
/* get number of hosp_dates */
proc sql noprint;
select count(*) into: cnt
from date where hosp_date ne .;
quit;
%let cnt = &cnt;
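/* place hdates and ids into macro vars (same filter as the count above) */
/* the "..."d literal used below only works if the stored value looks like 02FEB2006 */
proc sql noprint;
select enrolid, hosp_date into: id_1 - :id_&cnt, : hdate_1 - :hdate_&cnt
from date where hosp_date ne .;
quit;
proc delete data= hcov; run;
/* for each hdate id pair go through the dataset and test for 12 mo coverage */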
%do i = 1 %to &cnt;
data new;
set date;
if (enrolid = &&id_&i) then do;
preDays = "&&hdate_&i"d - start_date ;
postDays = end_date - "&&hdate_&i"d;
if (preDays >= 365 and postDays >= 365) then output;
end;
run;
proc append base = hcov data=new;run;
%end;
%mend hdates;
%hdates;
I work in claims data and I think I understand what you are trying to ask. I recommend making one table with the "condensed" enrollment ranges and another with the hospitalization dates. Then you may merge them together and keep only those patients who meet your criteria. The following code will condense the enrollment ranges (assuming good records):
PROC SORT DATA=dset_in; BY id enr_date end_date; RUN;
DATA enrollment (KEEP=id enroll_start enroll_stop);
SET dset_in;
FORMAT enroll_start enroll_stop DATE9.;
BY id enr_date end_date;
RETAIN enroll_start enroll_stop;
IF first.id THEN DO;
enroll_start=enr_date;
enroll_stop=end_date;
END;
ELSE IF enr_date-enroll_stop <= 1 THEN enroll_stop=end_date;
ELSE DO;
OUTPUT;
enroll_start=enr_date;
enroll_stop=end_date;
END;
IF last.id THEN OUTPUT;
RUN;
Then this code will keep only those patients with a hospitalization and 365 days enrollment before and after. If the hosp_claims dataset has more than 1 hospitalization per patient, sort then take the first obs per id after this step:
PROC SQL;
CREATE TABLE hosp_enrolled AS
SELECT DISTINCT a.id, a.hosp_dt, b.enroll_start, b.enroll_stop
FROM hosp_claims AS a, enrollment AS b
WHERE a.id=b.id AND b.enroll_start+365 <= a.hosp_dt <= enroll_stop-365;
QUIT;
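The "sort then take the first obs per id" step mentioned above could look like the sketch below. It assumes hosp_enrolled is the table created by the PROC SQL step; first_hosp is a hypothetical output name.

```sas
/* Order each patient's qualifying hospitalizations by date */
proc sort data=hosp_enrolled;
  by id hosp_dt;
run;

/* Keep the earliest qualifying hospitalization per patient */
data first_hosp;   /* hypothetical name */
  set hosp_enrolled;
  by id;
  if first.id;
run;
```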
I have a data set that looks like this:
data have;
input name $ class $ time score;
cards;
chewbacca wookie 1 97
chewbacca wookie 2 100
chewbacca wookie 3 95
saruman wizard 1 79
saruman wizard 2 85
saruman wizard 3 40
gandalf wizard 1 22
gandalf wizard 2 50
gandalf wizard 3 87
bieber canadian 1 50
bieber canadian 2 45
bieber canadian 3 10
;
run;
I'm creating a program that does two things:
1. prints the data for each distinct class
2. creates a scatterplot x=time y=score for each name.
Executing the code below will illustrate my desired output:
data chewbacca saruman gandalf bieber;
set have;
if name='chewbacca' then output chewbacca;
else if name='saruman' then output saruman;
else if name='gandalf' then output gandalf;
else if name='bieber' then output bieber;
run;
title 'Report for wookie';
proc print data=have;
where class='wookie';
run;
title 'Plot Chewbacca';
proc sgplot data=chewbacca;
scatter x=time y=score;
run;
title 'Report for wizard';
proc print data=have;
where class='wizard';
run;
title 'Plot Saruman';
proc sgplot data=saruman;
scatter x=time y=score;
run;
title 'Plot Gandalf';
proc sgplot data=gandalf;
scatter x=time y=score;
run;
title 'Report for canadian';
proc print data=have;
where class='canadian';
run;
title 'Plot Bieber';
proc sgplot data=bieber;
scatter x=time y=score;
run;
Ideally, I'd like to automate this. I've been trying to set this up, but am missing something. Here is my attempt:
proc sql;
select count(distinct name) into :numname
from have;
%let numname=&numname;
select distinct name into :name1 - :name&numname
from have;
select count(distinct class) into :numclass
from have;
%let numclass=&numclass;
select distinct class into :class1 - :class&numclass
from have;
quit;
%macro printit;
%do i = 1 %to &numclass;
title 'Report for &&class&i';
proc print data=have;
where class=&&class&i;
run;
/*insert sgplot here*/
%end;
%mend;
%printit;
Please help here. Can't get the syntax sorted....
Thanks.
I see 4 issues:
1. Macro variables resolve only inside double quotes; single quotes prevent resolution. So change the title statement to:
title "Report for &&class&i";
2. The class variable is a string. You need to quote the value in the where clause:
where class="&&class&i";
3. You don't need to generate the separate data sets. You can add a where clause when you specify the data for SGPLOT:
proc sgplot data=have(where=(name="&&name&i"));
4. The number of names and the number of classes are different, so you need two loops.
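Putting the four fixes together, the macro might look like the sketch below. It assumes the macro variables numclass, class1..classN, numname and name1..nameN were already created by the PROC SQL step in the question. With two separate loops, all reports print first and all plots follow, which differs from the interleaved order of the hand-written example.

```sas
%macro printit;
  %local i;

  /* One report per distinct class */
  %do i = 1 %to &numclass;
    title "Report for &&class&i";
    proc print data=have;
      where class = "&&class&i";
    run;
  %end;

  /* One scatter plot per distinct name, filtered directly from HAVE */
  %do i = 1 %to &numname;
    title "Plot &&name&i";
    proc sgplot data=have(where=(name = "&&name&i"));
      scatter x=time y=score;
    run;
  %end;

  title;  /* clear the title afterwards */
%mend printit;

%printit;
```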
EDIT: Also look at SGPANEL and/or SGRENDER. You can generate all the charts in 1 call.
The print procedure and most ODS procedures support by-group processing, which might be a lot simpler and save a lot of time depending on what you require.
proc sort data=have; by class;
proc print data=have;
by class;
run;
and
proc sort data=have; by name;
proc sgplot data=have;
by name;
scatter x=time y=score;
run;
I have two SAS data tables. The first has many millions of records, and each record is identified with a sequential record ID, like this:
Table A
Rec Var1 Var2 ... VarX
1 ...
2
3
The second table specifies which rows from Table A should be assigned a coding variable:
Table B
Code BegRec EndRec
AA 1200 4370
AX 7241 9488
BY 12119 14763
So the first row of Table B means any data in Table A that has rec between 1200 and 4370 should be assigned code AA.
I know how to accomplish this with proc sql, but I want to see how this is done with a hash object.
In SQL, it's just:
proc sql;
select b.code, a.*
from tableA a, tableB b
where b.begrec<=a.rec<=b.endrec;
quit;
My actual data contains hundreds of gigabytes of data, so I want to do the processing as efficiently as possible. My understanding is that using a hash object may help here, but I haven't been able to figure out how to map what I'm doing to use that way.
A hash object solution (data input code borrowed from @Rob_Penridge).
data big;
do rec = 1 to 20000;
output;
end;
run;
data lookup;
input Code $ BegRec EndRec;
datalines;
AA 1200 4370
AX 7241 9488
BY 12119 14763
;
run;
data created;
format code $4.;
format begrec endrec best8.;
if _n_=1 then do;
declare hash h(dataset:'lookup');
h.definekey('Code');
h.definedata('code','begrec','endrec');
h.definedone();
call missing(code,begrec,endrec);
declare hiter iter('h');
end;
set big;
iter.first();
do until (rc^=0);
if begrec <= rec <= endrec then do;
code_dup=code;
end;
rc=iter.next();
end;
keep rec code_dup;
run;
I'm not sure a hash table would even be the most efficient approach here. I would probably solve this problem with a SELECT statement, as the conditional logic will be fast and it still requires only one pass through the data:
select;
when ( 1200 <= _n_ <=4370) code = 'AA';
...
otherwise;
end;
Assuming you will need to run this code multiple times and the data may change each time, you probably don't want to hardcode the SELECT statement. A better solution is to build it dynamically with a macro. I have a utility macro I use for these kinds of situations (included at the bottom):
1) Create the data
data big;
do i = 1 to 20000;
output;
end;
run;
data lookup;
input Code $ BegRec EndRec;
datalines;
AA 1200 4370
AX 7241 9488
BY 12119 14763
;
run;
2) Save the contents of the smaller table into macro variables. You could also do this using call symput or other preferred method. This method assumes you don't have too many rows in your lookup table.
%table_parse(iDs=lookup, iField=code , iPrefix=code);
%table_parse(iDs=lookup, iField=begrec, iPrefix=begrec);
%table_parse(iDs=lookup, iField=endrec, iPrefix=endrec);
3) Dynamically build the SELECT statement.
%macro ds;
%local cnt;
data final;
set big;
select;
%do cnt=1 %to &code;
when (&&begrec&cnt <= _n_ <= &&endrec&cnt) code = "&&code&cnt";
%end;
otherwise;
end;
run;
%mend;
%ds;
Here is the utility macro:
/*****************************************************************************
** MACRO.TABLE_PARSE.SAS
**
** AS PER %LIST_PARSE BUT IT TAKES INPUT FROM A FIELD IN A TABLE.
** STORE EACH OBSERVATION'S FIELD'S VALUE INTO IT'S OWN MACRO VARIABLE.
** THE TOTAL NUMBER OF WORDS IN THE STRING IS ALSO SAVED IN A MACRO VARIABLE.
**
** THIS WAS CREATED BECAUSE %LIST_PARSE WOULD FALL OVER WITH VERY LONG INPUT
** STRINGS. THIS WILL NOT.
**
** EACH VALUE IS STORED TO ITS OWN MACRO VARIABLE. THE NAMES
** ARE IN THE FORMAT <PREFIX>1 .. <PREFIX>N.
**
** PARAMETERS:
** iDS : (LIB.DATASET) THE NAME OF THE DATASET TO USE.
** iFIELD : THE NAME OF THE FIELD WITHIN THE DATASET.
** iPREFIX : THE PREFIX TO USE FOR STORING EACH WORD OF THE ISTRING TO
** ITS OWN MACRO VARIABLE (AND THE TOTAL NUMBER OF WORDS).
** iDSOPTIONS : OPTIONAL. ANY DATSET OPTIONS YOU MAY WANT TO PASS IN
** SUCH AS A WHERE FILTER OR KEEP STATEMENT.
**
******************************************************************************
** HISTORY:
** 1.0 MODIFIED: 01-FEB-2007 BY: ROBERT PENRIDGE
** - CREATED.
** 1.1 MODIFIED: 27-AUG-2010 BY: ROBERT PENRIDGE
** - MODIFIED TO ALLOW UNMATCHED QUOTES ETC IN VALUES BEING RETURNED BY
** CHARACTER FIELDS.
** 1.2 MODIFIED: 30-AUG-2010 BY: ROBERT PENRIDGE
** - MODIFIED TO ALLOW BLANK CHARACTER VALUES AND ALSO REMOVED TRAILING
** SPACES INTRODUCED BY CHANGE 1.1.
** 1.3 MODIFIED: 31-AUG-2010 BY: ROBERT PENRIDGE
** - MODIFIED TO ALLOW PARENTHESES IN CHARACTER VALUES.
** 1.4 MODIFIED: 31-AUG-2010 BY: ROBERT PENRIDGE
** - ADDED SOME DEBUG VALUES TO DETERMINE WHY IT SOMETIMES LOCKS TABLES.
*****************************************************************************/
%macro table_parse(iDs=, iField=, iDsOptions=, iPrefix=);
%local dsid pos rc cnt cell_value type;
%let cnt=0;
/*
** OPEN THE TABLE (AND MAKE SURE IT EXISTS)
*/
%let dsid=%sysfunc(open(&iDs(&iDsOptions),i));
%if &dsid eq 0 %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
/*
** GET THE POSITION OF THE FIELD (AND MAKE SURE IT EXISTS)
*/
%let pos=%sysfunc(varnum(&dsid,&iField));
%if &pos eq 0 %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
%else %do;
/*
** DETERMINE THE TYPE OF THE FIELD
*/
%let type = %upcase(%sysfunc(vartype(&dsid,&pos)));
%end;
/*
** READ THROUGH EACH OBSERVATION IN THE TABLE
*/
%let rc=%sysfunc(fetch(&dsid));
%do %while (&rc eq 0);
%let cnt = %eval(&cnt + 1);
%if "&type" = "C" %then %do;
%let cell_value = %qsysfunc(getvarc(&dsid,&pos));
%if "%trim(&cell_value)" ne "" %then %do;
%let cell_value = %qsysfunc(cats(%nrstr(&cell_value)));
%end;
%end;
%else %do;
%let cell_value = %sysfunc(getvarn(&dsid,&pos));
%end;
%global &iPrefix.&cnt ;
%let &iPrefix.&cnt = &cell_value ;
%let rc=%sysfunc(fetch(&dsid));
%end;
/*
** CHECK FOR ABNORMAL TERMINATION OF LOOP
*/
%if &rc ne -1 %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
/*
** ENSURE THE TABLE IS CLOSED SUCCESSFULLY
*/
%let rc=%sysfunc(close(&dsid));
%if &rc %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
%global &iPrefix;
%let &iPrefix = &cnt ;
%mend;
Other examples of calling this macro:
%table_parse(iDs=sashelp.class, iField=sex, iPrefix=myTable, iDsOptions=%str(where=(sex='F')));
%put &mytable &myTable1 &myTable2 &myTable3; *etc...;
I'd be tempted to use the direct access method POINT= here; it reads only the required row numbers rather than the whole dataset.
Here is the code, which uses the same create data code as in Rob's answer.
data want;
set lookup;
do i=begrec to endrec;
set big point=i;
output;
end;
drop begrec endrec;
run;
If you have the code column already in the big dataset and you just wanted to update the values from the lookup dataset, then you could do this using MODIFY.
data big;
set lookup (rename=(code=code1));
do i=begrec to endrec;
modify big point=i;
code=code1;
replace;
end;
run;
Here's my solution, using proc format. This is also done in-memory, much like a hash table, but requires less structural code to work.
(Data input code also borrowed from @Rob_Penridge.)
data big;
do rec = 1 to 20000;
output;
end;
run;
data lookup;
input Code $ BegRec EndRec;
datalines;
ZZ 0 20
JJ 40 60
AA 1200 4370
AX 7241 9488
BY 12119 14763
;
run;
data lookup_f;
set lookup;
rename
BegRec = start
EndRec = end
Code = label;
retain fmtname 'CodeRecFormat';
run;
proc format library = work cntlin=lookup_f; run;
data big_formatted;
format rec CodeRecFormat.;
format rec2 8.;
length code $5.;
set big;
code = putn(rec, "CodeRecFormat.");
rec2 = rec;
run;