Macro increment - macros

I have table lookup values as below
sno date
1 200101
2 200102
3 200103
4 200104
I wrote below macro
%let date=200102;
proc sql;
select sno into :no from lookup where date=&date.;
quit;
I need help converting the lookup table into a macro-based increment: define the first s.no and date as two macro variables, then increment from there, so that I don’t need to update the dates in my lookup table every time. For example, if I look up date 201304, I need to get its corresponding s.no.

Is there a pattern to the SNO values? Are you basically numbering the months since 01JAN2001? If so, use the INTCK() function.
data test;
input date yymmdd8. ;
format date yymmdd10. ;
sno = 1+intck('month','01JAN2001'd,date);
cards;
20010112
20010213
20010314
20010415
;
So you could create two macro variables. One with the base date and the other with the base SNO value.
%let basedate='01JAN2001'd ;
%let basesno=1;
%let date='01JAN2001'd ;
%let sno=%eval(&basesno + %sysfunc(intck(month,&basedate,&date)));
%put &=date &=sno;
/* log: DATE='01JAN2001'd SNO=1 */
%let date="%sysfunc(today(),date9)"d;
%let sno=%eval(&basesno + %sysfunc(intck(month,&basedate,&date)));
%put &=date &=sno;
/* log: DATE="16NOV2017"d SNO=203 */
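If the lookup key arrives as a yyyymm value such as 201304 (as in the question), one way to feed it into the same calculation is to append a day and read it with INPUTN. This is only a sketch: the macro variable name yyyymm is an assumption, and it assumes the key is always a valid six-digit year and month.

```sas
%let basedate='01JAN2001'd;
%let basesno=1;
/* hypothetical input in the same yyyymm layout as the lookup table */
%let yyyymm=201304;
/* append day 01 and convert the text to a SAS date value */
%let date=%sysfunc(inputn(&yyyymm.01, yymmdd8.));
/* months elapsed since the base date, offset by the base SNO */
%let sno=%eval(&basesno + %sysfunc(intck(month, &basedate, &date)));
%put &=yyyymm &=sno;
/* log: YYYYMM=201304 SNO=148 */
```

April 2013 is 147 months after January 2001, so with a base SNO of 1 the result is 148, which matches continuing the numbering of the original table.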

If you simply want to translate one (unique) value into another, you can use (in)formats. They can do much more than just change how data are read or displayed. They are easy to use, fast (in-memory) and don't depend on the table once created. Change the library to a permanent one if work (a temporary library) doesn't suit your needs.
options fmtsearch=(formats,work);
data fmt(keep = fmtname type start end label hlo default);
length fmtname $10 type $1 start end $6 label 8 hlo $1 default 8;
fmtname = 'date_to_no';
type = 'I';
label=0;
do y = 2001 to 2099;
do m = 1 to 12;
start = put(y,4.) || put(m,z2.);
end = start;
label + 1;
default=50; /*default length of the string compared when informat is used. Should be higher than both start and end*/
output;
end;
end;
/*if you want to assign a value (=label) to inputs not found. In this case it's -2*/
hlo="O";
start = "";
end = start;
label= -2;
output;
run;
proc format library=work cntlin=fmt;
run;
data test;
no = input('200101',date_to_no.); output;
no = input('201710',date_to_no.); output;
no = input('201713',date_to_no.); output;
run;

Build a lookup table dynamically and create a macro variable for each row in the table. The macro variables will be named date_200101,date_200102,...and so on. They will contain a value equal to the corresponding sno value:
data lookup;
length var_name $20;
do sno = 1 to intck('month','01jan2001'd,date())+1;
date = input(put(intnx('month','01jan2001'd, sno-1, 'beginning'),yymmn6.),best.);
var_name = cats('date_',date);
call symput(var_name, cats(sno));
output;
end;
run;
You can then refer to the macro variables like so:
%let date =200103;
%put &&date_&date;
...or...
%put &date_200101;
The first usage example uses double macro resolution. The macro processor needs to make two passes over the token &&date_&date in order to fully resolve it. On the first pass, && collapses to & and &date resolves to 200103, giving &date_200103. On the second pass, &date_200103 resolves to 3.
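A date that has no matching macro variable would leave an unresolved reference in the log, so it may be worth guarding the lookup. The sketch below wraps the double resolution in a small macro; the name get_sno and the fallback value -2 are assumptions (the -2 simply mirrors the "not found" label used in the informat answer), and the %let is a stand-in for the variables created by the data step above.

```sas
/* stand-in for one of the variables created by call symput */
%let date_200103 = 3;

%macro get_sno(date);
/* return the stored sno if the variable exists, otherwise -2 */
%if %symexist(date_&date) %then &&date_&date;
%else -2;
%mend get_sno;

%put sno=%get_sno(200103);
%put sno=%get_sno(209912);
```

The first call writes sno=3 to the log and the second writes sno=-2, because no variable named date_209912 has been defined.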

Related

SAS: How to reference a global macro variable to create new table or dataset?

I'm having some trouble referencing a global macro variable outside of the macro to create a new data set. The global variable was created to run a loop for creating several yearly data sets using a vector of specified years, as you can see in the code below:
%macro loopyear;
%global year;
%do year = 2004 %to 2017;
proc import datafile = "C:\Filepath\blah.txt"
dbms = dlm out = blah&year.; /*Creates a dataset for each year, e.g. blah2004, blah2005, etc.) */
delimiter = " ";
getnames = no;
run;
data blah&year.;
set blah&year.;
year = &year.;
run;
proc sql;
create table blah&year._rail as
select year, var1, var2, var3, var4
from blah&year.
where var2= "rail";
quit;
%end;
%mend loopyear;
%loopyear;
/*Merge all year datasets into one master set*/
data blah_total;
set blah&year._rail;
run;
When I try to create the master data set outside of the macro, however, I get the following error:
data blah;
set blah&year._rail;
run;
ERROR: File work.blah2018_rail.data does not exist
This is frustrating because I'm only trying to create the master set based on 2004-2017 data, as referenced in the macro variable. Can someone help me pinpoint my error -- is it in the way I defined the global variable, or am I missing a step somewhere? Any help is appreciated.
Thanks!
This is an interesting quirk of both macro and data step do-loops in SAS - the loop counter is incremented before the exit condition is checked, so after your loop has run it will be one increment past your stop value, e.g.:
%macro example;
%do i = 1 %to 3;
%put i = &i;
%end;
%put i = &i;
%mend;
%example;
Output:
i = 1
i = 2
i = 3
i = 4
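The same thing can be seen in a data step. This is a minimal sketch that only writes to the log:

```sas
data _null_;
do i = 1 to 3;
put 'inside: ' i=;
end;
/* the counter has already been incremented past the stop value */
put 'after: ' i=;
run;
```

The log shows i=1, i=2 and i=3 inside the loop and i=4 after it.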
For your final step you probably want the set statement to look like this:
set blah2004_rail ... blah2017_rail;
You could write a macro loop to generate the list and move the data step inside your macro, e.g.
set %do year = 2004 %to 2017; blah&year._rail %end;;
The second semi-colon is important! You need one to close the %end and one to terminate the set statement.
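Put together, that could look like the sketch below. The macro name stack_years, its first= and last= parameters, and the tiny stand-in datasets are assumptions made only so that the example runs on its own. Declaring year as %local also stops the post-loop value from leaking into open code.

```sas
%macro stack_years(first=2004, last=2006);
%local year;
/* stand-in yearly datasets so the example is self-contained */
%do year = &first %to &last;
data blah&year._rail;
year = &year;
var2 = "rail";
run;
%end;
/* the %do loop generates the dataset list for a single SET statement */
data blah_total;
set %do year = &first %to &last; blah&year._rail %end;;
run;
%mend stack_years;
%stack_years(first=2004, last=2006)
```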
Change your naming structure. Have a common prefix and put the year at the end; then you can use the colon (:) prefix shorthand to reference all the datasets at once.
%macro loopyear;
%global year;
%do year = 2004 %to 2017;
proc import datafile = "C:\Filepath\blah.txt"
dbms = dlm out = blah&year.; /*Creates a dataset for each year, e.g. blah2004, blah2005, etc.) */
delimiter = " ";
getnames = no;
run;
data blah&year.;
set blah&year.;
year = &year.;
run;
proc sql;
create table blah_rail_&year. as
select year, var1, var2, var3, var4
from blah&year.
where var2= "rail";
quit;
%end;
%mend loopyear;
%loopyear;
/*Merge all year datasets into one master set*/
data blah_total;
set blah_rail: ;
run;

Defining Fixed SAS Macro Variables

I am trying to have a macro run but I'm not sure if it will resolve since I don't have connection to my database for a little while. I want to know if the macro is written correctly and will resolve the states on each pass through the code (ie do it repetitively and create a table for each state).
The second thing I would like to know is whether I can use a macro variable in a FROM clause. For example, let entpr be the database that I'm pulling from. Would the following resolve correctly:
proc sql;
select * from entpr.&state.; /*Do I need the . after &state?*/
The rest of my code:
libname mdt ".........";
%let state = ny il ar ak mi;
proc sql;
create table mdt.&state._members
as select
corp_ent_cd
,mkt_sgmt_admnstn_cd
,fincl_arngmt_cd
,aca_ind
,prod_type
,cvyr
,cvmo
,sum(1) as mbr_cnt
from mbrship1_&state.
group by 1,2,3,4,5,6,7;
quit;
If &state contains ny il ar ak mi then as it is written, the from statement in your code will resolve to: from mbrship1_ny il ar ak mi - which is invalid SQL syntax.
My guess is that you're wanting to run the SQL statement for each of the following tables:
mbrship1_ny
mbrship1_il
mbrship1_ar
mbrship1_ak
mbrship1_mi
In which case the simplest macro would look something like this:
%macro do_sql(state=);
proc sql;
create table mdt.&state._members
as select
...
from mbrship1_&state
group by 1,2,3,4,5,6,7;
quit;
%mend;
%do_sql(state=ny);
%do_sql(state=il);
%do_sql(state=ar);
%do_sql(state=ak);
%do_sql(state=mi);
As to your question regarding whether or not to include the ., the rule is that if the character following your macro variable reference is not a letter (a-z, A-Z), a digit (0-9), or the underscore, then the period is optional. Those are the valid characters for a macro variable name, so as long as the next character is not one of them, SAS can identify where the macro variable name ends. Some people always include it; personally I leave it out unless it's required.
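A few %put statements illustrate the rule. The values are only examples:

```sas
%let state = ny;
/* end of the reference is unambiguous, so no period is needed */
%put mbrship1_&state;
/* log: mbrship1_ny */
%put entpr.&state;
/* log: entpr.ny */
/* an underscore follows, so the period is required to end the name */
%put mdt.&state._members;
/* log: mdt.ny_members */
/* two periods: the first ends the name, the second is literal text */
%put &state..txt;
/* log: ny.txt */
```

Without the period, &state_members would be read as a reference to a macro variable named state_members.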
When selecting data from multiple tables, whose names themselves contain some data (in your case the state) you can stack the data with:
UNION ALL in SQL
SET in Data step
As long as you are stacking data, you should also add a new column to the query selection that tracks the state.
Consider this pattern for stacking in SQL
data one;
do index = 1 to 10; do _n_ = 1 to 2; output; end; end;
run;
data two;
do index = 101 to 110; do _n_ = 1 to 2; output; end; end;
run;
proc sql;
create table want as
select
source, index
from
(
(select 'one' as source, * from one)
union all
(select 'two' as source, * from two)
)
;
quit;
The pattern can be abstracted into a template for SQL source code that will be generated by macro.
%macro my_ultimate_selector (out=, inlib=, prefix=, states=);
%local index n state;
%let n = %sysfunc(countw(&states));
proc sql;
create table &out as
select
state
, corp_ent_cd
, mkt_sgmt_admnstn_cd
, fincl_arngmt_cd
, aca_ind
, prod_type
, cvyr
, cvmo
, count(*) as state_7dim_level_cnt
from
(
%* ----- use the UNION ALL pattern for stacking data -----;
%do index = 1 %to &n;
%let state = %scan(&states, &index);
%if &index > 1 %then %str(UNION ALL);
(select "&state" as state, * from &inlib..&prefix.&state.)
%end;
)
group by 1,2,3,4,5,6,7,8 %* this seems to be too much grouping ?;
;
quit;
%mend;
%my_ultimate_selector (out=work.want, inlib=mdt, prefix=mbrship1_, states=ny il ar ak mi)
If the columns of the inlib tables are not identical with regard to column order and type, use a UNION ALL CORRESPONDING to have the SQL procedure line up the columns for you.
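For the data step route mentioned earlier, the SET statement can stack the tables and its INDSNAME= option can record which table each row came from. This is a sketch using small stand-in tables; the names want_ds, source and _src are assumptions.

```sas
data one;
do index = 1 to 3; output; end;
run;
data two;
do index = 101 to 103; output; end;
run;
data want_ds;
length source $32;
/* _src holds the two-level name of the contributing dataset, e.g. WORK.ONE */
set one two indsname=_src;
/* keep only the member name as the tracking column */
source = scan(_src, 2, '.');
run;
```

The variable named in INDSNAME= is not written to the output, which is why its value is copied into source. The stored value is in uppercase (ONE, TWO).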

SAS Subquerying population

I wonder if there is a way to query in SAS to select a subgroup, just like select option in Postgres
SELECT *
FROM s.diagnoses
WHERE icd9code = ANY ('{2910,2911,2912,2913,2914,2915,3456,3457,3458}');
Also is there way to specify ranges instead of the actual value eg: between 2910-2915
The diagnosis codes are characters not numeric. I am using the SAS University Edition.
If you want to specify a range, you can convert the character field to numeric and then apply the range:
/***** if you want to mention each icd9code*****/
data have;
set diagnoses (where=(icd9code in ('2910' '2911' '2912' '2913' '2914' '2915' '3456' '3457' '3458')));
run;
/***** if you want to give range *****/
data have;
set diagnoses;
if input(icd9code ,4.) >= 2910 and input(icd9code ,4.) <= 3458;
run;
Let me know in case of any queries.
If it is a character variable you cannot use a numeric range directly, but you can use the IN operator:
SELECT * FROM s.diagnoses WHERE icd9code in ('2910','2911','2912');
To select a range, you can define your own macro to generate the list of quoted values, like this:
%macro range(start, stop);
%if &start. = &stop. %then %do;
"&stop."
%end;
%else %do;
"&start.", %range(%sysevalf(&start+1), &stop)
%end;
%mend range;
%put %range(2910, 2915);
/* -> "2910", "2911", "2912", "2913", "2914", "2915" */
Then assign it to a macro variable and use it in your where clause within proc sql:
%let subset1 = %range(2910, 2915);
proc sql noprint;
create table want as
select *
from
have
where var_want in (&subset1.);
quit;
You can then define multiple subset variables with different ranges and combine them in the where condition to achieve more complex subsetting.
For ranges you want to include in their entirety, you can use inequalities directly - no 'input' required, as long as you have leading zeros, and for the rest you can use in, e.g.
data example;
length char $1;
do i = 64 to 100;
char = byte(i);
output;
end;
run;
proc sql;
create table want as
select * from example where 'A' <= char <= 'Z' or char in ('[',']');
quit;
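Applied to the diagnosis codes from the question, that could look like the sketch below. The sample rows and the output table name subgroup are made up for illustration, and it assumes every code is exactly four characters long.

```sas
data diagnoses;
input icd9code $;
datalines;
2909
2910
2913
2915
2916
3457
;
run;
proc sql;
create table subgroup as
select *
from diagnoses
/* character comparison: works as a range because all codes have the same length */
where icd9code between '2910' and '2915'
or icd9code in ('3456','3457','3458');
quit;
```

With these rows the result contains 2910, 2913, 2915 and 3457. Because the comparison is character by character, a longer code such as 29101 would also fall inside the range, so this only behaves like a numeric range when the codes have a fixed width.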

How to calculate conditional cumulative sum

I have a dataset like the one below, and I am trying to take a running total of events 2 and 3, with a slight twist: I only want to count these events when the Event_1_dt is less than the date in the current record. I'm currently using a macro %do loop to iterate through each record for that item type. While this produces the desired results, performance is slower than desirable. Each Item_Type may have up to 1250 records, and there are a couple thousand types. Is it possible to exit the loop before it cycles through all 1250 iterations? I am hesitant to try joins because there are some 30+ events to count up, but I'm open to suggestions. An additional complication is that even though Event_1_dt is always greater than Date, it does not have any other limitations.
Item_Type Date Event_1_dt Event_2_flg Event_3_flg Desired_Event_2_Cnt Desired_Event_3_Cnt
A 1/1/2014 1/2/2014 1 1 0 0
A 1/2/2014 1/2/2014 0 1 0 0
A 1/3/2014 1/8/2014 1 0 1 2
B 1/1/2014 1/2/2014 1 0 0 0
B 1/2/2014 1/5/2014 1 0 0 0
B 1/3/2014 1/4/2014 1 1 1 0
B 1/4/2014 1/5/2014 0 1 1 0
B 1/5/2014 . 1 1 2 1
B 1/6/2014 1/7/2014 1 1 3 2
Corresponding Code:
%macro History;
data y;
set x;
Event_2_Cnt = 0;
Event_3_Cnt = 0;
%do i = 1 %to 1250;
lag_Item_Type = lag&i(Item_Type);
lag_Event_2_flg = lag&i(Event_2_flg);
lag_Event_3_flg = lag&i(Event_3_flg);
lag_Event_1_dt = lag&i(Event_1_dt);
if Item_Type = lag_Item_Type and lag_Event_1_dt > . and lag_Event_1_dt < Date then do;
if lag_Event_2_flg = 1 then do;
Event_2_Cnt = Event_2_cnt + 1;
end;
if lag_Event_3_flg = 1 then do;
Event_3_Cnt = Event_3_cnt + 1;
end;
end;
%end;
run;
%mend;
%History;
Well, that's not a trivial task for SAS, but still it can be solved in one DATA-step, without merging. You can use hash objects. The idea is as follows.
Within each item type, going record by record, we 'collect' event flags into 'bins' in a hash object, where each bin is a certain date. All bins are ordered by date in ascending order. At the same time, we insert the Date of the current record into the same hash (at its place in date order) and then iterate 'up' from that place, summing all the bins gathered so far (which have dates earlier than the date of the current record, since we are going up).
Here's the code:
data have;
informat Item_Type $1. Date Event_1_dt mmddyy9. Event_2_flg Event_3_flg 8.;
infile datalines dsd dlm=',';
format Date Event_1_dt date9.;
input Item_Type Date Event_1_dt Event_2_flg Event_3_flg;
datalines;
A,1/1/2014,1/2/2014,1,1
A,1/2/2014,1/2/2014,0,1
A,1/3/2014,1/8/2014,1,0
B,1/1/2014,1/2/2014,1,0
B,1/2/2014,1/5/2014,1,0
B,1/3/2014,1/4/2014,1,1
B,1/4/2014,1/5/2014,0,1
B,1/5/2014,,1,1
B,1/6/2014,1/7/2014,1,1
;
run;
proc sort data=have; by Item_Type; run;
data want;
set have;
by Item_Type;
if _N_=1 then do;
declare hash h(ordered:'a');
h.defineKey('Event_date','type');
h.defineData('event2_cnt','event3_cnt');
h.defineDone();
declare hiter hi('h');
end;
/*for each new Item_type we clear the hash completely*/
if FIRST.Item_Type then h.clear();
/*now if date of Event 1 exists we put it into corresponding */
/* (by date) place of our ordered hash. If such date is already*/
/*in the hash, we increase number of events for this date */
/*adding values of Event2 and Event3 flags. If no - just assign*/
/*current values of these flags.*/
if not missing(Event_1_dt) then do;
Event_date=Event_1_dt;type=1;
rc=h.find();
event2_cnt=coalesce(event2_cnt,0)+Event_2_flg;
event3_cnt=coalesce(event3_cnt,0)+Event_3_flg;
h.replace();
end;
/*now we insert Date of the record into the same ordered hash, */
/*making type=0 to distinguish this item from items where date */
/*means date of Event1 (not date of record)*/
Event_date=Date;
event2_cnt=0; event3_cnt=0; type=0;
h.replace();
Desired_Event_2_Cnt=0;
Desired_Event_3_Cnt=0;
/*now we iterate 'up' from just inserted item, i.e. looping */
/*through all items that have date < the date of the record. */
/*Items with date = the date of the record will be 'below' since*/
/*they have type=1 and our hash is ordered by dates first, and */
/*types afterwards (1's will be below 0's)*/
hi.setcur(key:Date,key:0);
rc=hi.prev();
do while(rc=0);
Desired_Event_2_Cnt+event2_cnt;
Desired_Event_3_Cnt+event3_cnt;
rc=hi.prev();
end;
drop Event_date type rc event2_cnt event3_cnt;
run;
I can't test it with your real number of rows, but I believe it should be pretty fast, since we loop only through a small hash object, which is entirely in memory, and we do only as many loops for each record as necessary (only earlier events) and don't do any IF-checks.
I don't think a hash is necessary for this; a simple data step seems to do the trick. This might save you (or the next programmer who comes across your code) from needing to 're-read and do research' in order to understand it.
I think the following will work:
data have;
informat Item_Type $1. Date Event_1_dt mmddyy9. Event_2_flg Event_3_flg 8.;
infile datalines dsd dlm=',';
format Date Event_1_dt date9.;
input Item_Type Date Event_1_dt Event_2_flg Event_3_flg;
datalines;
A,1/1/2014,1/2/2014,1,1
A,1/2/2014,1/2/2014,0,1
A,1/3/2014,1/8/2014,1,0
B,1/1/2014,1/2/2014,1,0
B,1/2/2014,1/5/2014,1,0
B,1/3/2014,1/4/2014,1,1
B,1/4/2014,1/5/2014,0,1
B,1/5/2014,,1,1
B,1/6/2014,1/7/2014,1,1
;
data want2 (drop=_: );
set have;
by ITEM_Type;
length _Alldts_event2 _Alldts_event3 $20000;
retain _Alldts_event2 _Alldts_event3;
*Clear _ALLDTS for each ITEM_TYPE;
if first.ITEM_type then Do;
_Alldts_event2 = "";
_Alldts_event3 = "";
END;
*If event is flagged, concatenate the Event_1_dt to the ALLDTS variable;
if event_2_flg = 1 and not missing(Event_1_dt) Then _Alldts_event2 = catx(" ", _Alldts_event2,Event_1_dt);
if event_3_flg = 1 and not missing(Event_1_dt) Then _Alldts_event3 = catx(" ", _Alldts_event3,Event_1_dt);
_numWords2 = COUNTW(_Alldts_event2);
_numWords3 = COUNTW(_Alldts_event3);
*Loop through alldates, count the number that are < the current records date;
cnt2=0;
do _i = 1 to _NumWords2;
_tempDate = input(scan(_Alldts_event2,_i),Best12.);
if _tempDate < date Then cnt2=cnt2+1;
end;
cnt3=0;
do _i = 1 to _NumWords3;
_tempDate = input(scan(_Alldts_event3,_i),Best12.);
if _tempDate < date Then cnt3=cnt3+1;
end;
run;
I believe the Hash may be faster, but you'll have to decide on what tradeoff of comprehensibility/performance is appropriate.

Extracting certain rows from data using hash object in SAS

I have two SAS data tables. The first has many millions of records, and each record is identified with a sequential record ID, like this:
Table A
Rec Var1 Var2 ... VarX
1 ...
2
3
The second table specifies which rows from Table A should be assigned a coding variable:
Table B
Code BegRec EndRec
AA 1200 4370
AX 7241 9488
BY 12119 14763
So the first row of Table B means any data in Table A that has rec between 1200 and 4370 should be assigned code AA.
I know how to accomplish this with proc sql, but I want to see how this is done with a hash object.
In SQL, it's just:
proc sql;
select b.code, a.*
from tableA a, tableB b
where b.begrec<=a.rec<=b.endrec;
quit;
My actual data contains hundreds of gigabytes of data, so I want to do the processing as efficiently as possible. My understanding is that using a hash object may help here, but I haven't been able to figure out how to map what I'm doing to use that way.
A hash object solution (data input code borrowed from #Rob_Penridge).
data big;
do rec = 1 to 20000;
output;
end;
run;
data lookup;
input Code $ BegRec EndRec;
datalines;
AA 1200 4370
AX 7241 9488
BY 12119 14763
;
run;
data created;
format code $4.;
format begrec endrec best8.;
if _n_=1 then do;
declare hash h(dataset:'lookup');
h.definekey('Code');
h.definedata('code','begrec','endrec');
h.definedone();
call missing(code,begrec,endrec);
declare hiter iter('h');
end;
set big;
iter.first();
do until (rc^=0);
if begrec <= rec <= endrec then do;
code_dup=code;
end;
rc=iter.next();
end;
keep rec code_dup;
run;
I'm not sure a hash table would even be the most efficient approach here. I would probably solve this problem using a SELECT statement, as the conditional logic will be fast and it still requires only one pass through the data:
select;
when ( 1200 <= _n_ <=4370) code = 'AA';
...
otherwise;
end;
Assuming that you will need to run this code multiple times and the data may change each time you may not want to hardcode the select statement. So the best solution would dynamically build it using a macro. I have a utility macro I use for these kinds of situations (included at the bottom):
1) Create the data
data big;
do i = 1 to 20000;
output;
end;
run;
data lookup;
input Code $ BegRec EndRec;
datalines;
AA 1200 4370
AX 7241 9488
BY 12119 14763
;
run;
2) Save the contents of the smaller table into macro variables. You could also do this using call symput or other preferred method. This method assumes you don't have too many rows in your lookup table.
%table_parse(iDs=lookup, iField=code , iPrefix=code);
%table_parse(iDs=lookup, iField=begrec, iPrefix=begrec);
%table_parse(iDs=lookup, iField=endrec, iPrefix=endrec);
3) Dynamically build the SELECT statement.
%macro ds;
%local cnt;
data final;
set big;
select;
%do cnt=1 %to &code;
when (&&begrec&cnt <= _n_ <= &&endrec&cnt) code = "&&code&cnt";
%end;
otherwise;
end;
run;
%mend;
%ds;
Here is the utility macro:
/*****************************************************************************
** MACRO.TABLE_PARSE.SAS
**
** AS PER %LIST_PARSE BUT IT TAKES INPUT FROM A FIELD IN A TABLE.
** STORE EACH OBSERVATION'S FIELD'S VALUE INTO ITS OWN MACRO VARIABLE.
** THE TOTAL NUMBER OF WORDS IN THE STRING IS ALSO SAVED IN A MACRO VARIABLE.
**
** THIS WAS CREATED BECAUSE %LIST_PARSE WOULD FALL OVER WITH VERY LONG INPUT
** STRINGS. THIS WILL NOT.
**
** EACH VALUE IS STORED TO ITS OWN MACRO VARIABLE. THE NAMES
** ARE IN THE FORMAT <PREFIX>1 .. <PREFIX>N.
**
** PARAMETERS:
** iDS : (LIB.DATASET) THE NAME OF THE DATASET TO USE.
** iFIELD : THE NAME OF THE FIELD WITHIN THE DATASET.
** iPREFIX : THE PREFIX TO USE FOR STORING EACH WORD OF THE ISTRING TO
** ITS OWN MACRO VARIABLE (AND THE TOTAL NUMBER OF WORDS).
** iDSOPTIONS : OPTIONAL. ANY DATASET OPTIONS YOU MAY WANT TO PASS IN
** SUCH AS A WHERE FILTER OR KEEP STATEMENT.
**
******************************************************************************
** HISTORY:
** 1.0 MODIFIED: 01-FEB-2007 BY: ROBERT PENRIDGE
** - CREATED.
** 1.1 MODIFIED: 27-AUG-2010 BY: ROBERT PENRIDGE
** - MODIFIED TO ALLOW UNMATCHED QUOTES ETC IN VALUES BEING RETURNED BY
** CHARACTER FIELDS.
** 1.2 MODIFIED: 30-AUG-2010 BY: ROBERT PENRIDGE
** - MODIFIED TO ALLOW BLANK CHARACTER VALUES AND ALSO REMOVED TRAILING
** SPACES INTRODUCED BY CHANGE 1.1.
** 1.3 MODIFIED: 31-AUG-2010 BY: ROBERT PENRIDGE
** - MODIFIED TO ALLOW PARENTHESES IN CHARACTER VALUES.
** 1.4 MODIFIED: 31-AUG-2010 BY: ROBERT PENRIDGE
** - ADDED SOME DEBUG VALUES TO DETERMINE WHY IT SOMETIMES LOCKS TABLES.
*****************************************************************************/
%macro table_parse(iDs=, iField=, iDsOptions=, iPrefix=);
%local dsid pos rc cnt cell_value type;
%let cnt=0;
/*
** OPEN THE TABLE (AND MAKE SURE IT EXISTS)
*/
%let dsid=%sysfunc(open(&iDs(&iDsOptions),i));
%if &dsid eq 0 %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
/*
** GET THE POSITION OF THE FIELD (AND MAKE SURE IT EXISTS)
*/
%let pos=%sysfunc(varnum(&dsid,&iField));
%if &pos eq 0 %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
%else %do;
/*
** DETERMINE THE TYPE OF THE FIELD
*/
%let type = %upcase(%sysfunc(vartype(&dsid,&pos)));
%end;
/*
** READ THROUGH EACH OBSERVATION IN THE TABLE
*/
%let rc=%sysfunc(fetch(&dsid));
%do %while (&rc eq 0);
%let cnt = %eval(&cnt + 1);
%if "&type" = "C" %then %do;
%let cell_value = %qsysfunc(getvarc(&dsid,&pos));
%if "%trim(&cell_value)" ne "" %then %do;
%let cell_value = %qsysfunc(cats(%nrstr(&cell_value)));
%end;
%end;
%else %do;
%let cell_value = %sysfunc(getvarn(&dsid,&pos));
%end;
%global &iPrefix.&cnt ;
%let &iPrefix.&cnt = &cell_value ;
%let rc=%sysfunc(fetch(&dsid));
%end;
/*
** CHECK FOR ABNORMAL TERMINATION OF LOOP
*/
%if &rc ne -1 %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
/*
** ENSURE THE TABLE IS CLOSED SUCCESSFULLY
*/
%let rc=%sysfunc(close(&dsid));
%if &rc %then %do;
%put WARNING: MACRO.TABLE_PARSE.SAS: %sysfunc(sysmsg());
%end;
%global &iPrefix;
%let &iPrefix = &cnt ;
%mend;
Other examples of calling this macro:
%table_parse(iDs=sashelp.class, iField=sex, iPrefix=myTable, iDsOptions=%str(where=(sex='F')));
%put &mytable &myTable1 &myTable2 &myTable3; *etc...;
I'd be tempted to use the direct access method POINT= here; this reads only the required row numbers rather than the whole dataset.
Here is the code, which uses the same create data code as in Rob's answer.
data want;
set lookup;
do i=begrec to endrec;
set big point=i;
output;
end;
drop begrec endrec;
run;
If you have the code column already in the big dataset and you just wanted to update the values from the lookup dataset, then you could do this using MODIFY.
data big;
set lookup (rename=(code=code1));
do i=begrec to endrec;
modify big point=i;
code=code1;
replace;
end;
run;
Here's my solution, using proc format. This is also done in-memory, much like a hash table, but requires less structural code to work.
(Data input code also borrowed from #Rob_Penridge.)
data big;
do rec = 1 to 20000;
output;
end;
run;
data lookup;
input Code $ BegRec EndRec;
datalines;
ZZ 0 20
JJ 40 60
AA 1200 4370
AX 7241 9488
BY 12119 14763
;
run;
data lookup_f;
set lookup;
rename
BegRec = start
EndRec = end
Code = label;
retain fmtname 'CodeRecFormat';
run;
proc format library = work cntlin=lookup_f; run;
data big_formatted;
format rec CodeRecFormat.;
format rec2 8.;
length code $5.;
set big;
code = putn(rec, "CodeRecFormat.");
rec2 = rec;
run;