SAS hash merge -- smaller dataset as hash object

SAS hash merge -- smaller dataset as hash object - hash

I'm using the %HASHMERGE macro found at http://www.sascommunity.org/mwiki/images/2/22/Hashmerge.sas and the following example datasets:
data working;
length IID TYPE $12;
input IID $ TYPE $;
datalines;
B 0
B 0
A 1
A 1
A 1
C 2
D 3
;
run;
data master;
length IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME $12;
input IID $ FIRST_NAME $ MIDDLE_NAME $ LAST_NAME $ SUFFIX_NAME;
datalines;
X John James Smith Sr
Z Sarah Marie Jones .
Y Tim William Miller Jr
C Nancy Lynn Brown .
B Carol Elizabeth Collins .
A Wayne Mark Rooney .
;
run;
On the working dataset, I'm trying to attach the _NAME variables from the master dataset using this hash merge. The output looks fine and IS the desired output. However, in my real-life scenario the master dataset is too large to fit into a hash object and the macro keeps placing it as the hash object. I'd ultimately like to flip these two datasets to where the working dataset is the hash object, but I cannot get the desired output when I flip the code. Below is the part of the macro that produces the desired output and needs adjusted, but I am unsure how to set this up:
data OUTPUT;
if 0 then set MASTER (keep=IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME)
WORKING (keep=IID);
declare hash h_merge(dataset:"MASTER"); /* I want WORKING to be the hash object since it's smaller! */
rc=h_merge.DefineKey("IID");
rc=h_merge.DefineData("FIRST_NAME","MIDDLE_NAME","LAST_NAME","SUFFIX_NAME");
rc=h_merge.DefineDone();
do while(not eof);
set WORKING (keep=IID) end=eof;
call missing(FIRST_NAME,MIDDLE_NAME,LAST_NAME,SUFFIX_NAME);
rc=h_merge.find();
output;
end;
drop rc;
stop;
run;
Desired output:
IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME
---------------------------------------------------
B Carol Elizabeth Collins
B Carol Elizabeth Collins
A Wayne Mark Rooney
A Wayne Mark Rooney
A Wayne Mark Rooney
C Nancy Lynn Brown
D

While it's feasible to do what you say, I doubt you'll get that from a non-purpose-built macro. That's because it's not the normal way to do that; typically you want to keep the main dataset in its form and put the relational dataset in the hash table. Usually the sizes are reversed of course - the relational table is usually smaller than the main table.
Personally I would not use hash for this particular case. I'd use a format (or three). Just as fast as a hash and has less of the size issues (since it doesn't have to fit in memory), though it eventually would slow down (but not break!) due to size.
Format solution:
data working;
length IID TYPE $12;
input IID $ TYPE $;
datalines;
B 0
B 0
A 1
A 1
A 1
C 2
D 3
;
run;
data master;
length IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME $12;
input IID $ FIRST_NAME $ MIDDLE_NAME $ LAST_NAME $ SUFFIX_NAME;
datalines;
X John James Smith Sr
Z Sarah Marie Jones .
Y Tim William Miller Jr
C Nancy Lynn Brown .
B Carol Elizabeth Collins .
A Wayne Mark Rooney .
;
run;
data for_fmt;
set master;
retain type 'char';
length fmtname $32
label $255
start $255
;
start=iid;
*first;
label=first_name;
fmtname='$FIRSTNAMEF';
output;
*last;
label=last_name;
fmtname='$LASTNAMEF';
output;
*middle;
label=middle_name;
fmtname='$MIDNAMEF';
output;
*suffix;
label=suffix_name;
fmtname='$SUFFNAMEF';
output;
if _n_=1 then do;
start=' ';
label=' ';
hlo='o';
fmtname='$FIRSTNAMEF';
output;
fmtname='$LASTNAMEF';
output;
fmtname='$MIDNAMEF';
output;
fmtname='$SUFFNAMEF';
output;
end;
run;
proc sort data=for_fmt;
by fmtname start;
run;
proc format cntlin=for_fmt;
quit;
data want;
set working;
first_name = put(iid,$FIRSTNAMEF.);
last_name = put(iid,$LASTNAMEF.);
middle_name = put(iid,$MIDNAMEF.);
suffix_name = put(iid,$SUFFNAMEF.);
run;
That said...
If you do want to do this in a hash table, what you'd need to do is, for each row in MASTER, do a FIND in the working table, then if successful a REPLACE, then FIND_NEXT and REPLACE until that fails.
The problem? You're doing at least one find per master row, which you yourself noted is very large. If WORKING is 100k and MASTER is 100M, then you're doing 1000 finds for each match. That's very expensive, and probably means you're better off with some other solution.

Related

how to find non- Ascii key symbols in a string. DB2?

How to find non-ASCII symbols in a string ? (We are using DB2)
We have tried following select statement but it is not working.
SELECT columnname
FROM tablename
WHERE columnname LIKE '%[' + CHAR(127) + '-' + CHAR(255) + ']%'
COLLATE Latin1_General_100_BIN2

I guess you were trying to use CHR() function, instead of CHAR(), which is a data-type.
If you are using a newer db2 version, that has REGEXP functions, you can try using REGEXP_LIKE() function.
Follow an example from samble db:
SELECT EMPNO, LASTNAME FROM EMPLOYEE WHERE REGEXP_LIKE(LASTNAME,'[E-H]')
EMPNO LASTNAME
------ ---------------
000010 HAAS
000020 THOMPSON
000050 GEYER
000060 STERN
000090 HENDERSON
000100 SPENSER
000110 LUCCHESSI
000120 O'CONNELL
000140 NICHOLLS
000170 YOSHIMURA
000180 SCOUTTEN
000190 WALKER
000210 JONES
000230 JEFFERSON
000250 SMITH
000260 JOHNSON
000270 PEREZ
000280 SCHNEIDER
000290 PARKER
000300 SMITH
000310 SETRIGHT
000320 MEHTA
000330 LEE
000340 GOUNOT
200010 HEMMINGER
200220 JOHN
200240 MONTEVERDE
200280 SCHWARTZ
200310 SPRINGER
200330 WONG
30 record(s) selected.
All names selected contains letters from E to H, as specified by the search-pattern.
As I didn't have any row containing such ranges.. I updated one of the rows, adding chars 169 and 174 to it.
Update employee set LASTNAME = ('LEE' || chr(169) || chr(174)) WHERE LASTNAME = 'LEE'
and, using this REGEXP_LIKE function:
SELECT EMPNO, LASTNAME FROM EMPLOYEE WHERE REGEXP_LIKE(LASTNAME , '[' || CHR(127) || '-' || CHR(255) || ']')"
EMPNO LASTNAME
------ ---------------
000330 LEE©®
1 record(s) selected.
Regards

In SAS, how to concatenate multiple rows into 1 by some ID

I have a table like this
org_ID linenr text
811558672 10 Legevirksomhet.
811560782 10 Clavier Classics er et musikkselskap som produserer komposisjoner og
811560782 20 arrangementer av svÃ¦rt hÃ¸y kvalitet. De kombinerer den klassiske
811560782 30 musikktradisjonen med moderne teknikker og deres kunder spenner fra
811560782 40 individuelle musikere til ensembler, festivalarrangÃ¸rer, konserthus,
811560782 50 kulturinstitusjoner, eventskapere og mediaprodusenter.
811560812 10 Grafisk design, illustrasjon og nÃ¦rliggende virksomhet.
811561592 10 Sosial- og helsetjenesten. Konsulentvirksomhet: Veiledning til
811561592 20 foreldre, fosterhjem, skole og barnehage.
As you can see, for some org_ID, they appear multiple times because one line of text is not enough for them. When this happens, the linenr shows multiple numbers. Now I want to concatenate multiple lines of text into one when org_ID is the same. How shall I do this? Many thanks in advance.

Use SAS Retain functionality to concatenate the text and only output when an new org_ID is read.
Note: The two IF statements handles the cases of first row and last row; where there is no Previous ID or no Next ID.
Working Code: (Your Input data must be sorted)
data have;
infile datalines dlm=',' dsd;
length org_ID 8. linenr 8. text $200.;
input org_ID linenr text $;
datalines;
811558672,10, "Legevirksomhet."
811560782,10, "Clavier Classics er et musikkselskap som produserer komposisjoner og"
811560782,20, "arrangementer av svÃ¦rt hÃ¸y kvalitet. De kombinerer den klassiske"
811560782,30, "musikktradisjonen med moderne teknikker og deres kunder spenner fra"
811560782,40, "individuelle musikere til ensembler, festivalarrangÃ¸rer, konserthus,"
811560782,50, "kulturinstitusjoner, eventskapere og mediaprodusenter."
811560812,10, "Grafisk design, illustrasjon og nÃ¦rliggende virksomhet."
811561592,10, "Sosial- og helsetjenesten. Konsulentvirksomhet: Veiledning til"
811561592,20, "foreldre, fosterhjem, skole og barnehage."
;
run;
data want;
set have nobs=nobs;
retain longtext;
retain id;
if(_N_=1) then do; longtext=text; id=org_ID; end;
else if org_ID ne id then do; output; longtext=text; id=org_ID; end;
else longtext=cats(longtext,text);
if (_N_=nobs) then do; output; end;
keep org_ID longtext;
run;
Output:
org_ID=811560782 longtext=Legevirksomhet.
org_ID=811560812 longtext=Clavier Classics er et musikkselskap som produserer komposisjoner ogarrangementer av svÃ¦rt hÃ¸y kva
org_ID=811561592 longtext=Grafisk design, illustrasjon og nÃ¦rliggende virksomhet.
org_ID=811561592 longtext=Sosial- og helsetjenesten. Konsulentvirksomhet: Veiledning tilforeldre, fosterhjem, skole og barneha

A DOW loop can accumulate each line of text in the org_ID group into a final longtext. The longtext should be assigned a specific length in order to prevent truncations that may occur if default lengths are used. You may or may not want a space separator between lines that are concatenated.
data want(keep=org_ID longtext);
do until (last.org_ID);
set have;
by org_ID;
length longtext $2000;
longtext = catx(' ', longtext, text);
end;
run;
If the data is not sorted, but the org_ID rows are contiguous, you can use
by org_ID notsorted;
So what is happening ?
Longtext is a non dataset variable, so it is automatically reset to missing at the top of the data step.
The data step iterates for each row in the group until the last row in the group.
The length of the variable longtext is specified after the set statement so it be the last variable in program data vector (pdv), and thus be the second column of the kept variables.
catx is used accumulate the concatenations of the text data within the group. A space is used to separate the text data parts.
If you do not want the space separator, accumulate using
longtext = cats(longtext, text);

SAS proc sql left join to hash object

I'm new to hash objects, but I'd like to learn more about them. I'm trying to find ways to substitute proc sql with hash object. I have to two tables, when i have a proc sql with inner join and an equal with hash object it works, but when i have a proc sql with left join i don´t know how to make in hash object. Thankyou very much. Sorry by my English.
Table01.
data Principal;
input idd $ name $ Apellid1 $ valor $;
datalines;
1977 Arthur Pendrag 0001
1978 Minerva Athena 0001
2011 Noe Arca 0001
2014 Thor Hammer 0001
0001 Seiya Pegaso 0001
0002 Ikki Fenix 0001
0003 Shun Andromeda 0001
0004 Shiryu Dragon 0001
0005 Yoga Cisne 0001
0006 Naruto Konoha 0001
0007 Sasuke Kun 0001
;
Table02
data Secundarea;
input idd $ Apellid2 $ mival $;
datalines;
1977 Excalibu 0003
1978 Atenea 0004
2011 Biblia 0005
2014 Odin 0006
0001 Sagigario 0007
0002 Virgo 0008
0003 Piscis 0009
0004 Libra 0010
0005 Acuario 0011
0008 Aries 0012
;
Proc sql inner join
proc sql;
create table sqlinner as
select *
from principal as p inner join secundarea as s
on p.idd=s.idd;
quit;
Hash object (inner join) it works
data mihashInner;
declare hash h();
h.defineKey('IDD');
h.defineData ('IDD','APELLID2','MIVAL');
h.defineDone();
do until(fin1);
set SECUNDAREA end=fin1;
h.add();
end;
do until (fin2);
set PRINCIPAL end=fin2;
if h.find()=0 then
output;
end;
run;
Proc sql (left join)
proc sql;
create table sqlleft as
select *
from principal as p left join secundarea as s
on p.idd=s.idd;
quit;
How to make in hash object? I´m trying two ways.
data mihashLeft2;
declare hash h();
h.defineKey('IDD');
h.defineData ('IDD','APELLID2','MIVAL');
h.defineDone();
do until(fin1);
set SECUNDAREA end=fin1;
h.add();
end;
do until (fin2);
set PRINCIPAL end=fin2;
rc=h.find();
output;
end;
run;
Or this. But nothing. Thx.
data mihashLeft;
if 0 then set SECUNDAREA;
if _n_ =1 then do;
declare hash hhh(dataset: 'SECUNDAREA', multidata:'y');
hhh.DefineKey('IDD');
hhh.DefineData('IDD','APELLID2','MIVAL');
hhh.DefineDone();
set PRINCIPAL;
rc = hhh.find();
if rc ne 0 then do;
call missing(MIVAL);
output;
end;
else
do while(rc = 0);
output;
rc = hhh.find_next();
end;
end;
run;

You could try to do it like this:
data mihashLeft(drop=rc);
/*iterate left data set*/
set PRINCIPAL;
/*declare variables from hash set*/
length APELLID2 MIVAL $8 rc 8;
/*declare hash*/
if _n_=1 then do;
declare hash hhh(dataset: 'SECUNDAREA', multidata:'y');
hhh.DefineKey('IDD');
hhh.DefineData('APELLID2','MIVAL');
hhh.DefineDone();
end;
/*look for first row from hash set and output it even if it's not found*/
rc = hhh.find();
output;
/*loop to find other rows from the hash set*/
do while(rc=0);
rc = hhh.find_next();
/*output only if you found something*/
if rc=0 then output;
end;
run;

data mihashLeft;
set PRINCIPAL; /* left */
if _n_ = 1 then do;
if 0 then set SECUNDAREA;
dcl hash b (dataset: "SECUNDAREA", multidata: "y",ordered:'y');
b.definekey ("IDD");
b.definedata (all:'Y');
b.definedone ();
end;
if b.find() eq 0 then output;
/*if b.find() ne 0 then call missing(right_table_column);*/
/*if suppose you are pulling any column from right table then include above line*/
run;

In SAS, how do you collapse multiple rows into one row based on some ID variable?

The data I am working with is currently in the form of:
ID Sex Race Drug Dose FillDate
1 M White ziprosidone 100mg 10/01/98
1 M White ziprosidone 100mg 10/15/98
1 M White ziprosidone 100mg 10/29/98
1 M White ambien 20mg 01/07/99
1 M White ambien 20mg 01/14/99
2 F Asian telaprevir 500mg 03/08/92
2 F Asian telaprevir 500mg 03/20/92
2 F Asian telaprevir 500mg 04/01/92
And I would like to write SQL code to get the data in the form of:
ID Sex Race Drug1 DrugDose1 FillDate1_1 FillDate1_2 FillDate1_3 Drug2 DrugDose2 FillDate2_1 FillDate2_2 FillDate2_3
1 M White ziprosidone 100mg 10/01/98 10/15/98 10/29/98 ambien 20mg 01/07/99 01/14/99 null
2 F Asian telaprevir 500mg 03/08/92 03/20/92 04/01/92 null null null null null
I need just one row for each unique ID with all of the unique drug/dose/fill info in columns, not rows. I suppose it can be done using PROC TRANSPOSE, but I am not sure of the most efficient way of doing the multiple transposes. I should note that I have over 50,000 unique IDs, each with varying amounts of drugs, doses, and corresponding fill dates. I would like to return null/empty values for those columns that do not have data to fill in. Thanks in advance.

To some extent, the desired efficiency of this determines the best solution.
For example, assuming you know the maximum reasonable number of fill dates, you could use the following to very quickly get a transposed table - likely the fastest way to do that - but at the cost of needing a large amount of post-processing, as it will output a lot of data you don't really want.
proc summary data=have nway;
class id sex race;
output out=want (drop=_:)
idgroup(out[5] (drug dose filldate)=) / autoname;
run;
On the other side of things, the vertical-and-transpose is the "best" solution in terms of not requiring additional steps; though it might be slow.
data have_t;
set have;
by id sex race drug dose notsorted;
length varname value $64; *some reasonable maximum, particularly for the drug name;
if first.ID then do;
drugcounter=0;
end;
if first.dose then do;
drugcounter+1;
fillcounter=0;
varname = cats('Drug',drugcounter);
value = drug;
output;
varname = cats('DrugDose',drugcounter);
value = dose;
output;
end;
call missing(value);
fillcounter+1;
varname=cats('Filldate',drugcounter,'_',fillcounter);
value_n = filldate;
output;
run;
proc transpose data=have_t(where=(not missing(value))) out=want_c;
by id sex race ;
id varname;
var value;
run;
proc transpose data=have_t(where=(not missing(value_n))) out=want_n;
by id sex race ;
id varname;
var value_n;
run;
data want;
merge want_c want_n;
by id sex race;
run;
It's not crazy slow, really, and odds are it's fine for your 50k IDs (though you don't say how many drugs). 1 or 2 GB of data will work fine here, especially if you don't need to sort them.
Finally, there are some other solutions that are in between. You could do the transpose entirely using arrays in the data step, for one, which might be the best compromise; you have to determine in advance the maximum bounds for the arrays, but that's not the end of the world.
It all depends on your data, though, which is really the best. I would probably try the data step/transpose first: that's the most straightforward, and the one most other programmers will have seen before, so it's most likely the best solution unless it's prohibitively slow.

Consider the following query using two derived tables (inner and outer) that establishes an ordinal row count by the FillDate order. Then, using the row count, if/then or case/when logic is used for iterated columns. Outer query takes the max values grouped by id, sex, race.
The only caveat is knowing ahead how many expected or max number of rows per ID (i.e., another query our table browse). Hence, fill in ellipsis (...) as needed. Do note, missings will generate for columns that do not apply to a particular ID. And of course please adjust to actual dataset name.
proc sql;
CREATE TABLE DrugTableFlat AS (
SELECT id, sex, race,
Max(Drug_1) As Drug1, Max(Drug_2) As Drug2, Max(Drug_3) As Drug3, ...
Max(Dose_1) As Dose1, Max(Dose_2) As Dose2, Max(Dose_3) As Dose3, ...
Max(FillDate_1) As FillDate1, Max(FillDate_2) As FillDate2,
Max(FillDate_3) As FillDate3 ...
FROM
(SELECT id, sex, race,
CASE WHEN RowCount=1 THEN Drug END AS Drug_1,
CASE WHEN RowCount=2 THEN Drug END AS Drug_2,
CASE WHEN RowCount=3 THEN Drug END AS Drug_3,
...
CASE WHEN RowCount=1 THEN Dose END AS Dose_1,
CASE WHEN RowCount=2 THEN Dose END AS Dose_2,
CASE WHEN RowCount=3 THEN Dose END AS Dose_3,
...
CASE WHEN RowCount=1 THEN FillDate END AS FillDate_1,
CASE WHEN RowCount=2 THEN FillDate END AS FillDate_2,
CASE WHEN RowCount=3 THEN FillDate END AS FillDate_3,
...
FROM
(SELECT t1.id, t1.sex, t1.race, t1.drug, t1.dose, t1.filldate,
(SELECT Count(*) FROM DrugTable t2
WHERE t1.filldate >= t2.filldate AND t1.id = t2.id) As RowCount
FROM DrugTable t1) AS dT1
) As dT2
GROUP BY id, sex, race);

Here's my attempt at an array-based solution:
/* Import data */
data have;
input #2 ID #9 Sex $1. #18 Race $5. #31 Drug $11. #44 Dose $5. #58 FillDate mmddyy8.;
format filldate yymmdd10.;
cards;
1 M White ziprosidone 100mg 10/01/98
1 M White ziprosidone 100mg 10/15/98
1 M White ziprosidone 100mg 10/29/98
1 M White ambien 20mg 01/07/99
1 M White ambien 20mg 01/14/99
2 F Asian telaprevir 500mg 03/08/92
2 F Asian telaprevir 500mg 03/20/92
2 F Asian telaprevir 500mg 04/01/92
;
run;
/* Calculate array bounds - SQL version */
proc sql _method noprint;
select DATES into :MAX_DATES_PER_DRUG trimmed from
(select count(ID) as DATES from have group by ID, drug, dose)
having DATES = max(DATES);
select max(DRUGS) into :MAX_DRUGS_PER_ID trimmed from
(select count(DRUG) as DRUGS from
(select distinct DRUG, ID from have)
group by ID
)
;
quit;
/* Calculate array bounds - data step version */
data _null_;
set have(keep = id drug) end = eof;
by notsorted id drug;
retain max_drugs_per_id max_dates_per_drug;
if first.id then drug_count = 0;
if first.drug then do;
drug_count + 1;
date_count = 0;
end;
date_count + 1;
if last.id then max_drugs_per_id = max(max_drugs_per_id, drug_count);
if last.drug then max_dates_per_drug = max(max_dates_per_drug, date_count);
if eof then do;
call symput("max_drugs_per_id" ,cats(max_drugs_per_id));
call symput("max_dates_per_drug",cats(max_dates_per_drug));
end;
run;
/* Check macro vars */
%put MAX_DATES_PER_DRUG = "&MAX_DATES_PER_DRUG";
%put MAX_DRUGS_PER_ID = "&MAX_DRUGS_PER_ID";
/* Transpose */
data want;
if 0 then set have;
array filldates[&MAX_DRUGS_PER_ID,&MAX_DATES_PER_DRUG]
%macro arraydef;
%local i;
%do i = 1 %to &MAX_DRUGS_PER_ID;
filldates&i._1-filldates&i._&MAX_DATES_PER_DRUG
%end;
%mend arraydef;
%arraydef;
array drugs[&MAX_DRUGS_PER_ID] $11;
array doses[&MAX_DRUGS_PER_ID] $5;
drug_count = 0;
do until(last.id);
set have;
by ID drug dose notsorted;
if first.drug then do;
date_count = 0;
drug_count + 1;
drugs[drug_count] = drug;
doses[drug_count] = dose;
end;
date_count + 1;
filldates[drug_count,date_count] = filldate;
end;
drop drug dose filldate drug_count date_count;
format filldates: yymmdd10.;
run;
The data step code for calculating the array bounds is probably more efficient than the SQL version, but it's also bit more verbose. On the other hand, with the SQL version you also have to trim whitespace from the macro vars. Fixed - thanks Tom!
The transposing data step is probably also at the more efficient end of the scale compared to the proc transpose / proc sql options in the other answers, as it makes only 1 further pass through the dataset, but again it's also fairly complex.

SAS hash join in form LIKE or =:

is it possible to do a SAS hash lookup on a partial substring?
So the hash table key will contain: 'LongString' but my target table key has: 'LongStr'
(the target table key string length may vary)

You can but it's not pretty and you may not get the performance benefits you're looking for. Also, depending on the length of your strings and the size of your table you may not be able to fit all the hashtable elements into memory.
The trick is to first generate all of the possible substrings and then to use the 'multidata' option on the hashtable.
Create a dataset containing words we want to match against:
data keys;
length key $10 value $1;
input key;
cards;
LongString
LongOther
;
run;
Generate all possible substrings:
data abbreviations;
length abbrev $10;
set keys;
do cnt=1 to length(key);
abbrev = substr(key,1,cnt);
output;
end;
run;
Create a dataset containing terms we want to search for:
data match_attempts;
length abbrev $10;
input abbrev ;
cards;
L
Long
LongO
LongSt
LongOther
;
run;
Perform the lookup:
data match;
length abbrev key $10;
set match_attempts;
if _n_ = 1 then do;
declare hash h1(dataset:'abbreviations', multidata: 'y');
h1.defineKey('abbrev');
h1.defineData('abbrev', 'key');
h1.defineDone();
call missing(abbrev, key);
end;
if h1.find() eq 0 then do;
output;
h1.has_next(result: r);
do while(r ne 0);
h1.find_next();
output;
h1.has_next(result: r);
end;
end;
drop r;
run;
Output (notice how 'Long' returns 2 matches):
Obs abbrev key
=== ========= ==========
1 Long LongString
2 Long LongOther
3 LongO LongOther
4 LongSt LongString
5 LongOther LongOther
A few more notes. The reason the hash table will not support something like the like operator is because it 'hashes' the key prior to inserting a record into the hash table. When a lookup is performed the value to lookup is 'hashed' and then a match is performed on the hashed values. When a value is hashed even a small change in the value will yield a completely different result. Take the below example, hashing 2 almost identical strings yields 2 completely different values:
data _null_;
length hashed_value $16;
hashed_value = md5("String");
put hashed_value= hex32.;
hashed_value = md5("String1");
put hashed_value= hex32.;
run;
Output:
hashed_value=27118326006D3829667A400AD23D5D98
hashed_value=0EAB2ADFFF8C9A250BBE72D5BEA16E29
For this reason, the hash table cannot use the like operator.
Finally, thanks to #vasja for some sample data.

You have to use Iterator object to loop through the keys and do the matching by yourself.
data keys;
length key $10 value $1;
input key value;
cards;
LongString A
LongOther B
;
run;
proc sort data=keys;
by key;
run;
data data;
length short_key $10;
input short_key ;
cards;
LongStr
LongSt
LongOther
LongOth
LongOt
LongO
LongSt
LongOther
;
run;
data match;
set data;
length key $20 outvalue value $1;
drop key value rc;
if _N_ = 1 then do;
call missing(key, value);
declare hash h1(dataset:"work.keys", ordered: 'yes');
declare hiter iter ('h1');
h1.defineKey('key');
h1.defineData('key', 'value');
h1.defineDone();
end;
rc = iter.first();/* reset to beginning */
do while (rc = 0);/* loop through the long keys and find a match */
if index(key, trim(short_key)) > 0 then do;
outvalue = value;
iter.last(); /* leave after match */
end;
rc = iter.next();
end;
run;

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

SAS hash merge -- smaller dataset as hash object - hash

Related

how to find non- Ascii key symbols in a string. DB2?

In SAS, how to concatenate multiple rows into 1 by some ID

SAS proc sql left join to hash object

In SAS, how do you collapse multiple rows into one row based on some ID variable?

SAS hash join in form LIKE or =:

Categories

Resources