SAS Hash Tables Merge Error: Key Mismatch

I've been having trouble using hash tables to merge my 2 data sets. The hash table I declared has 1 key and 2 data variables.
data final_table;
if 0 then set hash_data;
if _N_=1 then do;
declare hash hashlookup (dataset:'hash_data');
hashlookup.definekey('key');
hashlookup.definedata('ABC', 'XYZ');
hashlookup.definedone();
end;
set datatabletwo;
rc = hashlookup.find(key:'key');
run;
The key is a numeric variable of the same length in both data sets. I have already tried converting both keys to character, but the log still returns the following error message:
ERROR: Type mismatch for key variable KEY at line 57 column 7.
Hope someone can help. Thanks in advance.

The problem here is that this
rc = hashlookup.find(key:'key');
looks up the literal string 'key' rather than the value of the variable key. Therefore, do this instead:
data final_table;
if 0 then set hash_data;
if _N_=1 then do;
declare hash hashlookup (dataset:'hash_data');
hashlookup.definekey('key');
hashlookup.definedata('ABC', 'XYZ');
hashlookup.definedone();
end;
set datatabletwo;
rc = hashlookup.find();
run;
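As a side note, the explicit-argument form of FIND also works, as long as you pass the variable itself rather than a quoted literal; a minimal sketch using the same step and names as above:
/* key: key passes the current VALUE of the variable key, not the text 'key' */
rc = hashlookup.find(key: key);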

Extracting Double Values from Blob/Object to Rows

I have a query that is related to this topic:
https://developer.jboss.org/thread/277610
Prior to reaching the comma-separated values stage, the values are actually stored as a blob.
There is a function fetchBlobtoString(Blob, string, VARIADIC start_end integer) returns String that takes the blob input and converts it to comma-separated values, as shown in the post.
The issue is that the string is limited to 4000 characters, so the data gets truncated and not all values show up. What would be the best way to extract the double values and convert them to rows, similar to the post?
Would converting it into an object instead of a string improve performance, using the following function as an example:
fetchElementValueFromBlob(protobufBlob Blob, origName string) returns object
I have tried iterating over the items in the blob using the getItem function and adding them to a temp table, but it is slow and I get the following error if I go more than 15-20 iterations:
Error: TEIID30504 Remote org.teiid.core.TeiidProcessingException: TEIID30504 petrelDS: TEIID60000 javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:/petrelDS
SQLState: 50000
ErrorCode: 30504
BEGIN
DECLARE integer VARIABLES.counter = 0;
DECLARE integer VARIABLES.pts = 100;
WHILE (VARIABLES.counter < VARIABLES.pts)
BEGIN
select wellbore_uwi,getItem(fetchBlob(data, 'md'),VARIABLES.counter) INTO TEMP from DirectionalSurvey where wellbore_uwi='1234567890';
VARIABLES.counter = (VARIABLES.counter + 1);
END
SELECT TEMP.wb_uwi,TEMP.depth FROM TEMP;
END
If I remove the getItem() function, the error goes away.

RIGHT Function in UPDATE Statement w/ Integer Field

I am attempting to run a simple UPDATE script on an integer field, whereby the trailing 2 numbers are "kept", and the leading numbers are removed. For example, "0440" would be updated as "40." I can get the desired data in a SELECT statement, such as
SELECT RIGHT(field_name::varchar, 2)
FROM table_name;
However, I run into an error when I try to use this same functionality in an UPDATE script, such as:
UPDATE schema_name.table_name
SET field_name = RIGHT(field_name::varchar, 2);
The error I receive reads:
column . . . is of type integer but expression is of type text . . .
HINT: You will need to rewrite or cast the expression
You're casting the integer to varchar, but you're not casting the result back to integer:
UPDATE schema_name.table_name
SET field_name = RIGHT(field_name::TEXT, 2)::INTEGER;
The error is quite straightforward: RIGHT returns textual data, which you cannot assign to an integer column. You could, however, explicitly cast it back:
UPDATE schema_name.table_name
SET field_name = RIGHT(field_name::varchar, 2)::int;
1 is a digit (or a number, or a string); '123' is a number (or a string).
Your example 0440 does not make sense for an integer value, since leading (insignificant) zeros are not stored.
Strictly speaking, the data type integer is no good for storing the "trailing 2 numbers" (meaning digits), since 00 and 0 both result in the same integer value 0. But I don't think that's what you meant.
For operating on the numeric value, don't use string functions (which require casting back and forth). The modulo operator % does exactly what you need: field_name % 100. So:
UPDATE schema_name.table_name
SET field_name = field_name%100
WHERE field_name > 99; -- to avoid empty updates

Outputting conditionally from merge

I want to update a history file in SAS. I have new observations, which may overlap with existing data lines.
What is needed is a file that has the lines from the new dataset (new_data) where they exist, and otherwise the lines from the old dataset (old_data). What I've come up with is a clunky merge operation that depends on the order of the datasets (it only works if New_data comes after Old_data in the MERGE statement).
data new_data;
input key value;
datalines;
1 10
1 11
2 20
2 21
;
run;
data old_data;
input key value;
datalines;
2 50
2 51
3 30
3 31
;
run;
So I'd like to have the following:
key value
1 10
1 11
2 20
2 21
3 30
3 31
However, the following does not work; it produces the output shown below it.
data updated_history;
merge New_data(in=a) old_data(in=b) ;
by key;
if a or (b and not a );
run;
....
2 50
2 51
...
But for some reason this does:
data updated_history;
merge old_data(in=b) New_data(in=a);
by key;
if a or (b and not a );
run;
Question: Is there an intelligent way to control which dataset the values are selected from? Something like: if a then value_from_dataset a;
The order in which you list the data sets in the MERGE statement is the order in which the data is applied. So when the order is old, new, the values from old are read first and then the values from new overwrite them. This is why your second version works and the first does not.
Since you have multiple observations per key value, you probably do NOT want to use MERGE to combine these files. You could do it with SET instead, reading the data twice using two DOW loops. In that case the order of the datasets in the SET statement does not matter, since the records are interleaved instead of joined. The first loop calculates which of the two input datasets has any observations for the current KEY value.
data want ;
anyold=0;
anynew=0;
/* first pass through the KEY group: note which datasets contribute observations */
do until (last.key);
set old_data (in=inold) new_data(in=innew);
by key ;
if inold then anyold=1;
if innew then anynew=1;
end;
/* second pass: keep old records only when the key does not also exist in new_data */
do until (last.key);
set old_data (in=inold) new_data(in=innew);
by key ;
if not (anyold and anynew and inold) then output;
end;
drop anyold anynew;
run;
This type of combination is probably easier to code using SQL.
proc sql ;
create table want as
select key,value from new_data
union
select key,value from old_data
where key in (select key from old_data except select key from new_data)
order by 1
;
quit;

SAS: Coding a dummy variable for a value of a variable by group within group

I have a dataset of CASE_ID (x, y, and z), a set of multiple dates (including duplicate dates) for each CASE_ID, and a variable VAR. I would like to create a dummy variable DUMMYVAR by group within a group, whereby if VAR="C" for CASE_ID x on some specific date, then DUMMYVAR=1 for all observations corresponding to CASE_ID x with that date.
I believe that a classic double DOW loop would be the key here, but this is my third week using SAS and I am having difficulty getting this to work with two BY groups.
I have referenced and attempted to write a variation of Haikuo's code here:
PROC SORT DATA=have;
by CASE_ID DATE;
RUN;
data want;
do until (last.DATE);
set HAVE;
by date notsorted;
if var='c' then DUMMYVAR=1;
do until (last.DATE);
set HAVE;
by DATE notsorted;
if DATE=1 then ????????
end;
run;
Change your BY statements to match the grouping you are doing. And in the second loop add a simple OUTPUT; statement. Then your new dataset will have all the rows in your original dataset and the new variable DUMMYVAR.
data want;
do until (last.DATE);
set HAVE;
by case_id date;
if var='c' then DUMMYVAR=1;
end;
do until (last.DATE);
set HAVE;
by case_id date;
output;
end;
run;
This will create the variable DUMMYVAR with values of either 1 or missing. If you want the values to be 1 or 0, you could either set it to 0 before the first DO loop or add an if first.date then dummyvar=0; statement before the existing IF statement; see the sketch below.
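A minimal sketch of the 0-or-1 variant, assuming the same HAVE dataset and variable names as above:
data want;
DUMMYVAR=0; /* start each CASE_ID/DATE group at 0 so groups without VAR='c' get 0 instead of missing */
do until (last.DATE);
set HAVE;
by case_id date;
if var='c' then DUMMYVAR=1;
end;
do until (last.DATE);
set HAVE;
by case_id date;
output; /* write every original row with the group-level flag attached */
end;
run;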

How to use identity and concatenate it into the other columns of the table?

If I generate an identity for a table on the column cust-id, I want the next column, userid, to be cust-id + CID.
E.g. 000000001CID, 0000000002CID
What SQL do I include for this?
Similarly, if I have 00001 in the column cust-id and abcd in the column section, the third column must have the value 00001abcd.
Please let me know the solution.
You just need to create a trigger. Something like:
CREATE TRIGGER A
BEFORE INSERT ON B
REFERENCING NEW AS N
FOR EACH ROW
BEGIN
SET N.userid = N.CUST_ID || N.CID ;
IF (N.CUST_ID = '00001' AND N.SECTION = 'abcd') THEN
SET N.THIRD = N.CUST_ID || N.SECTION ;
END IF;
END #
By the way, generating derived values in a column shows that your model is not normalized, and sometimes this is a source of errors.