Merging two SAS datasets by ID

I want to merge two SAS datasets that share the same ID variable, but something always goes wrong.
I checked that the variable baseid is character in both datasets, so I tried to make the ID format the same in both by running the same code on each:
data a;
set a;
baseidtemp = put(baseid,$12.); /* baseid is character, so a character format ($12.) is required */
drop baseid;
rename baseidtemp = baseid;
run;
After that I sorted both datasets by baseid.
However, when I use PROC COMPARE to compare their IDs, every observation is reported as unequal, even though the values all look the same.
I merged them like this:
data A;
merge A (in = a) B;
by baseid;
if a;
run;
They just don't merge correctly.
I am very confused, so could someone help me solve this?
Thank you in advance!

It would help to see the content of your data to know how it is failing to merge, but you could try this. I generally don't overwrite data sets, so I used some new data set names:
data newa;
set a;
baseidtemp = baseid;
drop baseid;
run;
proc sort data=newa out=outA;
by baseidtemp;
run;
proc sort data=b out=outB;
by baseid;
run;
data new;
merge outA outB (rename = (baseid = baseidtemp));
by baseidtemp;
run;
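When PROC COMPARE reports every ID as unequal even though the values look identical, a common cause is an invisible difference such as leading blanks (numeric formats like 12. right-align their output) or different variable lengths. The following is a minimal sketch, assuming the difference is leading or trailing blanks: it gives the key a common length in both data sets and removes the blanks with STRIP before sorting and merging. The data set names (a, b, a_clean, b_clean, merged) and the sample values are hypothetical.

```sas
/* Hypothetical sample data: same IDs, but B's IDs carry leading blanks */
data a;
   length baseid $12;
   baseid = '1001'; x = 1; output;
   baseid = '1002'; x = 2; output;
run;

data b;
   length baseid $20;
   baseid = '  1001'; y = 10; output;
   baseid = '  1002'; y = 20; output;
run;

/* Declare a common length before SET, then strip the blanks */
data a_clean;
   length baseid $20;
   set a;
   baseid = strip(baseid);
run;

data b_clean;
   length baseid $20;
   set b;
   baseid = strip(baseid);
run;

proc sort data=a_clean; by baseid; run;
proc sort data=b_clean; by baseid; run;

/* Keep every record from A, matched to B where the IDs agree */
data merged;
   merge a_clean (in=ina) b_clean;
   by baseid;
   if ina;
run;
```

If the IDs still do not match after this, printing them with a $HEX. format is one way to look for other hidden characters.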


SAS Calling ProcSQL-Macro in Data Step

I cannot run my PROC SQL code by calling a macro in my DATA step.
The SQL code works on its own, but I need it to run for every security group.
%macro adding;
proc sql;
insert into have (Time,seconds) values
("9:10:00"t,33000);
insert into have (Time,seconds) values
("16:50:00"t,60600);
quit;
%mend;
data have;
set have;
by security;
if first.security then %adding;
if seconds=33000 then date=lag(date);
if seconds=60600 then date=lag(date);
run;
The error is (the log marks the statements generated by the macro with error number 180):
proc sql; insert into have (Time,seconds) values ("9:10:00"t,33000);
insert into have (Time,seconds) values ("16:50:00"t,60600); quit;
ERROR 180-322: Statement is not valid or it is used out of proper order.
I don't know what to change to make it work.
Thanks for any help!
Use CALL EXECUTE to call the macro:
if first.security then call execute('%adding');
However, the code the macro generates will run AFTER the DATA step finishes, not during it.
Also, changing the data in place in so many ways makes debugging difficult: your DATA statement, SET statement, and SQL INSERT statements all reference the same data set.
If you're trying to add records, consider using explicit OUTPUT statements within the DATA step itself. You could use a macro to generate these statements if desired.
if first.security then do;
Time=...;
Seconds=....;
output;
Time=....;
Seconds=....;
output;
end;
*rest of SAS code;
output; *add an explicit OUTPUT for the original record if required;
run;
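The explicit-OUTPUT approach can be filled in as a runnable sketch. It assumes a hypothetical HAVE with SECURITY, DATE, TIME, and SECONDS, and reuses the two times from the question's INSERT statements. It saves and restores the current record's TIME and SECONDS so the original row is not overwritten by the added rows, and because the added rows are written while DATE still holds the current record's value, no LAG is needed to fill it in.

```sas
/* Hypothetical sample data with the variables used in the question */
data have;
   input security $ date :date9. time :time8. seconds;
   format date date9. time time8.;
   datalines;
ABC 01JAN2020 10:00:00 36000
ABC 01JAN2020 11:00:00 39600
XYZ 02JAN2020 10:00:00 36000
;
run;

proc sort data=have;
   by security;
run;

data want;
   set have;
   by security;
   if first.security then do;
      /* Save the current record's values */
      _time    = time;
      _seconds = seconds;
      /* Added row 1: DATE still holds this record's date */
      time = '9:10:00't;  seconds = 33000; output;
      /* Added row 2 */
      time = '16:50:00't; seconds = 60600; output;
      /* Restore the original values */
      time    = _time;
      seconds = _seconds;
   end;
   output;   /* write the original record */
   drop _time _seconds;
run;
```

With this sample, WANT has seven rows: two added rows plus the original for the first record of each security, and one row for the second ABC record.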
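You also shouldn't call LAG conditionally. LAG maintains a queue that is updated only when the function actually executes, so a conditional LAG returns the value from the last time the condition was true, not from the previous observation. I haven't highlighted all the issues with your process, but this should be enough to help you find the rest.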
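A small sketch of the queue behaviour, using a hypothetical one-variable data set. Each LAG call keeps its own queue, so the conditional call returns the previous even value rather than the previous row, while computing LAG on every row and then using it conditionally gives the expected result.

```sas
data lagdemo;
   input x @@;
   datalines;
1 2 3 4 5 6
;
run;

data compare;
   set lagdemo;
   prev_x = lag(x);                          /* unconditional: previous row */
   if mod(x, 2) = 0 then cond_lag = lag(x);  /* conditional: previous EVEN row */
   _lag_x = lag(x);                          /* recommended: compute on every row... */
   if mod(x, 2) = 0 then good = _lag_x;      /* ...then use it conditionally */
   drop _lag_x;
run;
```

On the row where x = 4, prev_x and good are 3, but cond_lag is 2.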

How to put the name of a dataset as a column in a SAS macro

I'm trying to write a macro that outputs duplicates from several tables and then appends all of them into one dataset. This dataset will have two columns: the table name (for tables I select from one library) and the primary key.
So, how can I get all the table names inside the macro?
I thought I could add dataset='&data.' as a new column in this dataset, but the value stays as the literal text &data. instead of resolving to the table names.
Thank you
It is not entirely clear what you are asking, but here are two starting points you can use.
You can use the INTO clause to create a macro variable that holds a list of values separated by a delimiter:
PROC SQL NOPRINT;
SELECT EMPID
INTO :E1 SEPARATED BY ", "
FROM dset;
QUIT;
%PUT &E1;
You could also save the names into numbered macro variables (say var1 to varx) plus a count macro variable &varnum (the number of data sets), and then do an append within a loop:
*needs to be in a macro;
%macro loop_through();
%do i = 1 %to &Varnum;
/* proc append code here with data = &&var&i
(etc, &&var&i will resolve to &var1, &var2 and so on) */
%end;
%mend;
%loop_through();
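Putting both ideas together, here is a minimal sketch that stores the table names with INTO :tab1-, loops over them, writes each table's duplicate keys with PROC SORT's NODUPKEY and DUPOUT= options, and appends them together with the table name. Note the double quotes in table_name = "&&tab&i": macro references inside single quotes are not resolved, which is why dataset='&data.' stayed literal. The macro name, the tables T1 and T2, the key ID, and the output ALLDUPS are all hypothetical.

```sas
/* Hypothetical sample tables with duplicate keys */
data t1; input id @@; datalines;
1 2 2 3
;
run;
data t2; input id @@; datalines;
5 5 6
;
run;

%macro collect_dups(lib=WORK, key=id);
   %local i ntab;
   /* Store the table names in numbered macro variables tab1, tab2, ... */
   proc sql noprint;
      select memname into :tab1-
         from dictionary.tables
         where libname = "%upcase(&lib)" and memname in ('T1', 'T2');
   quit;
   %let ntab = &sqlobs;

   %do i = 1 %to &ntab;
      /* Duplicate keys of this table go to _dups */
      proc sort data=&lib..&&tab&i out=_nodup nodupkey dupout=_dups;
         by &key;
      run;
      /* Double quotes let &&tab&i resolve to the table name */
      data _dups;
         length table_name $32;
         set _dups;
         table_name = "&&tab&i";
         keep table_name &key;
      run;
      proc append base=alldups data=_dups;
      run;
   %end;
%mend collect_dups;

%collect_dups(lib=WORK, key=id)
```

PROC APPEND adds to any existing ALLDUPS, so delete it before rerunning the macro.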

Breaking down a SAS macro into pseudocode

I need to break down this SAS macro, which adds a suffix to the variables of a data set, into pseudocode, but there are some parts of it I don't fully understand.
%macro add_suffix(lib,dsn, suffix);
options pageno=1 nodate;
OPTIONS OBS= 1;
DATA GRIDWORK.TMP;
SET &lib..&dsn.;
RUN;
proc sql noprint;
select nvar into :num_vars
from dictionary.tables
where libname="GRIDWORK" and
memname="TMP";
select distinct(name) into :var1-
:var%TRIM(%LEFT(&num_vars))
from dictionary.columns
where libname="GRIDWORK" and
memname="TMP";
quit;
run;
OPTIONS OBS= MAX;
proc datasets library=&LIB;
modify &DSN;
rename
%do i=1 %to &num_vars;
&&var&i=&&var&i..&suffix
%end;
;
quit;
run;
proc datasets library=&LIB;
modify &DSN;
rename pers_gen_key&suffix = pers_gen_key;
quit;
run;
proc sql;
drop table gridwork.tmp;
quit;
%mend add_suffix;
1) In this part of the code:
DATA GRIDWORK.TMP;
SET &lib..&dsn.;
RUN;
How can the SET statement take two values? Is it setting GRIDWORK.TMP to the concatenation of &lib and &dsn? What exactly do the two periods mean here?
2) I understand that this section is storing variables in an array:
proc sql noprint;
select nvar into :num_vars
from dictionary.tables
where libname="GRIDWORK" and
memname="TMP";
select distinct(name) into :var1-
:var%TRIM(%LEFT(&num_vars))
from dictionary.columns
where libname="GRIDWORK" and
memname="TMP";
quit;
How exactly do dictionary.tables and dictionary.columns work, and how do they differ from each other in this context? I read through the documentation but am still having trouble understanding what exactly is going on in this section of the code.
3) Towards the end of the macro we have:
OPTIONS OBS= MAX;
proc datasets library=&LIB;
modify &DSN;
rename
%do i=1 %to &num_vars;
&&var&i=&&var&i..&suffix
%end;
;
quit;
run;
According to the PROC DATASETS documentation, LIBRARY= names the library that the procedure processes. Does this mean that &dsn is part of the &lib library? I am unsure how libraries work in SAS. Are they built in or user-defined? Why are they necessary; couldn't we just modify &DSN on its own?
SAS uses two-level names: the library name, then the data set name. The first macro variable holds the library and the second holds the data set name. The first period tells the macro processor where the macro variable name ends (it is removed during resolution), and the second period is the literal separator between the libref and the data set name.
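A quick way to see the two periods at work is to resolve them with %PUT. This sketch uses hypothetical values lib=sashelp and dsn=class, chosen because SASHELP.CLASS ships with SAS.

```sas
%let lib = sashelp;
%let dsn = class;

%put One period:  &lib.&dsn;    /* the period only ends &lib: sashelpclass */
%put Two periods: &lib..&dsn.;  /* first period ends &lib, second stays: sashelp.class */

/* The two-period form is what a SET statement needs */
data tmp;
   set &lib..&dsn.;
run;
```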
It's not storing anything in an array; it's creating macro variables (VAR1, VAR2, and so on, one per variable name). The dictionary tables are metadata about your tables, and I would recommend actually looking at them. The difference between the two is that DICTIONARY.TABLES has one row per data set (including NVAR, its number of variables), while DICTIONARY.COLUMNS has one row per variable in each table.
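To look at them directly, a sketch like the following prints the relevant rows for SASHELP.CLASS (used here only because it ships with SAS). The first query returns one row for the data set, the second returns one row per variable; the macro uses the same two queries against GRIDWORK.TMP.

```sas
proc sql;
   /* One row per data set: NVAR is its number of variables */
   select libname, memname, nobs, nvar
      from dictionary.tables
      where libname = 'SASHELP' and memname = 'CLASS';

   /* One row per variable in the data set */
   select name, type, length
      from dictionary.columns
      where libname = 'SASHELP' and memname = 'CLASS';
quit;
```

Library and member names are stored in uppercase in these tables, which is why the macro compares against "GRIDWORK" and "TMP".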
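A library is a reference (libref) to a location, usually a directory, where SAS data sets are stored. Some librefs are predefined, such as WORK and SASHELP; others you define yourself with a LIBNAME statement. Libraries let SAS read from and save to different directories and let users organize their data. So yes, &dsn is a data set in the &lib library. If you omitted LIBRARY=, PROC DATASETS would look for &DSN in the default library (normally WORK).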
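To make the library part concrete, here is a minimal sketch. The LIBNAME line shows how you would define your own libref (MYDATA and the path are hypothetical placeholders, so it is commented out), while WORK and SASHELP are predefined. The PROC DATASETS step then performs the same kind of RENAME the macro generates, on a WORK copy of SASHELP.CLASS; the new names are illustrative.

```sas
/* libname mydata "/path/to/your/folder";   hypothetical user-defined library */

/* Copy a SASHELP table into the predefined WORK library */
data work.class;
   set sashelp.class;
run;

/* LIBRARY= picks the library; MODIFY names a data set inside it */
proc datasets library=work nolist;
   modify class;
   rename height = height_sfx weight = weight_sfx;
quit;
```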
I highly recommend looking into the %PUT statement and placing it in various parts of the code to see exactly what the code is doing.
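As a sketch of that debugging approach, the MPRINT and SYMBOLGEN options log the generated code and every macro variable resolution, and %PUT shows the numbered macro variables created by INTO. It queries SASHELP.CLASS rather than GRIDWORK.TMP so it runs anywhere.

```sas
options mprint symbolgen;   /* log generated code and each macro variable resolution */

proc sql noprint;
   select name into :var1-
      from dictionary.columns
      where libname = 'SASHELP' and memname = 'CLASS';
quit;

%put NOTE: SQLOBS=&sqlobs;
%put NOTE: first variable is &var1, last is &&var&sqlobs;
%put _user_;   /* every user-defined macro variable and its value */

options nomprint nosymbolgen;
```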

How do I convert a number column like 200012 to a SAS date variable using PROC SQL?

I have a SAS data set with a text field customer_id and a numeric field month in the format YYYYMM. I'm creating a view of these data, and I want to convert the data to a standard SAS date that will (hopefully) be preserved on export. For example:
proc sql;
create view my_view as
select customer_id, month from raw_dataset;
quit;
proc export data = my_view
file = "out.dta"
dbms = stata replace;
run;
Looking at the date documentation, the number appears to be in the form (although not the data type) YYMMN., but I want it as a value SAS can work with as a date, not just a number, e.g. with PROC EXPAND.
I've seen a lot of questions using combinations of put and datepart, but since I don't want the variable as a string and don't already have a datetime variable, I'm not sure how to apply those.
How do I convert this column to a SAS date data type when I run this SQL query?
YYMMN. is the right informat to use, and INPUT is exactly how you convert the value so it becomes a date. Try it!
data want;
x='201403';
y=input(x,YYMMN6.);
put _all_;
run;
Of course, y should probably be given a date format if you want to look at it, but it doesn't need one.
In PROC SQL this works just the same way.
proc sql;
create view want_v as
select x, y, input(x,yymmn6.) as z format=date9.
from want;
quit;
This also adds a format so it looks readable, but it's still a date variable underneath identical to y.
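In the question, month is numeric rather than character, and INPUT expects a character argument. A minimal sketch, assuming the question's raw_dataset layout with hypothetical sample values: PUT(month, 6.) turns 200012 into '200012', which YYMMN6. then reads as the first day of that month. The MDY line is an alternative that stays numeric throughout.

```sas
data raw_dataset;
   length customer_id $8;
   input customer_id $ month;
   datalines;
A001 200012
A002 201403
;
run;

proc sql;
   create view my_view as
   select customer_id,
          /* numeric -> character -> SAS date (first day of the month) */
          input(put(month, 6.), yymmn6.) as month_date format=yymmn6.,
          /* purely numeric alternative */
          mdy(mod(month, 100), 1, int(month / 100)) as month_date2 format=date9.
   from raw_dataset;
quit;
```

Both columns hold the same SAS date value; only the display format differs.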

Convert Sybase string column into SAS date

I'm trying to convert a string column in a Sybase table, stored in the format below, into a SAS date.
The Sybase table has string values like these:
2015-04-23 04:04:46.517
2015-04-22 04:04:35.162
2015-04-21 04:04:43.646
I need to get the max of these values, store it in a max_tmsp variable, and then get the records where last_updt_tmsp > max_tmsp.
I followed an example I found and tried to write some code, but it is not working.
All of this code is in the precode that runs before the job starts.
proc sql noprint;
SELECT
select max(input(PROPERTY_VAL, MDYAMPMw.d)) into :last_updt_tmsp
from sybase_lib.prop_vals where property_key='last.update.date';
quit;
format &last_updt_tmsp. DATETIME18.;
data _null_;
call symput('lst_cre_dttm',"'"||"&last_updt_tmsp."||"'dt");
run;
%put lst_cre_dttm=&lst_cre_dttm
You can do this in a DATA step. Try the following:
data datetime;
format new_date datetime24.3;
a="2015-04-23 04:04:46.517";
new_date=input(a, anydtdtm24.);
run;
Using PROC SQL, you can try:
proc sql;
select max(input(a,anydtdtm24.)) format datetime24.3 into: max_date
from table1;
quit;
%put &max_date;
The point to remember is that the max of a character variable will not give you results as consistent as the max of a numeric variable. You want the latter.
Putting together the answer and a comment on it:
Data HAVE;
length PROPERTY_VAL $23;
Input PROPERTY_VAL $ 1-23;
Datalines;
2015-04-23 04:04:46.517
2015-04-22 04:04:35.162
2015-04-21 04:04:43.646
;
Run;
proc sql noprint;
select max(input(PROPERTY_VAL, anydtdtm24.)),
max(input(PROPERTY_VAL, anydtdtm24.)) format=datetime22.3
into :last_updt_tmsp, :last_updt_tmsp_f
from HAVE;
quit;
%Put LAST_UPDT_TMSP: &last_updt_tmsp (&last_updt_tmsp_f);
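In the step above, the first column has no format, and PROC SQL's INTO writes an unformatted numeric value using BEST8., so a datetime of roughly 1.7 billion seconds is stored in rounded scientific notation. A minimal sketch, using a hypothetical STAMPS table with the question's values and a hypothetical macro variable LAST_UPDT_NUM: giving the column an explicit numeric format keeps the full value, which can then be compared against numeric datetimes without building a literal.

```sas
data stamps;
   length property_val $23;
   property_val = '2015-04-23 04:04:46.517'; output;
   property_val = '2015-04-22 04:04:35.162'; output;
   property_val = '2015-04-21 04:04:43.646'; output;
run;

proc sql noprint;
   /* 32.3 keeps every digit plus milliseconds; without a format, BEST8. is used */
   select max(input(property_val, anydtdtm24.)) format=32.3
      into :last_updt_num trimmed
      from stamps;
quit;

%put LAST_UPDT_NUM=&last_updt_num;

/* A plain number compares directly against numeric datetimes */
data check;
   later = ('24APR2015:00:00:00'dt > &last_updt_num);
run;
```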
DCR and Carolina, thank you very much. It worked. Can you please tell me the difference between max(input(PROPERTY_VAL, anydtdtm24.)) format=datetime22.3 and max(input(a,anydtdtm24.)) format datetime24.3?
DCR, I will keep your point in mind, let my team know, and see if we can switch to the last_updt_dt column.
Also, is it possible to see the value stored in &last_updt_tmsp in the log?
I used Carolina's version, though I'm sure yours works too, DCR. I did not change the bottom part:
proc sql noprint;
select max(input(PROPERTY_VAL, anydtdtm24.)) format=datetime22.3 into :last_updt_tmsp
from sybase_lib.prop_vals where property_key='last.update.date';
quit;
/* FORMAT= above already formats the value; a standalone FORMAT statement is not valid outside a DATA step */
data _null_;
call symput('lst_cre_dttm',"'"||"&last_updt_tmsp."||"'dt");
run;
%put lst_cre_dttm=&lst_cre_dttm;
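For the final step, selecting records newer than the stored timestamp, here is a self-contained sketch. It builds the datetime literal with %LET and double quotes instead of a DATA _NULL_ step, and the %PUT lines show the stored values in the log. The tables PROP_VALS and UPDATES are hypothetical stand-ins for sybase_lib.prop_vals and the target table.

```sas
/* Hypothetical stand-ins for the Sybase tables */
data prop_vals;
   length property_key $20 property_val $23;
   property_key = 'last.update.date';
   property_val = '2015-04-23 04:04:46.517'; output;
   property_val = '2015-04-22 04:04:35.162'; output;
   property_val = '2015-04-21 04:04:43.646'; output;
run;

data updates;
   id = 1; last_updt_tmsp = '22APR2015:10:00:00'dt; output;
   id = 2; last_updt_tmsp = '24APR2015:08:00:00'dt; output;
   format last_updt_tmsp datetime22.3;
run;

proc sql noprint;
   select max(input(property_val, anydtdtm24.)) format=datetime22.3
      into :last_updt_tmsp trimmed
      from prop_vals
      where property_key = 'last.update.date';
quit;

/* Double quotes let the macro variable resolve inside the literal */
%let lst_cre_dttm = "&last_updt_tmsp"dt;
%put last_updt_tmsp=&last_updt_tmsp;
%put lst_cre_dttm=&lst_cre_dttm;

data newer;
   set updates;
   where last_updt_tmsp > &lst_cre_dttm;
run;
```

With these sample values, only the record with id = 2 is newer than the stored maximum.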