I need to break down this SAS macro that adds suffixes to some number of variables into pseudocode, but there are some parts of it I don't fully understand.
%macro add_suffix(lib,dsn, suffix);
options pageno=1 nodate;
OPTIONS OBS= 1;
DATA GRIDWORK.TMP;
SET &lib..&dsn.;
RUN;
proc sql noprint;
select nvar into :num_vars
from dictionary.tables
where libname="GRIDWORK" and
memname="TMP";
select distinct(name) into :var1-
:var%TRIM(%LEFT(&num_vars))
from dictionary.columns
where libname="GRIDWORK" and
memname="TMP";
quit;
run;
OPTIONS OBS= MAX;
proc datasets library=&LIB;
modify &DSN;
rename
%do i=1 %to &num_vars;
&&var&i=&&var&i..&suffix
%end;
;
quit;
run;
proc datasets library=&LIB;
modify &DSN;
rename pers_gen_key&suffix = pers_gen_key;
quit;
run;
proc sql;
drop table gridwork.tmp;
quit;
%mend add_suffix;
1) In this part of the code:
DATA GRIDWORK.TMP;
SET &lib..&dsn.;
RUN;
How can you set a dataset equal to two values? Is it setting GRIDWORK.TMP to the concatenation of &lib and &dsn? What exactly do the multiple periods mean here?
2) I understand that this section is storing variables in an array:
proc sql noprint;
select nvar into :num_vars
from dictionary.tables
where libname="GRIDWORK" and
memname="TMP";
select distinct(name) into :var1-
:var%TRIM(%LEFT(&num_vars))
from dictionary.columns
where libname="GRIDWORK" and
memname="TMP";
quit;
How exactly do dictionary.tables and dictionary.columns work, and how do they differ from each other in this context? I read through the documentation but am still having trouble understanding what exactly is going on in this section of the code.
3) Towards the end of the macro we have:
OPTIONS OBS= MAX;
proc datasets library=&LIB;
modify &DSN;
rename
%do i=1 %to &num_vars;
&&var&i=&&var&i..&suffix
%end;
;
quit;
run;
The documentation for PROC DATASETS says that the LIBRARY= option names the library that the procedure processes. Does this mean that &dsn is part of the &lib library? I guess I am unsure of how libraries work in SAS. Are they built in, or user-defined? Why are they necessary? Couldn't we just modify &DSN on its own?
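SAS uses two-level references: the library name, then the data set name. The first macro variable (&lib) holds the library and the second (&dsn) holds the data set name. The first period tells the macro processor where the macro variable name ends, and the second period is the literal separator between the libname and the data set name. The trailing period in &dsn. only marks the end of that macro variable name and is consumed during resolution.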
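A minimal sketch of how the periods resolve, assuming the hypothetical values work and class for the two macro variables (these values are for illustration only and are not from the original macro):

```sas
/* Hypothetical values, chosen only for illustration */
%let lib = work;
%let dsn = class;

/* First period ends &lib, second period is the literal separator,
   trailing period ends &dsn */
%put &lib..&dsn.;   /* writes work.class to the log */

/* With a single period, the period is consumed as the end-of-name
   marker and no separator is left */
%put &lib.&dsn.;    /* writes workclass to the log */
```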
It's not storing anything in arrays; it's creating macro variables. The dictionary tables hold metadata about your tables, and I would recommend actually looking at them. The difference between the two is that DICTIONARY.TABLES has one row of information per data set (for example NVAR, the number of variables), while DICTIONARY.COLUMNS has one row per variable in each table.
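One way to look at them is to query both for a single data set. The sketch below assumes the SASHELP.CLASS sample data set is available in your installation; substitute any library and member name you have. Note that LIBNAME and MEMNAME are stored in uppercase.

```sas
proc sql;
  /* One row per data set: NVAR is the number of variables */
  select libname, memname, nobs, nvar
    from dictionary.tables
    where libname = "SASHELP" and memname = "CLASS";

  /* One row per variable in that data set */
  select name, type, length, varnum
    from dictionary.columns
    where libname = "SASHELP" and memname = "CLASS";
quit;
```

The first query is what feeds &num_vars in the macro, and the second is what feeds &var1, &var2, and so on.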
A library is simply a directory/folder where SAS datasets are stored. This allows SAS to reference different directories to save files, and allows users to implement organizational systems on their data. &dsn is a data set in the &lib folder.
I highly recommend you look into the %put statement and place it in various parts of the code to see exactly what the code is doing.
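As a rough sketch of that kind of tracing, the hypothetical helper macro below (show_vars is not part of add_suffix) repeats the two dictionary queries and writes each resulting macro variable to the log. It assumes SASHELP.CLASS exists and that your SAS version supports the TRIMMED keyword on INTO.

```sas
/* Echo the generated SAS code and each macro variable resolution to the log */
options mprint symbolgen;

%macro show_vars(lib, dsn);   /* hypothetical helper for illustration */
  %local i num_vars;
  proc sql noprint;
    /* Number of variables in the data set */
    select nvar into :num_vars trimmed
      from dictionary.tables
      where libname = "%upcase(&lib)" and memname = "%upcase(&dsn)";
    /* One macro variable per variable name: var1, var2, ... */
    select name into :var1 - :var&num_vars
      from dictionary.columns
      where libname = "%upcase(&lib)" and memname = "%upcase(&dsn)";
  quit;
  %put NOTE: num_vars=&num_vars;
  %do i = 1 %to &num_vars;
    %put NOTE: var&i=&&var&i;
  %end;
%mend show_vars;

%show_vars(sashelp, class)
```

Here &&var&i resolves in two passes: first to &var1, &var2, and so on, and then to the variable name stored in that macro variable.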
I want to merge two SAS datasets that have the same ID, but something always goes wrong.
I checked that the variable baseid is character in both datasets, so I tried to adjust the ID format in both datasets using the same code, like this:
data a;
set a;
baseidtemp = put(baseid,12);
drop baseid;
rename baseidtemp = baseid;
run;
After that I sorted both datasets by baseid.
But I don't know why when I used proc compare to compare their ID, all of the obs are unequal even if their values are all the same.
I merged them in this way
data A;
merge A (in = a) B;
by baseid;
if a;
run;
They just cannot be merged correctly.
I am very confused about it, so could someone help me with how to solve this issue?
Thank you in advance!
It would be helpful to see the contents of your data to know how they aren't merging correctly, but you could try this. I generally don't overwrite data sets, so I made some new data set names.
data newa;
set a;
baseidtemp = baseid;
drop baseid;
run;
proc sort data=newa out=outA;
by baseidtemp;
run;
proc sort data=b out=outB;
by baseid;
run;
data new;
merge outA outB (rename = (baseid = baseidtemp));
by baseidtemp;
run;
New here, so if I did something wrong, I apologize. I'm a new user of SAS as well.
I created a macro that first calls a PROC SQL step to create a table, which I then want to pass to another macro (called inside the first macro).
%Macro Mc_Copy_Table (TABLE_NAME);
proc sql;
create table &TABLE_NAME as
select *
from OR_IN.&TABLE_NAME;
connect using OR_OUT;
execute (truncate table &TABLE_NAME) By OR_OUT;
disconnect from OR_OUT;
quit;
%MC_obsnvars(&TABLE_NAME);
%put &Nobs;
%if &Nobs > 100000 %then
%do; /* use of the sql loader */
proc append base = OR_OU. &TABLE_NAME (&BULKLOAD_OPTION)
data = &TABLE_NAME;
run;
%end;
%else
%do;
proc append base = OR_OU. &TABLE_NAME (Insertbuff=10000)
data = &TABLE_NAME;
run;
%end;
%Mend Mc_Copy_Table;
The Mc_Obsnvars macro uses the attrn function to get the number of observations from the given dataset (it opens the dataset first). Depending on the number of observations, my program either uses the SQL loader or not. OR_IN and OR_OUT are libnames (Oracle engine).
When the macro Mc_Copy_Table is executed with, let's say, TABLE1 as its argument, Mc_Obsnvars is executed first and tries to open TABLE1, which doesn't exist yet. The PROC SQL step is executed afterwards.
Why is the macro executed before the PROC SQL step, and is there any way to have the PROC SQL step executed first? Putting the PROC SQL part in a macro doesn't solve the problem. Thanks :)
I think you have a syntax issue, as Quentin alludes to in his comment. This works OK for me:
%macro copy_table(intable, outtable);
proc sql noprint;
create table &outtable as
select * from &intable;
%count_obs(&outtable);
%put NOBS:&nobs;
quit;
%mend;
%macro count_obs(table);
%global nobs;
select count(*) into :nobs trimmed from &table;
%mend;
data test;
do i=1 to 10;
output;
end;
run;
%copy_table(test,test2);
Note, however, that you don't have to do the count. PROC SQL sets an automatic macro variable called &sqlobs to the number of rows processed by the last query.
So this gives you what you are looking for, I think:
%macro copy_table(intable, outtable);
proc sql noprint;
create table &outtable as
select * from &intable
where i < 5;
%let nobs=&sqlobs;
%put NOBS:&nobs;
quit;
%mend;
%copy_table(test,test2);
I cannot run my PROC SQL step by calling the macro in my data step.
The SQL step alone works, but I need it to run for every security group.
%macro adding;
proc sql;
insert into have (Time,seconds) values
("9:10:00"t,33000);
insert into have (Time,seconds) values
("16:50:00"t,60600);
quit;
%mend;
data have;
set have;
by security;
if first.security then %adding;
if seconds=33000 then date=lag(date);
if seconds=60600 then date=lag(date);
run;
The error is:
1 proc sql; insert into have (Time,seconds) values
  ---- ------
  180 180 180 180
1 ! ("9:10:00"t,33000); insert into have (Time,seconds) values
1 ! ("16:50:00"t,60600); quit;
ERROR 180-322: Statement is not valid or it is used out of proper order.
I don't know what to change so that I can use it.
Thankful for any help! Best
Use call execute to call the macro.
If first.security then call execute('%adding');
However, the macro will run AFTER the data step, not during.
Also, trying to change the data in place that many ways could lead to difficulties in debugging. Your DATA, SET, and SQL all reference the same data set.
If you're trying to change the data in your proc and add records you may want to consider using explicit OUTPUT statements within the data step itself. You could use a macro to generate these statements if desired.
If first.security then do;
Time=...;
Seconds=....;
Output;
Time=....;
Seconds=....;
Output;
End;
*rest of SAS code;
Output; *Add an explicit output if required;
Run;
You also shouldn't be calculating a lagged value conditionally, as the lag is a queue. You'll get unexpected behaviour. I haven't highlighted all the issues with your process but this should be enough to help you find the rest.
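A minimal sketch of the usual workaround, assuming the intent is to copy the previous observation's date onto the rows where seconds is 33000 or 60600. The variable names come from the question; the output name want and the helper variable prev_date are hypothetical.

```sas
data want;
  set have;
  by security;
  /* Call LAG on every observation so its queue advances one row at a time */
  prev_date = lag(date);
  /* Apply the condition to the stored value, not to the LAG call itself */
  if seconds in (33000, 60600) then date = prev_date;
  drop prev_date;
run;
```

On the first observation of a BY group, prev_date holds the last date of the previous group (or missing on the very first row), so you may want an extra check on first.security.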
I'm trying to write a macro that outputs duplicates and then appends all of them into one dataset. This dataset will have two columns: the table name (for tables I select from one library) and the primary keys.
So, how can I get all the table names in the macro?
I thought I could add dataset='&data.' as a new column in this dataset, but the macro treats every value as the literal text &data. instead of swapping in the table names.
Thank you
It was not easy to identify exactly what you are asking for but here are two starting points you can use.
You can use the INTO clause in PROC SQL to make a macro variable that holds a list of many things separated by a delimiter:
PROC SQL NOPRINT;
SELECT EMPID
INTO :E1 SEPARATED BY " , "
FROM dset;
QUIT;
%PUT &E1;
You could also save numbered macro variables and a &varnum (number of datasets saved as a variable) and do an append within a loop (say you have the datasets listed var1 to varx):
*needs to be in a macro;
%macro loop_through();
%do i = 1 %to &Varnum;
/* proc append code here with data = &&var&i
(etc, &&var&i will resolve to &var1, &var2 and so on) */
%end;
%mend;
%loop_through();
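On the dataset='&data.' problem from the question: macro variables are not resolved inside single quotes, only inside double quotes. A small sketch, assuming a hypothetical value for &data:

```sas
/* Hypothetical table name, for illustration only */
%let data = sales_2015;

data _null_;
  single = '&data.';   /* single quotes: the text is left unresolved */
  double = "&data.";   /* double quotes: the macro variable is resolved */
  put single= double=;
run;
/* Log shows: single=&data. double=sales_2015 */
```

So inside the macro, writing dataset="&data." would store the actual table name in the new column.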
I'm trying to convert a string column which is in sybase in the below format into SAS date.
The sybase table has string values like this
2015-04-23 04:04:46.517
2015-04-22 04:04:35.162
2015-04-21 04:04:43.646
I need to get the max of these values and store it in a max_tmsp variable and get the records where last_updt_tmsp > max_tmsp.
I referred to this link and tried to write some code but it is not working.
All this code is in Precode before the job starts.
proc sql noprint;
SELECT
select max(input(PROPERTY_VAL, MDYAMPMw.d)) into :last_updt_tmsp
from sybase_lib.prop_vals where property_key='last.update.date';
quit;
format &last_updt_tmsp. DATETIME18.;
data _null_;
call symput('lst_cre_dttm',"'"||"&last_updt_tmsp."||"'dt");
run;
%put lst_cre_dttm=&lst_cre_dttm
You can do this in a data step; try the following:
data datetime;
format new_date datetime24.3;
a="2015-04-23 04:04:46.517";
new_date=input(a, anydtdtm24.);
run;
Using proc sql you can try:
proc sql;
select max(input(a,anydtdtm24.)) format datetime24.3 into: max_date
from table1;
quit;
%put &max_date;
The point to remember is that the max of a character variable will not give you consistent results compared to the max of a numeric variable. You want the latter.
Putting together answer and comment about the answer
Data HAVE;
length PROPERTY_VAL $23;
Input PROPERTY_VAL $ 1-23;
Datalines;
2015-04-23 04:04:46.517
2015-04-22 04:04:35.162
2015-04-21 04:04:43.646
;
Run;
proc sql noprint;
select max(input(PROPERTY_VAL, anydtdtm24.)),
max(input(PROPERTY_VAL, anydtdtm24.)) format=datetime22.3
into :last_updt_tmsp, :last_updt_tmsp_f
from HAVE;
quit;
%Put LAST_UPDT_TMSP: &last_updt_tmsp (&last_updt_tmsp_f);
DCR and Carolina, thank you very much. It worked. Can you please tell me the difference between max(input(PROPERTY_VAL, anydtdtm24.)) format=datetime22.3 and max(input(a,anydtdtm24.)) format datetime24.3?
DCR, I will keep your point in mind, let my team know, and see if we can switch to the last_updt_dt column.
Also, is it possible to see in the log what value is stored in &last_updt_tmsp?
I went with Carolina's explanation, though I'm sure yours works too, DCR. I did not change the bottom part.
proc sql noprint;
SELECT
select max(input(PROPERTY_VAL, anydtdtm24.)) format=datetime22.3 into :last_updt_tmsp
from sybase_lib.prop_vals where property_key='last.update.date';
quit;
format &last_updt_tmsp. DATETIME18.;
data null;
call symput('lst_cre_dttm',"'"||"&last_updt_tmsp."||"'dt");
run;
%put lst_cre_dttm=&lst_cre_dttm