SAS: formatting multiple proc freq using macros - macros

I don't have another analyst on my team at work and have a question about the most efficient way to run several proc freq concurrently.
My goal is to run about 160 different frequencies, and include formatting for all of them. I assume a macro is the fastest way, but I only have experience with basic macros. Below is my thought process assuming the data was already formatted:
%macro survey(question, formatA, formatB);
proc freq;
table &question;
format &formatA &formatB;
%mend;
%survey (question, formatA, formatB);
"question", "formatA" and "formatB" will be strings of data for example:
- "question" would be KCI_1 KCI_2 through KCI_80
- "formatA" would be KCI_1fmt KCI_2fmt through KCI_80fmt
- "formatB" would be KCI_1fmt. KCI_2fmt. through KCI_80fmt.

Danielle:
You can use a macro to assign known formats to variables that are not already formatted. The rest of the PROC FREQ step does not have to be macro-ized.
* make some survey data with unformatted responses;
data have;
do respondent_id = 1 to 10000;
array responses KCI_1-KCI_80;
do _n_ = 1 to dim(responses);
responses(_n_) = ceil(4*ranuni(123));
end;
output;
end;
run;
* make some format data for each question;
data responseMeanings;
length questionID 8 responseValue 8 responseMeaning $50;
do questionID = 1 to 80;
fmtname = cats('Q',questionID,'_fmt');
peg = ranuni (1234); drop peg;
do responseValue = 1 to 4;
select;
when (peg < 0.4) responseMeaning = scan('Never,Seldom,Often,Always', responseValue);
when (peg < 0.8) responseMeaning = scan('Yes,No,Don''t Ask,Don''t Tell', responseValue);
otherwise responseMeaning = scan('Nasty,Sour,Sweet,Tasty', responseValue);
end;
output;
end;
end;
run;
* create a custom format for the responses of each question;
proc format cntlin=responseMeanings(rename=(responseValue=start responseMeaning=label));
run;
* macro to associate variables with the corresponding custom format;
%macro format_each_response;
%local i;
format
%do i = 1 %to 80;
KCI_&i Q&i._fmt.
%end;
;
%mend;
* compute frequency counts;
proc freq data=have;
table KCI_1-KCI_80;
%format_each_response;
run;
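If the variable prefix, format naming pattern, or number of questions differs between surveys, the same idea can be parameterized. The following is only a sketch: the macro name format_vars and its parameters (prefix, fmtprefix, fmtsuffix, n) are hypothetical and not part of the answer above, and it assumes the have dataset and the Q1_fmt through Q80_fmt formats created earlier already exist.

```sas
/* Hypothetical parameterized variant of format_each_response.        */
/* Emits one FORMAT statement pairing <prefix>i with <fmtprefix>i<fmtsuffix>. */
%macro format_vars(prefix=KCI_, fmtprefix=Q, fmtsuffix=_fmt, n=80);
  %local i;
  format
  %do i = 1 %to &n;
    &prefix.&i &fmtprefix.&i.&fmtsuffix..
  %end;
  ;
%mend format_vars;

options mprint;   /* write the generated FORMAT statement to the log */

proc freq data=have;
  tables KCI_1-KCI_80;
  %format_vars(n=80)
run;
```

With MPRINT on, the log shows the expanded FORMAT statement, which makes it easy to check that each variable is paired with the intended format.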

Related

how to change a date format to a new one using proc format with the picture statement

I am trying to reformat an existing date, 07/06/2020 (DD/MM/YYYY), as 07_06_2020, with the output being a string, not an int.
My code:
%LET Run_Date = %SYSFUNC(TODAY(), MMDDYY8.) ;
PROC FORMAT ;
PICTURE Runner low-high = '99_99_9999' ;
RUN ;
DATA _NULL ;
Run_Date_2 = PUT(Run_Date, Runner.) ;
CALL SYMPUT('Run_Date_2 ', Run_Date_2) ;
RUN ;
%PUT %Run_Date_2 . ;
**output**: error.
Thanks
Try this. Remember to use the datatype=date option.
proc format;
picture dtfmt (default=10)
low - high = '%0d_%0m_%Y' (datatype=date)
;
run;
data test;
dt = "07jun2020"d;
dt_char = put(dt, dtfmt.);
format dt ddmmyy10.;
run;
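To get the formatted date into a macro variable, as the question attempted, note three problems in the original code: the step must be DATA _NULL_ (with the trailing underscore), a macro variable is referenced as &Run_Date rather than as a data step variable, and it is printed with &Run_Date_2 rather than %Run_Date_2. A minimal sketch, reusing the dtfmt picture format from above and assuming today's date is the value wanted:

```sas
proc format;
  picture dtfmt (default=10)
    low - high = '%0d_%0m_%Y' (datatype=date);
run;

data _null_;
  /* PUT returns a character string; SYMPUTX stores it in a macro variable */
  call symputx('Run_Date_2', put(today(), dtfmt.));
run;

%put Run_Date_2=&Run_Date_2;
```

CALL SYMPUTX is used instead of CALL SYMPUT because it strips leading and trailing blanks from the stored value.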

Modification of a SAS macro to print dichotomous variable information

I'm trying to modify the following SAS macro so that it includes percentages for the variable CHD when it is equal to both 0 and 1. Currently this macro is only set up to print out the results of baseline variables when CHD (coronary heart disease) is equal to 1. I think the modification needs to occur within the data routfreq&i step but I'm not quite sure how to set it up. I would then also need an additional column to print out "No Coronary Heart Disease * % (n)".
%macro categ(pred,i);
proc freq data = heart;
tables &pred * chd / chisq sparse outpct out = outfreq&i ;
output out = stats&i chisq;
run;
proc sort data = outfreq&i;
by &pred;
run;
proc means data = outfreq&i noprint;
where chd ne . and &pred ne .;
by &pred;
var COUNT;
output out=moutfreq&i(keep=&pred total rename=(&pred=variable)) sum=total;
run;
data routfreq&i(rename = (&pred = variable));
set outfreq&i;
length varname $20.;
if chd = 1 and &pred ne .;
rcount = put(count,8.);
rcount = "(" || trim(left(rcount)) || ")";
pctnum = round(pct_row,0.1) || " " || (rcount);
index = &i;
varname = vlabel(&pred);
keep &pred pctnum index varname;
run;
data rstats&i;
set stats&i;
length p_value $8.;
if P_PCHI <= 0.05 then do;
p_value = round(P_PCHI,0.0001) || "*";
if P_PCHI < 0.0001 then p_value = "<0.0001" || "*";
end;
else p_value = put(P_PCHI,8.4);
keep p_value index;
index = &i;
run;
data _null_;
set heart;
call symput("fmt",vformat(&pred));
run;
proc sort data = moutfreq&i;
by variable;
run;
proc sort data = routfreq&i;
by variable;
run;
data temp&i;
merge moutfreq&i routfreq&i;
by variable;
run;
data final&i;
merge temp&i rstats&i;
by index;
length formats $20.;
formats=put(variable,&fmt);
if not first.index then do;
varname = " ";
p_value = " ";
end;
drop variable;
run;
%mend;
%categ(gender,1);
%categ(smoke,2);
%categ(age_group,3);
%macro names(j,k,dataname);
%do i=&j %to &k;
&dataname&i
%end;
%mend names;
data categ_chd;
set %names(1,3,final);
label varname = "Demographic Characteristic"
total = "Total"
pctnum = "Coronary Heart Disease * % (n)"
p_value = "p-value * (2 sided)"
formats = "Category";
run;
ods listing close;
ods rtf file = "c:\nesug\table1a.rtf" style = forNESUG;
proc report data = categ_chd nowd split = "*";
column index varname formats total pctnum p_value;
define index /group noprint;
compute before index;
line ' ';
endcomp;
define varname / order = data style(column) = [just=left] width = 40;
define formats / order = data style(column) = [just=left];
define total / order = data style(column) = [just=center];
define pctnum / order = data style(column) = [just=center];
define p_value / order = data style(column) = [just=center];
title1 " NESUG PRESENTATION: TABLE 1A (NESUG 2004)";
title2 " CROSSTABS OF CATEGORICAL VARIABLES WITH CORONARY HEART DISEASE OUTCOME";
run;
ods rtf close;
ods listing;
Also, this code has the following error when it is run:
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
1:2
NOTE: Numeric values have been converted to character values at the places given by:
(Line):(Column).
3:111
I think this macro needs to be modified so that it doesn't crash when it runs with categorical/character variables.
The line
if chd = 1 and &pred ne .;
is what is causing your output to only have CHD = 1. To keep the rows for both CHD values, you would drop the CHD condition and change it to:
if &pred ne .;
I do not understand your request for an additional column. Perhaps post an example of the current output and the output that you want?
As for the "errors" (actually notes, as they do not cause the system to stop processing), they occur when a variable is automatically converted from numeric to character or vice versa. The note lists the line and column in the code where each conversion happens. I prefer to eliminate these notes wherever possible to avoid unintended consequences of inappropriate coercion. To do this, you would make use of the PUT and INPUT functions.
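As an illustration of explicit conversion, here is a small sketch with made-up values (pct_row, count, and char_id are stand-ins, not data from the question). PUT turns a number into text using a format, and INPUT turns text into a number using an informat, so neither conversion note is produced.

```sas
data _null_;
  pct_row = 41.6667;
  count   = 25;
  char_id = '0042';

  /* numeric -> character: PUT with a numeric format */
  length pctnum $30;
  pctnum = catx(' ', put(round(pct_row, 0.1), 8.1),
                     cats('(', put(count, 8.), ')'));

  /* character -> numeric: INPUT with a numeric informat */
  num_id = input(char_id, 8.);

  put pctnum= num_id=;
run;
```

In the macro above, the same pattern could replace the line that builds pctnum by concatenating round(pct_row,0.1) directly with character strings.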

Hash Merge Macro - using a file record indicator "HASH + point = Key"

Looking to update this macro to be HASH + point = key. We have started to exceed our memory limits with our current version of this macro for one of our data runs. The reason I'm asking for help is because I don't have a lot of time and have never really analyzed this code since it wasn't part of my process until recently.
What I don't really understand from, https://www.lexjansen.com/nesug/nesug11/ld/ld01.pdf, is how does the RID get set and how to incorporate it into our macro. I actually don't even know if it is possible to do it this way with our current macro.
Any help would be greatly appreciated.
%macro hashmerge2(varnm,onto,from,byvars,obsqty);
%let data_vars = %trim (&varnm);
%let data_vars_a = %sysfunc(tranwrd(&data_vars.,%str( ),%str(" , ")));
%let data_vars_b = %sysfunc(tranwrd(&data_vars.,%str( ), %str(,)));
%let data_key = %trim (&byvars);
%let data_key = %sysfunc(tranwrd(&data_key.,%str( ), %str(" , ")));
%if %index(&varnm,' ') > 0 %then %let varnm3=%substr(%substr(&varnm,1,%index(&varnm,' ')),1,4);
%else %let varnm3=%substr(&varnm,1,4);
data &onto(drop=rc) miss&varnm3(drop=rc);
if 0 then set &onto &from(keep=&varnm. &byvars.);
declare hash h_merge (dataset: "&from.");
rc = h_merge.DefineKey ("&data_key.");
rc = h_merge.DefineData ("&data_vars_a.");
rc = h_merge.DefineDone ();
do until (eof);
set &onto end = eof;
call missing(&data_vars_b.);
rc = h_merge.find ();
if rc = 0 then do;
output &onto;
from = "&from.";
end;
else do;
output miss&varnm3 &onto;
from = "&onto.";
end;
end;
stop;
run;
%mend;
I think this is what you are looking for, but note that it still needs to load all of the key values from the "lookup" table into the hash object. It saves space because, instead of also loading the non-key variables, it loads only the observation number that matches each set of key values.
%macro hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm /* Space delimited list of variable to retrieve */
,onto /* Dataset to update */
,from /* Dataset to get values from */
,byvars /* Space delimited list of key variables to match on */
);
%local missds key_vars;
%let missds=%scan(&varnm,1,%str( ));
%let missds=miss%substr(&missds,1,%sysfunc(min(28,%length(&missds))));
%let key_vars="%sysfunc(tranwrd(%sysfunc(compbl(&byvars)),%str( )," "))";
data &onto(drop=rc) &missds(drop=rc);
if 0 then set &onto &from(keep=&varnm. &byvars.);
declare hash h_merge ();
rc = h_merge.DefineKey (&key_vars);
rc = h_merge.DefineData ('_point');
rc = h_merge.DefineDone ();
do _point=1 to _nobs;
set &from(keep=&byvars) point=_point nobs=_nobs;
rc = h_merge.add();
end;
do until (eof);
set &onto end = eof;
rc = h_merge.find ();
if rc = 0 then do;
set &from (keep=&varnm) point=_point;
from = "&from.";
output &onto;
end;
else do;
call missing(of &varnm);
from = "&onto.";
output ;
end;
end;
stop;
run;
%mend hash_merge_point;
So here is a trivial example:
data lookup;
input id age sex $1.;
cards;
1 10 F
2 20 .
4 30 M
;
data master ;
input id wt ;
cards;
1 100
2 150
3 180
4 200
;
%hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm=age sex /* Space delimited list of variable to retrieve */
,onto=master /* Dataset to update */
,from=lookup /* Dataset to get values from */
,byvars=id /* Space delimited list of key variables to match on */
);
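To make it clearer how the record identifier gets set, here is roughly the same logic written as a plain data step for the lookup and master tables above. This is a sketch rather than the exact code the macro generates: the output name merged and the variables rid, p, and n are chosen for illustration, and the observation number is copied into a separate variable (rid) before being stored in the hash.

```sas
data merged(drop=rc rid);
  if 0 then set master lookup;        /* define variables without reading data */
  declare hash h();
  rc = h.defineKey('id');
  rc = h.defineData('rid');           /* store only the observation number */
  rc = h.defineDone();

  /* Pass 1: read just the keys; the loop counter is the record ID */
  do p = 1 to n;
    set lookup(keep=id) point=p nobs=n;
    rid = p;
    rc = h.add();
  end;

  /* Pass 2: look up each master row, then fetch the lookup row directly */
  do until (eof);
    set master end=eof;
    if h.find() = 0 then do;
      p = rid;
      set lookup(keep=age sex) point=p;
    end;
    else call missing(age, sex);      /* SET variables are retained, so reset */
    output;
  end;
  stop;
run;
```

The hash holds one number per key instead of every satellite variable, which is where the memory saving comes from. The cost is an extra direct-access read of the lookup table for each matching row.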
If the target table already has the variables being created by the merge (so you just want to overwrite the current values), then you can use the MODIFY statement instead of the SET statement to modify the dataset in place. But you might want to make sure you have a backup of the table before trying this. Also note that if you want a flag for the source (the from variable), then that variable also needs to exist.
So with this updated master table:
data master ;
input id wt ;
length age 8 sex $1 from $50;
cards;
1 100
2 150
3 180
4 200
;
And this version of the macro:
%macro hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm /* Space delimited list of variable to retrieve */
,onto /* Dataset to update */
,from /* Dataset to get values from */
,byvars /* Space delimited list of key variables to match on */
);
%local key_vars;
%let key_vars="%sysfunc(tranwrd(%sysfunc(compbl(&byvars)),%str( )," "))";
data &onto;
if 0 then set &onto (keep=&byvars.);
declare hash h_merge ();
rc = h_merge.DefineKey (&key_vars);
rc = h_merge.DefineData ('_point');
rc = h_merge.DefineDone ();
do _point=1 to _nobs;
set &from(keep=&byvars) point=_point nobs=_nobs;
rc = h_merge.add();
end;
do until (eof);
modify &onto end = eof;
rc = h_merge.find ();
if rc = 0 then do;
set &from (keep=&varnm) point=_point;
from = "&from.";
end;
else from = "&onto.";
replace;
end;
stop;
run;
%mend hash_merge_point;
If you run this code:
proc print data=master;
title 'BEFORE';
run;
%hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm=age sex /* Space delimited list of variable to retrieve */
,onto=master /* Dataset to update */
,from=lookup /* Dataset to get values from */
,byvars=id /* Space delimited list of key variables to match on */
);
proc print data=master;
title 'AFTER';
run;
You get this result:

Count consecutive consonants in e-mail address in SAS SQL

I would like to identify the max number of consecutive consonants and vowels in an e-mail address, using SAS SQL (proc sql). The output should look like the one below in columns Max of consecutive consonants and max of consecutive vowels (I listed characters in first row for illustrative purposes only).
A few things to note:
- Treat special and numeric characters as a count terminator. The 3rd email is a good example: it has 2 consonants (hf), then numbers (98), and then 2 more consonants (jl), so the output should be just 2 (hf).
- I am only interested in the first part of the email (before the #).
How do I achieve this, dear community?
E-mail                        Max of consecutive consonants   Max of consecutive vowels
asifhajhtysiofh#gmail.com     5 (jhtys)                       2 (io)
chris.nashfield#hotmail.com   3                               2
ahf98jla#gmail.com            2                               1
There is a routine called prxnext that proves very handy here.
Generate sample data
data emails;
input email $32.;
datalines;
asifhajhtysiofh#gmail.com
chris.nashfield#hotmail.com
ahf98jla#gmail.com
;
Do the counting
data checkEmails(keep = email maxCons maxVow);
set emails;
* Consonants;
re = prxparse("/[bcdfghjklmnpqrstvwxyz]+/");
start = 1;
stop = index(email,"#");
do until (pos = 0);
call prxnext(re,start,stop,email,pos,len);
maxCons = max(maxCons, len);
end;
* Vowels;
re = prxparse("/[aeiou]+/"); /* y is already counted as a consonant above */
start = 1;
stop = index(email,"#");
do until (pos = 0);
call prxnext(re,start,stop,email,pos,len);
maxVow = max(maxVow, len);
end;
run;
Results
Email                         MaxCons   MaxVow
asifhajhtysiofh#gmail.com     5         2
chris.nashfield#hotmail.com   3         2
ahf98jla#gmail.com            2         1
This was much trickier than I expected it to be, but I have a solution using macro loops that roughly follows the logic in #DaBigNikoladze's comment:
data temp;
input email $40.;
datalines;
asifhajhtysiofh#gmail.com
chris.nashfield#hotmail.com
ahf98jla#gmail.com
;
run;
proc sql noprint;
select max(length(email)) into: max_email_length from temp;
quit;
%let vowels = "a" "e" "i" "o" "u";
%let consonants = "q" "w" "r" "t" "y" "p" "s" "d" "f" "g" "h" "j" "k" "l" "z" "x" "c" "v" "b" "n" "m";
%macro counter;
data temp_count;
set temp;
/* limit email to just the part before the #*/
email_short = substrn(email, 0, find(email, "#"));
email_vowels_only = email_short;
email_consonants_only = email_short;
/* keep only the vowels or consonants, respectively*/
%do i = 1 %to &max_email_length.;
if substr(email_vowels_only, &i., 1) notin(&vowels.) then substr(email_vowels_only, &i., 1) = " ";
if substr(email_consonants_only, &i., 1) notin(&consonants.) then substr(email_consonants_only, &i., 1) = " ";
%end;
run;
/* determine the max number of strings we have to scan through*/
proc sql noprint;
select max(max(countw(email_vowels_only)), max(countw(email_consonants_only))) into: loops from temp_count;
quit;
/* separate each string out into its own variable, find the max length of those variables, and drop those variables*/
proc sql;
create table temp_count_expand (drop = vowel_word: consonant_word:) as select
*,
%do j = 1 %to &loops.; scan(email_vowels_only, &j.) as vowel_word&j., %end;
%do k = 1 %to &loops.; scan(email_consonants_only, &k.) as consonant_word&k., %end;
max(%do j = 1 %to &loops.; length(calculated vowel_word&j.), %end; .) as max_vowels,
max(%do k = 1 %to &loops.; length(calculated consonant_word&k.), %end; .) as max_consonants
from temp_count;
quit;
%mend counter;
%counter;
I'm not sure why you specify proc sql for this task. A data step is much more suitable as you can loop through the email, treating everything that is either a non-consonant or a non-vowel as a delimiter. I've used a regular expression (prxchange) to remove the # portion of the email, although substr works just as well.
data have;
input Email $50.;
datalines;
asifhajhtysiofh#gmail.com
chris.nashfield#hotmail.com
ahf98jla#gmail.com
;
run;
data want;
set have;
length _w1 _w2 $50;
_short_email=prxchange('s/#.+//',-1,email); /* remove everything from # onwards */
do _i = 1 by 1 until (_w1=''); /* loop through email, using everything other than consonants as the delimiter */
_w1 = scan(_short_email,_i,'bcdfghjklmnpqrstvwxyz','ki');
consonant = max(consonant,ifn(missing(_w1),0,length(_w1))); /* keep longest value */
end;
do _j = 1 by 1 until (_w2=''); /* loop through email, using everything other than vowels as the delimiter */
_w2 = scan(_short_email,_j,'aeiou','ki');
vowel = max(vowel,ifn(missing(_w2),0,length(_w2))); /* keep longest value */
end;
drop _: ; /* drop temporary variables */
run;

doing a do loop within macro sas

I have the following code:
%macro initial (first=, second=, third=, fourth=, final=);
data &first;
set wtnodup.&first;
DATE1 = INPUT(PUT(Date,8.),YYMMDD8.);
format DATE1 monyy7.;
RUN;
proc freq data=&first order= freq;
tables date1*jobboardid / list out=&second (drop = percent rename=
(Count=CountNew));
run;
data &third;
set &second (firstobs=2);
if countnew le 49 then delete;
run;
proc sort data = &third;
by jobboardid Date1;
run;
data &fourth (keep = countnew oldcountnew Date1 rate from till jobboardid
rate);
set &third;
by jobboardid Date1;
format From Till monyy7.;
from = lag12(Date1);
oldcountnew = lag12(countnew);
if lag12(jobboardid) EQ jobboardid and
INTCK('month', from, Date1) EQ 12 then do;
till = Date1;
rate = ((countnew/oldcountnew)-1)*100;
output;
end;
run;
proc sort data = &fourth;
by Date1 rate;
proc means data=&fourth noprint;
by Date1;
output out=Result.&final median(rate)=medianRate;
run;
%mend initial;
%initial (first = Alabama, second = AlabamaOne, third =AlabamaTwo,
fourth = AlabamaThree, final=AL_10);
%initial (first = Alaska, second = AlaskaOne, third =AlaskaTwo,
fourth = AlaskaThree, final=AK_10);
%initial (first = Arizona, second = ArizonaOne, third =ArizonaTwo,
fourth = ArizonaThree, final=AZ);
%initial (first = Arkansas, second = ArkansasOne, third =ArkansasTwo,
fourth= ArkansasThree, final=AR_10);
What I am trying to do is that in the part that puts the condition:
if countnew < 10 then delete;
I want to create a sort of do-loop that deletes the data when countnew is below 10, 20, 30, and so on up to 70, and creates a separate data set for each iteration.
So I would have a final data set for each of the different countnew cutoffs.
What is the best way to go about doing this?
Why not loop in steps of ten and append the iteration value to the dataset name, like this?
** Sample dataset;
data try;
do i=1 to 1000;
value=1+ranuni(12345)*100;
output;
end;
drop i;
run;
** Macro iterator;
%macro iter(ds=);
%do i=10 %to 70 %by 10;
data &ds._&i;
set &ds;
if value le &i then delete;
run;
%end;
%mend;
%iter (ds=try)
You will have 7 datasets named try_10 through try_70, where try will be replaced with the dataset name you pass in.
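If the input is large, reading it seven times may be slow. One possible alternative, sketched here under the same assumptions as the sample above (the macro name iter_onepass and the var parameter are hypothetical), is to read the input once and write each row to every output dataset whose cutoff it exceeds.

```sas
%macro iter_onepass(ds=, var=value);
  %local i;
  /* One DATA statement listing all seven output datasets */
  data
    %do i = 10 %to 70 %by 10;
      &ds._&i
    %end;
  ;
    set &ds;
    /* Keep a row in <ds>_<i> only when the value is above the cutoff i */
    %do i = 10 %to 70 %by 10;
      if &var > &i then output &ds._&i;
    %end;
  run;
%mend iter_onepass;

%iter_onepass(ds=try)
```

This keeps the same rows as deleting when value le &i, including dropping missing values, but in a single pass over the data. For the original question, var=countnew would be passed along with the name of the intermediate dataset.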