I'm trying to modify the following SAS macro so that it includes percentages for the variable CHD when it is equal to both 0 and 1. Currently this macro is only set up to print out the results of baseline variables when CHD (coronary heart disease) is equal to 1. I think the modification needs to occur within the data routfreq&i step, but I'm not quite sure how to set it up. I would then also need an additional column to print out "No Coronary Heart Disease * % (n)".
%macro categ(pred,i);
proc freq data = heart;
tables &pred * chd / chisq sparse outpct out = outfreq&i ;
output out = stats&i chisq;
run;
proc sort data = outfreq&i;
by &pred;
run;
proc means data = outfreq&i noprint;
where chd ne . and &pred ne .;
by &pred;
var COUNT;
output out=moutfreq&i(keep=&pred total rename=(&pred=variable)) sum=total;
run;
data routfreq&i(rename = (&pred = variable));
set outfreq&i;
length varname $20.;
if chd = 1 and &pred ne .;
rcount = put(count,8.);
rcount = "(" || trim(left(rcount)) || ")";
pctnum = round(pct_row,0.1) || " " || (rcount);
index = &i;
varname = vlabel(&pred);
keep &pred pctnum index varname;
run;
data rstats&i;
set stats&i;
length p_value $8.;
if P_PCHI <= 0.05 then do;
p_value = round(P_PCHI,0.0001) || "*";
if P_PCHI < 0.0001 then p_value = "<0.0001" || "*";
end;
else p_value = put(P_PCHI,8.4);
keep p_value index;
index = &i;
run;
data _null_;
set heart;
call symput("fmt",vformat(&pred));
run;
proc sort data = moutfreq&i;
by variable;
run;
proc sort data = routfreq&i;
by variable;
run;
data temp&i;
merge moutfreq&i routfreq&i;
by variable;
run;
data final&i;
merge temp&i rstats&i;
by index;
length formats $20.;
formats=put(variable,&fmt);
if not first.index then do;
varname = " ";
p_value = " ";
end;
drop variable;
run;
%mend;
%categ(gender,1);
%categ(smoke,2);
%categ(age_group,3);
%macro names(j,k,dataname);
%do i=&j %to &k;
&dataname&i
%end;
%mend names;
data categ_chd;
set %names(1,3,final);
label varname = "Demographic Characteristic"
total = "Total"
pctnum = "Coronary Heart Disease * % (n)"
p_value = "p-value * (2 sided)"
formats = "Category";
run;
ods listing close;
ods rtf file = "c:\nesug\table1a.rtf" style = forNESUG;
proc report data = categ_chd nowd split = "*";
column index varname formats total pctnum p_value;
define index /group noprint;
compute before index;
line ' ';
endcomp;
define varname / order = data style(column) = [just=left] width = 40;
define formats / order = data style(column) = [just=left];
define total / order = data style(column) = [just=center];
define pctnum / order = data style(column) = [just=center];
define p_value / order = data style(column) = [just=center];
title1 " NESUG PRESENTATION: TABLE 1A (NESUG 2004)";
title2 " CROSSTABS OF CATEGORICAL VARIABLES WITH CORONARY HEART DISEASE OUTCOME";
run;
ods rtf close;
ods listing;
Also, this code has the following error when it is run:
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
1:2
NOTE: Numeric values have been converted to character values at the places given by:
(Line):(Column).
3:111
I think this macro needs to be modified so that it doesn't crash when it runs with categorical/character variables.
The line
if chd = 1 and &pred ne .;
is what is causing your output to only have CHD = "1". You would change that to:
if chd in (0, 1) and &pred ne .;
I do not understand your request for an additional column. Perhaps post an example of the current output and the output that you want?
As for the "errors" (actually notes, as they do not cause the system to stop processing), they occur when a variable is automatically converted from numeric to character or vice versa. The note gives the code line and column where each conversion happens. I prefer to eliminate these notes as often as possible to avoid unintended consequences of inappropriate coercion. To do this, you would make use of the PUT and INPUT functions.
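One possible way to get both groups side by side is sketched below. It is only a sketch, not the original author's method: it is meant to replace the data routfreq&i step inside %categ, it assumes outfreq&i is still sorted by &pred, and the names pctnum0 and cell are hypothetical. It uses PUT to build the text explicitly, which avoids the conversion notes, and MISSING() so the test works for both numeric and character predictors.

```sas
/* Sketch only: goes inside %macro categ in place of the routfreq&i step.  */
/* pctnum0 and cell are hypothetical names, not part of the original code. */
data routfreq&i(rename = (&pred = variable));
  length varname $20 pctnum pctnum0 $30;
  retain pctnum pctnum0;
  set outfreq&i;
  where chd in (0, 1) and not missing(&pred);  /* works for num and char */
  by &pred;
  if first.&pred then call missing(pctnum, pctnum0);
  /* PUT converts explicitly, so no numeric-to-character notes are written */
  cell = catx(" ", strip(put(pct_row, 8.1)), cats("(", put(count, 8.), ")"));
  if chd = 1 then pctnum = cell;    /* "% (n)" for CHD = 1 */
  else pctnum0 = cell;              /* "% (n)" for CHD = 0 */
  if last.&pred;                    /* keep one row per level of &pred */
  index = &i;
  varname = vlabel(&pred);
  keep &pred pctnum pctnum0 index varname;
run;
```

Because this keeps one row per level of &pred, the later merge with moutfreq&i by variable still lines up. To show the new column you would also add pctnum0 = "No Coronary Heart Disease * % (n)" to the label statement in categ_chd, and add pctnum0 to the COLUMN statement (with its own DEFINE) in PROC REPORT.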
Looking to update this macro to be HASH + point = key. We have started to exceed our memory limits with our current version of this macro for one of our data runs. The reason I'm asking for help is because I don't have a lot of time and have never really analyzed this code since it wasn't part of my process until recently.
What I don't really understand from https://www.lexjansen.com/nesug/nesug11/ld/ld01.pdf is how the RID gets set and how to incorporate it into our macro. I don't even know whether it is possible to do it this way with our current macro.
Any help would be greatly appreciated.
%macro hashmerge2(varnm,onto,from,byvars,obsqty);
%let data_vars = %trim (&varnm);
%let data_vars_a = %sysfunc(tranwrd(&data_vars.,%str( ),%str(" , ")));
%let data_vars_b = %sysfunc(tranwrd(&data_vars.,%str( ), %str(,)));
%let data_key = %trim (&byvars);
%let data_key = %sysfunc(tranwrd(&data_key.,%str( ), %str(" , ")));
%if %index(&varnm,' ') > 0 %then %let varnm3=%substr(%substr(&varnm,1,%index(&varnm,' ')),1,4);
%else %let varnm3=%substr(&varnm,1,4);
data &onto(drop=rc) miss&varnm3(drop=rc);
if 0 then set &onto &from(keep=&varnm. &byvars.);
declare hash h_merge (dataset: "&from.");
rc = h_merge.DefineKey ("&data_key.");
rc = h_merge.DefineData ("&data_vars_a.");
rc = h_merge.DefineDone ();
do until (eof);
set &onto end = eof;
call missing(&data_vars_b.);
rc = h_merge.find ();
if rc = 0 then do;
output &onto;
from = "&from.";
end;
else do;
output miss&varnm3 &onto;
from = "&onto.";
end;
end;
stop;
run;
%mend;
I think this is what you are looking for, although it still needs to load all of the key values from the "lookup" table into the hash object. It saves memory because, instead of also loading the non-key variables, it loads only the observation number that matches each set of key values.
%macro hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm /* Space delimited list of variable to retrieve */
,onto /* Dataset to update */
,from /* Dataset to get values from */
,byvars /* Space delimited list of key variables to match on */
);
%local missds key_vars;
%let missds=%scan(&varnm,1,%str( ));
%let missds=miss%substr(&missds,1,%sysfunc(min(28,%length(&missds))));
%let key_vars="%sysfunc(tranwrd(%sysfunc(compbl(&byvars)),%str( )," "))";
data &onto(drop=rc) &missds(drop=rc);
if 0 then set &onto &from(keep=&varnm. &byvars.);
declare hash h_merge ();
rc = h_merge.DefineKey (&key_vars);
rc = h_merge.DefineData ('_point');
rc = h_merge.DefineDone ();
do _point=1 to _nobs;
set &from(keep=&byvars) point=_point nobs=_nobs;
rc = h_merge.add();
end;
do until (eof);
set &onto end = eof;
rc = h_merge.find ();
if rc = 0 then do;
set &from (keep=&varnm) point=_point;
from = "&from.";
output &onto;
end;
else do;
call missing(of &varnm);
from = "&onto.";
output ;
end;
end;
stop;
run;
%mend hash_merge_point;
So here is a trivial example:
data lookup;
input id age sex $1.;
cards;
1 10 F
2 20 .
4 30 M
;
data master ;
input id wt ;
cards;
1 100
2 150
3 180
4 200
;
%hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm=age sex /* Space delimited list of variable to retrieve */
,onto=master /* Dataset to update */
,from=lookup /* Dataset to get values from */
,byvars=id /* Space delimited list of key variables to match on */
);
If the target table already has the variables being created by the merge (so you just want to overwrite the current values), then you can use the MODIFY statement instead of the SET statement to modify the dataset in place. But you might want to make sure you have a backup of the table before trying this. Also note that if you want a flag for the source (the from variable), then that variable also needs to exist in the target table.
So with this updated master table:
data master ;
input id wt ;
length age 8 sex $1 from $50;
cards;
1 100
2 150
3 180
4 200
;
And this version of the macro:
%macro hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm /* Space delimited list of variable to retrieve */
,onto /* Dataset to update */
,from /* Dataset to get values from */
,byvars /* Space delimited list of key variables to match on */
);
%local key_vars;
%let key_vars="%sysfunc(tranwrd(%sysfunc(compbl(&byvars)),%str( )," "))";
data &onto;
if 0 then set &onto (keep=&byvars.);
declare hash h_merge ();
rc = h_merge.DefineKey (&key_vars);
rc = h_merge.DefineData ('_point');
rc = h_merge.DefineDone ();
do _point=1 to _nobs;
set &from(keep=&byvars) point=_point nobs=_nobs;
rc = h_merge.add();
end;
do until (eof);
modify &onto end = eof;
rc = h_merge.find ();
if rc = 0 then do;
set &from (keep=&varnm) point=_point;
from = "&from.";
end;
else from = "&onto.";
replace;
end;
stop;
run;
%mend hash_merge_point;
If you run this code:
proc print data=master;
title 'BEFORE';
run;
%hash_merge_point
/*-----------------------------------------------------------------------------
Merge variables ONTO large table FROM small table using POINT= dataset option.
-----------------------------------------------------------------------------*/
(varnm=age sex /* Space delimited list of variable to retrieve */
,onto=master /* Dataset to update */
,from=lookup /* Dataset to get values from */
,byvars=id /* Space delimited list of key variables to match on */
);
proc print data=master;
title 'AFTER';
run;
You get this result:
I don't have another analyst on my team at work and have a question about the most efficient way to run several proc freq concurrently.
My goal is to run about 160 different frequencies, and include formatting for all of them. I assume a macro is the fastest way, but I only have experience with basic macros. Below is my thought process assuming the data was already formatted:
%macro survey(question, formatA, formatB);
proc freq;
table &question;
format &formatA &formatB;
%mend;
%survey (question, formatA, formatB);
"question", "formatA" and "formatB" will be strings of data, for example:
- "question" would be KCI_1 KCI_2 through KCI_80
- "formatA" would be KCI_1fmt KCI_2fmt through KCI_80fmt
- "formatB" would be KCI_1fmt. KCI_2fmt. through KCI_80fmt.
Danielle:
You can use a macro to assign known formats to variables that are not already formatted. The rest of the FREQ step does not have to be macro-ized.
* make some survey data with unformatted responses;
data have;
do respondent_id = 1 to 10000;
array responses KCI_1-KCI_80;
do _n_ = 1 to dim(responses);
responses(_n_) = ceil(4*ranuni(123));
end;
output;
end;
run;
* make some format data for each question;
data responseMeanings;
length questionID 8 responseValue 8 responseMeaning $50;
do questionID = 1 to 80;
fmtname = cats('Q',questionID,'_fmt');
peg = ranuni (1234); drop peg;
do responseValue = 1 to 4;
select;
when (peg < 0.4) responseMeaning = scan('Never,Seldom,Often,Always', responseValue);
when (peg < 0.8) responseMeaning = scan('Yes,No,Don''t Ask,Don''t Tell', responseValue);
otherwise responseMeaning = scan('Nasty,Sour,Sweet,Tasty', responseValue);
end;
output;
end;
end;
run;
* create a custom format for the responses of each question;
proc format cntlin=responseMeanings(rename=(responseValue=start responseMeaning=label));
run;
* macro to associate variables with the corresponding custom format;
%macro format_each_response;
%local i;
format
%do i = 1 %to 80;
KCI_&i Q&i._fmt.
%end;
;
%mend;
* compute frequency counts;
proc freq data=have;
table KCI_1-KCI_80;
%format_each_response;
run;
I would like to identify the max number of consecutive consonants and vowels in an e-mail address, using SAS SQL (proc sql). The output should look like the one below in columns Max of consecutive consonants and max of consecutive vowels (I listed characters in first row for illustrative purposes only).
A few things to note:
Treat special and numeric characters as a count terminator (e.g. the 3rd email is a good example: you've got 2 consonants (hf), then numbers (98), and then again 2 consonants (jl)). The output should be just 2 (hf).
I am only interested in the first part of the email (before #).
How do I achieve this, dear community?
E-mail                        Max of consecutive consonants   Max of consecutive vowels
asifhajhtysiofh#gmail.com     5 (jhtys)                       2 (io)
chris.nashfield#hotmail.com   3                               2
ahf98jla#gmail.com            2                               1
There is a routine called prxnext that proves very handy here.
Generate sample data
data emails;
input email $32.;
datalines;
asifhajhtysiofh#gmail.com
chris.nashfield#hotmail.com
ahf98jla#gmail.com
;
Do the counting
data checkEmails(keep = email maxCons maxVow);
set emails;
* Consonants;
re = prxparse("/[bcdfghjklmnpqrstvwxyz]+/");
start = 1;
stop = index(email,"#");
do until (pos = 0);
call prxnext(re,start,stop,email,pos,len);
maxCons = max(maxCons, len);
end;
* Vowels;
re = prxparse("/[aeiou]+/"); * y is already counted as a consonant above;
start = 1;
stop = index(email,"#");
do until (pos = 0);
call prxnext(re,start,stop,email,pos,len);
maxVow = max(maxVow, len);
end;
run;
Results
Email MaxCons MaxVow
asifhajhtysiofh#gmail.com 5 2
chris.nashfield#hotmail.com 3 2
ahf98jla#gmail.com 2 1
This was much trickier than I expected it to be, but I have a solution using macro loops that roughly follows the logic in #DaBigNikoladze's comment:
data temp;
input email $40.;
datalines;
asifhajhtysiofh#gmail.com
chris.nashfield#hotmail.com
ahf98jla#gmail.com
;
run;
proc sql noprint;
select max(length(email)) into: max_email_length from temp;
quit;
%let vowels = "a" "e" "i" "o" "u";
%let consonants = "q" "w" "r" "t" "y" "p" "s" "d" "f" "g" "h" "j" "k" "l" "z" "x" "c" "v" "b" "n" "m";
%macro counter;
data temp_count;
set temp;
/* limit email to just the part before the #*/
email_short = substrn(email, 0, find(email, "#"));
email_vowels_only = email_short;
email_consonants_only = email_short;
/* keep only the vowels or consonants, respectively*/
%do i = 1 %to &max_email_length.;
if substr(email_vowels_only, &i., 1) notin(&vowels.) then substr(email_vowels_only, &i., 1) = " ";
if substr(email_consonants_only, &i., 1) notin(&consonants.) then substr(email_consonants_only, &i., 1) = " ";
%end;
run;
/* determine the max number of strings we have to scan through*/
proc sql noprint;
select max(max(countw(email_vowels_only)), max(countw(email_consonants_only))) into: loops from temp_count;
quit;
/* separate each string out into its own variable, find the max length of those variables, and drop those variables*/
proc sql;
create table temp_count_expand (drop = vowel_word: consonant_word:) as select
*,
%do j = 1 %to &loops.; scan(email_vowels_only, &j.) as vowel_word&j., %end;
%do k = 1 %to &loops.; scan(email_consonants_only, &k.) as consonant_word&k., %end;
max(%do j = 1 %to &loops.; length(calculated vowel_word&j.), %end; .) as max_vowels,
max(%do k = 1 %to &loops.; length(calculated consonant_word&k.), %end; .) as max_consonants
from temp_count;
quit;
%mend counter;
%counter;
I'm not sure why you specify proc sql for this task. A data step is much more suitable as you can loop through the email, treating everything that is either a non-consonant or a non-vowel as a delimiter. I've used a regular expression (prxchange) to remove the # portion of the email, although substr works just as well.
data have;
input Email $50.;
datalines;
asifhajhtysiofh#gmail.com
chris.nashfield#hotmail.com
ahf98jla#gmail.com
;
run;
data want;
set have;
length _w1 _w2 $50;
_short_email=prxchange('s/#.+//',-1,email); /* remove everything from # onwards */
do _i = 1 by 1 until (_w1=''); /* loop through email, using everything other than consonants as the delimiter */
_w1 = scan(_short_email,_i,'bcdfghjklmnpqrstvwxyz','ki');
consonant = max(consonant,ifn(missing(_w1),0,length(_w1))); /* keep longest value */
end;
do _j = 1 by 1 until (_w2=''); /* loop through email, using everything other than vowels as the delimiter */
_w2 = scan(_short_email,_j,'aeiou','ki');
vowel = max(vowel,ifn(missing(_w2),0,length(_w2))); /* keep longest value */
end;
drop _: ; /* drop temporary variables */
run;
I have the following code:
%macro initial (first=, second=, third=, fourth=, final=);
data &first;
set wtnodup.&first;
DATE1 = INPUT(PUT(Date,8.),YYMMDD8.);
format DATE1 monyy7.;
RUN;
proc freq data=&first order= freq;
tables date1*jobboardid / list out=&second (drop = percent rename=
(Count=CountNew));
run;
data &third;
set &second (firstobs=2);
if countnew le 49 then delete;
run;
proc sort data = &third;
by jobboardid Date1;
run;
data &fourth (keep = countnew oldcountnew Date1 rate from till jobboardid
rate);
set &third;
by jobboardid Date1;
format From Till monyy7.;
from = lag12(Date1);
oldcountnew = lag12(countnew);
if lag12(jobboardid) EQ jobboardid and
INTCK('month', from, Date1) EQ 12 then do;
till = Date1;
rate = ((countnew/oldcountnew)-1)*100;
output;
end;
run;
proc sort data = &fourth;
by Date1 rate;
proc means data=&fourth noprint;
by Date1;
output out=Result.&final median(rate)=medianRate;
run;
%mend initial;
%initial (first = Alabama, second = AlabamaOne, third =AlabamaTwo,
fourth = AlabamaThree, final=AL_10);
%initial (first = Alaska, second = AlaskaOne, third =AlaskaTwo,
fourth = AlaskaThree, final=AK_10);
%initial (first = Arizona, second = ArizonaOne, third =ArizonaTwo,
fourth = ArizonaThree, final=AZ);
%initial (first = Arkansas, second = ArkansasOne, third =ArkansasTwo,
fourth= ArkansasThree, final=AR_10);
What I am trying to do is that in the part that puts the condition:
if countnew < 10 then delete;
I want to create a sort of do-loop that would delete the data when countnew is < 10, 20, 30, and so on up to 70, and create a separate data set for each iteration.
So I would end up with a final data set for each of the different countnew cutoffs.
What is the best way about doing this?
Why not do-looping, ten by ten, and adding the iteration extension to the dataset name like this?
** Sample dataset;
data try;
do i=1 to 1000;
value=1+ranuni(12345)*100;
output;
end;
drop i;
run;
** Macro iterator;
%macro iter(ds=);
%do i=10 %to 70 %by 10;
data &ds._&i;
set &ds;
if value le &i then delete;
run;
%end;
%mend;
%iter (ds=try)
You will get 7 datasets named try_10 through try_70, where try is replaced by whatever dataset name you pass in.
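If the input dataset is large, reading it seven times may be wasteful. The following is a minimal single-pass variant, offered as a sketch rather than as the answer's method: the macro name iter_onepass is hypothetical, and it assumes the same variable name (value) and the same output names as above. For the data in the question the tested variable would be countnew.

```sas
/* Sketch: build all seven datasets in one pass over the input.    */
/* The macro name iter_onepass is hypothetical.                    */
%macro iter_onepass(ds=);
  %local i;
  /* list every output dataset on a single DATA statement */
  data
    %do i = 10 %to 70 %by 10;
      &ds._&i
    %end;
  ;
    set &ds;
    /* a row goes to each dataset whose cutoff it exceeds */
    %do i = 10 %to 70 %by 10;
      if value gt &i then output &ds._&i;
    %end;
  run;
%mend iter_onepass;

%iter_onepass(ds=try)
```

Each output dataset holds the same rows as the corresponding one from the looped version, because "if value le &i then delete" keeps exactly the rows where value is greater than &i.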
I am stuck with a problem in SAS. I have a bunch of monthly weather data in individual txt-files. My current goal is to read those in and create a separate data set for each. Alternatively, I could see it being possible to skip this step and go closer to end goal of merging all these data sets to another data set by the date and time. Below was my try at the problem. I thought a macro would work that iterates through the file names and creates matching data set names, but apparently it does not. Also, to make it more efficient the if/else if statements I think can be replaced by a DO loop but I could not figure it out. Help is much appreciated!
%macro loop;
%do i = 11 %to 13;
%do j = 01 %to 12;
%let year = i;
%let month = j;
data _&year&month ;
infile "&path\hr_pit_&year..&month..txt" firstobs=27;
length Time $ 4 Month $ 3 Day $ 2 Year $ 4 temp 3;
input time $ Month $ 10-13 Day Year temp 32-34;
Date = Day||Month||Year;
if time = '12AM' then time = 2400;
else if time = '1AM ' then time = 100;
else if time = '2AM ' then time = 200;
else if time = '3AM ' then time = 300;
else if time = '4AM ' then time = 400;
else if time = '5AM ' then time = 500;
else if time = '6AM ' then time = 600;
else if time = '7AM ' then time = 700;
else if time = '8AM ' then time = 800;
else if time = '9AM ' then time = 900;
else if time = '10AM' then time = 1000;
else if time = '11AM' then time = 1100;
else if time = '12PM' then time = 1200;
else if time = '1PM ' then time = 1300;
else if time = '2PM ' then time = 1400;
else if time = '3PM ' then time = 1500;
else if time = '4PM ' then time = 1600;
else if time = '5PM ' then time = 1700;
else if time = '6PM ' then time = 1800;
else if time = '7PM ' then time = 1900;
else if time = '8PM ' then time = 2000;
else if time = '9PM ' then time = 2100;
else if time = '10PM' then time = 2200;
else if time = '11PM' then time = 2300;
_time = input(time,4.);
time = _time;
drop month day year;
run;
%end;
%end;
%mend;
%loop; run:
In case anyone is wondering this is how a typical txt file looks: http://www.erh.noaa.gov/pbz/hourlywx/hr_pit_13.01
Here is a list of txt files in the same shape and form:
http://www.erh.noaa.gov/pbz/hourlyclimate.htm
The first fixes are:
%let year = &i;
%let month = %sysfunc(putn(&j, z2.));
which reference the macro loop variables properly and add a leading zero to the month.
The rest of the changes just deal with AM/PM.
Also, Date is now numeric.
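To see why these two lines matter, here is a small sketch that only writes the generated file names to the log instead of reading any files. The macro name show_names is hypothetical, and the loop ranges are shortened for illustration.

```sas
/* Sketch: print the file names the loop would generate.            */
/* show_names is a hypothetical name; no files are read.            */
%macro show_names;
  %local i j year month;
  %do i = 11 %to 11;
    %do j = 1 %to 3;
      %let year  = &i;                        /* &i resolves to 11, plain i would not */
      %let month = %sysfunc(putn(&j, z2.));   /* 1 becomes 01 */
      %put file would be hr_pit_&year..&month..txt;
    %end;
  %end;
%mend show_names;

%show_names
```

With the original %let year = i; the macro variable would hold the literal letter i, so every iteration would build the same wrong file name.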
Full code:
%macro loop;
%do i = 11 %to 13;
%do j = 1 %to 12;
%let year = &i;
%let month = %sysfunc(putn(&j, z2.));
data _&year&month ;
length Date 5 _Time $4 Time 8 Month $3 Day $2 Year $4 temp 3;
format Date DATE9.;
infile "&path\hr_pit_&year..&month..txt" firstobs=27;
input _time $ Month $ 10-13 Day Year temp 32-34;
_time = right(_time);
Date = input(Day||Month||Year, date9.);
if _time = '12AM' or (_time ne '12PM' and index(_time, 'PM') > 1 )
then time=input(_time, 2.) + 12;
else time=input(_time, 2.);
time = time * 100;
drop month day year;
run;
/* gather all data in one table */
proc append base=work.all_data data=work._&year&month;
run;
%end;
%end;
%mend;
proc sql;
drop table work.all_data;
quit;
%let path=E:;
%loop;
Sounds like the best answer may be to read them all into one dataset and then merge them to the final dataset from there. I think you also are served better by using a real time value, rather than 100-2400 (and an inconsistent 2400, that really should be 000 if you're doing that) - then you can just use input.
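As an illustration of storing a real time value, here is a minimal sketch that converts labels such as 1AM or 12PM into SAS time values. It uses HMS() on the parsed hour, which is one possible approach and not necessarily the informat-based one hinted at above; the variable names are hypothetical.

```sas
/* Sketch: turn labels such as '1AM' or '12PM' into SAS time values. */
/* stamp, hour and time are illustrative names.                      */
data _null_;
  length stamp $4;
  do stamp = '12AM', '1AM', '12PM', '11PM';
    hour = input(compress(stamp, , 'kd'), 2.);   /* keep the digits only   */
    if hour = 12 then hour = 0;                  /* 12AM and 12PM start at 0 */
    if index(upcase(stamp), 'PM') then hour = hour + 12;
    time = hms(hour, 0, 0);                      /* seconds since midnight */
    put stamp= time= time5.;
  end;
run;
```

With this mapping midnight becomes 0:00 rather than 2400, and 11PM becomes 23:00, so the values sort and compare like any other SAS time.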
Anyway, if you just read the text files in like so:
data my_text_files;
infile "c:\mydirectory\*.txt" lrecl=whatever eov=eovmark;
*firstobs=27 is only respected for the first file - so we have to track with eovmark;
input @; *read and hold the next line so header lines are consumed and eovmark gets set;
if eovmark then do;
eovmark=0;
linecounter=0;
end;
linecounter+1;
if linecounter ge 27 then do;
input (input statement);
(any other code you want to execute here);
output;
end;
run;
Then merge by (whatever). If you need to know some information about the file name, you can use the FILENAME= option on the INFILE statement to get access to it.
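A minimal sketch of the FILENAME= option combined with the header-skipping logic follows. The folder is the placeholder path used above; the LRECL value, the 200-character line variable and the names fname and source_file are assumptions made for illustration, and the real INPUT statement would replace the generic one.

```sas
/* Sketch: read every .txt file in a folder and record its source file. */
/* lrecl=256, line, fname and source_file are assumptions.              */
data my_text_files;
  length fname source_file $260 line $200;
  infile "c:\mydirectory\*.txt" lrecl=256 truncover
         filename=fname eov=eovmark;
  input @;                       /* load the next record and hold it      */
  if eovmark then do;            /* first record of a later file was read */
    eovmark = 0;
    linecounter = 0;
  end;
  linecounter + 1;
  if linecounter ge 27;          /* skip the 26 header lines of each file */
  input line $char200.;          /* replace with the real INPUT statement */
  source_file = fname;           /* FILENAME= variables are not saved, so copy */
  drop linecounter;
run;
```

The year and month can then be parsed out of source_file (for example with SCAN) instead of being passed in through macro variables.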