I have the following code:
%macro initial (first=, second=, third=, fourth=, final=);
data &first;
set wtnodup.&first;
DATE1 = INPUT(PUT(Date,8.),YYMMDD8.);
format DATE1 monyy7.;
RUN;
proc freq data=&first order= freq;
tables date1*jobboardid / list out=&second (drop = percent rename=
(Count=CountNew));
run;
data &third;
set &second (firstobs=2);
if countnew le 49 then delete;
run;
proc sort data = &third;
by jobboardid Date1;
run;
data &fourth (keep = countnew oldcountnew Date1 rate from till jobboardid
rate);
set &third;
by jobboardid Date1;
format From Till monyy7.;
from = lag12(Date1);
oldcountnew = lag12(countnew);
if lag12(jobboardid) EQ jobboardid and
INTCK('month', from, Date1) EQ 12 then do;
till = Date1;
rate = ((countnew/oldcountnew)-1)*100;
output;
end;
run;
proc sort data = &fourth;
by Date1 rate;
proc means data=&fourth noprint;
by Date1;
output out=Result.&final median(rate)=medianRate;
run;
%mend initial;
%initial (first = Alabama, second = AlabamaOne, third =AlabamaTwo,
fourth = AlabamaThree, final=AL_10);
%initial (first = Alaska, second = AlaskaOne, third =AlaskaTwo,
fourth = AlaskaThree, final=AK_10);
%initial (first = Arizona, second = ArizonaOne, third =ArizonaTwo,
fourth = ArizonaThree, final=AZ);
%initial (first = Arkansas, second = ArkansasOne, third =ArkansasTwo,
fourth= ArkansasThree, final=AR_10);
What I am trying to do concerns the step that applies the condition:
if countnew < 10 then delete;
I want to create a kind of do-loop that deletes the rows where countnew is below 10, 20, 30, ... up to 70, and creates a separate data set for each iteration (countnew < 10, countnew < 20, etc.).
So I would end up with a final data set for each of the different countnew thresholds.
What is the best way to go about this?
Why not use a %do loop in steps of ten and append the iteration value to the data set name, like this?
** Sample dataset;
data try;
do i=1 to 1000;
value=1+ranuni(12345)*100;
output;
end;
drop i;
run;
** Macro iterator;
%macro iter(ds=);
%do i=10 %to 70 %by 10;
data &ds._&i;
set &ds;
if value le &i then delete;
run;
%end;
%mend;
%iter (ds=try)
You will get 7 data sets named try_10 through try_70, where try is replaced by whatever data set name you pass in.
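Applied to the macro in the question, the same loop could wrap the step that builds &third. This is only a sketch: the macro name filter_by_threshold and its parameters in= and out= are made up for illustration, and it assumes that rows with countnew below each cut-off should be deleted.

```sas
%macro filter_by_threshold(in=, out=);
  %local cut;
  %do cut = 10 %to 70 %by 10;
    /* one output data set per threshold, e.g. AlabamaTwo_10 ... AlabamaTwo_70 */
    data &out._&cut;
      set &in (firstobs=2);
      if countnew lt &cut then delete;
    run;
  %end;
%mend filter_by_threshold;

%filter_by_threshold(in=AlabamaOne, out=AlabamaTwo)
```

The later steps (sort, lag, means) would then have to run once per threshold as well, for example by moving them inside the same %do loop.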
I am trying to reformat an existing date such as 07/06/2020 (DDMMYYYY) as 07_06_2020, and the output should be a string, not an int.
my code:
%LET Run_Date = %SYSFUNC(TODAY(), MMDDYY8.) ;
PROC FORMAT ;
PICTURE Runner low-high = '99_99_9999' ;
RUN ;
DATA _NULL ;
Run_Date_2 = PUT(Run_Date, Runner.) ;
CALL SYMPUT('Run_Date_2 ', Run_Date_2) ;
RUN ;
%PUT %Run_Date_2 . ;
**output**: error.
Thanks
Try this. Remember to use the datatype=date option.
proc format;
picture dtfmt (default=10)
low - high = '%0d_%0m_%Y' (datatype=date)
;
run;
data test;
dt = "07jun2020"d;
dt_char = put(dt, dtfmt.);
format dt ddmmyy10.;
run;
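To get the formatted text into a macro variable, as the question attempts, one option is CALL SYMPUTX in a DATA _NULL_ step. The sketch below assumes the dtfmt. picture format above has already been created. Note that the data set name is _NULL_ with a trailing underscore, and that a macro variable is referenced with & rather than %.

```sas
data _null_;
  /* format today's date with the picture format and store it as text */
  call symputx('Run_Date_2', put(today(), dtfmt.));
run;

%put &Run_Date_2;
```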
I'm trying to modify the following SAS macro so that it includes percentages for the variable CHD when it is equal to both 0 and 1. Currently this macro is only set up to print out the results of baseline variables when CHD (coronary heart disease) is equal to 1. I think the modification needs to occur within the data routfreq&i step but I'm not quite sure how to set it up. I would then also need an additional column to print out "No Coronary Heart Disease * % (n)".
%macro categ(pred,i);
proc freq data = heart;
tables &pred * chd / chisq sparse outpct out = outfreq&i ;
output out = stats&i chisq;
run;
proc sort data = outfreq&i;
by &pred;
run;
proc means data = outfreq&i noprint;
where chd ne . and &pred ne .;
by &pred;
var COUNT;
output out=moutfreq&i(keep=&pred total rename=(&pred=variable)) sum=total;
run;
data routfreq&i(rename = (&pred = variable));
set outfreq&i;
length varname $20.;
if chd = 1 and &pred ne .;
rcount = put(count,8.);
rcount = "(" || trim(left(rcount)) || ")";
pctnum = round(pct_row,0.1) || " " || (rcount);
index = &i;
varname = vlabel(&pred);
keep &pred pctnum index varname;
run;
data rstats&i;
set stats&i;
length p_value $8.;
if P_PCHI <= 0.05 then do;
p_value = round(P_PCHI,0.0001) || "*";
if P_PCHI < 0.0001 then p_value = "<0.0001" || "*";
end;
else p_value = put(P_PCHI,8.4);
keep p_value index;
index = &i;
run;
data _null_;
set heart;
call symput("fmt",vformat(&pred));
run;
proc sort data = moutfreq&i;
by variable;
run;
proc sort data = routfreq&i;
by variable;
run;
data temp&i;
merge moutfreq&i routfreq&i;
by variable;
run;
data final&i;
merge temp&i rstats&i;
by index;
length formats $20.;
formats=put(variable,&fmt);
if not first.index then do;
varname = " ";
p_value = " ";
end;
drop variable;
run;
%mend;
%categ(gender,1);
%categ(smoke,2);
%categ(age_group,3);
%macro names(j,k,dataname);
%do i=&j %to &k;
&dataname&i
%end;
%mend names;
data categ_chd;
set %names(1,3,final);
label varname = "Demographic Characteristic"
total = "Total"
pctnum = "Coronary Heart Disease * % (n)"
p_value = "p-value * (2 sided)"
formats = "Category";
run;
ods listing close;
ods rtf file = "c:\nesug\table1a.rtf" style = forNESUG;
proc report data = categ_chd nowd split = "*";
column index varname formats total pctnum p_value;
define index /group noprint;
compute before index;
line ' ';
endcomp;
define varname / order = data style(column) = [just=left] width = 40;
define formats / order = data style(column) = [just=left];
define total / order = data style(column) = [just=center];
define pctnum / order = data style(column) = [just=center];
define p_value / order = data style(column) = [just=center];
title1 " NESUG PRESENTATION: TABLE 1A (NESUG 2004)";
title2 " CROSSTABS OF CATEGORICAL VARIABLES WITH CORONARY HEART DISEASE OUTCOME";
run;
ods rtf close;
ods listing;
Also, this code has the following error when it is run:
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
1:2
NOTE: Numeric values have been converted to character values at the places given by:
(Line):(Column).
3:111
I think this macro needs to be modified so that it doesn't crash when it runs with categorical/character variables.
The line
if chd = 1 and &pred ne .;
is what restricts your output to CHD = 1. To keep both values, you would change it to:
if chd in (0, 1) and &pred ne .;
I do not understand your request for an additional column. Perhaps post an example of the current output and the output that you want?
As for the "errors" (actually notes, as they do not cause the system to stop processing), they occur when a variable is automatically converted from numeric to character or vice versa. Each note gives the line and column where the conversion happens. I prefer to eliminate these notes wherever possible to avoid unintended consequences of inappropriate coercion. To do this, you would make use of the PUT and INPUT functions.
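As a rough illustration of the PUT approach, the string-building lines in the macro could convert the numbers explicitly before concatenating. The values assigned below are invented sample data, and the formats 8.1 and 6.4 are assumptions about the desired precision.

```sas
data _null_;
  /* hypothetical sample values standing in for PROC FREQ output */
  pct_row = 12.345;
  count   = 42;
  p_pchi  = 0.0123;
  length pctnum $20 p_value $8;
  /* PUT converts numeric to character explicitly, so no conversion note is written */
  rcount  = cats("(", put(count, 8.), ")");
  pctnum  = catx(" ", put(round(pct_row, 0.1), 8.1), rcount);
  p_value = cats(put(p_pchi, 6.4), "*");
  put pctnum= p_value=;
run;
```

CATS and CATX strip leading and trailing blanks, which replaces the trim(left(...)) calls in the original step.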
I don't have another analyst on my team at work and have a question about the most efficient way to run several proc freq concurrently.
My goal is to run about 160 different frequencies, and include formatting for all of them. I assume a macro is the fastest way, but I only have experience with basic macros. Below is my thought process assuming the data was already formatted:
%macro survey(question, formatA formatB);
proc freq;
table &question;
format &formatA &formatB;
%mend;
%survey (question, formatA, formatB);
"question", "formatA" and "formatB" will be strings of data for example:
- "question" would be KCI_1 KCI_2 through KCI_80
- "formatA" would be KCI_1fmt KCI_2fmt through KCI_80fmt
- "formatB" would be KCI_1fmt. KCI_2fmt. through KCI_80fmt.
Danielle:
You can use macro to assign known formats to variables that are not already formatted. The rest of the FREQ does not have to be macro-ized.
* make some survey data with unformatted responses;
data have;
do respondent_id = 1 to 10000;
array responses KCI_1-KCI_80;
do _n_ = 1 to dim(responses);
responses(_n_) = ceil(4*ranuni(123));
end;
output;
end;
run;
* make some format data for each question;
data responseMeanings;
length questionID 8 responseValue 8 responseMeaning $50;
do questionID = 1 to 80;
fmtname = cats('Q',questionID,'_fmt');
peg = ranuni (1234); drop peg;
do responseValue = 1 to 4;
select;
when (peg < 0.4) responseMeaning = scan('Never,Seldom,Often,Always', responseValue);
when (peg < 0.8) responseMeaning = scan('Yes,No,Don''t Ask,Don''t Tell', responseValue);
otherwise responseMeaning = scan('Nasty,Sour,Sweet,Tasty', responseValue);
end;
output;
end;
end;
run;
* create a custom format for the responses of each question;
proc format cntlin=responseMeanings(rename=(responseValue=start responseMeaning=label));
run;
* macro to associate variables with the corresponding custom format;
%macro format_each_response;
%local i;
format
%do i = 1 %to 80;
KCI_&i Q&i._fmt.
%end;
;
%mend;
* compute frequency counts;
proc freq data=have;
table KCI_1-KCI_80;
%format_each_response;
run;
I'm using the following loop to generate the sums of some columns using a class statement:
%macro do_mean;
%do i = 50 %to 100;
%let string1 = %eval(100-&i);
**%if string1 = 5 %then %let string1 = "05";**
%if &i = 95 or &i = 90 or &i = 80 or &i = 70 or &i = 60 or &i = 50 %then %do;
proc means data = risiko.risiko_Haus sum nway noprint;
var HA_Max_Neg HA_Max_Pers;
class C_ze_Risiko_&i._2014_&string1._2015;
output out=test_ze_Risiko_&i._2014_&string1._2015 (drop=_type_ _freq_)
sum=C_Risiko_&i._2014_05_2015_Max_Neg C_Risiko_&string1._2014_05_2015_Max_Per;
run;
%end;
%end;
%mend do_mean;
%do_mean;
The names of the columns I want to use as class variables are "C_ze_Risiko_50_2014_50_2015", "C_ze_Risiko_60_2014_40_2015" and so on.
Unfortunately the code produces "C_ZE_RISIKO_95_2014_5_2015" but I need "C_ZE_RISIKO_95_2014_05_2015" instead. I marked the line where I tried to change this, but it doesn't work. Can somebody tell me why and suggest a solution?
Thanks in advance.
An alternative to Gregory's answer is to use putn and the z2. format, e.g.
%LET STRING1 = %SYSFUNC(putn(%EVAL(100-&I),z2.)) ;
What you can do is always prepend a 0 first and then take only the last 2 characters of the string. Quotes and dots are ordinary text in the macro language, so the 0 is written bare:
%let string1 = 0%eval(100-&i);
%let string1 = %substr(&string1,%length(&string1)-1);
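As for why the marked line in the question has no effect: without an ampersand, %if string1 = 5 compares the literal text string1 with 5, which is never true, and the quotes in "05" would become part of the value. A small sketch to check the padding in the log (the macro name pad_demo is made up for illustration):

```sas
%macro pad_demo;
  %local i string1;
  %do i = 90 %to 95 %by 5;
    /* z2. pads the number with a leading zero to a width of 2 */
    %let string1 = %sysfunc(putn(%eval(100-&i), z2.));
    %put i=&i string1=&string1;
  %end;
%mend pad_demo;

%pad_demo
```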
I am stuck with a problem in SAS. I have a bunch of monthly weather data in individual txt files. My current goal is to read those in and create a separate data set for each. Alternatively, it might be possible to skip this step and go straight to the end goal of merging all these data sets with another data set by date and time. Below is my attempt at the problem. I thought a macro that iterates through the file names and creates matching data set names would work, but apparently it does not. Also, I think the if/else if statements could be replaced by a DO loop to make it more efficient, but I could not figure out how. Help is much appreciated!
%macro loop;
%do i = 11 %to 13;
%do j = 01 %to 12;
%let year = i;
%let month = j;
data _&year&month ;
infile "&path\hr_pit_&year..&month..txt" firstobs=27;
length Time $ 4 Month $ 3 Day $ 2 Year $ 4 temp 3;
input time $ Month $ 10-13 Day Year temp 32-34;
Date = Day||Month||Year;
if time = '12AM' then time = 2400;
else if time = '1AM ' then time = 100;
else if time = '2AM ' then time = 200;
else if time = '3AM ' then time = 300;
else if time = '4AM ' then time = 400;
else if time = '5AM ' then time = 500;
else if time = '6AM ' then time = 600;
else if time = '7AM ' then time = 700;
else if time = '8AM ' then time = 800;
else if time = '9AM ' then time = 900;
else if time = '10AM' then time = 1000;
else if time = '11AM' then time = 1100;
else if time = '12PM' then time = 1200;
else if time = '1PM ' then time = 1300;
else if time = '2PM ' then time = 1400;
else if time = '3PM ' then time = 1500;
else if time = '4PM ' then time = 1600;
else if time = '5PM ' then time = 1700;
else if time = '6PM ' then time = 1800;
else if time = '7PM ' then time = 1900;
else if time = '8PM ' then time = 2000;
else if time = '9PM ' then time = 2100;
else if time = '10PM' then time = 2200;
else if time = '11PM' then time = 2300;
_time = input(time,4.);
time = _time;
drop month day year;
run;
%end;
%end;
%mend;
%loop; run:
In case anyone is wondering this is how a typical txt file looks: http://www.erh.noaa.gov/pbz/hourlywx/hr_pit_13.01
Here is a list of txt files in the same shape and form:
http://www.erh.noaa.gov/pbz/hourlyclimate.htm
The first fixes are:
%let year = &i;
%let month = %sysfunc(putn(&j, z2.));
which reference the macro variables with & and add a leading zero to the month.
The rest of the changes just deal with AM/PM.
Also, Date is now numeric.
Full code:
%macro loop;
%do i = 11 %to 13;
%do j = 1 %to 12;
%let year = &i;
%let month = %sysfunc(putn(&j, z2.));
data _&year&month ;
length Date 5 _Time $4 Time 8 Month $3 Day $2 Year $4 temp 3;
format Date DATE9.;
infile "&path\hr_pit_&year..&month..txt" firstobs=27;
input _time $ Month $ 10-13 Day Year temp 32-34;
_time = right(_time);
Date = input(Day||Month||Year, date9.);
if _time = '12AM' or (_time ne '12PM' and index(_time, 'PM') > 1 )
then time=input(_time, 2.) + 12;
else time=input(_time, 2.);
time = time * 100;
drop month day year;
run;
/* gather all data in one table */
proc append base=work.all_data data=work._&year&month;
run;
%end;
%end;
%mend;
proc sql;
drop table work.all_data;
quit;
%let path=E:;
%loop;
Sounds like the best answer may be to read them all into one dataset and then merge them to the final dataset from there. I think you also are served better by using a real time value, rather than 100-2400 (and an inconsistent 2400, that really should be 000 if you're doing that) - then you can just use input.
Anyway, if you just read the text files in like so:
data my_text_files;
infile "c:\mydirectory\*.txt" lrecl=whatever eov=eovmark;
*firstobs=27 is only respected for the first file - so we have to track with eovmark;
input @; *read the next record and hold it so that eovmark is updated;
if eovmark then do;
eovmark=0;
linecounter=0;
end;
linecounter+1;
if linecounter ge 27 then do;
input (input statement);
(any other code you want to execute here);
output;
end;
run;
Then merge by (whatever). If you need to know some information about the file name, you can use the FILENAME= option in the infile statement to get access to it.
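A sketch of how the file name could be captured alongside the line counter. The variable names fname, source_file and line, and the $char80. placeholder INPUT statement, are assumptions for illustration and need to be replaced with the real layout of the weather files.

```sas
data my_text_files;
  length fname source_file $260;
  infile "c:\mydirectory\*.txt" filename=fname eov=eovmark truncover;
  input @;                      /* load the next record and hold it */
  if eovmark then do;           /* first record of a new file */
    eovmark = 0;
    linecounter = 0;
  end;
  linecounter + 1;
  source_file = fname;          /* the FILENAME= variable is not written out, so copy it */
  if linecounter ge 27 then do;
    input line $char80.;        /* placeholder: replace with the real INPUT statement */
    output;
  end;
  drop linecounter;
run;
```

The year and month could then be parsed out of source_file instead of being passed in through macro variables.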